Get outputs from intermediate layers #4224
It's very difficult to do that at this point. We will try to improve this in #2783 (no ETA)
I see. Now suppose I'm only interested in the execution latency and not the accuracy — can we simply remove some layers inside?
The easiest thing to do is something like this:

```diff
diff --git a/llama.cpp b/llama.cpp
index 9fb7244b..f6d8ea2d 100644
--- a/llama.cpp
+++ b/llama.cpp
@@ -3867,6 +3867,8 @@ struct llm_build_context {
         }

         for (int il = 0; il < n_layer; ++il) {
+            if (il > 5 && il < 20) continue; // skip layers [6, 20)
+
             struct ggml_tensor * inpSA = inpL;

             // norm
```
Ah thanks! It works!
I want to analyze the output from intermediate layers of Llama 2 7B during the forward pass. Any hint on where to look in the code? The whole forward pass seems to be built as a graph, and it's not clear to me where the boundaries between layers are. Any help appreciated!