Get outputs from intermediate layers #4224
It's very difficult to do that at this point. We will try to improve this in #2783 (no ETA)
I see. Now suppose I'm only interested in the execution latency and not the accuracy — can we simply remove some layers inside?
The easiest thing to do is something like this:

```diff
diff --git a/llama.cpp b/llama.cpp
index 9fb7244b..f6d8ea2d 100644
--- a/llama.cpp
+++ b/llama.cpp
@@ -3867,6 +3867,8 @@ struct llm_build_context {
         }

         for (int il = 0; il < n_layer; ++il) {
+            if (il > 5 && il < 20) continue; // skip layers [6, 20)
+
             struct ggml_tensor * inpSA = inpL;

             // norm
```
Ah thanks! It works!
I want to analyze the output from intermediate layers of Llama 2 7B during the forward pass. Any hint on where to look in the code? The whole forward pass seems to be built as a graph, and it's not clear to me where the boundaries between layers are. Any help appreciated!