Replies: 3 comments
-
You can use an early
-
Btw, slightly related, in some fusing cases you might get the following nodes:
Where For example, this patch would no longer be needed: Lines 1038 to 1045 in 4b8560a
-
When I look at the nodes in the debugger while the graph is being created, I see GGML_OP_VIEW and GGML_OP_RESHAPE. When does GGML_OP_NONE show up?
-
In MoE models, a popular architecture is top-k followed by softmax. The corresponding block in llama_graph::build_moe_ffn is a good candidate for fusion. However, I'm not able to get these nodes together: they are not ordered one after the other in the cgraph, and they are not of the same shape. What would be an acceptable way to fuse them? One simple way is to implement a new operator, but that would cause a CPU fallback in backends which don't support it.
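To make the target of the fusion concrete, here is a minimal sketch in plain C++ of what a combined top-k + softmax would have to compute for a single token. It assumes the softmax is taken only over the k selected router logits, as described above. The name topk_softmax_ref is hypothetical and is not part of ggml or llama.cpp; this is a reference for the math, not a proposal for the actual operator or its signature.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical reference routine (not a ggml API): pick the k largest router
// logits for one token, then apply softmax over just those k values.
// Returns (expert index, weight) pairs ordered by descending logit.
std::vector<std::pair<int, float>> topk_softmax_ref(const std::vector<float> & logits, std::size_t k) {
    k = std::min(k, logits.size());

    // Indices 0..n-1, partially sorted so the first k point at the largest logits.
    std::vector<int> idx(logits.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + static_cast<std::ptrdiff_t>(k), idx.end(),
                      [&](int a, int b) { return logits[a] > logits[b]; });

    std::vector<std::pair<int, float>> out;
    if (k == 0) {
        return out;
    }

    // Subtract the largest selected logit before exponentiating for numerical stability.
    const float max_logit = logits[idx[0]];
    float sum = 0.0f;
    for (std::size_t i = 0; i < k; ++i) {
        const float e = std::exp(logits[idx[i]] - max_logit);
        out.emplace_back(idx[i], e);
        sum += e;
    }

    // Normalize so the k weights sum to 1.
    for (auto & p : out) {
        p.second /= sum;
    }
    return out;
}
```

Note that the input has one value per expert while the output has only k entries, which is the shape mismatch mentioned above: the intermediate nodes in the graph do not share a shape even though the computation is logically one step.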