
Conversation

@am17an (Collaborator) commented on Oct 19, 2025:

This PR adds ggml_can_fuse_subgraph, a less strict extension of ggml_can_fuse. Given the inputs/outputs of a subgraph, it checks whether all of the intermediate tensors can be fused. Marking as a draft to iterate on the correct API.
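
For reference, a rough sketch of the shape such a helper might take, reconstructed from the hunks reviewed below (the exact signature was still being iterated on in this draft, so the parameter names here are illustrative):

```c
// Illustrative sketch only -- the draft PR was still iterating on the API.
// Checks whether the nodes at node_idxs[0..count) form a fusable subgraph:
// every intermediate tensor must be consumed only inside the subgraph, and
// only the listed outputs may be visible to the rest of the graph.
bool ggml_can_fuse_subgraph(const struct ggml_cgraph * cgraph,
                            const int *                node_idxs,   // indices into cgraph->nodes
                            int                        count,
                            const int *                outputs,     // node indices of subgraph outputs
                            int                        num_outputs);
```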

@am17an marked this pull request as a draft on October 19, 2025 06:43
@github-actions bot added the labels Nvidia GPU (issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) on Oct 19, 2025
@am17an changed the title from "Ggml can fuse subgraph" to "ggml: add ggml_can_fuse_subgraph" on Oct 19, 2025
@am17an requested a review from @jeffbolznv on October 19, 2025 06:45
@jeffbolznv (Collaborator) commented:

While this does check that the internal nodes aren't used outside of the fusion region, we also need to check that the internal connectivity of the graph is what we expect. IMO this necessarily has to be verbose, because we have to check that all the node->src[] values are what we expect them to be.

The first idea that comes to mind would be to pass in a list of triples, where each triple is a { dst_node, src_idx, src_node }, and verify that nodes[start + dst_node]->src[src_idx] == nodes[start + src_node].
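
A minimal sketch of that idea (the struct and helper names here are hypothetical, and `start` is assumed to be the subgraph's base index in `cgraph->nodes`):

```c
// Hypothetical edge descriptor: nodes[start + dst_node]->src[src_idx]
// is expected to point at nodes[start + src_node].
struct fusion_edge {
    int dst_node;
    int src_idx;
    int src_node;
};

static bool fusion_edges_match(const struct ggml_cgraph * cgraph, int start,
                               const struct fusion_edge * edges, int num_edges) {
    for (int i = 0; i < num_edges; ++i) {
        const struct fusion_edge * e = &edges[i];
        if (cgraph->nodes[start + e->dst_node]->src[e->src_idx] !=
            cgraph->nodes[start + e->src_node]) {
            return false;
        }
    }
    return true;
}
```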

@am17an (Collaborator, Author) commented on Oct 20, 2025:

> While this does check that the internal nodes aren't used outside of the fusion region, we also need to check that the internal connectivity of the graph is what we expect. IMO this necessarily has to be verbose, because we have to check that all the node->src[] values are what we expect them to be.

I think this is better done at the caller site, which has more context about the fusion. This function is just meant to avoid cases where a write is needed elsewhere but we end up fusing it anyway, and to provide a common place for other sanity checks. Maybe a better name for this function would be is_fusion_candidate. Passing triples like you mentioned is equivalent to building the expected graph at the caller site, just without doing the equality checks.

@jeffbolznv (Collaborator) commented:

> I think this is better done at the caller site

It's probably fine to have it as a separate function, but IMO it can still be common code (as much as any of this can be common code - there will always be special cases we want to handle differently).

Commit: 2. add check for views: view_src should be part of the subgraph
@am17an force-pushed the ggml_can_fuse_subgraph branch from 3059ed3 to d853036 on October 20, 2025 14:52

```c
// if node is a view, check if the view src is within the subgraph
if (node->view_src) {
    const struct ggml_tensor * view_src = node->view_src;
```
@am17an (Collaborator, Author):

Maybe we need to walk the tree until we reach the non-view parent, instead of what I did here.
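
Something along these lines (a sketch; note that ggml's view ops typically normalize view_src to the root tensor already, in which case a single hop suffices):

```c
// Walk the view chain to the ultimate non-view parent before checking
// whether it belongs to the subgraph (sketch).
const struct ggml_tensor * view_src = node->view_src;
while (view_src->view_src) {
    view_src = view_src->view_src;
}
// ... then check that view_src is one of the subgraph's nodes or inputs
```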

Collaborator:

I guess the most conservative thing would be to check all parents. It seems plausible we'll want to fuse a view of a view in the future.

```c
        const int * idxs,
        int count,
        const struct ggml_tensor * tensor) {
    if (idxs == NULL || cgraph == NULL) {
```
Collaborator:

IMO this could be removed or just be an assertion.
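
For example, turning it into a precondition (GGML_ASSERT is the library's standard assertion macro):

```c
// Treat NULL arguments as a programming error rather than a runtime case.
GGML_ASSERT(idxs != NULL && cgraph != NULL);
```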


const struct ggml_tensor * node = cgraph->nodes[node_idxs[i]];

if (node->flags & GGML_TENSOR_FLAG_OUTPUT) {
Collaborator:

Suggest moving this check after the ggml_find_tensor_node_list(cgraph, outputs, num_outputs, node) != -1 check. It's OK for the output nodes to have the output flag, since they won't be elided.
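
A sketch of the suggested ordering, using the names from the surrounding hunks (whether the flag check should `return false` or take some other action is an assumption here):

```c
// Declared subgraph outputs may legitimately carry the OUTPUT flag;
// they are not elided by fusion, so skip them before checking the flag.
if (ggml_find_tensor_node_list(cgraph, outputs, num_outputs, node) != -1) {
    continue;
}
if (node->flags & GGML_TENSOR_FLAG_OUTPUT) {
    return false; // an interior node visible as a graph output cannot be elided
}
```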

```c
        continue;
    }

    interior_nodes[interior_nodes_count++] = node_idxs[i];
```
Collaborator:

I don't think it's necessary anymore to have the two loops and this interior_nodes array.


// if interior-node has n-uses, ensure that all of them lie within in this subgraph
int subgraph_uses = 0;
for (int j = 0; j < count; ++j) {
Collaborator:

If you combine the two loops, this loop could start from j = i+1.
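
That is, assuming `cgraph->nodes` is in topological order (a consumer can never precede its producer), the use-counting loop only needs to scan forward:

```c
// Sketch: count uses of node within the subgraph, scanning only the
// nodes after i -- earlier nodes cannot consume a later node's result.
int subgraph_uses = 0;
for (int j = i + 1; j < count; ++j) {
    const struct ggml_tensor * other_node = cgraph->nodes[node_idxs[j]];
    for (int src_idx = 0; src_idx < GGML_MAX_SRC; src_idx++) {
        if (other_node->src[src_idx] == node) {
            subgraph_uses++;
        }
    }
}
```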

```c
    for (int j = 0; j < count; ++j) {
        const struct ggml_tensor * other_node = cgraph->nodes[node_idxs[j]];
        for (int src_idx = 0; src_idx < GGML_MAX_SRC; src_idx++) {
            if (other_node->src[src_idx] && other_node->src[src_idx] == node) {
```
Collaborator:

Suggested change:

```diff
- if (other_node->src[src_idx] && other_node->src[src_idx] == node) {
+ if (other_node->src[src_idx] == node) {
```

