@orionpapadakis
Collaborator

This PR adds a new Tornado kernel for parallel attention and improves the grid-size calculations.
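For context on "parallel attention": in multi-head attention, each head computes its scores, softmax, and weighted sum over the cached values independently of the other heads, so the heads can be processed in parallel. The sketch below is a plain-Java CPU reference under stated assumptions, not the TornadoVM kernel added in this PR. The class name `ParallelAttentionSketch`, the flat `[nHeads * headSize]` layout, and the use of `IntStream.parallel()` are all illustrative choices.

```java
import java.util.stream.IntStream;

// Hypothetical CPU reference for single-token multi-head attention (not the PR's kernel).
final class ParallelAttentionSketch {

    /**
     * q:      [nHeads * headSize] query for the current token
     * keys:   [seqLen][nHeads * headSize] cached keys
     * values: [seqLen][nHeads * headSize] cached values
     * returns [nHeads * headSize] attention output
     */
    static float[] attend(float[] q, float[][] keys, float[][] values, int nHeads, int headSize) {
        int seqLen = keys.length;
        float[] out = new float[nHeads * headSize];
        float scale = (float) (1.0 / Math.sqrt(headSize));

        // Heads are independent and write to disjoint slices of `out`, so they can run in parallel.
        IntStream.range(0, nHeads).parallel().forEach(h -> {
            int off = h * headSize;
            float[] scores = new float[seqLen];
            float max = Float.NEGATIVE_INFINITY;

            // Scaled dot product of the query head with every cached key head.
            for (int t = 0; t < seqLen; t++) {
                float dot = 0f;
                for (int i = 0; i < headSize; i++) {
                    dot += q[off + i] * keys[t][off + i];
                }
                scores[t] = dot * scale;
                max = Math.max(max, scores[t]);
            }

            // Numerically stable softmax: subtract the max before exponentiating.
            float sum = 0f;
            for (int t = 0; t < seqLen; t++) {
                scores[t] = (float) Math.exp(scores[t] - max);
                sum += scores[t];
            }

            // Weighted sum of the cached value heads.
            for (int t = 0; t < seqLen; t++) {
                float w = scores[t] / sum;
                for (int i = 0; i < headSize; i++) {
                    out[off + i] += w * values[t][off + i];
                }
            }
        });
        return out;
    }
}
```

On a GPU, the same per-head independence is what allows each head to be mapped to its own work-group; the exact mapping used by the PR's kernel is not described on this page.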

rmsNormWorker.setGlobalWork(config.dim(), 1, 1); // Set global work size to total dimension
rmsNormWorker.setLocalWork(32, 1, 1); // Set local work size to 32 (a common efficient size)

// Parallel attention worker configuration
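The rmsNorm worker above launches `config.dim()` work-items in work-groups of 32, which gives `config.dim() / 32` work-groups. That split is exact only when the dimension is a multiple of 32. On backends that require the global size to be divisible by the local size, a common approach is to round the global size up and have the kernel skip indices at or beyond the real dimension. A minimal sketch of that arithmetic follows; `WorkGroupMath`, `roundUp`, and `numGroups` are hypothetical helpers, and 2048 is only an example dimension.

```java
// Hypothetical helpers for 1D grid arithmetic (not part of TornadoVM's API).
final class WorkGroupMath {

    // Round globalSize up to the next multiple of localSize.
    static int roundUp(int globalSize, int localSize) {
        if (localSize <= 0) {
            throw new IllegalArgumentException("localSize must be positive: " + localSize);
        }
        return ((globalSize + localSize - 1) / localSize) * localSize;
    }

    // Number of work-groups launched for a 1D grid after rounding.
    static int numGroups(int globalSize, int localSize) {
        return roundUp(globalSize, localSize) / localSize;
    }

    public static void main(String[] args) {
        int dim = 2048; // example value; the real value comes from config.dim()
        int local = 32;
        // prints "2048 work-items in 64 groups"
        System.out.println(roundUp(dim, local) + " work-items in " + numGroups(dim, local) + " groups");
    }
}
```

With a dimension that is not a multiple of 32, such as 3000, the rounded global size would be 3008 (94 groups), and the last 8 work-items would need to return early inside the kernel.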
Member

Put it into a separate method (and make 64 a constant, etc.):

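    // Returns the largest divisor of headSize that does not exceed 64,
    // so the local work size always divides the head dimension evenly.
    public static int computeOptimalLocalSize(int headSize) {
        int optimalLocalSize = Math.min(headSize, 64);

        if (headSize % optimalLocalSize != 0) {
            // Walk down until an exact divisor is found (1 always qualifies).
            for (int size = optimalLocalSize; size >= 1; size--) {
                if (headSize % size == 0) {
                    return size;
                }
            }
        }

        return optimalLocalSize;
    }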
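Following the request to name the 64, one possible refactor is shown below. `LocalSizeUtil` and `MAX_LOCAL_WORK_SIZE` are hypothetical names. For any positive `headSize` the method returns the same value as the version above. It also rejects non-positive input explicitly, whereas the version above throws `ArithmeticException` for 0 and returns the negative input unchanged for negative values.

```java
// Hypothetical utility class; names are illustrative only.
final class LocalSizeUtil {

    // The "64" from the review, given a name.
    static final int MAX_LOCAL_WORK_SIZE = 64;

    private LocalSizeUtil() {}

    /** Largest divisor of headSize that does not exceed MAX_LOCAL_WORK_SIZE. */
    static int computeOptimalLocalSize(int headSize) {
        if (headSize <= 0) {
            throw new IllegalArgumentException("headSize must be positive: " + headSize);
        }
        // Walk down from the cap; the first exact divisor wins. If none above 1 exists, fall back to 1.
        for (int size = Math.min(headSize, MAX_LOCAL_WORK_SIZE); size > 1; size--) {
            if (headSize % size == 0) {
                return size;
            }
        }
        return 1;
    }
}
```

For example, a head size of 64 or 128 gives 64, 80 gives 40, and 96 gives 48. A prime head size above the cap, such as 97, falls back to 1.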

@mikepapadim merged commit efbe261 into beehive-lab:main on Sep 4, 2025
1 check passed

Labels

enhancement New feature or request

2 participants