@orionpapadakis (Collaborator) commented Jul 2, 2025

Ongoing work for #19

  • add model support
  • refactor model loaders and inference engines to be modular

Checklist:

[x] CPU inference path in a working state
[x] GPU inference path in a working state

@mikepapadim moved this to In Progress in TornadoVM 1.X (Public) Jul 2, 2025
@mikepapadim added the "enhancement" and "models" labels Jul 2, 2025
@mikepapadim changed the title from "[WIP] Support for Qwen3 model" to "[WIP] Support for Qwen3 models" Jul 2, 2025
@orionpapadakis force-pushed the feat/qwen3 branch 2 times, most recently from 1306591 to 0369d50, July 29, 2025 16:45
@mikepapadim marked this pull request as ready for review July 30, 2025 11:42
@orionpapadakis (Collaborator, Author) commented

Ready for review.

@mikepapadim requested a review from Copilot July 30, 2025 13:44
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds support for Qwen3 models to the codebase, implementing a modular architecture that refactors model loading and inference engines to support multiple model types. The implementation includes both CPU and GPU inference paths through TornadoVM for Qwen3 models, alongside architectural improvements to the existing LLaMA and Mistral model support.

Key changes include:

  • Adding Qwen3 model support with specialized tokenization, configuration, and inference logic
  • Refactoring the model loading system to use a modular pattern with abstract base classes
  • Implementing separate state management and weight handling for different model architectures
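
The modular loader pattern named in the key changes can be pictured roughly as below. This is only a minimal sketch: `ModelType`, `Model`, `SimpleModel`, `LlamaLoader`, `Qwen3Loader`, `ModelLoaders`, and the `load`/`build` signatures are hypothetical names chosen for illustration, and the PR's actual `ModelLoader` API may look quite different.

```java
// Hypothetical sketch of an abstract loader with per-model subclasses.
// None of these signatures are taken from the PR itself.
enum ModelType { LLAMA, MISTRAL, QWEN3 }

interface Model {
    ModelType type();
}

// Stand-in for a loaded model; a real one would carry weights, tokenizer, and so on.
record SimpleModel(ModelType type, int contextLength) implements Model {}

abstract class ModelLoader {
    // Template method: shared validation lives here, model-specific work in build().
    final Model load(int contextLength) {
        if (contextLength <= 0) {
            throw new IllegalArgumentException("contextLength must be positive");
        }
        return build(contextLength);
    }

    // Each model family supplies its own construction logic.
    protected abstract Model build(int contextLength);
}

final class LlamaLoader extends ModelLoader {
    @Override
    protected Model build(int contextLength) {
        return new SimpleModel(ModelType.LLAMA, contextLength);
    }
}

final class Qwen3Loader extends ModelLoader {
    @Override
    protected Model build(int contextLength) {
        return new SimpleModel(ModelType.QWEN3, contextLength);
    }
}

final class ModelLoaders {
    private ModelLoaders() {}

    // Picks the concrete loader for a detected model type.
    static ModelLoader forType(ModelType type) {
        switch (type) {
            case LLAMA: return new LlamaLoader();
            case QWEN3: return new Qwen3Loader();
            default: throw new IllegalArgumentException("No loader for " + type);
        }
    }
}
```

With this shape, supporting a further model family would mean adding one subclass and one `case`, which is presumably the kind of extensibility the refactoring is after.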

Reviewed Changes

Copilot reviewed 44 out of 44 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| TornadoVMLayerPlanner.java | Refactored to support generic model types with parameterized base class |
| Qwen3TornadoVMLayerPlanner.java | New Qwen3-specific GPU execution planner with custom kernel configurations |
| Qwen3Kernels.java | Qwen3-specific GPU kernels including RMSNorm and RoPE rotation implementations |
| Qwen3Tokenizer.java | Complete Qwen3 tokenizer implementation with BPE encoding/decoding |
| Model architecture files | New Qwen3Configuration, Qwen3 model class, and supporting infrastructure |
| Weight/State refactoring | Separated standard and TornadoVM weight classes, model-specific state classes |
| Model loader refactoring | Abstract ModelLoader base with concrete implementations for each model type |

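
For readers unfamiliar with the two operations named for Qwen3Kernels.java, here is a plain-Java CPU reference sketch of RMSNorm and a RoPE rotation. It is not the PR's TornadoVM kernel code: the class name `Qwen3MathSketch`, both method signatures, and the choice of the split-half pairing convention (element `i` rotated with element `i + headDim/2`) are assumptions made for illustration only.

```java
// Hypothetical CPU reference for the math behind the kernels; not the PR's GPU code.
final class Qwen3MathSketch {
    private Qwen3MathSketch() {}

    // RMSNorm: out[i] = weight[i] * x[i] / sqrt(mean(x^2) + eps)
    static void rmsNorm(float[] out, float[] x, float[] weight, float eps) {
        float sumOfSquares = 0f;
        for (float v : x) {
            sumOfSquares += v * v;
        }
        float inverseRms = (float) (1.0 / Math.sqrt(sumOfSquares / x.length + eps));
        for (int i = 0; i < x.length; i++) {
            out[i] = weight[i] * (x[i] * inverseRms);
        }
    }

    // RoPE over a single head vector, rotating in place.
    // Assumes an even head size and pairs element i with element i + half.
    static void ropeRotate(float[] head, int position, float thetaBase) {
        int half = head.length / 2;
        for (int i = 0; i < half; i++) {
            // Rotation frequency falls off geometrically with the pair index.
            double frequency = Math.pow(thetaBase, -2.0 * i / head.length);
            double angle = position * frequency;
            float cos = (float) Math.cos(angle);
            float sin = (float) Math.sin(angle);
            float a = head[i];
            float b = head[i + half];
            head[i] = a * cos - b * sin;
            head[i + half] = a * sin + b * cos;
        }
    }
}
```

A reference like this can be handy for checking GPU kernel output on small inputs, since each rotation should leave the vector's length unchanged and a normalized vector with unit weights should have a mean square close to 1.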
Comments suppressed due to low confidence (3)

src/main/java/com/example/tornadovm/TransformerComputeKernelsLayered.java:441

  • [nitpick] The variable name 'shared_tile_max_holder' is verbose and the comment suggests it's a workaround. Consider renaming to 'tileMaxBuffer' for clarity and consistency with other buffer variables.
        float[] shared_tile_max_holder = context.allocateFloatLocalArray(1); // FIX: For broadcasting tile max

src/main/java/com/example/tornadovm/TransformerComputeKernelsLayered.java:623

  • [nitpick] The parameter name 'hb' is not descriptive. Consider renaming to 'output' or 'outputBuffer' to match the comment and improve readability.
            FloatArray hb,                  // output

src/main/java/com/example/inference/state/Qwen3State.java:25

  • The variable 'nEmbdHead' is assigned 'numberOfHeads()' but based on context, it should likely be 'numberOfHeadsValue()' or a calculated embedding head size. This naming suggests a mismatch between the variable name and its actual value.
        int nEmbdHead = qwen3config.numberOfHeads();

@mikepapadim (Member) commented Jul 30, 2025

@orionpapadakis also, update the README with instructions for the Qwen models, etc.

@mikepapadim changed the title from "[WIP] Support for Qwen3 models" to "[models] Support for Qwen3 models" Jul 31, 2025
Applied consistent formatting using @Formatter directives to enhance readability. Improved class documentation with detailed JavaDoc comments for methods and constructors, clarifying their purpose and parameters. Adjusted code style for multiline constructs and added missing comments where necessary.
@mikepapadim merged commit d053e9c into beehive-lab:main Jul 31, 2025
1 check passed
@github-project-automation (bot) moved this from In Progress to Done in TornadoVM 1.X (Public) Jul 31, 2025
2 participants