[models] Support for Qwen3 models #37
Conversation
Force-pushed from 1306591 to 0369d50.
… base class hierarchy
Force-pushed from 2343297 to 7dc5056.
ready for review
        
          
Review threads (outdated, resolved):
- src/main/java/com/example/inference/weights/standard/LlamaStandardWeights.java
- src/main/java/com/example/tornadovm/Qwen3TornadoVMLayerPlanner.java (3 threads)
- src/main/java/com/example/tornadovm/TransformerComputeKernelsLayered.java
Pull Request Overview
This PR adds support for Qwen3 models to the codebase, implementing a modular architecture that refactors model loading and inference engines to support multiple model types. The implementation includes both CPU and GPU inference paths through TornadoVM for Qwen3 models, alongside architectural improvements to the existing LLaMA and Mistral model support.
Key changes include:
- Adding Qwen3 model support with specialized tokenization, configuration, and inference logic
- Refactoring the model loading system to use a modular pattern with abstract base classes
- Implementing separate state management and weight handling for different model architectures
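As a rough illustration of the modular loading pattern described above, a sketch is shown below. Only ModelLoader and Qwen3Configuration are named in this PR; every other class, method, and value here is an assumption for illustration, not the PR's actual code.

```java
import java.nio.file.Path;

// Illustrative sketch only: an abstract loader base with one concrete loader per
// architecture, roughly as the PR's refactoring describes.
abstract class ModelLoader<C> {
    protected final Path ggufPath;

    protected ModelLoader(Path ggufPath) {
        this.ggufPath = ggufPath;
    }

    /** Parse architecture-specific metadata (hyperparameters, vocab, etc.). */
    protected abstract C loadConfiguration();

    /** Wire configuration, tokenizer, and weights into a runnable model. */
    public abstract Runnable loadModel();
}

// Hypothetical stand-in for the PR's Qwen3Configuration.
record Qwen3Config(int dim, int numberOfHeads, int numberOfKeyValueHeads) {}

class Qwen3Loader extends ModelLoader<Qwen3Config> {
    Qwen3Loader(Path ggufPath) { super(ggufPath); }

    @Override
    protected Qwen3Config loadConfiguration() {
        // Real code would read these values from the model file's metadata.
        return new Qwen3Config(1024, 16, 8);
    }

    @Override
    public Runnable loadModel() {
        Qwen3Config config = loadConfiguration();
        return () -> System.out.println("Loaded Qwen3 with dim=" + config.dim());
    }
}
```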
Reviewed Changes
Copilot reviewed 44 out of 44 changed files in this pull request and generated 5 comments.
| File | Description | 
|---|---|
| TornadoVMLayerPlanner.java | Refactored to support generic model types with parameterized base class | 
| Qwen3TornadoVMLayerPlanner.java | New Qwen3-specific GPU execution planner with custom kernel configurations | 
| Qwen3Kernels.java | Qwen3-specific GPU kernels including RMSNorm and RoPE rotation implementations | 
| Qwen3Tokenizer.java | Complete Qwen3 tokenizer implementation with BPE encoding/decoding | 
| Model architecture files | New Qwen3Configuration, Qwen3 model class, and supporting infrastructure | 
| Weight/State refactoring | Separated standard and TornadoVM weight classes, model-specific state classes | 
| Model loader refactoring | Abstract ModelLoader base with concrete implementations for each model type | 
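The "parameterized base class" planner refactor in the first two rows could look roughly like the sketch below. Only the class names TornadoVMLayerPlanner and Qwen3TornadoVMLayerPlanner come from the PR; the generics and methods are assumptions for illustration.

```java
// Rough sketch of a generic planner base with an architecture-specific subclass.
abstract class TornadoVMLayerPlanner<S, W, C> {
    protected final S state;
    protected final W weights;
    protected final C config;

    protected TornadoVMLayerPlanner(S state, W weights, C config) {
        this.state = state;
        this.weights = weights;
        this.config = config;
    }

    /** Each architecture schedules its own sequence of GPU kernels per layer. */
    abstract void planLayer(int layerIndex);
}

class Qwen3TornadoVMLayerPlanner extends TornadoVMLayerPlanner<Object, Object, Object> {
    Qwen3TornadoVMLayerPlanner(Object state, Object weights, Object config) {
        super(state, weights, config);
    }

    @Override
    void planLayer(int layerIndex) {
        // Qwen3-specific kernels (RMSNorm, RoPE rotation, attention, FFN) would be
        // appended to the task graph here.
    }
}
```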
Comments suppressed due to low confidence (3)
src/main/java/com/example/tornadovm/TransformerComputeKernelsLayered.java:441
- [nitpick] The variable name 'shared_tile_max_holder' is verbose and the comment suggests it's a workaround. Consider renaming to 'tileMaxBuffer' for clarity and consistency with other buffer variables.
        float[] shared_tile_max_holder = context.allocateFloatLocalArray(1); // FIX: For broadcasting tile max
src/main/java/com/example/tornadovm/TransformerComputeKernelsLayered.java:623
- [nitpick] The parameter name 'hb' is not descriptive. Consider renaming to 'output' or 'outputBuffer' to match the comment and improve readability.
            FloatArray hb,                  // output
src/main/java/com/example/inference/state/Qwen3State.java:25
- The variable 'nEmbdHead' is assigned 'numberOfHeads()' but based on context, it should likely be 'numberOfHeadsValue()' or a calculated embedding head size. This naming suggests a mismatch between the variable name and its actual value.
        int nEmbdHead = qwen3config.numberOfHeads();
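For context, a hedged illustration of the head-size relationship this comment points at follows; the real Qwen3Configuration accessor names and values may differ.

```java
// Illustration only: per-head embedding size versus number of heads.
class HeadSizeExample {
    public static void main(String[] args) {
        int dim = 2048;                      // model embedding dimension (example value)
        int numberOfHeads = 16;              // attention heads (example value)
        int headSize = dim / numberOfHeads;  // per-head embedding size: 2048 / 16 = 128
        // 'nEmbdHead' presumably wants headSize (128), not numberOfHeads (16).
        System.out.println(headSize);
    }
}
```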
@orionpapadakis also, update the README with Qwen models instructions, etc.
Applied consistent formatting using @Formatter directives to enhance readability. Improved class documentation with detailed JavaDoc comments for methods and constructors, clarifying their purpose and parameters. Adjusted code style for multiline constructs and added missing comments where necessary.
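For reference, a minimal sketch of the kind of formatter directive and JavaDoc the commit message describes, assuming IntelliJ-style @formatter markers; the PR's actual code and class names may differ.

```java
// Illustrative only: JavaDoc plus formatter on/off markers around a hand-aligned block.
final class RopeExample {

    /**
     * Precomputes rotary position embedding (RoPE) frequencies for one head.
     *
     * @param headSize per-head embedding size
     * @param theta    RoPE base frequency
     * @return one frequency per pair of dimensions
     */
    static float[] precomputeFreqs(int headSize, float theta) {
        // @formatter:off
        // (hand-aligned block kept as-is by the IDE formatter)
        float[] freqs = new float[headSize / 2];
        for (int i = 0; i < headSize; i += 2) {
            freqs[i / 2] = (float) Math.pow(theta, -(double) i / headSize);
        }
        // @formatter:on
        return freqs;
    }
}
```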
Force-pushed from 62a3dd8 to 2fd98ef.
Ongoing work for #19
Checklist:
- [x] CPU inference path in a working state
- [x] GPU inference path in a working state