[models][deepseek][qwen2.5] Add support for Qwen2.5 and Deepseek-Distilled-Qwen models #40
Fixes #19
Pull Request Overview
This PR adds support for Qwen2.5 and Deepseek-Distilled-Qwen models to the LLaMA inference framework. It introduces new model types, loaders, and computation kernels to handle these model architectures with their specific requirements.
Key changes:
- Added new model types `QWEN_2` and `DEEPSEEK_R1_DISTILL_QWEN` with corresponding configurations and state management
- Implemented specialized TornadoVM computation kernels for Qwen2 models, including bias addition operations
- Added automatic reasoning token injection for DeepSeek-R1-Distill-Qwen models
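The reasoning-token injection mentioned above could, in spirit, look like the following sketch. This is a hedged illustration, not the PR's actual code: the class name, method name, and the token id for the reasoning-start marker are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: DeepSeek-R1-Distill-Qwen models are trained to open
// their replies with a reasoning block, so the prompt token sequence is
// extended with a reasoning-start marker before generation begins.
// The token id below is made up for illustration.
public final class ReasoningTokenInjection {
    static final int THINK_START_TOKEN = 151648; // hypothetical id for a "<think>" marker

    static List<Integer> withReasoningToken(List<Integer> promptTokens, boolean isDeepSeekR1Distill) {
        if (!isDeepSeekR1Distill) {
            return promptTokens; // non-DeepSeek models are untouched
        }
        List<Integer> out = new ArrayList<>(promptTokens);
        out.add(THINK_START_TOKEN); // model continues generating from the reasoning marker
        return out;
    }
}
```

The key design point is that the injection is conditional on model identity, which is why the PR also adds DeepSeek detection to `ModelType`.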
Reviewed Changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| TransformerComputeKernelsLayered.java | Added addInPlace kernel for element-wise array addition |
| TornadoVMMasterPlan.java | Added Qwen2 planner support and refactored model type dispatching |
| Qwen2TornadoVMLayerPlanner.java | New TornadoVM layer planner for Qwen2 models with bias operations |
| Qwen3Tokenizer.java | Updated token display logic to include reasoning tokens |
| Qwen3.java | Added shouldAddBeginOfText override |
| Qwen2Configuration.java | New configuration record for Qwen2 models |
| Qwen2.java | Main Qwen2 model implementation with DeepSeek-R1 specific behavior |
| Phi3.java | Added shouldAddBeginOfText override |
| Qwen2ModelLoader.java | Model loader for Qwen2/DeepSeek models with bias weight handling |
| ModelLoader.java | Updated model type detection logic |
| ModelType.java | Added new model types and DeepSeek detection |
| Model.java | Added reasoning token injection for DeepSeek models |
| Qwen2TornadoWeights.java | TornadoVM weights implementation for Qwen2 |
| Qwen2StandardWeights.java | Standard weights implementation for Qwen2 |
| Qwen2State.java | State management for Qwen2 models |
| InferenceCore.java | Java inference implementation for Qwen2 |
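The "DeepSeek detection" described for ModelType.java above might be sketched as a basename check on GGUF metadata. Everything in this fragment is an assumption for illustration (the enum name, the fallback to `QWEN_2`, and the method shape); the only grounded detail is the `"DeepSeek-R1-Distill-Qwen"` basename, which appears in the loader snippet later in this review.

```java
// Hypothetical sketch of model-type detection from GGUF metadata.
// Real detection would cover the framework's other model families too;
// here only the two types this PR adds are distinguished.
enum ModelTypeSketch {
    QWEN_2, DEEPSEEK_R1_DISTILL_QWEN;

    static ModelTypeSketch fromBasename(String basename) {
        if ("DeepSeek-R1-Distill-Qwen".equals(basename)) {
            return DEEPSEEK_R1_DISTILL_QWEN;
        }
        return QWEN_2; // simplification: everything else treated as plain Qwen2
    }
}
```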
src/main/java/org/beehive/gpullama3/tornadovm/Qwen2TornadoVMLayerPlanner.java
```java
.task("qbias", TransformerComputeKernelsLayered::addInPlace, state.wrapQ, weights.q_biasLayered[layerIndex], config.dim())
.task("kbias", TransformerComputeKernelsLayered::addInPlace, state.wrapK, weights.k_biasLayered[layerIndex], config.kvDim())
.task("vbias", TransformerComputeKernelsLayered::addInPlace, state.wrapV, weights.v_biasLayered[layerIndex], config.kvDim())
.task("rope", Qwen3Kernels::ropeRotation, context, state.positionHolder, state.wrapQ, state.wrapK, config.numberOfKeyValueHeads(),
```
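For readers unfamiliar with the `addInPlace` kernel these bias tasks invoke, its semantics are a simple element-wise in-place addition. The following is a plain-Java sketch, not the PR's TornadoVM kernel: ordinary `float[]` arrays stand in for TornadoVM's array types, and the sequential loop stands in for the parallelized kernel body.

```java
// Hedged sketch of an element-wise in-place addition kernel, analogous to
// the addInPlace task used above to add q/k/v bias vectors after the
// corresponding matmuls. dst[i] += bias[i] for i in [0, size).
public final class AddInPlaceSketch {
    static void addInPlace(float[] dst, float[] bias, int size) {
        for (int i = 0; i < size; i++) {
            dst[i] += bias[i];
        }
    }
}
```

Qwen2 attention projections carry bias terms (unlike Llama-style models), which is why these extra tasks exist in the Qwen2 planner at all.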
Copilot AI commented on Aug 29, 2025:
Using Qwen3Kernels::ropeRotation for Qwen2 models may be incorrect. Verify that Qwen2 and Qwen3 use identical RoPE implementations, or create a Qwen2-specific RoPE kernel.
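To make the review comment concrete: whether kernel reuse is safe depends on whether both models agree on the RoPE base frequency (theta), head dimension, and pair-rotation convention. The sketch below shows a generic interleaved-pair RoPE rotation in plain Java; it is illustrative only (the theta value and pairing convention are assumptions, not a claim about what Qwen2 or Qwen3 actually use), but it highlights exactly the parameters a Qwen2-vs-Qwen3 comparison would need to check.

```java
// Hedged sketch of a rotary position embedding (RoPE) rotation.
// Rotates consecutive (even, odd) pairs of v in place for a given token
// position. If two models differ in theta, head dimension, or in whether
// they pair adjacent elements vs. split halves, their kernels are NOT
// interchangeable.
public final class RopeSketch {
    static void ropeRotate(float[] v, int position, int headDim, double theta) {
        for (int i = 0; i < headDim; i += 2) {
            double freq = 1.0 / Math.pow(theta, (double) i / headDim);
            double angle = position * freq;
            float cos = (float) Math.cos(angle);
            float sin = (float) Math.sin(angle);
            float x = v[i], y = v[i + 1];
            v[i] = x * cos - y * sin;     // standard 2-D rotation of the pair
            v[i + 1] = x * sin + y * cos;
        }
    }
}
```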
```java
/**
 * Executes the forward pass of a LLaMA transformer model using TornadoVM acceleration.
 *This method processes the transformer layers in sequence for a particular token position in the context
```
Copilot AI commented on Aug 29, 2025:
Missing space after the asterisk in the comment. Should be '* This method' instead of '*This method'.
```java
@Override
public int contextLengthModel() {
    return contextLengthModel;
```
Copilot AI commented on Aug 29, 2025:
The method returns the field contextLengthModel; verify this matches the interface contract. Note that returning the accessor contextLengthModel() here instead would create infinite recursion.
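A minimal, self-contained illustration of the accessor pattern under discussion (class and field names are hypothetical, chosen to mirror the snippet):

```java
// The record-style accessor must return the backing field. Replacing the
// return expression with a call to contextLengthModel() would make the
// method call itself forever (StackOverflowError at runtime).
public final class ConfigSketch {
    private final int contextLengthModel;

    ConfigSketch(int contextLengthModel) {
        this.contextLengthModel = contextLengthModel;
    }

    public int contextLengthModel() {
        return contextLengthModel; // field reference, not a recursive call
    }
}
```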
```java
try (var ignored = Timer.log("Load " + modelName + " model")) {
    // reuse method of Qwen3
    Vocabulary vocabulary = loadQwen3Vocabulary(metadata);
    boolean isDeepSeekR1DistillQwen = "DeepSeek-R1-Distill-Qwen".equals(metadata.get("general.basename"));
```
Copilot AI commented on Aug 29, 2025:
The string literal 'DeepSeek-R1-Distill-Qwen' is duplicated on lines 42 and 49. Consider extracting it to a constant to avoid duplication.
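The suggested refactor is straightforward: hoist the repeated literal into a single named constant so both call sites stay in sync. The class and member names below are assumptions made for this sketch.

```java
// Sketch of the constant-extraction refactor suggested by the review:
// one authoritative definition of the basename string, referenced wherever
// DeepSeek-R1-Distill-Qwen detection is needed.
public final class Qwen2LoaderConstants {
    static final String DEEPSEEK_R1_DISTILL_QWEN_BASENAME = "DeepSeek-R1-Distill-Qwen";

    static boolean isDeepSeekR1DistillQwen(String basename) {
        // equals() on the constant is null-safe for a missing metadata key
        return DEEPSEEK_R1_DISTILL_QWEN_BASENAME.equals(basename);
    }
}
```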
Commit: …yerPlanner.java (Co-authored-by: Copilot <[email protected]>)