
Conversation

@orionpapadakis (Collaborator)

No description provided.

@mikepapadim changed the title from "[WIP] Add support for Qwen2.5 and Deepseek-Distilled-Qwen models" to "[WIP][models][deepseek][qwen2.5] Add support for Qwen2.5 and Deepseek-Distilled-Qwen models" on Aug 1, 2025
@mikepapadim requested a review from Copilot August 4, 2025 10:50

@mikepapadim requested a review from Copilot August 6, 2025 13:02

@mikepapadim changed the title from "[WIP][models][deepseek][qwen2.5] Add support for Qwen2.5 and Deepseek-Distilled-Qwen models" to "[models][deepseek][qwen2.5] Add support for Qwen2.5 and Deepseek-Distilled-Qwen models" on Aug 29, 2025
@mikepapadim marked this pull request as ready for review August 29, 2025 11:12
@mikepapadim (Member)

Fixes #19

@mikepapadim requested a review from Copilot August 29, 2025 11:13
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds support for Qwen2.5 and Deepseek-Distilled-Qwen models to the LLaMA inference framework. It introduces new model types, loaders, and computation kernels to handle these model architectures with their specific requirements.

Key changes:

  • Added new model types QWEN_2 and DEEPSEEK_R1_DISTILL_QWEN with corresponding configurations and state management
  • Implemented specialized TornadoVM computation kernels for Qwen2 models, including bias addition operations
  • Added automatic reasoning-token injection for DeepSeek-R1-Distill-Qwen models (a sketch follows this list)
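
A minimal sketch of the injection named in the last bullet, assuming a tokenizer that exposes special tokens by name; the helper method, the Tokenizer/ModelType shapes, and the "<think>" token name are illustrative, not the PR's exact API:

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: DeepSeek-R1 distills are trained to open their answers with a
// reasoning block, so the encoded prompt gets the reasoning-start token appended.
static List<Integer> encodeWithReasoning(Tokenizer tokenizer, ModelType type, String prompt) {
    List<Integer> tokens = new ArrayList<>(tokenizer.encode(prompt));
    if (type == ModelType.DEEPSEEK_R1_DISTILL_QWEN) {
        tokens.add(tokenizer.getSpecialTokens().get("<think>"));
    }
    return tokens;
}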

Reviewed Changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.

Summary per file:

TransformerComputeKernelsLayered.java: Added addInPlace kernel for element-wise array addition
TornadoVMMasterPlan.java: Added Qwen2 planner support and refactored model type dispatching
Qwen2TornadoVMLayerPlanner.java: New TornadoVM layer planner for Qwen2 models with bias operations
Qwen3Tokenizer.java: Updated token display logic to include reasoning tokens
Qwen3.java: Added shouldAddBeginOfText override
Qwen2Configuration.java: New configuration record for Qwen2 models
Qwen2.java: Main Qwen2 model implementation with DeepSeek-R1-specific behavior
Phi3.java: Added shouldAddBeginOfText override
Qwen2ModelLoader.java: Model loader for Qwen2/DeepSeek models with bias weight handling
ModelLoader.java: Updated model type detection logic
ModelType.java: Added new model types and DeepSeek detection (see the sketch after this table)
Model.java: Added reasoning token injection for DeepSeek models
Qwen2TornadoWeights.java: TornadoVM weights implementation for Qwen2
Qwen2StandardWeights.java: Standard weights implementation for Qwen2
Qwen2State.java: State management for Qwen2 models
InferenceCore.java: Java inference implementation for Qwen2
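
A hedged sketch of the detection flow the ModelLoader.java and ModelType.java rows describe: "general.architecture" is a standard GGUF metadata key and "general.basename" appears in the loader snippet further down, but the method shape and the fallback branch are assumptions:

import java.util.Map;

// Illustrative only: dispatch on GGUF metadata. DeepSeek's distilled checkpoints
// reuse the Qwen2 architecture, so the basename is what tells them apart.
static ModelType detectModelType(Map<String, String> metadata) {
    String architecture = metadata.get("general.architecture");
    if ("qwen2".equals(architecture)) {
        return "DeepSeek-R1-Distill-Qwen".equals(metadata.get("general.basename"))
                ? ModelType.DEEPSEEK_R1_DISTILL_QWEN
                : ModelType.QWEN_2;
    }
    throw new IllegalArgumentException("Unsupported architecture: " + architecture);
}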

.task("qbias", TransformerComputeKernelsLayered::addInPlace, state.wrapQ, weights.q_biasLayered[layerIndex], config.dim())
.task("kbias", TransformerComputeKernelsLayered::addInPlace, state.wrapK, weights.k_biasLayered[layerIndex], config.kvDim())
.task("vbias", TransformerComputeKernelsLayered::addInPlace, state.wrapV, weights.v_biasLayered[layerIndex], config.kvDim())
.task("rope", Qwen3Kernels::ropeRotation,context, state.positionHolder, state.wrapQ, state.wrapK, config.numberOfKeyValueHeads(),

Copilot AI Aug 29, 2025

Using Qwen3Kernels::ropeRotation for Qwen2 models may be incorrect. Verify that Qwen2 and Qwen3 use identical RoPE implementations, or create a Qwen2-specific RoPE kernel.

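For context on the addInPlace tasks above, a minimal sketch of an element-wise in-place addition kernel in TornadoVM, assuming FloatArray buffers and a @Parallel loop; the PR's actual kernel may differ in signature:

import uk.ac.manchester.tornado.api.annotations.Parallel;
import uk.ac.manchester.tornado.api.types.arrays.FloatArray;

public final class BiasKernelSketch {
    // out[i] += bias[i] for i in [0, size); @Parallel lets TornadoVM map the
    // loop iterations onto the accelerator's thread grid.
    public static void addInPlace(FloatArray out, FloatArray bias, int size) {
        for (@Parallel int i = 0; i < size; i++) {
            out.set(i, out.get(i) + bias.get(i));
        }
    }
}
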
/**
- * Executes the forward pass of a LLaMA transformer model using TornadoVM acceleration. This method processes the transformer layers in sequence for a particular token position in the context
+ * Executes the forward pass of a LLaMA transformer model using TornadoVM acceleration.
+ *This method processes the transformer layers in sequence for a particular token position in the context

Copilot AI Aug 29, 2025

Missing space after the asterisk in the comment. Should be '* This method' instead of '*This method'.

@Override
public int contextLengthModel() {
    return contextLengthModel;
}

Copilot AI Aug 29, 2025

The method returns the field contextLengthModel but should return the parameter contextLengthModel() to match the interface contract. This creates infinite recursion.
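
For reference, the recursion risk described here: returning the backing field is the safe form, while calling the accessor from its own body recurses until a StackOverflowError.

// Correct: the accessor returns the backing field.
public int contextLengthModel() {
    return contextLengthModel;
}

// Broken (illustration only): the accessor calls itself and never terminates.
public int contextLengthModel() {
    return contextLengthModel();
}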

try (var ignored = Timer.log("Load " + modelName + " model")) {
    // reuse Qwen3's vocabulary-loading method
    Vocabulary vocabulary = loadQwen3Vocabulary(metadata);
    boolean isDeepSeekR1DistillQwen = "DeepSeek-R1-Distill-Qwen".equals(metadata.get("general.basename"));

Copilot AI Aug 29, 2025

The string literal 'DeepSeek-R1-Distill-Qwen' is duplicated on lines 42 and 49. Consider extracting it to a constant to avoid duplication.
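
The suggested refactor, sketched with an illustrative constant name:

// Hoist the duplicated literal into a single named constant...
private static final String DEEPSEEK_R1_DISTILL_QWEN = "DeepSeek-R1-Distill-Qwen";

// ...and compare against it at each call site:
boolean isDeepSeekR1DistillQwen = DEEPSEEK_R1_DISTILL_QWEN.equals(metadata.get("general.basename"));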

@mikepapadim merged commit 9a4a81d into beehive-lab:main Sep 1, 2025
1 check passed