
Conversation


@orionpapadakis orionpapadakis commented Jun 11, 2025

Summary

This PR integrates the Mistral LLM into the GPULlama3 repository.

To support this integration, several architectural changes and refactorings were made to promote component abstractions and a modular, extensible design.


Key Changes

1. Model Abstraction

  • The original Llama class has been refactored into a Model interface (under a new model package).
  • Llama-specific functionality is moved to model.llama.Llama.
  • A new implementation for Mistral is introduced in model.mistral.Mistral.
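The shape of this split can be sketched as follows (a minimal, hypothetical outline; the actual interfaces in the repository declare more members, including the inference entry points described in section 4):

```java
// Minimal sketch of the Model/Configuration abstraction described above.
// Member names are illustrative, not the repository's exact API.
interface Configuration {
    int contextLength();
    int vocabularySize();
}

interface Model {
    Configuration configuration();
}

// Architecture-specific types live in their own subpackages,
// e.g. model.llama.Llama and model.mistral.Mistral.
record LlamaConfiguration(int contextLength, int vocabularySize) implements Configuration {}

final class Llama implements Model {
    private final LlamaConfiguration config;

    Llama(LlamaConfiguration config) {
        this.config = config;
    }

    @Override
    public Configuration configuration() {
        return config;
    }
}
```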

2. Tokenizer Abstraction

  • Introduced a Tokenizer interface under the tokenizer.impl package.

  • Implemented two tokenizers:

    • LlamaTokenizer for GPT-2-style BPE.
    • MistralTokenizer for TikToken-style BPE.
  • The Vocabulary class has been relocated to the tokenizer package.
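The contract can be illustrated with a toy implementation (the `CharTokenizer` below is invented purely for demonstration; the real classes implement byte-pair encoding over a GGUF vocabulary):

```java
import java.util.List;

// Sketch of the Tokenizer contract from tokenizer.impl (method names illustrative).
interface Tokenizer {
    List<Integer> encode(String text);
    String decode(List<Integer> tokens);
}

// Toy character-level implementation showing the encode/decode round trip.
// LlamaTokenizer (GPT-2-style BPE) and MistralTokenizer (TikToken-style BPE)
// implement the same interface with real merge rules.
final class CharTokenizer implements Tokenizer {
    @Override
    public List<Integer> encode(String text) {
        return text.chars().boxed().toList();
    }

    @Override
    public String decode(List<Integer> tokens) {
        StringBuilder sb = new StringBuilder();
        tokens.forEach(sb::appendCodePoint);
        return sb.toString();
    }
}
```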

3. ChatFormat Abstraction

  • ChatFormat functionality has been refactored into an abstract form to support model-specific formatting and enable future extensions.
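A sketch of what such an abstraction typically looks like (the template strings follow the published Llama 3 and Mistral prompt formats, but the class and method names here are illustrative):

```java
// Sketch of a model-specific chat formatting abstraction with a factory.
interface ChatFormat {
    String formatUserMessage(String content);

    static ChatFormat of(String modelFamily) {
        return switch (modelFamily) {
            case "llama"   -> new LlamaChatFormat();
            case "mistral" -> new MistralChatFormat();
            default -> throw new IllegalArgumentException("Unknown model family: " + modelFamily);
        };
    }
}

// Mistral wraps user turns in [INST] ... [/INST] markers.
final class MistralChatFormat implements ChatFormat {
    @Override
    public String formatUserMessage(String content) {
        return "[INST] " + content + " [/INST]";
    }
}

// Llama 3 delimits each turn with header and end-of-turn special tokens.
final class LlamaChatFormat implements ChatFormat {
    @Override
    public String formatUserMessage(String content) {
        return "<|start_header_id|>user<|end_header_id|>\n\n" + content + "<|eot_id|>";
    }
}
```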

4. Inference Refactoring

  • Inference logic is decoupled from the model classes.

  • Introduced a dedicated inference package with:

    • InferenceEngine: Entry point for token generation (generateToken, generateTokenGPU methods).
    • InferenceCore: Contains reusable core operations (e.g., rmsnorm, forward, etc.).
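As an example of the kind of reusable operation that belongs in InferenceCore, RMS normalization scales a vector by the reciprocal of its root mean square and applies per-element weights. This is the standard formulation; the repository's actual signature may differ:

```java
// Standard RMSNorm: out[i] = weight[i] * x[i] / sqrt(mean(x^2) + eps).
final class InferenceCore {
    static void rmsnorm(float[] out, float[] x, float[] weight, float eps) {
        float sumOfSquares = 0f;
        for (float v : x) {
            sumOfSquares += v * v;
        }
        float inverseRms = (float) (1.0 / Math.sqrt(sumOfSquares / x.length + eps));
        for (int i = 0; i < x.length; i++) {
            out[i] = weight[i] * (x[i] * inverseRms);
        }
    }
}
```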

orionpapadakis and others added 12 commits June 11, 2025 18:38

  • Introduced `kvDim` and `kvMul` methods in `Configuration` and `MistralConfiguration` to enhance model configuration flexibility. Refactored TornadoVM classes to generalize handling of different models by replacing `Llama`-specific types with the `Model` interface. Streamlined token generation logic to support conditional GPU execution with TornadoVM.
  • Move format classes from auxiliary.format to model.format to fix dependency direction. These classes are only used by model classes, so co-locating them improves package cohesion.
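For context on the first commit: `kvDim` and `kvMul` are the derived grouped-query-attention quantities. Assuming the usual head-count fields, they reduce to the following (a sketch; field and record names here are illustrative):

```java
// Sketch of the derived GQA quantities added to Configuration.
// kvDim is the width of the key/value projection; kvMul is how many
// query heads share a single key/value head.
record GqaConfig(int dim, int numberOfHeads, int numberOfKeyValueHeads) {
    int kvDim() {
        return dim * numberOfKeyValueHeads / numberOfHeads;
    }

    int kvMul() {
        return numberOfHeads / numberOfKeyValueHeads;
    }
}
```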

CLAassistant commented Jun 11, 2025

CLA assistant check
All committers have signed the CLA.

@stratika stratika requested review from mikepapadim and stratika and removed request for mikepapadim June 11, 2025 17:19
@stratika stratika added the enhancement New feature or request label Jun 11, 2025
@stratika stratika requested a review from mikepapadim June 11, 2025 17:23
@mikepapadim mikepapadim requested review from Copilot and mikepapadim and removed request for mikepapadim and stratika June 11, 2025 17:31
Contributor

Copilot AI left a comment


Pull Request Overview

This PR integrates the Mistral LLM into the GPULlama3 codebase by introducing model and tokenizer abstractions, extending chat formatting logic, and refactoring inference loading.

  • Refactor Llama into a generic Model interface and add Mistral implementation
  • Introduce Tokenizer interface with LlamaTokenizer and MistralTokenizer
  • Abstract ChatFormat for both Llama and Mistral and enhance ModelLoader to detect and load each

Reviewed Changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| model/llama/LlamaConfiguration.java | New record implementing Configuration for Llama |
| model/llama/Llama.java | Updated to implement generic Model interface |
| model/format/ChatFormat.java | Interface and factory to select Llama/Mistral format |
| model/format/MistralChatFormat.java | New chat formatter for Mistral-specific tokens |
| model/format/LlamaChatFormat.java | Refined Llama chat formatting under ChatFormat |
| loader/weights/ModelLoader.java | Detects GGUF metadata and loads the appropriate model |
| Model.java | Unified inference entry points under the Model interface |
Comments suppressed due to low confidence (2)

src/main/java/com/example/model/format/LlamaChatFormat.java:60

  • This loop refers to `LlamaChatFormat.Message`, but `Message` is defined in the `ChatFormat` interface. Change to `for (ChatFormat.Message message : dialog)` to match the interface type.
for (LlamaChatFormat.Message message : dialog) {

src/main/java/com/example/Model.java:103

  • `List<Integer>` does not have `getLast()` or `removeLast()` methods. Use `responseTokens.get(responseTokens.size() - 1)` and `responseTokens.remove(responseTokens.size() - 1)` instead.
if (!responseTokens.isEmpty() && stopTokens.contains(responseTokens.getLast())) {
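Worth noting: `List.getLast()` and `List.removeLast()` do exist on Java 21+ via the `SequencedCollection` interface, so this comment only applies if the project targets an older release. On earlier JDKs the portable equivalent is the index-based form (a sketch; `StopTokenCheck` is an invented helper name, not part of the PR):

```java
import java.util.List;
import java.util.Set;

// Index-based equivalent of the getLast()/removeLast() calls, portable to Java < 21.
final class StopTokenCheck {
    static boolean hitStopToken(List<Integer> responseTokens, Set<Integer> stopTokens) {
        return !responseTokens.isEmpty()
                && stopTokens.contains(responseTokens.get(responseTokens.size() - 1));
    }

    static void dropLastToken(List<Integer> responseTokens) {
        responseTokens.remove(responseTokens.size() - 1);
    }
}
```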


public static Llama loadModel(Path ggufPath, int contextLength, boolean loadWeights) throws IOException {
// Check by vocabulary size as fallback
if (vocabSize != null) {

@mikepapadim mikepapadim Jun 11, 2025


Is there any other way to detect the model here instead of checking sizes?

Collaborator Author


Indeed, checking the vocab size seems odd, but it should be noted that it's the third check in line and just acts as a fallback in case the model name (1st check) and the tokenizer metadata (2nd check) are not enough. IMHO, we could keep only the model name check.
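The three-stage chain described here might look like the following. The metadata keys follow common GGUF conventions, but the exact keys and the 32768 threshold are illustrative assumptions, not the repository's actual values:

```java
import java.util.Map;

// Sketch of the detection chain: model name first, tokenizer metadata
// second, vocabulary size only as a last-resort fallback.
enum ModelFamily { LLAMA, MISTRAL, UNKNOWN }

final class ModelDetector {
    static ModelFamily detect(Map<String, Object> metadata) {
        // 1st check: model name from the GGUF header.
        String name = ((String) metadata.getOrDefault("general.name", "")).toLowerCase();
        if (name.contains("mistral")) return ModelFamily.MISTRAL;
        if (name.contains("llama")) return ModelFamily.LLAMA;

        // 2nd check: declared tokenizer model (illustrative mapping).
        String tokenizerModel = (String) metadata.getOrDefault("tokenizer.ggml.model", "");
        if (tokenizerModel.equals("gpt2")) return ModelFamily.LLAMA;

        // 3rd check (fallback): vocabulary size.
        Integer vocabSize = (Integer) metadata.get("llama.vocab_size");
        if (vocabSize != null && vocabSize == 32768) return ModelFamily.MISTRAL;
        return ModelFamily.UNKNOWN;
    }
}
```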

@mikepapadim mikepapadim changed the title Mistral LLM Integration [model] Add support for Mistral models Jun 11, 2025
@mikepapadim
Member

Fixes #18

Member

@mikepapadim mikepapadim left a comment


One more thing! Can you update the README with the link and instructions to download the Mistral model -> https://huggingface.co/beehive-lab

@orionpapadakis
Collaborator Author

One more thing! Can you update the README with the link and instructions to download the Mistral model -> https://huggingface.co/beehive-lab

done

@mikepapadim
Member

@orionpapadakis thanks, LGTM. Let me test it and add the changes for the normalization layers, then we can merge.

@mikepapadim mikepapadim merged commit 65e4888 into beehive-lab:main Jun 18, 2025
1 check passed

Labels: enhancement (New feature or request), models

4 participants