[model] Add support for Mistral models #17
Conversation
Introduced `kvDim` and `kvMul` methods in `Configuration` and `MistralConfiguration` to enhance model configuration flexibility. Refactored the TornadoVM classes to generalize handling of different models by replacing `Llama`-specific types with the `Model` interface. Streamlined the token generation logic to support conditional GPU execution with TornadoVM.
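For context, a minimal sketch of how such default methods typically derive the key/value dimensions under grouped-query attention; the accessor names here are assumptions, not necessarily the PR's actual signatures:

```java
// Sketch only: accessor names are illustrative, not the PR's actual API.
public interface Configuration {
    int dim();                   // embedding dimension
    int numberOfHeads();         // attention (query) heads
    int numberOfKeyValueHeads(); // key/value heads; fewer than query heads under GQA

    // Size of the key/value projection; equals dim() when every query
    // head has its own KV head, smaller under grouped-query attention.
    default int kvDim() {
        return dim() * numberOfKeyValueHeads() / numberOfHeads();
    }

    // Number of query heads sharing a single key/value head.
    default int kvMul() {
        return numberOfHeads() / numberOfKeyValueHeads();
    }
}
```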
Move format classes from `auxiliary.format` to `model.format` to fix dependency direction. These classes are only used by model classes, so co-locating them improves package cohesion.
Pull Request Overview
This PR integrates the Mistral LLM into the GPULlama3 codebase by introducing model and tokenizer abstractions, extending chat formatting logic, and refactoring inference loading.
- Refactor `Llama` into a generic `Model` interface and add a `Mistral` implementation
- Introduce a `Tokenizer` interface with `LlamaTokenizer` and `MistralTokenizer`
- Abstract `ChatFormat` for both Llama and Mistral and enhance `ModelLoader` to detect and load each
Reviewed Changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| model/llama/LlamaConfiguration.java | New record implementing Configuration for Llama |
| model/llama/Llama.java | Updated to implement generic Model interface |
| model/format/ChatFormat.java | Interface and factory to select Llama/Mistral format |
| model/format/MistralChatFormat.java | New chat formatter for Mistral-specific tokens |
| model/format/LlamaChatFormat.java | Refined Llama chat formatting under ChatFormat |
| loader/weights/ModelLoader.java | Detects GGUF metadata and loads the appropriate model |
| Model.java | Unified inference entry points under the Model interface |
Comments suppressed due to low confidence (2)
src/main/java/com/example/model/format/LlamaChatFormat.java:60

This loop refers to `LlamaChatFormat.Message`, but `Message` is defined in the `ChatFormat` interface. Change to `for (ChatFormat.Message message : dialog)` to match the interface type.

```java
for (LlamaChatFormat.Message message : dialog) {
```
src/main/java/com/example/Model.java:103

`List<Integer>` does not have `getLast()` or `removeLast()` methods. Use `responseTokens.get(responseTokens.size() - 1)` and `responseTokens.remove(responseTokens.size() - 1)` instead.

```java
if (!responseTokens.isEmpty() && stopTokens.contains(responseTokens.getLast())) {
```
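Note that `List.getLast()` and `removeLast()` do exist from Java 21 onward (via `SequencedCollection`), so the suggestion only applies on earlier targets. A sketch of the index-based form, reusing the names from the quoted snippet; the removal follows the comment's mention of `removeLast()`, though the PR's actual control flow may differ:

```java
// Index-based equivalent of getLast()/removeLast(), valid before Java 21.
if (!responseTokens.isEmpty()
        && stopTokens.contains(responseTokens.get(responseTokens.size() - 1))) {
    responseTokens.remove(responseTokens.size() - 1); // drop the trailing stop token
}
```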
```java
public static Llama loadModel(Path ggufPath, int contextLength, boolean loadWeights) throws IOException {
    // Check by vocabulary size as fallback
    if (vocabSize != null) {
```
Is there any other way to detect the model here instead of checking sizes?
Indeed, checking the vocab size seems odd, but it should be noted that it is the third check in line and just acts as a fallback in case the model name (1st check) and the tokenizer metadata (2nd check) are not enough. IMHO, we can keep only the model name check.
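For illustration, the three-tier fallback chain described here might look roughly like the following; the metadata keys and the vocab-size constant are assumptions, not the PR's actual code:

```java
import java.util.Map;

// Illustrative sketch of the detection fallback chain discussed above.
final class ModelDetection {
    enum ModelType { LLAMA, MISTRAL }

    // Assumed value; the real check would use whatever vocabulary size
    // the supported Mistral GGUF files actually report.
    static final int ASSUMED_MISTRAL_VOCAB_SIZE = 32_768;

    static ModelType detect(Map<String, Object> metadata, Integer vocabSize) {
        // 1st check: model name from GGUF metadata
        String name = (String) metadata.get("general.name");
        if (name != null && name.toLowerCase().contains("mistral")) {
            return ModelType.MISTRAL;
        }
        // 2nd check: tokenizer metadata (LlamaTokenizer is GPT-2-style BPE here)
        String tok = (String) metadata.get("tokenizer.ggml.model");
        if ("gpt2".equals(tok)) {
            return ModelType.LLAMA;
        }
        // 3rd check: vocabulary size, only as a last-resort fallback
        if (vocabSize != null && vocabSize == ASSUMED_MISTRAL_VOCAB_SIZE) {
            return ModelType.MISTRAL;
        }
        return ModelType.LLAMA;
    }
}
```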
Fixes #18
One more thing! Can you update the README with the link and instructions to download the Mistral model -> https://huggingface.co/beehive-lab
done
@orionpapadakis thanks, LGTM. Let me test it and add the changes for the normalization layers, then we can merge.
Summary
This PR integrates the Mistral LLM into the `GPULlama3` repository. To support this integration, several architectural changes and refactorings were made to promote component abstraction and a modular, extensible design.
Key Changes
1. Model Abstraction
The `Llama` class has been refactored into a `Model` interface (under a new `model` package). `Llama`-specific functionality is moved to `model.llama.Llama`; the new `Mistral` implementation is added in `model.mistral.Mistral`.
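A rough sketch of the shape this abstraction could take, with method names inferred from the tokenizer, chat-format, and configuration pieces described elsewhere in this PR; the actual interface is likely richer:

```java
// Illustrative only; the real Model interface in the PR may differ.
public interface Model {
    Configuration configuration(); // Llama- or Mistral-specific hyperparameters
    Tokenizer tokenizer();         // model-specific tokenizer
    ChatFormat chatFormat();       // model-specific prompt formatting
}
```

`model.llama.Llama` and `model.mistral.Mistral` would then each implement this interface, letting the TornadoVM classes and the token-generation loop work against `Model` instead of a concrete type.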
2. Tokenizer Abstraction

Introduced a `Tokenizer` interface under the `tokenizer.impl` package. Implemented two tokenizers:
- `LlamaTokenizer` for GPT-2-style BPE.
- `MistralTokenizer` for TikToken-style BPE.

The `Vocabulary` class has been relocated to the `tokenizer` package.
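Based on the description above, the interface could be as small as an encode/decode pair (a sketch; the PR's actual interface may expose more, e.g. special-token handling):

```java
import java.util.List;

// Minimal sketch of the Tokenizer abstraction described above.
public interface Tokenizer {
    List<Integer> encode(String text);    // text -> token ids
    String decode(List<Integer> tokens);  // token ids -> text
}
```

`LlamaTokenizer` and `MistralTokenizer` would then implement the same contract over their respective BPE schemes.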
3. ChatFormat Abstraction

`ChatFormat` functionality has been refactored into an abstract form to support model-specific formatting and enable future extensions.
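The file table above mentions an interface plus a factory that selects the Llama or Mistral format; a hedged sketch of that pattern, where the constructor arguments and `encodeDialog` are assumptions:

```java
import java.util.List;

public interface ChatFormat {
    // Message lives on the interface (see the Copilot comment above).
    record Message(String role, String content) { }

    List<Integer> encodeDialog(List<Message> dialog);

    // Factory choosing the model-specific formatter.
    static ChatFormat of(Model model) {
        return (model instanceof Mistral)
                ? new MistralChatFormat(model.tokenizer())
                : new LlamaChatFormat(model.tokenizer());
    }
}
```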
4. Inference Refactoring

Inference logic is decoupled from the model classes.
Introduced a dedicated `inference` package with:
- `InferenceEngine`: entry point for token generation (`generateToken`, `generateTokenGPU` methods).
- `InferenceCore`: contains reusable core operations (e.g., `rmsnorm`, `forward`).
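Put together, a call site could look roughly like this; the `useTornadoVM` flag and all parameter names are illustrative, while `loadModel`, `generateToken`, and `generateTokenGPU` are named in the PR:

```java
// Hypothetical wiring of the pieces above; parameters are assumptions.
Model model = ModelLoader.loadModel(ggufPath, contextLength, true);
List<Integer> promptTokens = ChatFormat.of(model).encodeDialog(dialog);

List<Integer> response = useTornadoVM
        ? InferenceEngine.generateTokenGPU(model, promptTokens, maxTokens)
        : InferenceEngine.generateToken(model, promptTokens, maxTokens);
```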