
Conversation

@alyosha-swamy (Contributor) commented Jul 21, 2025

[New Model] Support Arcee (Arcee Foundational Models)

1. Purpose (Why this PR?)

Add inference support for Arcee Foundational Model (AFM) so that users can serve it with vLLM in both Python and API-server workflows. AFM uses a unique ReLU² activation in its MLP layers, differentiating it from standard Llama-based models.

2. Model details

| Field | Value / Reference |
|---|---|
| Source repo / HF id | huggingface.co/arcee-ai/AFM-4.5B-Base |
| Architecture | Llama-style decoder-only transformer with ReLU² MLP activation |
| Context length | 64k tokens |
| Hidden size / # layers | 4096 / 32 |
| License | CC BY-NC 4.0 |
| Special quirks | Uses ReLU² (squared ReLU) activation instead of SiLU in MLP layers |

3. Implementation overview

  • Added ArceeForCausalLM class in vllm/model_executor/models/arcee.py with custom ArceeMLP using ReLU² activation
  • Registered the model in _TEXT_GENERATION_MODELS in vllm/model_executor/models/registry.py (entry sketched after this list)
  • Updated docs/models/supported_models.md with Arcee entry in text generation table
  • Reused LlamaAttention from existing Llama implementation for attention layers
  • Implemented proper LoRA and Pipeline Parallelism support
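
For reference, a sketch of the registry entry added to _TEXT_GENERATION_MODELS; the ("arcee", "ArceeForCausalLM") module/class pair is assumed from the new arcee.py file and may differ slightly from the merged code:

    # vllm/model_executor/models/registry.py (sketch; surrounding entries omitted)
    _TEXT_GENERATION_MODELS = {
        # ...
        "ArceeForCausalLM": ("arcee", "ArceeForCausalLM"),
        # ...
    }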

4. Performance / sanity check

$ python -m vllm.entrypoints.openai.api_server --model arcee-ai/AFM-4.5B-Base --trust-remote-code
$ curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{
    "model": "arcee-ai/AFM-4.5B-Base",
    "prompt": "The future of artificial intelligence is",
    "max_tokens": 50
}'

Expected: A coherent, contextually appropriate continuation of the prompt

Observed: " a question that has been asked throughout the history of mankind. The search for an answer to this question has inspired countless works of art, literature, and philosophy. Whether we consider the existentialist ideas of Albert Camus or the religious perspectives of spiritual leaders"

5. Test plan ✔️

| Test | Command | Expected |
|---|---|---|
| Unit | `pytest tests/models/test_arcee.py` | All tests pass |
| Model Loading | `python -c "from vllm import LLM; llm = LLM('arcee-ai/AFM-4.5B-Base')"` | Model loads without errors |
| Integration | `vllm serve arcee-ai/AFM-4.5B-Base --trust-remote-code` | Server starts, responds to requests |
| Generation | `curl localhost:8000/v1/completions` | 200 OK + valid completions |
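
The actual contents of tests/models/test_arcee.py are not reproduced here; a hypothetical minimal generation test in the same spirit (fixture layout and assertions are illustrative, not the real test file) might look like:

    import pytest
    from vllm import LLM, SamplingParams

    @pytest.mark.parametrize("prompt", ["The capital of France is"])
    def test_arcee_generates_text(prompt):
        # Loading a 4.5B model end-to-end requires a GPU with enough memory.
        llm = LLM(model="arcee-ai/AFM-4.5B-Base", trust_remote_code=True)
        out = llm.generate([prompt], SamplingParams(max_tokens=8))
        assert out[0].outputs[0].text.strip()  # non-empty completion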

6. Documentation

  • Added row to docs/models/supported_models.md under Text Generation models
  • Model listed as ArceeForCausalLM with example model arcee-ai/AFM-4.5B-Base
  • Marked as supporting LoRA (✅), Pipeline Parallel (✅), and V1 (✅)

Checklist

  • I ran pre-commit run --all-files (ruff formatting)
  • All CI tests pass locally (pytest -q)
  • The PR description follows vLLM's "Essential Elements" template
  • No breaking changes for existing model classes

Notes for reviewers

The key architectural difference from standard Llama models is the MLP activation function. Arcee uses ReLU² (squared ReLU) instead of SiLU:

  • ArceeMLP implements: x = torch.pow(torch.relu(x), 2) (a minimal sketch follows this list)
  • No gating mechanism (no gate_proj), only up_proj and down_proj
  • All other components (attention, layer norm, etc.) reuse existing Llama implementations
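
A minimal PyTorch sketch of the MLP described above, using plain nn.Linear layers for clarity (the actual vLLM implementation uses tensor-parallel linear layers and vLLM's quantization-aware building blocks):

    import torch
    import torch.nn as nn

    class ArceeMLPSketch(nn.Module):
        """Gate-free MLP with squared-ReLU activation, as described above."""

        def __init__(self, hidden_size: int, intermediate_size: int) -> None:
            super().__init__()
            self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
            self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # ReLU^2: square the ReLU output instead of applying SiLU gating.
            x = self.up_proj(x)
            x = torch.pow(torch.relu(x), 2)
            return self.down_proj(x)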

The model has been tested with an internal HF repo during development, but the official model is arcee-ai/AFM-4.5B-Base.

Test result

| seq | Prompt | vLLM Output |
|---|---|---|
| 0 | "The meaning of life is" | " a question that has been asked throughout the history of mankind. The search for an answer to this question has inspired countless works of art, literature, and philosophy. Whether we consider the existentialist ideas of Albert Camus or the religious perspectives of spiritual leaders" |
| 1 | "Climate change is primarily caused by" | " human activity, specifically the emission of greenhouse gases such as carbon dioxide (CO2) and methane (CH4). It leads to changes in average temperatures and weather patterns, impacting both nature and human society." |
| 2 | "Machine learning algorithms work by" | " training a predictive model using labeled training data: the model detects patterns in the training data and learns from it. That model is then tested using a test set, which it must predict to achieve a good accuracy rate." |

All outputs are coherent and contextually appropriate.

…ment

- Remove deprecated supported_lora_modules attribute
- Add ArceeForCausalLM to test registry

Signed-off-by: alyosha-swamy <[email protected]>
- Set is_available_online=False in test registry for CI compatibility

Signed-off-by: alyosha-swamy <[email protected]>
- Inherit from LlamaForCausalLM for most functionality
- Set is_available_online=False in test registry for CI compatibility

Signed-off-by: alyosha-swamy <[email protected]>
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@alyosha-swamy
Contributor Author

Generated Outputs:

Prompt: 'Hello, my name is'
Output: ' Helen and I am from Shanghai. I’m 15 years old. I’m'

Prompt: 'The president of the United States is'
Output: ' the head of state and head of government of the United States. He or she'

Prompt: 'The capital of France is'
Output: ' Paris. Paris is also the largest city in France. The city of Paris is'

Prompt: 'The future of AI is'
Output: ' being decided now,, and you can make a difference.\nFebruary 21,'

cc @jeejeelee

@mergify mergify bot added the documentation and new-model labels Jul 21, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The code changes introduce the Arcee model. Suggested: refactor the weight-loading mechanism to align with vLLM's standard practices, and correct the test registry entry for the Arcee model.

Comment on lines 461 to 469
    # Use AutoWeightsLoader for consistency with vLLM's loading mechanism
    from vllm.model_executor.models.utils import AutoWeightsLoader
    loader = AutoWeightsLoader(
        self,
        skip_prefixes=(["lm_head."]
                       if self.config.tie_word_embeddings else None))
    # AutoWeightsLoader handles weight name remapping, including fusing
    # separate q_proj, k_proj, v_proj into qkv_proj
    return loader.load_weights(weights)
Contributor


Severity: high

This load_weights method can be improved to fully leverage AutoWeightsLoader and simplify the overall implementation. The custom load_weights method in ArceeModel (lines 275-368) duplicates logic from AutoWeightsLoader and is less maintainable. By adding skip_prefixes and skip_substrs to the AutoWeightsLoader initialization here, you can handle the Arcee-specific weight skipping. This allows for the complete removal of the ArceeModel.load_weights method, resulting in cleaner code that follows vLLM's standard practices.

from vllm.model_executor.models.utils import AutoWeightsLoader
        loader = AutoWeightsLoader(
            self,
            skip_prefixes=(['lm_head.']
                           if self.config.tie_word_embeddings else None),
            skip_substrs=["gate_proj"])
        return loader.load_weights(weights)

Collaborator

@jeejeelee jeejeelee left a comment


Overall LGTM; please fix the pre-commit failure.

@hmellor
Member

hmellor commented Jul 21, 2025

In #21267 it was pointed out that

If the architecture is the same as Llama with the activation function changed it probably will work with --model-impl transformers.

Meaning that it's not necessary to make any changes to vLLM.


Repeatedly making new PRs will hide this from reviewers.

@alyosha-swamy
Contributor Author

In #21267 it was pointed out that

If the architecture is the same as Llama with the activation function changed it probably will work with --model-impl transformers.
Meaning that it's not necessary to make any changes to vLLM.

Repeatedly making new PRs will hide this from reviewers.

Understood; however, to enable native vLLM support for better performance, we would prefer to add this.

@hmellor
Member

hmellor commented Jul 21, 2025

Have you tested the performance with --model-impl transformers? (you'll probably also need --trust-remote-code because your model is not in the Transformers library yet)

The performance should be close to or the same as a natively supported model.
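
For reference, a command along these lines (combining the flags already mentioned in this thread; exact flag behavior depends on the vLLM version under review) would exercise the Transformers backend:

    vllm serve arcee-ai/AFM-4.5B-Base --model-impl transformers --trust-remote-code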

@alyosha-swamy
Contributor Author

Model outputs are nearly identical; however, the model loads in less than half the time with the native implementation versus the --model-impl transformers flag.

@hmellor
Member

hmellor commented Jul 21, 2025

Model outputs are nearly identical; however, the model loads in less than half the time when using it directly versus with the model impl HF flag.

How is the performance of the model once loaded with --model-impl transformers?

Thank you for the feedback about model loading speed, this is very useful to know. Some effort could be put into improving this, which would affect all models loaded with --model-impl transformers.

AFM is in this release https://github.com/huggingface/transformers/releases/tag/v4.53.0

Oh I see, thank you for the additional context. When you said that the checkpoint hadn't been released, I assumed the implementation hadn't been released on Transformers either. In that case you wouldn't need --trust-remote-code to use --model-impl transformers.

@alyosha-swamy
Contributor Author

How is the performance of the model once loaded with --model-impl transformers?

I have verified the logprobs and can confirm they match the native implementation.

Since all the other fully supported models are still listed in registry.py and have their own model file, we would like to have this PR merged.

Signed-off-by: Jee Jee Li <[email protected]>
@DarkLight1337 DarkLight1337 added this to the v0.10.0 milestone Jul 21, 2025
@DarkLight1337 DarkLight1337 added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jul 21, 2025
Signed-off-by: Jee Jee Li <[email protected]>
@hmellor
Member

hmellor commented Jul 21, 2025

Since all the other fully supported models still need to be included in registry.py as well as have their own file, we would like to have this PR implemented.

This isn't actually necessary. Models can be officially supported using the Transformers backend with an entry like:

    "ArceeForCausalLM": ("transformers", "TransformersForCausalLM"),

If it's necessary to add this model explicitly now, I won't block it. But in future we'd prefer not to maintain copies of models from Transformers.

@alyosha-swamy
Contributor Author

I'd prefer to merge it for now; understood for future reference.

@simon-mo simon-mo merged commit 82b8027 into vllm-project:main Jul 22, 2025
67 of 69 checks passed
zixi-qi pushed a commit to zixi-qi/vllm that referenced this pull request Jul 23, 2025
Signed-off-by: alyosha-swamy <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Co-authored-by: Jee Jee Li <[email protected]>
Signed-off-by: qizixi <[email protected]>
x22x22 pushed a commit to x22x22/vllm that referenced this pull request Aug 5, 2025
Signed-off-by: alyosha-swamy <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Co-authored-by: Jee Jee Li <[email protected]>
Signed-off-by: x22x22 <[email protected]>
Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Aug 6, 2025
Signed-off-by: alyosha-swamy <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Co-authored-by: Jee Jee Li <[email protected]>
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
Signed-off-by: alyosha-swamy <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Co-authored-by: Jee Jee Li <[email protected]>
jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
Signed-off-by: alyosha-swamy <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Co-authored-by: Jee Jee Li <[email protected]>
Signed-off-by: Jinzhen Lin <[email protected]>
paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025
Signed-off-by: alyosha-swamy <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Co-authored-by: Jee Jee Li <[email protected]>
Signed-off-by: Paul Pak <[email protected]>
diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025
Signed-off-by: alyosha-swamy <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Co-authored-by: Jee Jee Li <[email protected]>
Signed-off-by: Diego-Castan <[email protected]>
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
Signed-off-by: alyosha-swamy <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Co-authored-by: Jee Jee Li <[email protected]>

Labels

documentation (Improvements or additions to documentation), force-merge, new-model (Requests to new models), ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants