@alyosha-swamy alyosha-swamy commented Jul 20, 2025

[New Model] Support Arcee (Arcee Foundational Models)

1. Purpose (Why this PR?)

Add inference support for Arcee Foundational Model (AFM) so that users can serve it with vLLM in both Python and API-server workflows. AFM uses a unique ReLU² activation in its MLP layers, differentiating it from standard Llama-based models.

2. Model details

| Field | Value / Reference |
| --- | --- |
| Source repo / HF id | huggingface.co/arcee-ai/AFM-4.5B-Base |
| Architecture | Llama-style decoder-only transformer with ReLU² MLP activation |
| Context length | 64k tokens |
| Hidden size / #layers | 4096 / 32 |
| License | CC BY-NC 4.0 |
| Special quirks | Uses ReLU² (squared ReLU) activation instead of SiLU in MLP layers |
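
As a quick sanity check on the quirks above, the activation setting can be read straight from the HF config. A minimal sketch (assumes the checkpoint exposes a `hidden_act` field set to `"relu2"`, matching the implementation quoted in the review comments below):

```python
# Sketch: inspect the AFM config to confirm the ReLU^2 activation setting.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("arcee-ai/AFM-4.5B-Base", trust_remote_code=True)
print(config.hidden_act)                             # expected: "relu2"
print(config.hidden_size, config.num_hidden_layers)  # expected: 4096, 32
```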

3. Implementation overview

  • Added ArceeForCausalLM class in vllm/model_executor/models/arcee.py with custom ArceeMLP using ReLU² activation
  • Registered model in _TEXT_GENERATION_MODELS in vllm/model_executor/models/registry.py (see the sketch after this list)
  • Updated docs/models/supported_models.md with Arcee entry in text generation table
  • Reused LlamaAttention from existing Llama implementation for attention layers
  • Implemented proper LoRA and Pipeline Parallelism support
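
For reference, the registry change is a single entry mapping the architecture name to its module and class. A rough sketch (surrounding entries and exact placement in the dict are illustrative):

```python
# vllm/model_executor/models/registry.py (sketch of the added entry only)
_TEXT_GENERATION_MODELS = {
    # ... existing entries ...
    "ArceeForCausalLM": ("arcee", "ArceeForCausalLM"),  # module arcee.py, class ArceeForCausalLM
    # ... existing entries ...
}
```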

4. Performance / sanity check

```bash
$ python -m vllm.entrypoints.openai.api_server --model arcee-ai/AFM-4.5B-Base --trust-remote-code
$ curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{
    "model": "arcee-ai/AFM-4.5B-Base",
    "prompt": "The future of artificial intelligence is",
    "max_tokens": 50
}'
```

Expected: Coherent completion about AI

Observed: "The future of artificial intelligence is bright and full of possibilities. As AI continues to evolve, we can expect to see significant advancements in areas such as natural language processing, computer vision, and machine learning..."
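
The Python workflow mentioned in the purpose section can be exercised the same way. A minimal sketch using vLLM's offline LLM API (sampling settings are illustrative):

```python
# Sketch: offline inference with AFM via vLLM's Python API.
from vllm import LLM, SamplingParams

llm = LLM(model="arcee-ai/AFM-4.5B-Base", trust_remote_code=True)
params = SamplingParams(max_tokens=50, temperature=0.7)

outputs = llm.generate(["The future of artificial intelligence is"], params)
print(outputs[0].outputs[0].text)
```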

5. Test plan ✔️

| Test | Command | Expected |
| --- | --- | --- |
| Unit (see sketch below) | `pytest tests/models/test_arcee.py` | All tests pass |
| Model loading | `python -c "from vllm import LLM; llm = LLM('arcee-ai/AFM-4.5B-Base')"` | Model loads without errors |
| Integration | `vllm serve arcee-ai/AFM-4.5B-Base --trust-remote-code` | Server starts, responds to requests |
| Generation | `curl localhost:8000/v1/completions` | 200 OK + valid completions |
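
The unit test in the plan above is not reproduced here; the following is only a sketch of what a generation smoke test could look like (the actual tests/models/test_arcee.py may instead use vLLM's shared model-test utilities):

```python
# Sketch: illustrative smoke test; names and structure are not taken from the PR.
import pytest
from vllm import LLM, SamplingParams


@pytest.mark.parametrize("prompt", ["Hello, world!", "The meaning of life is"])
def test_afm_generates_text(prompt):
    llm = LLM(model="arcee-ai/AFM-4.5B-Base", trust_remote_code=True)
    out = llm.generate([prompt], SamplingParams(max_tokens=16))
    assert out[0].outputs[0].text.strip()  # completion should be non-empty
```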

6. Documentation

  • Added row to docs/models/supported_models.md under Text Generation models
  • Model listed as ArceeForCausalLM with example model arcee-ai/AFM-4.5B-Base
  • Marked as supporting LoRA (✅), Pipeline Parallel (✅), and V1 (✅)

Checklist

  • I ran pre-commit run --all-files (ruff formatting)
  • All CI tests pass locally (pytest -q)
  • The PR description follows vLLM's "Essential Elements" template
  • No breaking changes for existing model classes

Notes for reviewers

The key architectural difference from standard Llama models is the MLP activation function. Arcee uses ReLU² (squared ReLU) instead of SiLU (a sketch follows the list below):

  • ArceeMLP implements: x = torch.pow(torch.relu(x), 2)
  • No gating mechanism (no gate_proj), only up_proj and down_proj
  • All other components (attention, layer norm, etc.) reuse existing Llama implementations
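
A plain-PyTorch sketch of that structure (nn.Linear stands in for vLLM's tensor-parallel layers; dimensions and bias settings are illustrative, not taken from the PR):

```python
import torch
import torch.nn as nn


class ArceeMLPSketch(nn.Module):
    """Illustrative only: AFM-style MLP with ReLU^2 and no gate_proj."""

    def __init__(self, hidden_size: int, intermediate_size: int) -> None:
        super().__init__()
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU^2: square of ReLU, applied elementwise; no gating branch.
        return self.down_proj(torch.pow(torch.relu(self.up_proj(x)), 2))
```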

The model has been tested with an internal HF repo during development, but the official model is arcee-ai/AFM-4.5B-Base.

Test result

| seq | Prompt | vLLM Output |
| --- | --- | --- |
| 0 | "Hello, world!" | "Hello, world! Welcome to the exciting realm of programming..." |
| 1 | "The meaning of life is" | "The meaning of life is a profound question that has puzzled philosophers..." |
| 2 | "In 2025, technology will" | "In 2025, technology will continue to reshape our daily lives with advances in AI..." |

All outputs are coherent and contextually appropriate.

@alyosha-swamy alyosha-swamy requested a review from hmellor as a code owner July 20, 2025 20:47
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the documentation (Improvements or additions to documentation) and new-model (Requests to new models) labels Jul 20, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The code introduces a new model, ArceeForCausalLM, with a unique ReLU² activation. The implementation is well-structured, but some import statements are misplaced and a minor performance improvement can be made.

```python
if hidden_act != "relu2":
    raise ValueError(f"Unsupported activation: {hidden_act}. Only 'relu2' is supported for AFM.")
# Define ReLU^2 activation: (ReLU(x))^2 elementwise
self.act_fn = lambda x: torch.pow(torch.relu(x), 2)
```

Severity: high

Consider using torch.relu(x).square() instead of torch.pow(torch.relu(x), 2) for better readability and potentially better performance. Tensor.square() is equivalent to x ** 2 and may dispatch to a more optimized kernel.

```python
self.act_fn = lambda x: torch.relu(x).square()
```
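
If it helps, the equivalence of the two formulations is easy to sanity-check:

```python
# Quick numerical check that relu(x)**2 and relu(x).square() agree.
import torch

x = torch.randn(4, 8)
assert torch.allclose(torch.pow(torch.relu(x), 2), torch.relu(x).square())
```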

```python
attention_bias = config.qkv_bias

# Self-Attention (using LLaMA's attention structure)
from vllm.model_executor.models.llama import LlamaAttention  # import here to avoid circular import
```

Severity: high

Move the import statement from vllm.model_executor.models.llama import LlamaAttention to the top of the file to adhere to PEP 8 guidelines. This improves code readability and makes dependencies explicit at the beginning of the file.

```python
self.aux_hidden_state_layers: Tuple[int, ...] = tuple()

# Prepare factory for empty intermediate tensors (for pipeline scheduling)
from vllm.model_executor.models.utils import make_empty_intermediate_tensors_factory
```

Severity: high

Move the import statement from vllm.model_executor.models.utils import make_empty_intermediate_tensors_factory to the top of the file to adhere to PEP 8 guidelines. This improves code readability and makes dependencies explicit at the beginning of the file.

```python
# Handle quantization KV cache scales if present
if hasattr(self, "quant_config") and self.quant_config is not None:
    # If name corresponds to a quantization scale parameter, remap and load it
    from vllm.model_executor.model_loader.weight_utils import default_weight_loader, maybe_remap_kv_scale_name
```

Severity: high

Move the import statements from vllm.model_executor.model_loader.weight_utils import default_weight_loader, maybe_remap_kv_scale_name and from vllm.model_executor.models.utils import is_pp_missing_parameter to the top of the file to adhere to PEP 8 guidelines. This improves code readability and makes dependencies explicit at the beginning of the file.

```python
self.unpadded_vocab_size += lora_config.lora_extra_vocab_size

# Import DEFAULT_VOCAB_PADDING_SIZE
from vllm.model_executor.layers.vocab_parallel_embedding import DEFAULT_VOCAB_PADDING_SIZE
```

Severity: high

Move the import statement from vllm.model_executor.layers.vocab_parallel_embedding import DEFAULT_VOCAB_PADDING_SIZE to the top of the file to adhere to PEP 8 guidelines. This improves code readability and makes dependencies explicit at the beginning of the file.

```python
def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]) -> set[str]:
    """Load weights into the model (delegates to inner model and handles tied embeddings)."""
    # Use AutoWeightsLoader for consistency with vLLM's loading mechanism
    from vllm.model_executor.models.utils import AutoWeightsLoader
```

Severity: high

Move the import statement from vllm.model_executor.models.utils import AutoWeightsLoader to the top of the file to adhere to PEP 8 guidelines. This improves code readability and makes dependencies explicit at the beginning of the file.

@hmellor hmellor mentioned this pull request Jul 21, 2025