
Commit cd1a661

2015aroras authored and ErfanBaghaei committed
Add FlexOlmo model (huggingface#40921)
* transformers add-new-model-like
* Add FlexOlmo implementation
* Update FlexOlmo docs
* Set default tokenization for flex olmo
* Update FlexOlmo tests
* Update attention comment
* Remove unneeded use of `sliding_window`
1 parent f962aaf commit cd1a661

File tree

12 files changed: +1541 −0 lines

docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions

```diff
@@ -485,6 +485,8 @@
       title: FLAN-UL2
     - local: model_doc/flaubert
       title: FlauBERT
+    - local: model_doc/flex_olmo
+      title: FlexOlmo
     - local: model_doc/fnet
       title: FNet
     - local: model_doc/fsmt
```
docs/source/en/model_doc/flex_olmo.md

Lines changed: 139 additions & 0 deletions

@@ -0,0 +1,139 @@
<!--Copyright 2025 the HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer.

-->
*This model was released on 2025-07-09 and added to Hugging Face Transformers on 2025-09-15.*

<div style="float: right;">
    <div class="flex flex-wrap space-x-1">
        <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
        <img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
        <img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
    </div>
</div>

# FlexOlmo
[FlexOlmo](https://huggingface.co/papers/2507.07024) is a new class of language models (LMs) that supports (1) distributed training without data sharing, where different model parameters are independently trained on closed datasets, and (2) data-flexible inference, where these parameters along with their associated data can be flexibly included or excluded from model inference with no further training. FlexOlmo employs a mixture-of-experts (MoE) architecture in which each expert is trained independently on a closed dataset and later integrated through a new domain-informed routing, without any joint training. FlexOlmo is trained on FlexMix, a curated corpus comprising publicly available datasets alongside seven domain-specific sets, representing realistic approximations of closed sets.

You can find all the original FlexOlmo checkpoints under the [FlexOlmo](https://huggingface.co/collections/allenai/flexolmo-68471177a386b6e20a54c55f) collection.
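To make the mixture-of-experts idea above concrete, here is a minimal, hypothetical sketch of sparse MoE routing with a fixed (rather than jointly trained) router, in the spirit of the domain-informed routing described above. This is not the `FlexOlmoModel` implementation; all names, shapes, and the random router embeddings are illustrative assumptions.

```py
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoELayer(nn.Module):
    """Illustrative sparse MoE feed-forward layer with a fixed router (not FlexOlmo's actual code)."""

    def __init__(self, hidden_size: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is an independent FFN; in FlexOlmo each expert is trained
        # separately, possibly on a closed dataset that is never shared.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, 4 * hidden_size),
                nn.GELU(),
                nn.Linear(4 * hidden_size, hidden_size),
            )
            for _ in range(num_experts)
        )
        # A fixed, per-expert routing embedding stands in for FlexOlmo's
        # domain-informed router; random values here, purely for illustration.
        self.register_buffer("router_embeddings", torch.randn(num_experts, hidden_size))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size)
        router_logits = hidden_states @ self.router_embeddings.T  # (batch, seq_len, num_experts)
        topk_logits, topk_idx = torch.topk(router_logits, self.top_k, dim=-1)
        topk_weights = F.softmax(topk_logits, dim=-1)

        output = torch.zeros_like(hidden_states)
        for slot in range(self.top_k):
            for expert_idx, expert in enumerate(self.experts):
                token_mask = topk_idx[..., slot] == expert_idx  # (batch, seq_len) bool
                if token_mask.any():
                    weight = topk_weights[..., slot][token_mask].unsqueeze(-1)
                    output[token_mask] += weight * expert(hidden_states[token_mask])
        return output


layer = SparseMoELayer(hidden_size=64, num_experts=4)
out = layer(torch.randn(2, 8, 64))
print(out.shape)  # torch.Size([2, 8, 64])
```

Under a scheme like this, excluding a data owner's expert at inference time amounts to dropping its row from the router embeddings and its module from `experts`, which is what makes opt-out possible without retraining.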
> [!TIP]
> Click on the FlexOlmo models in the right sidebar for more examples of how to apply FlexOlmo to different language tasks.

The example below demonstrates how to generate text with [`Pipeline`], [`AutoModel`], and from the command line.

<hfoptions id="usage">
<hfoption id="Pipeline">
```py
import torch
from transformers import pipeline

pipe = pipeline(
    task="text-generation",
    model="allenai/FlexOlmo-7x7B-1T",
    dtype=torch.bfloat16,
    device=0,
)

result = pipe("Plants create energy through a process known as")
print(result)
```
</hfoption>
<hfoption id="AutoModel">
```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "allenai/FlexOlmo-7x7B-1T"
)

model = AutoModelForCausalLM.from_pretrained(
    "allenai/FlexOlmo-7x7B-1T",
    dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="sdpa"
)
input_ids = tokenizer("Plants create energy through a process known as", return_tensors="pt").to(model.device)

output = model.generate(**input_ids, max_length=50, cache_implementation="static")
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
</hfoption>
<hfoption id="transformers CLI">

```bash
echo -e "Plants create energy through a process known as" | transformers-cli run --task text-generation --model allenai/FlexOlmo-7x7B-1T --device 0
```

</hfoption>
</hfoptions>
Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.

The example below uses [torchao](../quantization/torchao) to quantize only the weights to 4-bits.
```py
# pip install torchao
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig

torchao_config = TorchAoConfig(
    "int4_weight_only",
    group_size=128
)

tokenizer = AutoTokenizer.from_pretrained(
    "allenai/FlexOlmo-7x7B-1T"
)

model = AutoModelForCausalLM.from_pretrained(
    "allenai/FlexOlmo-7x7B-1T",
    quantization_config=torchao_config,
    dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="sdpa"
)
input_ids = tokenizer("Plants create energy through a process known as", return_tensors="pt").to(model.device)

output = model.generate(**input_ids, max_length=50, cache_implementation="static")
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
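The same checkpoint can also be loaded through other backends listed in the Quantization overview. As a hedged sketch, 4-bit loading with bitsandbytes (assuming the `bitsandbytes` package is installed; this variant is not part of the committed doc) would look like:

```py
# pip install bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("allenai/FlexOlmo-7x7B-1T")
model = AutoModelForCausalLM.from_pretrained(
    "allenai/FlexOlmo-7x7B-1T",
    quantization_config=bnb_config,
    device_map="auto",
)

input_ids = tokenizer("Plants create energy through a process known as", return_tensors="pt").to(model.device)
output = model.generate(**input_ids, max_length=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```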
## FlexOlmoConfig

[[autodoc]] FlexOlmoConfig

## FlexOlmoForCausalLM

[[autodoc]] FlexOlmoForCausalLM

## FlexOlmoModel

[[autodoc]] FlexOlmoModel
    - forward

## FlexOlmoPreTrainedModel

[[autodoc]] FlexOlmoPreTrainedModel
    - forward

src/transformers/models/__init__.py

Lines changed: 1 addition & 0 deletions

```diff
@@ -123,6 +123,7 @@
 from .fastspeech2_conformer import *
 from .flaubert import *
 from .flava import *
+from .flex_olmo import *
 from .florence2 import *
 from .fnet import *
 from .focalnet import *
```

src/transformers/models/auto/configuration_auto.py

Lines changed: 2 additions & 0 deletions

```diff
@@ -148,6 +148,7 @@
     ("fastspeech2_conformer_with_hifigan", "FastSpeech2ConformerWithHifiGanConfig"),
     ("flaubert", "FlaubertConfig"),
     ("flava", "FlavaConfig"),
+    ("flex_olmo", "FlexOlmoConfig"),
     ("florence2", "Florence2Config"),
     ("fnet", "FNetConfig"),
     ("focalnet", "FocalNetConfig"),
@@ -580,6 +581,7 @@
     ("flan-ul2", "FLAN-UL2"),
     ("flaubert", "FlauBERT"),
     ("flava", "FLAVA"),
+    ("flex_olmo", "FlexOlmo"),
     ("florence2", "Florence2"),
     ("fnet", "FNet"),
     ("focalnet", "FocalNet"),
```

src/transformers/models/auto/modeling_auto.py

Lines changed: 2 additions & 0 deletions

```diff
@@ -150,6 +150,7 @@ class _BaseModelWithGenerate(PreTrainedModel, GenerationMixin):
     ("fastspeech2_conformer_with_hifigan", "FastSpeech2ConformerWithHifiGan"),
     ("flaubert", "FlaubertModel"),
     ("flava", "FlavaModel"),
+    ("flex_olmo", "FlexOlmoModel"),
     ("florence2", "Florence2Model"),
     ("fnet", "FNetModel"),
     ("focalnet", "FocalNetModel"),
@@ -653,6 +654,7 @@ class _BaseModelWithGenerate(PreTrainedModel, GenerationMixin):
     ("falcon", "FalconForCausalLM"),
     ("falcon_h1", "FalconH1ForCausalLM"),
     ("falcon_mamba", "FalconMambaForCausalLM"),
+    ("flex_olmo", "FlexOlmoForCausalLM"),
     ("fuyu", "FuyuForCausalLM"),
     ("gemma", "GemmaForCausalLM"),
     ("gemma2", "Gemma2ForCausalLM"),
```

src/transformers/models/auto/tokenization_auto.py

Lines changed: 1 addition & 0 deletions

```diff
@@ -245,6 +245,7 @@
         ("FastSpeech2ConformerTokenizer" if is_g2p_en_available() else None, None),
     ),
     ("flaubert", ("FlaubertTokenizer", None)),
+    ("flex_olmo", (None, "GPT2TokenizerFast" if is_tokenizers_available() else None)),
     ("fnet", ("FNetTokenizer", "FNetTokenizerFast" if is_tokenizers_available() else None)),
     ("fsmt", ("FSMTTokenizer", None)),
     ("funnel", ("FunnelTokenizer", "FunnelTokenizerFast" if is_tokenizers_available() else None)),
```
src/transformers/models/flex_olmo/__init__.py

Lines changed: 29 additions & 0 deletions

@@ -0,0 +1,29 @@

```py
# coding=utf-8
# Copyright 2025 the HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import TYPE_CHECKING

from ...utils import _LazyModule
from ...utils.import_utils import define_import_structure


if TYPE_CHECKING:
    from .configuration_flex_olmo import *
    from .modeling_flex_olmo import *
else:
    import sys

    _file = globals()["__file__"]
    sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__)
```
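The `else` branch above wires the package through `_LazyModule`, so `configuration_flex_olmo` and `modeling_flex_olmo` are only imported when one of their attributes is first accessed. A simplified sketch of that lazy-import idea (not the real `_LazyModule`; the attribute map below is hypothetical):

```py
import importlib
import types


class LazySubmoduleProxy(types.ModuleType):
    """Simplified illustration: resolve attributes to submodules on first access."""

    def __init__(self, name, attr_to_submodule):
        super().__init__(name)
        # e.g. {"FlexOlmoConfig": "configuration_flex_olmo"} (hypothetical mapping)
        self._attr_to_submodule = attr_to_submodule

    def __getattr__(self, attr):
        submodule_name = self._attr_to_submodule[attr]
        # Import the submodule only now, when the attribute is actually needed.
        submodule = importlib.import_module(f".{submodule_name}", self.__name__)
        return getattr(submodule, attr)
```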
