docs/source/en/model_doc/bamba.md (25 additions, 32 deletions)
@@ -16,51 +16,44 @@ rendered properly in your Markdown viewer.
 
 # Bamba
 
-## Overview
 
-TODO
+## Overview
 
-Tips:
+Bamba-9B is a decoder-only language model based on the [Mamba-2](https://github.com/state-spaces/mamba) architecture and is designed to handle a wide range of text generation tasks. It is trained from scratch using a two-stage training approach. In the first stage, the model is trained on 2 trillion tokens from the Dolma v1.7 dataset. In the second stage, it undergoes additional training on 200 billion tokens, leveraging a carefully curated blend of high-quality data to further refine its performance and enhance output quality.
 
-```python
-import torch
-from transformers import AutoModelForCausalLM, AutoTokenizer
+Checkout all Bamba-9B model checkpoints [here](https://github.com/foundation-model-stack/bamba).
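
For reference, the sketch below (not part of the diff above) shows how a Bamba checkpoint could be loaded through the generic `AutoModelForCausalLM`/`AutoTokenizer` API that the removed stub was importing. The hub id `ibm-fms/Bamba-9B` is an assumed placeholder taken from the linked checkpoint collection, not something specified in this change.

```python
# Minimal usage sketch for a Bamba checkpoint via the generic causal-LM API.
# The checkpoint id is an assumption; substitute the one you actually use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-fms/Bamba-9B"  # assumed hub id from the linked collection

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```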