GitHub - large-loris-models/casa: Constrained sampling for language models.

Constrained aligned sampling algorithms for language models via CARS, MCMC, and rejection sampling variants.

Installation

Prerequisites

Python 3.12+
CUDA-compatible GPU (recommended)

Install from source

# Clone the repository
git clone https://github.com/LargeLorisModels/casa.git
cd casa

# Create virtual environment (optional but recommended)
python -m venv .venv
source .venv/bin/activate

# Install package in editable mode
pip install -e .

Using uv (faster)

# Install uv if you haven't
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create environment and install
uv venv
source .venv/bin/activate
uv pip install -e .

Quick Start

from casa import LLM, Grammar, CARS

llm = LLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

grammar_str = """
start: CHARACTER " " ACTION " " LOCATION "."
CHARACTER: "a dragon" | "a knight" | "a wizard"
ACTION: "discovered" | "protected" | "enchanted"
LOCATION: "the castle" | "the forest" | "the treasure"
"""

prompt = "Once upon a time,"
grammar = Grammar.from_string(grammar_str, llm.tokenizer)
sampler = CARS(llm, grammar, max_new_tokens=32, verbose=True)
results = sampler.sample(prompt, n_samples=10, max_attempts=100)

if results:
    # Header ends with a colon to match the output shown below
    print("\nGenerated samples:")
    for i, result in enumerate(results, 1):
        print(f"  {i}. {prompt} {result.text}")
else:
    print("Failed to generate any samples")
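To see exactly which strings the grammar above admits, the language can be enumerated by hand. The sketch below does not use casa at all. The three lists restate the terminals from grammar_str, and enumerate_sentences is a hypothetical helper name chosen for illustration.

```python
from itertools import product

# Terminals copied from grammar_str in the Quick Start example
CHARACTER = ["a dragon", "a knight", "a wizard"]
ACTION = ["discovered", "protected", "enchanted"]
LOCATION = ["the castle", "the forest", "the treasure"]


def enumerate_sentences():
    """Return every string matched by: CHARACTER " " ACTION " " LOCATION "."."""
    return [f"{c} {a} {l}." for c, a, l in product(CHARACTER, ACTION, LOCATION)]


sentences = enumerate_sentences()
print(len(sentences))  # 3 * 3 * 3 = 27 sentences
for sentence in sentences[:3]:
    print(sentence)
```

Because the language is finite and small, a list like this is a convenient way to check that every sample a sampler returns is one of the 27 valid sentences.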

Example Output

With verbose=True, you'll see the rejection sampler's performance in real time. Running the code above prints:

Sample 01/10: ████████████████████████████████████████ 78 attempts
Sample 02/10: ████████ 12 attempts
Sample 03/10: █ 2 attempts
Sample 04/10: █ 1 attempts
Sample 05/10: █ 1 attempts
...

Generated samples:
  1. Once upon a time, a dragon enchanted the castle.
  2. Once upon a time, a dragon enchanted the forest.
  3. Once upon a time, a dragon enchanted the forest.
 ...

Available Samplers

CARS: Constrained Adaptive Rejection Sampling
MCMC: Markov Chain Monte Carlo sampling. Available variants:
- Uniform - Randomly resamples from any position. Balances exploration with structural preservation.
- Priority - Resamples higher-perplexity regions first. Targets uncertain tokens for refinement.
- Restart - Generates from scratch. Independent proposals via importance sampling.
ARS: Adaptive Rejection Sampling
RSFT: Rejection Sampling with First Token constraints
RS: Basic Rejection Sampling
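As a rough illustration of the difference between basic and adaptive rejection sampling, here is a toy sketch over a four-outcome distribution. It is not casa's implementation: OUTCOMES, WEIGHTS, VALID, and both function names are assumptions made up for this example, and whole strings stand in for token sequences. The adaptive variant assumes the simplest form of adaptation, which is to give already-rejected outcomes zero weight on later draws.

```python
import random

# Toy proposal distribution (hypothetical); only two outcomes satisfy the constraint
OUTCOMES = ["a dragon slept.", "a dragon enchanted the castle.",
            "a knight protected the forest.", "the end"]
WEIGHTS = [0.6, 0.2, 0.1, 0.1]
VALID = {"a dragon enchanted the castle.", "a knight protected the forest."}


def rejection_sample(outcomes, weights, is_valid, rng, max_attempts=100):
    """Basic rejection sampling: redraw from the same proposal until valid."""
    for attempt in range(1, max_attempts + 1):
        candidate = rng.choices(outcomes, weights=weights, k=1)[0]
        if is_valid(candidate):
            return candidate, attempt
    return None, max_attempts


def adaptive_rejection_sample(outcomes, weights, is_valid, rng, rejected,
                              max_attempts=100):
    """Adaptive variant: outcomes in `rejected` are never proposed again."""
    for attempt in range(1, max_attempts + 1):
        live = [(o, w) for o, w in zip(outcomes, weights) if o not in rejected]
        if not live:
            # Every outcome has been ruled out; report the draws actually made
            return None, attempt - 1
        candidate = rng.choices([o for o, _ in live],
                                weights=[w for _, w in live], k=1)[0]
        if is_valid(candidate):
            return candidate, attempt
        rejected.add(candidate)  # remember the failure for later draws
    return None, max_attempts


rng = random.Random(0)
rejected = set()  # shared across samples, so later samples need fewer attempts
for _ in range(3):
    text, attempts = adaptive_rejection_sample(
        OUTCOMES, WEIGHTS, VALID.__contains__, rng, rejected)
    print(text, attempts)
```

In this toy setting the adaptive sampler needs at most three attempts for its first sample and exactly one attempt once both invalid outcomes have been recorded, while the relative weights of the valid outcomes are left unchanged.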

Running Tests

# Run the example
python tests/test_cars.py

# Or other samplers
python tests/test_mcmc.py

References

CASA implements algorithms from the following papers:

RS, ARS, RSFT, CARS
Constrained Adaptive Rejection Sampling
Preprint | arXiv:2510.01902
MCMC - Uniform, Priority, Restart
Constrained Sampling for Language Models Should Be Easy: An MCMC Perspective
NeurIPS 2025 | arXiv:2506.05754

License

MIT License

