Official repository of "Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning".
[📖 Paper] [🤗 Models] [🤗 Datasets]
```shell
conda create -n mathpuma python=3.9 -y
conda activate mathpuma
pip install -r requirements.txt
```

The model weights for this project are hosted on Hugging Face.
| Model | Download |
|---|---|
| Math-PUMA_Qwen2VL-1.5B | 🤗 Hugging Face |
| Math-PUMA_Qwen2VL-7B | 🤗 Hugging Face |
| Math-PUMA_DeepSeek-Math-VL-7B | 🤗 Hugging Face |
The training data used for these models is also available on Hugging Face; you can find the dataset by visiting this link.
We leverage the fine-tuning code from two repositories:
In `./train/deepseek_math/train_script.py` or `./train/qwen2/train_script.py`:

- Set `USE_KL` to `"true"`, and set the KL hyperparameters `ALPHA_KL`, `LAMBDA_KL`, and `TEMP_KL`.
- Set `TRAINABLE_PARTS` to `"aligner, vision_tower_low, vision_tower_high"`.
- Set `DATA_PATH`; note that the data files must contain the keys `image_url_2`, `instruction_2`, and `output_2`.
- Run `./train/deepseek_math/train_script.py` or `./train/qwen2/train_script.py`.
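Since this stage requires every record to carry the paired-sample keys, a quick sanity check of the data file can save a failed run. The helper below is a hypothetical sketch, not part of the repo, and it assumes the data file is a JSON list of dicts; adapt it to the actual format expected by `DATA_PATH`.

```python
import json

# Keys that each record must contain for the KL-alignment stage,
# per the training instructions above.
REQUIRED_KEYS = {"image_url_2", "instruction_2", "output_2"}

def find_incomplete_records(path):
    """Return indices of records missing any of the required keys.

    Assumes the file is a JSON array of objects (an assumption about
    the data layout, not something the repo guarantees).
    """
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    return [i for i, rec in enumerate(records)
            if not REQUIRED_KEYS <= rec.keys()]
```

An empty result means every record has all three keys; otherwise the returned indices point at the offending records.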
In `./train/deepseek_math/train_script.py` or `./train/qwen2/train_script.py`:

- Set `USE_KL` to `"false"`.
- Set `TRAINABLE_PARTS` to `"all"`.
- Set `DATA_PATH`.
- Run `./train/deepseek_math/train_script.py` or `./train/qwen2/train_script.py`.
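The two training stages differ only in a handful of settings. As a side-by-side summary, here is a hypothetical sketch (the variable names `USE_KL` and `TRAINABLE_PARTS` come from the steps above; the stage labels and dict layout are illustrative, not from the repo):

```python
# Settings toggled between the two stages of Math-PUMA training.
# Stage 1 aligns the vision side with a KL objective; stage 2 is
# full fine-tuning with all parameters trainable.
STAGE_SETTINGS = {
    "stage1_alignment": {
        "USE_KL": "true",  # also set ALPHA_KL, LAMBDA_KL, TEMP_KL
        "TRAINABLE_PARTS": "aligner, vision_tower_low, vision_tower_high",
    },
    "stage2_full_finetune": {
        "USE_KL": "false",
        "TRAINABLE_PARTS": "all",
    },
}
```

Everything not listed here (e.g. `DATA_PATH`) is set the same way in both stages.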
Download the images of MathVerse, MathVista, and We-Math, and put them into `./eval/data/<benchmark>/images`.

In `./eval/evaluate/benchmark.py`:

- Set `benchmark` to one of `["mathverse", "mathvista", "wemath"]`.
- To evaluate the DeepSeek-Math-based MLLM, set `model_type` to `deepseek-vl`, set `is_customvlm` to `"false"`, and provide `model_path`; to evaluate the Qwen2-based MLLM or other customized MLLMs, set `is_customvlm` to `"true"` and provide `model_path`.
- Run `./eval/evaluate/benchmark.py`.
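The branching rule above can be expressed compactly. The helper below is a hypothetical sketch (not part of the repo); the setting names `benchmark`, `model_type`, `is_customvlm`, and `model_path` come from the steps above, while the `base_model` labels are assumptions for illustration:

```python
def benchmark_settings(benchmark, base_model, model_path):
    """Map a model family onto the benchmark.py settings described above."""
    if benchmark not in ["mathverse", "mathvista", "wemath"]:
        raise ValueError(f"unknown benchmark: {benchmark}")
    settings = {"benchmark": benchmark, "model_path": model_path}
    if base_model == "deepseek-math":
        # DeepSeek-Math-based MLLM uses the built-in deepseek-vl loader.
        settings.update(model_type="deepseek-vl", is_customvlm="false")
    else:
        # Qwen2-based or other customized MLLMs go through the custom path.
        settings.update(is_customvlm="true")
    return settings
```

For example, `benchmark_settings("mathverse", "qwen2", "<path>")` yields the custom-VLM configuration, while passing `"deepseek-math"` selects the `deepseek-vl` model type.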
If you find Math-PUMA useful for your research and applications, please kindly cite using this BibTeX:
```bibtex
@inproceedings{zhuang2025math,
  title={Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning},
  author={Zhuang, Wenwen and Huang, Xin and Zhang, Xiantao and Zeng, Jin},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={24},
  pages={26183--26191},
  year={2025}
}
```