Skip to content

AIDC-AI/Marco-o1

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ“ Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions

Version License Stars Issues Python

⭐ MarcoPolo Team ⭐

Alibaba International Digital Commerce

:octocat: Github πŸ€— Hugging Face πŸ“ Paper πŸ§‘β€πŸ’» Model πŸ—‚οΈ Data πŸ“½οΈ Demo

🎯 Marco-o1 not only focuses on subjects with standard answers, such as mathematics, physics, and coding that are highly suitable for the use of Reinforcement Learning, but we also emphasize some open-ended solutions. Our goal is to build a general model applicable to agentic, incorporating comprehensive planning capabilities and function call abilities.

⚠️ Limitations: We would like to emphasize that this research work is inspired by OpenAI's o1 (from which the name is also derived). This work aims to explore potential approaches to shed light on the currently unclear technical roadmap for large reasoning models. Besides, our focus is on open-ended questions, and we have observed interesting phenomena in multilingual applications. However, we must acknowledge that the current model primarily exhibits o1-like reasoning characteristics and its performance still fall short of a fully realized "o1" model. This is not a one-time effort, and we remain committed to continuous optimization and ongoing improvement.

Figure Description or Alt Text

Figure 1: A classic 'strawberry' question reasoned by our Marco-o1 model: "How many 'r' are in strawberry". Although the answer is correct, the final letter 'y' is overlooked during CoT. This is an interesting finding, which is discussed in issue #3.

πŸ”₯ News

  • [Coming Soon] πŸƒ Marco-o1 ???: We are working on training a more powerful reinforcement learning-based model. The new model will provide better support for agents, with enhanced planning capabilities, task decomposition abilities, and function call capabilities.

  • [Coming Soon] πŸƒ Marco-o1 ???: We are working on training a more efficient reasoning model that can actively skip serveral steps in the reasoning process while maintaining performance, thereby improving reasoning efficiency. Notably, this does not require significant changes to the original model,user ca control the model's reasoning granularity.

  • [2025/05/15] πŸ”₯ Our paper γ€ŠMarco-o1 v2: Towards Widening The Distillation Bottleneck for Reasoning Models》 has been accepted to the main conference of ACL 2025.

  • [2025/02/14] πŸ”₯ We released Marco-o1 v2. This version entirely relies on self-built data and has undergone DPO. It has been optimized more comprehensively for mathematical problem-solving、planning and instruction-following capabilities. 🍬 This time, our model's ability in counting letters is quite impressive! 😁

  • [2024/11/13] πŸ”₯ We released Marco-o1 v1. This initial release includes our reasoning model, optimized for complex problem-solving and versatile applications across various domains.

πŸ”” Introduction

Marco-o1 v1

OpenAI recently introduced the groundbreaking o1 model, renowned for its exceptional reasoning capabilities. This model has demonstrated outstanding performance on platforms such as AIME and CodeForces, surpassing other leading models. Inspired by this success, we aimed to push the boundaries of LLMs even further, enhancing their reasoning abilities to tackle complex, real-world challenges.

🌍 Marco-o1 leverages advanced techniques like CoT fine-tuning, MCTS, and Reasoning Action Strategies to enhance its reasoning power. As shown in Figure 2, by fine-tuning Qwen2-7B-Instruct with a combination of the filtered Open-O1 CoT dataset, Marco-o1 CoT dataset, and Marco-o1 Instruction dataset, Marco-o1 improved its handling of complex tasks. MCTS allows exploration of multiple reasoning paths using confidence scores derived from softmax-applied log probabilities of the top-k alternative tokens, guiding the model to optimal solutions. Moreover, our reasoning action strategy involves varying the granularity of actions within steps and mini-steps to optimize search efficiency and accuracy.

Figure Description or Alt Text

Figure 2: The overview of Marco-o1.

🌏 As shown in Figure 3, Marco-o1 achieved accuracy improvements of +6.17% on the MGSM (English) dataset and +5.60% on the MGSM (Chinese) dataset, showcasing enhanced reasoning capabilities.

Figure Description or Alt Text

Figure 3: The main results of Marco-o1.

🌎 Additionally, in translation tasks, we demonstrate that Marco-o1 excels in translating slang expressions, such as translating "θΏ™δΈͺιž‹ζ‹₯ζœ‰θΈ©ε±Žζ„Ÿ" (literal translation: "This shoe offers a stepping-on-poop sensation.") to "This shoe has a comfortable sole," demonstrating its superior grasp of colloquial nuances.

Figure Description or Alt Text

Figure 4: The demonstration of translation task using Marco-o1.

For more detail please refer to this or our paper.

Marco-o1 v2

For Marco-o1 v2, we have removed some data from Open-O1 and replaced it entirely with Marco-o1 CoT data. We have expanded both the categories and quantity of our CoT data, Additionally, we improved our MCTS architecture to enable dynamic addition of reflections, as shown in Figure 5. While also conducting DPO using naturally data pairs from MCTS.

Figure Description or Alt Text

Figure 5: In Marco-o1 v2, we restructured the MCTS architecture.

As mentioned in our paper, we found that models like R1 and QwQ often engage in reflection for the sake of reflection itself, which we called formalistic long-time thinking. This has a certain impact on the distillation learning of smaller models, leading to behaviors such as repetitive generate and redundant thinking.

Figure Description or Alt Text

Figure 6: Example for formalistic long-time thinking

Data constructed using MCTS is more suitable for smaller models, as it does not involve redundant thinking and reflection. Instead, we start with planning at the very beginning of the CoT process and then gradually work through the problem. We only guide the model to reflect at appropriate moments. This aligns better with the capabilities and thinking patterns of lower-capacity smaller models.

Additionally, we have conducted DPO using naturally formed positive and negative pairs from MCTS and have made some preliminary findings.

We have open-sourced our MCTS search code. For more detail please refer to this or our paper.

Marco-o1 ???

We are now working on expanding the Marco-o1 family. These expansions include a more robust model based on RL, tailored for agent scenarios. This model places greater emphasis on the accuracy of function call and planning abilities, which are crucial for current agent applications.

Additionally, as mentioned earlier, the outputs of current reasoning models tend to be quite redundancy. Unlike other works that focus on compression to enable models to distinguish problem difficulty and provide outputs of varying lengths, our goal is for the model to dynamically select skipping unnecessary reasoning steps based on a hyperparameter provided by the user.

πŸ”₯πŸ”₯ For more details, we will open source and update our latest work later.

⚑️ Released Resources

Models and Datasets

πŸ“₯ Marco-o1 v1

πŸ“₯ Marco-o1 v2

Installation

To install Marco-o1, follow these steps:

# Clone the repository
git clone https://github.com/AIDC-AI/Marco-o1

# Change to the Macaw-LLM directory
cd Marco-o1

# Install required packages
pip install -r requirements.txt

Usage

  1. Load Marco-o1-CoT model:

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM
    
    tokenizer = AutoTokenizer.from_pretrained("AIDC-AI/Marco-o1")
    model = AutoModelForCausalLM.from_pretrained("AIDC-AI/Marco-o1")
    
  2. Inference:

    Execute the inference script (you can give any customized inputs inside):

    ./src/output/talk_with_model.py
    
    # Use vLLM
    ./src/output/talk_with_model_vllm.py
    
  3. Deploy using FastAPI:

    Check the README.md file in examples folder.

πŸ‘¨πŸ»β€πŸ’» Acknowledgement

Main Contributors

From MarcoPolo Team, AI Business, Alibaba International Digital Commerce:

Citation

If you find Marco-o1 useful for your research and applications, please cite:

@misc{zhao2024marcoo1openreasoningmodels,
      title={Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions}, 
      author={Yu Zhao and Huifeng Yin and Bo Zeng and Hao Wang and Tianqi Shi and Chenyang Lyu and Longyue Wang and Weihua Luo and Kaifu Zhang},
      year={2024},
      eprint={2411.14405},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2411.14405}, 
}

@misc{yin2025wideningdistillationbottleneckreasoning,
      title={Marco o1 v2:Towards Widening The Distillation Bottleneck for Reasoning Models}, 
      author={Huifeng Yin and Yu Zhao and Minghao Wu and Xuanfan Ni and Bo Zeng and Hao Wang and Tianqi Shi and Liangying Shao and Chenyang Lyu and Longyue Wang and Weihua Luo and Kaifu Zhang},
      year={2025},
      eprint={2503.01461},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2503.01461}, 
}

LICENSE

This project is licensed under Apache License Version 2 (SPDX-License-identifier: Apache-2.0).

DISCLAIMER

We used compliance checking algorithms during the training process, to ensure the compliance of the trained model and dataset to the best of our ability. Due to complex data and the diversity of language model usage scenarios, we cannot guarantee that the model is completely free of copyright issues or improper content. If you believe anything infringes on your rights or generates improper content, please contact us, and we will promptly address the matter.

About

An Open Large Reasoning Model for Real-World Solutions

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 6