Welcome to the official repository for KG-TRACES! 🚀 We're enhancing Large Language Models to reason with explainability, accuracy, and traceability by leveraging the power of Knowledge Graphs.
Vanilla LLMs are amazing, but when it comes to complex, multi-hop reasoning, they can sometimes...
- 🤯 Hallucinate facts
- ❓ Provide answers without clear justification
- 🚧 Hit a wall in scenarios demanding trustworthy, step-by-step explanations
This limits their use in critical domains. That's where KG-TRACES steps in!
*Figure 1: KG-TRACES (d) stands out by generating faithful, attributable responses, adapting to different KG access conditions.*

KG-TRACES is a novel framework that explicitly teaches LLMs how to reason by supervising their internal "thought process" with knowledge graph guidance. We guide them to:
- 🗺️ Chart the Course: Predict symbolic knowledge graph reasoning paths from question to answer.
- 📝 Show Their Work: Generate attribution-aware reasoning explanations that clearly indicate whether each step comes from the KG or the LLM's internal knowledge 🧠, and how effective it was!
- 🔍 Crystal-Clear Explanations: Understand why the LLM reached its conclusion.
- 🛡️ Trustworthy & Attributable: Know the evidence source of each reasoning step.
- 💪 Robust Performance: Excels even with limited or no direct KG access during inference.
- 🌍 Versatile: Shows strong generalization to specialized fields like medicine.
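To make the attribution idea concrete, here is a minimal sketch of how a source-tagged reasoning trace could be parsed. Note that the `[KG]`/`[LLM]` line format and the example trace are purely illustrative assumptions, not the actual output format of KG-TRACES:

```python
import re

def parse_trace(trace: str) -> list[dict]:
    """Split a reasoning trace into steps, each tagged with its evidence source.

    Assumes a hypothetical "[SOURCE] step text" line format, where SOURCE
    is either KG (retrieved from the knowledge graph) or LLM (the model's
    internal knowledge).
    """
    steps = []
    for line in trace.strip().splitlines():
        m = re.match(r"\[(KG|LLM)\]\s*(.*)", line.strip())
        if m:
            steps.append({"source": m.group(1), "step": m.group(2)})
    return steps

# Illustrative trace with one KG-grounded and one internal-knowledge step.
example = """
[KG] Jean Valjean -> character_in -> Les Miserables
[LLM] Les Miserables was written by Victor Hugo.
"""
parsed = parse_trace(example)
print(parsed)
```

Tagging every step this way is what makes each conclusion auditable: a reader can check KG-sourced steps against the graph and treat LLM-sourced steps with appropriate caution.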
[2025-06-04]: We open-source the KG-TRACES codebase and the training dataset of KG-TRACES.
[2025-06-03]: The KG-TRACES paper is live! Check it out on arXiv.
Ready to dive in? Here's how:
Make sure you have:
- Python 3.12+
- PyTorch 2.6.0+
- 🤗 Transformers & Datasets
- DeepSpeed 0.16+
git clone https://github.com/Edaizi/KG-TRACES.git
cd KG-TRACES
conda create -n kg_traces python=3.12
pip install -r requirements.txt

We've meticulously prepared augmented SFT datasets for WebQSP and CWQ, packed with reasoning paths and reasoning processes annotated with source attributions. Find them on Hugging Face:
Using the Datasets:
from datasets import load_dataset
webqsp_sft_data = load_dataset("Edaizi/KG-TRACES-WebQSP")
cwq_sft_data = load_dataset("Edaizi/KG-TRACES-CWQ")
print("Example WebQSP SFT instance:")
print(webqsp_sft_data['train'][0])  # Show an example

To train, just run scripts/train.sh:
bash scripts/train.sh

To run inference, just run scripts/predict.sh:
bash scripts/predict.sh

Don't want to train from scratch? Grab our fine-tuned KG-TRACES models from the Hugging Face Model Hub: KG-TRACES
from transformers import AutoModelForCausalLM, AutoTokenizer
model_hub_name = "Edaizi/KG-TRACES"
tokenizer = AutoTokenizer.from_pretrained(model_hub_name)
model = AutoModelForCausalLM.from_pretrained(model_hub_name)

For any questions or feedback, please:
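Once loaded, the checkpoint can be queried like any causal LM. The sketch below is a minimal usage example; the prompt template is an illustrative assumption (check the predict script for the exact format used during fine-tuning):

```python
def build_prompt(question: str) -> str:
    """Wrap a question in a simple instruction prompt (assumed template)."""
    return f"Question: {question}\nAnswer with your reasoning path:"

def generate_answer(question: str, model_hub_name: str = "Edaizi/KG-TRACES") -> str:
    # Imported here so build_prompt stays dependency-free.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_hub_name)
    model = AutoModelForCausalLM.from_pretrained(model_hub_name)
    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, skipping the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Example (requires network access to download the checkpoint):
# print(generate_answer("What country is the Eiffel Tower in?"))
```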
- Open an issue in the GitHub repository
- Reach out to us at [email protected]
If KG-TRACES helps your research or project, we'd love a shout-out! Please cite:
@misc{wu2025kgtracesenhancinglargelanguage,
title={KG-TRACES: Enhancing Large Language Models with Knowledge Graph-constrained Trajectory Reasoning and Attribution Supervision},
author={Rong Wu and Pinlong Cai and Jianbiao Mei and Licheng Wen and Tao Hu and Xuemeng Yang and Daocheng Fu and Botian Shi},
year={2025},
eprint={2506.00783},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2506.00783},
}

We utilized the following repos during development: