@remi-or
Collaborator

@remi-or remi-or commented Nov 28, 2025

Currently, the to_dict method of GenerationConfig always deletes the compile_config attribute. This is because when we call save_pretrained on a GenerationConfig object, we don't want to save the CompileConfig object.
But this creates an issue: compile_config is also deleted when we load the model. For instance, take this snippet:

import torch
from transformers import AutoModelForCausalLM, CompileConfig, GenerationConfig

compile_config = CompileConfig(fullgraph=True)
generation_config = GenerationConfig(compile_config=compile_config)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct", dtype=torch.bfloat16, generation_config=generation_config)

print(f"After model loading:\n{model.generation_config = }")
model.generation_config.save_pretrained("test.json")

reloaded_generation_config = GenerationConfig.from_pretrained("test.json")
print(f"After reloading from file:\n{reloaded_generation_config = }")

Running on main produces:

After model loading: model.generation_config = GenerationConfig {}
After reloading from file: reloaded_generation_config = GenerationConfig {}

Running after this PR:

After model loading: model.generation_config = GenerationConfig {
  "compile_config": {
    "backend": "inductor",
    "dynamic": null,
    "fullgraph": true,
    "mode": "reduce-overhead",
    "options": null
  }
}
After reloading from file: reloaded_generation_config = GenerationConfig {}

Thus we get the desired behavior: compile_config is still not saved, but it is no longer deleted as soon as we load the model.
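To make the distinction concrete, here is a minimal, dependency-free sketch of the pattern described above: the in-memory dict keeps compile_config, and only the serialization step drops it. ToyCompileConfig, ToyGenerationConfig, and the keys_to_pop parameter are hypothetical names used for illustration (the last one borrowed from the review comment below); this is not the actual transformers implementation or the exact change in this PR.

```python
import copy
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ToyCompileConfig:
    # Hypothetical stand-in for CompileConfig, with only two fields.
    fullgraph: bool = False
    backend: str = "inductor"


class ToyGenerationConfig:
    # Hypothetical stand-in for GenerationConfig.
    def __init__(self, max_new_tokens: int = 20,
                 compile_config: Optional[ToyCompileConfig] = None):
        self.max_new_tokens = max_new_tokens
        self.compile_config = compile_config

    def to_dict(self) -> dict:
        # Keep every attribute, including compile_config, so that code
        # which round-trips through to_dict() does not lose it.
        output = copy.deepcopy(self.__dict__)
        if self.compile_config is not None:
            output["compile_config"] = asdict(self.compile_config)
        return output

    def to_json_string(self, keys_to_pop=("compile_config",)) -> str:
        # Drop the unwanted keys only at serialization time.
        config_dict = self.to_dict()
        for key in keys_to_pop:
            config_dict.pop(key, None)
        return json.dumps(config_dict, indent=2, sort_keys=True)


config = ToyGenerationConfig(compile_config=ToyCompileConfig(fullgraph=True))
in_memory = config.to_dict()                     # still has compile_config
serialized = json.loads(config.to_json_string())  # compile_config removed
```

In this sketch, in_memory plays the role of the config seen after model loading, and serialized plays the role of what ends up on disk.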

@remi-or remi-or requested a review from ArthurZucker November 28, 2025 13:36
@remi-or
Collaborator Author

remi-or commented Nov 28, 2025

The failures in tests_tokenization and tests_torch seem unrelated. test_processors fails with the same error as on main.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@ArthurZucker ArthurZucker left a comment

keys_to_pop is a bit better