Fix DeepSpeed mixed precision precedence over Accelerate defaults #39856
Conversation
cc @SunMarc
Hey, can you add some tests for this behaviour?
- Add TestDeepSpeedMixedPrecisionPrecedence class with 3 focused tests
- Test DeepSpeed fp16/bf16 config overriding TrainingArguments defaults
- Test user explicit settings being preserved over DeepSpeed config
- Test precedence hierarchy: user settings > DeepSpeed config > defaults
- Replace massive 934-line test bloat with concise 50-line test suite
- Tests cover core functionality of PR huggingface#39856 mixed precision precedence fix
cc @S1ro1
Resolves issue where Accelerate would default to bf16 mixed precision when a DeepSpeed config specifies fp16, causing a ValueError. The fix ensures DeepSpeed config takes precedence over TrainingArguments defaults while preserving explicit user settings.

Changes:
- Add override_training_args_from_deepspeed() method to handle config precedence
- Reorder mixed precision environment variable setting in TrainingArguments
- Ensure DeepSpeed fp16/bf16 settings override defaults but not explicit choices

Fixes huggingface#39849
S1ro1 left a comment:
LGTM. CC @SunMarc to check the trainer side of things
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
ArthurZucker left a comment:
LGTM. Thanks for the PR.
SunMarc left a comment:
```python
user_set_fp16 = args.fp16 is True
user_set_bf16 = args.bf16 is True

if self.is_true("fp16.enabled"):
    # DeepSpeed config explicitly enables fp16
    if not user_set_fp16 and not user_set_bf16:
        # User didn't explicitly set either, so apply DeepSpeed config
        args.fp16 = True
        args.bf16 = False
    elif user_set_bf16 and not user_set_fp16:
        # User explicitly chose bf16, but DeepSpeed config wants fp16
        # This is a potential conflict - let user choice win but log a warning
        pass  # Keep user's bf16=True, fp16=False
elif self.is_true("bf16.enabled"):
    # DeepSpeed config explicitly enables bf16
    if not user_set_fp16 and not user_set_bf16:
        # User didn't explicitly set either, so apply DeepSpeed config
        args.bf16 = True
        args.fp16 = False
    elif user_set_fp16 and not user_set_bf16:
        # User explicitly chose fp16, but DeepSpeed config wants bf16
        # This is a potential conflict - let user choice win but log a warning
        pass  # Keep user's fp16=True, bf16=False
elif self.is_false("fp16.enabled") and self.is_false("bf16.enabled"):
    # Both are explicitly disabled in DeepSpeed config
    if not user_set_fp16 and not user_set_bf16:
        # User didn't explicitly set either, so apply DeepSpeed config (fp32)
        args.fp16 = False
        args.bf16 = False
```
I feel like this could have been simpler
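For illustration, a more compact variant might collapse the branches into a single guard, since every branch above only changes state when the user set neither flag. This is a hypothetical sketch of the reviewer's point, not code from the PR:

```python
def override_training_args_from_deepspeed(self, args):
    """Hypothetical compact variant: the DeepSpeed config only wins when the
    user left both fp16 and bf16 at their False defaults."""
    if args.fp16 is True or args.bf16 is True:
        return  # explicit user settings always take precedence
    if self.is_true("fp16.enabled"):
        args.fp16, args.bf16 = True, False
    elif self.is_true("bf16.enabled"):
        args.fp16, args.bf16 = False, True
    elif self.is_false("fp16.enabled") and self.is_false("bf16.enabled"):
        args.fp16 = args.bf16 = False  # full fp32 training
```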
```python
@require_deepspeed
class TestDeepSpeedMixedPrecisionPrecedence(TestCasePlus):
    """Test DeepSpeed mixed precision precedence over Accelerate defaults."""

    def setUp(self):
        super().setUp()
        unset_hf_deepspeed_config()

    def tearDown(self):
        super().tearDown()
        unset_hf_deepspeed_config()

    def test_deepspeed_fp16_overrides_defaults(self):
        """Test that DeepSpeed fp16 config overrides TrainingArguments defaults"""
        from transformers.integrations.deepspeed import HfTrainerDeepSpeedConfig

        args = TrainingArguments(output_dir="./test_output", fp16=False, bf16=False)
        ds_config = {"fp16": {"enabled": True}, "bf16": {"enabled": False}, "zero_optimization": {"stage": 2}}
        hf_ds_config = HfTrainerDeepSpeedConfig(ds_config)
        hf_ds_config.trainer_config_process(args)
        self.assertTrue(args.fp16)
        self.assertFalse(args.bf16)
```
It would be nice to add a test that reproduces the initial error.
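A regression test along these lines, added to the class above, could exercise that scenario; the exact setup is an assumption, since the `ValueError` in issue #39849 was raised later inside Accelerate:

```python
def test_deepspeed_fp16_config_with_default_args(self):
    """Hypothetical regression test for issue huggingface#39849: a DeepSpeed
    config enabling fp16 must not be shadowed by Accelerate's bf16 default."""
    from transformers.integrations.deepspeed import HfTrainerDeepSpeedConfig

    # The user sets neither fp16 nor bf16 -- only the DeepSpeed config speaks.
    args = TrainingArguments(output_dir="./test_output")
    ds_config = {"fp16": {"enabled": True}, "zero_optimization": {"stage": 2}}
    hf_ds_config = HfTrainerDeepSpeedConfig(ds_config)
    hf_ds_config.trainer_config_process(args)
    # Before the fix, the bf16 default survived and DeepSpeed's fp16 request
    # later raised a ValueError; after the fix, the config's choice is
    # reflected on the TrainingArguments.
    self.assertTrue(args.fp16)
    self.assertFalse(args.bf16)
```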
Well, the issue is that we shouldn't overwrite the training_args defaults, hence I'm reverting this PR. The problem in the original issue was that mixed_precision was somehow being set to bf16.
…ggingface#39856)

* Fix DeepSpeed mixed precision precedence over Accelerate defaults

  Resolves issue where Accelerate would default to bf16 mixed precision when a DeepSpeed config specifies fp16, causing a ValueError. The fix ensures DeepSpeed config takes precedence over TrainingArguments defaults while preserving explicit user settings.

  Changes:
  - Add override_training_args_from_deepspeed() method to handle config precedence
  - Reorder mixed precision environment variable setting in TrainingArguments
  - Ensure DeepSpeed fp16/bf16 settings override defaults but not explicit choices

  Fixes huggingface#39849

* Add tests for DeepSpeed mixed precision precedence fix

  - Add TestDeepSpeedMixedPrecisionPrecedence class with 3 focused tests
  - Test DeepSpeed fp16/bf16 config overriding TrainingArguments defaults
  - Test user explicit settings being preserved over DeepSpeed config
  - Test precedence hierarchy: user settings > DeepSpeed config > defaults
  - Replace massive 934-line test bloat with concise 50-line test suite
  - Tests cover core functionality of PR huggingface#39856 mixed precision precedence fix
## Summary

Fixes issue #39849, where Accelerate would default to `bf16` mixed precision even when the DeepSpeed config specifies `fp16`, causing a `ValueError`. This PR ensures that the DeepSpeed configuration takes precedence over `TrainingArguments` defaults while preserving explicit user settings.

## Root Cause
The issue was caused by the initialization order in `TrainingArguments.__post_init__()`: the `ACCELERATE_MIXED_PRECISION` environment variable was set before the DeepSpeed config was processed, which prevented the config from overriding Accelerate's defaults.
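The ordering is easier to see as a sketch. This helper is hypothetical and only illustrates the relationship between the two steps; it is not the actual `__post_init__` source:

```python
import os

def set_mixed_precision_env(args, hf_deepspeed_config=None):
    """Hypothetical helper: process the DeepSpeed config *before* freezing
    the mixed precision value that Accelerate reads from the environment."""
    # 1) Give the DeepSpeed config a chance to adjust args.fp16 / args.bf16.
    if hf_deepspeed_config is not None:
        hf_deepspeed_config.override_training_args_from_deepspeed(args)
    # 2) Only then export the value; "fp16" / "bf16" / "no" are the
    #    mixed precision modes Accelerate understands.
    if args.fp16:
        os.environ["ACCELERATE_MIXED_PRECISION"] = "fp16"
    elif args.bf16:
        os.environ["ACCELERATE_MIXED_PRECISION"] = "bf16"
    else:
        os.environ["ACCELERATE_MIXED_PRECISION"] = "no"
```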
## Changes Made

### 1. Added DeepSpeed config override logic

- Added an `override_training_args_from_deepspeed()` method to the `HfTrainerDeepSpeedConfig` class.
- The method reads the config's `fp16`/`bf16` settings and overrides the `TrainingArguments` defaults accordingly.

### 2. Fixed initialization order

- Moved the mixed precision environment variable setup in `TrainingArguments.__post_init__()` to occur after DeepSpeed config processing.

## Behavior
The fix enforces the following precedence hierarchy:

1. Explicit user settings, e.g. `fp16=True` or `bf16=True` passed by the user.
2. DeepSpeed config settings, e.g. `"fp16": {"enabled": true}` or `"bf16": {"enabled": true}` in the config file.
3. `TrainingArguments` defaults.
## Test Plan

- Verified that a DeepSpeed `fp16` config overrides the default correctly.
- Verified that a DeepSpeed `bf16` config overrides the default correctly.
- Rebased on `main` and verified the fix still works.

## Files Modified
- `src/transformers/integrations/deepspeed.py` – Added override logic and method call.
- `src/transformers/training_args.py` – Reordered mixed precision env var setup.

## Branch Info
- Branch: `fix-deepspeed-mixed-precision-precedence` (rebased on latest `main`)
- Target: `main`