[1/N] Refactored AutoQuantizeSearcher to _AutoQuantizeBaseSearcher & AutoQuantizeGradientSearcher; separated quant modules and score modules #586
Conversation
Force-pushed from bbb5a23 to 17e0c3f
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #586 +/- ##
==========================================
+ Coverage 74.58% 74.76% +0.17%
==========================================
Files 183 183
Lines 18412 18630 +218
==========================================
+ Hits 13733 13929 +196
- Misses 4679 4701 +22
☔ View full report in Codecov by Sentry.
Force-pushed from 9ebd69f to b7bd107
Force-pushed from e38a551 to a310038
@coderabbitai review
✅ Actions performed: Review triggered.
Walkthrough
The changes refactor quantization search architecture by introducing abstract base classes and renaming module parameters from
Changes
Sequence Diagram
sequenceDiagram
participant User
participant AutoQuantizeSearcher as AutoQuantizeGradientSearcher
participant BaseClass as _AutoQuantizeBaseSearcher
participant Hparam as QuantRecipeHparam
User->>AutoQuantizeSearcher: before_search()
activate AutoQuantizeSearcher
Note over AutoQuantizeSearcher: Apply grouping rules to<br/>quant_modules/score_modules
AutoQuantizeSearcher->>BaseClass: estimate_sensitivity_scores()
activate BaseClass
BaseClass-->>AutoQuantizeSearcher: Scores computed
deactivate BaseClass
Note over AutoQuantizeSearcher: Insert hparams per group<br/>with associated score_modules
AutoQuantizeSearcher->>Hparam: Create QuantRecipeHparam<br/>(quant_modules, score_modules)
activate Hparam
Hparam-->>AutoQuantizeSearcher: Hparam instance
deactivate Hparam
deactivate AutoQuantizeSearcher
User->>AutoQuantizeSearcher: run_search_with_stats()
activate AutoQuantizeSearcher
loop For each hparam
AutoQuantizeSearcher->>AutoQuantizeSearcher: Compute cost/score<br/>across recipes
AutoQuantizeSearcher->>AutoQuantizeSearcher: get_dist_syncd_score()<br/>get_dist_syncd_cost()
end
AutoQuantizeSearcher-->>User: Search results
deactivate AutoQuantizeSearcher
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Areas requiring extra attention:
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelopt/torch/quantization/algorithms.py (1)
424-475: Bug: mutating `quant_recipes` causes all groups after a disabled one to be treated as disabled

In `insert_hparams_after_merge_rules`, this line: `quant_recipes = None if disabled else quant_recipes` rebinds the local `quant_recipes` variable. Once any group has `disabled=True`, `quant_recipes` becomes `None`, and every subsequent group in the loop will also see `quant_recipes is None`, even if `disabled` is `False`. That will incorrectly create `QuantRecipeHparam`s with only the no‑quant option for all later groups and effectively disable search for them. Use a per‑group variable instead so the original list is preserved:

- quant_recipes = None if disabled else quant_recipes
- hparam = QuantRecipeHparam(
-     quant_recipes,
+ group_quant_recipes = None if disabled else quant_recipes
+ hparam = QuantRecipeHparam(
+     group_quant_recipes,
      quant_modules=quant_modules,
      score_modules=score_modules,
      name=str(group_key),
  )
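For readers skimming the thread, here is a minimal standalone sketch of the rebinding pitfall described above, with made-up group and recipe names rather than the actual modelopt code:

```python
# Minimal sketch of the pitfall: rebinding a loop-external name inside the loop
# makes every group after the first disabled one look disabled too.
recipes = ["int8", "fp8", "nvfp4"]
groups = [("attn", False), ("mlp", True), ("lm_head", False)]  # (name, disabled)

def build_buggy(groups, recipes):
    out = []
    for name, disabled in groups:
        recipes = None if disabled else recipes  # rebinds `recipes` for later iterations
        out.append((name, recipes))
    return out

def build_fixed(groups, recipes):
    out = []
    for name, disabled in groups:
        group_recipes = None if disabled else recipes  # per-group variable, original list preserved
        out.append((name, group_recipes))
    return out

print(build_buggy(groups, recipes))  # lm_head incorrectly ends up with None
print(build_fixed(groups, recipes))  # lm_head keeps the full recipe list
```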
🧹 Nitpick comments (4)
modelopt/torch/quantization/algorithms.py (3)
350-390: Grouping and score‑module helper utilities are well‑structured; minor wording nit

The `_apply_quant_group_rule` / `_apply_score_group_rule` helpers give a clear, extensible contract for regex vs callable rules, and `_get_score_module_from_name`'s `get_submodule` fallback to the quant module keeps behavior robust when a score target is missing.

Tiny nit: the warning text in Line 408 reads "Score will estimated…" — consider changing to "Score will be estimated…" for clarity.

Also applies to: 391-412
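As a rough illustration of the rule contract described here (hypothetical helper name and rule shapes; the real logic lives in `algorithms.py`):

```python
import re
from collections.abc import Callable

import torch.nn as nn

def resolve_score_module(model: nn.Module, quant_name: str,
                         rules: list[str | Callable[[str], str | None]]) -> nn.Module:
    """Hypothetical sketch: a rule is either a regex (first capture group names the
    score module) or a callable mapping a quant-module name to a score-module name."""
    for rule in rules:
        score_name = rule(quant_name) if callable(rule) else (
            m.group(1) if (m := re.match(rule, quant_name)) else None
        )
        if score_name is not None:
            try:
                return model.get_submodule(score_name)
            except AttributeError:
                break  # fall back to the quant module itself if the target is missing
    return model.get_submodule(quant_name)
```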
968-973: Clarify `AutoQuantizeLossSearcher` status or keep it explicitly abstract

`AutoQuantizeLossSearcher` subclasses `_AutoQuantizeBaseSearcher` but doesn't implement the abstract methods, so it remains abstract and cannot be instantiated even though its docstring suggests it is a usable searcher. Consider either:

- Implementing the loss‑based methods (even as `NotImplementedError` stubs with clear messaging), or
- Marking the class as experimental/placeholder in the docstring and keeping it obviously abstract.
This will avoid user confusion if they attempt to construct it.
579-585: Add defensive guard to `get_dist_syncd_score` in `AutoQuantizeGradientSearcher` to prevent crashes with modules lacking `parallel_state`

The method at line 906-914 directly accesses `hparam.quant_modules[0].parallel_state` without validation. Test models like `_ToyLinearQuant` do not initialize `parallel_state` during `_setup()`, and the codebase patterns in Megatron plugins use defensive `hasattr()` checks before accessing this attribute, confirming it is not guaranteed. A simple guard prevents both `IndexError` (empty `quant_modules`) and `AttributeError` (missing attribute):

  def get_dist_syncd_score(self, score: float, hparam: QuantRecipeHparam) -> float:
      """Sync the score across the distributed process group."""
+     if not hparam.quant_modules or not hasattr(hparam.quant_modules[0], "parallel_state"):
+         return score
      _ps = hparam.quant_modules[0].parallel_state
      score = DistributedProcessGroup.get_dist_syncd_obj(
          score, [_ps.data_parallel_group, _ps.tensor_parallel_group], sum
      )
      return score

tests/unit/torch/quantization/plugins/test_huggingface.py (1)
163-164: Avoid unconditional print in the test

The `print(search_history, model)` in `test_autoquantize_huggingface` adds noisy output to test logs without affecting assertions. Consider removing it or guarding it behind a debug flag if you still need it for local debugging.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- modelopt/torch/opt/hparam.py (2 hunks)
- modelopt/torch/quantization/algorithms.py (13 hunks)
- modelopt/torch/quantization/model_quant.py (2 hunks)
- tests/unit/torch/quantization/plugins/test_huggingface.py (1 hunks)
- tests/unit/torch/quantization/test_autoquant.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-18T20:15:04.615Z
Learnt from: realAsma
Repo: NVIDIA/TensorRT-Model-Optimizer PR: 332
File: modelopt/torch/quantization/algorithms.py:323-326
Timestamp: 2025-09-18T20:15:04.615Z
Learning: In modelopt/torch/quantization/algorithms.py, the `_is_auto_quantize_module` method requires `isinstance(module, QuantModule)` because some modules like MCore Column/Row Parallel Linear are `QuantModule` but not `QuantLinearConvBase`. The check ensures all quantization-capable modules are included in AutoQuantize search.
Applied to files:
- tests/unit/torch/quantization/test_autoquant.py
- modelopt/torch/quantization/algorithms.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: linux
🔇 Additional comments (5)
modelopt/torch/opt/hparam.py (1)
252-256: Extensible `attrs` + `__repr__` design looks solid

Centralizing the repr fields via `attrs` makes subclass customization straightforward and keeps `Hparam.__repr__` generic; the base implementation is correct given the existing attributes.

Also applies to: 259-259
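A simplified sketch of the `attrs`-driven `__repr__` pattern being praised, with made-up fields rather than the actual `Hparam` code:

```python
# Simplified sketch of the pattern (not the actual modelopt Hparam implementation).
class Hparam:
    def __init__(self, choices, original=None):
        self.choices = choices
        self.original = original if original is not None else choices[0]
        self.active = self.original

    @property
    def attrs(self) -> list[str]:
        # Subclasses extend this list to customize what shows up in the repr.
        return ["active", "original", "choices"]

    def __repr__(self) -> str:
        fields = ", ".join(f"{a}={getattr(self, a)!r}" for a in self.attrs)
        return f"{type(self).__name__}({fields})"


class QuantRecipeHparam(Hparam):
    def __init__(self, choices, name=None):
        super().__init__(choices)
        self.name = name

    @property
    def attrs(self) -> list[str]:
        # Prefix "name" so it appears first, as the review comment describes.
        return ["name", *super().attrs]


print(QuantRecipeHparam(["none", "int8"], name="layers.0.mlp"))
# QuantRecipeHparam(name='layers.0.mlp', active='none', original='none', choices=['none', 'int8'])
```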
modelopt/torch/quantization/algorithms.py (2)
170-203: QuantRecipeHparam wiring of quant/score modules and importance dict looks correct

The constructor's handling of `quant_modules`/`score_modules` and the equal-length validation are sound, and anchoring the "no‑quant" recipe in `choices` matches the intended search space. The `_all_quantizer_choices` snapshot plus `active` setter correctly swap quantizers per recipe, and the importance dict keyed by `score_modules` is consistent with how `_estimate_auto_quantize_scores` fills it. The overridden `attrs` (prefixing "name") integrates cleanly with the new `Hparam.__repr__`.

Also applies to: 231-241, 262-273
888-905: API usage confirmed — no issues remain

The `torch.register_full_backward_hook` API is the recommended approach for module-level backward hooks in PyTorch 2.6, and the framework guarantees that `grad_output` (and `grad_input`) are delivered as tuples. The gradient‑based sensitivity estimation pipeline in your code aligns with these specifications.

Also applies to: 916-965
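For context, a toy illustration of the module-level backward-hook semantics referenced here; the scoring rule (sum of squared `grad_output`) is a placeholder, not modelopt's sensitivity formula:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
scores: dict[str, float] = {}

def make_hook(name: str):
    def hook(module, grad_input, grad_output):
        # grad_output is a tuple with one entry per module output.
        scores[name] = scores.get(name, 0.0) + grad_output[0].pow(2).sum().item()
    return hook

handles = [
    module.register_full_backward_hook(make_hook(name))
    for name, module in model.named_modules()
    if isinstance(module, nn.Linear)
]

x = torch.randn(32, 8)
model(x).pow(2).mean().backward()

for handle in handles:
    handle.remove()
print(scores)  # higher accumulated values suggest higher sensitivity
```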
modelopt/torch/quantization/model_quant.py (1)
234-239: Docstring updates correctly describe new grouping APIs

The expanded `auto_quantize` docs and examples now line up with `AutoQuantizeSearcher.quant_grouping_rules`/`score_module_rules` semantics in `algorithms.py`, and the TODO about configuration exposure is clear. No issues from a behavior standpoint.

Also applies to: 393-418
tests/unit/torch/quantization/test_autoquant.py (1)
93-96: QuantRecipeHparam test update aligns with new APISwitching to
quant_modules=[model_test]matches the updatedQuantRecipeHparamconstructor and keeps the test’s intent (verifying default active recipe and choice set) intact.
Force-pushed from 60a0f26 to 0275c61
self.name = name
self.quant_modules = quant_modules if quant_modules else []
nit:
self.quant_modules = quant_modules or []
candidate_stats: dict[str, dict[str, list[float]]]
best: dict[str, Any]
custom_support: list[tuple[Callable, Callable, Callable]] = []
This variable should be moved down to AutoQuantizeGradientSearcher.
yes good point, let me do that
Thanks @realAsma, do you also have documentation on how you map a quant module to its score module?
@cjluo-nv I do not have user-facing documentation for this. See
meenchen left a comment
LGTM, thanks for the refactoring.
Force-pushed from d08a403 to 2d8ad4d
Still not clear how these modules are linked. I only see these definitions in the code. Could you document in the code where the logic is that determines which quant module links to which score module?
Case 1: quant_module and score_module are the same.
Example: qkv_proj layers
Corresponding score module:
Code - see
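To make the two cases concrete, a hypothetical sketch of the mapping (module paths and the regex rule are made up for illustration; the actual rules live in `AutoQuantizeSearcher.score_module_rules`):

```python
import re

# Hypothetical illustration of the two cases (module paths are made up):
#
# Case 1: the quant module scores itself, e.g. an attention qkv projection.
#   "layers.3.attn.qkv_proj"            -> score module "layers.3.attn.qkv_proj"
# Case 2: per-expert linears are quantized, but sensitivity is measured at the parent
#   MLP so all experts of one MoE layer share a single score module.
#   "layers.3.mlp.experts.7.linear_fc1" -> score module "layers.3.mlp"

def score_module_name(quant_name: str) -> str:
    m = re.match(r"(.*\.mlp)\.experts\.\d+\.", quant_name)
    return m.group(1) if m else quant_name

assert score_module_name("layers.3.attn.qkv_proj") == "layers.3.attn.qkv_proj"
assert score_module_name("layers.3.mlp.experts.7.linear_fc1") == "layers.3.mlp"
```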
Force-pushed from 6467ec2 to 0aada4e
@realAsma I think you pushed commits that should belong to follow-up PRs into this PR.
Force-pushed from dff27b8 to 91da6a8
…antizeGradientSearcher; separated quant modules and score modules; Added KDLoss based AutoQuantize; Added autoquantize search state save/restore support
Signed-off-by: realAsma <[email protected]>
Force-pushed from 91da6a8 to 5264fc7
…AutoQuantizeGradientSearcher; separated quant modules and score modules (NVIDIA#586)
Signed-off-by: realAsma <[email protected]>
Co-authored-by: Asma Kuriparambil Thekkumpate <[email protected]>
Signed-off-by: inisis <[email protected]>
@shengliangxu yes, this was intended. I wanted to merge these together to avoid the CI/CD overhead with each pipeline.
What does this PR do?
Type of change: Refactor; Minor new feature
Overview:
1. Refactored AutoQuantizeSearcher to _AutoQuantizeBaseSearcher & AutoQuantizeGradientSearcher - prepares the architecture for additional search methods.
2. Separated quant modules and score modules - separates quantization modules from scoring modules, enabling auto-quantization to measure sensitivity at parent layers (e.g., MLP output for MoE experts) rather than individual ops.
3. Also see NVIDIA#592 and NVIDIA#588.
Testing
See unit tests: `tests/unit/torch/quantization/test_autoquant.py` and `tests/unit/torch/quantization/plugins/test_huggingface.py`
Before your PR is "Ready for review"
- Make sure you read and follow the Contributor guidelines and your commits are signed.
- Is this change backward compatible?: Yes
- Did you write any new necessary tests?: Yes
- Did you add or update any necessary documentation?: Yes
- Did you update the Changelog?: Not Required
Additional Information
Summary by CodeRabbit
New Features
- Added support for score modules in quantization workflows.
- Added optional naming for quantization recipes.
Bug Fixes
- Improved quantization grouping rules documentation with clearer configuration examples.
Refactor
- Renamed quantization module parameters for improved clarity.
- Enhanced quantization search architecture for better scalability.