[1/N] Refactored AutoQuantizeSearcher to _AutoQuantizeBaseSearcher & AutoQuantizeGradientSearcher; separated quant modules and score modules #586
Conversation
Force-pushed from bbb5a23 to 17e0c3f
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #586 +/- ##
==========================================
+ Coverage 74.58% 74.76% +0.17%
==========================================
Files 183 183
Lines 18412 18630 +218
==========================================
+ Hits 13733 13929 +196
- Misses 4679 4701 +22
☔ View full report in Codecov by Sentry.
Force-pushed from 9ebd69f to b7bd107
Force-pushed from e38a551 to a310038
@coderabbitai review
✅ Actions performed: Review triggered.
Walkthrough
The changes refactor quantization search architecture by introducing abstract base classes and renaming module parameters from
Changes
Sequence Diagram
sequenceDiagram
participant User
participant AutoQuantizeSearcher as AutoQuantizeGradientSearcher
participant BaseClass as _AutoQuantizeBaseSearcher
participant Hparam as QuantRecipeHparam
User->>AutoQuantizeSearcher: before_search()
activate AutoQuantizeSearcher
Note over AutoQuantizeSearcher: Apply grouping rules to<br/>quant_modules/score_modules
AutoQuantizeSearcher->>BaseClass: estimate_sensitivity_scores()
activate BaseClass
BaseClass-->>AutoQuantizeSearcher: Scores computed
deactivate BaseClass
Note over AutoQuantizeSearcher: Insert hparams per group<br/>with associated score_modules
AutoQuantizeSearcher->>Hparam: Create QuantRecipeHparam<br/>(quant_modules, score_modules)
activate Hparam
Hparam-->>AutoQuantizeSearcher: Hparam instance
deactivate Hparam
deactivate AutoQuantizeSearcher
User->>AutoQuantizeSearcher: run_search_with_stats()
activate AutoQuantizeSearcher
loop For each hparam
AutoQuantizeSearcher->>AutoQuantizeSearcher: Compute cost/score<br/>across recipes
AutoQuantizeSearcher->>AutoQuantizeSearcher: get_dist_syncd_score()<br/>get_dist_syncd_cost()
end
AutoQuantizeSearcher-->>User: Search results
deactivate AutoQuantizeSearcher
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Areas requiring extra attention:
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelopt/torch/quantization/algorithms.py (1)
424-475: Bug: mutating `quant_recipes` causes all groups after a disabled one to be treated as disabled

In `insert_hparams_after_merge_rules`, this line: `quant_recipes = None if disabled else quant_recipes` rebinds the local `quant_recipes` variable. Once any group has `disabled=True`, `quant_recipes` becomes `None`, and every subsequent group in the loop will also see `quant_recipes is None`, even if `disabled` is `False`. That will incorrectly create `QuantRecipeHparam`s with only the no‑quant option for all later groups and effectively disable search for them. Use a per‑group variable instead so the original list is preserved:

- quant_recipes = None if disabled else quant_recipes
- hparam = QuantRecipeHparam(
-     quant_recipes,
+ group_quant_recipes = None if disabled else quant_recipes
+ hparam = QuantRecipeHparam(
+     group_quant_recipes,
      quant_modules=quant_modules,
      score_modules=score_modules,
      name=str(group_key),
  )
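For readers skimming the thread, here is a minimal standalone sketch of the rebinding pitfall described above, with made-up group and recipe names rather than the actual modelopt code:

```python
# Minimal sketch of the pitfall: rebinding a loop-external name inside the loop
# makes every group after the first disabled one look disabled too.
recipes = ["int8", "fp8", "nvfp4"]
groups = [("attn", False), ("mlp", True), ("lm_head", False)]  # (name, disabled)

def build_buggy(groups, recipes):
    out = []
    for name, disabled in groups:
        recipes = None if disabled else recipes  # rebinds `recipes` for later iterations
        out.append((name, recipes))
    return out

def build_fixed(groups, recipes):
    out = []
    for name, disabled in groups:
        group_recipes = None if disabled else recipes  # per-group variable, original list preserved
        out.append((name, group_recipes))
    return out

print(build_buggy(groups, recipes))  # lm_head incorrectly ends up with None
print(build_fixed(groups, recipes))  # lm_head keeps the full recipe list
```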
🧹 Nitpick comments (4)
modelopt/torch/quantization/algorithms.py (3)
350-390: Grouping and score‑module helper utilities are well‑structured; minor wording nit

The `_apply_quant_group_rule` / `_apply_score_group_rule` helpers give a clear, extensible contract for regex vs callable rules, and `_get_score_module_from_name`'s `get_submodule` fallback to the quant module keeps behavior robust when a score target is missing.

Tiny nit: the warning text in Line 408 reads "Score will estimated…" — consider changing to "Score will be estimated…" for clarity.

Also applies to: 391-412
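As a rough illustration of the rule contract described here (hypothetical helper name and rule shapes; the real logic lives in `algorithms.py`):

```python
import re
from collections.abc import Callable

import torch.nn as nn

def resolve_score_module(model: nn.Module, quant_name: str,
                         rules: list[str | Callable[[str], str | None]]) -> nn.Module:
    """Hypothetical sketch: a rule is either a regex (first capture group names the
    score module) or a callable mapping a quant-module name to a score-module name."""
    for rule in rules:
        score_name = rule(quant_name) if callable(rule) else (
            m.group(1) if (m := re.match(rule, quant_name)) else None
        )
        if score_name is not None:
            try:
                return model.get_submodule(score_name)
            except AttributeError:
                break  # fall back to the quant module itself if the target is missing
    return model.get_submodule(quant_name)
```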
968-973: Clarify `AutoQuantizeLossSearcher` status or keep it explicitly abstract

`AutoQuantizeLossSearcher` subclasses `_AutoQuantizeBaseSearcher` but doesn't implement the abstract methods, so it remains abstract and cannot be instantiated even though its docstring suggests it is a usable searcher. Consider either:

- Implementing the loss‑based methods (even as `NotImplementedError` stubs with clear messaging), or
- Marking the class as experimental/placeholder in the docstring and keeping it obviously abstract.
This will avoid user confusion if they attempt to construct it.
579-585: Add defensive guard to `get_dist_syncd_score` in `AutoQuantizeGradientSearcher` to prevent crashes with modules lacking `parallel_state`

The method at line 906-914 directly accesses `hparam.quant_modules[0].parallel_state` without validation. Test models like `_ToyLinearQuant` do not initialize `parallel_state` during `_setup()`, and the codebase patterns in Megatron plugins use defensive `hasattr()` checks before accessing this attribute, confirming it is not guaranteed. A simple guard prevents both `IndexError` (empty `quant_modules`) and `AttributeError` (missing attribute):

  def get_dist_syncd_score(self, score: float, hparam: QuantRecipeHparam) -> float:
      """Sync the score across the distributed process group."""
+     if not hparam.quant_modules or not hasattr(hparam.quant_modules[0], "parallel_state"):
+         return score
      _ps = hparam.quant_modules[0].parallel_state
      score = DistributedProcessGroup.get_dist_syncd_obj(
          score, [_ps.data_parallel_group, _ps.tensor_parallel_group], sum
      )
      return score

tests/unit/torch/quantization/plugins/test_huggingface.py (1)
163-164: Avoid unconditional print in the test

The `print(search_history, model)` in `test_autoquantize_huggingface` adds noisy output to test logs without affecting assertions. Consider removing it or guarding it behind a debug flag if you still need it for local debugging.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- modelopt/torch/opt/hparam.py (2 hunks)
- modelopt/torch/quantization/algorithms.py (13 hunks)
- modelopt/torch/quantization/model_quant.py (2 hunks)
- tests/unit/torch/quantization/plugins/test_huggingface.py (1 hunks)
- tests/unit/torch/quantization/test_autoquant.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-18T20:15:04.615Z
Learnt from: realAsma
Repo: NVIDIA/TensorRT-Model-Optimizer PR: 332
File: modelopt/torch/quantization/algorithms.py:323-326
Timestamp: 2025-09-18T20:15:04.615Z
Learning: In modelopt/torch/quantization/algorithms.py, the `_is_auto_quantize_module` method requires `isinstance(module, QuantModule)` because some modules like MCore Column/Row Parallel Linear are `QuantModule` but not `QuantLinearConvBase`. The check ensures all quantization-capable modules are included in AutoQuantize search.
Applied to files:
- tests/unit/torch/quantization/test_autoquant.py
- modelopt/torch/quantization/algorithms.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: linux
🔇 Additional comments (5)
modelopt/torch/opt/hparam.py (1)
252-256: Extensible `attrs` + `__repr__` design looks solid

Centralizing the repr fields via `attrs` makes subclass customization straightforward and keeps `Hparam.__repr__` generic; the base implementation is correct given the existing attributes.

Also applies to: 259-259
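A simplified sketch of the `attrs`-driven `__repr__` pattern being praised, with made-up fields rather than the actual `Hparam` code:

```python
# Simplified sketch of the pattern (not the actual modelopt Hparam implementation).
class Hparam:
    def __init__(self, choices, original=None):
        self.choices = choices
        self.original = original if original is not None else choices[0]
        self.active = self.original

    @property
    def attrs(self) -> list[str]:
        # Subclasses extend this list to customize what shows up in the repr.
        return ["active", "original", "choices"]

    def __repr__(self) -> str:
        fields = ", ".join(f"{a}={getattr(self, a)!r}" for a in self.attrs)
        return f"{type(self).__name__}({fields})"


class QuantRecipeHparam(Hparam):
    def __init__(self, choices, name=None):
        super().__init__(choices)
        self.name = name

    @property
    def attrs(self) -> list[str]:
        # Prefix "name" so it appears first, as the review comment describes.
        return ["name", *super().attrs]


print(QuantRecipeHparam(["none", "int8"], name="layers.0.mlp"))
# QuantRecipeHparam(name='layers.0.mlp', active='none', original='none', choices=['none', 'int8'])
```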
modelopt/torch/quantization/algorithms.py (2)
170-203: QuantRecipeHparam wiring of quant/score modules and importance dict looks correct

The constructor's handling of `quant_modules`/`score_modules` and the equal-length validation are sound, and anchoring the "no‑quant" recipe in `choices` matches the intended search space. The `_all_quantizer_choices` snapshot plus `active` setter correctly swap quantizers per recipe, and the importance dict keyed by `score_modules` is consistent with how `_estimate_auto_quantize_scores` fills it. The overridden `attrs` (prefixing "name") integrates cleanly with the new `Hparam.__repr__`.

Also applies to: 231-241, 262-273
888-905: API usage confirmed — no issues remain

The `torch.register_full_backward_hook` API is the recommended approach for module-level backward hooks in PyTorch 2.6, and the framework guarantees that `grad_output` (and `grad_input`) are delivered as tuples. The gradient‑based sensitivity estimation pipeline in your code aligns with these specifications.

Also applies to: 916-965
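For context, a toy illustration of the module-level backward-hook semantics referenced here; the scoring rule (sum of squared `grad_output`) is a placeholder, not modelopt's sensitivity formula:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
scores: dict[str, float] = {}

def make_hook(name: str):
    def hook(module, grad_input, grad_output):
        # grad_output is a tuple with one entry per module output.
        scores[name] = scores.get(name, 0.0) + grad_output[0].pow(2).sum().item()
    return hook

handles = [
    module.register_full_backward_hook(make_hook(name))
    for name, module in model.named_modules()
    if isinstance(module, nn.Linear)
]

x = torch.randn(32, 8)
model(x).pow(2).mean().backward()

for handle in handles:
    handle.remove()
print(scores)  # higher accumulated values suggest higher sensitivity
```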
modelopt/torch/quantization/model_quant.py (1)
234-239: Docstring updates correctly describe new grouping APIs

The expanded `auto_quantize` docs and examples now line up with `AutoQuantizeSearcher.quant_grouping_rules`/`score_module_rules` semantics in `algorithms.py`, and the TODO about configuration exposure is clear. No issues from a behavior standpoint.

Also applies to: 393-418
tests/unit/torch/quantization/test_autoquant.py (1)
93-96: QuantRecipeHparam test update aligns with new APISwitching to
quant_modules=[model_test]matches the updatedQuantRecipeHparamconstructor and keeps the test’s intent (verifying default active recipe and choice set) intact.
Force-pushed from 60a0f26 to 0275c61
self.name = name
self.quant_modules = quant_modules if quant_modules else []
nit:
self.quant_modules = quant_modules or []
candidate_stats: dict[str, dict[str, list[float]]]
best: dict[str, Any]
custom_support: list[tuple[Callable, Callable, Callable]] = []
This variable should be moved down to AutoQuantizeGradientSearcher.
yes good point, let me do that
Thanks @realAsma, do you also have documentation on how you map a quant module to its score module?
@cjluo-nv I do not have user-facing documentation for this. See
meenchen left a comment
LGTM, thanks for the refactoring.
Force-pushed from d08a403 to 2d8ad4d
Still not clear how these modules are linked. I only see these definitions in the code. Could you document in the code where the logic is that determines which quant module links to which score module?
Case 1: quant_module and score_module are the same.
Example: qkv_proj layers
Corresponding score module:
Code - see
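To make the two cases concrete, a hypothetical sketch of the mapping (module paths and the regex rule are made up for illustration; the actual rules live in `AutoQuantizeSearcher.score_module_rules`):

```python
import re

# Hypothetical illustration of the two cases (module paths are made up):
#
# Case 1: the quant module scores itself, e.g. an attention qkv projection.
#   "layers.3.attn.qkv_proj"            -> score module "layers.3.attn.qkv_proj"
# Case 2: per-expert linears are quantized, but sensitivity is measured at the parent
#   MLP so all experts of one MoE layer share a single score module.
#   "layers.3.mlp.experts.7.linear_fc1" -> score module "layers.3.mlp"

def score_module_name(quant_name: str) -> str:
    m = re.match(r"(.*\.mlp)\.experts\.\d+\.", quant_name)
    return m.group(1) if m else quant_name

assert score_module_name("layers.3.attn.qkv_proj") == "layers.3.attn.qkv_proj"
assert score_module_name("layers.3.mlp.experts.7.linear_fc1") == "layers.3.mlp"
```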
Force-pushed from 6467ec2 to 0aada4e
@realAsma I think you pushed commits that should belong to follow-up PRs into this PR.
Force-pushed from dff27b8 to 91da6a8
…antizeGradientSearcher; separated quant modules and score modules; Added KDLoss based AutoQuantize; Added autoquantize search state save/restore support
Signed-off-by: realAsma <[email protected]>
Force-pushed from 91da6a8 to 5264fc7
…AutoQuantizeGradientSearcher; separated quant modules and score modules (NVIDIA#586)
Signed-off-by: realAsma <[email protected]>
Co-authored-by: Asma Kuriparambil Thekkumpate <[email protected]>
Signed-off-by: inisis <[email protected]>
@shengliangxu yes, this was intended. I wanted to merge these together to avoid the CI/CD overhead with each pipeline.
What does this PR do?
Type of change: Refactor; Minor new feature
Overview:
1. Refactored AutoQuantizeSearcher to _AutoQuantizeBaseSearcher & AutoQuantizeGradientSearcher - prepares the architecture for additional search methods.
2. Separated quant modules and score modules - separates quantization modules from scoring modules, enabling auto-quantization to measure sensitivity at parent layers (e.g., MLP output for MoE experts) rather than individual ops.
3. Also see NVIDIA#592 and NVIDIA#588.
Testing
See unit tests: `tests/unit/torch/quantization/test_autoquant.py` and `tests/unit/torch/quantization/plugins/test_huggingface.py`
Before your PR is "Ready for review"
- Make sure you read and follow the Contributor guidelines and your commits are signed.
- Is this change backward compatible?: Yes
- Did you write any new necessary tests?: Yes
- Did you add or update any necessary documentation?: Yes
- Did you update the Changelog?: Not Required
Additional Information
Summary by CodeRabbit
New Features
- Added support for score modules in quantization workflows.
- Added optional naming for quantization recipes.
Bug Fixes
- Improved quantization grouping rules documentation with clearer configuration examples.
Refactor
- Renamed quantization module parameters for improved clarity.
- Enhanced quantization search architecture for better scalability.