
Conversation

@realAsma (Contributor) commented Nov 20, 2025

What does this PR do?

Type of change: Refactor; minor new feature

Overview:

  1. Refactored AutoQuantizeSearcher into _AutoQuantizeBaseSearcher and AutoQuantizeGradientSearcher, preparing the architecture for additional search methods.
  2. Separated quant modules from score modules: quantization modules are now distinct from scoring modules, enabling auto-quantization to measure sensitivity at parent layers (e.g., the MLP output for MoE experts) rather than at individual ops; see the sketch after this list.
  3. Also see [2/N] Added KDLoss based AutoQuantize #592 and [3/N] Support for save/restoring AutoQuantize sensitivity scores #588.
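
The snippet below is a minimal, self-contained sketch of this separation. The stub class stands in for the real QuantRecipeHparam (which lives in modelopt/torch/quantization/algorithms.py); the keyword names quant_modules, score_modules, and name match the constructor introduced in this PR, but the class body and the toy modules are purely illustrative.

```python
# Sketch only: a stub mirroring the QuantRecipeHparam call shape from this PR,
# not the real implementation.
import torch.nn as nn

class QuantRecipeHparamSketch:
    def __init__(self, quant_recipes=None, quant_modules=None,
                 score_modules=None, name=None):
        self.quant_recipes = quant_recipes  # None -> only the no-quant choice
        self.quant_modules = quant_modules or []
        # Default: each quant module is scored at itself.
        self.score_modules = score_modules or list(self.quant_modules)
        # The PR validates that the two lists have equal length.
        assert len(self.quant_modules) == len(self.score_modules)
        self.name = name

# MoE-style example: all expert projections are searched as one group,
# while sensitivity is scored once at the shared parent module.
experts = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])
parent = nn.Sequential(*experts)  # toy stand-in for the experts container
hparam = QuantRecipeHparamSketch(
    quant_modules=list(experts),
    score_modules=[parent] * len(experts),
    name="model.layers.0.mlp.experts",
)
print(hparam.name, len(hparam.quant_modules), len(hparam.score_modules))
```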

Testing

See the unit tests: tests/unit/torch/quantization/test_autoquant.py and tests/unit/torch/quantization/plugins/test_huggingface.py.

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Yes
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: Not Required

Additional Information

Summary by CodeRabbit

  • New Features

    • Added support for score modules in quantization workflows.
    • Added optional naming for quantization recipes.
  • Bug Fixes

    • Improved quantization grouping rules documentation with clearer configuration examples.
  • Refactor

    • Renamed quantization module parameters for improved clarity.
    • Enhanced quantization search architecture for better scalability.


@realAsma realAsma requested review from a team as code owners November 20, 2025 18:30
copy-pr-bot bot commented Nov 20, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@realAsma realAsma force-pushed the asma/auto_quantize_improvements branch 3 times, most recently from bbb5a23 to 17e0c3f Compare November 20, 2025 18:45
codecov bot commented Nov 20, 2025

Codecov Report

❌ Patch coverage is 89.26702% with 41 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.76%. Comparing base (592a499) to head (5264fc7).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| modelopt/torch/quantization/algorithms.py | 89.03% | 41 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #586      +/-   ##
==========================================
+ Coverage   74.58%   74.76%   +0.17%     
==========================================
  Files         183      183              
  Lines       18412    18630     +218     
==========================================
+ Hits        13733    13929     +196     
- Misses       4679     4701      +22     

☔ View full report in Codecov by Sentry.

@realAsma realAsma requested a review from a team as a code owner November 20, 2025 22:35
@realAsma realAsma requested a review from meenchen November 20, 2025 22:35
@realAsma realAsma changed the title [1/N] Refactored Auto Quantize - seperated quant_grouping and scoring_scoring [1/N] Refactored AutoQuantizeSearcher to _AutoQuantizeBaseSearcher & AutoQuantizeGradientSearcher; seperated quant modules and score modules Nov 20, 2025
@realAsma realAsma requested a review from ajrasane November 20, 2025 22:54
@realAsma realAsma force-pushed the asma/auto_quantize_improvements branch 2 times, most recently from 9ebd69f to b7bd107 Compare November 21, 2025 00:21
@realAsma realAsma force-pushed the asma/auto_quantize_improvements branch 2 times, most recently from e38a551 to a310038 Compare November 21, 2025 16:44
@realAsma (Contributor, Author) commented:
@coderabbitai review

coderabbitai bot commented Nov 21, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot commented Nov 21, 2025

Walkthrough

The changes refactor the quantization search architecture by introducing abstract base classes and renaming module parameters from nn_modules to quant_modules and score_modules. A new Hparam.attrs property abstracts the attributes shown in __repr__, while search logic is restructured into gradient-based and loss-based searcher implementations with distributed synchronization support.

Changes

Cohort / File(s): Summary

  • Hyperparameter abstraction (modelopt/torch/opt/hparam.py): introduces a new public property attrs returning ["choices", "active", "original"]; updates __repr__ to use this property instead of a hard-coded list (see the sketch after this list).
  • Core quantization architecture refactoring (modelopt/torch/quantization/algorithms.py): renames nn_modules to quant_modules/score_modules with validation; refactors QuantRecipeHparam with new name and attrs properties; introduces the abstract base class _AutoQuantizeBaseSearcher with abstract methods; creates the concrete AutoQuantizeGradientSearcher subclass; adds grouping rules, distributed synchronization methods, and a refactored search flow; keeps AutoQuantizeSearcher as an alias for backward compatibility.
  • Documentation updates (modelopt/torch/quantization/model_quant.py): updates the auto_quantize docstring to reference the new quant_grouping_rules mechanism and to provide examples using the updated API; no runtime logic changes.
  • Test updates (tests/unit/torch/quantization/test_autoquant.py, tests/unit/torch/quantization/plugins/test_huggingface.py): updates QuantRecipeHparam constructor calls to use quant_modules instead of nn_modules; adds a debug print statement in test execution.
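
For concreteness, here is a runnable, standalone sketch of the attrs/__repr__ pattern described above. The class bodies are simplified stand-ins, not the real hparam.py code, but the attribute names and the "name"-prefixing override match the walkthrough.

```python
# Simplified stand-in for Hparam: __repr__ is driven by the attrs property,
# so subclasses customize their repr by overriding attrs alone.
class Hparam:
    def __init__(self, choices, original=None):
        self.choices = list(choices)
        self.original = original if original is not None else self.choices[0]
        self.active = self.original

    @property
    def attrs(self) -> list:
        # Attributes rendered by __repr__; subclasses may prepend their own.
        return ["choices", "active", "original"]

    def __repr__(self) -> str:
        fields = ", ".join(f"{a}={getattr(self, a)!r}" for a in self.attrs)
        return f"{type(self).__name__}({fields})"

class QuantRecipeHparam(Hparam):
    def __init__(self, choices, name=None):
        super().__init__(choices)
        self.name = name

    @property
    def attrs(self) -> list:
        # Matches the reviewed behavior: the subclass prefixes "name".
        return ["name", *super().attrs]

print(QuantRecipeHparam(["fp8", "int4", "none"], name="layers.0.q_proj"))
```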

Sequence Diagram

sequenceDiagram
    participant User
    participant AutoQuantizeSearcher as AutoQuantizeGradientSearcher
    participant BaseClass as _AutoQuantizeBaseSearcher
    participant Hparam as QuantRecipeHparam
    
    User->>AutoQuantizeSearcher: before_search()
    activate AutoQuantizeSearcher
    
    Note over AutoQuantizeSearcher: Apply grouping rules to<br/>quant_modules/score_modules
    AutoQuantizeSearcher->>BaseClass: estimate_sensitivity_scores()
    activate BaseClass
    BaseClass-->>AutoQuantizeSearcher: Scores computed
    deactivate BaseClass
    
    Note over AutoQuantizeSearcher: Insert hparams per group<br/>with associated score_modules
    AutoQuantizeSearcher->>Hparam: Create QuantRecipeHparam<br/>(quant_modules, score_modules)
    activate Hparam
    Hparam-->>AutoQuantizeSearcher: Hparam instance
    deactivate Hparam
    deactivate AutoQuantizeSearcher
    
    User->>AutoQuantizeSearcher: run_search_with_stats()
    activate AutoQuantizeSearcher
    loop For each hparam
        AutoQuantizeSearcher->>AutoQuantizeSearcher: Compute cost/score<br/>across recipes
        AutoQuantizeSearcher->>AutoQuantizeSearcher: get_dist_syncd_score()<br/>get_dist_syncd_cost()
    end
    AutoQuantizeSearcher-->>User: Search results
    deactivate AutoQuantizeSearcher

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • modelopt/torch/quantization/algorithms.py: Core architectural refactoring with new abstract base class hierarchy, abstract method contracts, and distributed synchronization logic. Verify state management consistency across _AutoQuantizeBaseSearcher, AutoQuantizeGradientSearcher, and AutoQuantizeLossSearcher.
  • Module parameter migration: Validation of quant_modules/score_modules length equivalence and correctness of grouping rule application across refactored search flow.
  • Backward compatibility: Ensure AutoQuantizeSearcher alias and AutoQuantizeLossSearcher maintain expected behavior and API surface for existing code.
  • Distributed scoring/cost paths: New get_dist_syncd_score and get_dist_syncd_cost methods should be reviewed for correctness in multi-process scenarios.

Poem

🐰 Hop skip and a quantum leap,
Modules grouped in stacks so deep,
Base classes guide the search with care,
Score and cost sync everywhere!
Old names fade, new patterns bloom,
Quantization dances in the room! 🎯

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 58.33%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: the title accurately describes the main changes: refactoring AutoQuantizeSearcher into a base class and a gradient-based subclass, and separating quant and score modules.
  • Description check ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.


coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modelopt/torch/quantization/algorithms.py (1)

424-475: Bug: mutating quant_recipes causes all groups after a disabled one to be treated as disabled

In insert_hparams_after_merge_rules, this line:

quant_recipes = None if disabled else quant_recipes

rebinds the local quant_recipes variable. Once any group has disabled=True, quant_recipes becomes None, and every subsequent group in the loop will also see quant_recipes is None, even if disabled is False. That will incorrectly create QuantRecipeHparams with only the no‑quant option for all later groups and effectively disable search for them.

Use a per‑group variable instead so the original list is preserved:

-            quant_recipes = None if disabled else quant_recipes
-            hparam = QuantRecipeHparam(
-                quant_recipes,
+            group_quant_recipes = None if disabled else quant_recipes
+            hparam = QuantRecipeHparam(
+                group_quant_recipes,
                 quant_modules=quant_modules,
                 score_modules=score_modules,
                 name=str(group_key),
             )
🧹 Nitpick comments (4)
modelopt/torch/quantization/algorithms.py (3)

350-390: Grouping and score‑module helper utilities are well‑structured; minor wording nit

The _apply_quant_group_rule / _apply_score_group_rule helpers give a clear, extensible contract for regex vs callable rules, and _get_score_module_from_name’s get_submodule fallback to the quant module keeps behavior robust when a score target is missing.

Tiny nit: the warning text in Line 408 reads “Score will estimated…” — consider changing to “Score will be estimated…” for clarity.

Also applies to: 391-412


968-973: Clarify AutoQuantizeLossSearcher status or keep it explicitly abstract

AutoQuantizeLossSearcher subclasses _AutoQuantizeBaseSearcher but doesn’t implement the abstract methods, so it remains abstract and cannot be instantiated even though its docstring suggests it is a usable searcher.

Consider either:

  • Implementing the loss‑based methods (even as NotImplementedError stubs with clear messaging), or
  • Marking the class as experimental/placeholder in the docstring and keeping it obviously abstract.

This will avoid user confusion if they attempt to construct it.


579-585: Add defensive guard to get_dist_syncd_score in AutoQuantizeGradientSearcher to prevent crashes with modules lacking parallel_state

The method at line 906-914 directly accesses hparam.quant_modules[0].parallel_state without validation. Test models like _ToyLinearQuant do not initialize parallel_state during _setup(), and the codebase patterns in Megatron plugins use defensive hasattr() checks before accessing this attribute, confirming it is not guaranteed. A simple guard prevents both IndexError (empty quant_modules) and AttributeError (missing attribute):

 def get_dist_syncd_score(self, score: float, hparam: QuantRecipeHparam) -> float:
     """Sync the score across the distributed process group."""
+    if not hparam.quant_modules or not hasattr(hparam.quant_modules[0], "parallel_state"):
+        return score
     _ps = hparam.quant_modules[0].parallel_state
     score = DistributedProcessGroup.get_dist_syncd_obj(
         score, [_ps.data_parallel_group, _ps.tensor_parallel_group], sum
     )
     return score
tests/unit/torch/quantization/plugins/test_huggingface.py (1)

163-164: Avoid unconditional print in test to reduce noise

The print(search_history, model) in test_autoquantize_huggingface adds noisy output to test logs without affecting assertions. Consider removing it or guarding it behind a debug flag if you still need it for local debugging.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 38550b0 and a310038.

📒 Files selected for processing (5)
  • modelopt/torch/opt/hparam.py (2 hunks)
  • modelopt/torch/quantization/algorithms.py (13 hunks)
  • modelopt/torch/quantization/model_quant.py (2 hunks)
  • tests/unit/torch/quantization/plugins/test_huggingface.py (1 hunks)
  • tests/unit/torch/quantization/test_autoquant.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-18T20:15:04.615Z
Learnt from: realAsma
Repo: NVIDIA/TensorRT-Model-Optimizer PR: 332
File: modelopt/torch/quantization/algorithms.py:323-326
Timestamp: 2025-09-18T20:15:04.615Z
Learning: In modelopt/torch/quantization/algorithms.py, the `_is_auto_quantize_module` method requires `isinstance(module, QuantModule)` because some modules like MCore Column/Row Parallel Linear are `QuantModule` but not `QuantLinearConvBase`. The check ensures all quantization-capable modules are included in AutoQuantize search.

Applied to files:

  • tests/unit/torch/quantization/test_autoquant.py
  • modelopt/torch/quantization/algorithms.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: linux
🔇 Additional comments (5)
modelopt/torch/opt/hparam.py (1)

252-256: Extensible attrs + __repr__ design looks solid

Centralizing the repr fields via attrs makes subclass customization straightforward and keeps Hparam.__repr__ generic; the base implementation is correct given the existing attributes.

Also applies to: 259-259

modelopt/torch/quantization/algorithms.py (2)

170-203: QuantRecipeHparam wiring of quant/score modules and importance dict looks correct

The constructor’s handling of quant_modules/score_modules and the equal-length validation are sound, and anchoring the “no‑quant” recipe in choices matches the intended search space. The _all_quantizer_choices snapshot plus active setter correctly swap quantizers per recipe, and the importance dict keyed by score_modules is consistent with how _estimate_auto_quantize_scores fills it. The overridden attrs (prefixing "name") integrates cleanly with the new Hparam.__repr__.

Also applies to: 231-241, 262-273


888-905: API usage confirmed—no issues remain

The torch.nn.Module.register_full_backward_hook API is the recommended approach for module-level backward hooks in PyTorch 2.6, and the framework guarantees that grad_output (and grad_input) are delivered as tuples. The gradient-based sensitivity estimation pipeline in your code aligns with these specifications.

Also applies to: 916-965
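
As a reference for reviewers, a minimal runnable pattern for module-level backward hooks is sketched below. The score definition (squared-gradient sum) and all names here are illustrative, not the PR's actual estimator.

```python
# Minimal sketch: accumulate a per-module score from grad_output via
# register_full_backward_hook. grad_output is guaranteed to be a tuple.
import torch
import torch.nn as nn

scores: dict = {}

def make_hook(name: str):
    def hook(module, grad_input, grad_output):
        g = grad_output[0]  # first (and here only) output gradient tensor
        scores[name] = scores.get(name, 0.0) + g.pow(2).sum().item()
    return hook

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 1))
handles = [m.register_full_backward_hook(make_hook(n))
           for n, m in model.named_modules() if isinstance(m, nn.Linear)]

out = model(torch.randn(4, 8)).sum()
out.backward()  # fires the hooks on both Linear modules
for h in handles:
    h.remove()
print(scores)
```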

modelopt/torch/quantization/model_quant.py (1)

234-239: Docstring updates correctly describe new grouping APIs

The expanded auto_quantize docs and examples now line up with AutoQuantizeSearcher.quant_grouping_rules / score_module_rules semantics in algorithms.py, and the TODO about configuration exposure is clear. No issues from a behavior standpoint.

Also applies to: 393-418

tests/unit/torch/quantization/test_autoquant.py (1)

93-96: QuantRecipeHparam test update aligns with new API

Switching to quant_modules=[model_test] matches the updated QuantRecipeHparam constructor and keeps the test’s intent (verifying default active recipe and choice set) intact.

@realAsma realAsma force-pushed the asma/auto_quantize_improvements branch 2 times, most recently from 60a0f26 to 0275c61 Compare November 21, 2025 17:56

self.name = name

self.quant_modules = quant_modules if quant_modules else []
Contributor:

nit:

self.quant_modules = quant_modules or []


candidate_stats: dict[str, dict[str, list[float]]]
best: dict[str, Any]
custom_support: list[tuple[Callable, Callable, Callable]] = []
Contributor:

This variable should be moved down to AutoQuantizeGradientSearcher.

Author:

yes good point, let me do that

@cjluo-nv (Collaborator) commented:

Thanks @realAsma, do you also have documentation on how a quant module is mapped to its score module?

@realAsma (Contributor, Author) replied:

#586 (comment)

@cjluo-nv I do not have user-facing documentation for this. See

# This searcher finds optimal per-layer quantization by searching across quantization formats
and
"""A searcher for AutoQuantize algorithm that uses gradient based score estimation.

@meenchen (Contributor) left a comment:

LGTM, thanks for the refactoring.

@realAsma realAsma force-pushed the asma/auto_quantize_improvements branch 3 times, most recently from d08a403 to 2d8ad4d Compare November 25, 2025 21:44
@realAsma realAsma requested a review from a team November 25, 2025 22:21
@cjluo-nv (Collaborator) commented:

#586 (comment)

@cjluo-nv I do not have user-facing documentation for this. See

# This searcher finds optimal per-layer quantization by searching across quantization formats

and

"""A searcher for AutoQuantize algorithm that uses gradient based score estimation.

Still not clear how these modules are linked.

I only see these definitions in the code.

quant_grouping_rules = [
    r"^(.*?)\.(q_proj|k_proj|v_proj)$",  # q_proj, k_proj, v_proj for llama like models
    # gate_proj, up_proj, down_proj for Qwen3 like MoE models
    r"^(.*?\.mlp\.experts)\.\d+\.(gate_proj|up_proj|down_proj)$",
    r"^(.*?)\.(gate_proj|up_proj)$",  # gate_proj, up_proj for llama like models
    r"^(.*?)\.(\d+\.(w1|w2|w3))$",  # mixtral experts
    r"^(.*?)\.((w1_linear|w2_linear|w3_linear)\.\d+)$",  # dbrx experts
]
score_module_rules = [
    # Use MLP layer output for gate_proj, up_proj, down_proj for Qwen3 like MoE models (local and shared experts)
    r"^(.*?\.mlp\.experts)\.\d+\.(gate_proj|up_proj|down_proj)$",
    r"^(.*?)\.(\d+\.(w1|w2|w3))$",  # mixtral experts
    r"^(.*?)\.((w1_linear|w2_linear|w3_linear)\.\d+)$",  # dbrx experts
]

Could you document in the code where the logic is that determines which quant module links to which score module?

@realAsma (Contributor, Author) replied:

#586 (comment)

Case 1: quant_module and score_module are the same. Example: qkv_proj layers.
Case 2: quant_module and score_module are different. For MoE layers:
Quant modules:

[experts.0.up_proj, experts.1.up_proj, experts.2.up_proj, ... ]

Corresponding score module:

[experts] # Only one layer

Code - see

for rule in self.quant_grouping_rules:
and
score_module_name = name # Default: score from same module
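
Putting the two rule lists together, a runnable toy of this linkage might look like the following. It mirrors the behavior described above (match a grouping rule to form the group, match a score rule to redirect scoring to the parent, otherwise score at the module itself); the exact grouping keys and defaults in the library may differ.

```python
# Illustrative only: map a quantized module name to its grouping key and
# its score-module name, using the regex rules quoted in this thread.
import re

quant_grouping_rules = [
    r"^(.*?)\.(q_proj|k_proj|v_proj)$",
    r"^(.*?\.mlp\.experts)\.\d+\.(gate_proj|up_proj|down_proj)$",
]
score_module_rules = [
    r"^(.*?\.mlp\.experts)\.\d+\.(gate_proj|up_proj|down_proj)$",
]

def group_and_score(name: str):
    group_key = name  # ungrouped by default
    for rule in quant_grouping_rules:
        if (m := re.match(rule, name)):
            group_key = (rule, m.group(1))  # one hparam per (rule, prefix)
            break
    score_module_name = name  # default: score from the same module
    for rule in score_module_rules:
        if (m := re.match(rule, name)):
            score_module_name = m.group(1)  # score at the parent module
            break
    return group_key, score_module_name

print(group_and_score("model.layers.0.self_attn.q_proj"))
# grouped with k_proj/v_proj of the same layer; scored at itself (Case 1)
print(group_and_score("model.layers.0.mlp.experts.3.up_proj"))
# grouped with all experts; scored at 'model.layers.0.mlp.experts' (Case 2)
```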

@realAsma realAsma force-pushed the asma/auto_quantize_improvements branch 2 times, most recently from 6467ec2 to 0aada4e Compare November 25, 2025 23:45
@shengliangxu (Contributor) commented:

@realAsma I think you pushed commits that should belong to follow-up PRs into this PR.

@realAsma realAsma enabled auto-merge (squash) November 26, 2025 03:27
@realAsma realAsma force-pushed the asma/auto_quantize_improvements branch 4 times, most recently from dff27b8 to 91da6a8 Compare November 26, 2025 03:36
…antizeGradientSearcher; seperated quant modules and score modules; Added KDLoss based AutoQuantize; Added autoquantize search state save/restore support

Signed-off-by: realAsma <[email protected]>
@realAsma realAsma force-pushed the asma/auto_quantize_improvements branch from 91da6a8 to 5264fc7 Compare November 26, 2025 03:53
@realAsma realAsma merged commit 768ee6a into main Nov 26, 2025
27 checks passed
@realAsma realAsma deleted the asma/auto_quantize_improvements branch November 26, 2025 05:23
inisis pushed a commit to inisis/TensorRT-Model-Optimizer that referenced this pull request Nov 26, 2025

…AutoQuantizeGradientSearcher; seperated quant modules and score modules (NVIDIA#586)

Signed-off-by: realAsma <[email protected]>
Co-authored-by: Asma Kuriparambil Thekkumpate <[email protected]>
Signed-off-by: inisis <[email protected]>
jQizhang pushed a commit to jQizhang/TensorRT-Model-Optimizer that referenced this pull request Nov 26, 2025

…AutoQuantizeGradientSearcher; seperated quant modules and score modules (NVIDIA#586)

Signed-off-by: realAsma <[email protected]>
Co-authored-by: Asma Kuriparambil Thekkumpate <[email protected]>
@realAsma (Contributor, Author) replied:

@realAsma I think you pushed commits that should belong to follow-up PRs into this PR.

@shengliangxu yes, this was intended. I wanted to merge these together to avoid the CI/CD overhead of running each pipeline separately.
