
[Bug]: The new scale_dtype and zp_dtype are not backward compatible with released vLLM #2057

@mratsim

Description


⚙️ Your current environment

The output of python collect_env.py
### Environment Information ###
Operating System: `Linux-6.17.8-arch1-1-x86_64-with-glibc2.42`
Python Version: `3.12.9 (main, Mar 17 2025, 21:01:58) [Clang 20.1.0 ]`
llm-compressor Version: `0.8.2.dev61+ga270f33a`
compressed-tensors Version: `0.12.3a20251114`
transformers Version: `4.56.2`
torch Version: `2.8.0+cu129`
CUDA Devices: `['NVIDIA RTX PRO 6000 Blackwell Workstation Edition', 'NVIDIA RTX PRO 6000 Blackwell Workstation Edition']`
AMD Devices: `None`

🐛 Describe the bug

I tried the new `model_free_ptq` pipeline, but the compressed models now fail to load in vLLM:

(APIServer pid=1) INFO 11-20 08:27:22 [model.py:630] Resolved architecture: Glm4MoeForCausalLM
(APIServer pid=1) INFO 11-20 08:27:22 [model.py:1728] Using max model len 131072
(APIServer pid=1) INFO 11-20 08:27:22 [scheduler.py:254] Chunked prefill is enabled with max_num_batched_tokens=16384.
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1)   File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=1)     sys.exit(main())
(APIServer pid=1)              ^^^^^^
(APIServer pid=1)   File "/workspace/vllm/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=1)     args.dispatch_function(args)
(APIServer pid=1)   File "/workspace/vllm/vllm/entrypoints/cli/serve.py", line 59, in cmd
(APIServer pid=1)     uvloop.run(run_server(args))
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=1)     return __asyncio.run(
(APIServer pid=1)            ^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
(APIServer pid=1)     return runner.run(main)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1)     return self._loop.run_until_complete(task)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1)     return await main
(APIServer pid=1)            ^^^^^^^^^^
(APIServer pid=1)   File "/workspace/vllm/vllm/entrypoints/openai/api_server.py", line 2006, in run_server
(APIServer pid=1)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1)   File "/workspace/vllm/vllm/entrypoints/openai/api_server.py", line 2025, in run_server_worker
(APIServer pid=1)     async with build_async_engine_client(
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/workspace/vllm/vllm/entrypoints/openai/api_server.py", line 195, in build_async_engine_client
(APIServer pid=1)     async with build_async_engine_client_from_engine_args(
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/workspace/vllm/vllm/entrypoints/openai/api_server.py", line 221, in build_async_engine_client_from_engine_args
(APIServer pid=1)     vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=1)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/workspace/vllm/vllm/engine/arg_utils.py", line 1645, in create_engine_config
(APIServer pid=1)     config = VllmConfig(
(APIServer pid=1)              ^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
(APIServer pid=1)     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=1) pydantic_core._pydantic_core.ValidationError: 2 validation errors for VllmConfig
(APIServer pid=1) scale_dtype
(APIServer pid=1)   Extra inputs are not permitted [type=extra_forbidden, input_value=None, input_type=NoneType]
(APIServer pid=1)     For further information visit https://errors.pydantic.dev/2.12/v/extra_forbidden
(APIServer pid=1) zp_dtype
(APIServer pid=1)   Extra inputs are not permitted [type=extra_forbidden, input_value=None, input_type=NoneType]
(APIServer pid=1)     For further information visit https://errors.pydantic.dev/2.12/v/extra_forbidden

This is likely linked to this compressed-tensors update: vllm-project/compressed-tensors#508

I'm not sure whether this should be fixed on the LLM Compressor side (ensuring config.json does not include non-backward-compatible fields) or on the vLLM side (ignoring unknown fields and issuing a warning). Either way, the current situation makes all new weights quantized by LLM Compressor incompatible with vLLM releases from just a couple of months ago. In companies where upgrade procedures are slow (say, once every 6 months), that could block them from using these weights entirely.

For my own weights the fix is straightforward: I just need to remove the offending fields from config.json:

"quantization_config": {
    "config_groups": {
      "config_group_0": {
        "format": "float-quantized",
        "input_activations": {
          "actorder": null,
          "block_structure": null,
          "dynamic": true,
          "group_size": 128,
          "num_bits": 8,
          "observer": null,
          "observer_kwargs": {},
          "scale_dtype": null,
          "strategy": "group",
          "symmetric": true,
          "type": "float",
          "zp_dtype": null
        },
        "output_activations": null,
        "targets": [
          "Linear"
        ],
        "weights": {
          "actorder": null,
          "block_structure": [
            32,
            32
          ],
          "dynamic": false,
          "group_size": null,
          "num_bits": 8,
          "observer": "static_minmax",
          "observer_kwargs": {},
          "scale_dtype": null, <----------
          "strategy": "block",
          "symmetric": true,
          "type": "float",
          "zp_dtype": null <----------
        }
      }
    },
    ...
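Since only the two new keys need to go, the manual edit above can be automated with a small Python sketch that strips them recursively from the `quantization_config` dict (the key names come from the config shown above; the sample dict below is illustrative, so point the same function at your model's actual config.json):

```python
import json

# Keys introduced by the compressed-tensors update that older vLLM
# releases reject with "Extra inputs are not permitted".
NEW_FIELDS = {"scale_dtype", "zp_dtype"}

def strip_new_fields(obj):
    """Recursively drop scale_dtype / zp_dtype from nested dicts and lists."""
    if isinstance(obj, dict):
        return {k: strip_new_fields(v) for k, v in obj.items() if k not in NEW_FIELDS}
    if isinstance(obj, list):
        return [strip_new_fields(v) for v in obj]
    return obj

# Minimal illustrative config fragment (mirrors the structure above).
sample = {
    "quantization_config": {
        "config_groups": {
            "config_group_0": {
                "weights": {"num_bits": 8, "scale_dtype": None, "zp_dtype": None}
            }
        }
    }
}

cleaned = strip_new_fields(sample)
print(json.dumps(cleaned))
```

To fix a real checkpoint, load its config.json with `json.load`, pass it through `strip_new_fields`, and write it back with `json.dump(..., indent=2)`.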

🛠️ Steps to reproduce

No response
