Conversation

@Qubitium (Collaborator) commented on Mar 2, 2025

Fix: upstream transformers modeling inference code passes an impossible input shape (`shape[0] == 0`) to the module.

Patch Fixes: #1361

However, I believe this should not happen at all: it has significant performance implications, since such an input results in a no-op with wasted zero-size tensor allocations, and the proper fix belongs upstream in transformers.
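
For illustration, here is a minimal sketch of the kind of early-return guard this patch applies when an input with `shape[0] == 0` reaches a quantized linear layer. The class name, its dense fallback, and the demo shapes are illustrative stand-ins, not the actual marlin implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuardedQuantLinear(nn.Module):
    """Illustrative stand-in for a quantized linear layer (e.g. the marlin
    qlinear). The real kernel rejects inputs whose M dimension is 0."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.out_features = out_features
        # Placeholder for packed quantized weights; a dense weight is used
        # here only so the sketch is runnable.
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Guard: transformers' MoE dispatch can hand us a tensor with
        # shape[0] == 0 (no tokens routed to this expert). The kernel would
        # fail with "Invalid MNK = [0, N, K]", so short-circuit with an
        # empty output of the correct shape instead of calling the kernel.
        if x.shape[0] == 0:
            return torch.empty(
                (0, self.out_features), dtype=x.dtype, device=x.device
            )
        return F.linear(x, self.weight.to(x.dtype))

layer = GuardedQuantLinear(2048, 1408)
empty = torch.empty((0, 2048), dtype=torch.float16)
print(layer(empty).shape)  # torch.Size([0, 1408]) -- no kernel call made
```

Returning an empty tensor of the correct output shape keeps the MoE expert loop working while skipping the kernel call entirely, since the kernel would otherwise reject the `M == 0` dimension.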

@SunMarc @MekkCyber

Please check tests/models/test_qwen_15_moe.py to reproduce the failure without this PR's fix. The offending module is gate_proj in the Qwen MoE model.

input = tensor([], device='cuda:0', size=(0, 2048), dtype=torch.float16)
test_qwen_15_moe.py:30: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../gptqmodel/models/base.py:1142: in generate
    return self.model.generate(inputs=inputs, **kwargs)
/root/miniconda3/lib/python3.12/site-packages/torch/utils/_contextlib.py:116: in decorate_context
    return func(*args, **kwargs)
/root/miniconda3/lib/python3.12/site-packages/transformers/generation/utils.py:2223: in generate
    result = self._sample(
/root/miniconda3/lib/python3.12/site-packages/transformers/generation/utils.py:3211: in _sample
    outputs = self(**model_inputs, return_dict=True)
/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py:1739: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py:1750: in _call_impl
    return forward_call(*args, **kwargs)
/root/miniconda3/lib/python3.12/site-packages/transformers/utils/deprecation.py:172: in wrapped_func
    return func(*args, **kwargs)
/root/miniconda3/lib/python3.12/site-packages/transformers/models/qwen2_moe/modeling_qwen2_moe.py:1317: in forward
    outputs = self.model(
/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py:1739: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py:1750: in _call_impl
    return forward_call(*args, **kwargs)
/root/miniconda3/lib/python3.12/site-packages/transformers/models/qwen2_moe/modeling_qwen2_moe.py:1017: in forward
    layer_outputs = decoder_layer(
/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py:1739: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py:1750: in _call_impl
    return forward_call(*args, **kwargs)
/root/miniconda3/lib/python3.12/site-packages/transformers/models/qwen2_moe/modeling_qwen2_moe.py:745: in forward
    hidden_states = self.mlp(hidden_states)
/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py:1739: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py:1750: in _call_impl
    return forward_call(*args, **kwargs)
/root/miniconda3/lib/python3.12/site-packages/transformers/models/qwen2_moe/modeling_qwen2_moe.py:654: in forward
    current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py:1739: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py:1750: in _call_impl
    return forward_call(*args, **kwargs)
/root/miniconda3/lib/python3.12/site-packages/transformers/models/qwen2_moe/modeling_qwen2_moe.py:280: in forward
    return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py:1739: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py:1750: in _call_impl
    return forward_call(*args, **kwargs)
../../gptqmodel/nn_modules/qlinear/marlin.py:412: in forward
    out = apply_gptq_marlin_linear(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

input = tensor([], device='cuda:0', size=(0, 2048), dtype=torch.float16)
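
For context on why a zero-row tensor reaches the layer at all: in the Qwen2-MoE block, transformers loops over every expert and gathers the tokens routed to it; when the router assigns no tokens to an expert, the gather still produces a `(0, hidden_size)` tensor that is pushed through the expert's `gate_proj`/`up_proj`/`down_proj`. A simplified reconstruction of that dispatch (toy shapes and routing, not the real config):

```python
import torch
import torch.nn.functional as F

# Simplified reconstruction of the Qwen2-MoE expert dispatch around
# modeling_qwen2_moe.py:654 in the traceback above; shapes and routing
# here are toy values, not the real model config.
num_experts, hidden_size, tokens = 4, 2048, 3
hidden_states = torch.randn(tokens, hidden_size, dtype=torch.float16)

# Router picks top-2 experts per token; expert 3 happens to get none.
selected_experts = torch.tensor([[0, 1], [0, 2], [1, 2]])
expert_mask = F.one_hot(
    selected_experts, num_classes=num_experts
).permute(2, 1, 0)

for expert_idx in range(num_experts):
    idx, top_x = torch.where(expert_mask[expert_idx])
    # For an expert that received no tokens, top_x is empty, so this
    # gather yields a (0, hidden_size) tensor -- exactly the
    # tensor([], size=(0, 2048)) input seen in the traceback -- and it
    # is still fed to the expert MLP.
    current_state = hidden_states[None, top_x].reshape(-1, hidden_size)
    print(expert_idx, current_state.shape)
```

Here expert 3 receives no tokens, so `current_state` comes out as `(0, 2048)`, matching the empty input captured in the traceback.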

Fix upstream transformers modeling inference code is passing impossible input shape where `shape[0]==0` to module

Signed-off-by: Qubitium <[email protected]>
@Qubitium changed the title from "Fix transformers inference code is passing impossible shape to kernel" to "Fix transformers modeling inference is passing impossible shape to kernel" on Mar 2, 2025
@Qubitium changed the title from "Fix transformers modeling inference is passing impossible shape to kernel" to "Fix transformers modeling inference is passing impossible shape to nn.module" on Mar 2, 2025
@Qubitium changed the title from "Fix transformers modeling inference is passing impossible shape to nn.module" to "Fix transformers modeling code passing input.shape[0] == 0 to nn.module" on Mar 2, 2025
@Qubitium merged commit 0a0cfb0 into main on Mar 2, 2025 (4 checks passed)
@Qubitium deleted the fix-impossible-input-shape branch on March 2, 2025 at 09:43

Development

Linked issue closed by this pull request: [BUG] RuntimeError: Invalid MNK = [0, 1408, 2048] (#1361)