Conversation
@WoosukKwon commented Mar 30, 2023

This PR implements a custom CUDA kernel for rotary embedding, which is used in LLaMA. The kernel fuses the entire process of applying rotary embedding to the query and key tensors into a single launch, and is thus much more efficient than the PyTorch implementation.

Tested models:

  • LLaMA-7B
  • LLaMA-13B

Tested GPUs:

  • A100

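For reference, the unfused computation that such a kernel replaces can be sketched in NumPy as below. This is an illustrative sketch, not the PR's code: the function name, the "rotate_half" pairing layout (dim `i` paired with `i + head_dim/2`, as in the HF GPT-NeoX/LLaMA reference), and the `[seq_len, head_dim]` shape are assumptions; the actual kernel operates on query and key in place on the GPU.

```python
import numpy as np

def rotary_embed(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embedding to x of shape [seq_len, head_dim].

    Uses the 'rotate_half' pairing (dim i with dim i + head_dim // 2);
    illustrative only -- the fused kernel does this without materializing
    the intermediate cos/sin tensors for each call site.
    """
    head_dim = x.shape[-1]
    half = head_dim // 2
    # inv_freq[i] = base^(-2i / head_dim), one rotation frequency per dim pair
    inv_freq = base ** (-np.arange(half) * 2.0 / head_dim)
    # angles[m, i] = position_m * inv_freq[i]
    angles = positions[:, None] * inv_freq[None, :]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    # 2D rotation of each (x1, x2) pair by its position-dependent angle
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```

A useful sanity check is that position 0 is the identity and that rotation preserves per-token norms.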
@WoosukKwon requested a review from @zhuohan123 on March 30, 2023 at 10:29
@WoosukKwon changed the title from "Add custom kernel for rotary embedding" to "Implement custom kernel for LLaMA rotary embedding" on Mar 30, 2023
@zhuohan123 (Member) left a comment

LGTM!

@WoosukKwon merged commit 88c0268 into main Mar 30, 2023
@WoosukKwon deleted the rotary-embedding branch March 30, 2023 18:04
bigPYJ1151 added a commit to bigPYJ1151/vllm that referenced this pull request Sep 12, 2023
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
luo-cheng2021 pushed a commit to luo-cheng2021/vllm that referenced this pull request Mar 25, 2024
mzusman pushed a commit to mzusman/vllm that referenced this pull request May 6, 2024
* remove JambaConfig and use official one from transformers

* changes in Jamba modeling file to align with official HF format
fxmarty pushed a commit to fxmarty/vllm-public that referenced this pull request May 31, 2024
enable fused topK_softmax kernel for hip path
ykim362 pushed a commit to ykim362/vllm that referenced this pull request Jun 17, 2024
yukavio pushed a commit to yukavio/vllm that referenced this pull request Jul 3, 2024
Summary:
Add benchmarking scripts and utils.
Things to note:
  - All files are stored in the `neuralmagic` folder.
  - neuralmagic/benchmarks/scripts/*: the actual benchmarking scripts that interact with the vllm engine.
  - neuralmagic/benchmarks/configs/*: JSON config files that define which benchmark commands to run.
  - neuralmagic/benchmarks/run_*.py: scripts that consume a config file and run the benchmark scripts.
  - neuralmagic/tools: add tools

Testing:
Local testing

---------

Co-authored-by: Varun Sundar Rabindranath <[email protected]>
Co-authored-by: rsnm2 <[email protected]>
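The config-driven layout described in that commit can be sketched as follows. This is a hypothetical stand-in: the function name, the `{"benchmarks": [{"script": ..., "args": [...]}]}` schema, and the return value are all assumptions for illustration; the real schema lives in `neuralmagic/benchmarks/configs` and is not shown on this page.

```python
import json
from pathlib import Path

def run_benchmarks_from_config(config_path: str) -> list[str]:
    """Consume a JSON config listing benchmark commands (illustrative schema).

    Mirrors the role of the run_*.py scripts: read a config, derive one
    command line per benchmark entry. A real runner would invoke each
    command via subprocess.run; here we just collect the command strings.
    """
    config = json.loads(Path(config_path).read_text())
    commands = []
    for bench in config.get("benchmarks", []):
        commands.append(" ".join([bench["script"], *bench.get("args", [])]))
    return commands
```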
@alixiaodi mentioned this pull request Aug 2, 2024
wuhuikx pushed a commit to wuhuikx/vllm that referenced this pull request Mar 27, 2025
Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Sep 29, 2025
ISSUE: The USE_CUTLASS_MOE environment variable support (CLAUDE.md entry vllm-project#14)
was lost during a previous merge, removing critical debugging/compatibility control.

ROOT CAUSE: Upstream changes overwrote the Mantle modification that added
environment variable control for CUTLASS MoE implementations.

SOLUTION: Restored the missing environment variable logic:
- Added `import os` to imports
- Restored `default_use_cutlass` calculation with original conditions
- Restored `USE_CUTLASS_MOE` environment variable with smart defaults:
  * USE_CUTLASS_MOE=1 forces CUTLASS MoE on (default when conditions met)
  * USE_CUTLASS_MOE=0 disables CUTLASS MoE, fallback to other implementations
- Maintains backward compatibility with automatic detection

CODE CHANGES:
- File: `vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py`
- Lines: 5 (import), 547-556 (environment variable logic)
- Annotation: Added comprehensive Mantle modification comments for future merge guidance

TESTING: Verified import functionality and environment variable integration.

This fix enables debugging and compatibility control for CUTLASS MoE implementations
as documented in CLAUDE.md registry entry vllm-project#14.

Signed-off-by: Pradyun Ramadorai <[email protected]>
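The environment-variable gating that commit describes follows a common pattern; a minimal sketch, assuming the semantics stated above (`USE_CUTLASS_MOE=1` forces the CUTLASS path on, `=0` disables it, unset falls back to automatic detection). The function name and the `conditions_met` parameter are illustrative stand-ins, not the actual code from `compressed_tensors_moe.py`.

```python
import os

def should_use_cutlass_moe(conditions_met: bool) -> bool:
    """Decide whether to take the CUTLASS MoE path (illustrative sketch).

    `conditions_met` stands in for the automatic hardware/quantization
    checks. USE_CUTLASS_MOE overrides the computed default: "1" forces
    the CUTLASS path on, "0" disables it, unset keeps the default.
    """
    default_use_cutlass = conditions_met
    env_default = "1" if default_use_cutlass else "0"
    return os.environ.get("USE_CUTLASS_MOE", env_default) == "1"
```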
heheda12345 pushed a commit to heheda12345/vllm that referenced this pull request Sep 29, 2025