
Conversation

@kzawora-intel

No description provided.

bigPYJ1151 and others added 30 commits June 13, 2024 09:33
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: youkaichao <[email protected]>
Co-authored-by: zifeitong <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Yard1 and others added 14 commits July 1, 2024 20:12
Signed-off-by: Xiaowei Jiang <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Roger Wang <[email protected]>
@kzawora-intel marked this pull request as ready for review July 2, 2024 13:06

@madamczyk-intel left a comment


+64k -19k
🙈 LGTM

@kzawora-intel merged commit 5e1a565 into habana_main Jul 2, 2024
kzawora-intel added a commit that referenced this pull request Jul 2, 2024
@kzawora-intel mentioned this pull request Jul 2, 2024
@kzawora-intel added the habana label (Issues or PRs submitted by Habana Labs) Sep 20, 2024
@kzawora-intel deleted the private/kzawora/rebase_v3 branch October 7, 2024 12:56
michalkuligowski added a commit that referenced this pull request Jan 15, 2025
remove expert_max hard code (#47)
vLLM-Ext: Full enabling of ALiBi (#34)
Add version inference via setuptools-scm (#58)
Revert "vLLM-Ext: Full enabling of ALiBi (#34)" (#59)
Remove punica_hpu.py from vllm_hpu_extension (#66)
Removed previous (not-pipelined) pa implementation (#72)
Add flag to enable running softmax in fp32 (#71) (see the sketch after this commit message)
Update calibration readme link (#73)
allow lm_head quantization in calibration process (#65)
Pad to bmin if value is less (#67)
Update pyproject.toml (#75)

---------

Co-authored-by: Michał Kuligowski <[email protected]>
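
For context on "Add flag to enable running softmax in fp32 (#71)": a minimal sketch of the upcast-then-downcast technique such a flag typically guards. The flag and function names here are hypothetical illustrations, not the extension's actual code.

```python
import torch

# Hypothetical flag name; the real extension exposes its own configuration knob.
SOFTMAX_FP32 = True

def attn_softmax(scores: torch.Tensor) -> torch.Tensor:
    """Softmax over the last dim, optionally computed in fp32."""
    if SOFTMAX_FP32:
        # Upcast bf16/fp16 attention scores to fp32 for a numerically
        # stable normalization, then cast back to the original dtype.
        return torch.softmax(scores.float(), dim=-1).to(scores.dtype)
    return torch.softmax(scores, dim=-1)
```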
mfylcek added a commit that referenced this pull request Jan 21, 2025
ranzhejiang pushed a commit to ranzhejiang/vllm-fork that referenced this pull request Apr 11, 2025
* Add support for ngram speculative decoding on Gaudi (see the usage sketch after this commit message)

Signed-off-by: Bob Zhu <[email protected]>

* Rename the Gaudi device name to "hpu" only

"hpu:x" may lead to a memory leak according to the release notes:
https://docs.habana.ai/en/latest/Release_Notes/GAUDI_Release_Notes.html

Note: HPU currently does not support ngram speculative decoding with TP > 1

---------

Signed-off-by: Bob Zhu <[email protected]>
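
For reference, a hedged sketch of enabling ngram (prompt-lookup) speculative decoding through vLLM's offline API, respecting the constraints this commit describes. The model name and argument values are placeholders, and exact parameter spellings vary across vLLM versions.

```python
from vllm import LLM, SamplingParams

# Sketch only; argument names follow vLLM's offline speculative-decoding
# API of that era and may differ between versions.
llm = LLM(
    model="facebook/opt-6.7b",
    speculative_model="[ngram]",   # draft tokens via n-gram prompt lookup
    num_speculative_tokens=5,
    ngram_prompt_lookup_max=4,
    tensor_parallel_size=1,        # per the commit: ngram SD with TP > 1 is unsupported on HPU
)

# Per the commit, address the device as plain "hpu" rather than "hpu:x",
# which the Gaudi release notes associate with a memory leak.
out = llm.generate(["The capital of France is"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```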

Labels

habana (Issues or PRs submitted by Habana Labs), rebase
