
Conversation

@kzawora-intel

No description provided.

bigPYJ1151 and others added 30 commits June 13, 2024 09:33
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: youkaichao <[email protected]>
Co-authored-by: zifeitong <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Yard1 and others added 14 commits July 1, 2024 20:12
Signed-off-by: Xiaowei Jiang <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Roger Wang <[email protected]>
@kzawora-intel marked this pull request as ready for review July 2, 2024 13:06

@madamczyk-intel left a comment


+64k -19k
🙈 LGTM

@kzawora-intel merged commit 5e1a565 into habana_main Jul 2, 2024
kzawora-intel added a commit that referenced this pull request Jul 2, 2024
@kzawora-intel mentioned this pull request Jul 2, 2024
@kzawora-intel added the habana label (Issues or PRs submitted by Habana Labs) Sep 20, 2024
@kzawora-intel deleted the private/kzawora/rebase_v3 branch October 7, 2024 12:56
michalkuligowski added a commit that referenced this pull request Jan 15, 2025
remove expert_max hard code (#47)
vLLM-Ext: Full enabling of ALiBi (#34)
Add version inference via setuptools-scm (#58)
Revert "vLLM-Ext: Full enabling of ALiBi (#34)" (#59)
Remove punica_hpu.py from vllm_hpu_extension (#66)
Removed previous (not-pipelined) pa implementation (#72)
Add flag to enable running softmax in fp32 (#71) (see the sketch after this commit message)
Update calibration readme link (#73)
allow lm_head quantization in calibration process (#65)
Pad to bmin if value is less (#67)
Update pyproject.toml (#75)

---------

Co-authored-by: Michał Kuligowski <[email protected]>
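
For context on "Add flag to enable running softmax in fp32 (#71)": a minimal sketch of the upcast-then-downcast technique such a flag typically guards. The flag and function names here are hypothetical illustrations, not the extension's actual code.

```python
import torch

# Hypothetical flag name; the real extension exposes its own configuration knob.
SOFTMAX_FP32 = True

def attn_softmax(scores: torch.Tensor) -> torch.Tensor:
    """Softmax over the last dim, optionally computed in fp32."""
    if SOFTMAX_FP32:
        # Upcast bf16/fp16 attention scores to fp32 for a numerically
        # stable normalization, then cast back to the original dtype.
        return torch.softmax(scores.float(), dim=-1).to(scores.dtype)
    return torch.softmax(scores, dim=-1)
```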
mfylcek added a commit that referenced this pull request Jan 21, 2025
ranzhejiang pushed a commit to ranzhejiang/vllm-fork that referenced this pull request Apr 11, 2025
* Add support for ngram speculative decoding on Gaudi (see the usage sketch after this commit message)

Signed-off-by: Bob Zhu <[email protected]>

* Rename the Gaudi device name to "hpu" only

"hpu:x" may lead to a memory leak according to the release notes:
https://docs.habana.ai/en/latest/Release_Notes/GAUDI_Release_Notes.html

Note: HPU currently does not support ngram speculative decoding with TP > 1

---------

Signed-off-by: Bob Zhu <[email protected]>
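
For reference, a hedged sketch of enabling ngram (prompt-lookup) speculative decoding through vLLM's offline API, respecting the constraints this commit describes. The model name and argument values are placeholders, and exact parameter spellings vary across vLLM versions.

```python
from vllm import LLM, SamplingParams

# Sketch only; argument names follow vLLM's offline speculative-decoding
# API of that era and may differ between versions.
llm = LLM(
    model="facebook/opt-6.7b",
    speculative_model="[ngram]",   # draft tokens via n-gram prompt lookup
    num_speculative_tokens=5,
    ngram_prompt_lookup_max=4,
    tensor_parallel_size=1,        # per the commit: ngram SD with TP > 1 is unsupported on HPU
)

# Per the commit, address the device as plain "hpu" rather than "hpu:x",
# which the Gaudi release notes associate with a memory leak.
out = llm.generate(["The capital of France is"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```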

Labels

habana (Issues or PRs submitted by Habana Labs), rebase
