forked from opendatahub-io/vllm
nm vllm ent 0.8.5 sync #139
Merged
Conversation
…m-project#16801) Signed-off-by: Lucas Wilkinson <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: rongfu.leng <[email protected]>
Signed-off-by: Luka Govedič <[email protected]>
Signed-off-by: Lu Fang <[email protected]>
…16796) Signed-off-by: Nathan Weinberg <[email protected]>
…ect#16809) Signed-off-by: windsonsea <[email protected]>
Signed-off-by: Jonghyun Choe <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
…llm-project#16829) Signed-off-by: reidliu41 <[email protected]> Co-authored-by: reidliu41 <[email protected]>
…nfig info (vllm-project#16857) Signed-off-by: jmho <[email protected]>
Signed-off-by: omrishiv <[email protected]>
…ect#15130) Signed-off-by: fyabc <[email protected]> Signed-off-by: Roger Wang <[email protected]> Co-authored-by: Roger Wang <[email protected]> Co-authored-by: Roger Wang <[email protected]> Co-authored-by: Xiong Wang <[email protected]>
Signed-off-by: Divakar Verma <[email protected]>
…llm-project#16591) Signed-off-by: Jannis Schönleber <[email protected]> Signed-off-by: NickLucche <[email protected]> Co-authored-by: Jannis Schönleber <[email protected]>
Signed-off-by: NickLucche <[email protected]>
…vllm-project#16460) Signed-off-by: vie-serendipity <[email protected]>
… V1 (vllm-project#15477) Signed-off-by: Isotr0py <[email protected]> Signed-off-by: DarkLight1337 <[email protected]> Co-authored-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: reidliu41 <[email protected]> Co-authored-by: reidliu41 <[email protected]>
Signed-off-by: rzou <[email protected]>
Signed-off-by: Staszek Pasko <[email protected]> Co-authored-by: Nick Hill <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: rzou <[email protected]>
Signed-off-by: qizixi <[email protected]>
Signed-off-by: Woosuk Kwon <[email protected]>
- remove build steps/dependencies - allow for installing pre-built flash-attention/vllm wheels - default ROCM_VERSION to 6.3.4, allowing override with env vars - cleanup rocm docker bake, defaults - amdsmi: use setup.py to build - add amdsmi bind mount - remove flashinfer from rocm target - bump vllm-tgis-adapter to 0.7.0 - Dockerfile*.ubi: bump ubi base
Signed-off-by: Russell Bryant <[email protected]> Co-authored-by: Cyrus Leung <[email protected]>
- remove build steps/dependencies - allow for installing pre-built flash-attention/vllm wheels - default ROCM_VERSION to 6.3.4, allowing override with env vars - cleanup rocm docker bake, defaults - amdsmi: use setup.py to build - add amdsmi bind mount - remove flashinfer from rocm target - bump vllm-tgis-adapter to 0.7.0 - Dockerfile*.ubi: bump ubi base
…-project#17303) Signed-off-by: Harry Mellor <[email protected]>
…vllm-project#17255) Signed-off-by: Harry Mellor <[email protected]>
…rides are ordered (vllm-project#17256) Signed-off-by: Harry Mellor <[email protected]>
…17197) Signed-off-by: Russell Bryant <[email protected]>
Signed-off-by: Aaron Pham <[email protected]> Co-authored-by: Russell Bryant <[email protected]>
…t have shape (metadata_size) (vllm-project#17283) Signed-off-by: Lucas Wilkinson <[email protected]>
…_after_loading`. (vllm-project#16854) Signed-off-by: charlifu <[email protected]>
Signed-off-by: simon-mo <[email protected]>
…#17328) Signed-off-by: mgoin <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
…ct results (vllm-project#17574) Signed-off-by: Lucas Wilkinson <[email protected]>
…client' (vllm-project#17434) Signed-off-by: chaunceyjiang <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]> Co-authored-by: mgoin <[email protected]>
…7315) Signed-off-by: Lucia Fang <[email protected]>
Syncing midstream NM fork to Upstream tag of [v0.8.5.post1](https://github.com/vllm-project/vllm/tree/v0.8.5.post1) + cherry pick of vllm-project@be633fb needed for benchmarks + [CP](neuralmagic/nm-vllm-ent@1fe447d) for compressed tensor bump + [CP](vllm-project#17677) for lora on AMD + [CP](vllm-project#17315) for llama4 w/ pure dense layers

```
commit 31c73ba (HEAD -> upstream-v0.8.5, nm-fork/upstream-v0.8.5)
Author: Chauncey <[email protected]>
Date:   Wed Apr 30 15:11:04 2025 +0800

    [Bugfix] Fix AttributeError: 'State' object has no attribute 'engine_client' (vllm-project#17434)

    Signed-off-by: chaunceyjiang <[email protected]>

commit f8db0bd
Author: Lucas Wilkinson <[email protected]>
Date:   Fri May 2 14:01:38 2025 -0400

    [BugFix][Attention] Fix sliding window attention in V1 giving incorrect results (vllm-project#17574)

    Signed-off-by: Lucas Wilkinson <[email protected]>

commit e335c34
Author: Robert Shaw <[email protected]>
Date:   Fri May 2 04:07:03 2025 -0400

    [BugFix] Fix Memory Leak (vllm-project#17567)

    Signed-off-by: [email protected] <[email protected]>

commit cc463fe
Merge: 1e358ff ba41cc9
Author: Selbi Nuryyeva <[email protected]>
Date:   Tue Apr 29 12:34:57 2025 -0400

    Merge branch 'tag-upstream-v0.8.5' into upstream-v0.8.5

commit ba41cc9 (tag: v0.8.5, tag-upstream-v0.8.5)
Author: Michael Goin <[email protected]>
Date:   Mon Apr 28 16:20:24 2025 -0600

    [Model] Add tuned triton fused_moe configs for Qwen3Moe (vllm-project#17328)

    Signed-off-by: mgoin <[email protected]>

commit dcbac4c
Author: Simon Mo <[email protected]>
Date:   Mon Apr 28 14:12:01 2025 -0700

    [Model] Qwen3 Dense FP8 Compat Fixes (vllm-project#17318)

    Signed-off-by: simon-mo <[email protected]>

[...]
```

Commands

```
git fetch upstream
git checkout -b upstream-v0.8.5
git merge upstream/v0.8.5
git cherry-pick be633fb
```

TEST PLAN
accept sync: https://github.com/neuralmagic/nm-cicd/actions/runs/14841223552
related PR in cicd: neuralmagic/nm-cicd#99
release workflow: https://github.com/neuralmagic/nm-cicd/actions/runs/14845693864
This bumps the CUDA version in the base layer to 12.8 instead of 12.4. This could break something if, during dependency install, we have to build a dependency from source, since the wheels we bring in later in the prepare stage are now built against 12.8.
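One way to sanity-check the bump (a minimal sketch; the image tag is a placeholder and it assumes python3 and torch are present in the final image) is to print the CUDA toolkit the installed torch wheel was built against:

```
# Hypothetical check: image name/tag is an assumption, not the project's actual tag.
# Prints the CUDA version the installed torch wheel was compiled against (expect 12.8).
docker run --rm my-vllm-image:latest \
  python3 -c "import torch; print(torch.version.cuda)"
```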
Notable conflicts were in Dockerfile.rocm.ubi and Dockerfile.ubi. Up to date with the upstream v0.8.5.post1 tag, and includes cherry-picks for LoRA, Llama 4, and the compressed-tensors bump.
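For context, the ROCm Dockerfile rework listed in the commit summary above (pre-built flash-attention/vllm wheels, ROCM_VERSION defaulting to 6.3.4 with an env-var override) would typically be driven along these lines; this is a minimal sketch, and the bake file, target, and variable names are assumptions rather than the repo's actual ones:

```
# Hypothetical invocation: file, target, and variable names are assumptions.
# ROCM_VERSION defaults to 6.3.4 in the bake file; exporting it overrides that default.
ROCM_VERSION=6.4.0 docker buildx bake -f docker-bake.hcl rocm-ubi
```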
Upstream changes included in this sync:

- Improve configs - `SchedulerConfig` (vllm-project/vllm#16533)
- [TPU][V1] Fix exponential padding when `max-num-batched-tokens` is not a power of 2 (vllm-project/vllm#16596)
- [BugFix]: Update minimum `pyzmq` version (vllm-project/vllm#16549)
- Add `vllm bench [latency, throughput]` CLI commands (vllm-project/vllm#16508) (see the usage sketch below)
- [Misc] Update `compressed-tensors` WNA16 to support zero-points (vllm-project/vllm#14211)
- [V1][Structured Output] Move xgrammar related utils to `backend_xgrammar.py` (vllm-project/vllm#16578)
- [CI] Cleanup `additional_dependencies: [toml]` for pre-commit yapf hook (vllm-project/vllm#16405)
- Improve configs - `TokenizerPoolConfig` + `DeviceConfig` (vllm-project/vllm#16603)
- [TPU][V1] Fix padding recompilation when `max-num-batched-tokens` is not even (vllm-project/vllm#16726)
- [Doc] Improve help examples for `--compilation-config` (vllm-project/vllm#16729)
- [V1][Structured Output] Minor modification to `_validate_structured_output()` (vllm-project/vllm#16748)
- Improve configs - `MultiModalConfig` + `PoolerConfig` + `DecodingConfig` (vllm-project/vllm#16789)
- Fix `nullable_kvs` fallback (vllm-project/vllm#16837)
- [Frontend] Add sampling params to `v1/audio/transcriptions` endpoint (vllm-project/vllm#16591)
- Improve configs - `CacheConfig` (vllm-project/vllm#16835)
- [Perf] Optimize `_update_states` for GPU model runner (vllm-project/vllm#16910)
- Improve configs - `SpeculativeConfig` (vllm-project/vllm#16971)
- [BugFix] Remove default multiproc executor `collective_rpc` timeout (vllm-project/vllm#17000)
- Categorize `tests/kernels/` based on kernel type (vllm-project/vllm#16799)
- Ensure that `pid` passed to `kill_process_tree` is `int` for `mypy` (vllm-project/vllm#17051)
- `CacheConfig.block_size` should always be `int` when used (vllm-project/vllm#17052)
- Use `@property` and private field for `data_parallel_rank_local` (vllm-project/vllm#17053)
- Simplify `TokenizerGroup` (vllm-project/vllm#16790)
- Improve static type checking in `LoRAModelRunnerMixin` (vllm-project/vllm#17104)
- [CI] Add automation for the `tool-calling` github label (vllm-project/vllm#17118)
- Add `:markdownhelp:` to `EngineArgs` docs so markdown docstrings render properly (vllm-project/vllm#17124)
- Improve configs - `LoRAConfig` + `PromptAdapterConfig` (vllm-project/vllm#16980)
- Move missed `SchedulerConfig` args into scheduler config group in `EngineArgs` (vllm-project/vllm#17131)
- Use Transformers helper `get_text_config()` instead of checking for `text_config` (vllm-project/vllm#17105)
- [BugFix][Frontend] Fix `LLM.chat()` tokenization (vllm-project/vllm#16081)
- [Bugfix] Fix missing int type for `-n` in multi-image example (vllm-project/vllm#17223)
- [V1] Add `structural_tag` support using xgrammar (vllm-project/vllm#17085)
- [Chore] added stubs for `vllm_flash_attn` during development mode (vllm-project/vllm#17228)
- [Bugfix] fix error due to an uninitialized tokenizer when using `skip_tokenizer_init` with `num_scheduler_steps` (vllm-project/vllm#9276)
- [Misc] Validate `stop_token_ids` contents (vllm-project/vllm#17268)
- Add missing class docstring for `PromptAdapterConfig` (vllm-project/vllm#17302)
- [Bugfix] Add missing `get_language_model` to new MLLMs (vllm-project/vllm#17300)
- [Misc] Minor typo/grammar in `platforms/interface.py` (vllm-project/vllm#17307)
- Make name of `compressed-tensors` quant method consistent across vLLM (vllm-project/vllm#17255)
- [Bugfix] Fix moe weight losing all extra attrs after `process_weights_after_loading` (vllm-project/vllm#16854)
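One of the upstream additions above, the `vllm bench [latency, throughput]` CLI commands, can be exercised roughly as follows; this is a hedged example, the model name and flag values are illustrative, and the flags mirror the legacy benchmark scripts so they may differ by version:

```
# Hypothetical usage of the new benchmarking subcommands (vllm-project/vllm#16508).
vllm bench latency --model facebook/opt-125m --input-len 32 --output-len 128 --batch-size 8
vllm bench throughput --model facebook/opt-125m --num-prompts 100
```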