Closed
Description
This issue tracks the checklist for the official v0.7.3 release.
Code development
- Wait for the CANN 8.1 release, then update the Dockerfile base image: Upgrade CANN version to 8.1.rc1 #747 @Yikun
- Update torch-npu to the 2.5.1 official release @MengqingCao: [v0.7.3][Build] Upgrade torch-npu to 2.5.1 #662
- PRs waiting for merge/review/close @wangxiyuan:
  - [0.7.3] Optimize apply_penalties & topKtopP for both V0/V1 Engine #525 @linfeng-yuan
  - [Doc] Update v0.7.3 faqs #695
  - [ModelRunnerV1] Adapt kv_cache quant in v1. #685
  - [Misc] Add v0.7.3 benchmark #678
  - [0.7.3] optimize qwen2_vl and qwen2_5_vl #702
- LoRA support cherry-pick @paulyu12 @Yikun: Add LoRA & Multi-LoRA support for V0.7.3 dev by Cherry Pick #700
- Write release note @Yikun: [Doc] Add release note for 0.7.3 #735
- CPU memory leak @celestialli: [0.7.3] patch from_seq_group to clear finished seq in seq_id_to_seq_group #691
Documentation enhancement
- Installation @MengqingCao: [Build][0.7.3] Integrate MindIE Turbo into vLLM Ascend #708
  - install from source code
    - vllm
    - vllm-ascend[mindie-turbo]
  - install from binary
    - vllm
    - vllm-ascend
    - mindie-turbo
  - install with docker
  - install from source code
- User Guide
  - Use ascend scheduler with V1 Engine @MengqingCao: [Guide]: Usage on AscendScheduler in vLLM Ascend #788
  - Improve performance with Python and PyTorch @wangxiyuan: [Doc] Add release note for 0.7.3 #735
  - Update doc to address compile enhancement @MengqingCao: [Build][0.7.3] Integrate MindIE Turbo into vLLM Ascend #708
  - FAQ cherry-pick @Potabk:
    - [Doc] Update v0.7.3 faqs #695
    - [v0.7.3][Doc] Add notes for OOM in FAQs (#786) #795
  - Feature support update @MengqingCao: [Build][0.7.3] Integrate MindIE Turbo into vLLM Ascend #708
  - Model support update @MengqingCao: [Build][0.7.3] Integrate MindIE Turbo into vLLM Ascend #708
  - Accuracy report @hfadzxy: [v0.7.3][Doc] Add accuracy report #793. Add an index page once the report exists.
  - Performance feedback issue: [Guide][Performance]: vllm-ascend v0.7.3 release performance benchmark #776 @Potabk
- Developer Guide
  - Update the Release Compatibility Matrix to include the mindie-turbo version: [Doc] Add release note for 0.7.3 #735 @Yikun
Function and Model Test
- key models:
- qwen2.5
- deepseek-v3
- qwen2.5-vl
- features
If a feature's usage differs from its original usage in vllm, we need to add a dedicated guide for it in vllm-ascend[mindie-turbo].
- chunked prefill @MengqingCao: relies on CANN 8.1 NNAL
- custom ops @celestialli
- guided decoding – same as vllm @shen-shanshan
- sleep mode @celestialli
- create an issue to track the sleep mode @celestialli: [Guide]: Sleep mode feature guide #733
- update the feature support list to link to the issue @MengqingCao: [Build][0.7.3] Integrate MindIE Turbo into vLLM Ascend #708
- speculative decoding – same as vllm @MengqingCao
- multi-step scheduler – same as vllm @MengqingCao
- mtp – same as vllm @MengqingCao
- prefix cache @Potabk
- pooling model – same as vllm @MengqingCao
- V1Engine @shen-shanshan
- distribution @shen-shanshan
- tp
- pp
- chunked prefill @MengqingCao
Release artifacts @wangxiyuan
- accuracy report @hfadzxy: [v0.7.3][Doc] Add accuracy report #793. The report needs to be generated by hand.
- PyPI package @MengqingCao: https://pypi.org/project/vllm-ascend/0.7.3/
- docker image @Yikun https://github.com/vllm-project/vllm-ascend/actions/runs/14872918023/job/41866668626?pr=730