@yangjianfengo1 yangjianfengo1 commented Nov 18, 2025

Motivation

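Long videos produce too many tokens, and their activations take up too much GPU memory. Inference therefore now supports chunk prefill. For video, chunk prefill may only split at frame boundaries, never inside a frame.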

Modifications

Changed the number of tokens passed to the engine on each inference step for video understanding in the multimodal path.

Usage or Command

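Include the can_split_idx field in the request. Each value in this array is an index at which the token sequence may be split. For example, with token_num=10 and can_split_list=[2,5,9], chunk prefill can run as 3 inference passes whose token counts are [3,3,4]. As the example implies, each index is 0-based and marks the last token of its chunk.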
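The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration of how the per-chunk token counts follow from the split indices, not the code in this PR. The function name `chunk_sizes_from_split_indices` is hypothetical, and the reading of each index as the 0-based, inclusive position of a chunk's last token is inferred from the example rather than stated by the PR.

```python
from typing import List


def chunk_sizes_from_split_indices(token_num: int, can_split_idx: List[int]) -> List[int]:
    """Return the token count of each prefill chunk.

    Assumption (inferred from the PR's example, not confirmed by it):
    every entry of can_split_idx is the 0-based index of the last token
    in a chunk, and the entries are strictly increasing.
    """
    sizes: List[int] = []
    start = 0  # index of the first token of the current chunk
    for idx in can_split_idx:
        if not (start <= idx < token_num):
            raise ValueError(f"split index {idx} is out of order or out of range")
        sizes.append(idx - start + 1)
        start = idx + 1
    # Tokens after the last split point, if any, form a final chunk.
    if start < token_num:
        sizes.append(token_num - start)
    return sizes


print(chunk_sizes_from_split_indices(10, [2, 5, 9]))  # [3, 3, 4]
```

Because splits are only allowed at the listed indices, a chunk never ends in the middle of a video frame, which matches the frame-boundary constraint described under Motivation.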

Accuracy Tests

Model unit tests pass.

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot bot commented Nov 18, 2025

Thanks for your contribution!

@TBD1 TBD1 self-requested a review November 18, 2025 07:52
TBD1 previously approved these changes Nov 18, 2025
@yangjianfengo1 yangjianfengo1 changed the title from "【new feature】Video understanding supports chunk prefill" to "[Scheduler] Video understanding supports chunk prefill" Nov 20, 2025