
Conversation

@Isotr0py (Member) commented Aug 1, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

Fix GLM-4.1V video inference after a change in the transformers library to the video_grid_thw format: each video now produces a single batched grid row instead of one row per frame.

video_grid_thw before:

tensor([[ 1, 26, 46],
        [ 1, 26, 46],
        [ 1, 26, 46],
        [ 1, 26, 46],
        [ 1, 26, 46]])

video_grid_thw currently:

tensor([[ 5, 26, 46]])
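The shape change above can be illustrated with a small sketch. The values come from the tensors in this PR (a 5-frame video with a 26x46 patch grid); plain Python lists stand in for torch tensors so the example is self-contained.

```python
# Old transformers behavior: one [t, h, w] row per video frame.
old_grid = [[1, 26, 46]] * 5

# New behavior: a single row whose t spans all frames of the video.
new_grid = [[5, 26, 46]]

def total_patches(grid):
    # Each row contributes t * h * w patches.
    return sum(t * h * w for t, h, w in grid)

# Both layouts describe the same total number of video patches.
assert total_patches(old_grid) == total_patches(new_grid) == 5 * 26 * 46
```

The total patch count is unchanged; only the row-per-frame versus row-per-video layout differs, which is why downstream code that concatenated per-frame rows needed updating.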

TODO

  • Add regression test

Test Plan

Test Result

(Optional) Documentation Update

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request aims to fix a video inference issue in GLM-4.1V caused by a change in the transformers library. While the change correctly adapts to the new video_grid_thw format for a single video, it introduces a critical bug in handling multiple videos within a single request. The proposed change incorrectly uses data from only the last video in a batch, ignoring the others. My review includes a fix for this issue.

Comment on lines 1126 to 1129

      video_outputs = dict(
          pixel_values_videos=torch.cat(pixel_values_videos_lst),
  -       video_grid_thw=torch.cat(video_grid_thw_lst),
  +       video_grid_thw=video_outputs["video_grid_thw"],
      )

critical

This change introduces a bug when processing multiple videos in a single request. The video_outputs variable is updated in each iteration of the loop over videos. By using video_outputs["video_grid_thw"] after the loop, you are only using the video_grid_thw from the last video processed.

The video_grid_thw_lst correctly accumulates the grid information for all videos. You should concatenate the tensors in this list to form the final batched video_grid_thw, as was done previously.

This is a critical issue that will lead to incorrect outputs for multi-video inputs.

Suggested change

      video_outputs = dict(
          pixel_values_videos=torch.cat(pixel_values_videos_lst),
  -       video_grid_thw=video_outputs["video_grid_thw"],
  +       video_grid_thw=torch.cat(video_grid_thw_lst),
      )
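The loop-accumulation bug the reviewer describes can be sketched as follows. Variable names mirror the vLLM snippet, but the per-video grids are hypothetical and plain Python lists stand in for torch tensors (the list flattening at the end is the list equivalent of torch.cat).

```python
# Simulated per-video processor outputs for a two-video request
# (hypothetical grid values, for illustration only).
grids = [[[5, 26, 46]], [[3, 14, 20]]]

video_grid_thw_lst = []
video_outputs = None
for grid in grids:
    video_outputs = {"video_grid_thw": grid}  # overwritten every iteration
    video_grid_thw_lst.append(video_outputs["video_grid_thw"])

# Buggy: reading video_outputs after the loop keeps only the last video.
buggy = video_outputs["video_grid_thw"]
assert buggy == [[3, 14, 20]]

# Fixed: concatenating the accumulated list keeps one row per video,
# like torch.cat(video_grid_thw_lst) in the suggested change.
fixed = [row for grid in video_grid_thw_lst for row in grid]
assert fixed == [[5, 26, 46], [3, 14, 20]]
```

With a single video the two versions coincide, which is why the bug only surfaces for multi-video requests.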

Signed-off-by: Isotr0py <[email protected]>

github-actions bot commented Aug 1, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

Signed-off-by: Isotr0py <[email protected]>
@Isotr0py Isotr0py marked this pull request as ready for review August 1, 2025 10:16
@mergify mergify bot added the multi-modality Related to multi-modality (#4194) label Aug 1, 2025
@DarkLight1337 (Member) left a comment

Thanks for fixing!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) August 1, 2025 10:25
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 1, 2025
@DarkLight1337 (Member)

Can you merge from main?

@DarkLight1337 DarkLight1337 disabled auto-merge August 1, 2025 12:59
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) August 1, 2025 12:59
@vllm-bot vllm-bot merged commit 3f8e952 into vllm-project:main Aug 1, 2025
42 of 44 checks passed
@Isotr0py Isotr0py deleted the fix-glm41v-video branch August 2, 2025 04:13
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
noamgat pushed a commit to noamgat/vllm that referenced this pull request Aug 9, 2025
paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025
diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025
HeJunyan added a commit to HeJunyan/vllm-fork that referenced this pull request Aug 20, 2025
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
HeJunyan added a commit to HeJunyan/vllm-fork that referenced this pull request Sep 22, 2025

Labels

multi-modality: Related to multi-modality (#4194)
ready: ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: GLM-4.1V-Thinking ValueError
[Bug]: run glm4.1v, ValueError: Attempted to assign 100 = 100 multimodal tokens to 30000 placeholders

3 participants