This repository provides a custom ComfyUI node for running object detection with the Qwen 2.5 VL model. The node downloads the selected model on demand, runs a detection prompt and outputs bounding boxes that can be used with segmentation nodes such as SAM2.
Downloads a chosen Qwen 2.5-VL model into models/Qwen and returns the loaded model and processor. You can choose which device to load the model onto (e.g. cuda:1 if you have multiple GPUs), the precision for the checkpoint (INT4, INT8, BF16, FP16 or FP32) and whether to use FlashAttention or SDPA. FlashAttention is automatically replaced with SDPA when FP32 precision is selected because FlashAttention does not support it.
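For orientation, loading the model with plain Hugging Face transformers looks roughly like the sketch below. It is illustrative only: the model ID and download path are assumptions, and the INT4/INT8 paths (which would go through a quantization config such as bitsandbytes) are omitted.

```python
import torch
from huggingface_hub import snapshot_download
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

MODEL_ID = "Qwen/Qwen2.5-VL-7B-Instruct"  # illustrative model choice

# snapshot_download picks up where it left off if a previous run was interrupted.
local_dir = snapshot_download(MODEL_ID, local_dir="models/Qwen/Qwen2.5-VL-7B-Instruct")

# FlashAttention has no FP32 kernels, hence the automatic fallback to SDPA.
dtype = torch.bfloat16  # BF16; use torch.float16 / torch.float32 for FP16 / FP32
attn = "sdpa" if dtype == torch.float32 else "flash_attention_2"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    local_dir,
    torch_dtype=dtype,
    attn_implementation=attn,
    device_map="cuda:1",  # target a specific GPU
)
processor = AutoProcessor.from_pretrained(local_dir)
```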
Runs a detection prompt on an input image using the loaded model. The node outputs a JSON list of bounding boxes of the form `{"bbox_2d": [x1, y1, x2, y2], "label": "object"}` and a separate list of coordinates. Boxes are sorted by detection confidence, and the following parameters control which ones are returned:

- `bbox_selection` – `all` returns every box (default); comma-separated indices such as `0,1,2` or `0,2` return only the selected boxes, sorted by detection confidence.
- `merge_boxes` – when enabled, merge the selected boxes into a single bounding box.
- `score_threshold` – drop boxes whose confidence score falls below this value, when scores are available.
The bounding boxes are converted to absolute pixel coordinates so they can be passed to SAM2 nodes.
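In pseudocode, this post-processing amounts to roughly the following. This is a minimal sketch, assuming the model's answer is the JSON list shown above and that a `score` key is optional; the node's actual code may differ.

```python
import json

def postprocess(answer_text, scale_x=1.0, scale_y=1.0,
                bbox_selection="all", merge_boxes=False, score_threshold=None):
    boxes = json.loads(answer_text)  # [{"bbox_2d": [x1, y1, x2, y2], "label": ...}, ...]

    # Rescale to absolute pixel coordinates of the original image.
    for b in boxes:
        x1, y1, x2, y2 = b["bbox_2d"]
        b["bbox_2d"] = [x1 * scale_x, y1 * scale_y, x2 * scale_x, y2 * scale_y]

    # Drop low-confidence boxes when a score is available (assumed "score" key).
    if score_threshold is not None:
        boxes = [b for b in boxes if b.get("score", 1.0) >= score_threshold]

    # Keep only the requested indices; boxes are assumed sorted by confidence.
    if bbox_selection.strip() != "all":
        indices = [int(i) for i in bbox_selection.split(",")]
        boxes = [boxes[i] for i in indices if i < len(boxes)]

    # Optionally merge the remaining boxes into a single enclosing box.
    if merge_boxes and boxes:
        x1s, y1s, x2s, y2s = zip(*(b["bbox_2d"] for b in boxes))
        boxes = [{"bbox_2d": [min(x1s), min(y1s), max(x2s), max(y2s)],
                  "label": ", ".join(b["label"] for b in boxes)}]

    return boxes
```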
Wraps a list of bounding boxes into the BBOXES batch format expected by ComfyUI-segment-anything-2 and compatible nodes such as `sam_2_ultra.py`.
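A minimal sketch of that wrapping, under the assumption that BBOXES is a per-image batch of `[x1, y1, x2, y2]` lists:

```python
def prepare_bboxes_for_sam2(boxes):
    """Wrap detection results in the BBOXES batch format:
    one list of [x1, y1, x2, y2] boxes per image in the batch."""
    coords = [[int(round(c)) for c in b["bbox_2d"]] for b in boxes]
    return [coords]  # single-image batch: [[box, box, ...]]
```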
Converts bounding-box strings to the SAM2 format. This node lets you use an external LLM API (e.g. comfyui_LLM_party) to detect objects. The external LLM must be able to detect objects and return bounding boxes in the required format; for the expected response format and prompt, see External_LLM_example.json.
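The authoritative schema is in External_LLM_example.json; purely as a hypothetical illustration of the parsing step (the fence/prose stripping and key names are assumptions):

```python
import json
import re

def parse_llm_bboxes(response_text):
    # Tolerate code fences or surrounding prose by grabbing the first JSON array.
    match = re.search(r"\[.*\]", response_text, re.DOTALL)
    if match is None:
        return []
    boxes = json.loads(match.group(0))
    # Expect the same {"bbox_2d": [...], "label": ...} objects as the Qwen node.
    return [b["bbox_2d"] for b in boxes]
```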
- Place this repository inside your `ComfyUI/custom_nodes` directory.
- In the Download and Load Qwen2.5-VL Model node, select the model you want to use, choose the desired precision (INT4/INT8/BF16/FP16/FP32), the attention implementation (FlashAttention or SDPA) and, if necessary, the device (such as `cuda:1`) where it should be loaded. The snapshot download resumes automatically if a previous attempt was interrupted, and FlashAttention is replaced with SDPA automatically when FP32 precision is selected.
- Connect the output model to Qwen2.5-VL Object Detection, then provide an image and the object you want to locate (e.g. `cat`). Optionally set `score_threshold` to filter out low-confidence boxes, use `bbox_selection` to choose specific ones (e.g. `0,2`), and enable `merge_boxes` if you want them merged. The node automatically builds the detection prompt and returns the selected boxes as JSON.
- Pass the bounding boxes through Prepare BBoxes for SAM2 before feeding them into the SAM2 workflow (see the sketch after this list).
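Outside ComfyUI, the whole chain can be approximated by combining the sketches above with a generate call. Again, this is illustrative: the prompt wording is an assumption, not the node's actual template.

```python
from PIL import Image

image = Image.open("input.png")
prompt = "Locate every cat in the image and report bounding boxes in JSON format."  # assumed wording

messages = [{"role": "user", "content": [
    {"type": "image", "image": image},
    {"type": "text", "text": prompt},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=256)
answer = processor.batch_decode(
    generated[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]

boxes = postprocess(answer, bbox_selection="0,2", merge_boxes=True)
sam2_bboxes = prepare_bboxes_for_sam2(boxes)  # ready for the SAM2 workflow
```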