This README.md file was generated by Gemini-2.5-flash in conjunction with the code in this node and is for reference only; please check examples/example_workflow.json.
Note: This project is inspired by TTPlanetPig/Comfyui_Object_Detect_QWen_VL. The code was generated with the assistance of Gemini-2.5-flash. While designed for generic external LLMs, these nodes have so far been verified only with the Qwen2.5-VL series of models.
This repository provides custom ComfyUI nodes for performing object detection using a generic external Large Language Model (LLM) API. These nodes allow you to configure LLM API connections, send images with custom prompts, and convert the LLM's JSON bounding box responses into a format compatible with segmentation nodes like SAM2.
Configures the connection settings for an external LLM API, including base_url, model_id, and token. The token parameter corresponds to the api_key in the OpenAI API client. This node's output is then passed to the ExternalLLMDetectorMainProcess node.
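For orientation, here is a minimal sketch of how these three settings map onto the OpenAI Python client (the node's internals may differ; all values shown are placeholders):

```python
from openai import OpenAI

# The node's `token` input is passed to the client as api_key.
# Hypothetical values; point base_url at your OpenAI-compatible server.
client = OpenAI(
    base_url="http://localhost:8000/v1",  # the node's base_url input
    api_key="sk-your-token",              # the node's token input
)
model_id = "qwen2.5-vl-7b-instruct"       # the node's model_id input
```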
The core node for sending images and prompts to the external LLM. It supports concurrent requests (threads) and a custom delay (delay), in seconds, between requests. It takes the ex_llm_settings from the ExternalLLMDetectorSettings node, an images input, objects and negative_objects (substituted into prompt placeholders to specify what you do and do not want to segment), a retries input that sets how many times a failed request is retried, and a fully customizable prompt.
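Conceptually, the threads, delay, and retries inputs combine roughly as in the sketch below. This is a simplification, not the node's actual code; detect_once is a hypothetical stand-in for the real API call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def detect_once(image, prompt):
    """Stand-in for the real LLM request (see the API example below)."""
    raise NotImplementedError

def request_with_retries(image, prompt, retries, delay):
    # Retry a failed request up to `retries` times.
    for attempt in range(retries + 1):
        time.sleep(delay)  # space requests `delay` seconds apart
        try:
            return detect_once(image, prompt)
        except Exception:
            if attempt == retries:  # out of retries: give up on this image
                raise

def run_batch(images, prompt, threads=4, retries=2, delay=1.0):
    # Fan per-image requests out over `threads` concurrent workers.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(
            lambda im: request_with_retries(im, prompt, retries, delay),
            images))
```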
Important Note: The external LLM API must be compatible with the OpenAI API format, specifically its chat.completions endpoint. This means the base_url should point to an OpenAI-compatible server, and the token provided in ExternalLLMDetectorSettings will be used as the api_key for authentication.
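For example, a single request through the OpenAI Python client against such an endpoint could look like this (the message layout is the standard OpenAI vision format; the file name, prompt text, and model id are placeholders):

```python
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-your-token")

# Encode the input image as a base64 data URL for multimodal input.
with open("input.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="qwen2.5-vl-7b-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text",
             "text": "Detect every target_object and return a JSON list of bbox_2d entries."},
        ],
    }],
)
print(response.choices[0].message.content)  # the raw JSON bounding-box string
```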
It also expects the external LLM to be capable of:
- Multimodal Input: Accepting both image and text as input.
- JSON Output: Returning a JSON string containing bounding boxes. The default prompt is designed to elicit a JSON array of objects, where each object must contain a bbox_2d key whose value is a 4-element list [x1, y1, x2, y2]. Example format: [ {"bbox_2d": [123, 456, 789, 912], "label": "target_object"} ]
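In Python terms, a well-formed reply parses into a list of dicts; a quick sanity check on the example format above:

```python
import json

raw = '[{"bbox_2d": [123, 456, 789, 912], "label": "target_object"}]'
detections = json.loads(raw)
assert detections[0]["bbox_2d"] == [123, 456, 789, 912]
```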
Converts the raw JSON bounding box strings received from the ExternalLLMDetectorMainProcess node into the BBOXES format compatible with SAM2 and similar segmentation nodes. It handles cases where the LLM returns a single JSON object instead of a list, validates the bbox_2d format, and gracefully skips malformed or empty responses (see the sketch after the format example below).
SAM2 BBOXES Format Example:

```python
[
    [[10, 10, 100, 100], [200, 200, 300, 300]],                          # Image 1: two bounding boxes
    [[50, 50, 150, 150]],                                                # Image 2: one bounding box
    [[100, 100, 200, 200], [300, 300, 400, 400], [500, 500, 600, 600]],  # Image 3: three bounding boxes
]
```
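A minimal sketch of the conversion just described, assuming only what the text above states (single-object wrapping, bbox_2d validation, graceful handling of bad responses); the node's real implementation may differ:

```python
import json

def to_sam2_bboxes(bboxes_strings_list):
    """One raw JSON string per image -> nested SAM2 BBOXES list."""
    sam2_bboxes = []
    for raw in bboxes_strings_list:
        try:
            data = json.loads(raw)
        except (json.JSONDecodeError, TypeError):
            data = []                 # malformed or empty response: no boxes
        if isinstance(data, dict):
            data = [data]             # single JSON object instead of a list
        boxes = []
        for item in data:
            bbox = item.get("bbox_2d") if isinstance(item, dict) else None
            if isinstance(bbox, list) and len(bbox) == 4:
                boxes.append([int(v) for v in bbox])
        sam2_bboxes.append(boxes)
    return sam2_bboxes

print(to_sam2_bboxes(['[{"bbox_2d": [10, 10, 100, 100], "label": "cat"}]']))
# -> [[[10, 10, 100, 100]]]
```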
- Place this repository inside your ComfyUI/custom_nodes directory, then run pip install -r requirements.txt.
- Start by using the ExternalLLMDetectorSettings node. Provide your external LLM API's base_url, model_id, and token (API key). Remember: your LLM API must be OpenAI-compatible.
- Connect the ex_llm_settings output to the ExternalLLMDetectorMainProcess node. Input your images; set the desired threads for concurrency, the delay between requests, and retries for how many times a failed request is retried; and specify what you do and do not want to detect with objects and negative_objects. Customize the prompt to guide the LLM to respond with bounding boxes in the required JSON format, and ensure your chosen LLM can handle multimodal input (image + text) and can generate JSON responses in the specified bbox_2d format.
- Connect the bboxes_strings_list output from ExternalLLMDetectorMainProcess to the ExternalLLMDetectorBboxesConvert node. This node parses the raw JSON strings and converts them into the BBOXES format expected by SAM2.
- The sam2_bboxes output from ExternalLLMDetectorBboxesConvert should then be connected to any SAM2-compatible node for further processing (e.g., segmentation).