
This README.md was generated by Gemini-2.5-flash in conjunction with the code in this node and is for reference only; please check examples/example_workflow.json.

ComfyUI External LLM Detector Nodes

Note: This project is inspired by TTPlanetPig/Comfyui_Object_Detect_QWen_VL. The code was generated with the assistance of Gemini-2.5-flash. While designed for generic external LLMs, these nodes have so far been verified only with the Qwen2.5-VL series of models.

This repository provides custom ComfyUI nodes for performing object detection using a generic external Large Language Model (LLM) API. These nodes allow you to configure LLM API connections, send images with custom prompts, and convert the LLM's JSON bounding box responses into a format compatible with segmentation nodes like SAM2.

Nodes

ExternalLLMDetectorSettings

Configures the connection settings for an external LLM API, including base_url, model_id, and token. The token parameter corresponds to the api_key in the OpenAI API client. This node's output is then passed to the ExternalLLMDetectorMainProcess node.
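
These settings map directly onto the standard OpenAI Python client; a minimal sketch (the client construction below is illustrative, not the node's exact code):

from openai import OpenAI

def make_client(base_url, token):
    # token is passed as api_key, as described above
    return OpenAI(base_url=base_url, api_key=token)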

ExternalLLMDetectorMainProcess

The core node for sending images and prompts to the external LLM. It supports concurrent requests (threads) and a custom delay (delay), in seconds, between requests. It takes the ex_llm_settings from the ExternalLLMDetectorSettings node, an images input, objects and negative_objects inputs (used for prompt placeholder replacement to specify what you do and do not want to detect), a retries input that sets how many times each failed request is retried, and a fully customizable prompt.
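
A rough sketch of the request loop implied by these parameters (send_request is a hypothetical helper; the node's actual implementation may differ):

import time
from concurrent.futures import ThreadPoolExecutor

def detect_with_retries(client, image, prompt, retries, delay):
    # Hypothetical per-image worker: retry a failed request up to `retries` times
    for attempt in range(retries + 1):
        try:
            time.sleep(delay)  # custom delay between requests, in seconds
            return send_request(client, image, prompt)  # hypothetical helper
        except Exception:
            if attempt == retries:
                raise

def detect_batch(client, images, prompt, threads, retries, delay):
    # Run up to `threads` requests concurrently, one per image
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(
            lambda img: detect_with_retries(client, img, prompt, retries, delay),
            images,
        ))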

Important Note: The external LLM API must be compatible with the OpenAI API format, specifically its chat.completions endpoint. This means the base_url should point to an OpenAI-compatible server, and the token provided in ExternalLLMDetectorSettings will be used as the api_key for authentication.

It also expects the external LLM to be capable of:

  1. Multimodal Input: Accepting both image and text as input.
  2. JSON Output: Returning a JSON string containing bounding boxes. The default prompt is designed to elicit a JSON array of objects, where each object must contain a bbox_2d key with a 4-element list [x1, y1, x2, y2]. Example format:
    [
      {"bbox_2d": [123, 456, 789, 1012], "label": "target_object"}
    ]
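
For reference, a request against an OpenAI-compatible chat.completions endpoint with a base64-encoded image might look like the following sketch (illustrative only, not the node's exact payload):

import base64

def request_bboxes(client, model_id, image_bytes, prompt):
    # Encode the image as a data URL, the form expected by OpenAI-style multimodal chat
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    response = client.chat.completions.create(
        model=model_id,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    # The model is expected to reply with a JSON string of bbox_2d objects
    return response.choices[0].message.content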

ExternalLLMDetectorBboxesConvert

Converts the raw JSON bounding box strings received from the ExternalLLMDetectorMainProcess node into the BBOXES format compatible with SAM2 and similar segmentation nodes. It handles cases where the LLM returns a single JSON object instead of a list, validates the bbox_2d format, and gracefully handles any malformed or empty responses.

SAM2 BBOXES Format Example:

[
    [[10, 10, 100, 100], [200, 200, 300, 300]],  # Image 1: two bounding boxes
    [[50, 50, 150, 150]],                        # Image 2: one bounding box
    [[100, 100, 200, 200], [300, 300, 400, 400], [500, 500, 600, 600]] # Image 3: three bounding boxes
]
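
The conversion can be pictured roughly as follows (a sketch of the parsing logic described above, not the node's actual code):

import json

def convert_bboxes(bboxes_strings_list):
    # One raw JSON string per image in, one list of boxes per image out
    sam2_bboxes = []
    for raw in bboxes_strings_list:
        boxes = []
        try:
            data = json.loads(raw)
            if isinstance(data, dict):  # single object instead of a list
                data = [data]
            for item in data:
                bbox = item.get("bbox_2d")
                # validate: a 4-element [x1, y1, x2, y2] list
                if isinstance(bbox, list) and len(bbox) == 4:
                    boxes.append(bbox)
        except (json.JSONDecodeError, TypeError, AttributeError):
            pass  # malformed or empty responses yield no boxes for this image
        sam2_bboxes.append(boxes)
    return sam2_bboxes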

Usage

  1. Place this repository inside your ComfyUI/custom_nodes directory, then run pip install -r requirements.txt.
  2. Start by using the ExternalLLMDetectorSettings node. Provide your external LLM API's base_url, model_id, and token (API key). Remember: your LLM API must be OpenAI-compatible.
  3. Connect the ex_llm_settings output to the ExternalLLMDetectorMainProcess node. Input your images; set the desired threads for concurrency, the delay between requests, and retries for how many times a failed request is retried; and specify what you do and do not want to detect with objects and negative_objects. Customize the prompt to guide the LLM to respond with bounding boxes in the required JSON format. Ensure your chosen LLM can handle multimodal inputs (image + text) and can generate JSON responses in the specified bbox_2d format.
  4. Connect the bboxes_strings_list output from ExternalLLMDetectorMainProcess to the ExternalLLMDetectorBboxesConvert node. This node will parse and convert the raw JSON strings into the BBOXES format expected by SAM2.
  5. The sam2_bboxes output from ExternalLLMDetectorBboxesConvert should then be connected to any SAM2-compatible node for further processing (e.g., for segmentation).
