The auto construction of Tool parameters in ComponentTool does not work for more complex types #9292

@sjrl

Describe the bug
When creating a ComponentTool with the following code:

import inspect
from haystack.components.converters.html import HTMLToDocument
from haystack.tools import ComponentTool

# Show the type annotations that ComponentTool has to convert
print(inspect.signature(HTMLToDocument().run))

comp_tool = ComponentTool(name="htmltodoc", component=HTMLToDocument())
# Print the full auto-generated parameters schema
print(comp_tool.parameters)

I get the following auto-generated schema:

{
    "type": "object",
    "properties": {
        "sources": {
            "type": "array",
            "description": "List of HTML file paths or ByteStream objects.",
            "items": {"type": "string"},
        },
        "meta": {
            "type": "string",
            "description": "Optional metadata to attach to the Documents.\nThis value can be either a list of dictionaries or a single dictionary.\nIf it's a single dictionary, its content is added to the metadata of all produced Documents.\nIf it's a list, the length of the list must match the number of sources, because the two lists will\nbe zipped.\nIf `sources` contains ByteStream objects, their `meta` will be added to the output Documents.",
        },
        "extraction_kwargs": {
            "type": "string",
            "description": "A dictionary containing keyword arguments to customize the extraction process. These\nare passed to the underlying Trafilatura `extract` function. For the full list of available arguments, see\nthe [Trafilatura documentation](https://trafilatura.readthedocs.io/en/latest/corefunctions.html#extract).",
        },
    },
    "required": ["sources"],
}

Both meta and extraction_kwargs are incorrectly given the type "string", even though their descriptions say they accept a dictionary (or, for meta, a dictionary or a list of dictionaries).
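To see what the schema generator has to work with, the annotations can be unwrapped with the standard typing helpers. This is a minimal sketch: the `run` function below is a hypothetical stand-in whose annotations are assumed from the parameter descriptions above, not copied from Haystack, and `describe` is an illustrative helper.

```python
from typing import Any, Dict, List, Optional, Union, get_args, get_origin, get_type_hints


# Hypothetical stand-in for a component's run method; the annotations are
# assumed from the parameter descriptions, not copied from Haystack.
def run(
    sources: List[str],
    meta: Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] = None,
    extraction_kwargs: Optional[Dict[str, Any]] = None,
):
    return {"documents": []}


def describe(annotation):
    """Return the container or base types of an annotation, ignoring None."""
    if get_origin(annotation) is Union:
        # Optional[X] is Union[X, None]; drop the NoneType member.
        members = [a for a in get_args(annotation) if a is not type(None)]
    else:
        members = [annotation]
    # get_origin(Dict[str, Any]) is dict; plain classes have no origin.
    return [get_origin(m) or m for m in members]


hints = get_type_hints(run)
for name, annotation in hints.items():
    print(name, describe(annotation))
```

Under these assumptions none of the three parameters resolves to `str`, so a "string" type in the generated schema would have to come from a fallback path rather than from the annotation itself.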

Expected behavior
The parameters should be constructed correctly, something like the following (not tested, needs to be double-checked):

{
    "type": "object",
    "properties": {
        "sources": {
            "type": "array",
            "items": {
                "type": "string",
                "description": "List of HTML file paths or ByteStream objects.",
            },
        },
        "meta": {
            "description": "Optional metadata for the documents.\nThis value can be either a list of dictionaries or a single dictionary. ...",
            "oneOf": [
                {"type": "object", "additionalProperties": True},
                {"type": "array", "items": {"type": "object", "additionalProperties": True}},
            ],
        },
        "extraction_kwargs": {
            "type": "object",
            "description": "A dictionary containing keyword arguments to customize the extraction process. These ...",
            "additionalProperties": True,
        },
    },
    "required": ["sources"],
}
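One way such a conversion could work is sketched below. This is not ComponentTool's actual implementation: `type_to_schema` is a hypothetical helper that handles only Optional, Union, list, and dict annotations plus a few primitives, and it is meant to illustrate how the `oneOf` and `additionalProperties` entries above could be derived.

```python
from typing import Any, Dict, List, Optional, Union, get_args, get_origin

_PRIMITIVES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def type_to_schema(annotation) -> dict:
    """Map a type annotation to a JSON-schema-style dict (illustrative only)."""
    origin = get_origin(annotation)
    if origin is Union:
        # Drop NoneType: optionality is expressed through "required" instead.
        members = [a for a in get_args(annotation) if a is not type(None)]
        if len(members) == 1:
            return type_to_schema(members[0])
        return {"oneOf": [type_to_schema(m) for m in members]}
    if origin is list or annotation is list:
        schema = {"type": "array"}
        args = get_args(annotation)
        if args:
            schema["items"] = type_to_schema(args[0])
        return schema
    if origin is dict or annotation is dict:
        # Value types are not inspected; any properties are allowed.
        return {"type": "object", "additionalProperties": True}
    if annotation is Any:
        return {}  # an empty schema places no constraint on the value
    if annotation in _PRIMITIVES:
        return {"type": _PRIMITIVES[annotation]}
    # Design choice for this sketch: unknown types fall back to "string".
    return {"type": "string"}


meta_type = Optional[Union[Dict[str, Any], List[Dict[str, Any]]]]
print(type_to_schema(meta_type))
print(type_to_schema(Optional[Dict[str, Any]]))
```

The sketch does not cover `X | Y` unions, `Path`, or custom classes such as ByteStream; a real fix would need a decision for each of those.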

Additional context
This bug greatly reduces the LLM's ability to use the tool correctly through ComponentTool when the user doesn't manually provide the parameters specification.
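Until the automatic construction is fixed, a hand-written specification could serve as a workaround. The sketch below builds such a schema from the expected output in this issue and checks that it survives a JSON round trip. The `parameters` keyword in the commented-out call is an assumption about ComponentTool's constructor and is untested.

```python
import json

# Hand-written schema based on the expected output in this issue
# (untested against Haystack).
any_object = {"type": "object", "additionalProperties": True}
manual_parameters = {
    "type": "object",
    "properties": {
        "sources": {
            "type": "array",
            "description": "List of HTML file paths or ByteStream objects.",
            "items": {"type": "string"},
        },
        "meta": {
            "description": "Optional metadata to attach to the Documents.",
            "oneOf": [any_object, {"type": "array", "items": any_object}],
        },
        "extraction_kwargs": {
            "type": "object",
            "description": "Keyword arguments to customize the extraction process.",
            "additionalProperties": True,
        },
    },
    "required": ["sources"],
}

# Round-trip through JSON to confirm the schema is serializable as written.
round_tripped = json.loads(json.dumps(manual_parameters))
print(round_tripped == manual_parameters)

# Assumed usage, commented out because it needs Haystack installed and the
# `parameters` keyword is an assumption about ComponentTool's constructor:
# comp_tool = ComponentTool(
#     name="htmltodoc", component=HTMLToDocument(), parameters=manual_parameters
# )
```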

Labels

P1 (High priority, add to the next sprint)
type:bug (Something isn't working)
