Closed
Labels
P1 (High priority, add to the next sprint), type:bug (Something isn't working)
Description
Describe the bug
When creating a ComponentTool using the following code
import inspect
from haystack.components.converters.html import HTMLToDocument
from haystack.tools import ComponentTool
print(inspect.signature(HTMLToDocument().run))
comp_tool = ComponentTool(name="htmltodoc", component=HTMLToDocument())
print(comp_tool.parameters)
I get:
{
"type": "object",
"properties": {
"sources": {
"type": "array",
"description": "List of HTML file paths or ByteStream objects.",
"items": {"type": "string"},
},
"meta": {
"type": "string",
"description": "Optional metadata to attach to the Documents.\nThis value can be either a list of dictionaries or a single dictionary.\nIf it's a single dictionary, its content is added to the metadata of all produced Documents.\nIf it's a list, the length of the list must match the number of sources, because the two lists will\nbe zipped.\nIf `sources` contains ByteStream objects, their `meta` will be added to the output Documents.",
},
"extraction_kwargs": {
"type": "string",
"description": "A dictionary containing keyword arguments to customize the extraction process. These\nare passed to the underlying Trafilatura `extract` function. For the full list of available arguments, see\nthe [Trafilatura documentation](https://trafilatura.readthedocs.io/en/latest/corefunctions.html#extract).",
},
},
"required": ["sources"],
}
You can see that meta and extraction_kwargs are incorrectly given the type "string".
Expected behavior
The correct parameter schema should look something like this (not tested, needs to be double-checked):
{
"type": "object",
"properties": {
"sources": {
"type": "array",
"items": {
"type": "string",
"description": "List of HTML file paths or ByteStream objects.",
},
},
"meta": {
"description": "Optional metadata for the documents.\nThis value can be either a list of dictionaries or a single dictionary. ...",
"oneOf": [
{"type": "object", "additionalProperties": True},
{"type": "array", "items": {"type": "object", "additionalProperties": True}},
],
},
"extraction_kwargs": {
"type": "object",
"description": "A dictionary containing keyword arguments to customize the extraction process. These ...",
"additionalProperties": True,
},
},
"required": ["sources"],
}
Additional context
This greatly reduces the LLM's ability to use this tool properly through ComponentTool when a user doesn't manually provide the parameters specification.
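To make the expected mapping concrete, here is a minimal sketch of how type annotations could be turned into JSON Schema fragments. It is not Haystack's implementation: `annotation_to_schema` is a hypothetical helper, it covers only a small subset of `typing` constructs (no `X | Y` unions, no nested value schemas for dicts), and the annotations used at the bottom are assumed to resemble those of `meta` and `extraction_kwargs` in `HTMLToDocument.run`.

```python
from typing import Any, Dict, List, Optional, Union, get_args, get_origin

# Hypothetical helper, not part of Haystack.
_PRIMITIVES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def annotation_to_schema(annotation: Any) -> Dict[str, Any]:
    """Return a JSON Schema fragment for a small subset of typing annotations."""
    if annotation in _PRIMITIVES:
        return {"type": _PRIMITIVES[annotation]}
    if annotation is Any:
        return {}  # an empty schema accepts any value

    origin = get_origin(annotation)
    args = get_args(annotation)

    if origin is Union:
        # Optional[X] is Union[X, None]: drop None and describe what is left.
        members = [a for a in args if a is not type(None)]
        if len(members) == 1:
            return annotation_to_schema(members[0])
        return {"oneOf": [annotation_to_schema(m) for m in members]}

    if origin in (list, tuple, set) or annotation in (list, tuple, set):
        schema: Dict[str, Any] = {"type": "array"}
        if args:
            schema["items"] = annotation_to_schema(args[0])
        return schema

    if origin is dict or annotation is dict:
        # Free-form dictionary: any keys and values are allowed.
        return {"type": "object", "additionalProperties": True}

    # Fallback for anything this sketch does not recognise.
    return {"type": "string"}


# Assumed annotations, modelled on the parameters discussed in this issue.
meta_annotation = Optional[Union[Dict[str, Any], List[Dict[str, Any]]]]
extraction_kwargs_annotation = Optional[Dict[str, Any]]

print(annotation_to_schema(meta_annotation))
print(annotation_to_schema(extraction_kwargs_annotation))
```

Under these assumptions, the first call yields a `oneOf` with an object and an array-of-objects member, and the second yields a plain object schema, matching the shapes in the expected output above. A fallback to "string" for unrecognised annotations, as in the last line of the helper, is the kind of default that would produce the behavior reported here.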