Closed
Labels: Contributions wanted! (Looking for external contributions), P1 (High priority, add to the next sprint), type:feature (New feature or request)
Description
Is your feature request related to a problem? Please describe.
The current CSVToDocument converter processes an entire CSV file as a single Haystack Document. This makes it difficult to handle use cases where each CSV row should become its own Document, which is common with structured datasets.
For example, if a CSV contains customer feedback with one entry per row, you may want to index each feedback item as a separate Document.
Describe the solution you’d like
Extend CSVToDocument to support row-level conversion, similar to DeepsetCSVRowsToDocumentConverter (suggested by @sjrl).
Key points for row-level conversion:
- Add a conversion_mode init parameter to choose whether the entire file or each row is converted into a Document (analogous to split_mode in CSVDocumentSplitter).
- For row-level conversion:
  - Allow selecting one column to populate Document.content (configurable by the user).
  - Store all remaining columns in Document.meta, using column names as keys and their values as the corresponding metadata values.
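The row-level behaviour described above can be sketched in plain Python. This is a minimal, hypothetical sketch, not Haystack's actual API: Document here is a stand-in dataclass for haystack.Document, and the function name csv_to_documents, the values "file" and "row" for conversion_mode, and the content_column parameter name are assumptions chosen for illustration.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Any, Dict, List, Literal, Optional


@dataclass
class Document:
    """Stand-in for haystack.Document (assumption: only content and meta are modelled)."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def csv_to_documents(
    csv_text: str,
    conversion_mode: Literal["file", "row"] = "file",  # hypothetical mode names
    content_column: Optional[str] = None,              # hypothetical parameter name
    delimiter: str = ",",
) -> List[Document]:
    """Hypothetical sketch of file-level vs row-level CSV conversion."""
    if conversion_mode == "file":
        # Current behaviour: the whole CSV becomes a single Document.
        return [Document(content=csv_text)]
    if conversion_mode != "row":
        raise ValueError(f"Unknown conversion_mode: {conversion_mode!r}")
    if content_column is None:
        raise ValueError("content_column is required when conversion_mode='row'")

    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    # fieldnames is None for an empty input; otherwise it holds the header row.
    if reader.fieldnames is None or content_column not in reader.fieldnames:
        raise ValueError(f"Column {content_column!r} not found in CSV header")

    documents = []
    for row in reader:
        # The chosen column populates Document.content (missing values become "").
        content = row[content_column] or ""
        # Every other column goes into Document.meta, keyed by column name.
        # DictReader stores surplus fields under the key None, so those are skipped.
        meta = {
            key: value
            for key, value in row.items()
            if key is not None and key != content_column
        }
        documents.append(Document(content=content, meta=meta))
    return documents


if __name__ == "__main__":
    feedback_csv = "id,feedback,rating\n1,Great product,5\n2,Too slow,2\n"
    docs = csv_to_documents(feedback_csv, conversion_mode="row", content_column="feedback")
    print(docs[0])  # Document(content='Great product', meta={'id': '1', 'rating': '5'})
```

Because the csv module does not infer types, metadata values in this sketch stay strings ("5", not 5); a real implementation would need to decide whether to convert them. The sketch also raises an error when the content column is missing rather than silently falling back to file mode, which is one possible design choice, not a confirmed requirement of the issue.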
Describe alternatives you've considered
NA
Additional context
NA
Status: Done