Enhance CSVToDocument Converter to Support Row-Level Conversion #8848

@Amnah199

Is your feature request related to a problem? Please describe.
The current CSVToDocument converter processes an entire CSV file as a single Haystack Document. This makes it difficult to handle use cases where each CSV row should become its own Document, which is common with structured datasets.

For example, if a CSV contains customer feedback with one entry per row, you may want to index each feedback item as a separate Document.

Describe the solution you’d like
Extend CSVToDocument to support row-level conversion, similar to DeepsetCSVRowsToDocumentConverter (suggested by @sjrl).

Key points for row-level conversion:

  • Add a conversion_mode init parameter to choose whether the entire file or each row is converted into a Document (analogous to split_mode in CSVDocumentSplitter).
  • For row-level conversion:
    • Allow selecting one column to populate Document.content (configurable by the user).
    • Store all remaining columns in Document.meta, using column names as keys and their values as the corresponding metadata values.
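
The behavior described above could look roughly like the following. This is a minimal sketch, not the Haystack implementation: the `Document` dataclass is a stand-in so the snippet runs without Haystack installed, `csv_to_documents` is a hypothetical helper, and both the `content_column` parameter name and the `"file"` / `"row"` values for `conversion_mode` are assumptions made for illustration.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Stand-in for haystack's Document, reduced to content and meta."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def csv_to_documents(
    text: str,
    conversion_mode: str = "file",    # assumed values: "file" or "row"
    content_column: str = "content",  # hypothetical parameter name
) -> List[Document]:
    """Convert CSV text into one Document per file or one per row."""
    if conversion_mode == "file":
        # Current behavior: the whole CSV becomes a single Document.
        return [Document(content=text)]
    if conversion_mode != "row":
        raise ValueError(f"Unknown conversion_mode: {conversion_mode!r}")

    reader = csv.DictReader(io.StringIO(text, newline=""))
    if content_column not in (reader.fieldnames or []):
        raise ValueError(f"Column {content_column!r} not found in CSV header")

    documents = []
    for row in reader:
        # The selected column becomes the content; short rows yield None.
        content = row.pop(content_column) or ""
        # Every remaining column is stored in meta, keyed by column name.
        documents.append(Document(content=content, meta=dict(row)))
    return documents


sample = "id,feedback,rating\n1,Great product,5\n2,Too slow,2\n"
docs = csv_to_documents(sample, conversion_mode="row", content_column="feedback")
print(docs[0].content)  # Great product
print(docs[0].meta)     # {'id': '1', 'rating': '5'}
```

Raising an error when the selected column is missing is one possible design choice; the actual converter might prefer to log a warning and skip the file instead.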

Describe alternatives you've considered
NA

Additional context
NA

Metadata

Labels

  • Contributions wanted! (Looking for external contributions)
  • P1 (High priority, add to the next sprint)
  • type:feature (New feature or request)

Projects

Status

Done
