Implement Reader API and Pandas support #5

Closed
@Spartee

Description

redisvl.readers will be a module of classes, each named after the method used to read data from specific formats. For example, the pandas reader will be implemented first, with support for CSV and pickled DataFrame formats.

Interface

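import typing as t


class Reader:

    def __init__(self, data_format="pickle"):
        # The map is built per instance because `self` does not exist
        # in the class body.
        self.support_map = {
            "pickle": self._from_pickle,
            "json": self._from_json,
            "parquet": self._from_parquet,
            "csv": self._from_csv
        }
        self._data_format = data_format

    # Each _from_* method is a placeholder for a concrete reader to implement.
    @staticmethod
    def _from_pickle() -> t.Iterable[t.Dict[str, t.Any]]:
        raise NotImplementedError

    @staticmethod
    def _from_json() -> t.Iterable[t.Dict[str, t.Any]]:
        raise NotImplementedError

    @staticmethod
    def _from_parquet() -> t.Iterable[t.Dict[str, t.Any]]:
        raise NotImplementedError

    @staticmethod
    def _from_csv() -> t.Iterable[t.Dict[str, t.Any]]:
        raise NotImplementedError

    def __iter__(self):
        # Dispatch on the configured format and iterate over its records.
        return iter(self.support_map[self._data_format]())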

The Reader will be imported by the user and passed to the SearchIndex class. The SearchIndex will use the Reader to get data prior to loading it into Redis.

The interface isn't completely stable, but it should be more fleshed out after the pandas reader is adapted to it.
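As a rough illustration of how a pandas reader might adapt to this interface, here is a minimal sketch. The class name `PandasReader` and the `path` argument are assumptions made for the example (the interface above does not say how a reader learns where its data lives), and only the two formats named in this issue are covered.

```python
import typing as t

import pandas as pd


class PandasReader:
    """Hypothetical pandas-backed reader; not the final redisvl API."""

    def __init__(self, path: str, data_format: str = "pickle"):
        # `path` is an assumption added for this sketch.
        self._path = path
        # Map each supported format to the pandas function that loads it.
        self._loaders: t.Dict[str, t.Callable[..., pd.DataFrame]] = {
            "pickle": pd.read_pickle,
            "csv": pd.read_csv,
        }
        if data_format not in self._loaders:
            raise ValueError(f"Unsupported data format: {data_format}")
        self._data_format = data_format

    def __iter__(self) -> t.Iterator[t.Dict[str, t.Any]]:
        df = self._loaders[self._data_format](self._path)
        # One dict per row, keyed by column name.
        return iter(df.to_dict(orient="records"))
```

A consumer such as SearchIndex could then loop over `PandasReader("data.csv", data_format="csv")` and receive one dictionary per row.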

Out of scope here

These items are out of scope for this issue but worth keeping in mind during development.

  • multifile reader
  • distributed reader (i.e. Dask)
  • larger than memory reader
  • conversion of vectors to bytes
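To make two of the items above more concrete, here is a hedged sketch of what they might involve. Both function names are hypothetical, and `float32` is an assumed dtype; the issue does not specify one. The chunked reader relies on the `chunksize` parameter of `pandas.read_csv`, and the conversion uses NumPy's `tobytes`.

```python
import typing as t

import numpy as np
import pandas as pd


def iter_csv_in_chunks(
    path: str, chunksize: int = 10_000
) -> t.Iterator[t.Dict[str, t.Any]]:
    """Yield rows from a CSV without holding the whole file in memory."""
    # With chunksize set, read_csv returns an iterator of DataFrames.
    for chunk in pd.read_csv(path, chunksize=chunksize):
        yield from chunk.to_dict(orient="records")


def vector_to_bytes(vector: t.Sequence[float], dtype: str = "float32") -> bytes:
    """Pack a numeric vector into raw bytes (dtype is an assumption)."""
    return np.asarray(vector, dtype=dtype).tobytes()
```

Because the chunked reader is a generator, a caller that writes each record as it arrives only ever holds one chunk of rows at a time.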

Acceptance Criteria

  • Implement the redisvl.readers.pandas module, adhering to the above interface.
  • Tests confirming support for
    • Pickled DataFrame
    • CSV
  • Use in the CLI load command
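The two format tests could take roughly the following shape. This is only a sketch: since redisvl.readers.pandas does not exist yet, plain pandas calls stand in for the reader under test, and the helper `check_round_trip` is a hypothetical name. The test functions accept a directory path, so pytest's `tmp_path` fixture can supply it.

```python
import pathlib

import pandas as pd

# Per format: how to write the fixture, how to read it back, and a file name.
FORMATS = {
    "pickle": (lambda df, p: df.to_pickle(p), pd.read_pickle, "data.pkl"),
    "csv": (lambda df, p: df.to_csv(p, index=False), pd.read_csv, "data.csv"),
}


def check_round_trip(tmp_dir: pathlib.Path, data_format: str) -> None:
    write, read, filename = FORMATS[data_format]
    expected = pd.DataFrame({"id": [1, 2, 3], "text": ["a", "b", "c"]})
    path = tmp_dir / filename
    write(expected, path)
    # The real reader would replace this direct pandas call.
    records = read(path).to_dict(orient="records")
    assert records == expected.to_dict(orient="records")


def test_pickled_dataframe(tmp_path: pathlib.Path) -> None:
    check_round_trip(tmp_path, "pickle")


def test_csv(tmp_path: pathlib.Path) -> None:
    check_round_trip(tmp_path, "csv")
```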

Labels

enhancement (New feature or request)
