Description
`redisvl.readers` will be a module of classes named after the method used to read data from specific formats. For example, the pandas reader will be implemented first, with support for CSV and pickled DataFrame formats.
Interface
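```python
import typing as t

class Reader:
    # Format -> name of the method that reads it (`self` is not defined in a class body).
    support_map = {
        "pickle": "_from_pickle",
        "json": "_from_json",
        "parquet": "_from_parquet",
        "csv": "_from_csv",
    }

    def __init__(self, data_format: str = "pickle"):
        self._data_format = data_format

    # Stubs: each returns an iterable of dicts, one per record; concrete readers override them.
    @staticmethod
    def _from_pickle() -> t.Iterable[t.Dict[str, t.Any]]:
        raise NotImplementedError

    @staticmethod
    def _from_json() -> t.Iterable[t.Dict[str, t.Any]]:
        raise NotImplementedError

    @staticmethod
    def _from_parquet() -> t.Iterable[t.Dict[str, t.Any]]:
        raise NotImplementedError

    @staticmethod
    def _from_csv() -> t.Iterable[t.Dict[str, t.Any]]:
        raise NotImplementedError

    def __iter__(self) -> t.Iterator[t.Dict[str, t.Any]]:
        # Look up the bound method for the chosen format and iterate over its records.
        return iter(getattr(self, self.support_map[self._data_format])())
```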
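As a sketch of how the pandas reader might fill in this interface, the class below handles the two formats planned first (pickle and CSV). `PandasReader`, its `path` argument, and the use of instance methods (so each reader can see its own file path) are assumptions for illustration, not part of the proposal. It relies only on `pd.read_pickle`, `pd.read_csv`, and `DataFrame.to_dict(orient="records")`.

```python
import typing as t

import pandas as pd


class PandasReader:
    """Hypothetical pandas-backed reader following the proposed interface."""

    # Format name -> method that reads it (only the two formats planned first).
    support_map = {"pickle": "_from_pickle", "csv": "_from_csv"}

    def __init__(self, path: str, data_format: str = "pickle"):
        # Fail early on unsupported formats instead of at iteration time.
        if data_format not in self.support_map:
            raise ValueError(f"unsupported data format: {data_format!r}")
        self._path = path
        self._data_format = data_format

    def _from_pickle(self) -> t.Iterable[t.Dict[str, t.Any]]:
        # A pickled DataFrame, e.g. one written with DataFrame.to_pickle().
        return pd.read_pickle(self._path).to_dict(orient="records")

    def _from_csv(self) -> t.Iterable[t.Dict[str, t.Any]]:
        return pd.read_csv(self._path).to_dict(orient="records")

    def __iter__(self) -> t.Iterator[t.Dict[str, t.Any]]:
        # Dispatch to the reader method registered for the chosen format.
        return iter(getattr(self, self.support_map[self._data_format])())
```

For example, after `df.to_csv(path, index=False)`, iterating `PandasReader(path, data_format="csv")` yields one `{column: value}` dict per row. This version loads the whole file into memory, which matches the scope of this issue.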
The `Reader` will be imported by the user and passed to the `SearchIndex` class. The `SearchIndex` will use the `Reader` to get data before loading it into Redis.
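The issue does not say how `SearchIndex` will consume the reader. A minimal sketch of that step, assuming a hypothetical `load_from_reader` helper that keys each record by a `prefix` plus one of its fields (`key_field`) and writes it as a hash through a client exposing redis-py's `hset(name, mapping=...)`:

```python
import typing as t


class HashClient(t.Protocol):
    """The one client method used here; redis-py's Redis.hset accepts this call shape."""

    def hset(self, name: str, mapping: t.Dict[str, t.Any]) -> int: ...


def load_from_reader(
    client: HashClient,
    reader: t.Iterable[t.Dict[str, t.Any]],
    prefix: str,
    key_field: str,
) -> int:
    """Hypothetical loader: write each record from a reader as a Redis hash."""
    count = 0
    for record in reader:
        # Key each hash by prefix + a field of the record, e.g. "doc:1".
        client.hset(f"{prefix}:{record[key_field]}", mapping=record)
        count += 1
    return count
```

Because the helper only iterates over `reader`, any iterable of dicts works, including a `Reader` instance, which keeps the format-specific logic out of the index.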
The interface isn't completely stable, but it should be more fleshed out once the pandas reader has been adapted to it.
Out of scope here
These items are out of scope, but worth thinking about while developing:
- multifile reader
- distributed reader (e.g. Dask)
- larger-than-memory reader
- conversion of vectors to bytes
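For the last item, a minimal sketch of vector-to-bytes conversion using only the standard library `struct` module. It assumes a FLOAT32 vector field stored as a raw little-endian float32 blob (the same layout `numpy`'s `astype(np.float32).tobytes()` produces on little-endian machines); the function names are hypothetical.

```python
import struct
import typing as t


def vector_to_bytes(vector: t.Sequence[float]) -> bytes:
    """Pack a vector as little-endian float32 (assumes a FLOAT32 vector field)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def bytes_to_vector(blob: bytes) -> t.List[float]:
    """Inverse of vector_to_bytes: 4 bytes per float32 element."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))
```

Values that are not exactly representable in float32 (e.g. `0.1`) will not round-trip bit-for-bit as Python floats.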
Acceptance Criteria
- Implemented the `redisvl.readers.pandas` module, adhering to the above interface.
- Tests confirming support for:
  - pickled DataFrames
  - CSV
  - use in the CLI `load` command