Implement Reader API and Pandas support #5

Closed
4 of 5 tasks
Spartee opened this issue Nov 21, 2022 · 1 comment
Labels: enhancement (New feature or request)

Comments

Spartee commented Nov 21, 2022

Description

redisvl.readers will be a module of classes, each named after the method used to read data from specific formats. For example, the pandas reader will be implemented first, with support for CSV and pickled DataFrame formats.

Interface

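import typing as t


class Reader:
    """Reads records from a supported data format, one dict per record."""

    def __init__(self, data_format: str = "pickle"):
        self._data_format = data_format
        # Built per instance: `self` does not exist in the class body.
        self.support_map = {
            "pickle": self._from_pickle,
            "json": self._from_json,
            "parquet": self._from_parquet,
            "csv": self._from_csv,
        }

    @staticmethod
    def _from_pickle() -> t.Iterable[t.Dict[str, t.Any]]:
        raise NotImplementedError  # to be provided by a concrete reader

    @staticmethod
    def _from_json() -> t.Iterable[t.Dict[str, t.Any]]:
        raise NotImplementedError  # to be provided by a concrete reader

    @staticmethod
    def _from_parquet() -> t.Iterable[t.Dict[str, t.Any]]:
        raise NotImplementedError  # to be provided by a concrete reader

    @staticmethod
    def _from_csv() -> t.Iterable[t.Dict[str, t.Any]]:
        raise NotImplementedError  # to be provided by a concrete reader

    def __iter__(self) -> t.Iterator[t.Dict[str, t.Any]]:
        # Dispatch to the loader registered for the configured format.
        return iter(self.support_map[self._data_format]())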

The Reader will be imported by the user and passed to the SearchIndex class. The SearchIndex will iterate over the Reader to get records before loading them into Redis.
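
As a rough illustration of that hand-off, here is a minimal sketch. Everything in it is an assumption made for the example: `ListReader`, `FakeRedis`, the `load` method, and the `doc:<n>` key scheme are hypothetical stand-ins, not the actual redisvl API, and the in-memory client only mimics a hash write so the snippet runs without a Redis server.

```python
import typing as t

Record = t.Dict[str, t.Any]


class ListReader:
    """Hypothetical stand-in for a Reader: iterating yields one dict per record."""

    def __init__(self, records: t.List[Record]):
        self._records = records

    def __iter__(self) -> t.Iterator[Record]:
        return iter(self._records)


class FakeRedis:
    """In-memory stand-in for a Redis client; it only mimics a hash write."""

    def __init__(self) -> None:
        self.hashes: t.Dict[str, Record] = {}

    def hset(self, key: str, mapping: Record) -> None:
        self.hashes.setdefault(key, {}).update(mapping)


class SearchIndex:
    """Hypothetical sketch; the real SearchIndex interface is not shown in this issue."""

    def __init__(self, client: FakeRedis, prefix: str = "doc"):
        self._client = client
        self._prefix = prefix

    def load(self, reader: t.Iterable[Record]) -> int:
        # Pull records from the reader and write each one as a hash.
        count = 0
        for i, record in enumerate(reader):
            self._client.hset(f"{self._prefix}:{i}", mapping=record)
            count += 1
        return count


client = FakeRedis()
index = SearchIndex(client)
loaded = index.load(ListReader([{"id": 1, "text": "a"}, {"id": 2, "text": "b"}]))
```

Because `load` only requires an iterable of dicts, any reader that implements `__iter__` this way could be swapped in without changing the index code.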

The interface isn't completely stable, but should be more fleshed out after the pandas reader is adapted to it.

Out of scope here

These items are out of scope, but worth thinking about while developing.

  • multifile reader
  • distributed reader (i.e. Dask)
  • larger than memory reader
  • conversion of vectors to bytes
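
For the last item, one possible shape of the conversion is sketched below. It assumes vectors are stored as packed little-endian 32-bit floats, which this issue does not specify. The helper names `vector_to_bytes` and `bytes_to_vector` are hypothetical, and the standard-library `struct` module is used here only to keep the example dependency-free.

```python
import struct
import typing as t


def vector_to_bytes(vector: t.Sequence[float]) -> bytes:
    # Pack each component as a little-endian float32 (4 bytes per value).
    return struct.pack(f"<{len(vector)}f", *vector)


def bytes_to_vector(blob: bytes) -> t.List[float]:
    # Inverse operation: the number of floats is the byte length divided by 4.
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


blob = vector_to_bytes([0.5, 1.0, -2.0])
restored = bytes_to_vector(blob)
```

Values that are not exactly representable in 32 bits lose precision on the round trip, so a base class handling this conversion would need to settle on a dtype.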

Acceptance Criteria

  • Implement the redisvl.readers.pandas module, adhering to the above interface.
  • Tests confirming support for
    • Pickled DataFrame
    • CSV
  • Use in CLI load command
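
A minimal sketch of what a pandas-backed reader covering the two required formats might look like, together with a round-trip check of the kind the tests could perform. The class name `PandasReader`, its constructor signature, and the file names are assumptions for illustration; the module merged for this issue may differ.

```python
import os
import tempfile
import typing as t

import pandas as pd


class PandasReader:
    """Hypothetical reader: loads a file with pandas and yields one dict per row."""

    def __init__(self, path: str, data_format: str = "pickle"):
        self._path = path
        self._data_format = data_format
        # Map each supported format to the pandas function that loads it.
        self._support_map: t.Dict[str, t.Callable[[str], pd.DataFrame]] = {
            "pickle": pd.read_pickle,
            "csv": pd.read_csv,
        }

    def __iter__(self) -> t.Iterator[t.Dict[str, t.Any]]:
        if self._data_format not in self._support_map:
            raise ValueError(f"Unsupported data format: {self._data_format}")
        df = self._support_map[self._data_format](self._path)
        # orient="records" gives a list with one {column: value} dict per row.
        return iter(df.to_dict(orient="records"))


# Round-trip check: write the same frame in both formats, then read it back.
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "data.csv")
    pkl_path = os.path.join(tmp, "data.pkl")
    df.to_csv(csv_path, index=False)
    df.to_pickle(pkl_path)
    csv_records = list(PandasReader(csv_path, "csv"))
    pkl_records = list(PandasReader(pkl_path, "pickle"))
```

The whole file is loaded into memory before iteration starts, so this sketch does not address the larger-than-memory case listed as out of scope.
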
Spartee added the reader and enhancement labels Nov 21, 2022
Spartee moved this to Todo in RedisVL v0.1.0 Nov 21, 2022
Spartee moved this from Todo to In Progress in RedisVL v0.1.0 Dec 2, 2022
Spartee self-assigned this Dec 2, 2022

Spartee commented Dec 5, 2022

Closed with #13. The complete functionality is not implemented, as the base class isn't written yet. Further issues will follow to determine what should be in the base class, primarily to handle bytes conversion.

Spartee closed this as completed Dec 5, 2022
Repository owner moved this from In Progress to Done in RedisVL v0.1.0 Dec 5, 2022