Full Fledged CLI and Library #1
Merged
10 commits
6c90187
Move to Library + CLI approach
d01504b
remove old data
0cc9ab6
Remove the old utilities
dbdd276
Add pandas readers and loaders
5336b7f
Introduce CLI structure
e88f3b7
make it into a python package
7877ed9
Edit README
80ef741
Refactor dataset standards; project helpers
6a22ac0
Add simple vector query helper
c251c84
Add tests
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,2 @@ | ||
__pycache__/ | ||
__pycache__/ | ||
redisvl.egg-info/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
MAKEFLAGS += --no-print-directory | ||
|
||
# Do not remove this block. It is used by the 'help' rule when | ||
# constructing the help output. | ||
# help: | ||
# help: Developer Makefile | ||
# help: | ||
|
||
|
||
SHELL:=/bin/bash | ||
|
||
# help: help - display this makefile's help information | ||
.PHONY: help | ||
help: | ||
	@grep "^# help\:" Makefile | grep -v grep | sed 's/\# help\: //' | sed 's/\# help\://' | ||
|
||
|
||
# help: | ||
# help: Style | ||
# help: ------- | ||
|
||
# help: style - Sort imports and format with black | ||
.PHONY: style | ||
style: sort-imports format | ||
|
||
|
||
# help: check-style - check code style compliance | ||
.PHONY: check-style | ||
check-style: check-sort-imports check-format | ||
|
||
|
||
# help: format - perform code style format | ||
.PHONY: format | ||
format: | ||
	@black ./redisvl ./tests/ | ||
|
||
|
||
# help: sort-imports - apply import sort ordering | ||
.PHONY: sort-imports | ||
sort-imports: | ||
	@isort ./redisvl ./tests/ --profile black | ||
|
||
|
||
# help: check-lint - run static analysis checks | ||
.PHONY: check-lint | ||
check-lint: | ||
	@pylint --rcfile=.pylintrc ./redisvl | ||
|
||
|
||
# help: | ||
# help: Test | ||
# help: ------- | ||
|
||
# help: test - Run all tests | ||
.PHONY: test | ||
test: | ||
	@python -m pytest | ||
|
||
# help: test-verbose - Run all tests verbosely | ||
.PHONY: test-verbose | ||
test-verbose: | ||
	@python -m pytest -vv -s | ||
|
||
# help: test-cov - Run all tests with coverage | ||
.PHONY: test-cov | ||
test-cov: | ||
	@python -m pytest -vv --cov=./redisvl | ||
|
||
# help: cov - generate html coverage report | ||
.PHONY: cov | ||
cov: | ||
	@coverage html | ||
	@echo if data was present, coverage report is in ./htmlcov/index.html |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,65 +1,97 @@ | ||
# RediSearch Data Loader | ||
The purpose of this script is to assist in loading datasets to a RediSearch instance efficiently. | ||
# RedisVL | ||
|
||
The project is brand new and will undergo improvements over time. | ||
A CLI and library to help with loading data into Redis, specifically for | ||
use with RediSearch and Redis Vector Search capabilities. | ||
|
||
## Getting Started | ||
### Usage | ||
|
||
### Requirements | ||
Install the Python requirements listed in `requirements.txt`. | ||
|
||
```bash | ||
$ pip install -r requirements.txt | ||
``` | ||
usage: redisvl <command> [<args>] | ||
|
||
### Data | ||
In order to run the script you need to have a dataset that contains your vectors and metadata. | ||
Commands: | ||
load Load vector data into redis | ||
index Index manipulation (create, delete, etc.) | ||
query Query an existing index | ||
|
||
>Currently, the data file must be a pickled pandas dataframe. Support for more data types will be included in future iterations. | ||
Redis Vector load CLI | ||
|
||
### Schema | ||
Along with the dataset, you must update the dataset schema for RediSearch in [`data/schema.py`](data/schema.py). | ||
positional arguments: | ||
command Subcommand to run | ||
|
||
### Running | ||
The `main.py` script provides an entrypoint with optional arguments to upload your dataset to a Redis server. | ||
optional arguments: | ||
-h, --help show this help message and exit | ||
|
||
#### Usage | ||
``` | ||
python main.py | ||
|
||
-h, --help Show this help message and exit | ||
--host Redis host | ||
-p, --port Redis port | ||
-a, --password Redis password | ||
-c , --concurrency Amount of concurrency | ||
-d , --data Path to data file | ||
--prefix Key prefix for all hashes in the search index | ||
-v , --vector Vector field name in df | ||
-i , --index Index name | ||
``` | ||
|
||
#### Defaults | ||
For any of the above commands, you will need an index schema written | ||
to a YAML file for the CLI to read. The schema format is as follows: | ||
|
||
```yaml | ||
index: | ||
  name: sample # index name used for querying | ||
  storage_type: hash | ||
  key_field: "id" # column name to use for key in redis | ||
  prefix: vector # prefix used for all loaded docs | ||
|
||
# all fields to create index with | ||
# sub-items correspond to redis-py Field arguments | ||
fields: | ||
  tag: | ||
    categories: # name of a tag field used for queries | ||
      separator: "|" | ||
    year: # name of a tag field used for queries | ||
      separator: "|" | ||
  vector: | ||
    vector: # name of the vector field used for queries | ||
      datatype: "float32" | ||
      algorithm: "flat" # flat or HNSW | ||
      dims: 768 | ||
      distance_metric: "cosine" # ip, L2, cosine | ||
``` | ||
|
||
| Argument | Default | | ||
| ----------- | ----------- | | ||
| Host | `localhost` | | ||
| Port | `6379` | | ||
| Password | "" | | ||
| Concurrency | `50` | | ||
| Data (Path) | `data/embeddings.pkl` | | ||
| Prefix | `vector:` | | ||
| Vector (Field Name) | `vector` | | ||
| Index Name | `index` | | ||
#### Example Usage | ||
|
||
```bash | ||
# load in a pickled dataframe | ||
redisvl load -s sample.yml -d embeddings.pkl | ||
``` | ||
|
||
#### Examples | ||
```bash | ||
# load in a pickled dataframe to a specific address and port | ||
redisvl load -s sample.yml -d embeddings.pkl -h 127.0.0.1 -p 6379 | ||
``` | ||
|
||
Load to a local (default) redis server with a custom index name and with concurrency = 100: | ||
```bash | ||
$ python main.py -d data/embeddings.pkl -i myIndex -c 100 | ||
# load in a pickled dataframe to a specific | ||
# address and port and with password | ||
redisvl load -s sample.yml -d embeddings.pkl -h 127.0.0.1 -p 6379 -a supersecret | ||
``` | ||
|
||
Load to a cloud redis server with all other defaults: | ||
### Support | ||
|
||
#### Supported Index Fields | ||
|
||
- ``geo`` | ||
- ``tag`` | ||
- ``numeric`` | ||
- ``vector`` | ||
- ``text`` | ||
#### Supported Data Types | ||
- Pandas DataFrame (pickled) | ||
#### Supported Redis Data Types | ||
- Hash | ||
- JSON (soon) | ||
|
||
### Install | ||
Install the Python requirements listed in `requirements.txt`. | ||
|
||
```bash | ||
$ python main.py -h {redis-host} -p {redis-port} -a {redis-password} | ||
``` | ||
git clone https://github.com/RedisVentures/data-loader.git | ||
cd data-loader | ||
pip install . | ||
``` | ||
|
||
### Creating Input Data | ||
#### Pandas DataFrame | ||
|
||
More to come; see tests and sample-data for usage. |
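
The README leaves the input-data section open. As a minimal sketch of what rows of such a DataFrame might hold, the snippet below builds records whose columns match the sample schema (`id`, `categories`, `year`, `vector`). The helper names `vector_to_bytes` and `make_record` are hypothetical, and storing the vector as packed float32 bytes is an assumption about what the loader expects, not something this PR confirms.

```python
import struct

DIMS = 768  # matches `dims` in the sample schema


def vector_to_bytes(vector):
    """Pack a sequence of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(vector)}f", *vector)


def make_record(doc_id, categories, year, vector, separator="|"):
    """Build one row keyed by the column names used in the sample schema.

    Assumption: tag values are joined with the schema's separator and the
    vector is stored as raw float32 bytes.
    """
    if len(vector) != DIMS:
        raise ValueError(f"expected {DIMS} dimensions, got {len(vector)}")
    return {
        "id": doc_id,
        "categories": separator.join(categories),
        "year": str(year),
        "vector": vector_to_bytes(vector),
    }


records = [make_record("doc1", ["cs.LG", "cs.AI"], 2022, [0.1] * DIMS)]
```

With pandas installed, `pandas.DataFrame(records).to_pickle("embeddings.pkl")` would write these records as a pickled DataFrame, the only input type the README lists as supported.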
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
import os | ||
import pytest | ||
|
||
from redisvl.utils.connection import get_async_redis_connection | ||
|
||
HOST = os.environ.get("REDIS_HOST", "localhost") | ||
PORT = int(os.environ.get("REDIS_PORT", 6379)) | ||
USER = os.environ.get("REDIS_USER", "default") | ||
PASS = os.environ.get("REDIS_PASSWORD", "") | ||
|
||
@pytest.fixture | ||
def async_redis(): | ||
    return get_async_redis_connection(HOST, PORT, PASS) |
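
Environment variables are always strings, so a port read from `REDIS_PORT` has a different type than the integer default unless it is converted. A minimal sketch of normalizing the same four settings in one place follows; the helper name `redis_settings` is hypothetical and not part of this PR.

```python
import os


def redis_settings(environ=os.environ):
    """Collect Redis connection settings, always returning the port as an int."""
    return {
        "host": environ.get("REDIS_HOST", "localhost"),
        # int() accepts both the string from the environment and the int default
        "port": int(environ.get("REDIS_PORT", 6379)),
        "user": environ.get("REDIS_USER", "default"),
        "password": environ.get("REDIS_PASSWORD", ""),
    }
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise in tests without touching the real environment.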
This file was deleted.
We should back link to our docs probably!