-
Notifications
You must be signed in to change notification settings - Fork 47
Full Fledged CLI and Library #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
While a generic script is useful, there was alot of opportunity to stop duplicating the same functions we've written many times. split the functions in the script out into modules and introduce the library structure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
|
||
The project is brand new and will undergo improvements over time. | ||
A CLI and Library to help with loading data into Redis specifically for | ||
usage with RediSearch and Redis Vector Search capabilities |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should back link to our docs probably!
redisvl/cli/load.py
Outdated
|
||
# Create Redis Connection | ||
logger.info(f"Connecting to {args.host}:{str(args.port)}") | ||
redis_conn = get_async_redis_connection(args.host, args.port, args.password) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good for us to also support env vars too (eventually)
The expectation for saved datasets has now been changed to require that a column in a pickled dataframe have byte vectors already created. This removes the need for the user to specify a column for the vector field. The goal is to eventually have helper functions in the library to make this easy. Lastly, a bunch of project helpers and python project standard items were added.
VoyageAI vectorizer and reranker
Refactor interfaces related to the Redis client in AsyncRedisIndex (and possibly others). ## Constraints - We want to initialize the Redis connection lazily, not at object instantiation or module import. - Python doesn't have async properties, so properties are limited to sync function calls - Therefore, we don't want properties to be part of the interface - For functions we "remove," we'll leave them in place with a deprecation warning - Remaining backwards-compatible until the next major release ## The Design Today - Can create a client and set it using `index.connect()` - Pass in a `redis_url` - Calls `index.set_client()` - Can pass in a client and set it with `index.set_client()` - Pass in a client instance - **Can't** pass in a client instance with `__init__()` - Or anything URL, etc. - Can create a client with `index.from_existing()` - Pass it a `redis_url` - Calls `index.set_client()` - **Can't** use index as an async context manager - Requires explicit resource handling, `atexit` handler - But this is broken - The `setup_async_redis()` decorator creates a function wrapper that calls `validate_async_redis()` with every `set_client()` call - The `RedisConnectionFactory.connect()` returns incompatible types (sync, async `Redis`) - Sync vs. async depends on a parameter ## The Design After Refactor - **Can** use as an async context manager - Either disconnect the index manually with `disconnect()` or use it as a context manager - `async with Index()...` will `disconnect()` after you exit the context block - Lazily instantiate client with `self._get_client()` method ("private") - Remove `set_client()` public interface if possible - Lazily instantiate client - Remove `connect()` public interface if possible - Lazily instantiate client - Leave `from_existing()` - Leave `client()` property, now just returns `._redis_client` and can be None - Call `validate_async_redis()` when setting a new client instance for the first time - But make this an internal check rather than attached to public interface - Allow `redis_url`, `redis_kwargs`, or `redis_client` in `__init__()` ### Examples Post-Refactor ```python #1: New instance with redis URL index = AsyncRedisIndex(redis_url="...") #2: New instance with existing client index = AsyncRedisIndex(redis_client=client) #3: New instance with `from_existing` index = AsyncRedisIndex.from_existing(..., redis_url="...") #4 Passing both a client and a URL is isallowed try: index = AsyncRedisIndex(redis_client=client, redis_url="...") except: pass else: raise RuntimeError("Should have raised!") # The client is lazily connected here, and we send telemetry. await index.something_using_redis() await index.something_else_using_redis() # Close the client. await index.close() async with AsyncRedisIndex(redis_url="...") as index: # Same story: client is lazily created now await index.something_using_redis() await index.something_else_using_redis() # Client is automatically closed # __enter__ is reentrant async with index: # Lazily opens a new client connection. await index.a_third_thing_using_redis() # Client is automatically closed ``` --------- Co-authored-by: Tyler Hutcherson <[email protected]>
Description
While a generic script is useful, there was alot of opportunity to stop duplicating the same functions we've written many times.
The goal of this is to be able to use this library across our projects and use the cli where and when needed to reduce friction.
Schema
This also adds the ability to create flexible custom schema based on a yaml file. these fields largely follow what the
redis-py
library expects.Here is a sample
See the new README for more details.