Skip to content

Full Fledged CLI and Library #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Nov 11, 2022
Merged

Full Fledged CLI and Library #1

merged 10 commits into from
Nov 11, 2022

Conversation

Spartee
Copy link
Contributor

@Spartee Spartee commented Nov 9, 2022

Description

While a generic script is useful, there was alot of opportunity to stop duplicating the same functions we've written many times.

The goal of this is to be able to use this library across our projects and use the cli where and when needed to reduce friction.

Schema

This also adds the ability to create flexible custom schema based on a yaml file. these fields largely follow what the redis-py library expects.

Here is a sample

index:
  name: sample
  storage_type: hash
  prefix: vector

fields:
  tag:
    categories:
      separator: "|"
    year:
      separator: "|"
  vector:
    vector:
      datatype: "float32"
      algorithm: "flat"
      dims: 768
      distance_metric: "cosine"

See the new README for more details.

Sam Partee added 7 commits November 8, 2022 17:37
While a generic script is useful, there was
alot of opportunity to stop duplicating the
same functions we've written many times.

split the functions in the script out into
modules and introduce the library structure
@Spartee Spartee added the enhancement New feature or request label Nov 9, 2022
Copy link
Collaborator

@tylerhutcherson tylerhutcherson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!


The project is brand new and will undergo improvements over time.
A CLI and Library to help with loading data into Redis specifically for
usage with RediSearch and Redis Vector Search capabilities
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should back link to our docs probably!


# Create Redis Connection
logger.info(f"Connecting to {args.host}:{str(args.port)}")
redis_conn = get_async_redis_connection(args.host, args.port, args.password)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good for us to also support env vars too (eventually)

Sam Partee added 3 commits November 10, 2022 22:46
The expectation for saved datasets has now been
changed to require that a column in a pickled dataframe
have byte vectors already created. This removes
the need for the user to specify a column for the
vector field.

The goal is to eventually have helper functions
in the library to make this easy.

Lastly, a bunch of project helpers and python project
standard items were added.
@Spartee Spartee merged commit e3a97e9 into main Nov 11, 2022
@Spartee Spartee deleted the cli branch July 24, 2023 22:18
tylerhutcherson pushed a commit that referenced this pull request Sep 21, 2024
VoyageAI vectorizer and reranker
tylerhutcherson added a commit that referenced this pull request Feb 20, 2025
Refactor interfaces related to the Redis client in AsyncRedisIndex (and
possibly others).

## Constraints
- We want to initialize the Redis connection lazily, not at object
instantiation or module import.
- Python doesn't have async properties, so properties are limited to
sync function calls
    - Therefore, we don't want properties to be part of the interface
- For functions we "remove," we'll leave them in place with a
deprecation warning
    - Remaining backwards-compatible until the next major release

## The Design Today
- Can create a client and set it using `index.connect()`
    - Pass in a `redis_url`
    - Calls `index.set_client()`
- Can pass in a client and set it with `index.set_client()`
    - Pass in a client instance
- **Can't** pass in a client instance with `__init__()`
    - Or anything URL, etc.
- Can create a client with `index.from_existing()`
    - Pass it a `redis_url`
    - Calls `index.set_client()`
- **Can't** use index as an async context manager
    - Requires explicit resource handling, `atexit` handler
    - But this is broken
- The `setup_async_redis()` decorator creates a function wrapper that
calls `validate_async_redis()` with every `set_client()` call
- The `RedisConnectionFactory.connect()` returns incompatible types
(sync, async `Redis`)
    - Sync vs. async depends on a parameter

## The Design After Refactor
- **Can** use as an async context manager
- Either disconnect the index manually with `disconnect()` or use it as
a context manager
- `async with Index()...` will `disconnect()` after you exit the context
block
- Lazily instantiate client with `self._get_client()` method ("private")
- Remove `set_client()` public interface if possible
    - Lazily instantiate client
- Remove `connect()` public interface if possible
    - Lazily instantiate client
- Leave `from_existing()`
- Leave `client()` property, now just returns `._redis_client` and can
be None
- Call `validate_async_redis()` when setting a new client instance for
the first time
- But make this an internal check rather than attached to public
interface
- Allow `redis_url`, `redis_kwargs`, or `redis_client` in `__init__()`

### Examples Post-Refactor
```python
#1: New instance with redis URL
index = AsyncRedisIndex(redis_url="...")
#2: New instance with existing client
index = AsyncRedisIndex(redis_client=client)
#3: New instance with `from_existing`
index = AsyncRedisIndex.from_existing(..., redis_url="...")

#4 Passing both a client and a URL is isallowed
try:
    index = AsyncRedisIndex(redis_client=client, redis_url="...")
except:
    pass
else:
    raise RuntimeError("Should have raised!")

# The client is lazily connected here, and we send telemetry.
await index.something_using_redis()
await index.something_else_using_redis()
# Close the client.
await index.close()

async with AsyncRedisIndex(redis_url="...") as index:
    # Same story: client is lazily created now
    await index.something_using_redis()
    await index.something_else_using_redis()
    # Client is automatically closed

# __enter__ is reentrant
async with index:
    # Lazily opens a new client connection.
    await index.a_third_thing_using_redis()
    # Client is automatically closed
```

---------

Co-authored-by: Tyler Hutcherson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants