Make `numpy` and `pandas` optional for ~7 times smaller deps #153
Conversation
Thanks @jakubroztocil!
from openai.datalib import numpy as np
from openai.datalib import pandas as pd
I wonder if we should call `assert_has_numpy` and `assert_has_pandas` in each function where these modules are used, so that it's very clear to users what to do to fix the issue (rather than getting a generic `'NoneType' object has no attribute` Python exception).
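For illustration, such a guard could look roughly like the sketch below. This is a hedged sketch, not the exact code in this PR: only `assert_has_numpy`/`assert_has_pandas`, the `openai.datalib` module, and the `datalib` extra are named in the discussion; the flag name, message wording, and exception choice are assumptions.

```python
# openai/datalib.py -- hypothetical sketch of an optional-import shim
try:
    import numpy  # noqa: F401
    HAS_NUMPY = True
except ImportError:
    numpy = None
    HAS_NUMPY = False

NUMPY_INSTRUCTIONS = (
    "numpy is required for this feature; "
    "install it with `pip install openai[datalib]`."
)


def assert_has_numpy():
    # Fail fast with an instructive message instead of letting callers hit a
    # generic "'NoneType' object has no attribute ..." error later on.
    if not HAS_NUMPY:
        raise Exception(NUMPY_INSTRUCTIONS)
```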
The `embeddings_utils.py` file is not imported from anywhere, and it's the only module that imports `sklearn` and other libraries listed in the `openai[embeddings]` extra. I couldn't find any docs, but its usage implies `pip install openai[embeddings]` (which now also ensures numpy/pandas/etc.), so the experience of using `embeddings_utils.py` should be unchanged.

It could be improved, though. I think each optional extra (`embeddings`, `wandb`, and the new `datalib`) deserves a mention in the README. I'll add a section on the new one, and if you can give me some context on the other two, I'll be happy to mention them too.

I wasn't sure whether you'd be interested in the PR, but it looks like you are, so I'll polish it a bit: I'm thinking of throwing an `ImportError` instead of just `Exception` from the `assert_has_*` functions, ensuring the error messages are clear, etc.

It's to a degree a backward-incompatible change (for existing users who don't install `openai[embeddings]` and hit this line or use `read_any_format()` via the CLI), so it might also be worth bumping the major version.
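As a rough illustration of that polish (the class name and message format here are assumptions for the sake of example, not necessarily what was merged), the guard could raise a dedicated `ImportError` subclass with actionable instructions:

```python
# Hypothetical polished version: raise an ImportError subclass with
# clear, actionable instructions instead of a bare Exception.
try:
    import pandas  # noqa: F401
except ImportError:
    pandas = None

PANDAS_INSTRUCTIONS = (
    "OpenAI error: `pandas` is missing for this feature.\n"
    "Install the optional data libraries with:\n\n"
    "    pip install openai[datalib]\n"
)


class MissingDependencyError(ImportError):
    """Raised when an optional data library is needed but not installed."""


def assert_has_pandas():
    if pandas is None:
        raise MissingDependencyError(PANDAS_INSTRUCTIONS)
```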
Oh, you're right, this is an embeddings file, so it will have the right dependencies.

Regarding the backward incompatibility: yes, it's unfortunate, but personally I think it's probably OK as long as the error is clear and explains how to resolve the problem. Also, the line in `read_any_format` is specific to embeddings, so it's fine to assume that the embedding deps were installed.

See #124 for some historical context about how deps have been handled, too.
@ddeville I’ve added a new subsection, “Optional dependencies,” under “Installation.” I also tweaked the errors and instructions. This is what the user gets when trying to use a feature that needs one of the libraries:
This is great, thank you so much!
@@ -25,6 +25,26 @@ Install from source with:
python setup.py install
```

### Optional dependencies
Very nice
@jakubroztocil Nice work! I saw this PR via your blog post. I'm sure you're aware of this, but thought it might help anyone else who lands here to point it out:
With AWS Lambda supporting container images, it's fairly trivial to deploy heavy libraries and large ML models to run in Lambda with little to no impact on performance (other than the initial pull from ECR after a fresh deployment). It also has a nice side benefit of making it easier to test the Lambda locally in a similar runtime environment: https://docs.aws.amazon.com/lambda/latest/dg/images-create.html (I realize it probably sounds like it, but no, I don't work for AWS. Just a Lambda & OpenAI fanboi. 😛)
Best PR I've read the whole day. Amazing work, guys!
* Make `numpy` and `pandas` optional dependencies
* Cleanup
* Cleanup
* Cleanup
* Cleanup
* Cleanup
* Cleanup
* Move `openpyxl` to `datalib` extras
* Improve errors and instructions
* Add “Optional dependencies” to README
* Polish README.md
* Polish README.md

Co-authored-by: hallacy <[email protected]>
This PR makes data libraries like `numpy` and `pandas` optional dependencies. These libraries add up to 146MB, which makes it challenging to deploy applications using this library in environments with code-size constraints, such as AWS Lambda.

Since the primary use case of this library (talking to the OpenAI API) doesn't generally require data libraries, it's safe to make them optional. The rare case when the data libraries are needed in the API client is handled through assertions with instructive error messages.

**Requirements before**

Installing `openai-python` requires the `numpy`, `pandas`, and `openpyxl` data libraries, which add up to 146MB.

**Requirements after**

Installing `openai-python` doesn't require the data libraries by default, resulting in an ~7 times smaller aggregate size of dependencies.

Data libraries can still be installed manually using the new `datalib` extras, if needed (e.g. `pip install openai[datalib]`), and they are now also included in the existing `embeddings` and `wandb` extras.