A candidate for wr.dynamodb.read_items
#1867
Comments
This is really fantastic work @a-slice-of-py, and yes, we would be very keen to include this in the library. May I ask you to create a PR with the above so we can discuss the changes there?
Hi @jaidisido, glad you find it valuable! Regarding the PR:
I wouldn't worry too much about that. All our unit tests are triggered when a PR is opened against main anyway. I think the only benefit of running your additional tests locally is a faster feedback loop: you don't have to wait for our CodeBuild to run all of them (which takes 20 minutes) before you know whether they passed. But again, that is optional, and we can work on it in the PR bit by bit. We recommend adding the
* feat: add read_items to dynamodb module (#1867) Co-authored-by: Silvio Lugaro <[email protected]> Co-authored-by: jaidisido <[email protected]> Co-authored-by: Lucas Hanson <[email protected]> Co-authored-by: kukushking <[email protected]>
Now merged and will be available in the next release, thanks again
DISCLAIMER: I'm opening a new issue even though #895 has a similar scope. I commented there even though it was already closed, but I wasn't sure the proposal would get enough visibility, so here I am.
I recently put some effort into reading items from a DynamoDB table and returning them as a Pandas DataFrame. Basically, I wanted to abstract some complexity away from the available Boto3 read actions and handle once and for all the headache of thinking about keys, query, scan, etc. Since I'm pretty happy with the results, I decided to share it here so you can evaluate whether the proposed solution is a good candidate for `wr.dynamodb.read_items`.
I am aware of the recent addition of `wr.dynamodb.read_partiql_query` in #1390, as well as its current issue reported in #1571, but the solution proposed below does not involve PartiQL: my goal was to avoid as much as possible the risks that come with using it against a DynamoDB table, namely the possible translation of a given query into a full scan operation (see for example the disclaimer in the docs).
Features
I tested all the features checked below against relatively simple dummy tables, so more focused testing is probably needed before adding this to the awswrangler stable codebase.
- `get_item > batch_get_item > query > scan` (inspiration from here and here)
- `UnprocessedKeys` in `batch_get_item` response
- `string` and `Key/Attr` from `boto3.dynamodb.conditions`
- `ExpressionAttributeNames` substitutions
- `allow_full_scan` kwarg which defaults to `False`
- `columns` kwarg, which corresponds to Boto3 `ProjectionExpression`
- `consistent` kwarg, which defaults to `False`
- `max_items_evaluated` kwarg (a kind of `head()` method for the table!)

The last features are unchecked because I considered them out of scope, at least for the moment.
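As a rough illustration of the `get_item > batch_get_item > query > scan` hierarchy, the choice of read action could be driven by which arguments the caller supplies. This is only a sketch under stated assumptions, not the code proposed in this issue: `choose_read_action`, `partition_values`, `sort_values`, `has_sort_key`, `key_condition_expression` and `filter_expression` are hypothetical names, while `allow_full_scan` follows the kwarg described above.

```python
from typing import Any, Optional, Sequence


def choose_read_action(
    partition_values: Optional[Sequence[Any]] = None,
    sort_values: Optional[Sequence[Any]] = None,
    has_sort_key: bool = False,
    key_condition_expression: Any = None,
    filter_expression: Any = None,
    allow_full_scan: bool = False,
) -> str:
    """Return the cheapest DynamoDB read action the arguments allow (hypothetical helper)."""
    if partition_values:
        if has_sort_key and sort_values is None:
            # Only the partition key is known, so the full primary key is not: query.
            return "query"
        if has_sort_key and len(sort_values) != len(partition_values):
            raise ValueError("partition_values and sort_values must have the same length")
        # Full primary keys are known: one key is a get_item, several are a batch.
        return "get_item" if len(partition_values) == 1 else "batch_get_item"
    if key_condition_expression is not None:
        return "query"
    if filter_expression is not None or allow_full_scan:
        # Assumption: a filter alone is treated as an explicit request to scan.
        return "scan"
    raise ValueError("Refusing a full table scan; pass allow_full_scan=True to allow it.")
```

With no arguments at all the helper raises instead of scanning, which mirrors the intent of defaulting `allow_full_scan` to `False`.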
Source code
The snippet below should be self-contained, but if testing it becomes uncomfortable I should be able to temporarily package it and publish it to pip (I tried to stick to awswrangler best practices as much as I could).
Examples
Reading 5 random items from a table
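One way to cap a read like this is to scan page by page and stop once the budget of evaluated items is spent. A minimal sketch, assuming a boto3-style client whose `scan` accepts `Limit` and `ExclusiveStartKey` and returns `Items`, `ScannedCount` and `LastEvaluatedKey`; `scan_head` is a hypothetical helper, not the function proposed here.

```python
from typing import Any, Dict, List, Optional


def scan_head(client: Any, table_name: str, max_items_evaluated: int) -> List[Dict[str, Any]]:
    """Scan a table, stopping once `max_items_evaluated` items have been evaluated."""
    items: List[Dict[str, Any]] = []
    start_key: Optional[Dict[str, Any]] = None
    remaining = max_items_evaluated
    while remaining > 0:
        kwargs: Dict[str, Any] = {"TableName": table_name, "Limit": remaining}
        if start_key is not None:
            # Resume from where the previous page stopped.
            kwargs["ExclusiveStartKey"] = start_key
        response = client.scan(**kwargs)
        page = response.get("Items", [])
        items.extend(page)
        # Limit bounds the items evaluated, so spend the budget by ScannedCount.
        remaining -= response.get("ScannedCount", len(page))
        start_key = response.get("LastEvaluatedKey")
        if start_key is None:
            # No more pages: the whole table has been read.
            break
    return items
```

The items returned are whichever ones the scan reaches first, so "random" here means "arbitrary" rather than uniformly sampled.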
Strongly-consistent reading of a given partition value from a table
Reading items pairwise-identified by partition and sort values, from a table with a composite primary key
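Pairwise identification can be pictured as zipping the two value lists into primary keys and fetching them in batches, re-sending whatever comes back under `UnprocessedKeys`. The sketch below assumes a boto3-style object exposing `batch_get_item(RequestItems=...)` and the documented limit of 100 keys per request; `build_keys` and `batch_get_all` are hypothetical helpers, and `pk_name`/`sk_name` stand for the table's key attribute names.

```python
from typing import Any, Dict, List, Sequence


def build_keys(
    pk_name: str, sk_name: str, partition_values: Sequence[Any], sort_values: Sequence[Any]
) -> List[Dict[str, Any]]:
    """Pair the i-th partition value with the i-th sort value."""
    if len(partition_values) != len(sort_values):
        raise ValueError("partition_values and sort_values must have the same length")
    return [{pk_name: pv, sk_name: sv} for pv, sv in zip(partition_values, sort_values)]


def batch_get_all(
    client: Any, table_name: str, keys: List[Dict[str, Any]], batch_size: int = 100
) -> List[Dict[str, Any]]:
    """Fetch every key, retrying keys reported back as unprocessed."""
    items: List[Dict[str, Any]] = []
    for start in range(0, len(keys), batch_size):
        pending = {table_name: {"Keys": keys[start:start + batch_size]}}
        while pending:
            response = client.batch_get_item(RequestItems=pending)
            items.extend(response.get("Responses", {}).get(table_name, []))
            # UnprocessedKeys has the same shape as RequestItems, so it can be re-sent as is.
            # Production code would normally back off between retries.
            pending = response.get("UnprocessedKeys") or {}
    return items
```

Plain Python values in the keys assume a resource-level interface; a low-level client would expect typed values such as `{"S": "value"}`.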
Reading items while retaining only specified attributes, automatically handling possible collision with DynamoDB reserved keywords
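For attribute selection, one simple way to avoid reserved-keyword collisions is to alias every requested column up front. Note that this is a different technique from reacting to the client error mentioned in the feature list, and it is only a sketch: `build_projection` is a hypothetical helper and it assumes top-level attribute names (no nested paths).

```python
from typing import Dict, Sequence, Tuple


def build_projection(columns: Sequence[str]) -> Tuple[str, Dict[str, str]]:
    """Return (ProjectionExpression, ExpressionAttributeNames) with every column aliased."""
    # "#c0", "#c1", ... are placeholders that can never clash with a reserved word.
    names = {f"#c{i}": column for i, column in enumerate(columns)}
    # Joining the dict iterates its keys (the aliases) in insertion order.
    return ", ".join(names), names
```

The two return values would be passed as `ProjectionExpression` and `ExpressionAttributeNames` to the underlying Boto3 read action.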
Reading all items from a table explicitly allowing full scan
Reading items matching a KeyConditionExpression expressed with boto3.dynamodb.conditions.Key
Same as above, but with KeyConditionExpression as string
Reading items matching a FilterExpression expressed with boto3.dynamodb.conditions.Attr
Same as above, but with FilterExpression as string
Reading items involving an attribute which collides with DynamoDB reserved keywords
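The feature list mentions handling reserved keywords through `ExpressionAttributeNames` substitutions. A hedged sketch of that idea applied to a string expression follows. It assumes the error text contains a fragment like `reserved keyword: operator`, which should be checked against the actual botocore error; `alias_reserved_keyword` is a hypothetical helper, and `operator` is only an example attribute name.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed shape of the relevant part of the error message.
_RESERVED_KEYWORD = re.compile(r"reserved keyword: (\w+)")


def alias_reserved_keyword(
    expression: str, error_message: str, names: Optional[Dict[str, str]] = None
) -> Tuple[str, Dict[str, str]]:
    """Rewrite `expression` so the offending attribute is referenced through a #alias."""
    match = _RESERVED_KEYWORD.search(error_message)
    if match is None:
        raise ValueError("error message does not name a reserved keyword")
    keyword = match.group(1)
    alias = f"#{keyword}"
    # Replace the bare attribute only: skip existing #aliases, :values and longer identifiers.
    pattern = rf"(?<![#:\w]){re.escape(keyword)}\b"
    return re.sub(pattern, alias, expression), {**(names or {}), alias: keyword}
```

A caller would retry the read with the rewritten expression and the returned mapping, repeating the step if another keyword is reported.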
Package versions
and Python 3.10.8.