Faster cache retrieval for large datasets #86


Closed
MrThearMan opened this issue Mar 22, 2024 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@MrThearMan
Owner

MrThearMan commented Mar 22, 2024

Description

When dealing with large datasets (>50 000 rows), it's apparent that the current mechanism for fetching cached data is not very performant. See the flamegraph (pictured below) for a query for 200 000 rows, where each row has a single child through a foreign key. The actual database query takes ~10% of the request time, while fetching data from the cache takes ~60%. Of that 60%, constructing the queryset.filter() alone is ~30%, the compiler takes another ~15%, and the actual fetching from the cache is only ~10%.

The current caching mechanism needs to run the compiler in order to create a unique cache key for a given model row based on the field selections in the GraphQL query: since the same row could appear in the same request with different field selections, we can't simply use its primary key as the cache key.
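To illustrate why the primary key alone isn't enough, here is a minimal sketch (all names are hypothetical, not the library's actual API): the key must encode the field selections, since the same row can appear twice in one request with different selections.

```python
# Hypothetical sketch: a cache key combining the model, the primary key,
# and the selected fields. Not the optimizer's real implementation.

def cache_key(model_label: str, pk: int, selected_fields: tuple[str, ...]) -> str:
    # Sorting makes the key independent of selection order in the query.
    fields = ",".join(sorted(selected_fields))
    return f"{model_label}:{pk}:{fields}"

key_a = cache_key("app.Book", 1, ("name", "author"))
key_b = cache_key("app.Book", 1, ("name",))
assert key_a != key_b  # same row, different selections -> different keys
```

The compiler-based approach in the library derives this per-selection uniqueness from the compiled query instead, which is what makes key construction expensive.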

One option could be to have some sort of cache merging strategy, where rows with the same primary key would be merged in the cache, and then we could simply use the primary keys. How this would work with prefetched and related fields is a mystery though.

Or maybe there is some strategy for storing multiple rows for the same primary key in a way that wouldn't need to know so much about the query? Or maybe there is a way to rely on the order things are inserted into the cache?

This is a longer-term issue, but if you are reading this, suggestions are welcome!

profile-flame

Motivation

When fetching larger datasets without pagination, fetching from the cache can become a bottleneck. Based on the measurements above, faster cache retrieval could make the request ~2x faster.

@MrThearMan MrThearMan added the enhancement New feature or request label Mar 22, 2024
@MrThearMan MrThearMan self-assigned this Mar 22, 2024
@MrThearMan
Owner Author

After looking at this a little bit closer, I noticed that the entire caching mechanism is redundant, and was there just for one case related to to-one fields. I was able to remove the mechanism, which of course resolves this issue. Here is a link to the after-flamegraph (pictured below).

profile

Released in 0.5.0.

@vade

vade commented Jul 6, 2024

BTW Thanks for all your work on the optimizer! This project is awesome :)
