Faster cache retrieval for large datasets #86


Closed
MrThearMan opened this issue Mar 22, 2024 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@MrThearMan
Owner

MrThearMan commented Mar 22, 2024

Description

When dealing with large datasets (>50 000 rows), it's apparent that the current mechanism for fetching cached data is not very performant. See the flamegraph (pictured below) for a query for 200 000 rows, where each row has a single child through a foreign key. The actual database query takes ~10% of the request time, while fetching data from the cache takes ~60%. Of that 60%, constructing the queryset.filter() alone is ~30%, the compiler takes another ~15%, and the actual fetching from the cache is only ~10%.

The current caching mechanism needs to run the compiler in order to create a unique cache key for a given model row based on the field selections in the GraphQL query: since the same row could appear in the same request with different field selections, we can't simply use its primary key as the cache key.
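To illustrate why the primary key alone isn't enough, here is a minimal sketch (all names are hypothetical, not the library's actual API): the key must encode the field selections, since the same row can appear twice in one request with different selections.

```python
# Hypothetical sketch: a cache key combining the model, the primary key,
# and the selected fields. Not the optimizer's real implementation.

def cache_key(model_label: str, pk: int, selected_fields: tuple[str, ...]) -> str:
    # Sorting makes the key independent of selection order in the query.
    fields = ",".join(sorted(selected_fields))
    return f"{model_label}:{pk}:{fields}"

key_a = cache_key("app.Book", 1, ("name", "author"))
key_b = cache_key("app.Book", 1, ("name",))
assert key_a != key_b  # same row, different selections -> different keys
```

The compiler-based approach in the library derives this per-selection uniqueness from the compiled query instead, which is what makes key construction expensive.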

One option could be to have some sort of cache merging strategy, where rows with the same primary key would be merged in the cache, and then we could simply use the primary keys. How this would work with prefetched and related fields is a mystery though.

Or maybe there is some strategy for storing multiple rows for the same primary key in a way that wouldn't need to know so much about the query? Or maybe there is a way to rely on the order things are inserted into the cache?

This is a longer-term issue, but if you are reading this, suggestions are welcome!

profile-flame

Motivation

When fetching larger datasets without pagination, fetching from the cache can become a bottleneck. Based on the measurements above, faster cache retrieval could make the request ~2x faster.

@MrThearMan MrThearMan added the enhancement New feature or request label Mar 22, 2024
@MrThearMan MrThearMan self-assigned this Mar 22, 2024
@MrThearMan
Owner Author

After looking at this a little bit closer, I noticed that the entire caching mechanism is redundant, and was there just for one case related to to-one fields. I was able to remove the mechanism, which of course resolves this issue. Here is a link to the after-flamegraph (pictured below).

profile

Released in 0.5.0.

@vade

vade commented Jul 6, 2024

BTW Thanks for all your work on the optimizer! This project is awesome :)
