Description
When dealing with large datasets (>50 000 rows), it's apparent that the current mechanism for fetching cached data is not very performant. See the flamegraph (pictured below) for a query returning 200 000 rows, where each row has a single child through a foreign key. The actual database query takes ~10% of the request time, while fetching data from the cache takes ~60%. Within that 60%, constructing the `queryset.filter()` alone is ~30%, the compiler takes another ~15%, and the actual fetching from the cache is only ~10%.
The current caching mechanism needs to run the compiler in order to create a unique cache key for a given model row based on the field selections in the GraphQL query. It's possible for the same row to appear in the same request with different field selections, so we can't simply use its primary key as the cache key.
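To make the problem concrete, here is a minimal sketch of a selection-aware cache key (all names here, like `cache_key` and `selections`, are hypothetical; the real mechanism runs the SQL compiler instead of hashing field names):

```python
from hashlib import md5

def cache_key(model_label: str, pk: object, selections: tuple[str, ...]) -> str:
    """Hypothetical selection-aware cache key: the same row cached under
    two different field selections must not collide."""
    # Hash the sorted field names so the key is cheap to build and
    # order-independent; the actual mechanism derives this via the compiler.
    selection_hash = md5(",".join(sorted(selections)).encode()).hexdigest()
    return f"{model_label}:{pk}:{selection_hash}"

# The same row requested with different selections gets distinct keys:
print(cache_key("app.Task", 1, ("id", "name")))        # app.Task:1:<hash A>
print(cache_key("app.Task", 1, ("id", "created_at")))  # app.Task:1:<hash B>
```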
One option could be to have some sort of cache merging strategy, where rows with the same primary key would be merged in the cache, and then we could simply use primary keys as cache keys. How this would work with prefetched and related fields is unclear, though. A rough sketch of the idea is below.
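This is a minimal sketch of what such a merging strategy might look like, assuming rows are stored as plain dicts of selected field values (the class and method names are made up, and the prefetch/relation question is left open):

```python
class MergingRowCache:
    """Hypothetical cache keyed by primary key alone: when the same row is
    stored twice with different field selections, the entries are merged so
    later lookups see the union of all fetched fields."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, object], dict[str, object]] = {}

    def store(self, model_label: str, pk: object, fields: dict[str, object]) -> None:
        key = (model_label, pk)
        # Merge instead of overwrite: storing ("id", "name") and then
        # ("id", "created_at") leaves all three fields available.
        self._rows.setdefault(key, {}).update(fields)

    def fetch(self, model_label: str, pk: object) -> dict[str, object] | None:
        return self._rows.get((model_label, pk))
```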
Or maybe there is some strategy for storing multiple rows for the same primary key in a way that wouldn't need to know so much about the query? Or maybe there is a way to rely on the order in which things are inserted into the cache?
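And a sketch of the insertion-order idea, assuming each resolver consumes cached rows back in the same order the optimizer stored them (again, all names hypothetical, and whether resolution order actually matches insertion order is exactly the open question):

```python
from collections import defaultdict, deque

class OrderedRowCache:
    """Hypothetical cache keeping multiple entries per primary key in a FIFO
    queue, relying on resolvers reading rows in insertion order."""

    def __init__(self) -> None:
        self._rows: dict[object, deque] = defaultdict(deque)

    def store(self, pk: object, row: dict[str, object]) -> None:
        self._rows[pk].append(row)

    def fetch(self, pk: object) -> dict[str, object]:
        # Pop from the front: the first selection stored is the first one
        # resolved, so no query-specific key construction is needed.
        return self._rows[pk].popleft()
```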
This is a longer-term issue, but if you are reading this, suggestions are welcome!
Motivation
When fetching larger datasets without pagination, fetching from the cache can become a bottleneck. Based on the measurements above, requests like this could be made ~2x faster with faster cache retrieval.
After looking at this a little closer, I noticed that the entire caching mechanism was redundant, and existed just for one case with related to-one fields. I was able to remove the mechanism, which of course resolves this issue. Here is a link to the after-flamegraph (pictured below).