
reduce memory usage of pathological packs #852


Merged

Byron merged 12 commits into main from fix-851 on May 16, 2023

Conversation


@Byron (Member) commented May 14, 2023

When resolving packs, memory usage can currently skyrocket regardless of which algorithm is used, in pathological cases where the delta-chain list is very deep and objects are large, on the order of ~100MB or more.

This patch series aims to reduce memory usage and improve performance, as thread starvation also occurs in these cases.

Tasks

  • make sure we deal with big objects better (there should be a configurable cut-off for what counts as a big object, and maybe such objects should never be cached; see the sketch after this list)
  • is there a way to get the cache hit-rate up? - No, except by having a shared cache, which really isn't what we want. The individual thread's cache is too small with objects this large. If there were a way to hold base objects longer, it might help in some cases, but how would we decide which base objects are worth holding?
  • make the static pack-cache size limit configurable via configuration flags
  • if the alternative algorithm performs better, make it available for clones in general and expose it on the command-line for gix clone - it doesn't perform better, as the cache hit-rate is too low. Is there a way to get a better hit-rate? Not really.
  • let worker threads check for interruptions more often - they seem to check only once a thread is done processing a base, which is too slow in this case
  • additional parallelism in case of starvation
  • make use of RawProgress
  • remove TODOs and dbg! statements
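
For illustration, here is a minimal Rust sketch of how a configurable cache memory limit and a big-object cut-off could fit together. The names, the `/ 2` cut-off heuristic, and the configuration plumbing are assumptions for this sketch, not the actual gitoxide options or API.

```rust
/// Resolve the effective memory limit for the per-thread pack cache.
/// `configured_limit` stands in for whatever a configuration flag would
/// provide; the ~96MB default mirrors the one added in this PR.
fn effective_cache_limit(configured_limit: Option<usize>) -> usize {
    const DEFAULT_LIMIT: usize = 96 * 1024 * 1024;
    configured_limit.unwrap_or(DEFAULT_LIMIT)
}

/// Objects above half of the limit (an arbitrary cut-off for this sketch)
/// are never cached, so a single huge delta base cannot evict everything else.
fn is_cacheable(object_size: usize, limit: usize) -> bool {
    object_size <= limit / 2
}

fn main() {
    let limit = effective_cache_limit(None);
    assert!(is_cacheable(1024 * 1024, limit));
    assert!(!is_cacheable(200 * 1024 * 1024, limit));
}
```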

…much. (#851)

Previously it would take a buffer from the free-list, copy data into it, and then lose it entirely once its capacity was exceeded.
Now the free-list is handled correctly.
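
A minimal sketch of the free-list behaviour this commit describes; the type and method names are illustrative, not the actual gitoxide code.

```rust
/// A pool of reusable buffers for delta resolution.
#[derive(Default)]
struct FreeList {
    buffers: Vec<Vec<u8>>,
}

impl FreeList {
    /// Take a buffer out of the free-list, or allocate a fresh one.
    fn take(&mut self) -> Vec<u8> {
        self.buffers.pop().unwrap_or_default()
    }

    /// Return a buffer after use. The important detail from this commit:
    /// the buffer goes back even if it had to grow beyond its original
    /// capacity, instead of being dropped and reallocated on the next `take`.
    fn put_back(&mut self, mut buf: Vec<u8>) {
        buf.clear();
        self.buffers.push(buf);
    }
}
```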
Byron added 4 commits May 14, 2023 20:05
Previously, when traversing a pack, it could appear to hang, as interrupt checks were only performed at the chunk or delta-tree base level.

Now interrupt checks are performed more often to stop all work much more quickly.
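
The pattern described here, sketched with illustrative names; gitoxide passes interruption flags around as atomics, but the exact signature below is an assumption.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Check the interrupt flag before every object rather than once per chunk
/// or per delta-tree base, so workers stop quickly even when a single base
/// has a very deep delta chain hanging off it.
fn resolve_entries(offsets: &[u64], should_interrupt: &AtomicBool) -> Result<(), &'static str> {
    for &offset in offsets {
        if should_interrupt.load(Ordering::Relaxed) {
            return Err("interrupted");
        }
        let _ = offset; // decode the entry at `offset` here
    }
    Ok(())
}
```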
Previously, the 64-slot big LRU cache for pack deltas didn't enforce any memory limit, which could lead to memory exhaustion in the face of atypically large objects.

Now we add a generous default limit to do *better* in such situations. It's worth noting, though, that even without any cache, the working set of buffers needed for delta resolution takes considerable memory, despite trying to keep it minimal.

Note that for bigger objects the cache is now not used at all, which probably leads to terrible performance as not even the base object can be cached.
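
To make the policy concrete, here is a sketch of a slot- and memory-limited LRU with a cut-off for objects too big to cache at all. It is illustrative only, not the actual gitoxide cache implementation.

```rust
use std::collections::VecDeque;

/// LRU cache keyed by pack offset: hottest entries at the front, eviction
/// from the back until both the slot and the memory limit are respected.
struct MemoryCappedLru {
    entries: VecDeque<(u64, Vec<u8>)>,
    used_bytes: usize,
    memory_limit: usize,
    slot_limit: usize,
}

impl MemoryCappedLru {
    fn new(memory_limit: usize, slot_limit: usize) -> Self {
        MemoryCappedLru { entries: VecDeque::new(), used_bytes: 0, memory_limit, slot_limit }
    }

    fn put(&mut self, offset: u64, data: Vec<u8>) {
        if data.len() > self.memory_limit {
            return; // never cache objects bigger than the whole budget
        }
        self.used_bytes += data.len();
        self.entries.push_front((offset, data));
        while self.entries.len() > self.slot_limit || self.used_bytes > self.memory_limit {
            if let Some((_, evicted)) = self.entries.pop_back() {
                self.used_bytes -= evicted.len();
            }
        }
    }

    fn get(&mut self, offset: u64) -> Option<&[u8]> {
        let pos = self.entries.iter().position(|(o, _)| *o == offset)?;
        let hit = self.entries.remove(pos)?;
        self.entries.push_front(hit); // refresh recency
        self.entries.front().map(|(_, data)| data.as_slice())
    }
}
```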
#851)

This change would put more of the delta-chain into the cache, which possibly increases the chance of cache hits if objects aren't queried in random order but in pack-offset order.

Note that in general it tends to be faster to not use any cache at all.

This change was pruned back right away, as the major difference to git, which stores every object of the chain in its cache, is that we don't share the cache among threads. That leaves a much smaller per-thread cache, which really is a problem if the objects are large.

So instead of slowing pack access down by trying this, with the default cache being unsuitable as it would evict all the time due to memory overruns, we do nothing here and instead improve performance when dealing with pathological cases during pack traversal.
…ry limits. (#851)

Previously the 64-slot LRU cache didn't have any memory limit; now one is implemented that defaults to about 96MB.
@Byron force-pushed the fix-851 branch 3 times, most recently from 2da31f6 to 6d30c27 on May 15, 2023 at 05:56
Byron added 4 commits May 15, 2023 15:37
Threads started for working on an entry in a slice can now see the number of threads left for use (and manipulate that variable), which effectively allows them to implement their own parallelization on top of the current one.

This is useful if there is very imbalanced work within the slice itself.

While at it, we also make consumer functions mutable, as they exist per thread.
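
A sketch of the mechanism under assumed names: a shared atomic counter of unused threads that a worker may claim to split an oversized slice, handing the capacity back when it is done.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

fn process_slice(entries: &[u64], threads_left: &AtomicUsize) {
    // Claim one extra thread only if one is actually available.
    let claimed = threads_left
        .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |n| n.checked_sub(1))
        .is_ok();

    if claimed && entries.len() > 1 {
        let (left, right) = entries.split_at(entries.len() / 2);
        thread::scope(|scope| {
            scope.spawn(|| process_entries(right));
            process_entries(left);
        });
        // Return the borrowed capacity once the nested work is finished.
        threads_left.fetch_add(1, Ordering::SeqCst);
    } else {
        process_entries(entries);
    }
}

fn process_entries(entries: &[u64]) {
    for _entry in entries { /* resolve the delta chain for this entry */ }
}
```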
Previously we would use `resize(new_len, 0)` to resize buffers, even though these values would then be overwritten anyway (or the buffer isn't available).

Now we use `set_len(new_len)` after calling `reserve` to do the same, but save a memset.
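
The trade-off in a nutshell, as an illustrative helper rather than the actual gitoxide code:

```rust
/// Grow `buf` to `new_len` without zeroing the new bytes. `resize(new_len, 0)`
/// would memset the tail even though the caller overwrites it immediately.
fn prepare_buffer(buf: &mut Vec<u8>, new_len: usize) {
    buf.clear();
    buf.reserve(new_len);
    // SAFETY: capacity is at least `new_len` after `reserve`, and the caller
    // must fully overwrite all `new_len` bytes (e.g. by inflating a pack
    // entry into the buffer) before reading from them.
    unsafe { buf.set_len(new_len) };
}
```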
@Byron force-pushed the fix-851 branch 3 times, most recently from 1f34844 to 60dc8b4 on May 16, 2023 at 12:45
Byron added 3 commits May 16, 2023 21:01
It's an object-safe version of the `Progress` trait.
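
For illustration only, not the real gitoxide traits: an object-safe trait avoids generic methods so it can be used as `&mut dyn RawProgress`, which is what a "raw" progress variant is for.

```rust
/// Object-safe progress reporting: only concrete argument types, no generics.
trait RawProgress {
    fn set_name(&mut self, name: String);
    fn inc_by(&mut self, step: usize);
}

struct LogProgress {
    name: String,
    done: usize,
}

impl RawProgress for LogProgress {
    fn set_name(&mut self, name: String) {
        self.name = name;
    }
    fn inc_by(&mut self, step: usize) {
        self.done += step;
        eprintln!("{}: {} items", self.name, self.done);
    }
}

/// Because the trait is object-safe, callers can pass any implementation
/// behind a trait object without monomorphizing the pack-resolution code.
fn drive(progress: &mut dyn RawProgress) {
    progress.set_name("resolving deltas".into());
    progress.inc_by(42);
}
```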
When delta-trees are unbalanced, in pathological cases it's possible that one thread ends up with more than half of the work. In this case it's required that it manages to spawn its own threads to parallelize the work it has.
@Byron merged commit 2f275d5 into main on May 16, 2023
@Byron deleted the fix-851 branch on May 16, 2023 at 19:34
Development

Successfully merging this pull request may close these issues.

Gitoxide uses significantly more memory than git when cloning semisynthetic repos
1 participant