reduce memory usage of pathological packs #852
Merged
Conversation
…much. (#851) Previously it would take a buffer from the free-list, copy data into it, and lose it entirely when exceeding its capacity. Now the free-list is handled correctly.
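A minimal sketch of the corrected behaviour, with illustrative names that are not gitoxide's actual API: a buffer taken from the free-list is returned to it even if it had to grow beyond its original capacity, instead of being dropped on that path.

```rust
/// Illustrative free-list of reusable byte buffers (not gitoxide's actual type).
struct FreeList {
    buffers: Vec<Vec<u8>>,
}

impl FreeList {
    /// Hand out a pooled buffer, or a fresh one if the list is empty.
    fn take(&mut self) -> Vec<u8> {
        self.buffers.pop().unwrap_or_default()
    }

    /// Return the buffer unconditionally. The bug described above amounted to
    /// skipping this step for buffers that had grown past their original capacity.
    fn put_back(&mut self, mut buf: Vec<u8>) {
        buf.clear();
        self.buffers.push(buf);
    }
}
```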
Previously, when traversing a pack it could appear to hang, as interrupt checks were only performed at the chunk or delta-tree base level. Now interrupt checks are performed more often to stop all work much more quickly.
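A minimal sketch of the idea, assuming a shared `AtomicBool` interrupt flag and an illustrative `resolve_object` function: the flag is checked once per object inside the inner loop, rather than only once per chunk or per delta-tree base.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

fn resolve_chunk(offsets: &[u64], should_interrupt: &AtomicBool) -> Result<(), &'static str> {
    for &offset in offsets {
        // Checking once per object keeps the time to react to an interrupt small,
        // even when a single chunk or delta-tree contains a lot of work.
        if should_interrupt.load(Ordering::Relaxed) {
            return Err("interrupted");
        }
        resolve_object(offset);
    }
    Ok(())
}

fn resolve_object(_offset: u64) {
    // decompress the entry and apply deltas here
}
```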
Previously, the 64-slot big LRU cache for pack deltas didn't use any memory limit, which could lead to memory exhaustion in the face of atypical, large objects. Now we add a generous default limit to do *better* in such situations. It's worth noting, though, that even without any cache, the working set of buffers needed for delta resolution takes considerable memory, despite trying to keep it minimal. Note that for bigger objects the cache is now not used at all, which probably leads to terrible performance as not even the base object can be cached.
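A minimal sketch of such a memory-capped LRU cache, with hypothetical names and a linear lookup for brevity; the real cache will differ, but the two relevant rules are the same: evict until both the slot limit and the byte limit hold again, and don't cache objects larger than the byte budget at all.

```rust
use std::collections::VecDeque;

struct MemoryCappedLru {
    /// Most-recently used entries at the front; each entry is (pack offset, object data).
    entries: VecDeque<(u64, Vec<u8>)>,
    max_slots: usize,
    max_bytes: usize,
    used_bytes: usize,
}

impl MemoryCappedLru {
    fn new(max_slots: usize, max_bytes: usize) -> Self {
        Self { entries: VecDeque::new(), max_slots, max_bytes, used_bytes: 0 }
    }

    fn put(&mut self, offset: u64, data: Vec<u8>) {
        if data.len() > self.max_bytes {
            // Objects bigger than the whole budget are not cached at all.
            return;
        }
        self.used_bytes += data.len();
        self.entries.push_front((offset, data));
        // Evict least-recently-used entries until both limits are respected.
        while self.entries.len() > self.max_slots || self.used_bytes > self.max_bytes {
            match self.entries.pop_back() {
                Some((_, evicted)) => self.used_bytes -= evicted.len(),
                None => break,
            }
        }
    }

    fn get(&mut self, offset: u64) -> Option<&[u8]> {
        let pos = self.entries.iter().position(|(o, _)| *o == offset)?;
        let entry = self.entries.remove(pos)?;
        self.entries.push_front(entry);
        self.entries.front().map(|(_, data)| data.as_slice())
    }
}
```

With `MemoryCappedLru::new(64, 96 * 1024 * 1024)` this would roughly correspond to the 64-slot, ~96MB default mentioned below.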
… (#851) This change puts more of the delta-chain into the cache, which possibly increases the chance of cache-hits if objects aren't queried in random order but in pack-offset order. Note that in general, it tends to be faster to not use any cache at all. This change was pruned back right away, as the major difference to git, which stores every object of the chain in the cache, is that we don't share the cache among threads. This leaves a much smaller per-thread cache, which really is a problem if the objects are large. So instead of slowing pack access down by trying it, with the default cache being unsuitable as it would evict all the time due to memory overruns, we do nothing here and rather improve performance when dealing with pathological cases during pack traversal.
…ry limits. (#851) Previously the 64-slot LRU cache didn't have any limit; now one is implemented that defaults to about 96MB.
Threads started to work on an entry in a slice can now see the number of threads left for use (and manipulate that variable), which effectively allows them to implement their own parallelization on top of the current one. This is useful if there is very imbalanced work within the slice itself. While at it, we also make consumer functions mutable as they exist per thread.
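A minimal sketch of that idea using standard scoped threads rather than gitoxide's own parallelization primitives; `threads_left` and `resolve` are illustrative names. A worker tries to claim one of the remaining threads to split its own slice, and returns the claim when done.

```rust
use std::sync::atomic::{AtomicIsize, Ordering};

fn process_slice(slice: &[u64], threads_left: &AtomicIsize) {
    // Try to claim one of the remaining threads; give the claim back if none is left.
    let claimed = threads_left.fetch_sub(1, Ordering::SeqCst) > 0;
    if !claimed {
        threads_left.fetch_add(1, Ordering::SeqCst);
        slice.iter().for_each(|&offset| resolve(offset));
        return;
    }
    let (left, right) = slice.split_at(slice.len() / 2);
    std::thread::scope(|scope| {
        // One half runs on the newly claimed thread, the other half on the current one.
        scope.spawn(|| left.iter().for_each(|&offset| resolve(offset)));
        right.iter().for_each(|&offset| resolve(offset));
    });
    // Return the claimed thread once the extra work is done.
    threads_left.fetch_add(1, Ordering::SeqCst);
}

fn resolve(_offset: u64) {
    // resolve a single pack entry here
}
```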
Previously we would use `resize(new_len, 0)` to resize buffers, even though these values would then be overwritten (or the buffer isn't available). Now we call `reserve` followed by `set_len(new_len)` to do the same, but save a memset.
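A minimal sketch of the pattern with a hypothetical helper name. It is only sound if the caller completely overwrites the newly exposed bytes before reading them, e.g. by decompressing directly into the buffer.

```rust
/// Grow `buf` to `new_len` without the memset that `resize(new_len, 0)` performs.
///
/// Safety contract: the caller must fully overwrite `buf[..new_len]` before reading it.
fn grow_for_overwrite(buf: &mut Vec<u8>, new_len: usize) {
    buf.reserve(new_len.saturating_sub(buf.len()));
    // SAFETY: `reserve` guarantees `capacity() >= new_len`; the new bytes are
    // uninitialized and must be written before they are read.
    unsafe { buf.set_len(new_len) };
}
```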
`RawProgress` is an object-safe version of the `Progress` trait.
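A minimal sketch of what an object-safe progress trait can look like; the method names are illustrative and not gitoxide's actual `RawProgress` API. Object safety means avoiding generic methods and `impl Trait` arguments so the trait can be used behind `&mut dyn` for dynamic dispatch.

```rust
trait RawProgress {
    fn set_name(&mut self, name: String);
    fn inc_by(&mut self, step: usize);
    fn message(&mut self, text: String);
}

struct LogProgress {
    name: String,
    count: usize,
}

impl RawProgress for LogProgress {
    fn set_name(&mut self, name: String) {
        self.name = name;
    }
    fn inc_by(&mut self, step: usize) {
        self.count += step;
    }
    fn message(&mut self, text: String) {
        eprintln!("{}: {}", self.name, text);
    }
}

/// Dynamic dispatch works because the trait is object-safe.
fn count_one(progress: &mut dyn RawProgress) {
    progress.inc_by(1);
}
```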
When delta-trees are unbalanced, in pathological cases it's possible that one thread ends up with more than half of the work. In this case it's required that it manages to spawn its own threads to parallelize the work it has.
When resolving packs, memory usage can currently skyrocket, no matter which algorithm is used, in pathological cases where delta-chains are very deep and individual objects are big (~100MB or more).
This patch series aims to reduce memory usage and improve performance, as thread starvation also happens here.
Tasks
- is there a way to get the cache hit-rate up? - no way, except for having a shared cache, which really isn't what we want. The individual thread's cache is too small with objects this large. If there was a way to hold base objects longer, maybe this could help in some cases, but how to decide which base objects are worth holding?
- if the alternative algorithm performs better, make it available for clones in general and expose it on the command line for `gix clone` - it doesn't perform better as the cache hit rate is too low. But is there a way to get a better hit rate? Not really.
- `RawProgress`
- remove TODOs and `dbg!` statements