upgrade crates-index-diff & git-repository #1935
The integration itself looks good to me, even though I can't tell what's best to do with the new Change variants.
There are a few more improvements to consider here.
- Set the GITOXIDE_PACK_CACHE_MEMORY=256MB environment variable to increase the maximum cache to use for pack entries, which typically improves diffing performance by a factor of 4. Please note that the number given here is a rough guess and the performance improvement flatlines quickly once the optimal value is reached. Setting it to any value higher than a couple of megabytes is probably going to be worth an improvement already.
- If you can, set the max-performance feature to get a 2.5x performance boost. Yes, 10x is what's possible if both of these are used. An example of both in action is the baseline test run on CI. Note that it uses an overly large value for the pack cache, which is me not wanting to try to find an optimum that is lower.
- You can use peek_changes_ordered() to single-step through the history that has accumulated and thus maintain the order of user-generated changes, leading to a fair queuing order. I benchmarked it and typical changes of two days (~1500) take about 20s to obtain (90 changes/s), thus polling every 5 minutes or so wouldn't take more than a second of compute time, given the amount of changes in that duration is much lower than 1500.
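To make the polling estimate in the last point concrete, here is a rough back-of-the-envelope sketch. It only uses the figures quoted above (~1500 changes in two days, 90 changes/s, a 5-minute poll) and assumes changes arrive at a uniform rate, which real traffic won't do exactly. The function names are made up for illustration and are not part of crates-index-diff.

```rust
/// Expected number of changes that accumulate during one polling interval,
/// assuming (hypothetically) that changes arrive at a uniform rate.
fn changes_per_poll(changes_in_period: f64, period_secs: f64, poll_secs: f64) -> f64 {
    changes_in_period * poll_secs / period_secs
}

/// Seconds of compute needed to obtain `changes` at `rate` changes per second.
fn compute_secs(changes: f64, rate: f64) -> f64 {
    changes / rate
}

fn main() {
    let two_days_secs = 2.0 * 24.0 * 3600.0;
    // Figures quoted in the comment: ~1500 changes per two days, 5-minute polls.
    let per_poll = changes_per_poll(1500.0, two_days_secs, 300.0);
    // Throughput quoted in the comment: 90 changes/s.
    let secs = compute_secs(per_poll, 90.0);
    println!("{per_poll:.1} changes per poll, about {secs:.3}s of compute");
}
```

Under these assumptions only a handful of changes accumulate per poll, so even a burst well above the average would stay under the one-second mark mentioned above.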
I am particularly excited about the fair/correct ordering of changes and would love that to land as part of this PR as well. Maybe it's something for a follow-up one though as it's not required to upgrade.
Thank you for checking this out! @Byron About your points: Still I changed to using Also I added the
Among other things this adds dependencies on Some downstream crates where performance isn't critical, that don't want to introduce additional runtime dependencies, or that want to support niche targets might therefore want to opt out.
I agree, I wouldn't expect performance issues no matter which configuration is chosen.
I second @pascalkuthe, and am happy he joined in as my explanation would have been more hand-wavy for sure 😁.
Thanks for starting this conversation. It's one of these trade-offs where we ask what's more important - optimal performance but it might not work at all for some, or reduced performance but best compatibility. Since
the docs.rs side of things looks good to me.
> set the GITOXIDE_PACK_CACHE_MEMORY=256MB environment variable to increase the maximum cache to use for pack entries, which typically improves diffing performance by a factor of 4. Please note that the number given here is a rough guess and the performance improvement flatlines quickly once the optimal value is reached. Setting it to any value higher than a couple of megabytes is probably going to be worth an improvement already.
> if you can, set the max-performance feature to get a 2.5x performance boost. Yes, 10x is what's possible if both of these are used. An example both in action is the baseline test run on CI. Note that it uses an overly large value for the pack cache which is me not wanting to try to find an optimum that is lower.
👍 seems good - we have quite a lot of memory slack currently so I'm not too worried about the memory limit being too high, 256 MB is low enough it doesn't affect whether we'll have production issues.
I do worry that we're landing this without tests, since it sounds like there were behavior changes on the crates-index-diff side. ideally we would land those before the update, but I know they're a pain to write ... @Byron does crates-index-diff have its own internal tests we can trust?
By now I trust the test suite, it's as good as I could make it. It was the reason we could rewrite the diffing engine so painlessly. Now there are also new baseline tests which validate that no matter how you step through the history to obtain diffs, you will end up at the same state as iterating through all crates using
amazing, thank you! ❤️
This updates the crates-index-diff crates, including thread-names for all the threads that crates-index-diff spawns, to get visibility into our CPU-load problem.
related release notes:
Also:
AddedAndYanked change as currently just Added
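
As a rough illustration of what treating AddedAndYanked like Added could look like while keeping the order of changes intact, here is a minimal sketch. The `Change` enum below is a local stand-in with only three variants and string payloads, not the crate's real type, and `Action`, `action_for` and `plan` are hypothetical names invented for this example. It does not show how docs.rs actually handles these changes.

```rust
/// Local stand-in for the kinds of change discussed in this PR.
/// NOT the real crates-index-diff type: variants and payloads are simplified.
#[derive(Debug, Clone, PartialEq)]
enum Change {
    Added(String),
    AddedAndYanked(String),
    Yanked(String),
}

/// Hypothetical follow-up actions a consumer might take for a change.
#[derive(Debug, PartialEq)]
enum Action {
    QueueBuild(String),
    MarkYanked(String),
}

/// Map one change to an action, treating `AddedAndYanked` exactly like `Added`.
fn action_for(change: &Change) -> Action {
    match change {
        Change::Added(v) | Change::AddedAndYanked(v) => Action::QueueBuild(v.clone()),
        Change::Yanked(v) => Action::MarkYanked(v.clone()),
    }
}

/// Map an ordered list of changes to actions, preserving the input order.
fn plan(changes: &[Change]) -> Vec<Action> {
    changes.iter().map(action_for).collect()
}

fn main() {
    let changes = vec![
        Change::Added("foo 1.0.0".into()),
        Change::AddedAndYanked("bar 0.2.0".into()),
        Change::Yanked("foo 0.9.0".into()),
    ];
    for action in plan(&changes) {
        println!("{action:?}");
    }
}
```

One consequence of this simplification is that the yanked state of an AddedAndYanked version is dropped, so a consumer that cares about it would need a separate step.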