TL;DR
I've experimented with a managed Git backend (instead of libgit2).
In every scenario I tested, the managed backend performs better than libgit2. In the best case, it is more than ten times faster.
The prototype source code is available here: https://github.com/qmfrederik/quamotion.gitversioning.
Other managed Git backends (GitNet, GitSharp, NGit) also exist.
| Backend | Repository | Mean | Error | StdDev |
|---|---|---|---|---|
| managed | Cuemon | 6.585 ms | 0.0977 ms | 0.0914 ms |
| libgit2 | Cuemon | 12.946 ms | 0.1425 ms | 0.1333 ms |
| managed | SuperSocket | 7.583 ms | 0.1067 ms | 0.0998 ms |
| libgit2 | SuperSocket | 61.295 ms | 0.8168 ms | 0.7640 ms |
| managed | local | 1.938 ms | 0.0161 ms | 0.0151 ms |
| libgit2 | local | 24.805 ms | 0.3896 ms | 0.3253 ms |
| managed | xunit | 57.891 ms | 0.6076 ms | 0.5386 ms |
| libgit2 | xunit | 62.054 ms | 0.9426 ms | 0.7871 ms |
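To put the table in perspective, the speedup per repository is the libgit2 mean divided by the managed mean. The small Python helper below only reuses the numbers from the table; nothing is re-measured.

```python
# Mean times in milliseconds, copied from the benchmark table above.
means_ms = {
    "Cuemon": {"managed": 6.585, "libgit2": 12.946},
    "SuperSocket": {"managed": 7.583, "libgit2": 61.295},
    "local": {"managed": 1.938, "libgit2": 24.805},
    "xunit": {"managed": 57.891, "libgit2": 62.054},
}


def speedup(repo: str) -> float:
    """How many times faster the managed backend is: libgit2 mean / managed mean."""
    m = means_ms[repo]
    return m["libgit2"] / m["managed"]


if __name__ == "__main__":
    for repo in means_ms:
        print(f"{repo:12} {speedup(repo):5.2f}x")
```

This gives roughly 1.97x for Cuemon, 8.08x for SuperSocket, 12.8x for the local repository and 1.07x for xunit.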
Why
Nerdbank.GitVersioning uses LibGit2Sharp as its back-end, which comes with a few drawbacks:
- Performance - libgit2 is a general-purpose library, perhaps not geared towards read-only scenarios like nbgv's, and calls into it carry P/Invoke overhead.
- Maintainability - development on LibGit2Sharp appears to have stalled a bit (the last commit is from April this year).
- Portability - there's a very long list of issues about nbgv not working on various Linux distributions.
What
My goal was to implement a minimum viable Git backend that can calculate the Git height, and nothing more.
This includes:
- Read commits, trees and blobs from a local Git repository
- Support for 'packed' Git repositories (i.e. what you get after running `git gc` or after a fresh Git clone) and deltafied objects
- In-memory caching
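For readers unfamiliar with Git's on-disk format, the simplest case (an unpacked, "loose" object) can be read as sketched below: the file at `objects/<2 hex chars>/<38 hex chars>` is zlib-compressed, and the inflated data starts with a `<type> <size>\0` header. This is illustrative Python, not the prototype's .NET code; the function names are hypothetical, and `write_loose_object` exists only so the example can create an object to read back.

```python
import hashlib
import os
import tempfile
import zlib
from typing import Tuple


def loose_object_path(git_dir: str, object_id: str) -> str:
    """Loose objects live at <git_dir>/objects/<first 2 hex chars>/<remaining 38>."""
    return os.path.join(git_dir, "objects", object_id[:2], object_id[2:])


def read_loose_object(git_dir: str, object_id: str) -> Tuple[str, bytes]:
    """Inflate a loose object and split off its '<type> <size>\\0' header."""
    with open(loose_object_path(git_dir, object_id), "rb") as f:
        raw = zlib.decompress(f.read())
    header, _, body = raw.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError(f"corrupt object {object_id}: size mismatch")
    return obj_type, body


def write_loose_object(git_dir: str, obj_type: str, body: bytes) -> str:
    """Helper for the demo: store an object under the ID Git would compute (SHA-1 of header + body)."""
    raw = f"{obj_type} {len(body)}".encode("ascii") + b"\0" + body
    object_id = hashlib.sha1(raw).hexdigest()
    path = loose_object_path(git_dir, object_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(raw))
    return object_id


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as git_dir:
        oid = write_loose_object(git_dir, "blob", b"hello\n")
        print(oid, read_loose_object(git_dir, oid))
```

Packed objects (the common case after a fresh clone) need considerably more work: looking up the offset in the `.idx` file and resolving delta chains.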
I've also applied some of the performance suggestions made by @filipnavara and @djluck, such as:
- Using tree IDs to check whether files have changed, instead of parsing the full `version.json` contents
- Using the .NET Core JSON API (System.Text.Json) instead of Newtonsoft.Json
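The tree-ID trick works because a tree entry stores the object ID of the file's blob: if a commit and its parent list `version.json` with the same blob ID, the file is byte-for-byte unchanged and its contents need not be parsed; only when the IDs differ does the file have to be read. A minimal Python sketch, assuming the raw (inflated) tree bodies are already available and that only a root-level `version.json` matters (path filters and `version.json` files in subdirectories are out of scope here). All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def parse_tree(body: bytes) -> Dict[bytes, Tuple[bytes, str]]:
    """Map entry name -> (mode, hex object ID) for a raw Git tree body.

    Each entry is '<mode> <name>\\0' followed by a 20-byte binary SHA-1.
    """
    entries = {}
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode, name = body[pos:space], body[space + 1:nul]
        entries[name] = (mode, body[nul + 1:nul + 21].hex())
        pos = nul + 21
    return entries


def entry_id(tree_body: bytes, name: bytes) -> Optional[str]:
    """Object ID of a root-level entry, or None if the tree does not contain it."""
    entry = parse_tree(tree_body).get(name)
    return entry[1] if entry else None


def file_changed(parent_tree: bytes, child_tree: bytes, name: bytes = b"version.json") -> bool:
    """Compare blob IDs only; the blob itself is never inflated or parsed."""
    return entry_id(parent_tree, name) != entry_id(child_tree, name)


def build_tree(entries: List[Tuple[bytes, bytes, str]]) -> bytes:
    """Demo helper: serialize (mode, name, hex ID) triples into a raw tree body."""
    return b"".join(m + b" " + n + b"\0" + bytes.fromhex(h) for m, n, h in entries)
```

A real tree parser would typically stop at the first matching entry instead of building a dictionary of the whole tree.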
Not yet attempted / open for further exploration:
- Walking the Git commit graph via the commit-graph file (freshly cloned GitHub repositories do not appear to have one)
- The Git tree and `version.json` objects are fully loaded into memory before being parsed; performance could probably be improved further by reading only the data that is actually needed.
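The height calculation itself is a walk over parent links. The Python sketch below assumes this rule: a commit whose version matches HEAD's counts as 1 plus the maximum height of its parents, and a commit with a different version counts as 0. `parents` and `same_version` are hypothetical inputs (a parent map and a version-comparison callback); in a real backend, a commit-graph file, when present, could supply the parent lists without inflating each commit object. The walk is iterative and memoized, so very long histories do not hit a recursion limit and shared ancestors are visited once.

```python
from typing import Callable, Dict, List


def git_height(
    head: str,
    parents: Dict[str, List[str]],
    same_version: Callable[[str], bool],
) -> int:
    """Iterative post-order walk: a commit's height is computed after all of its parents'."""
    heights: Dict[str, int] = {}
    stack = [head]
    while stack:
        commit = stack[-1]
        if commit in heights:  # already computed via another path (merges)
            stack.pop()
            continue
        if not same_version(commit):  # version differs: stop walking here
            heights[commit] = 0
            stack.pop()
            continue
        pending = [p for p in parents.get(commit, []) if p not in heights]
        if pending:
            stack.extend(pending)  # compute parents first, revisit this commit later
            continue
        heights[commit] = 1 + max((heights[p] for p in parents.get(commit, [])), default=0)
        stack.pop()
    return heights[head]
```

For example, in a linear history `c1 <- c2 <- c3` where the version changed in `c2`, this returns 2 for `c3`.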
Validation & lessons learned
- It can be done: I ran tests on three popular GitHub repositories that use nbgv, and the Git height calculation appears to work.
- Keep it simple: most GitHub repositories that use nbgv do so in a very simple way (a standard `version.json` file in the repository root, no path filters, and so on). nbgv has a lot of configuration knobs, which may impact performance.
- Packed repositories have different performance characteristics than unpacked repositories: freshly cloned GitHub repositories perform very differently from local repositories, because all of their objects are stored in Git packs. It took some time to get performance on par with libgit2 for repositories with a large Git height (like xunit), but it looks good now.
- Room for improvement: I'm sure there's still room to squeeze out extra performance.
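Much of the cost in packed repositories comes from deltafied objects, which are stored as a list of instructions against a base object. The Python sketch below applies one delta in Git's pack delta format: two size varints, then "copy" instructions (high bit set; low bits select which offset and size bytes follow) and "insert" instructions (the opcode is the number of literal bytes that follow). It does not cover locating objects in `.pack`/`.idx` files or parsing OFS_DELTA/REF_DELTA headers, and the names are hypothetical.

```python
from typing import Tuple


def read_varint(delta: bytes, pos: int) -> Tuple[int, int]:
    """Little-endian base-128 size from a delta header; returns (value, new_pos)."""
    value = shift = 0
    while True:
        byte = delta[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def apply_delta(base: bytes, delta: bytes) -> bytes:
    source_size, pos = read_varint(delta, 0)
    target_size, pos = read_varint(delta, pos)
    if source_size != len(base):
        raise ValueError("delta does not match base object")
    out = bytearray()
    while pos < len(delta):
        cmd = delta[pos]
        pos += 1
        if cmd & 0x80:
            # Copy from base: bits 0-3 flag offset bytes, bits 4-6 flag size bytes.
            offset = size = 0
            for i in range(4):
                if cmd & (1 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            for i in range(3):
                if cmd & (0x10 << i):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            if size == 0:
                size = 0x10000  # a size of zero means 64 KiB
            out += base[offset:offset + size]
        elif cmd:
            # Insert: the next `cmd` bytes are literal data.
            out += delta[pos:pos + cmd]
            pos += cmd
        else:
            raise ValueError("reserved delta opcode 0")
    if len(out) != target_size:
        raise ValueError("delta produced an object of the wrong size")
    return bytes(out)
```

Because one object can be a delta against another delta, caching recently reconstructed bases (the in-memory caching mentioned earlier) avoids re-applying the same chain over and over.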
What's next
Obviously, that's up to the maintainers of this repository. Personally, I've spent too much time getting LibGit2Sharp to work on the platforms I care about (Visual Studio Code on Ubuntu, to name one) and want to move to a purely managed build task for calculating Git height. My preference is to keep using nbgv, so I can take this further and open a PR if there's interest in merging it.
@filipnavara mentioned he has a repository with a very large Git height (> 1500, IIRC). It would be interesting to run the benchmarks on that repository too, in both an unpacked and a packed state, to see how the managed implementation holds up (I'm guessing the benchmarks will uncover some bugs, too).