gix corpus - an extendable way to run algorithms and record their results for comparison
#858
Open
2 of 3 tasks
Labels
C-tracking-issue
An issue to track the progress of multiple PRs or issues
Generally, gix corpus maintains information about a corpus of git repositories and writes it into an SQLite database for later data analysis.
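To make the "information about a corpus, written into a database" idea concrete, here is a minimal sketch. The table names, column names, and task names are assumptions made for illustration only; they are not the schema that gix corpus uses.

```python
import sqlite3

# Hypothetical schema: every table and column name below is an illustrative
# assumption, not the layout used by gix corpus.
SCHEMA = """
CREATE TABLE IF NOT EXISTS repository (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE,
    size_bytes INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS run (
    id INTEGER PRIMARY KEY,
    repository_id INTEGER NOT NULL REFERENCES repository(id),
    task TEXT NOT NULL,
    duration_s REAL NOT NULL,
    error TEXT
);
"""


def open_corpus_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_run(conn, repo_path, size_bytes, task, duration_s, error=None):
    """Register the repository if it is new, then append one run result."""
    conn.execute(
        "INSERT OR IGNORE INTO repository (path, size_bytes) VALUES (?, ?)",
        (repo_path, size_bytes),
    )
    (repo_id,) = conn.execute(
        "SELECT id FROM repository WHERE path = ?", (repo_path,)
    ).fetchone()
    conn.execute(
        "INSERT INTO run (repository_id, task, duration_s, error) "
        "VALUES (?, ?, ?, ?)",
        (repo_id, task, duration_s, error),
    )
    conn.commit()
    return repo_id
```

Keeping repositories and runs in separate tables means the same repository can accumulate results from many tasks over time, which is what later comparison needs.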
The corpus should contain as many of the top-starred GitHub repositories smaller than 5GB as fit on one disk. With a 4TB budget that was 80K repositories, which still left enough space for worktree checkouts. Be sure to also add one of these 100GB repos by hand, for good measure.
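The selection rule above (top by stars, smaller than 5GB, until the disk budget is used up) could be sketched as below. The function name, the tuple layout of the candidates, and the choice to stop at the first repository that no longer fits are assumptions for illustration; the issue does not say how the list was actually produced.

```python
GB = 10**9
TB = 10**12


def select_repos(candidates, budget_bytes, max_repo_bytes=5 * GB):
    """Pick repositories by descending star count until the budget is full.

    candidates: iterable of (name, stars, size_bytes) tuples (hypothetical layout).
    budget_bytes: space available for clones, after reserving room for
        worktree checkouts.
    """
    selected, used = [], 0
    for name, _stars, size in sorted(candidates, key=lambda c: c[1], reverse=True):
        if size >= max_repo_bytes:
            continue  # only repositories smaller than the size limit qualify
        if used + size > budget_bytes:
            break  # assumption: stop at the first repository that does not fit
        selected.append(name)
        used += size
    return selected, used
```

As a sanity check on the numbers in the text: 80K repositories within a 4TB budget works out to at most about 50MB per repository on average, since part of the budget is kept free for worktrees.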
Initialization
Run commands
benchmarks that validate critical performance, like opening repositories, or resolving packs.
tree::Root tracing to record performance data about certain operations, akin to what git does, and store these spans in the database. These spans could be taken verbatim for analysis, ignoring their tree-structure at least at the beginning.
Analysis
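Storing tracing spans verbatim while ignoring their tree structure, and then asking a first simple question of them, might look like the sketch below. The span dictionary shape, the `span` table, and the span names are all hypothetical and only serve to illustrate the flat-rows idea.

```python
import sqlite3


def flatten_spans(span, out=None):
    """Walk the span tree depth-first and emit one (name, duration_s) row per
    span, deliberately dropping the parent/child relationship."""
    if out is None:
        out = []
    out.append((span["name"], span["duration_s"]))
    for child in span.get("children", []):
        flatten_spans(child, out)
    return out


def store_spans(conn, run_id, root_span):
    """Write all spans of one run as flat rows; returns the number of rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS span ("
        "run_id INTEGER NOT NULL, name TEXT NOT NULL, duration_s REAL NOT NULL)"
    )
    rows = [(run_id, name, dur) for name, dur in flatten_spans(root_span)]
    conn.executemany(
        "INSERT INTO span (run_id, name, duration_s) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


def slowest_span_names(conn, limit=5):
    """Example analysis question: which span names are slowest on average?"""
    return conn.execute(
        "SELECT name, AVG(duration_s) AS avg_s, COUNT(*) AS samples "
        "FROM span GROUP BY name ORDER BY avg_s DESC LIMIT ?",
        (limit,),
    ).fetchall()
```

Flat rows are enough for aggregate questions like this one; reconstructing the hierarchy would need an extra parent reference per row, which can be added later if the analysis calls for it.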
A few very simple commands to answer questions like
Ingestion Implementation
gix corpus MVP #897
gix corpus improvements #902
Analysis Implementation
Maybe at first we can limit the corpus run to specific repos that we check by hand in the corpus.db.
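One way to restrict a run to a few hand-picked repositories is an allow-list applied when reading the repository list from the database. This is only a sketch: the `repository` table, its `path` column, and the `only_paths` parameter are assumptions, not part of gix corpus.

```python
import sqlite3


def repos_for_run(conn, only_paths=None):
    """Return repository paths to process, optionally limited to an allow-list.

    Assumes a hypothetical `repository` table with a `path` column.
    """
    if not only_paths:
        query = "SELECT path FROM repository ORDER BY path"
        return [row[0] for row in conn.execute(query)]
    # One placeholder per path keeps the values parameterized.
    placeholders = ",".join("?" for _ in only_paths)
    query = (
        f"SELECT path FROM repository WHERE path IN ({placeholders}) ORDER BY path"
    )
    return [row[0] for row in conn.execute(query, list(only_paths))]
```

Paths in the allow-list that are not present in the database are silently ignored, so a typo yields a shorter run instead of an error; a stricter variant could compare the result against the requested list.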