
Commit 220ee02

Add the experimental git survey command to analyze (large) local repositories (#667)
This command is inspired by [`git sizer`](https://github.com/github/git-sizer), having the advantage of being much closer to the internals of Git. The intention is to provide a built-in command that can be used to analyze large repositories for performance and scaling problems, for growth over time, and to correlate with other measurements (in particular with Trace2 data collected e.g. via https://github.com/git-ecosystem/trace2receiver/).
2 parents 648c5a2 + 45c981e commit 220ee02

11 files changed (+3215, -0 lines changed)


.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -165,6 +165,7 @@
 /git-submodule
 /git-submodule--helper
 /git-subtree
+/git-survey
 /git-svn
 /git-switch
 /git-symbolic-ref
```

Documentation/config.txt

Lines changed: 2 additions & 0 deletions
```diff
@@ -531,6 +531,8 @@ include::config/status.txt[]
 
 include::config/submodule.txt[]
 
+include::config/survey.txt[]
+
 include::config/tag.txt[]
 
 include::config/tar.txt[]
```

Documentation/config/survey.txt

Lines changed: 41 additions & 0 deletions
```diff
@@ -0,0 +1,41 @@
+survey.namerev::
+Boolean to show/hide `git name-rev` information for
+each reported commit and the containing commit of each
+reported tree and blob.
+
+survey.progress::
+Boolean to show/hide progress information. Defaults to
+true when interactive (stderr is bound to a TTY).
+
+survey.showBlobSizes::
+A non-negative integer value. Requests details on the <n>
+largest file blobs by size in bytes. Provides a default
+value for `--blob-sizes=<n>` in linkgit:git-survey[1].
+
+survey.showCommitParents::
+A non-negative integer value. Requests details on the <n>
+commits with the most parents. Provides a default
+value for `--commit-parents=<n>` in linkgit:git-survey[1].
+
+survey.showCommitSizes::
+A non-negative integer value. Requests details on the <n>
+largest commits by size in bytes. Generally, these are the
+commits with the largest commit messages. Provides a default
+value for `--commit-sizes=<n>` in linkgit:git-survey[1].
+
+survey.showTreeEntries::
+A non-negative integer value. Requests details on the <n>
+trees (directories) with the most entries (files
+and subdirectories). Provides a default value for
+`--tree-entries=<n>` in linkgit:git-survey[1].
+
+survey.showTreeSizes::
+A non-negative integer value. Requests details on the <n>
+largest trees (directories) by size in bytes. This set
+will usually be equal to the `survey.showTreeEntries`
+set, but may be skewed by very long file or subdirectory
+entry names. Provides a default value for
+`--tree-sizes=<n>` in linkgit:git-survey[1].
+
+survey.verbose::
+Boolean to show/hide verbose output. Defaults to false.
```

Documentation/git-survey.txt

Lines changed: 108 additions & 0 deletions
```diff
@@ -0,0 +1,108 @@
+git-survey(1)
+=============
+
+NAME
+----
+git-survey - EXPERIMENTAL: Measure various repository dimensions of scale
+
+SYNOPSIS
+--------
+[verse]
+(EXPERIMENTAL!) `git survey` <options>
+
+DESCRIPTION
+-----------
+
+Survey the repository and measure various dimensions of scale.
+
+As repositories grow to "monorepo" size, certain data shapes can cause
+performance problems. `git-survey` attempts to measure and report on
+known problem areas.
+
+Ref Selection and Reachable Objects
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In this first analysis phase, `git survey` will iterate over the set of
+requested branches, tags, and other refs, walk all of the reachable
+commits, trees, and blobs, and generate various statistics.
+
```
```diff
+OPTIONS
+-------
+
+--progress::
+Show progress. This is automatically enabled when interactive.
+
+--json::
+Print results in JSON rather than in a human-friendly format.
+
+--[no-]name-rev::
+Print `git name-rev` output for each commit, tree, and blob.
+Defaults to true.
+
+Ref Selection
+~~~~~~~~~~~~~
+
+The following options control the set of refs that `git survey` will examine.
+By default, `git survey` will look at tags, local branches, and remote refs.
+If any of the following options are given, the default set is cleared and
+only refs for the given options are added.
+
+--all-refs::
+Use all refs. This includes local branches, tags, remote refs,
+notes, and stashes. This option overrides all of the following.
+
+--branches::
+Add local branches (`refs/heads/`) to the set.
+
+--tags::
+Add tags (`refs/tags/`) to the set.
+
+--remotes::
+Add remote branches (`refs/remotes/`) to the set.
+
+--detached::
+Add HEAD to the set.
+
+--other::
+Add notes (`refs/notes/`) and stashes (`refs/stash`) to the set.
+
```
```diff
+Large Item Selection
+~~~~~~~~~~~~~~~~~~~~
+
+The following options control the optional display of large items under
+various dimensions of scale. The OIDs of the largest `n` objects will be
+displayed largest first. For each option, `n` defaults to 10.
+
+--commit-parents::
+Shows the OIDs of the commits with the most parent commits.
+
+--commit-sizes::
+Shows the OIDs of the largest commits by size in bytes. These are
+usually the commits with the largest commit messages.
+
+--tree-entries::
+Shows the OIDs of the trees with the most entries. These are the
+directories with the most files or subdirectories.
+
+--tree-sizes::
+Shows the OIDs of the largest trees by size in bytes. This set
+will usually be the same as the one reported by `--tree-entries`
+unless skewed by very long entry names.
+
+--blob-sizes::
+Shows the OIDs of the largest blobs by size in bytes.
+
```
```diff
+OUTPUT
+------
+
+By default, `git survey` will print information about the repository in a
+human-readable format that includes overviews and tables.
+
+CONFIGURATION
+-------------
+
+include::config/survey.txt[]
+
+GIT
+---
+Part of the linkgit:git[1] suite
```

Makefile

Lines changed: 1 addition & 0 deletions
```diff
@@ -1332,6 +1332,7 @@ BUILTIN_OBJS += builtin/sparse-checkout.o
 BUILTIN_OBJS += builtin/stash.o
 BUILTIN_OBJS += builtin/stripspace.o
 BUILTIN_OBJS += builtin/submodule--helper.o
+BUILTIN_OBJS += builtin/survey.o
 BUILTIN_OBJS += builtin/symbolic-ref.o
 BUILTIN_OBJS += builtin/tag.o
 BUILTIN_OBJS += builtin/unpack-file.o
```

builtin.h

Lines changed: 1 addition & 0 deletions
```diff
@@ -229,6 +229,7 @@ int cmd_status(int argc, const char **argv, const char *prefix);
 int cmd_stash(int argc, const char **argv, const char *prefix);
 int cmd_stripspace(int argc, const char **argv, const char *prefix);
 int cmd_submodule__helper(int argc, const char **argv, const char *prefix);
+int cmd_survey(int argc, const char **argv, const char *prefix);
 int cmd_switch(int argc, const char **argv, const char *prefix);
 int cmd_symbolic_ref(int argc, const char **argv, const char *prefix);
 int cmd_tag(int argc, const char **argv, const char *prefix);
```
