Commit 92b1ea6

Merge branch 'ds/commit-graph-incremental'
The commits in a repository can be described by multiple commit-graph
files now, which allows the commit-graph files to be updated
incrementally.

* ds/commit-graph-incremental:
  commit-graph: test verify across alternates
  commit-graph: normalize commit-graph filenames
  commit-graph: test --split across alternate without --split
  commit-graph: test octopus merges with --split
  commit-graph: clean up chains after flattened write
  commit-graph: verify chains with --shallow mode
  commit-graph: create options for split files
  commit-graph: expire commit-graph files
  commit-graph: allow cross-alternate chains
  commit-graph: merge commit-graph chains
  commit-graph: add --split option to builtin
  commit-graph: write commit-graph chains
  commit-graph: rearrange chunk count logic
  commit-graph: add base graphs chunk
  commit-graph: load commit-graph chains
  commit-graph: rename commit_compare to oid_compare
  commit-graph: prepare for commit-graph chains
  commit-graph: document commit-graph chains
2 parents 209f075 + 5b15eb3 commit 92b1ea6

10 files changed, 1414 insertions(+), 74 deletions(-)

Documentation/git-commit-graph.txt

Lines changed: 24 additions & 2 deletions
@@ -10,7 +10,7 @@ SYNOPSIS
 --------
 [verse]
 'git commit-graph read' [--object-dir <dir>]
-'git commit-graph verify' [--object-dir <dir>]
+'git commit-graph verify' [--object-dir <dir>] [--shallow]
 'git commit-graph write' <options> [--object-dir <dir>]

@@ -26,7 +26,7 @@ OPTIONS
 Use given directory for the location of packfiles and commit-graph
 file. This parameter exists to specify the location of an alternate
 that only has the objects directory, not a full `.git` directory. The
-commit-graph file is expected to be at `<dir>/info/commit-graph` and
+commit-graph file is expected to be in the `<dir>/info` directory and
 the packfiles are expected to be in `<dir>/pack`.

@@ -51,6 +51,25 @@ or `--stdin-packs`.)
 +
 With the `--append` option, include all commits that are present in the
 existing commit-graph file.
++
+With the `--split` option, write the commit-graph as a chain of multiple
+commit-graph files stored in `<dir>/info/commit-graphs`. The new commits
+not already in the commit-graph are added in a new "tip" file. This file
+is merged with the existing file if the following merge conditions are
+met:
++
+* If `--size-multiple=<X>` is not specified, let `X` equal 2. If the new
+tip file would have `N` commits and the previous tip has `M` commits and
+`X` times `N` is greater than `M`, instead merge the two files into a
+single file.
++
+* If `--max-commits=<M>` is specified with `M` a positive integer, and the
+new tip file would have more than `M` commits, then instead merge the new
+tip with the previous tip.
++
+Finally, if `--expire-time=<datetime>` is not specified, let `datetime`
+be the current time. After writing the split commit-graph, delete all
+unused commit-graph files whose modified times are older than `datetime`.

 'read'::

@@ -61,6 +80,9 @@ Used for debugging purposes.

 Read the commit-graph file and verify its contents against the object
 database. Used to check for corrupted data.
++
+With the `--shallow` option, only check the tip commit-graph file in
+a chain of split commit-graphs.


 EXAMPLES

Documentation/technical/commit-graph-format.txt

Lines changed: 9 additions & 2 deletions
@@ -44,8 +44,9 @@ HEADER:

   1-byte number (C) of "chunks"

-  1-byte (reserved for later use)
-      Current clients should ignore this value.
+  1-byte number (B) of base commit-graphs
+      We infer the length (H*B) of the Base Graphs chunk
+      from this value.
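
As an illustration of how a reader might use the new header byte, here is a minimal Python sketch that decodes the 8-byte header and derives the Base Graphs chunk length H*B. The `CGPH` signature, the version byte, and the hash-version byte come from the rest of the format document rather than from this hunk; the function name `parse_header` and the returned dictionary keys are hypothetical.

```python
import struct

# Assumed layout of the 8-byte header: a 4-byte signature followed by
# four 1-byte fields (version, hash version, chunk count C, base count B).
HEADER = struct.Struct(">4sBBBB")

# Hash version 1 means SHA-1 (H = 20 bytes); other versions are not handled here.
HASH_LENGTHS = {1: 20}


def parse_header(data: bytes) -> dict:
    """Decode the header and derive the Base Graphs chunk length (H * B)."""
    signature, version, hash_version, num_chunks, num_base = HEADER.unpack_from(data)
    if signature != b"CGPH":
        raise ValueError("missing commit-graph signature")
    if hash_version not in HASH_LENGTHS:
        raise ValueError(f"unsupported hash version {hash_version}")
    hash_len = HASH_LENGTHS[hash_version]
    return {
        "version": version,
        "hash_len": hash_len,
        "num_chunks": num_chunks,
        "num_base_graphs": num_base,
        "base_chunk_len": hash_len * num_base,
    }


# A made-up header: version 1, SHA-1, 4 chunks, 2 base graphs.
header = parse_header(b"CGPH" + bytes([1, 1, 4, 2]))
```

With two base graphs and 20-byte hashes, the derived chunk length is 40 bytes.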

 CHUNK LOOKUP:

@@ -92,6 +93,12 @@ CHUNK DATA:
       positions for the parents until reaching a value with the most-significant
       bit on. The other bits correspond to the position of the last parent.

+  Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional]
+      This list of H-byte hashes describes a set of B commit-graph files that
+      form a commit-graph chain. The graph position for the ith commit in this
+      file's OID Lookup chunk is equal to i plus the number of commits in all
+      base graphs. If B is non-zero, this chunk must exist.
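
The chunk body described above is just B hashes laid end to end. A small Python sketch of splitting it into hex strings follows; the helper name `split_base_hashes` is hypothetical, and the default of 20 bytes assumes SHA-1.

```python
def split_base_hashes(chunk: bytes, num_base: int, hash_len: int = 20) -> list:
    """Split a Base Graphs chunk into num_base hex-encoded hashes."""
    if len(chunk) != num_base * hash_len:
        raise ValueError("chunk length does not equal H * B")
    return [
        chunk[i * hash_len:(i + 1) * hash_len].hex()
        for i in range(num_base)
    ]


# Two made-up 20-byte hashes, concatenated as they would sit in the chunk.
base_hashes = split_base_hashes(bytes([0xAA]) * 20 + bytes([0xBB]) * 20, num_base=2)
```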
+
 TRAILER:

   H-byte HASH-checksum of all of the above.

Documentation/technical/commit-graph.txt

Lines changed: 195 additions & 0 deletions
@@ -127,6 +127,197 @@ Design Details
 helpful for these clones, anyway. The commit-graph will not be read or
 written when shallow commits are present.

+Commit-Graph Chains
+-------------------
+
+Typically, repos grow with near-constant velocity (commits per day). Over time,
+the number of commits added by a fetch operation is much smaller than the
+number of commits in the full history. By creating a "chain" of commit-graphs,
+we enable fast writes of new commit data without rewriting the entire commit
+history -- at least, most of the time.
+
+## File Layout
+
+A commit-graph chain uses multiple files, and we use a fixed naming convention
+to organize these files. Each commit-graph file has a name
+`$OBJDIR/info/commit-graphs/graph-{hash}.graph` where `{hash}` is the
+hex-valued hash stored in the footer of that file (which is a hash of the file's
+contents before that hash). For a chain of commit-graph files, a plain-text
+file at `$OBJDIR/info/commit-graphs/commit-graph-chain` contains the
+hashes for the files in order from "lowest" to "highest".
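
The naming convention above can be sketched in a few lines of Python. This is only an illustration under the stated layout: `chain_graph_paths` is a hypothetical helper, and the short hashes in the demo are made up.

```python
import tempfile
from pathlib import Path


def chain_graph_paths(objdir: Path) -> list:
    """Return the graph-{hash}.graph paths named by the chain file, lowest first."""
    graphs_dir = objdir / "info" / "commit-graphs"
    chain_file = graphs_dir / "commit-graph-chain"
    hashes = [line.strip() for line in chain_file.read_text().splitlines() if line.strip()]
    return [graphs_dir / f"graph-{h}.graph" for h in hashes]


# Demo against a throwaway object directory with a three-line chain file.
with tempfile.TemporaryDirectory() as tmp:
    objdir = Path(tmp)
    (objdir / "info" / "commit-graphs").mkdir(parents=True)
    (objdir / "info" / "commit-graphs" / "commit-graph-chain").write_text("aaaa\nbbbb\ncccc\n")
    names = [p.name for p in chain_graph_paths(objdir)]
```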
+
+For example, if the `commit-graph-chain` file contains the lines
+
+```
+{hash0}
+{hash1}
+{hash2}
+```
+
+then the commit-graph chain looks like the following diagram:
+
+ +-----------------------+
+ |  graph-{hash2}.graph  |
+ +-----------------------+
+          |
+ +-----------------------+
+ |                       |
+ |  graph-{hash1}.graph  |
+ |                       |
+ +-----------------------+
+          |
+ +-----------------------+
+ |                       |
+ |                       |
+ |                       |
+ |  graph-{hash0}.graph  |
+ |                       |
+ |                       |
+ |                       |
+ +-----------------------+
+
+Let X0 be the number of commits in `graph-{hash0}.graph`, X1 be the number of
+commits in `graph-{hash1}.graph`, and X2 be the number of commits in
+`graph-{hash2}.graph`. If a commit appears in position i in `graph-{hash2}.graph`,
+then we interpret this as being the commit in position (X0 + X1 + i), and that
+will be used as its "graph position". The commits in `graph-{hash2}.graph` use these
+positions to refer to their parents, which may be in `graph-{hash1}.graph` or
+`graph-{hash0}.graph`. We can navigate to an arbitrary commit in position j by checking
+its containment in the intervals [0, X0), [X0, X0 + X1), [X0 + X1, X0 + X1 +
+X2).
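
The interval check described above amounts to a search over cumulative layer sizes. A minimal Python sketch, assuming the layer sizes X0, X1, X2 are already known; the function name `locate` and the sizes in the demo are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(position: int, layer_sizes: list) -> tuple:
    """Map a global graph position to (layer index, position within that layer).

    layer_sizes lists the commit counts base first, e.g. [X0, X1, X2].
    """
    ends = list(accumulate(layer_sizes))  # [X0, X0 + X1, X0 + X1 + X2]
    total = ends[-1] if ends else 0
    if not 0 <= position < total:
        raise IndexError("graph position out of range")
    layer = bisect_right(ends, position)  # first layer whose interval contains position
    start = ends[layer - 1] if layer else 0
    return layer, position - start


sizes = [5, 3, 2]  # made-up values for X0, X1, X2
first_of_base = locate(0, sizes)
first_of_middle = locate(5, sizes)
last_of_tip = locate(9, sizes)
```

The inverse direction is the formula from the text: position i in the tip layer has graph position X0 + X1 + i.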
+
+Each commit-graph file (except the base, `graph-{hash0}.graph`) contains data
+specifying the hashes of all files in the lower layers. In the above example,
+`graph-{hash1}.graph` contains `{hash0}` while `graph-{hash2}.graph` contains
+`{hash0}` and `{hash1}`.
+
+## Merging commit-graph files
+
+If we only added a new commit-graph file on every write, we would run into a
+linear search problem through many commit-graph files. Instead, we use a merge
+strategy to decide when the stack should collapse some number of levels.
+
+The diagram below shows such a collapse. As a set of new commits is added, it
+is determined by the merge strategy that the files should collapse to
+`graph-{hash1}`. Thus, the new commits, the commits in `graph-{hash2}` and
+the commits in `graph-{hash1}` should be combined into a new `graph-{hash3}`
+file.
+
+                            +---------------------+
+                            |                     |
+                            |    (new commits)    |
+                            |                     |
+                            +---------------------+
+                            |                     |
+ +-----------------------+  +---------------------+
+ |  graph-{hash2}        |->|                     |
+ +-----------------------+  +---------------------+
+          |                 |                     |
+ +-----------------------+  +---------------------+
+ |                       |  |                     |
+ |  graph-{hash1}        |->|                     |
+ |                       |  |                     |
+ +-----------------------+  +---------------------+
+          |                  tmp_graphXXX
+ +-----------------------+
+ |                       |
+ |                       |
+ |                       |
+ |  graph-{hash0}        |
+ |                       |
+ |                       |
+ |                       |
+ +-----------------------+
+
+During this process, the commits to write are combined and sorted, and we write
+the contents to a temporary file, all while holding a `commit-graph-chain.lock`
+lock-file. When the file is flushed, we rename it to `graph-{hash3}`
+according to the computed `{hash3}`. Finally, we write the new chain data to
+`commit-graph-chain.lock`, listed from "lowest" to "highest" as in the File
+Layout section:
+
+```
+{hash0}
+{hash3}
+```
+
+We then close the lock-file.
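
The lock-file step can be sketched as follows. This is a hedged illustration, not Git's implementation: it assumes the usual lock-file convention in which the `.lock` file is created exclusively and then renamed over `commit-graph-chain` once its contents are complete. The helper name `write_chain` and the short hashes are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def write_chain(graphs_dir: Path, hashes: list) -> None:
    """Write the chain (base first) through commit-graph-chain.lock."""
    chain = graphs_dir / "commit-graph-chain"
    lock = graphs_dir / "commit-graph-chain.lock"
    # O_EXCL makes creation fail if another writer already holds the lock.
    fd = os.open(lock, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        with os.fdopen(fd, "w") as f:
            f.write("".join(h + "\n" for h in hashes))
            f.flush()
            os.fsync(f.fileno())
        # Assumption: releasing the lock means renaming it over the chain file.
        os.replace(lock, chain)
    except BaseException:
        if lock.exists():
            lock.unlink()
        raise


with tempfile.TemporaryDirectory() as tmp:
    graphs_dir = Path(tmp)
    write_chain(graphs_dir, ["hash0", "hash3"])
    chain_text = (graphs_dir / "commit-graph-chain").read_text()
    lock_left_behind = (graphs_dir / "commit-graph-chain.lock").exists()
```

Readers that open `commit-graph-chain` during the write see either the old chain or the new one, never a partial file.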
+
+## Merge Strategy
+
+When writing a set of commits that do not exist in the commit-graph stack of
+height N, we default to creating a new file at level N + 1. We then decide to
+merge with the Nth level if one of two conditions holds:
+
+  1. `--size-multiple=<X>` is specified or X = 2, and the number of commits in
+     level N is less than X times the number of commits in level N + 1.
+
+  2. `--max-commits=<C>` is specified with non-zero C and the number of commits
+     in level N + 1 is more than C commits.
+
+This decision cascades down the levels: when we merge a level we create a new
+set of commits that is then compared against the next level.
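
The cascade can be sketched as a loop over commit counts. This is a simplified model under stated assumptions: merged counts are simply added (a real merge could drop duplicate commits), the comparisons follow the strict inequalities in the text, and `plan_merge` is a hypothetical name.

```python
def plan_merge(stack: list, new_commits: int, size_multiple: int = 2, max_commits: int = 0) -> tuple:
    """Decide how far a new tip collapses into the existing stack.

    stack holds commit counts per level, base first. Returns the levels
    left untouched and the commit count of the resulting tip file.
    """
    kept = list(stack)
    tip = new_commits
    while kept and (
        kept[-1] < size_multiple * tip              # condition 1: size multiple
        or (max_commits and tip > max_commits)      # condition 2: max commits
    ):
        tip += kept.pop()  # merge with the level below, then re-check
    return kept, tip


no_merge = plan_merge([1000, 100], 10)
one_merge = plan_merge([1000, 100], 60)
full_collapse = plan_merge([1000, 100], 10, max_commits=5)
```

In the second call the new tip (60) merges with the 100-commit level because 100 < 2 * 60, and the resulting 160 commits stop there because 1000 is not less than 320.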
+
+The first condition bounds the number of levels to be logarithmic in the total
+number of commits. The second condition bounds the total number of commits in
+a `graph-{hashN}` file and not in the `commit-graph` file, preventing
+significant performance issues when the stack merges and another process only
+partially reads the previous stack.
+
+The merge strategy values (2 for the size multiple, 64,000 for the maximum
+number of commits) could be extracted into config settings for full
+flexibility.
+
+## Deleting graph-{hash} files
+
+After a new tip file is written, some `graph-{hash}` files may no longer
+be part of a chain. It is important to remove these files from disk, eventually.
+The main reason to delay removal is that another process could read the
+`commit-graph-chain` file before it is rewritten, but then look for the
+`graph-{hash}` files after they are deleted.
+
+To allow holding old split commit-graphs for a while after they are unreferenced,
+we update the modified times of the files when they become unreferenced. Then,
+we scan the `$OBJDIR/info/commit-graphs/` directory for `graph-{hash}`
+files whose modified times are older than a given expiry window. This window
+defaults to zero, but can be changed using command-line arguments or a config
+setting.
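
A rough Python sketch of the scan described above, assuming the caller already knows which hashes the current chain references. It covers only the deletion pass, not the step that refreshes modified times when files become unreferenced; `expire_graphs` and the two-letter hashes are hypothetical.

```python
import os
import tempfile
import time
from pathlib import Path


def expire_graphs(graphs_dir: Path, chain_hashes: set, expire_before: float) -> list:
    """Delete unreferenced graph-{hash}.graph files last modified before expire_before."""
    removed = []
    for path in sorted(graphs_dir.glob("graph-*.graph")):
        graph_hash = path.name[len("graph-"):-len(".graph")]
        if graph_hash in chain_hashes:
            continue  # still part of the chain
        if path.stat().st_mtime < expire_before:
            path.unlink()
            removed.append(path.name)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    graphs_dir = Path(tmp)
    for name in ("aa", "bb", "cc"):
        (graphs_dir / f"graph-{name}.graph").write_bytes(b"")
    # "bb" is unreferenced and old; "cc" is unreferenced but freshly touched.
    os.utime(graphs_dir / "graph-bb.graph", (1000, 1000))
    removed = expire_graphs(graphs_dir, {"aa"}, time.time() - 3600)
    remaining = sorted(p.name for p in graphs_dir.iterdir())
```

With a one-hour window, only the old unreferenced file is removed; the recently unreferenced one survives for readers that may still hold the previous chain.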
+
+## Chains across multiple object directories
+
+In a repo with alternates, we look for the `commit-graph-chain` file starting
+in the local object directory and then in each alternate. The first file that
+exists defines our chain. As we look for the `graph-{hash}` files for
+each `{hash}` in the chain file, we follow the same pattern for the host
+directories.
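
The two-stage lookup above might be sketched like this in Python, assuming the caller supplies the local object directory first and the alternates after it. The names `resolve_chain` and `find_graph` and the directory names in the demo are hypothetical.

```python
import tempfile
from pathlib import Path


def find_graph(object_dirs: list, graph_hash: str):
    """Return the first graph-{hash}.graph found, searching local then alternates."""
    for objdir in object_dirs:
        candidate = objdir / "info" / "commit-graphs" / f"graph-{graph_hash}.graph"
        if candidate.is_file():
            return candidate
    return None


def resolve_chain(object_dirs: list) -> list:
    """Use the first commit-graph-chain file that exists, then locate each layer."""
    for objdir in object_dirs:
        chain = objdir / "info" / "commit-graphs" / "commit-graph-chain"
        if chain.is_file():
            return [find_graph(object_dirs, h) for h in chain.read_text().split()]
    return []


with tempfile.TemporaryDirectory() as tmp:
    local, alternate = Path(tmp, "local"), Path(tmp, "alternate")
    for objdir in (local, alternate):
        (objdir / "info" / "commit-graphs").mkdir(parents=True)
    # The fork's chain names a base layer that lives only in the alternate.
    (local / "info" / "commit-graphs" / "commit-graph-chain").write_text("h0\nh1\n")
    (alternate / "info" / "commit-graphs" / "graph-h0.graph").write_bytes(b"")
    (local / "info" / "commit-graphs" / "graph-h1.graph").write_bytes(b"")
    resolved = resolve_chain([local, alternate])
    owners = [p.parents[2].name for p in resolved]
```

Recording which directory each layer came from is what lets a later write notice that the base has moved.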
+
+This allows commit-graphs to be split across multiple forks in a fork network.
+The typical case is a large "base" repo with many smaller forks.
+
+As the base repo advances, it will likely update and merge its commit-graph
+chain more frequently than the forks. If a fork updates its commit-graph after
+the base repo, then it should "reparent" the commit-graph chain onto the new
+chain in the base repo. When reading each `graph-{hash}` file, we track
+the object directory containing it. During a write of a new commit-graph file,
+we check for any changes in the source object directory and read the
+`commit-graph-chain` file for that source and create a new file based on those
+files. During this "reparent" operation, we necessarily need to collapse all
+levels in the fork, as all of the files are invalid against the new base file.
+
+It is crucial to be careful when cleaning up "unreferenced" `graph-{hash}.graph`
+files in this scenario. It falls to the user to define the proper settings for
+their custom environment:
+
+ 1. When merging levels in the base repo, the unreferenced files may still be
+    referenced by chains from fork repos.
+
+ 2. The expiry time should be set to a length of time such that every fork has
+    time to recompute its commit-graph chain to "reparent" onto the new base
+    file(s).
+
+ 3. If the commit-graph chain is updated in the base, the fork will not have
+    access to the new chain until its chain is updated to reference those files.
+    (This may change in the future [5].)
+
 Related Links
 -------------
 [0] https://bugs.chromium.org/p/git/issues/detail?id=8
@@ -153,3 +344,7 @@ Related Links

 [4] https://public-inbox.org/git/[email protected]/T/#u
     A patch to remove the ahead-behind calculation from 'status'.
+
+[5] https://public-inbox.org/git/[email protected]/
+    A discussion of a "two-dimensional graph position" that can allow reading
+    multiple commit-graph chains at the same time.
