Commit 23fcc98

doc: technical details about the index file format
* Clarify "string of unsigned bytes";

* Blob has two variants (regular file vs symlink), not (blob vs symlink);

* Clarify permission mode bits;

* Clarify ce_namelen() "too long to fit in the length field" case;

* Clarify "." etc are forbidden as path components;

* Match the description with the internal wording "cache-tree";

* All types of extension begin with signature and length as explained in
  the first part. Don't repeat the "length" part in the description of
  each extension (can be mistaken as if there is a separate 32-bit size
  field inside the extension), but state what the signature for each
  extension is.

* Don't say "Extension tag", as we have said "Extension signature" in the
  first part---be consistent;

* Clarify the invalidation of cache-tree entries;

* Correct description on subtree_nr field in the cache-tree;

* Clarify the order of entries in cache-tree;

Signed-off-by: Junio C Hamano <[email protected]>
1 parent 8c7d051 commit 23fcc98

1 file changed

+57
-37
lines changed

Documentation/technical/index-format.txt

Lines changed: 57 additions & 37 deletions
@@ -9,21 +9,21 @@ GIT index format
 - A 12-byte header consisting of

 4-byte signature:
-The signature is { 'D', 'I', 'R', 'C' }
+The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache")

 4-byte version number:
 The current supported versions are 2 and 3.

 32-bit number of index entries.

-- A number of sorted index entries
+- A number of sorted index entries (see below).

 - Extensions

 Extensions are identified by signature. Optional extensions can
 be ignored if GIT does not understand them.

-GIT currently supports tree cache and resolve undo extensions.
+GIT currently supports cached tree and resolve undo extensions.

 4-byte extension signature. If the first byte is 'A'..'Z' the
 extension is optional and can be ignored.
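
The 12-byte header and the optional-extension rule in this hunk can be sketched in Python as follows. This is an illustration rather than Git's own code: the function names are made up here, and big-endian (network) byte order is assumed because the full format document states it outside this hunk.

```python
import struct


def parse_index_header(data: bytes):
    """Return (version, entry_count) read from the first 12 bytes of an index file.

    Hypothetical helper; assumes network (big-endian) byte order.
    """
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("bad index signature: %r" % signature)
    if version not in (2, 3):
        # The text above lists 2 and 3 as the supported versions.
        raise ValueError("unsupported index version: %d" % version)
    return version, entry_count


def extension_is_optional(signature: bytes) -> bool:
    """An extension is optional when its first signature byte is 'A'..'Z'."""
    return b"A" <= signature[:1] <= b"Z"
```

A reader that does not understand an extension could call `extension_is_optional()` on its 4-byte signature to decide whether skipping it is safe.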
@@ -38,8 +38,9 @@ GIT index format
 == Index entry

 Index entries are sorted in ascending order on the name field,
-interpreted as a string of unsigned bytes. Entries with the same
-name are sorted by their stage field.
+interpreted as a string of unsigned bytes (i.e. memcmp() order, no
+localization, no special casing of directory separator '/'). Entries
+with the same name are sorted by their stage field.

 32-bit ctime seconds, the last time a file's metadata changed
 this is stat(2) data
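
To make the sort rule in this hunk concrete, here is a small Python sketch. It relies on Python comparing `bytes` objects byte by byte as unsigned values, which matches the "memcmp() order" wording; the entry tuples and the key function are illustrative stand-ins, not Git data structures.

```python
def index_sort_key(entry):
    """Sort key for a (name, stage) pair: raw name bytes first, then stage."""
    name, stage = entry
    return (name, stage)


# Hypothetical entries: names as raw bytes, stage as a small integer.
entries = [
    (b"a/b", 0),
    (b"\xc3\xa9", 0),   # a non-ASCII name; its bytes compare as unsigned values
    (b"ab", 2),
    (b"a.b", 0),
    (b"ab", 1),
    (b"a-b", 0),
    (b"z", 0),
]

ordered = sorted(entries, key=index_sort_key)
for name, stage in ordered:
    print(name, stage)
```

Because '/' is not special-cased, `a-b` (0x2d) and `a.b` (0x2e) sort before `a/b` (0x2f), and the name starting with byte 0xc3 sorts after `z`.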
@@ -62,12 +63,13 @@ GIT index format
 32-bit mode, split into (high to low bits)

 4-bit object type
-valid values in binary are 1000 (blob), 1010 (symbolic link)
+valid values in binary are 1000 (regular file), 1010 (symbolic link)
 and 1110 (gitlink)

 3-bit unused

-9-bit unix permission (only 0755 and 0644 are valid)
+9-bit unix permission. Only 0755 and 0644 are valid for regular files.
+Symbolic links and gitlinks have value 0 in this field.

 32-bit uid
 this is stat(2) data
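
The mode layout clarified in this hunk can be checked with a short Python sketch. It assumes the three listed fields occupy the low 16 bits of the 32-bit value; the helper names are hypothetical.

```python
OBJECT_TYPES = {
    0b1000: "regular file",
    0b1010: "symbolic link",
    0b1110: "gitlink",
}


def split_mode(mode: int):
    """Split a mode into (object type, unused bits, permission bits)."""
    object_type = (mode >> 12) & 0xF   # 4-bit object type
    unused = (mode >> 9) & 0x7         # 3-bit unused
    permission = mode & 0x1FF          # 9-bit unix permission
    return object_type, unused, permission


def describe_mode(mode: int) -> str:
    """Return the object type name, or raise ValueError for an invalid mode."""
    object_type, _unused, permission = split_mode(mode)
    if object_type not in OBJECT_TYPES:
        raise ValueError("unknown object type: {:04b}".format(object_type))
    if object_type == 0b1000:
        valid = permission in (0o755, 0o644)   # regular files only
    else:
        valid = permission == 0                # symlinks and gitlinks
    if not valid:
        raise ValueError("invalid permission bits: {:o}".format(permission))
    return OBJECT_TYPES[object_type]
```

For example, `describe_mode(0o100644)` yields "regular file", while `describe_mode(0o100600)` raises `ValueError` because 0600 is not one of the two permitted permission values.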
@@ -76,19 +78,20 @@ GIT index format
 this is stat(2) data

 32-bit file size
-This is the on-disk size from stat(2)
+This is the on-disk size from stat(2), truncated to 32-bit.

 160-bit SHA-1 for the represented object

-A 16-bit field split into (high to low bits)
+A 16-bit 'flags' field split into (high to low bits)

 1-bit assume-valid flag

 1-bit extended flag (must be zero in version 2)

 2-bit stage (during merge)

-12-bit name length if the length is less than 0x0FFF
+12-bit name length if the length is less than 0xFFF; otherwise 0xFFF
+is stored in this field.

 (Version 3) A 16-bit field, only applicable if the "extended flag"
 above is 1, split into (high to low bits).
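
The 16-bit 'flags' field described in this hunk, including the 0xFFF cap on the stored name length, might be packed and unpacked as in the sketch below. The function and key names are illustrative only.

```python
NAME_LENGTH_MASK = 0xFFF


def pack_flags(assume_valid: bool, extended: bool, stage: int, name_length: int) -> int:
    """Build the 16-bit flags value from its four parts (high to low bits)."""
    if not 0 <= stage <= 3:
        raise ValueError("stage must fit in 2 bits")
    # Names of 0xFFF bytes or more store 0xFFF in the length field.
    stored_length = min(name_length, NAME_LENGTH_MASK)
    return (int(assume_valid) << 15) | (int(extended) << 14) | (stage << 12) | stored_length


def unpack_flags(flags: int) -> dict:
    """Split a 16-bit flags value back into its parts."""
    return {
        "assume_valid": bool(flags >> 15 & 1),
        "extended": bool(flags >> 14 & 1),
        "stage": flags >> 12 & 0x3,
        # A value of 0xFFF means the real length must be found another way,
        # e.g. by scanning the NUL-terminated name.
        "name_length": flags & NAME_LENGTH_MASK,
    }
```

A stage-2 entry with a 5-byte name packs to `0x2005`; a 5000-byte name with the assume-valid bit set packs to `0x8FFF`.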
@@ -103,63 +106,80 @@ GIT index format

 Entry path name (variable length) relative to top level directory
 (without leading slash). '/' is used as path separator. The special
-paths ".", ".." and ".git" (without quotes) are disallowed.
+path components ".", ".." and ".git" (without quotes) are disallowed.
 Trailing slash is also disallowed.

 The exact encoding is undefined, but the '.' and '/' characters
-are encoded in 7-bit ASCII and the encoding cannot contain a nul
-byte. Generally a superset of ASCII.
+are encoded in 7-bit ASCII and the encoding cannot contain a NUL
+byte (iow, this is a UNIX pathname).

 1-8 nul bytes as necessary to pad the entry to a multiple of eight bytes
 while keeping the name NUL-terminated.

 == Extensions

-=== Tree cache
+=== Cached tree

-Tree cache extension contains pre-computed hashes for trees that can
+Cached tree extension contains pre-computed hashes for trees that can
 be derived from the index. It helps speed up tree object generation
 from index for a new commit.

 When a path is updated in index, the path must be invalidated and
 removed from tree cache.

-- Extension tag { 'T', 'R', 'E', 'E' }
+The signature for this extension is { 'T', 'R', 'E', 'E' }.

-- 32-bit size
+A series of entries fill the entire extension; each of which
+consists of:

-- A number of entries
+- NUL-terminated path component (relative to its parent directory);

-NUL-terminated tree name
+- ASCII decimal number of entries in the index that is covered by the
+tree this entry represents (entry_count);

-Blank-terminated ASCII decimal number of entries in this tree
+- A space (ASCII 32);

-Newline-terminated position of this tree in the parent tree. 0 for
-the root tree
+- ASCII decimal number that represents the number of subtrees this
+tree has;

-160-bit SHA-1 for this tree and it's children
+- A newline (ASCII 10); and
+
+- 160-bit object name for the object that would result from writing
+this span of index as a tree.
+
+An entry can be in an invalidated state and is represented by having -1
+in the entry_count field.
+
+The entries are written out in the top-down, depth-first order. The
+first entry represents the root level of the repository, followed by the
+first subtree---let's call this A---of the root level (with its name
+relative to the root level), followed by the first subtree of A (with
+its name relative to A), ...
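
As a rough illustration of the cached tree entry layout listed above, the following Python sketch walks the body of a TREE extension, that is, the bytes after the signature and length. One assumption goes beyond this hunk: an invalidated entry (negative entry_count) is treated as carrying no object name, as later revisions of the format document state. The function name is hypothetical.

```python
def parse_cache_tree(data: bytes):
    """Parse a TREE extension body into (path, entry_count, subtree_count, object_name) tuples.

    Entries come back in the stored top-down, depth-first order.
    """
    entries = []
    pos = 0
    while pos < len(data):
        nul = data.index(b"\0", pos)            # NUL-terminated path component
        path = data[pos:nul]
        space = data.index(b" ", nul + 1)       # entry_count, then a space
        entry_count = int(data[nul + 1:space])
        newline = data.index(b"\n", space + 1)  # subtree count, then a newline
        subtree_count = int(data[space + 1:newline])
        pos = newline + 1
        object_name = None
        if entry_count >= 0:
            # Assumption: only valid entries are followed by a 160-bit object name.
            object_name = data[pos:pos + 20]
            pos += 20
        entries.append((path, entry_count, subtree_count, object_name))
    return entries
```

The flat list is enough to rebuild the hierarchy: each entry's subtree count says how many of the following entries (with their own descendants) are its direct children.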

 === Resolve undo

-A conflict is represented in index as a set of higher stage entries.
+A conflict is represented in the index as a set of higher stage entries.
 When a conflict is resolved (e.g. with "git add path"), these higher
-stage entries will be removed and a stage-0 entry with proper
-resoluton is added.
+stage entries will be removed and a stage-0 entry with proper resoluton
+is added.

-Resolve undo extension saves these higher stage entries so that
-conflicts can be recreated (e.g. with "git checkout -m"), in case
-users want to redo a conflict resolution from scratch.
+When these higher stage entries are removed, they are saved in the
+resolve undo extension, so that conflicts can be recreated (e.g. with
+"git checkout -m"), in case users want to redo a conflict resolution
+from scratch.

-- Extension tag { 'R', 'E', 'U', 'C' }
+The signature for this extension is { 'R', 'E', 'U', 'C' }.

-- 32-bit size
+A series of entries fill the entire extension; each of which
+consists of:

-- A number of conflict entries
+- NUL-terminated pathname the entry describes (relative to the root of
+the repository, i.e. full pathname);

-NUL-terminated conflict path
+- Three NUL-terminated ASCII octal numbers, entry mode of entries in
+stage 1 to 3 (a missing stage is represented by "0" in this field);
+and

-Three NUL-terminated ASCII octal numbers, entry mode of entries in
-stage 1 to 3.
+- At most three 160-bit object names of the entry in stages from 1 to 3
+(nothing is written for a missing stage).

-At most three 160-bit SHA-1s of the entry in three stages from 1
-to 3. SHA-1 is not saved for any stage with entry mode zero.
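
The resolve undo entry layout in the added lines can be sketched in Python as follows. The input is assumed to be the body of a REUC extension (the bytes after the signature and length), and the function name is made up for illustration.

```python
def parse_resolve_undo(data: bytes):
    """Parse a REUC extension body into (path, modes, object_names) tuples.

    modes holds the entry modes for stages 1..3; object_names holds the
    matching 20-byte object names, with None for a missing stage.
    """
    entries = []
    pos = 0
    while pos < len(data):
        nul = data.index(b"\0", pos)       # NUL-terminated full pathname
        path = data[pos:nul]
        pos = nul + 1

        modes = []
        for _ in range(3):                 # three NUL-terminated octal numbers
            nul = data.index(b"\0", pos)
            modes.append(int(data[pos:nul], 8))
            pos = nul + 1

        object_names = []
        for mode in modes:
            if mode == 0:                  # missing stage: nothing was written
                object_names.append(None)
            else:
                object_names.append(data[pos:pos + 20])
                pos += 20

        entries.append((path, modes, object_names))
    return entries
```

Since a missing stage contributes a "0" mode but no object name, the modes must be read before the reader can tell how many 20-byte names follow.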
