Commit 2af890b

derrickstolee authored and gitster committed
multi-pack-index: prepare 'repack' subcommand
In an environment where the multi-pack-index is useful, it is due to many pack-files and an inability to repack the object store into a single pack-file. However, it is likely that many of these pack-files are rather small, and could be repacked into a slightly larger pack-file without too much effort. It may also be important to ensure the object store is highly available and the repack operation does not interrupt concurrent git commands.

Introduce a 'repack' subcommand to 'git multi-pack-index' that takes a '--batch-size' option. The subcommand will inspect the multi-pack-index for referenced pack-files whose size is smaller than the batch size, until collecting a list of pack-files whose sizes sum to larger than the batch size. Then, a new pack-file will be created containing the objects from those pack-files that are referenced by the multi-pack-index. The resulting pack is likely to actually be smaller than the batch size due to compression and the fact that there may be objects in the pack-files that have duplicate copies in other pack-files.

The current change introduces the command-line arguments, and we add a test that ensures we parse these options properly. Since we specify a small batch size, we will guarantee that future implementations do not change the list of pack-files.

In addition, we hard-code the modified times of the packs in the pack directory to ensure the list of packs sorted by modified time matches the order if sorted by size (ascending). This will be important in a future test.

Signed-off-by: Derrick Stolee <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
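The selection rule described in the message can be sketched roughly as follows. This is only an illustration, not the code from this commit (which adds a stub): `select_batch`, its parameters, and the `selected` output array are hypothetical names, and the sketch assumes the caller already supplies pack sizes in the order they should be examined. It stops once the running total reaches the batch size and does not handle the special zero batch size.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of git: walk pack sizes in the given
 * order, skip any pack that is not strictly smaller than batch_size,
 * and mark packs as selected until their sizes sum to at least
 * batch_size.
 *
 * Returns the number of selected packs, or 0 if the batch never fills,
 * in which case every selected[] entry is reset to 0.
 */
size_t select_batch(const uint64_t *sizes, size_t n,
		    uint64_t batch_size, int *selected)
{
	uint64_t total = 0;
	size_t count = 0;
	size_t i;

	for (i = 0; i < n; i++)
		selected[i] = 0;

	for (i = 0; i < n && total < batch_size; i++) {
		if (sizes[i] >= batch_size)
			continue; /* not a "small" pack, leave it alone */
		selected[i] = 1;
		total += sizes[i];
		count++;
	}

	if (total < batch_size) {
		/* The batch did not fill up: select nothing at all. */
		for (i = 0; i < n; i++)
			selected[i] = 0;
		return 0;
	}
	return count;
}
```

Under this sketch, passing a batch size equal to the smallest pack size selects nothing, because no pack is strictly smaller than the minimum. That matches the intent of the test added in this commit, which expects the pack directory to stay unchanged.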
1 parent 19575c7 commit 2af890b

5 files changed, 52 additions and 3 deletions

Documentation/git-multi-pack-index.txt

Lines changed: 17 additions & 0 deletions

@@ -36,6 +36,23 @@ expire::
 	have no objects referenced by the MIDX. Rewrite the MIDX file
 	afterward to remove all references to these pack-files.
 
+repack::
+	Create a new pack-file containing objects in small pack-files
+	referenced by the multi-pack-index. If the size given by the
+	`--batch-size=<size>` argument is zero, then create a pack
+	containing all objects referenced by the multi-pack-index. For
+	a non-zero batch size, Select the pack-files by examining packs
+	from oldest-to-newest, computing the "expected size" by counting
+	the number of objects in the pack referenced by the
+	multi-pack-index, then divide by the total number of objects in
+	the pack and multiply by the pack size. We select packs with
+	expected size below the batch size until the set of packs have
+	total expected size at least the batch size. If the total size
+	does not reach the batch size, then do nothing. If a new pack-
+	file is created, rewrite the multi-pack-index to reference the
+	new pack-file. A later run of 'git multi-pack-index expire' will
+	delete the pack-files that were part of this batch.
+
 
 EXAMPLES
 --------
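
The "expected size" described in the documentation hunk above amounts to scaling the pack size by the fraction of its objects that the multi-pack-index references. A minimal sketch, assuming integer arithmetic and a hypothetical helper name `expected_size` (this commit does not implement the computation):

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from git's sources.
 *
 * pack_size:  on-disk size of the pack-file in bytes
 * referenced: objects in this pack that the multi-pack-index points at
 * total:      all objects stored in this pack
 */
uint64_t expected_size(uint64_t pack_size, uint32_t referenced, uint32_t total)
{
	if (total == 0)
		return 0; /* an empty pack contributes nothing */

	/*
	 * Multiply before dividing so that integer division does not
	 * truncate the ratio to zero. Overflow of the product is ignored
	 * in this sketch.
	 */
	return pack_size * referenced / total;
}
```

For example, a 1000-byte pack in which 25 of 100 objects are referenced has an expected size of 250 bytes, so it counts as "small" for any batch size above 250 even though the file itself is larger.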

builtin/multi-pack-index.c

Lines changed: 10 additions & 2 deletions

@@ -6,12 +6,13 @@
 #include "trace2.h"
 
 static char const * const builtin_multi_pack_index_usage[] = {
-	N_("git multi-pack-index [--object-dir=<dir>] (write|verify|expire)"),
+	N_("git multi-pack-index [--object-dir=<dir>] (write|verify|expire|repack --batch-size=<size>)"),
 	NULL
 };
 
 static struct opts_multi_pack_index {
 	const char *object_dir;
+	unsigned long batch_size;
 } opts;
 
 int cmd_multi_pack_index(int argc, const char **argv,
@@ -20,6 +21,8 @@ int cmd_multi_pack_index(int argc, const char **argv,
 	static struct option builtin_multi_pack_index_options[] = {
 		OPT_FILENAME(0, "object-dir", &opts.object_dir,
 		  N_("object directory containing set of packfile and pack-index pairs")),
+		OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
+		  N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
 		OPT_END(),
 	};
 
@@ -43,12 +46,17 @@ int cmd_multi_pack_index(int argc, const char **argv,
 
 	trace2_cmd_mode(argv[0]);
 
+	if (!strcmp(argv[0], "repack"))
+		return midx_repack(the_repository, opts.object_dir, (size_t)opts.batch_size);
+	if (opts.batch_size)
+		die(_("--batch-size option is only for 'repack' subcommand"));
+
 	if (!strcmp(argv[0], "write"))
 		return write_midx_file(opts.object_dir);
 	if (!strcmp(argv[0], "verify"))
 		return verify_midx_file(the_repository, opts.object_dir);
 	if (!strcmp(argv[0], "expire"))
 		return expire_midx_packs(the_repository, opts.object_dir);
 
-	die(_("unrecognized verb: %s"), argv[0]);
+	die(_("unrecognized subcommand: %s"), argv[0]);
 }
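
`OPT_MAGNITUDE` stores the option value in an `unsigned long` and accepts a non-negative integer with an optional `k`, `m` or `g` suffix that scales it by powers of 1024. The sketch below illustrates that kind of parsing in isolation. It is not git's parser: `parse_magnitude` is a hypothetical standalone function, and details such as rejecting trailing characters are assumptions made for the example.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not git's implementation: parse a size such as
 * "512", "4k" or "2M" into *out. Returns 0 on success and -1 on
 * malformed input or overflow.
 */
int parse_magnitude(const char *s, unsigned long *out)
{
	char *end;
	unsigned long value, factor = 1;

	if (!isdigit((unsigned char)*s))
		return -1; /* rejects empty strings, signs and leading spaces */

	errno = 0;
	value = strtoul(s, &end, 10);
	if (errno)
		return -1;

	/* Optional case-insensitive unit suffix, in powers of 1024. */
	switch (tolower((unsigned char)*end)) {
	case 'k':
		factor = 1024UL;
		end++;
		break;
	case 'm':
		factor = 1024UL * 1024;
		end++;
		break;
	case 'g':
		factor = 1024UL * 1024 * 1024;
		end++;
		break;
	}

	if (*end)
		return -1; /* trailing garbage after the number or suffix */
	if (value > ULONG_MAX / factor)
		return -1; /* scaled value would not fit */

	*out = value * factor;
	return 0;
}
```

With a parser of this shape, `--batch-size=4k` would yield 4096. Because the dispatch code in the diff treats any non-zero `opts.batch_size` as "option given", an explicit `--batch-size=0` passed to another subcommand would not trigger the error message.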

midx.c

Lines changed: 5 additions & 0 deletions

@@ -1226,3 +1226,8 @@ int expire_midx_packs(struct repository *r, const char *object_dir)
 	string_list_clear(&packs_to_drop, 0);
 	return result;
 }
+
+int midx_repack(struct repository *r, const char *object_dir, size_t batch_size)
+{
+	return 0;
+}

midx.h

Lines changed: 1 addition & 0 deletions

@@ -51,6 +51,7 @@ int write_midx_file(const char *object_dir);
 void clear_midx_file(struct repository *r);
 int verify_midx_file(struct repository *r, const char *object_dir);
 int expire_midx_packs(struct repository *r, const char *object_dir);
+int midx_repack(struct repository *r, const char *object_dir, size_t batch_size);
 
 void close_midx(struct multi_pack_index *m);

t/t5319-multi-pack-index.sh

Lines changed: 19 additions & 1 deletion

@@ -398,7 +398,8 @@ test_expect_success 'setup expire tests' '
 		git pack-objects --revs .git/objects/pack/pack-E <<-EOF &&
 		refs/heads/E
 		EOF
-		git multi-pack-index write
+		git multi-pack-index write &&
+		cp -r .git/objects/pack .git/objects/pack-backup
 	)
 '
 
@@ -432,4 +433,21 @@ test_expect_success 'expire removes unreferenced packs' '
 	)
 '
 
+test_expect_success 'repack with minimum size does not alter existing packs' '
+	(
+		cd dup &&
+		rm -rf .git/objects/pack &&
+		mv .git/objects/pack-backup .git/objects/pack &&
+		touch -m -t 201901010000 .git/objects/pack/pack-D* &&
+		touch -m -t 201901010001 .git/objects/pack/pack-C* &&
+		touch -m -t 201901010002 .git/objects/pack/pack-B* &&
+		touch -m -t 201901010003 .git/objects/pack/pack-A* &&
+		ls .git/objects/pack >expect &&
+		MINSIZE=$(ls -l .git/objects/pack/*pack | awk "{print \$5;}" | sort -n | head -n 1) &&
+		git multi-pack-index repack --batch-size=$MINSIZE &&
+		ls .git/objects/pack >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
