Skip to content

multi-pack-index: add --no-progress #337

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion Documentation/git-multi-pack-index.txt
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ git-multi-pack-index - Write and verify multi-pack-indexes
SYNOPSIS
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"William Baker via GitGitGadget" <[email protected]> writes:

>  [verse]
> -'git multi-pack-index' [--object-dir=<dir>] <subcommand>
> +'git multi-pack-index' [--object-dir=<dir>] <subcommand> [--[no-]progress]

I am wondering what the reasoning behind having this new one *after*
the subcommand while the existing one *before* is.  Isn't the
--[no-]progress option supported by all subcommands of the
multi-pack-index command, just like the --object-dir=<dir> option
is?

If there is no reason that explains why one must come before and the
other must come after the subcommand, we are then adding another
thing the end-users or scriptors need to memorize, which is not
ideal.

> +--[no-]progress::
> +	Turn progress on/off explicitly. If neither is specified, progress is 
> +	shown if standard error is connected to a terminal.

Sounds sensible.

> diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
> index b1ea1a6aa1..f8b2a74179 100644
> --- a/builtin/multi-pack-index.c
> +++ b/builtin/multi-pack-index.c
> @@ -6,34 +6,41 @@
>  #include "trace2.h"
>  
>  static char const * const builtin_multi_pack_index_usage[] = {
> -	N_("git multi-pack-index [--object-dir=<dir>] (write|verify|expire|repack --batch-size=<size>)"),
> +	N_("git multi-pack-index [--object-dir=<dir>] (write|verify|expire|repack --batch-size=<size>) [--[no-]progress]"),
>  	NULL
>  };

The same comment as the SYNOPSIS part.

>  static struct opts_multi_pack_index {
>  	const char *object_dir;
>  	unsigned long batch_size;
> +	int progress;
>  } opts;
>  
>  int cmd_multi_pack_index(int argc, const char **argv,
>  			 const char *prefix)
>  {
> +	unsigned flags = 0;
> +
>  	static struct option builtin_multi_pack_index_options[] = {
>  		OPT_FILENAME(0, "object-dir", &opts.object_dir,
>  		  N_("object directory containing set of packfile and pack-index pairs")),
>  		OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
>  		  N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
> +		OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
>  		OPT_END(),
>  	};

Seeing that all the options that made me curious (see above) are
defined here for all subcommands, I suspect that "technically",
all options must come before the <subcommand> (side note: I know
parse-options may reorder commands after options, but in the
documentation and usage, we strongly discourage users from relying
on the reordering and always show "global --options, subcommand, and
then subcommand --options" order).  I also see in the code that
handles opts.batch_size that there is a workaround for this inverted
code structure to make sure subcommands other than repack does not
allow --batch-size option specified.

Unless and until we get rid of the "git multi-pack-index" as a
separate command (side note: when it happens, we;'d call the
underlying midx API functions directly from appropriate places in
the codepaths like "gc"), we probably would want to correct the use
of parse_options() API in the implementation of this command before
adding any new option or subcommand.

> @@ -47,14 +54,15 @@ int cmd_multi_pack_index(int argc, const char **argv,
>  	trace2_cmd_mode(argv[0]);
>  
>  	if (!strcmp(argv[0], "repack"))
> -		return midx_repack(the_repository, opts.object_dir, (size_t)opts.batch_size);
> +		return midx_repack(the_repository, opts.object_dir, 
> +			(size_t)opts.batch_size, flags);
>  	if (opts.batch_size)
>  		die(_("--batch-size option is only for 'repack' subcommand"));
>  
>  	if (!strcmp(argv[0], "write"))
>  		return write_midx_file(opts.object_dir);
>  	if (!strcmp(argv[0], "verify"))
> -		return verify_midx_file(the_repository, opts.object_dir);
> +		return verify_midx_file(the_repository, opts.object_dir, flags);
>  	if (!strcmp(argv[0], "expire"))
>  		return expire_midx_packs(the_repository, opts.object_dir);

We can see that the new option only affects "verify", even though
the SYNOPSIS and usage text pretends that everybody understands and
reacts to it.  Shouldn't it be documented just like how --batch-size
is documented that it is understood only by "repack"?

If the mid-term aspiration of this patch is to later enhance other
subcommands to also understand the progress output or verbosity
option (and if the excuse given as a response to the above analysis
is "this is just a first step, more will come later"), the instead
of adding a "unsigned flag" local variable to the function, it would
probably make much more sense to

 (1) make struct opts_multi_pack_index as a part of the public API
     between cmd_multi_pack_index() and midx.c and document it in
     midx.h;

 (2) instead of passing opts.object_dir to existing command
     implementations, pass &opts, the pointer to the whole
     structure;

 (3) add a new field "unsigned progress" to the structure, and teach
     the command line parser to flip it upon seeing "--[no-]progress".

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, William Baker wrote (reply to this):

Hi Junio,

Thanks for the review!

On 9/12/19 1:17 PM, Junio C Hamano wrote:

>> +'git multi-pack-index' [--object-dir=<dir>] <subcommand> [--[no-]progress]
> 
> I am wondering what the reasoning behind having this new one *after*
> the subcommand while the existing one *before* is.  Isn't the
> --[no-]progress option supported by all subcommands of the
> multi-pack-index command, just like the --object-dir=<dir> option
> is?
> 
> always show "global --options, subcommand, and then subcommand --options" order).

Thanks for calling this out.  I didn't have a specific reason for making this
option appear after the subcommand.  I tried looking at other commands as
examples and I missed that there's a specific ordering based on the type
of the option.  I will clean this up in the next patch. 

> I also see in the code that
> handles opts.batch_size that there is a workaround for this inverted
> code structure to make sure subcommands other than repack does not
> allow --batch-size option specified.

> we probably would want to correct the use
> of parse_options() API in the implementation of this command before
> adding any new option or subcommand.

To confirm I understand, is the recommendation that
cmd_multi_pack_index be updated to only parse "batch-size" for the repack
subcommand (i.e. use PARSE_OPT_STOP_AT_NON_OPTION to parse all of the common
options, and then only parse "batch-size" when the repack subcommand is running)?


>> @@ -47,14 +54,15 @@ int cmd_multi_pack_index(int argc, const char **argv,
>>  	trace2_cmd_mode(argv[0]);
>>  
>>  	if (!strcmp(argv[0], "repack"))
>> -		return midx_repack(the_repository, opts.object_dir, (size_t)opts.batch_size);
>> +		return midx_repack(the_repository, opts.object_dir, 
>> +			(size_t)opts.batch_size, flags);
>>  	if (opts.batch_size)
>>  		die(_("--batch-size option is only for 'repack' subcommand"));
>>  
>>  	if (!strcmp(argv[0], "write"))
>>  		return write_midx_file(opts.object_dir);
>>  	if (!strcmp(argv[0], "verify"))
>> -		return verify_midx_file(the_repository, opts.object_dir);
>> +		return verify_midx_file(the_repository, opts.object_dir, flags);
>>  	if (!strcmp(argv[0], "expire"))
>>  		return expire_midx_packs(the_repository, opts.object_dir);
> 

> We can see that the new option only affects "verify", even though
> the SYNOPSIS and usage text pretends that everybody understands and
> reacts to it.  Shouldn't it be documented just like how --batch-size
> is documented that it is understood only by "repack"?
> 
> If the mid-term aspiration of this patch is to later enhance other
> subcommands to also understand the progress output or verbosity
> option (and if the excuse given as a response to the above analysis
> is "this is just a first step, more will come later")

Yep this was my thinking.  Today "repack" and "verify" are the only subcommands
that have any progress output but as the other subcommands learn how to provide
progress the [--[no-]progress] option can be used to control it. 

> instead of adding a "unsigned flag" local variable to the function, it would
> probably make much more sense to
> 
>  (1) make struct opts_multi_pack_index as a part of the public API
>      between cmd_multi_pack_index() and midx.c and document it in
>      midx.h;
> 
>  (2) instead of passing opts.object_dir to existing command
>      implementations, pass &opts, the pointer to the whole
>      structure;
> 
>  (3) add a new field "unsigned progress" to the structure, and teach
>      the command line parser to flip it upon seeing "--[no-]progress".
> 

Thanks for this suggestion I'll use this approach in the next patch.

One small point to clarify, in the current struct I'm using an int for 
"progress" because OPT_BOOL expects an int*.  Do you have any concerns with
keeping "progress" as an int in the public struct, or would you rather 
cmd_multi_pack_index parse an int internally and use it to populate the
public struct?

Thanks!
William

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

William Baker <[email protected]> writes:

>> I also see in the code that
>> handles opts.batch_size that there is a workaround for this inverted
>> code structure to make sure subcommands other than repack does not
>> allow --batch-size option specified.
>
>> we probably would want to correct the use
>> of parse_options() API in the implementation of this command before
>> adding any new option or subcommand.
>
> To confirm I understand, is the recommendation that
> cmd_multi_pack_index be updated to only parse "batch-size" for the repack
> subcommand (i.e. use PARSE_OPT_STOP_AT_NON_OPTION to parse all of the common
> options, and then only parse "batch-size" when the repack subcommand is running)?

Compare the ways how dispatching and command line option parsing of
subcommands in "multi-pack-index" and "commit-graph" are
implemented.  When a command (e.g. "commit-graph") takes common
options and also has subcommands (e.g. "read" and "write") that take
different set of options, there is a common options parser in the
primary entry point (e.g. "cmd_commit_graph()"), and after
dispatching to a chosen subcommand, the implementation of each
subcommand (e.g. "graph_read()" and "graph_write()") parses its own
options.  That's bog-standard way.

cmd_multi_pack_index() "cheats" and does not implement proper
subcommand dispatch (it directly makes underlying API calls
instead).  Started as an isolated experimental command whose
existence as a standalone command is solely because it was easier to
experiment with (as opposed to being a plumbing command to be used
by scripters), it probably was an acceptable trade-off to leave the
code in this shape.  In the longer run, I suspect we'd rather want
to get rid of "git multi-pack-index" as a standalone command and
instead make "git gc" and other commands make direct calls to the
internal machinery of midx code from strategic places.  So in that
sense, I am not sure if I should "recommend" fixing the way the
subcommand dispatching works in this command, or just accept to keep
the ugly technical debt and let it limp along...

>> subcommands to also understand the progress output or verbosity
>> option (and if the excuse given as a response to the above analysis
>> is "this is just a first step, more will come later")
>
> Yep this was my thinking.  Today "repack" and "verify" are the only subcommands
> that have any progress output but as the other subcommands learn how to provide
> progress the [--[no-]progress] option can be used to control it. 
>
>> instead of adding a "unsigned flag" local variable to the function, it would
>> probably make much more sense to
>> 
>>  (1) make struct opts_multi_pack_index as a part of the public API
>>      between cmd_multi_pack_index() and midx.c and document it in
>>      midx.h;
>> 
>>  (2) instead of passing opts.object_dir to existing command
>>      implementations, pass &opts, the pointer to the whole
>>      structure;
>> 
>>  (3) add a new field "unsigned progress" to the structure, and teach
>>      the command line parser to flip it upon seeing "--[no-]progress".
>> 
>
> Thanks for this suggestion I'll use this approach in the next patch.

...but it appears to me that you are more enthused than I am in
improving this code, so I probably should actually recommend fixing
the main entry point function of this command to imitate the way
"commit-graph" implements subcommands and their own set of options
;-)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, William Baker wrote (reply to this):

On 9/13/19 1:26 PM, Junio C Hamano wrote:

> Compare the ways how dispatching and command line option parsing of
> subcommands in "multi-pack-index" and "commit-graph" are
> implemented.  When a command (e.g. "commit-graph") takes common
> options and also has subcommands (e.g. "read" and "write") that take
> different set of options, there is a common options parser in the
> primary entry point (e.g. "cmd_commit_graph()"), and after
> dispatching to a chosen subcommand, the implementation of each
> subcommand (e.g. "graph_read()" and "graph_write()") parses its own
> options.  That's bog-standard way.

Thanks for the pointer to "commit-graph", looking through that code
cleared up any questions I had.

After taking another look through the "multi-pack-index" code my plan
is to update all of the subcommands to understand [--[no-]progress].
I gave the public struct in midx.h approach a try, but after seeing
how that looks I think it would be cleaner to update "write" and "expire"
to display progress and have an explicit parameter.

> Started as an isolated experimental command whose
> existence as a standalone command is solely because it was easier to
> experiment with (as opposed to being a plumbing command to be used
> by scripters), it probably was an acceptable trade-off to leave the
> code in this shape.  In the longer run, I suspect we'd rather want
> to get rid of "git multi-pack-index" as a standalone command and
> instead make "git gc" and other commands make direct calls to the
> internal machinery of midx code from strategic places.  So in that
> sense, I am not sure if I should "recommend" fixing the way the
> subcommand dispatching works in this command, or just accept to keep
> the ugly technical debt and let it limp along...

Thanks for the background here, it helped with understanding why the
multi-pack-index parsing is different than the other commands.

My plan is to include 3 commits in the next (v2) patch series:

1. Adding the new parameters to midx.h/c to control progress
2. Update write/expire to display progress
3. Update the multi-pack-index.c builtin to parse the [--[no-]progress]
   option and update the tests.

I wasn't thinking that I would adjust the the subcommand dispatching in 
multi-pack-index.c in this patch series.  By updating all of the subcommands
to support [--[no-]progress] I should be able to keep the changes to 
multi-pack-index.c quite small.  If you see any potential issues with this
approach please let me know.

Thanks,
William

--------
[verse]
'git multi-pack-index' [--object-dir=<dir>] <subcommand>
'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress] <subcommand>

DESCRIPTION
-----------
Expand All @@ -23,6 +23,10 @@ OPTIONS
`<dir>/packs/multi-pack-index` for the current MIDX file, and
`<dir>/packs` for the pack-files to index.

--[no-]progress::
Turn progress on/off explicitly. If neither is specified, progress is
shown if standard error is connected to a terminal.

The following subcommands are available:

write::
Expand Down
18 changes: 13 additions & 5 deletions builtin/multi-pack-index.c
Original file line number Diff line number Diff line change
Expand Up @@ -6,34 +6,41 @@
#include "trace2.h"

static char const * const builtin_multi_pack_index_usage[] = {
N_("git multi-pack-index [--object-dir=<dir>] (write|verify|expire|repack --batch-size=<size>)"),
N_("git multi-pack-index [<options>] (write|verify|expire|repack --batch-size=<size>)"),
NULL
};

static struct opts_multi_pack_index {
const char *object_dir;
unsigned long batch_size;
int progress;
} opts;

int cmd_multi_pack_index(int argc, const char **argv,
const char *prefix)
{
unsigned flags = 0;

static struct option builtin_multi_pack_index_options[] = {
OPT_FILENAME(0, "object-dir", &opts.object_dir,
N_("object directory containing set of packfile and pack-index pairs")),
OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
OPT_END(),
};

git_config(git_default_config, NULL);

opts.progress = isatty(2);
argc = parse_options(argc, argv, prefix,
builtin_multi_pack_index_options,
builtin_multi_pack_index_usage, 0);

if (!opts.object_dir)
opts.object_dir = get_object_directory();
if (opts.progress)
flags |= MIDX_PROGRESS;

if (argc == 0)
usage_with_options(builtin_multi_pack_index_usage,
Expand All @@ -47,16 +54,17 @@ int cmd_multi_pack_index(int argc, const char **argv,
trace2_cmd_mode(argv[0]);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"William Baker via GitGitGadget" <[email protected]> writes:

> From: William Baker <[email protected]>
>
> Signed-off-by: William Baker <[email protected]>
> ---
>  builtin/multi-pack-index.c |  8 ++++----
>  builtin/repack.c           |  2 +-
>  midx.c                     |  8 ++++----
>  midx.h                     | 10 ++++++----
>  4 files changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
> index b1ea1a6aa1..e86b8cd17d 100644
> --- a/builtin/multi-pack-index.c
> +++ b/builtin/multi-pack-index.c
> @@ -47,16 +47,16 @@ int cmd_multi_pack_index(int argc, const char **argv,
>  	trace2_cmd_mode(argv[0]);
>  
>  	if (!strcmp(argv[0], "repack"))
> -		return midx_repack(the_repository, opts.object_dir, (size_t)opts.batch_size);
> +		return midx_repack(the_repository, opts.object_dir, (size_t)opts.batch_size, 0);
>  	if (opts.batch_size)
>  		die(_("--batch-size option is only for 'repack' subcommand"));
>  
>  	if (!strcmp(argv[0], "write"))
> -		return write_midx_file(opts.object_dir);
> +		return write_midx_file(opts.object_dir, 0);
>  	if (!strcmp(argv[0], "verify"))
> -		return verify_midx_file(the_repository, opts.object_dir);
> +		return verify_midx_file(the_repository, opts.object_dir, 0);
>  	if (!strcmp(argv[0], "expire"))
> -		return expire_midx_packs(the_repository, opts.object_dir);
> +		return expire_midx_packs(the_repository, opts.object_dir, 0);

Hmm, I actually would have expected to see a new .progress field in
the opts structure and these lower-level functions would start
taking a pointer to the whole opts structure, instead of just a
pointer to a single member opts.object_dir.  That way, we can later
extend and enrich the interface between this caller and the
underlying functions without much patch noise in the dispatch layer
(i.e. here).

This step is meant to be just preparing by extending the interface
and passing the new argument throughout the callchain, I believe,
and it looks correctly done, assuming that we are OK to take this
"add a separate 'progress' parameter, when we need more parameters
later, the interface will gain more and more parameters" approach.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Junio C Hamano <[email protected]> writes:

> This step is meant to be just preparing by extending the interface
> and passing the new argument throughout the callchain, I believe,
> and it looks correctly done, assuming that we are OK to take this
> "add a separate 'progress' parameter, when we need more parameters
> later, the interface will gain more and more parameters" approach.

After re-reading the remainder of the series, I think the structure
this patch series takes (i.e. adding a new parameter to these callees)
is better than the alternative (i.e. making sure everybody takes the
pointer to the opts structure).

Thanks.  I couldn't review the proposed log messages of these
commits, which were unreadable, at all, though.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Fri, Sep 20, 2019 at 09:53:48AM -0700, William Baker via GitGitGadget wrote:
> diff --git a/midx.h b/midx.h
> index f0ae656b5d..e6fa356b5c 100644
> --- a/midx.h
> +++ b/midx.h
> @@ -37,6 +37,8 @@ struct multi_pack_index {
>  	char object_dir[FLEX_ARRAY];
>  };
>  
> +#define MIDX_PROGRESS     (1 << 0)

Please consider using an enum.

A preprocessor constant is just that: a number.  It is shown as a
number while debugging, and there is no clear indication what related
values belong in the same group (apart from the prefix of the macro
name), or what possible values an 'unsigned flags' variable might have
(though in this particular case there is only a single possible
value...).  

An enum, however, is much friendlier to humans when debugging, because
even 'gdb' shows the value using the symbolic names, e.g.:

  write_commit_graph_reachable (obj_dir=0x9ab870 ".git/objects", 
      flags=(COMMIT_GRAPH_WRITE_APPEND | COMMIT_GRAPH_WRITE_PROGRESS), 
      split_opts=0x9670e0 <split_opts>) at commit-graph.c:1141

and it's quite clear what values a variable of type 'enum
commit_graph_write_flags' can have.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, William Baker wrote (reply to this):

On 9/21/19 5:11 AM, SZEDER Gábor wrote:
>>  
>> +#define MIDX_PROGRESS     (1 << 0)
> 
> Please consider using an enum.
> 
> A preprocessor constant is just that: a number.  It is shown as a
> number while debugging, and there is no clear indication what related
> values belong in the same group (apart from the prefix of the macro
> name), or what possible values an 'unsigned flags' variable might have
> (though in this particular case there is only a single possible
> value...).  
> 
> An enum, however, is much friendlier to humans when debugging, because
> even 'gdb' shows the value using the symbolic names, e.g.:

Thanks for this suggestion.  I agree on the benefits of using an enum
here and will make the switch in my next set of changes.

-William

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

SZEDER G=C3=A1bor <[email protected]> writes:

> On Fri, Sep 20, 2019 at 09:53:48AM -0700, William Baker via GitGitGadge=
t wrote:
>> diff --git a/midx.h b/midx.h
>> index f0ae656b5d..e6fa356b5c 100644
>> --- a/midx.h
>> +++ b/midx.h
>> @@ -37,6 +37,8 @@ struct multi_pack_index {
>>  	char object_dir[FLEX_ARRAY];
>>  };
>> =20
>> +#define MIDX_PROGRESS     (1 << 0)
>
> Please consider using an enum.

If they are used by assiging one of their values, definitely a good
idea to use an enum.  Are debuggers clever enough that they can
tell, when they see something like this:

	enum gress {
		PROGRESS =3D 1,
		REGRESS =3D 2,
	};

	void func(enum gress v);

	...

        void caller(void)
	{
		func(PROGRESS | REGRESS);
		func(PROGRESS + REGRESS);
		func(PROGRESS * 3);
	}

how caller came about to give 3?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, William Baker wrote (reply to this):

On 9/27/19 8:40 PM, Junio C Hamano wrote:
> SZEDER Gábor <[email protected]> writes:
> 
>>> +#define MIDX_PROGRESS     (1 << 0)
>>
>> Please consider using an enum.
> 
> If they are used by assiging one of their values, definitely a good
> idea to use an enum.  Are debuggers clever enough that they can
> tell, when they see something like this:
> 
> 	enum gress {
> 		PROGRESS = 1,
> 		REGRESS = 2,
> 	};
> 
> 	void func(enum gress v);
> 
> 	...
> 
>         void caller(void)
> 	{
> 		func(PROGRESS | REGRESS);
> 		func(PROGRESS + REGRESS);
> 		func(PROGRESS * 3);
> 	}
> 
> how caller came about to give 3?
> 

My debugger was not smart enough to figure out what flags were combined
to give the value of 3 in this example.  

I saw that the code base is currently a mix of #define and enums when it
comes to flags  (e.g. dir_struct.flags and rebase_options.flags are both
enums) and so using one here would not be something new stylistically.

Although my debugger might not be the smartest, I haven't noticed any
downsides to switching this to an enum.

Thanks,
William

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

William Baker <[email protected]> writes:

> I saw that the code base is currently a mix of #define and enums when it
> comes to flags  (e.g. dir_struct.flags and rebase_options.flags are both
> enums) and so using one here would not be something new stylistically.

Yes.  But it is a different matter to spread a bad practice to
parts of the code that haven't been contaminated by it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Tue, Oct 08, 2019 at 01:30:34PM +0900, Junio C Hamano wrote:
> SZEDER Gábor <[email protected]> writes:
> 
> >> 		func(PROGRESS | REGRESS);
> >> 		func(PROGRESS + REGRESS);
> >> 		func(PROGRESS * 3);
> >> 	}
> >> 
> >> how caller came about to give 3?
> >
> > No, they tend to show (PROGRESS | REGRESS), at least both gdb and lldb
> > do.

I was wrong here, gdb does this, but lldb, unfortunately, doesn't; see
my other reply in this thread.

> OK.
> 
> >  If the enum has only constants with power-of-two values, then that
> > is the right way to write it, and the other two are asking for trouble
> 
> If the programmer and the debugger knows the constants are used to
> represent bits that can be or'ed together, what you say is correct,
> but that is entirely irrelevant.
> 
> What I was worried about is that the constants that are used to
> represent something that are *NOT* set of bits (hence "PROGRESS * 3"
> may be perfectly a reasonable thing for such an application)

I don't really see how that could be reasonable, it's prone to break
when changing the values of the enum constants.

> may be
> mistaken by an overly clever debugger and "3" may end up getting
> shown as "PROGRESS | REGRESS".  When there are only two constants
> (PROGRESS=1 and REGRESS=2), we humans nor debuggers can tell if that
> is to represent two bits that can be or'ed together, or it is an
> enumeration.
> 
> Until we gain the third constant, that is, at which time the third
> one may likely be 3 (if enumeration) or 4 (if bits).

Humans benefit from context: they understand the name of the enum type
(e.g. does it end with "_flags"?), the name of the enum constants, and
the comment above the enum's definition (if any), and can then infer
whether those constants represent OR-able bits or not.  If they can't
find this out, then that enum is poorly named and/or documented, which
should be fixed.  As for the patch that I originally commented on, I
would expect the enum to be called e.g. 'midx_flags', and thus already
with that single constant in there it'll be clear that it is intended
as a collection of related OR-able bits.

As for the debugger, if it sees a variable of an enum type whose value
doesn't match any of the enum constants, then there are basically
three possibilities:

  - All constants in that enum have power-of-two values.  In this case
    it's reasonable from the debugger to assume that those constants
    are OR'ed together, and is extremely helpful to display the value
    that way.

  - The constants are just a set of values (1, 2, 3, 42, etc).  In
    this case the variable shouldn't have a value that doesn't match
    one of the constants in the first place, and I would first suspect
    that the program might be buggy.

  - A "mostly" power-of-two enum might contain shorthand constants for
    combinations of a set of other constants, e.g.:

      enum flags {
              BIT0 = (1 << 0),
              BIT1 = (1 << 1),
              BIT2 = (1 << 2),

              FIRST_TWO = (BIT0 | BIT1),
      };
      enum flags f0 = BIT0;
      enum flags f1 = BIT0 | BIT1;
      enum flags f2 = BIT0 | BIT2;
      enum flags f3 = BIT0 | BIT1 | BIT2;

    In this case, sadly, gdb shows only matching constants:

      (gdb) p f0
      $1 = BIT0
      (gdb) p f1
      $2 = FIRST_TWO
      (gdb) p f2
      $3 = 5
      (gdb) p f3
      $4 = 7

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, William Baker wrote (reply to this):

On 10/8/19 6:32 PM, SZEDER Gábor wrote:
>>> No, they tend to show (PROGRESS | REGRESS), at least both gdb and lldb
>>> do.
> 
> I was wrong here, gdb does this, but lldb, unfortunately, doesn't; see
> my other reply in this thread.
>
>> What I was worried about is that the constants that are used to
>> represent something that are *NOT* set of bits (hence "PROGRESS * 3"
>> may be perfectly a reasonable thing for such an application)
> 
> I don't really see how that could be reasonable, it's prone to break
> when changing the values of the enum constants.
> 
>> may be
>> mistaken by an overly clever debugger and "3" may end up getting
>> shown as "PROGRESS | REGRESS".  When there are only two constants
>> (PROGRESS=1 and REGRESS=2), we humans nor debuggers can tell if that
>> is to represent two bits that can be or'ed together, or it is an
>> enumeration.
>>
>> Until we gain the third constant, that is, at which time the third
>> one may likely be 3 (if enumeration) or 4 (if bits).
> 
> Humans benefit from context: they understand the name of the enum type
> (e.g. does it end with "_flags"?), the name of the enum constants, and
> the comment above the enum's definition (if any), and can then infer
> whether those constants represent OR-able bits or not.  If they can't
> find this out, then that enum is poorly named and/or documented, which
> should be fixed.  As for the patch that I originally commented on, I
> would expect the enum to be called e.g. 'midx_flags', and thus already
> with that single constant in there it'll be clear that it is intended
> as a collection of related OR-able bits.
> 
> As for the debugger, if it sees a variable of an enum type whose value
> doesn't match any of the enum constants, then there are basically
> three possibilities:
> 
>   - All constants in that enum have power-of-two values.  In this case
>     it's reasonable from the debugger to assume that those constants
>     are OR'ed together, and is extremely helpful to display the value
>     that way.
> 
>   - The constants are just a set of values (1, 2, 3, 42, etc).  In
>     this case the variable shouldn't have a value that doesn't match
>     one of the constants in the first place, and I would first suspect
>     that the program might be buggy.
> 
>   - A "mostly" power-of-two enum might contain shorthand constants for
>     combinations of a set of other constants, e.g.:
> 
>       enum flags {
>               BIT0 = (1 << 0),
>               BIT1 = (1 << 1),
>               BIT2 = (1 << 2),
> 
>               FIRST_TWO = (BIT0 | BIT1),
>       };
>       enum flags f0 = BIT0;
>       enum flags f1 = BIT0 | BIT1;
>       enum flags f2 = BIT0 | BIT2;
>       enum flags f3 = BIT0 | BIT1 | BIT2;
> 
>     In this case, sadly, gdb shows only matching constants:
> 
>       (gdb) p f0
>       $1 = BIT0
>       (gdb) p f1
>       $2 = FIRST_TWO
>       (gdb) p f2
>       $3 = 5
>       (gdb) p f3
>       $4 = 7
> 

I believe this is the last open thread/discussion on the "midx: add MIDX_PROGRESS
flag" patch series.  

A quick summary of where the discussion stands:

- The patch series currently uses #define for the new MIDX_PROGRESS flag.
- The git codebase uses both #defines and enums for flags.
- The debugger always shows the enum value's name if there is a match, if values
  are OR-ed together there a few possibilities (see discussion above and earlier
  in the thread).
- There are concerns regarding misinterpreting constants that are not a set of
  bits (e.g. "PROGRESS * 3").

Are there any other details I can provide that would help in reaching a
conclusion as to how to proceed?

At this time there are no other MIDX_* flags and so there's always the option
to revisit the best way to handle multiple MIDX_* flags if/when additional
flags are added.

Thanks,
William

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

William Baker <[email protected]> writes:

> At this time there are no other MIDX_* flags and so there's always the option
> to revisit the best way to handle multiple MIDX_* flags if/when additional
> flags are added.

I do not care too deeply either way, but if you wrote it in one way,
how about completing the series without changing it in the middle,
and leave the clean-ups to a follow-up series (if needed)?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, William Baker wrote (reply to this):

On 10/15/19 7:09 PM, Junio C Hamano wrote:
>> At this time there are no other MIDX_* flags and so there's always the option
>> to revisit the best way to handle multiple MIDX_* flags if/when additional
>> flags are added.
> 
> I do not care too deeply either way, but if you wrote it in one way,
> how about completing the series without changing it in the middle,
> and leave the clean-ups to a follow-up series (if needed)?


That plan sounds good to me.  The most recent series (v3) should be ready
to go, I don't believe there is any outstanding feedback to address.

Thanks,
William

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, William Baker wrote (reply to this):

On 10/16/19 12:48 PM, William Baker wrote:>> I do not care too deeply either way, but if you wrote it in one way,
>> how about completing the series without changing it in the middle,
>> and leave the clean-ups to a follow-up series (if needed)?
> 
> 
> That plan sounds good to me.  The most recent series (v3) should be ready
> to go, I don't believe there is any outstanding feedback to address.

Follow-up question on this patch series.

I noticed there is a branch named 'wb/midx-progress' in
https://github.com/gitster/git but it does not appear to have the commits
from this patch series and I have not seen it mentioned in
"What's cooking in git.git" emails.

Is there anything else I should do to get these changes picked up in 'pu'?

Thanks again,
William


if (!strcmp(argv[0], "repack"))
return midx_repack(the_repository, opts.object_dir, (size_t)opts.batch_size);
return midx_repack(the_repository, opts.object_dir,
(size_t)opts.batch_size, flags);
if (opts.batch_size)
die(_("--batch-size option is only for 'repack' subcommand"));

if (!strcmp(argv[0], "write"))
return write_midx_file(opts.object_dir);
return write_midx_file(opts.object_dir, flags);
if (!strcmp(argv[0], "verify"))
return verify_midx_file(the_repository, opts.object_dir);
return verify_midx_file(the_repository, opts.object_dir, flags);
if (!strcmp(argv[0], "expire"))
return expire_midx_packs(the_repository, opts.object_dir);
return expire_midx_packs(the_repository, opts.object_dir, flags);

die(_("unrecognized subcommand: %s"), argv[0]);
}
2 changes: 1 addition & 1 deletion builtin/repack.c
Original file line number Diff line number Diff line change
Expand Up @@ -562,7 +562,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
remove_temporary_files();

if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
write_midx_file(get_object_directory());
write_midx_file(get_object_directory(), 0);

string_list_clear(&names, 0);
string_list_clear(&rollback, 0);
Expand Down
71 changes: 55 additions & 16 deletions midx.c
Original file line number Diff line number Diff line change
Expand Up @@ -448,6 +448,8 @@ struct pack_list {
uint32_t nr;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"William Baker via GitGitGadget" <[email protected]> writes:

> diff --git a/midx.c b/midx.c
> index b2673f52e8..54e4e93b2b 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -449,6 +449,8 @@ struct pack_list {
>  	uint32_t nr;
>  	uint32_t alloc;
>  	struct multi_pack_index *m;
> +	struct progress *progress;
> +	uint32_t pack_paths_checked;
>  };

What is the reason behind u32 here?  Do we want to be consistently
limited to 4G on all platforms?

I have a feeling that we do not care too deeply all that much for
the purpose of progress output, in which case, just an ordinary
"unsigned" may be much less misleading.

FWIW, the function that uses this field is display_progress(), which
takes uint64_t there.

> @@ -457,6 +459,7 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len,
>  	struct pack_list *packs = (struct pack_list *)data;
>  
>  	if (ends_with(file_name, ".idx")) {
> +		display_progress(packs->progress, ++packs->pack_paths_checked);

OK, "git grep display_progress" tells me that we are expected to
pass the value of the counter that is already incremented, so this
is also in line with that practice.

> +	if (flags & MIDX_PROGRESS)
> +		packs.progress = start_progress(_("Adding packfiles to multi-pack-index"), 0);
> +	else
> +		packs.progress = NULL;
>
>  	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &packs);
> +	stop_progress(&packs.progress);

OK.

> +	if (flags & MIDX_PROGRESS)
> +		progress = start_progress(_("Writing chunks to multi-pack-index"),
> +					  num_chunks);
> ...
> +
> +		display_progress(progress, i + 1);
>  	}
> +	stop_progress(&progress);
>  

OK.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, William Baker wrote (reply to this):

On 9/20/19 1:10 PM, Junio C Hamano wrote:
>> diff --git a/midx.c b/midx.c
>> index b2673f52e8..54e4e93b2b 100644
>> --- a/midx.c
>> +++ b/midx.c
>> @@ -449,6 +449,8 @@ struct pack_list {
>>  	uint32_t nr;
>>  	uint32_t alloc;
>>  	struct multi_pack_index *m;
>> +	struct progress *progress;
>> +	uint32_t pack_paths_checked;
>>  };
> 
> What is the reason behind u32 here?  Do we want to be consistently
> limited to 4G on all platforms?
> 
> I have a feeling that we do not care too deeply all that much for
> the purpose of progress output, in which case, just an ordinary
> "unsigned" may be much less misleading.

I went with u32 here to match the data type used to track how many
entries are in the pack_list ('nr' is a u32).

I could switch to this to an unsigned but we would run the (extremely
unlikely) risk of having more than 65k packs on a system where
unsigned is 16 bits.

> FWIW, the function that uses this field is display_progress(), which
> takes uint64_t there.

Thanks for pointing this out.  Given that display_progress() expects a
u64 using that type here for 'pack_paths_checked' could help make things
more straightforward.

-William

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

William Baker <[email protected]> writes:

> On 9/20/19 1:10 PM, Junio C Hamano wrote:
>>> diff --git a/midx.c b/midx.c
>>> index b2673f52e8..54e4e93b2b 100644
>>> --- a/midx.c
>>> +++ b/midx.c
>>> @@ -449,6 +449,8 @@ struct pack_list {
>>>  	uint32_t nr;
>>>  	uint32_t alloc;
>>>  	struct multi_pack_index *m;
>>> +	struct progress *progress;
>>> +	uint32_t pack_paths_checked;
>>>  };
>> 
>> What is the reason behind u32 here?  Do we want to be consistently
>> limited to 4G on all platforms?
>> 
>> I have a feeling that we do not care too deeply all that much for
>> the purpose of progress output, in which case, just an ordinary
>> "unsigned" may be much less misleading.
>
> I went with u32 here to match the data type used to track how many
> entries are in the pack_list ('nr' is a u32).

That kind of parallel is valid when you could compare nr with this
new thing (or put it differently, the existing nr and this new thing
counts the same).  Are they both about the number of packs?

> I could switch to this to an unsigned but we would run the (extremely
> unlikely) risk of having more than 65k packs on a system where
> unsigned is 16 bits.

Why?  If an arch is small enough that the natural integer size is 16-bit,
then limiting the total number of packs to 65k sound entirely
sensible.

The only reason why you'd want fixed (across platforms and
architectures) type is when you want to make sure that a file
storing the literal values taken from these fields are portable and
everybody is limited the same way.  If a platform's natural integer
is 64-bit, by artificially limiting the size of this field to u32,
you are making disservice to the platform users, and more
importantly, you are wasting developers' time by forcing them to
wonder if there is a reason behind the choice of u32 (does it really
need to be able to store up to 4G, even on a smaller machines?  Is
it necessary to refuse to store more than 4G?  What are the
reasons?), like me wondering about these questions and writing them
down here.

So, unless there is a reason why this must be of fixed type, I'd say
just an unsigned would be the most reasonable choice.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, William Baker wrote (reply to this):

On 9/27/19 8:49 PM, Junio C Hamano wrote:
>>>> diff --git a/midx.c b/midx.c
>>>> index b2673f52e8..54e4e93b2b 100644
>>>> --- a/midx.c
>>>> +++ b/midx.c
>>>> @@ -449,6 +449,8 @@ struct pack_list {
>>>>  	uint32_t nr;
>>>>  	uint32_t alloc;
>>>>  	struct multi_pack_index *m;
>>>> +	struct progress *progress;
>>>> +	uint32_t pack_paths_checked;
>>>>  };
>>
>> I went with u32 here to match the data type used to track how many
>> entries are in the pack_list ('nr' is a u32).
> 
> That kind of parallel is valid when you could compare nr with this
> new thing (or put it differently, the existing nr and this new thing
> counts the same).  Are they both about the number of packs?
> 

Both 'nr' and 'pack_paths_checked' are about the number of packs.  
'nr' tracks the number of packs in the multi-pack-index and it grows
as add_pack_to_midx() finds new packs to add.  'pack_paths_checked' is
the number of pack ".idx" files that have been checked by add_pack_to_midx().

>> I could switch to this to an unsigned but we would run the (extremely
>> unlikely) risk of having more than 65k packs on a system where
>> unsigned is 16 bits.
> 
> Why?  If an arch is small enough that the natural integer size is 16-bit,
> then limiting the total number of packs to 65k sound entirely
> sensible.> The only reason why you'd want fixed (across platforms and
> architectures) type is when you want to make sure that a file
> storing the literal values taken from these fields are portable and
> everybody is limited the same way.  If a platform's natural integer
> is 64-bit, by artificially limiting the size of this field to u32,
> you are making disservice to the platform users, and more
> importantly, you are wasting developers' time by forcing them to
> wonder if there is a reason behind the choice of u32 (does it really
> need to be able to store up to 4G, even on a smaller machines?  Is
> it necessary to refuse to store more than 4G?  What are the
> reasons?), like me wondering about these questions and writing them
> down here.
> 
> So, unless there is a reason why this must be of fixed type, I'd say
> just an unsigned would be the most reasonable choice.
> 

I agree that it's best to avoid using a fixed type here unless there's
a reason that it must be.  Do you think that both 'nr' and
'pack_paths_checked' being about the number of packs is a strong enough
reason to use u32 for 'pack_paths_checked'?  If not, I will update
'pack_paths_checked' in the next path series to be an unsigned int.

Thanks,
William



Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Steps 4 & 5 look OK.

uint32_t alloc;
struct multi_pack_index *m;
struct progress *progress;
unsigned pack_paths_checked;
};

static void add_pack_to_midx(const char *full_path, size_t full_path_len,
Expand All @@ -456,6 +458,7 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len,
struct pack_list *packs = (struct pack_list *)data;

if (ends_with(file_name, ".idx")) {
display_progress(packs->progress, ++packs->pack_paths_checked);
if (packs->m && midx_contains_pack(packs->m, file_name))
return;

Expand Down Expand Up @@ -785,7 +788,7 @@ static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_large_off
}

static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
struct string_list *packs_to_drop)
struct string_list *packs_to_drop, unsigned flags)
{
unsigned char cur_chunk, num_chunks = 0;
char *midx_name;
Expand All @@ -799,6 +802,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1];
uint32_t nr_entries, num_large_offsets = 0;
struct pack_midx_entry *entries = NULL;
struct progress *progress = NULL;
int large_offsets_needed = 0;
int pack_name_concat_len = 0;
int dropped_packs = 0;
Expand Down Expand Up @@ -832,8 +836,15 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
packs.nr++;
}
}

packs.pack_paths_checked = 0;
if (flags & MIDX_PROGRESS)
packs.progress = start_progress(_("Adding packfiles to multi-pack-index"), 0);
else
packs.progress = NULL;

for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &packs);
stop_progress(&packs.progress);

if (packs.m && packs.nr == packs.m->num_packs && !packs_to_drop)
goto cleanup;
Expand Down Expand Up @@ -958,6 +969,9 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
written += MIDX_CHUNKLOOKUP_WIDTH;
}

if (flags & MIDX_PROGRESS)
progress = start_progress(_("Writing chunks to multi-pack-index"),
num_chunks);
for (i = 0; i < num_chunks; i++) {
if (written != chunk_offsets[i])
BUG("incorrect chunk offset (%"PRIu64" != %"PRIu64") for chunk id %"PRIx32,
Expand Down Expand Up @@ -990,7 +1004,10 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
BUG("trying to write unknown chunk id %"PRIx32,
chunk_ids[i]);
}

display_progress(progress, i + 1);
}
stop_progress(&progress);

if (written != chunk_offsets[num_chunks])
BUG("incorrect final offset %"PRIu64" != %"PRIu64,
Expand All @@ -1016,9 +1033,9 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
return result;
}

int write_midx_file(const char *object_dir)
int write_midx_file(const char *object_dir, unsigned flags)
{
return write_midx_internal(object_dir, NULL, NULL);
return write_midx_internal(object_dir, NULL, NULL, flags);
}

void clear_midx_file(struct repository *r)
Expand Down Expand Up @@ -1076,19 +1093,20 @@ static int compare_pair_pos_vs_id(const void *_a, const void *_b)
display_progress(progress, _n); \
} while (0)

int verify_midx_file(struct repository *r, const char *object_dir)
int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags)
{
struct pair_pos_vs_id *pairs = NULL;
uint32_t i;
struct progress *progress;
struct progress *progress = NULL;
struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
verify_midx_error = 0;

if (!m)
return 0;

progress = start_progress(_("Looking for referenced packfiles"),
m->num_packs);
if (flags & MIDX_PROGRESS)
progress = start_progress(_("Looking for referenced packfiles"),
m->num_packs);
for (i = 0; i < m->num_packs; i++) {
if (prepare_midx_pack(r, m, i))
midx_report("failed to load pack in position %d", i);
Expand All @@ -1106,8 +1124,9 @@ int verify_midx_file(struct repository *r, const char *object_dir)
i, oid_fanout1, oid_fanout2, i + 1);
}

progress = start_sparse_progress(_("Verifying OID order in MIDX"),
m->num_objects - 1);
if (flags & MIDX_PROGRESS)
progress = start_sparse_progress(_("Verifying OID order in multi-pack-index"),
m->num_objects - 1);
for (i = 0; i < m->num_objects - 1; i++) {
struct object_id oid1, oid2;

Expand All @@ -1134,13 +1153,15 @@ int verify_midx_file(struct repository *r, const char *object_dir)
pairs[i].pack_int_id = nth_midxed_pack_int_id(m, i);
}

progress = start_sparse_progress(_("Sorting objects by packfile"),
m->num_objects);
if (flags & MIDX_PROGRESS)
progress = start_sparse_progress(_("Sorting objects by packfile"),
m->num_objects);
display_progress(progress, 0); /* TODO: Measure QSORT() progress */
QSORT(pairs, m->num_objects, compare_pair_pos_vs_id);
stop_progress(&progress);

progress = start_sparse_progress(_("Verifying object offsets"), m->num_objects);
if (flags & MIDX_PROGRESS)
progress = start_sparse_progress(_("Verifying object offsets"), m->num_objects);
for (i = 0; i < m->num_objects; i++) {
struct object_id oid;
struct pack_entry e;
Expand Down Expand Up @@ -1183,23 +1204,34 @@ int verify_midx_file(struct repository *r, const char *object_dir)
return verify_midx_error;
}

int expire_midx_packs(struct repository *r, const char *object_dir)
int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags)
{
uint32_t i, *count, result = 0;
struct string_list packs_to_drop = STRING_LIST_INIT_DUP;
struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
struct progress *progress = NULL;

if (!m)
return 0;

count = xcalloc(m->num_packs, sizeof(uint32_t));

if (flags & MIDX_PROGRESS)
progress = start_progress(_("Counting referenced objects"),
m->num_objects);
for (i = 0; i < m->num_objects; i++) {
int pack_int_id = nth_midxed_pack_int_id(m, i);
count[pack_int_id]++;
display_progress(progress, i + 1);
}
stop_progress(&progress);

if (flags & MIDX_PROGRESS)
progress = start_progress(_("Finding and deleting unreferenced packfiles"),
m->num_packs);
for (i = 0; i < m->num_packs; i++) {
char *pack_name;
display_progress(progress, i + 1);

if (count[i])
continue;
Expand All @@ -1217,11 +1249,12 @@ int expire_midx_packs(struct repository *r, const char *object_dir)
unlink_pack_path(pack_name, 0);
free(pack_name);
}
stop_progress(&progress);

free(count);

if (packs_to_drop.nr)
result = write_midx_internal(object_dir, m, &packs_to_drop);
result = write_midx_internal(object_dir, m, &packs_to_drop, flags);

string_list_clear(&packs_to_drop, 0);
return result;
Expand Down Expand Up @@ -1315,7 +1348,7 @@ static int fill_included_packs_batch(struct repository *r,
return 0;
}

int midx_repack(struct repository *r, const char *object_dir, size_t batch_size)
int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags)
{
int result = 0;
uint32_t i;
Expand All @@ -1340,6 +1373,12 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size)
strbuf_addstr(&base_name, object_dir);
strbuf_addstr(&base_name, "/pack/pack");
argv_array_push(&cmd.args, base_name.buf);

if (flags & MIDX_PROGRESS)
argv_array_push(&cmd.args, "--progress");
else
argv_array_push(&cmd.args, "-q");

strbuf_release(&base_name);

cmd.git_cmd = 1;
Expand Down Expand Up @@ -1370,7 +1409,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size)
goto cleanup;
}

result = write_midx_internal(object_dir, m, NULL);
result = write_midx_internal(object_dir, m, NULL, flags);
m = NULL;

cleanup:
Expand Down
10 changes: 6 additions & 4 deletions midx.h
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,8 @@ struct multi_pack_index {
char object_dir[FLEX_ARRAY];
};

#define MIDX_PROGRESS (1 << 0)

struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result);
Expand All @@ -47,11 +49,11 @@ int fill_midx_entry(struct repository *r, const struct object_id *oid, struct pa
int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name);
int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local);

int write_midx_file(const char *object_dir);
int write_midx_file(const char *object_dir, unsigned flags);
void clear_midx_file(struct repository *r);
int verify_midx_file(struct repository *r, const char *object_dir);
int expire_midx_packs(struct repository *r, const char *object_dir);
int midx_repack(struct repository *r, const char *object_dir, size_t batch_size);
int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags);
int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags);
int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags);

void close_midx(struct multi_pack_index *m);

Expand Down
Loading