Skip to content

Dir rename fixes #390

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

Conversation

newren
Copy link

@newren newren commented Oct 11, 2019

This series improves a couple things found after looking into things Dscho flagged:

  • clarify and slightly restructure code in the get_renamed_dir_portion() function
  • extend support of detecting renaming/merging of one directory into another to support the root directory as a target directory

First patch best viewed with a --histogram diff (sorry, gitgitgadget does not yet know how to generate those).

Changes since v1:

  • Incorporated code cleanups suggested by Dscho
  • Fixed to work with an alternate rename-to-root-directory case (end_of_new == NULL), with new testcase
  • Added a new patch to the end of the series to stop making setup tests be part of a separate test_expect_success block.

Cc: Johannes Schindelin [email protected]

Dscho noted a few things making this function hard to follow.
Restructure it a bit and add comments to make it easier to follow.  The
restructurings include:

  * There was a special case if-check at the end of the function
    checking whether someone just renamed a file within its original
    directory, meaning that there could be no directory rename involved.
    That check was slightly convoluted; it could be done in a more
    straightforward fashion earlier in the function, and can be done
    more cheaply too (no call to strncmp).

  * The conditions for advancing end_of_old and end_of_new before
    calling strchr were both confusing and unnecessary.  If either
    points at a '/', then they need to be advanced in order to find the
    next '/'.  If either doesn't point at a '/', then advancing them one
    char before calling strchr() doesn't hurt.  So, just rip out the
    if conditions and advance both before calling strchr().

Signed-off-by: Elijah Newren <[email protected]>
@newren
Copy link
Author

newren commented Oct 11, 2019

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 11, 2019

Submitted as [email protected]

WARNING: newren has no public email address set on GitHub

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 12, 2019

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Elijah,

On Fri, 11 Oct 2019, Elijah Newren via GitGitGadget wrote:

> This series improves a couple things found after looking into things Dsc=
ho
> flagged:
>
>  * clarify and slightly restructure code in the get_renamed_dir_portion(=
)
>    function
>
>  * extend support of detecting renaming/merging of one directory into
>    another to support the root directory as a target directory

Will have a look in a moment,

> First patch best viewed with a --histogram diff, which I sadly don't kno=
w
> how to make gitgitgadget generate.

Currently, there is no way to do that yet. It should not be too
difficult to implement code to pick up a footer in the PR description to
do that, I just don't have the time right now to do so.

Essentially, it would have to imitate the already-existing code path
that picks up the `Cc:` footer in the PR description.

If you're interested, have a look at the `parsePullRequestDescription()`
function in `lib/patch-series.ts` in
https://github.com/gitgitgadget/gitgitgadget/.

Ciao,
Dscho

> Elijah Newren (2):
>   merge-recursive: clean up get_renamed_dir_portion()
>   merge-recursive: fix merging a subdirectory into the root directory
>
>  merge-recursive.c                   | 89 +++++++++++++++++++++--------
>  t/t6043-merge-rename-directories.sh | 56 ++++++++++++++++++
>  2 files changed, 121 insertions(+), 24 deletions(-)
>
>
> base-commit: 08da6496b61341ec45eac36afcc8f94242763468
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-390%2F=
newren%2Fdir-rename-fixes-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-390/newre=
n/dir-rename-fixes-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/390
> --
> gitgitgadget
>

@@ -1943,8 +1943,8 @@ static void get_renamed_dir_portion(const char *old_path, const char *new_path,
char **old_dir, char **new_dir)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Elijah,

On Fri, 11 Oct 2019, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <[email protected]>
>
> Dscho noted a few things making this function hard to follow.
> Restructure it a bit and add comments to make it easier to follow.  The
> restructurings include:
>
>   * There was a special case if-check at the end of the function
>     checking whether someone just renamed a file within its original
>     directory, meaning that there could be no directory rename involved.
>     That check was slightly convoluted; it could be done in a more
>     straightforward fashion earlier in the function, and can be done
>     more cheaply too (no call to strncmp).
>
>   * The conditions for advancing end_of_old and end_of_new before
>     calling strchr were both confusing and unnecessary.  If either
>     points at a '/', then they need to be advanced in order to find the
>     next '/'.  If either doesn't point at a '/', then advancing them one
>     char before calling strchr() doesn't hurt.  So, just rip out the
>     if conditions and advance both before calling strchr().
>
> Signed-off-by: Elijah Newren <[email protected]>

This commit message, as well as the patch, make a lot of sense to me.
Thank you for doing this!

Ciao,
Dscho

> ---
>  merge-recursive.c | 60 ++++++++++++++++++++++++++++-------------------
>  1 file changed, 36 insertions(+), 24 deletions(-)
>
> diff --git a/merge-recursive.c b/merge-recursive.c
> index 22a12cfeba..f80e48f623 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -1943,8 +1943,8 @@ static void get_renamed_dir_portion(const char *ol=
d_path, const char *new_path,
>  				    char **old_dir, char **new_dir)
>  {
>  	char *end_of_old, *end_of_new;
> -	int old_len, new_len;
>
> +	/* Default return values: NULL, meaning no rename */
>  	*old_dir =3D NULL;
>  	*new_dir =3D NULL;
>
> @@ -1955,43 +1955,55 @@ static void get_renamed_dir_portion(const char *=
old_path, const char *new_path,
>  	 *    "a/b/c/d" was renamed to "a/b/some/thing/else"
>  	 * so, for this example, this function returns "a/b/c/d" in
>  	 * *old_dir and "a/b/some/thing/else" in *new_dir.
> -	 *
> -	 * Also, if the basename of the file changed, we don't care.  We
> -	 * want to know which portion of the directory, if any, changed.
> +	 */
> +
> +	/*
> +	 * If the basename of the file changed, we don't care.  We want
> +	 * to know which portion of the directory, if any, changed.
>  	 */
>  	end_of_old =3D strrchr(old_path, '/');
>  	end_of_new =3D strrchr(new_path, '/');
> -
>  	if (end_of_old =3D=3D NULL || end_of_new =3D=3D NULL)
> -		return;
> +		return; /* We haven't modified *old_dir or *new_dir yet. */
> +
> +	/* Find the first non-matching character traversing backwards */
>  	while (*--end_of_new =3D=3D *--end_of_old &&
>  	       end_of_old !=3D old_path &&
>  	       end_of_new !=3D new_path)
>  		; /* Do nothing; all in the while loop */
> +
>  	/*
> -	 * We've found the first non-matching character in the directory
> -	 * paths.  That means the current directory we were comparing
> -	 * represents the rename.  Move end_of_old and end_of_new back
> -	 * to the full directory name.
> +	 * If both got back to the beginning of their strings, then the
> +	 * directory didn't change at all, only the basename did.
>  	 */
> -	if (*end_of_old =3D=3D '/')
> -		end_of_old++;
> -	if (*end_of_old !=3D '/')
> -		end_of_new++;
> -	end_of_old =3D strchr(end_of_old, '/');
> -	end_of_new =3D strchr(end_of_new, '/');
> +	if (end_of_old =3D=3D old_path && end_of_new =3D=3D new_path &&
> +	    *end_of_old =3D=3D *end_of_new)
> +		return; /* We haven't modified *old_dir or *new_dir yet. */
>
>  	/*
> -	 * It may have been the case that old_path and new_path were the same
> -	 * directory all along.  Don't claim a rename if they're the same.
> +	 * We've found the first non-matching character in the directory
> +	 * paths.  That means the current characters we were looking at
> +	 * were part of the first non-matching subdir name going back from
> +	 * the end of the strings.  Get the whole name by advancing both
> +	 * end_of_old and end_of_new to the NEXT '/' character.  That will
> +	 * represent the entire directory rename.
> +	 *
> +	 * The reason for the increment is cases like
> +	 *    a/b/star/foo/whatever.c -> a/b/tar/foo/random.c
> +	 * After dropping the basename and going back to the first
> +	 * non-matching character, we're now comparing:
> +	 *    a/b/s          and         a/b/
> +	 * and we want to be comparing:
> +	 *    a/b/star/      and         a/b/tar/
> +	 * but without the pre-increment, the one on the right would stay
> +	 * a/b/.
>  	 */
> -	old_len =3D end_of_old - old_path;
> -	new_len =3D end_of_new - new_path;
> +	end_of_old =3D strchr(++end_of_old, '/');
> +	end_of_new =3D strchr(++end_of_new, '/');
>
> -	if (old_len !=3D new_len || strncmp(old_path, new_path, old_len)) {
> -		*old_dir =3D xstrndup(old_path, old_len);
> -		*new_dir =3D xstrndup(new_path, new_len);
> -	}
> +	/* Copy the old and new directories into *old_dir and *new_dir. */
> +	*old_dir =3D xstrndup(old_path, end_of_old - old_path);
> +	*new_dir =3D xstrndup(new_path, end_of_new - new_path);
>  }
>
>  static void remove_hashmap_entries(struct hashmap *dir_renames,
> --
> gitgitgadget
>
>

@@ -1931,6 +1931,16 @@ static char *apply_dir_rename(struct dir_rename_entry *entry,
return NULL;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Elijah,

On Fri, 11 Oct 2019, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <[email protected]>
>
> We allow renaming all entries in e.g. a directory named z/ into a
> directory named y/ to be detected as a z/ -> y/ rename, so that if the
> other side of history adds any files to the directory z/ in the mean
> time, we can provide the hint that they should be moved to y/.
>
> There is no reason to not allow 'y/' to be the root directory, but the
> code did not handle that case correctly.  Add a testcase and the
> necessary special checks to support this case.
>
> Signed-off-by: Elijah Newren <[email protected]>

This makes sense.

> ---
>  merge-recursive.c                   | 29 +++++++++++++++
>  t/t6043-merge-rename-directories.sh | 56 +++++++++++++++++++++++++++++

It is good to have a test case verifying this!

FWIW I frequently run into those same issues because I have to use --
quite often! -- `contrib/fast-import/import-tars.perl` in the Git for
Windows project (in the MSYS2 part thereof, to be precise), and the
`pax_global_header` and I will probably never become friends, so I often
have to move files into the top-level directory...

> diff --git a/merge-recursive.c b/merge-recursive.c
> index f80e48f623..7bd4a7cf10 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -1931,6 +1931,16 @@ static char *apply_dir_rename(struct dir_rename_e=
ntry *entry,
>  		return NULL;
>
>  	oldlen =3D strlen(entry->dir);
> +	if (entry->new_dir.len =3D=3D 0)
> +		/*
> +		 * If someone renamed/merged a subdirectory into the root
> +		 * directory (e.g. 'some/subdir' -> ''), then we want to
> +		 * avoid returning
> +		 *     '' + '/filename'
> +		 * as the rename; we need to make old_path + oldlen advance
> +		 * past the '/' character.
> +		 */
> +		oldlen++;

This makes sense to me.

>  	newlen =3D entry->new_dir.len + (strlen(old_path) - oldlen) + 1;
>  	strbuf_grow(&new_path, newlen);
>  	strbuf_addbuf(&new_path, &entry->new_dir);
> @@ -1980,6 +1990,25 @@ static void get_renamed_dir_portion(const char *o=
ld_path, const char *new_path,
>  	    *end_of_old =3D=3D *end_of_new)
>  		return; /* We haven't modified *old_dir or *new_dir yet. */
>
> +	/*
> +	 * If end_of_new got back to the beginning of its string, and
> +	 * end_of_old got back to the beginning of some subdirectory, then
> +	 * we have a rename/merge of a subdirectory into the root, which
> +	 * needs slightly special handling.
> +	 *
> +	 * Note: There is no need to consider the opposite case, with a
> +	 * rename/merge of the root directory into some subdirectory.
> +	 * Our rule elsewhere that a directory which still exists is not
> +	 * considered to have been renamed means the root directory can
> +	 * never be renamed (because the root directory always exists).
> +	 */
> +	if (end_of_new =3D=3D new_path &&
> +	    end_of_old !=3D old_path && end_of_old[-1] =3D=3D '/') {
> +		*old_dir =3D xstrndup(old_path, end_of_old-1 - old_path);
> +		*new_dir =3D xstrndup(new_path, end_of_new - new_path);

However, here we write something convoluted that essentially amounts to
`xstrdup("")`. I would rather have that simple call than the convoluted
one that would puzzle me every time I have to look at this part of the
code.

While at it, would you mind either surrounding the `-` and the `1` by
spaces, or even write `--end_of_old - old_path`?

> +		return;
> +	}
> +
>  	/*
>  	 * We've found the first non-matching character in the directory
>  	 * paths.  That means the current characters we were looking at
> diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-=
directories.sh
> index c966147d5d..b920bb0850 100755
> --- a/t/t6043-merge-rename-directories.sh
> +++ b/t/t6043-merge-rename-directories.sh
> @@ -4051,6 +4051,62 @@ test_expect_success '12c-check: Moving one direct=
ory hierarchy into another w/ c
>  	)
>  '
>
> +# Testcase 12d, Rename/merge of subdirectory into the root
> +#   Commit O: a/b/{foo.c}
> +#   Commit A: foo.c
> +#   Commit B: a/b/{foo.c,bar.c}
> +#   Expected: a/b/{foo.c,bar.c}
> +
> +test_expect_success '12d-setup: Rename (merge) of subdirectory into the=
 root' '
> +	test_create_repo 12d &&
> +	(
> +		cd 12d &&
> +
> +		mkdir -p a/b/subdir &&
> +		test_commit a/b/subdir/foo.c &&

Why `.c`? That's a little distracting.

> +
> +		git branch O &&

Might be simpler just to use `master` subsequently and not "waste" a new
ref on that.

> +		git branch A &&

Might make more sense to create it below, via the `-b` option of `git
checkout`.

Or, for extra brownie points, via the `-c` option of `git switch`.

> +		git branch B &&

Likewise, this might want to be created below, via replacing `git
checkout B` with `git switch -c B master`.

> +
> +		git checkout A &&
> +		mkdir subdir &&
> +		git mv a/b/subdir/foo.c.t subdir/foo.c.t &&
> +		test_tick &&
> +		git commit -m "A" &&
> +
> +		git checkout B &&
> +		test_commit a/b/bar.c
> +	)
> +'
> +
> +test_expect_success '12d-check: Rename (merge) of subdirectory into the=
 root' '

For the record: I am still a huge anti-fan of splitting `setup` test
cases from the test cases that do actual things, _unless_ it is _one_,
and _only one_, big, honking `setup` test case that is the very first
one in the test script.

Splitting things into two inevitably leads to unnecessary churn when
investigating test failures.

And that's really what test cases need to be optimized for: when they
report breakages. They need to help as much as they can to investigate
why things break. Nobody cares when test cases succeed. The ~150k test
cases that pass on every CI build: nobody is interested. When a test
case reports failure, that's when people pay attention. At least some,
including me.

The most common case for me (and every other lazy person who relies on
CI/PR builds) is when a build breaks, and then I usually get to the
trace of the actually failing test case very quickly. The previous test
case's trace, not so easy. Clicks involved. Now I lose context. Not
good.

A less common case for me is when I run test scripts locally, with `-i
-v -x`. Still, I need to scroll back to get context. And then, really, I
already lost context.

> +	(
> +		cd 12d &&
> +
> +		git checkout A^0 &&
> +
> +		git -c merge.directoryRenames=3Dtrue merge -s recursive B^0 &&
> +
> +		git ls-files -s >out &&
> +		test_line_count =3D 2 out &&

Isn't this a bit overzealous?

> +
> +		git rev-parse >actual \
> +			HEAD:subdir/foo.c.t   HEAD:bar.c.t &&
> +		git rev-parse >expect \
> +			O:a/b/subdir/foo.c.t  B:a/b/bar.c.t &&
> +		test_cmp expect actual &&

Likewise?

> +
> +		git hash-object bar.c.t >actual &&
> +		git rev-parse B:a/b/bar.c.t >expect &&
> +		test_cmp expect actual &&

Likewise?

> +
> +		test_must_fail git rev-parse HEAD:a/b/subdir/foo.c.t &&
> +		test_must_fail git rev-parse HEAD:a/b/bar.c.t &&
> +		test_path_is_missing a/

Makes sense, but the part that I am missing is

		test_path_is_file bar.c.t

I.e. the _most_ important outcome of this test is: the rename was
detected, and the added file was correctly placed into the target
directory of the rename.

Thanks,
Dscho

> +	)
> +'
> +
>  #######################################################################=
####
>  # SECTION 13: Checking informational and conflict messages
>  #
> --
> gitgitgadget
>

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

Hi Dscho,

Thanks for the reviews!

On Sat, Oct 12, 2019 at 1:37 PM Johannes Schindelin
<[email protected]> wrote:
> On Fri, 11 Oct 2019, Elijah Newren via GitGitGadget wrote:
>
[...]
> > @@ -1980,6 +1990,25 @@ static void get_renamed_dir_portion(const char *old_path, const char *new_path,
> >           *end_of_old == *end_of_new)
> >               return; /* We haven't modified *old_dir or *new_dir yet. */
> >
> > +     /*
> > +      * If end_of_new got back to the beginning of its string, and
> > +      * end_of_old got back to the beginning of some subdirectory, then
> > +      * we have a rename/merge of a subdirectory into the root, which
> > +      * needs slightly special handling.
> > +      *
> > +      * Note: There is no need to consider the opposite case, with a
> > +      * rename/merge of the root directory into some subdirectory.
> > +      * Our rule elsewhere that a directory which still exists is not
> > +      * considered to have been renamed means the root directory can
> > +      * never be renamed (because the root directory always exists).
> > +      */
> > +     if (end_of_new == new_path &&
> > +         end_of_old != old_path && end_of_old[-1] == '/') {
> > +             *old_dir = xstrndup(old_path, end_of_old-1 - old_path);
> > +             *new_dir = xstrndup(new_path, end_of_new - new_path);
>
> However, here we write something convoluted that essentially amounts to
> `xstrdup("")`. I would rather have that simple call than the convoluted
> one that would puzzle me every time I have to look at this part of the
> code.

Makes sense; I can switch it over.

>
> While at it, would you mind either surrounding the `-` and the `1` by
> spaces, or even write `--end_of_old - old_path`?

Sounds good to me; I'll make the change.


> > diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh
> > index c966147d5d..b920bb0850 100755
> > --- a/t/t6043-merge-rename-directories.sh
> > +++ b/t/t6043-merge-rename-directories.sh
> > @@ -4051,6 +4051,62 @@ test_expect_success '12c-check: Moving one directory hierarchy into another w/ c
> >       )
> >  '
> >
> > +# Testcase 12d, Rename/merge of subdirectory into the root
> > +#   Commit O: a/b/{foo.c}
> > +#   Commit A: foo.c
> > +#   Commit B: a/b/{foo.c,bar.c}
> > +#   Expected: a/b/{foo.c,bar.c}

Note the nice explanation of the testcase setup at the beginning of
every test within this file...

> > +
> > +test_expect_success '12d-setup: Rename (merge) of subdirectory into the root' '
> > +     test_create_repo 12d &&
> > +     (
> > +             cd 12d &&
> > +
> > +             mkdir -p a/b/subdir &&
> > +             test_commit a/b/subdir/foo.c &&
>
> Why `.c`? That's a little distracting.

I can toss it.

> > +
> > +             git branch O &&
>
> Might be simpler just to use `master` subsequently and not "waste" a new
> ref on that.

I could do so, but then this makes the testcase description comment
earlier harder to read comparing "master", "A", and "B".  Having the
same length simplifies it a bit, and the triple of O, A, and B are
also used quite a bit in merge-recursive.c (e.g. in process_entry()
and merge_threeway() and other places).

Also, behaving differently for this test than the other 50+ tests in
the testfile would break the comment at the beginning of t6043 which
explains how *every* test in the file is of a certain form, using O,
A, and B.

>
> > +             git branch A &&
>
> Might make more sense to create it below, via the `-b` option of `git
> checkout`.
>
> Or, for extra brownie points, via the `-c` option of `git switch`.
>
> > +             git branch B &&
>
> Likewise, this might want to be created below, via replacing `git
> checkout B` with `git switch -c B master`.

I'm not sure I see why it'd be beneficial to switch this, though in
isolation I also don't see any drawbacks with your suggestion either.
It looks entirely reasonable, so I'd probably just do it if it weren't
for the fact that there are four dozen or so other tests in the same
file that already do it this way.  I'd rather keep the file internally
consistent, and there's a bit too much inertia for me to want to
switch all the tests over...unless you can provide a reason to
strongly prefer one style over the other?

> > +
> > +             git checkout A &&
> > +             mkdir subdir &&
> > +             git mv a/b/subdir/foo.c.t subdir/foo.c.t &&
> > +             test_tick &&
> > +             git commit -m "A" &&
> > +
> > +             git checkout B &&
> > +             test_commit a/b/bar.c
> > +     )
> > +'
> > +
> > +test_expect_success '12d-check: Rename (merge) of subdirectory into the root' '
>
> For the record: I am still a huge anti-fan of splitting `setup` test
> cases from the test cases that do actual things, _unless_ it is _one_,
> and _only one_, big, honking `setup` test case that is the very first
> one in the test script.
>
> Splitting things into two inevitably leads to unnecessary churn when
> investigating test failures.
>
> And that's really what test cases need to be optimized for: when they
> report breakages. They need to help as much as they can to investigate
> why things break. Nobody cares when test cases succeed. The ~150k test
> cases that pass on every CI build: nobody is interested. When a test
> case reports failure, that's when people pay attention. At least some,
> including me.
>
> The most common case for me (and every other lazy person who relies on
> CI/PR builds) is when a build breaks, and then I usually get to the
> trace of the actually failing test case very quickly. The previous test
> case's trace, not so easy. Clicks involved. Now I lose context. Not
> good.
>
> A less common case for me is when I run test scripts locally, with `-i
> -v -x`. Still, I need to scroll back to get context. And then, really, I
> already lost context.

I guess we have some strongly differing opinions here.  The one thing
I do agree with you on is test cases need to be optimized for when
they report breakages, but that is precisely what led me to splitting
setup and testing.  Way too many tests in the testsuite intersperse
several setup and test cases together making it really hard to
disentangle, understand what is going on, or even reverse engineer
what is relevant.  The absolute worst tests are the ones which just
keep making additional changes to some existing repo to provide extra
setup, causing all sorts of problems for skipping and resuming and
understanding (to say nothing of test prerequisites that aren't always
met).  But the ones with one big honking `setup` test case that is the
very first one in the test script can often be pretty bad too when
you've found a bug in testcase 82 and want to have a simple way to
reproduce.  It's awesome when someone can just run ./testcase.sh --ver
--imm -x --run=86,87 and reproduce the problem.  It feels sadly rare
to be able to do that in much of git.git's testsuite.  Not accreting
ever more changes into a repo to setup subsequent problems, using
entirely separate repos for different cases where testing makes any
changes, making it clear what is setup, making it clear what's the
command being tested, and making it clear what all the commands are
that are testing that the original test command produced the expected
behavior all go a long way to making things way easier to investigate.

Now, I will re-use a setup case for multiple tests and even did so in
t6043, but only when the setup really is identical (e.g. not only are
the tests expecting an identical setup, but the commands being tested
are read-only or any changes they make are unconditionally reverted as
the last step of the testcase).  Naturally, when I re-use a setup
block multiple times, I really don't want to copy the setup into each
of those tests either.

It would be really nice, though, if there were some way for me to
specify that a given "testcase" was just setup (so that they don't
even get a number in the prove output and don't count as a "test"
except maybe when they fail), and if there were some way to specify
which testcases depend on which setup blocks (so that if I want to run
just a given test, the setup test gets automatically included).


Your example of CI/PR builds makes sense to me, and I really would
like to make that nicer and improve other workflows too, especially if
it can be done without breaking my ability to investigate test
failures.  If someone can think of a clever way to do that (e.g. some
way of implementing my "It would be really nice" in the last
paragraph, and in a way that could also benefit CI/PR builds), I'd be
happy to go through and help switch things over.

> > +     (
> > +             cd 12d &&
> > +
> > +             git checkout A^0 &&
> > +
> > +             git -c merge.directoryRenames=true merge -s recursive B^0 &&
> > +
> > +             git ls-files -s >out &&
> > +             test_line_count = 2 out &&
>
> Isn't this a bit overzealous?
>
> > +
> > +             git rev-parse >actual \
> > +                     HEAD:subdir/foo.c.t   HEAD:bar.c.t &&
> > +             git rev-parse >expect \
> > +                     O:a/b/subdir/foo.c.t  B:a/b/bar.c.t &&
> > +             test_cmp expect actual &&
>
> Likewise?
>
> > +
> > +             git hash-object bar.c.t >actual &&
> > +             git rev-parse B:a/b/bar.c.t >expect &&
> > +             test_cmp expect actual &&
>
> Likewise?
>
> > +
> > +             test_must_fail git rev-parse HEAD:a/b/subdir/foo.c.t &&
> > +             test_must_fail git rev-parse HEAD:a/b/bar.c.t &&
> > +             test_path_is_missing a/
>
> Makes sense, but the part that I am missing is
>
>                 test_path_is_file bar.c.t
>
> I.e. the _most_ important outcome of this test is: the rename was
> detected, and the added file was correctly placed into the target
> directory of the rename.

That's a useful thing to add to the test, I'll add it.  (It's kind of
included in the 'git hash-object bar.c.t' half a dozen or so lines
above, but this line helps document the expectation a bit better.)

I'd be very reticent to include only this test_path_is_file check, as
it makes no guarantee that it has the right contents or that we didn't
also keep around another copy in a/b/bar.c.t, etc.  I understand my
checks above look overzealous, but merge-recursive has been a bit of a
pain to understand and deal with at times, and I've been bitten one
too many times with tests (of merge-recursive and elsewhere in
git.git) that passed but still did the wrong thing because they only
tested one way in which the test problem failed in the past rather
than specifying what correct behavior is.  And my "overzealousness" in
the merge-recursive tests really have caught and prevented bugs would
have been missed otherwise.


Oh, and I think there's another place in the code that needs to be
tweaked to make sure we handle renaming subdirectories into the root
directory that I missed (and just wasn't tested by this testcase), so
I'll check into it and if so fix the code and add another testcase,
and include the fixups I agreed to above and send out a v2.  Probably
won't get to it until the middle of next week, though.


Thanks again for the review and food for thought.  The CI/PR builds
sound particularly interesting; I'd really like to make those better
for you if it can be done in a way that doesn't break command line
builds with clear documentation of setup and expectations while
supporting test skipping and post-facto investigation (even without
--immediate or re-running), etc.  Hmm...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Elijah,

On Sat, 12 Oct 2019, Elijah Newren wrote:

> On Sat, Oct 12, 2019 at 1:37 PM Johannes Schindelin
> <[email protected]> wrote:
> >
> [...]
> > For the record: I am still a huge anti-fan of splitting `setup` test
> > cases from the test cases that do actual things, _unless_ it is
> > _one_, and _only one_, big, honking `setup` test case that is the
> > very first one in the test script.
> >
> > Splitting things into two inevitably leads to unnecessary churn when
> > investigating test failures.
> >
> > And that's really what test cases need to be optimized for: when
> > they report breakages. They need to help as much as they can to
> > investigate why things break. Nobody cares when test cases succeed.
> > The ~150k test cases that pass on every CI build: nobody is
> > interested. When a test case reports failure, that's when people pay
> > attention. At least some, including me.
> >
> > The most common case for me (and every other lazy person who relies
> > on CI/PR builds) is when a build breaks, and then I usually get to
> > the trace of the actually failing test case very quickly. The
> > previous test case's trace, not so easy. Clicks involved. Now I lose
> > context. Not good.
> >
> > A less common case for me is when I run test scripts locally, with
> > `-i -v -x`. Still, I need to scroll back to get context. And then,
> > really, I already lost context.
>
> I guess we have some strongly differing opinions here.

And probably experiences, too.

I literally debug something like a dozen per week test failures that are
reported via Azure Pipelines. And yes, I ran into some snags even with
your two-parter test cases in the past, and they did not help me. They
increased my burden of figuring out what is going wrong.

> The one thing I do agree with you on is test cases need to be
> optimized for when they report breakages, but that is precisely what
> led me to splitting setup and testing.

To me, it is so not helpful _not_ to see the output of a `setup` that
succeeded, and only the output of the actual test that actually failed.

It removes context.

I need to understand the scenario where the breakage happens, and the
only way I can understand is when I understand the context.

So the context needs to be as close as possible.

> Way too many tests in the testsuite intersperse several setup and test
> cases together making it really hard to disentangle, understand what
> is going on, or even reverse engineer what is relevant.  The absolute
> worst tests are the ones which just keep making additional changes to
> some existing repo to provide extra setup, causing all sorts of
> problems for skipping and resuming and understanding (to say nothing
> of test prerequisites that aren't always met).

I agree with this sentiment, and have to point out that this is yet
another fallout of the way our test suite is implemented. If you look
e.g. at JUnit, there are no "setup test cases". There are specific setup
steps that you can define, there is even a teardown step you can define,
and those individual test cases? They can run in parallel, or
randomized, and they run in their own sandbox, to make sure that they
don't rely on side effects of unrelated test cases.

We don't do that. We don't enforce the absence of side effects, and
therefore we have a megaton of them.

But note how this actually speaks _against_ separating the setup from
the test? Because then you _explicitly_ want those test cases to rely on
one another. Which flies in the _face_ of trying to disentangling them.

> But the ones with one big honking `setup` test case that is the very
> first one in the test script can often be pretty bad too when you've
> found a bug in testcase 82 and want to have a simple way to reproduce.

Indeed. One of the first things I try when `--run=3D82` fails is
`--run=3D1,82`. That's the best we can do in the absence of a clean design
like JUnit and its `@Before` method.

> It's awesome when someone can just run ./testcase.sh --ver --imm -x
> --run=3D86,87 and reproduce the problem.

But of course, often you would need `--run=3D1,86,87`. And it is totally
not obvious to _anybody_ who wants to contribute a new feature and whose
CI/PR build fails in one of your test cases.

> It feels sadly rare to be able to do that in much of git.git's
> testsuite.  Not accreting ever more changes into a repo to setup
> subsequent problems, using entirely separate repos for different cases
> where testing makes any changes, making it clear what is setup, making
> it clear what's the command being tested, and making it clear what all
> the commands are that are testing that the original test command
> produced the expected behavior all go a long way to making things way
> easier to investigate.

Sorry, I disagree. By squirreling away your setup phase into what looks
like an independent test case, the code is not more obvious, but less
so.

> Now, I will re-use a setup case for multiple tests and even did so in
> t6043, but only when the setup really is identical (e.g. not only are
> the tests expecting an identical setup, but the commands being tested
> are read-only or any changes they make are unconditionally reverted as
> the last step of the testcase).  Naturally, when I re-use a setup
> block multiple times, I really don't want to copy the setup into each
> of those tests either.
>
> It would be really nice, though, if there were some way for me to
> specify that a given "testcase" was just setup (so that they don't
> even get a number in the prove output and don't count as a "test"
> except maybe when they fail), and if there were some way to specify
> which testcases depend on which setup blocks (so that if I want to run
> just a given test, the setup test gets automatically included).
>
>
> Your example of CI/PR builds makes sense to me, and I really would
> like to make that nicer and improve other workflows too, especially if
> it can be done without breaking my ability to investigate test
> failures.  If someone can think of a clever way to do that (e.g. some
> way of implementing my "It would be really nice" in the last
> paragraph, and in a way that could also benefit CI/PR builds), I'd be
> happy to go through and help switch things over.

My suggestion: do not rip apart commands that belong together. The setup
phase is often more important to understand than the actually failing
command. _Especially_ when the setup test case reports success, but
actually failed to set things up as intended (which I encountered more
times than I care to count).

> > > +     (
> > > +             cd 12d &&
> > > +
> > > +             git checkout A^0 &&
> > > +
> > > +             git -c merge.directoryRenames=3Dtrue merge -s recursiv=
e B^0 &&
> > > +
> > > +             git ls-files -s >out &&
> > > +             test_line_count =3D 2 out &&
> >
> > Isn't this a bit overzealous?
> >
> > > +
> > > +             git rev-parse >actual \
> > > +                     HEAD:subdir/foo.c.t   HEAD:bar.c.t &&
> > > +             git rev-parse >expect \
> > > +                     O:a/b/subdir/foo.c.t  B:a/b/bar.c.t &&
> > > +             test_cmp expect actual &&
> >
> > Likewise?
> >
> > > +
> > > +             git hash-object bar.c.t >actual &&
> > > +             git rev-parse B:a/b/bar.c.t >expect &&
> > > +             test_cmp expect actual &&
> >
> > Likewise?
> >
> > > +
> > > +             test_must_fail git rev-parse HEAD:a/b/subdir/foo.c.t &=
&
> > > +             test_must_fail git rev-parse HEAD:a/b/bar.c.t &&
> > > +             test_path_is_missing a/
> >
> > Makes sense, but the part that I am missing is
> >
> >                 test_path_is_file bar.c.t
> >
> > I.e. the _most_ important outcome of this test is: the rename was
> > detected, and the added file was correctly placed into the target
> > directory of the rename.
>
> That's a useful thing to add to the test, I'll add it.  (It's kind of
> included in the 'git hash-object bar.c.t' half a dozen or so lines
> above, but this line helps document the expectation a bit better.)
>
> I'd be very reticent to include only this test_path_is_file check, as
> it makes no guarantee that it has the right contents or that we didn't
> also keep around another copy in a/b/bar.c.t, etc.r

I agree that it has to strike a balance. There are multiple aspects you
need to consider:

- It needs to be easy to understand what the test case tries to ensure.

- While it is important to verify that Git works correctly, you do not
  want to make the test suite so slow as to be useless. I vividly
  remember how Duy mentioned that he does not have the time to run the
  test suite before contributing. That's how bad things are _already_.

As a rule of thumb, I would like to submit that a test case should fail
when the claimed goal (i.e. the test case's title) is, and a breakage in
that should both be sufficient and necessary for it to fail.

In this instance, the title is:

	12d-setup: Rename (merge) of subdirectory into the root

Now, this does not even tell me clearly what it tries to ensure. But
from the code, I guess that it wants to ensure that the directory rename
detection can detect when a directory's contents were moved into the
top-level directory.

Now, let's apply the rule of the "sufficient". I would expect the
outcome to be that the file that was added to that directory in one
branch is moved together with the directory's contents that the other
branch moved into the top-level directory.

I was missing that test, hence I pointed it out.

But testing the absence of the file `foo.c.t` in its original location
which was moved in one branch, and not touched in the other, i.e. with a
_trivial_ three-way merge resolution, that seems to be extra. It would
not break if the directory rename detection is broken, so it is an extra
test that does not do what the test case's title claims to test.

Now for the "necessary" part. It is totally unclear to me how all those
other things you test for would be failing _only_ if the directory
rename detection failed. I would actually expect that there are _many_
ways that this test case could fail in ways that have not the faintest
thing to do with directory rename detection.

As somebody who, as I stated above, investigates dozens of test case
failures per week (although, granted, most of them are relatively
trivial to resolve, it is invariably test cases like the one I am
commenting on that require the most time to get resolved), I have to
admit that there are few more frustrating things in debugging than
having to chase issues that have _nothing_ to do with the stated goal of
the test case.

> I understand my checks above look overzealous, but merge-recursive has
> been a bit of a pain to understand and deal with at times, and I've
> been bitten one too many times with tests (of merge-recursive and
> elsewhere in git.git) that passed but still did the wrong thing
> because they only tested one way in which the test problem failed in
> the past rather than specifying what correct behavior is.  And my
> "overzealousness" in the merge-recursive tests really have caught and
> prevented bugs would have been missed otherwise.

If you must have those tests, then please structure them in a way where
they are much more actionable. Ideally, it should be clear from a glance
at the trace of a failed test case (and _only_ from that test case) for
less than five seconds _what_ is being tested, and it should be obvious
what is going wrong.

I cannot say that about the test case in this mail.

> Oh, and I think there's another place in the code that needs to be
> tweaked to make sure we handle renaming subdirectories into the root
> directory that I missed (and just wasn't tested by this testcase), so
> I'll check into it and if so fix the code and add another testcase,
> and include the fixups I agreed to above and send out a v2.  Probably
> won't get to it until the middle of next week, though.
>
>
> Thanks again for the review and food for thought.  The CI/PR builds
> sound particularly interesting; I'd really like to make those better
> for you if it can be done in a way that doesn't break command line
> builds with clear documentation of setup and expectations while
> supporting test skipping and post-facto investigation (even without
> --immediate or re-running), etc.  Hmm...

My hope was that CI/PR builds could help improve the code quality in
`pu`, and to a large part it did.

The time I need to spend on understanding test cases' code (t0021 is a
particularly nasty one, not your fault, of course) before I can even get
to fire up a debugger, this amount of time is insane, though.

Ciao,
Dscho

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

Sorry for the long delay before getting back to this; the other stuff
I was working on took longer than expected.

On Mon, Oct 14, 2019 at 3:42 AM Johannes Schindelin
<[email protected]> wrote:
> On Sat, 12 Oct 2019, Elijah Newren wrote:
> > On Sat, Oct 12, 2019 at 1:37 PM Johannes Schindelin
> > <[email protected]> wrote:
> > >
> > > For the record: I am still a huge anti-fan of splitting `setup` test
> > > cases from the test cases that do actual things, _unless_ it is
> > > _one_, and _only one_, big, honking `setup` test case that is the
> > > very first one in the test script.
[...]
> > The one thing I do agree with you on is test cases need to be
> > optimized for when they report breakages, but that is precisely what
> > led me to splitting setup and testing.
>
> To me, it is so not helpful _not_ to see the output of a `setup` that
> succeeded, and only the output of the actual test that actually failed.
>
> It removes context.
>
> I need to understand the scenario where the breakage happens, and the
> only way I can understand is when I understand the context.
>
> So the context needs to be as close as possible.

I've updated the patch series with a change that I hope helps while
still allowing the setup "steps" to be visibly differentiated from the
testing steps.

> > Way too many tests in the testsuite intersperse several setup and test
> > cases together making it really hard to disentangle, understand what
> > is going on, or even reverse engineer what is relevant.  The absolute
> > worst tests are the ones which just keep making additional changes to
> > some existing repo to provide extra setup, causing all sorts of
> > problems for skipping and resuming and understanding (to say nothing
> > of test prerequisites that aren't always met).
>
> I agree with this sentiment, and have to point out that this is yet
> another fallout of the way our test suite is implemented. If you look
> e.g. at JUnit, there are no "setup test cases". There are specific setup
> steps that you can define, there is even a teardown step you can define,
> and those individual test cases? They can run in parallel, or
> randomized, and they run in their own sandbox, to make sure that they
> don't rely on side effects of unrelated test cases.
>
> We don't do that. We don't enforce the absence of side effects, and
> therefore we have a megaton of them.
>
> But note how this actually speaks _against_ separating the setup from
> the test? Because then you _explicitly_ want those test cases to rely on
> one another. Which flies in the _face_ of trying to disentangling them.

I agree that it is desirable to avoid side effects in the tests, but
I'd like to point out that I'm not at all sure that your conclusion
here is the only logical one to draw here in comparing to JUnit.  As
you point out, JUnit has clearly delineated setup steps for a test (as
well as teardown steps), providing a place to keep them separate.  Our
testsuite lacks that, so how do folks try to get it?  One logical way
would be just inlining the setup steps in the test outside a
test_expect_* block (which has been done in the past), but that has
even worse problems.  Another way, even if suboptimal, is placing
those steps in their own test_expect_* block.  You say just throw the
setup and test together, but that breaks the separation.

I think it's a case of the testsuite not providing the right
abstractions and enough capability, leaving us to argue over which
aspects of a more featureful test harness are most important to
emulate.  You clearly picked one, while I was focusing on another.
Anyway, all that said, I think I have a nice compromise that I'll send
out with V2.

[...]
> > > Makes sense, but the part that I am missing is
> > >
> > >                 test_path_is_file bar.c.t
> > >
> > > I.e. the _most_ important outcome of this test is: the rename was
> > > detected, and the added file was correctly placed into the target
> > > directory of the rename.
> >
> > That's a useful thing to add to the test, I'll add it.  (It's kind of
> > included in the 'git hash-object bar.c.t' half a dozen or so lines
> > above, but this line helps document the expectation a bit better.)
> >
> > I'd be very reticent to include only this test_path_is_file check, as
> > it makes no guarantee that it has the right contents or that we didn't
> > also keep around another copy in a/b/bar.c.t, etc.
>
> I agree that it has to strike a balance. There are multiple aspects you
> need to consider:
>
> - It needs to be easy to understand what the test case tries to ensure.
>
> - While it is important to verify that Git works correctly, you do not
>   want to make the test suite so slow as to be useless. I vividly
>   remember how Duy mentioned that he does not have the time to run the
>   test suite before contributing. That's how bad things are _already_.

On this issue I'm having a harder time finding common ground.  Perhaps
there's something clever to be done, but I'm not seeing it.  I can at
least try to explain my perspective...

I understand you probably meant these two items as a non-exhaustive
list of things that need to be balanced, but you did omit what I feel
is the most important thing in the balance:

- The combination of tests should provide adequate confidence of
coverage that someone can refactor the code as necessary.

The tests in t6036, t6042, t6043, and t6046 are more defensive than
normal (defining "correct" behavior more finely-grained than usual),
but that came about precisely because it was so easy to make
refactoring mistakes that broke things in unexpected ways.  There have
been a handful of times already when I have gone in to refactor or fix
a reported bug, thought I had the right solution, and then only one or
maybe two tests failed and it was one of these tests and in particular
the "overzealous" portions of these tests that caught the bug (sorry I
don't still have details, I just remember being surprised and happy
about the extra defensiveness on multiple occasions).  Without this
extra defensiveness, there would have been extra bugs introduced that
I'm sure would have made it into real git releases.  So, there would
need to be a large pile of evidence and problems before I'd be willing
to take these out.

All that said...

I do very much agree with you that the overall testsuite needs to be
relatively speedy too, and that is an important counterbalance in
general.  But testsuite performance is somewhat of a red-herring in
this context: when running all testcases with 'merge' in their name
which I have contributed anything at all to (apparently 17 t[0-9]*.sh
files meet both these criteria), and running them in serial (no -j8 or
anything of the sort), they all combined complete in less than 40% of
the time that t9001-send-email.sh alone takes on my box.  It'd reduce
to barely over a quarter of the time that the send-email test takes if
it weren't for t3425-rebase-topology-merges.sh (which is the slowest
test with 'merge' in its name that I've touched, despite the fact that
it has none of this extra defensiveness you are complaining about and
labelling as unrelated).  Yes, t6043 is pretty long code-wise, and
still has over five dozen tests after ditching the separate "setup"
tests -- but all of those tests still run in 3.6s on my box.  I think
that a few extra calls to ls-files, rev-parse, hash-object and
test_path_is_missing are the wrong place to look if you're worried
about testsuite performance.

Finally, you also brought up "easy to understand" the test case.  You
are absolutely right that the extra calls to ls-files, rev-parse,
hash-object, test_path_is_missing do slow down an external reader and
almost certainly doesn't fit in your "within five seconds" goal you
mentioned below.  That's unfortunate.  But optimizing this at the
expense of preventing actual bugs that would have hit git.git releases
is a bad tradeoff and one I'm not willing to make.  If you want me to
restrict the extra defensiveness to just the merge-recursive tests,
I'm happy to do so.  I do agree that it's unusual, and probably should
be, but I think merge-recursive merits the special handling.


Hope that helps; thanks for all the feedback.

Elijah

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Elijah,

On Tue, 22 Oct 2019, Elijah Newren wrote:

> [...]
> Yes, t6043 is pretty long code-wise, and still has over five dozen
> tests after ditching the separate "setup" tests -- but all of those
> tests still run in 3.6s on my box. [...]

$ time sh t6043-*.sh --quiet
not ok 74 - 9g-check: Renamed directory that only contained immediate subdirs, immediate subdirs renamed # TODO known breakage
not ok 87 - 10e-check: Does git complain about untracked file that is not really in the way? # TODO known breakage
# still have 2 known breakage(s)
# passed all remaining 117 test(s)
1..119

real    7m22.393s
user    0m52.115s
sys     3m53.212s

And this is not a slow box. So yes, those extra spawned processes? They
accumulate. Spawning processes is what Linux was optimized for. You're
optimizing for Linux.

Ciao,
Dscho

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Johannes Schindelin <[email protected]> writes:

>> Still, I rely pretty heavily on t6036, t6042, t6043, and t6046 for
>> sanity in the face of refactoring and rewriting -- and as mentioned
>> before they have caught refactoring bugs in those areas that appear at
>> first blush as "overzealous", ...
>
> One idea would be to try to guard those extra careful tests behind the
> `EXPENSIVE` prereq.

Yeah, I like that---I think it is perfectly in line with the spirit
of EXPENSIVE, too.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Mon, Oct 28, 2019 at 6:20 PM Junio C Hamano <[email protected]> wrote:
>
> Johannes Schindelin <[email protected]> writes:
>
> >> Still, I rely pretty heavily on t6036, t6042, t6043, and t6046 for
> >> sanity in the face of refactoring and rewriting -- and as mentioned
> >> before they have caught refactoring bugs in those areas that appear at
> >> first blush as "overzealous", ...
> >
> > One idea would be to try to guard those extra careful tests behind the
> > `EXPENSIVE` prereq.
>
> Yeah, I like that---I think it is perfectly in line with the spirit
> of EXPENSIVE, too.

Or perhaps EXPENSIVE_ON_WINDOWS, since it's actually pretty cheap on
linux and not that bad on Mac  However, if we're going down that
route, perhaps t9001-send-email.sh could be wrapped in an EXPENSIVE
prerequisite?  That single test file takes an inordinate percentage of
overall runtime.  One one box with a few extra cpus, that test both
starts first and finishes last...and it's not far from that on even
normal boxes.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Elijah,

On Wed, 30 Oct 2019, Elijah Newren wrote:

> On Mon, Oct 28, 2019 at 6:20 PM Junio C Hamano <[email protected]> wrote=
:
> >
> > Johannes Schindelin <[email protected]> writes:
> >
> > >> Still, I rely pretty heavily on t6036, t6042, t6043, and t6046 for
> > >> sanity in the face of refactoring and rewriting -- and as mentioned
> > >> before they have caught refactoring bugs in those areas that appear=
 at
> > >> first blush as "overzealous", ...
> > >
> > > One idea would be to try to guard those extra careful tests behind t=
he
> > > `EXPENSIVE` prereq.
> >
> > Yeah, I like that---I think it is perfectly in line with the spirit
> > of EXPENSIVE, too.
>
> Or perhaps EXPENSIVE_ON_WINDOWS, since it's actually pretty cheap on
> linux and not that bad on Mac

Why the complexity? If you separate out the expensive tests (even if
they are only expensive in terms of run time on Windows), it will make
the regression tests so much more readable to the occasional reader
(making them less expensive in terms of reading time...).

> However, if we're going down that route, perhaps t9001-send-email.sh
> could be wrapped in an EXPENSIVE prerequisite?  That single test file
> takes an inordinate percentage of overall runtime.  One one box with a
> few extra cpus, that test both starts first and finishes last...and
> it's not far from that on even normal boxes.

I would be okay with that.

No, let me try that again. I would be _totally_ okay with that.

Ciao,
Dscho

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

Hi Dscho,

On Wed, Oct 30, 2019 at 3:17 PM Johannes Schindelin
<[email protected]> wrote:
>
> Hi Elijah,
>
> On Wed, 30 Oct 2019, Elijah Newren wrote:
>
> > On Mon, Oct 28, 2019 at 6:20 PM Junio C Hamano <[email protected]> wrote:
> > >
> > > Johannes Schindelin <[email protected]> writes:
> > >
> > > >> Still, I rely pretty heavily on t6036, t6042, t6043, and t6046 for
> > > >> sanity in the face of refactoring and rewriting -- and as mentioned
> > > >> before they have caught refactoring bugs in those areas that appear at
> > > >> first blush as "overzealous", ...
> > > >
> > > > One idea would be to try to guard those extra careful tests behind the
> > > > `EXPENSIVE` prereq.
> > >
> > > Yeah, I like that---I think it is perfectly in line with the spirit
> > > of EXPENSIVE, too.
> >
> > Or perhaps EXPENSIVE_ON_WINDOWS, since it's actually pretty cheap on
> > linux and not that bad on Mac
>
> Why the complexity? If you separate out the expensive tests (even if
> they are only expensive in terms of run time on Windows), it will make
> the regression tests so much more readable to the occasional reader
> (making them less expensive in terms of reading time...).

The "extra careful" things you were complaining about with the new
test I was adding to t6043 was true of every single test in that
file...and likely much of t6036, t6042, and perhaps even t6046 (though
those have fewer tests than t6043).  I have no clue where I'd even
begin to draw the line between them.  If it's possible, it sounds
extremely complex.  Just using the EXPENSIVE_ON_WINDOWS prereq that
already exists would be easy and simple.

Or did you mean you wanted me to duplicate every single test and
attempt to trim down the duplicates somehow?  That'd be a rather large
undertaking that sounds rather unappealing on a few fronts, but maybe
that's what you had in mind?

> > However, if we're going down that route, perhaps t9001-send-email.sh
> > could be wrapped in an EXPENSIVE prerequisite?  That single test file
> > takes an inordinate percentage of overall runtime.  One one box with a
> > few extra cpus, that test both starts first and finishes last...and
> > it's not far from that on even normal boxes.
>
> I would be okay with that.
>
> No, let me try that again. I would be _totally_ okay with that.

Ooh, sweet, sounds like I should propose it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Elijah,

On Wed, 30 Oct 2019, Elijah Newren wrote:

> On Wed, Oct 30, 2019 at 3:17 PM Johannes Schindelin
> <[email protected]> wrote:
> >
> > On Wed, 30 Oct 2019, Elijah Newren wrote:
> >
> > > On Mon, Oct 28, 2019 at 6:20 PM Junio C Hamano <[email protected]> w=
rote:
> > > >
> > > > Johannes Schindelin <[email protected]> writes:
> > > >
> > > > >> Still, I rely pretty heavily on t6036, t6042, t6043, and t6046 =
for
> > > > >> sanity in the face of refactoring and rewriting -- and as menti=
oned
> > > > >> before they have caught refactoring bugs in those areas that ap=
pear at
> > > > >> first blush as "overzealous", ...
> > > > >
> > > > > One idea would be to try to guard those extra careful tests behi=
nd the
> > > > > `EXPENSIVE` prereq.
> > > >
> > > > Yeah, I like that---I think it is perfectly in line with the spiri=
t
> > > > of EXPENSIVE, too.
> > >
> > > Or perhaps EXPENSIVE_ON_WINDOWS, since it's actually pretty cheap on
> > > linux and not that bad on Mac
> >
> > Why the complexity? If you separate out the expensive tests (even if
> > they are only expensive in terms of run time on Windows), it will make
> > the regression tests so much more readable to the occasional reader
> > (making them less expensive in terms of reading time...).
>
> The "extra careful" things you were complaining about with the new
> test I was adding to t6043 was true of every single test in that
> file...and likely much of t6036, t6042, and perhaps even t6046 (though
> those have fewer tests than t6043).  I have no clue where I'd even
> begin to draw the line between them.  If it's possible, it sounds
> extremely complex.  Just using the EXPENSIVE_ON_WINDOWS prereq that
> already exists would be easy and simple.
>
> Or did you mean you wanted me to duplicate every single test and
> attempt to trim down the duplicates somehow?  That'd be a rather large
> undertaking that sounds rather unappealing on a few fronts, but maybe
> that's what you had in mind?

My suggestion was in reply to your question, and not intended for
already-existing tests. That would indeed require a lot of work, and I
am not sure that it would be worth it.

The suggestion was intended for future patches.

Ciao,
Dscho

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 15, 2019

This branch is now known as en/merge-recursive-directory-rename-fixes.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 15, 2019

This patch series was integrated into pu via git@8a3e303.

@gitgitgadget gitgitgadget bot added the pu label Oct 15, 2019
@gitgitgadget
Copy link

gitgitgadget bot commented Oct 15, 2019

This patch series was integrated into pu via git@ba56f82.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 15, 2019

This patch series was integrated into next via git@ebfdc3f.

@gitgitgadget gitgitgadget bot added the next label Oct 15, 2019
@gitgitgadget
Copy link

gitgitgadget bot commented Oct 16, 2019

This patch series was integrated into pu via git@727f1ab.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 18, 2019

This patch series was integrated into pu via git@16806ff.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 18, 2019

This patch series was integrated into pu via git@9b21057.

@dscho
Copy link
Member

dscho commented Oct 22, 2019

Are you planning on re-doing the entire series? It already made it to next...

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 22, 2019

This patch series was integrated into pu via git@4d83499.

@newren
Copy link
Author

newren commented Oct 22, 2019

Yes, it made it to next...but then was reverted out of next specifically to allow resending with fixes. Sorry for the delay, I'll get it out later today.

We allow renaming all entries in e.g. a directory named z/ into a
directory named y/ to be detected as a z/ -> y/ rename, so that if the
other side of history adds any files to the directory z/ in the mean
time, we can provide the hint that they should be moved to y/.

There is no reason to not allow 'y/' to be the root directory, but the
code did not handle that case correctly.  Add a testcase and the
necessary special checks to support this case.

Signed-off-by: Elijah Newren <[email protected]>
Transform the setup "tests" to setup functions, and have the actual
tests call the setup functions.  Advantages:

  * Should make life easier for people working with webby CI/PR builds
    who have to abuse mice (and their own index finger as well) in
    order to switch from viewing one testcase to another.  Sounds
    awful; hopefully this will improve things for them.

  * Improves re-runnability: any failed test in any of these three
    files can now be re-run in isolation, e.g.
       ./t6042* --ver --imm -x --run=21
    whereas before it would require two tests to be specified to the
    --run argument, the other needing to be picked out as the relevant
    setup test from one or two tests before.

  * Importantly, this still keeps the "setup" and "test" sections
    somewhat separate to make it easier for readers to discern what is
    just ancillary setup and what the intent of the test is.

Signed-off-by: Elijah Newren <[email protected]>
@newren
Copy link
Author

newren commented Oct 22, 2019

Looks like those CI failures were for installing git-lfs, infrastructure not under my control and not used by my tests. Since I can't retrigger, I guess I just ignore those?

@newren
Copy link
Author

newren commented Oct 22, 2019

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 22, 2019

Submitted as [email protected]

WARNING: newren has no public email address set on GitHub

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 23, 2019

This patch series was integrated into pu via git@f649898.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 23, 2019

This patch series was integrated into pu via git@9b81521.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 24, 2019

This patch series was integrated into pu via git@2be4714.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 24, 2019

This patch series was integrated into next via git@82e6402.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 24, 2019

This patch series was integrated into pu via git@4a2735d.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 28, 2019

This patch series was integrated into pu via git@2ce929e.

@dscho
Copy link
Member

dscho commented Oct 29, 2019

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 30, 2019

This patch series was integrated into pu via git@58d5b95.

@gitgitgadget
Copy link

gitgitgadget bot commented Nov 2, 2019

This patch series was integrated into pu via git@771d985.

@gitgitgadget
Copy link

gitgitgadget bot commented Nov 4, 2019

This patch series was integrated into pu via git@42d16e3.

@gitgitgadget
Copy link

gitgitgadget bot commented Nov 6, 2019

This patch series was integrated into pu via git@c95f6e2.

@gitgitgadget
Copy link

gitgitgadget bot commented Nov 7, 2019

This patch series was integrated into pu via git@e673906.

@gitgitgadget
Copy link

gitgitgadget bot commented Nov 8, 2019

This patch series was integrated into pu via git@cf7fdcf.

@gitgitgadget
Copy link

gitgitgadget bot commented Nov 10, 2019

This patch series was integrated into pu via git@a9bfca9.

@gitgitgadget
Copy link

gitgitgadget bot commented Nov 11, 2019

This patch series was integrated into pu via git@d980035.

@gitgitgadget
Copy link

gitgitgadget bot commented Nov 11, 2019

This patch series was integrated into master via git@d980035.

@gitgitgadget gitgitgadget bot added the master label Nov 11, 2019
@gitgitgadget
Copy link

gitgitgadget bot commented Nov 11, 2019

Closed via d980035.

@gitgitgadget gitgitgadget bot closed this Nov 11, 2019
@newren newren deleted the dir-rename-fixes branch December 10, 2019 16:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants