Performance issues running `git blame` in a repository cloned via VFS for Git #753

dscho · 2025-05-21T14:34:35Z

The origin of lines is automatically followed across whole-file renames (currently there is no option to turn the rename-following off).

It is easily shown that git blame follows partial-file renames as well:

git init
echo abc>1.txt
echo def>>1.txt
echo ghi>>1.txt
git add .
git commit -m "Initial commit"
git mv 1.txt 2.txt
echo abc>2.txt
echo 123>>2.txt
echo ghi>>2.txt
git add .
git commit -m "Rename+edit together"
git blame 2.txt

The output attributes the first line to 1.txt, the second line to 2.txt, and the third line to 1.txt again. A colleague suspects that this line is what turns rename detection on (and I tend to agree). Because diff_opts.rename_score is never set, it falls back to the default of 30000 (50%) instead of the documented 100%.
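For readers unfamiliar with the 30000/50% notation: Git's diffcore stores similarity as an integer score in which 60000 (`MAX_SCORE` in diffcore.h) means 100%. The conversion can be sketched as follows; `pct_to_score` is a hypothetical helper used only for illustration, not something Git ships.

```shell
#!/bin/sh
# MAX_SCORE mirrors the constant in Git's diffcore.h (60000 == 100% similar).
MAX_SCORE=60000

# pct_to_score is a hypothetical helper for illustration only:
# it maps a similarity percentage onto Git's internal score scale.
pct_to_score() {
    echo $(( $1 * MAX_SCORE / 100 ))
}

default_score=$(pct_to_score 50)    # the threshold blame effectively uses today
exact_score=$(pct_to_score 100)     # the threshold needed for exact renames only

echo "50%  -> $default_score"
echo "100% -> $exact_score"
```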

In some massive repositories this makes git blame appear to hang: to detect partial-file renames, Git has to compare blob contents, which implies trying to download millions of files one by one just to compare them.

We should consider changing the default blame behavior to follow only exact whole-file renames (i.e., those where the blob SHA doesn't change).
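To illustrate why exact renames are cheap to detect, here is a minimal sketch that assumes a throwaway repository in a temporary directory and a hypothetical wrapper `g` that supplies a commit identity. A pure rename keeps the blob ID, so the match can be established from object IDs alone, without reading any file contents.

```shell
#!/bin/sh
# Sketch: a pure rename leaves the blob ID unchanged.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical wrapper: passes an identity inline so the sketch does not
# depend on the user's global Git configuration.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

printf 'abc\ndef\nghi\n' >1.txt
g add 1.txt
g commit -q -m "Initial commit"
g mv 1.txt 2.txt
g commit -q -m "Pure rename"

# Same content, hence the same blob ID before and after the rename.
old_blob=$(git rev-parse HEAD~1:1.txt)
new_blob=$(git rev-parse HEAD:2.txt)
echo "old: $old_blob"
echo "new: $new_blob"

# -M100% limits git diff's rename detection to exact matches like this one.
status=$(git diff -M100% --name-status HEAD~1 HEAD)
echo "$status"
```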

More importantly, we need to add support for arguments like -M[<n>]/--find-renames[=<n>] like git log has.
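As a point of comparison, this is how the existing threshold option behaves in git diff today, using the same rename-plus-edit scenario as the reproduction above. It is only a sketch of the semantics git blame could adopt, not of the eventual blame implementation; the wrapper `g` and the temporary repository are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: the similarity threshold decides whether an inexact rename is found.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical wrapper supplying a commit identity for the demo.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

printf 'abc\ndef\nghi\n' >1.txt
g add .
g commit -q -m "Initial commit"
g mv 1.txt 2.txt
printf 'abc\n123\nghi\n' >2.txt
g add .
g commit -q -m "Rename+edit together"

# At 50%, two of three unchanged lines are enough: reported as a rename (R<score>).
loose=$(git diff -M50% --name-status HEAD~1 HEAD)
# At 100%, only identical blobs qualify: reported as a delete plus an add.
strict=$(git diff --find-renames=100% --name-status HEAD~1 HEAD)

echo "-M50%:"
echo "$loose"
echo "--find-renames=100%:"
echo "$strict"
```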

Also, we probably want to add support for git config settings to control blame's rename behavior, similar to the existing diff.renames, merge.renames, status.renames settings.

At the very least, we should update the git blame documentation to clearly state the current behavior (and not assume that all readers have the same idea of what constitutes a whole-file rename: does it have to be exact, or is an inexact match acceptable, too?).

@Copilot: here are some guidelines on how to implement a fix for this:

Look for the existing implementations of -M/--find-renames support in diff*.c, and see how much you can reuse (avoid copying code if it is easy).

Then look at Documentation/*diff* for the related documentation.

Now, find the spot in *blame.c where the options for git blame are parsed, and add the appropriate code to support the new options. After that, add the documentation to the proper spot in Documentation/git-blame.adoc.

This would likely be a good time to commit the changes. Do pay attention to writing a thorough commit message that preemptively addresses any questions about what we're doing here and why: the intent, context, implementation and justification should all be contained in the commit message, while explanations of anything that is readily obvious from looking at the diff should be left out.

Now it would probably be a good time to find a good spot for a regression test. Look at t/*blame*.sh to see the existing examples; test cases are all enclosed in test_expect_success calls. Find the spot that looks most appropriate for testing whole-file rename detection, and then add a new test case that verifies an inexact whole-file rename is only detected with specific -M values, by running git blame with two different values.

That would be another commit.

Now, it would be a good time to implement the support for the blame.renames config setting. To understand how to do that, look for the implementation (*.c) and documentation (Documentation/config/*) of the above-mentioned .renames settings, and then imitate them.

Do augment the git blame -M test case by changing one existing git blame -M call to specify the config setting via git -c blame.renames=... blame ... instead, and then add another git blame invocation to the same test case that verifies that -M overrides blame.renames.
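The precedence being asked for (command-line option wins over config) already exists for git diff and the diff.renames setting, so it can serve as a model. The sketch below uses only that existing setting; blame.renames is the proposed setting and is deliberately not exercised here. The wrapper `g` and the temporary repository are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: an explicit -M on the command line overrides diff.renames=false.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical wrapper supplying a commit identity for the demo.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

printf 'abc\ndef\nghi\n' >1.txt
g add .
g commit -q -m "Initial commit"
g mv 1.txt 2.txt
printf 'abc\n123\nghi\n' >2.txt
g add .
g commit -q -m "Rename+edit together"

# Config alone: rename detection is off, so the change is a delete plus an add.
from_config=$(git -c diff.renames=false diff --name-status HEAD~1 HEAD)
# Config plus -M: the command-line option wins and the rename is detected.
overridden=$(git -c diff.renames=false diff -M --name-status HEAD~1 HEAD)

echo "diff.renames=false:"
echo "$from_config"
echo "diff.renames=false with -M:"
echo "$overridden"
```

A regression test for blame.renames would follow the same pattern: one invocation driven purely by the config value and one that adds the command-line option to confirm it takes precedence.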

dscho assigned Copilot May 22, 2025

Copilot AI linked a pull request May 22, 2025 that will close this issue

Add -M/--find-renames option and blame.renames config to control rename detection #755

Draft
