Performance issues running git blame in a repository cloned via VFS for Git #753

Open
dscho opened this issue May 21, 2025 · 0 comments · May be fixed by #755

The documentation for git blame states:

The origin of lines is automatically followed across whole-file renames (currently there is no option to turn the rename-following off).

It is easily shown that git blame follows partial-file renames as well:

git init
echo abc>1.txt
echo def>>1.txt
echo ghi>>1.txt
git add .
git commit -m "Initial commit"
git mv 1.txt 2.txt
echo abc>2.txt
echo 123>>2.txt
echo ghi>>2.txt
git add .
git commit -m "Rename+edit together"
git blame 2.txt

This shows 1.txt as the origin of the first line, 2.txt for the second line, and 1.txt again for the third line. A colleague suspects this line is the source of renames being turned on (and I tend to agree), and because diff_opts.rename_score is not set, it defaults to 30000 (a 50% similarity threshold) instead of the documented 100%.
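The observation above can be made machine-checkable. The following is a minimal sketch, not part of the original report: it rebuilds the same two-commit history in a throwaway directory and uses git blame --line-porcelain, which emits a "filename" header for every blamed line. The commit helper and the inline identity are assumptions made only to keep the script independent of global configuration.

```shell
#!/bin/sh
# Sketch: list the origin path that git blame reports for each line of 2.txt.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical helper: commit with an inline identity so no global config is needed.
commit() {
	git -c user.name=Example -c user.email=example@example.com commit -q -m "$1"
}

printf 'abc\ndef\nghi\n' >1.txt
git add .
commit "Initial commit"

# Rename and edit in the same commit, as in the report.
git mv 1.txt 2.txt
printf 'abc\n123\nghi\n' >2.txt
git add .
commit "Rename+edit together"

# Keep only the per-line "filename <path>" headers of the porcelain output.
origins=$(git blame --line-porcelain 2.txt | sed -n 's/^filename //p')
echo "$origins"
```

If blame behaves as described in this report, the three printed paths are 1.txt, 2.txt, and 1.txt, in that order.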

This causes problems (git blame appears to hang) on files in some massive repositories: to detect inexact renames, Git needs to compare blob contents, which in a VFS for Git clone means trying to download millions of files one by one just to compare them.

We should consider changing the default blame behavior to only follow exact whole-file renames (i.e., where the blob SHA doesn't change).
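To illustrate the difference between exact and inexact renames, here is a sketch that uses git diff, which already accepts a similarity threshold via -M<n>. It is only an analogy for the behavior proposed for blame; the repository layout and the commit helper are assumptions carried over from the reproduction above.

```shell
#!/bin/sh
# Sketch: the same rename+edit commit, viewed with two similarity thresholds.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical helper: commit with an inline identity so no global config is needed.
commit() {
	git -c user.name=Example -c user.email=example@example.com commit -q -m "$1"
}

printf 'abc\ndef\nghi\n' >1.txt
git add .
commit "Initial commit"
git mv 1.txt 2.txt
printf 'abc\n123\nghi\n' >2.txt
git add .
commit "Rename+edit together"

# At a 50% threshold the edited file is still paired with its old path.
inexact=$(git diff -M50% --name-status HEAD~1 HEAD)
# At 100% only byte-identical content counts as a rename.
exact=$(git diff -M100% --name-status HEAD~1 HEAD)

# The blob IDs differ, so this is not an exact rename.
old_blob=$(git rev-parse HEAD~1:1.txt)
new_blob=$(git rev-parse HEAD:2.txt)

printf 'threshold 50%%:\n%s\n' "$inexact"
printf 'threshold 100%%:\n%s\n' "$exact"
printf 'old blob: %s\nnew blob: %s\n' "$old_blob" "$new_blob"
```

With the 50% threshold the status line starts with R (rename); with 100% the same change is reported as a deletion of 1.txt plus an addition of 2.txt. Exact-rename detection can be decided from blob IDs alone, which is why it would avoid fetching blob contents.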

More importantly, we need to add support for arguments such as -M[<n>]/--find-renames[=<n>], as git log has.

Also, we probably want to add support for git config settings to control blame's rename behavior, similar to the existing diff.renames, merge.renames, status.renames settings.
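For comparison, this is how the existing diff.renames setting and the -M option interact for git diff today. It is a sketch of the precedent only: blame.renames is a proposal in this issue and does not exist yet, and the commit helper below is a hypothetical convenience.

```shell
#!/bin/sh
# Sketch: diff.renames disables rename detection, and -M on the command line overrides it.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical helper: commit with an inline identity so no global config is needed.
commit() {
	git -c user.name=Example -c user.email=example@example.com commit -q -m "$1"
}

printf 'abc\ndef\nghi\n' >1.txt
git add .
commit "Initial commit"
git mv 1.txt 2.txt
printf 'abc\n123\nghi\n' >2.txt
git add .
commit "Rename+edit together"

# Config alone: rename detection is off, so the change shows as delete + add.
config_only=$(git -c diff.renames=false diff --name-status HEAD~1 HEAD)
# Config plus -M: the command-line option wins and the rename is detected again.
config_and_flag=$(git -c diff.renames=false diff -M --name-status HEAD~1 HEAD)

printf 'diff.renames=false:\n%s\n' "$config_only"
printf 'diff.renames=false with -M:\n%s\n' "$config_and_flag"
```

A blame.renames setting modeled on this would presumably follow the same precedence, with the command-line option overriding the configured value.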

At the very least, we should update the git blame documentation to clearly state the current behavior (and not assume that all readers have the same idea of what constitutes a whole-file rename: does it have to be exact, or is an inexact match acceptable, too?).

@Copilot: here are some guidelines on how to implement a fix for this:

Look for the existing implementations of -M/--find-renames support in diff*.c, and see how much you can reuse (avoid copying code where reuse is easy).

Then look at Documentation/*diff* for the related documentation.

Now, find the spot in *blame.c where the options for git blame are parsed, and add the appropriate code to support the new options. After that, add the documentation to the proper spot in Documentation/git-blame.adoc.

This would likely be a good time to commit the changes. Do pay attention to writing a thorough commit message, in particular one that preemptively addresses any questions as to what we're doing here and why (the intent, context, implementation and justification should be contained in the commit message, leaving out explanations that are readily obvious from looking at the diff).

Now it would probably be a good time to find a good spot for a regression test. Look at t/*blame*.sh to see the existing examples; test cases are all enclosed in test_expect_success calls. Find the spot that looks most appropriate for testing whole-file rename detection, and then add a new test case that verifies that an inexact whole-file rename is only detected with specific -M values, by running git blame with two different values.

That would be another commit.

Now, it would be a good time to implement the support for the blame.renames config setting. To understand how to do that, look for the implementation (*.c) and documentation (Documentation/config/*) of the above-mentioned .renames settings, and then imitate them.

Do augment the git blame -M test case by changing one existing git blame -M call to specify the config setting via git -c blame.renames=... blame ... instead, and then add another git blame invocation to the same test case that verifies that -M overrides blame.renames.
