
repo.Commits.QueryBy(filename) slow on large repos #1705



Open
tster123 opened this issue Aug 20, 2019 · 6 comments

Comments

@tster123

tster123 commented Aug 20, 2019

Reproduction steps

1): Clone a large repo
2): run this function on that repo with some random file:

public IEnumerable<string> TestSlow(string filename)
{
    using (var repo = new Repository(repoRoot))
    {
        string path = filename.Substring(repoRoot.Length + 1).Replace("\\", "/");
        foreach (LogEntry entry in repo.Commits.QueryBy(path))
        {
            yield return entry.Commit.Author.ToString();
        }
    }
}
3): run this command on the same file: "git log --follow --oneline -- <file>"

Expected behavior

I expect TestSlow and the git log command above to take a similar amount of time.

Actual behavior

The git log command finishes in about 1.6 ms on my repo.
The TestSlow function takes about 70 seconds.

Here is what I see in my profiler:
[Screenshot: profiler view]

Version of LibGit2Sharp (release number or SHA1)

0.27.0-preview-0017
0.26.1
0.24.1

Operating system(s) tested; .NET runtime tested

.NET Framework 4.7.2 on Windows 10

@tster123
Author

As a note: I tried different sorting options, but was limited because FileHistory doesn't support None or Reverse:

System.ArgumentException: Unsupported sort strategy. Only 'Topological', 'Time', or 'Topological | Time' are allowed.
Parameter name: queryFilter
    at LibGit2Sharp.Core.FileHistory..ctor(Repository repo, String path, CommitFilter queryFilter) in C:\projects\libgit2sharp\LibGit2Sharp\Core\FileHistory.cs:line 76

@tster123
Author

Just submitted a proposed solution. The change to FileHistory makes it run in 14 seconds instead of 80, and the change in Tree gets it down to about 8 seconds.

I see the continuous integration build failed, but that looks like a CI problem, not a problem with my code. Here is the error (Linux only):

========================== Starting Command Output ===========================
[command]/bin/bash --noprofile --norc /home/vsts/work/_temp/4d8a90ac-c758-402b-94d3-740f95fba16d.sh
/usr/share/dotnet/sdk/2.2.105/NuGet.targets(499,5): error : Could not find a part of the path '/tmp/NuGetScratch/e31463d7-84e6-4141-aa64-0e5166476164'. [/home/vsts/work/1/s/LibGit2Sharp/LibGit2Sharp.csproj]
##[error]Bash exited with code '1'.
##[section]Finishing: CmdLine

@Blueve

Blueve commented Feb 20, 2020

Awesome work! I ran into the same issue and had to wrap git.exe and use git log to speed up reading the log.

@blackboxlogic

@Blueve Could you share your code for parsing the git.exe output?

@Blueve

Blueve commented Oct 15, 2021

> @Blueve Could you share your code for parsing the git.exe output?

We read the console output of a git log command such as git --no-pager log --date-order --no-merges --no-renames --pretty=format:@/%H/ --stat=512 -- {0}, where {0} is the formatted path filter. Other commands and parameters can be used for different purposes.

Then we parse the output string line by line.
The output has the following format:

<empty line>
<commit sha line, start with @/ and end with />
<file path> | <changed lines>
<file path> | <changed lines>
...
<file path> | <changed lines>

Sorry, I can't share the full code since it is internal only.
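For anyone in the same position, here is a minimal sketch of a parser for the format described above. It is not Blueve's code: `CommitStat` and `GitLogStatParser.Parse` are hypothetical names, and the sketch assumes the `--pretty=format:@/%H/ --stat=512` output shown above, where each per-file stat line has the form ` <path> | <count> <+/- graph>`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result type for illustration: one commit SHA plus its per-file line counts.
public sealed record CommitStat(string Sha, List<(string Path, int ChangedLines)> Files);

// Hypothetical parser sketching the approach described above (not Blueve's internal code).
public static class GitLogStatParser
{
    // Expects output of: git --no-pager log --pretty=format:@/%H/ --stat=512 -- <filter>
    public static List<CommitStat> Parse(string output)
    {
        var commits = new List<CommitStat>();
        CommitStat? current = null;

        foreach (var rawLine in output.Split('\n'))
        {
            string line = rawLine.TrimEnd('\r');

            // "@/<sha>/" starts a new commit.
            if (line.Length > 3 && line.StartsWith("@/", StringComparison.Ordinal) && line.EndsWith("/", StringComparison.Ordinal))
            {
                current = new CommitStat(line.Substring(2, line.Length - 3), new List<(string Path, int ChangedLines)>());
                commits.Add(current);
                continue;
            }

            // " <path> | <count> <graph>" is a per-file stat line; skip anything seen before the first commit.
            int bar = line.LastIndexOf(" | ", StringComparison.Ordinal);
            if (current == null || bar < 0)
            {
                continue;
            }

            string path = line.Substring(0, bar).Trim();
            string[] tokens = line.Substring(bar + 3).Split(' ', StringSplitOptions.RemoveEmptyEntries);

            // Binary files show "Bin <old> -> <new> bytes" instead of a count; record those as 0.
            int changed = tokens.Length > 0 && int.TryParse(tokens[0], out int n) ? n : 0;
            current.Files.Add((path, changed));
        }

        return commits;
    }
}
```

Lines matching neither pattern, such as blank lines and the "N files changed" summary that `--stat` prints after each commit, are simply skipped, so the parser does not depend on exact spacing between commits.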

@bytefish

bytefish commented Jun 27, 2024

I came across this issue as well while writing ElasticsearchCodeSearch.

I started by implementing everything using libgit2sharp, but it turned out to be too slow for large repositories. I played around with CommitSortStrategies, but it didn't help (this is the libgit2sharp implementation). So I wrapped the git command instead. Here is what I ended up with; it can surely be adapted to more Git commands:

// Licensed under the MIT license. See LICENSE file in the project root for full license information.

// ...

namespace ElasticsearchCodeSearch.Git
{
    /// <summary>
    /// Exposes various GIT commands useful for indexing files.
    /// </summary>
    public class GitExecutor
    {
        /// <summary>
        /// Logger.
        /// </summary>
        private readonly ILogger<GitExecutor> _logger;

        /// <summary>
        /// Creates a new GitExecutor.
        /// </summary>
        /// <param name="logger">Logger</param>
        public GitExecutor(ILogger<GitExecutor> logger)
        {
            _logger = logger;
        }

        // ...

        /// <summary>
        /// Gets the latest commit date for a file, which is the following git command:
        ///     
        ///     log -1 --date=iso-strict --format="%ad" -- "{path}"
        ///     
        /// </summary>
        /// <param name="repositoryDirectory">The Working Directory</param>
        /// <param name="path">Relative path to the file</param>
        /// <param name="cancellationToken">Cancellation Token</param>
        /// <returns>Latest Commit Date for a file</returns>
        public async Task<DateTime> LatestCommitDate(string repositoryDirectory, string path, CancellationToken cancellationToken)
        {
            var result = await RunGitAsync($"log -1 --date=iso-strict --format=\"%ad\" -- \"{path}\"", repositoryDirectory, cancellationToken).ConfigureAwait(false);

            if (!DateTime.TryParse(result, out var date))
            {
                _logger.LogWarning("Could not convert the Latest Commit Date to a DateTime for '{File}'. Raw Git Output was: '{GitOutput}'", path, result);

                return default;
            }

            return date;
        }

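        /// <summary>
        /// Lists the files tracked in the index, which is the following git command:
        ///
        ///     ls-files
        ///
        /// </summary>
        public async Task<string[]> ListFiles(string repositoryDirectory, CancellationToken cancellationToken)
        {
            var result = await RunGitAsync("ls-files", repositoryDirectory, cancellationToken).ConfigureAwait(false);

            // RunProcessAsync joins output lines with AppendLine, i.e. Environment.NewLine.
            return result.Split(Environment.NewLine, StringSplitOptions.RemoveEmptyEntries);
        }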

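        /// <summary>
        /// Runs git with the given arguments and returns its trimmed standard output.
        /// Throws a <see cref="GitException"/> if git exits with a non-zero code.
        /// </summary>
        public async Task<string> RunGitAsync(string arguments, string workingDirectory, CancellationToken cancellationToken)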
        {
            var result = await RunProcessAsync("git", arguments, workingDirectory, cancellationToken).ConfigureAwait(false);

            if (result.ExitCode != 0)
            {
                throw new GitException(result.ExitCode, result.Errors);
            }

            return result.Output;
        }

        private async Task<(int ExitCode, string Output, string Errors)> RunProcessAsync(string application, string arguments, string workingDirectory, CancellationToken cancellationToken)
        {
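            // TraceMethodEntry is not part of Microsoft.Extensions.Logging; it is presumably a project-specific extension (not shown).
            _logger.TraceMethodEntry();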

            using (var process = new Process())
            {
                process.StartInfo = new ProcessStartInfo
                {
                    CreateNoWindow = true,
                    UseShellExecute = false,
                    RedirectStandardError = true,
                    RedirectStandardOutput = true,
                    FileName = application,
                    Arguments = arguments,
                    WorkingDirectory = workingDirectory,
                };

                var outputBuilder = new StringBuilder();
                var errorsBuilder = new StringBuilder();

                process.OutputDataReceived += (_, args) => outputBuilder.AppendLine(args.Data);
                process.ErrorDataReceived += (_, args) => errorsBuilder.AppendLine(args.Data);

                process.Start();

                process.BeginOutputReadLine();
                process.BeginErrorReadLine();

                await process
                    .WaitForExitAsync(cancellationToken)
                    .ConfigureAwait(false);

                var exitCode = process.ExitCode;
                var output = outputBuilder.ToString().Trim();
                var errors = errorsBuilder.ToString().Trim();

                return (exitCode, output, errors);
            }
        }
    }
}
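The same process-wrapping idea can cover the original use case of this issue: listing the author of every commit that touched a file. The sketch below is hypothetical and separate from the code above. `GitCliFileHistory` and `FileHistoryEntry` are illustrative names; it assumes .NET 5 or later (for `Process.WaitForExitAsync`) and a `git` executable on the `PATH`, and it asks git for tab-separated fields via `--format=%H%x09%an%x09%ae` so each line splits cleanly on tabs.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical result type for illustration.
public sealed record FileHistoryEntry(string Sha, string AuthorName, string AuthorEmail);

// Hypothetical helper; not part of LibGit2Sharp or ElasticsearchCodeSearch.
public static class GitCliFileHistory
{
    // %x09 makes git print a tab between the commit SHA, author name and author email.
    public const string Format = "%H%x09%an%x09%ae";

    public static async Task<IReadOnlyList<FileHistoryEntry>> GetAsync(
        string repositoryDirectory, string relativePath, CancellationToken cancellationToken = default)
    {
        var startInfo = new ProcessStartInfo("git")
        {
            WorkingDirectory = repositoryDirectory,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
            CreateNoWindow = true,
        };

        // Each ArgumentList entry reaches git as a single argument, so no manual quoting is needed.
        foreach (var arg in new[] { "log", "--follow", "--format=" + Format, "--", relativePath })
        {
            startInfo.ArgumentList.Add(arg);
        }

        using var process = Process.Start(startInfo)!;

        // Drain stdout and stderr concurrently so a full pipe buffer cannot stall git.
        Task<string> stdoutTask = process.StandardOutput.ReadToEndAsync();
        Task<string> stderrTask = process.StandardError.ReadToEndAsync();

        try
        {
            await process.WaitForExitAsync(cancellationToken).ConfigureAwait(false);
        }
        catch (OperationCanceledException)
        {
            // Don't leave git running in the background after the caller cancels.
            process.Kill(entireProcessTree: true);
            throw;
        }

        string output = await stdoutTask.ConfigureAwait(false);
        string errors = await stderrTask.ConfigureAwait(false);

        if (process.ExitCode != 0)
        {
            throw new InvalidOperationException($"git exited with code {process.ExitCode}: {errors}");
        }

        return Parse(output);
    }

    // Turns "<sha>\t<name>\t<email>" lines into entries, skipping anything else (e.g. a trailing empty line).
    public static IReadOnlyList<FileHistoryEntry> Parse(string output)
    {
        var entries = new List<FileHistoryEntry>();

        foreach (var rawLine in output.Split('\n'))
        {
            string[] fields = rawLine.TrimEnd('\r').Split('\t');
            if (fields.Length == 3)
            {
                entries.Add(new FileHistoryEntry(fields[0], fields[1], fields[2]));
            }
        }

        return entries;
    }
}
```

A call would look like `var history = await GitCliFileHistory.GetAsync(repoRoot, "src/Program.cs");` (the path is a made-up example). Passing arguments through `ArgumentList` instead of an interpolated `Arguments` string avoids having to escape paths that contain spaces or quotes.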


4 participants