repo.Commits.QueryBy(filename) slow on large repos #1705
Comments
As a note: I tried different sorting options, but was limited because FileHistory doesn't support None or Reverse:
I just submitted a proposed solution. The change to FileHistory makes it run in 14 seconds instead of 80, and the change in Tree got it down to about 8 seconds. I see the continuous integration failed, but that looks like a CI problem, not a problem with my code. Here is the error (Linux only):
Awesome work! I ran into the same issue and had to wrap git.exe and use git log to speed up reading the log.
@Blueve Could you share your code for parsing the git.exe output?
We read the command-line output of the git log command, such as:
And then we parse the output string line by line.
Sorry, I can't share the full code since it is only visible internally.
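For readers looking for a starting point, here is a minimal sketch of the line-by-line parsing described above. It is not the commenter's code. The git log format string, the GitLogEntry record, and the GitLogParser class are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical record for one parsed commit line; not part of any library.
public sealed record GitLogEntry(string Sha, DateTimeOffset AuthorDate, string Subject);

public static class GitLogParser
{
    // Assumed git invocation (an illustration, not the commenter's command):
    //   git log --format=%H%x1f%aI%x1f%s -- "<path>"
    // %H = full hash, %aI = strict ISO 8601 author date, %s = subject,
    // %x1f = ASCII unit separator, which is unlikely to appear in a subject.
    public const char FieldSeparator = '\u001f';

    public static List<GitLogEntry> Parse(string gitOutput)
    {
        var entries = new List<GitLogEntry>();

        // Accept both "\n" and "\r\n" line endings and drop empty lines.
        var lines = gitOutput.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var line in lines)
        {
            // Limit to 3 parts so a separator inside the subject stays in the subject.
            var parts = line.Split(FieldSeparator, 3);

            if (parts.Length != 3)
            {
                continue; // Skip lines that do not match the expected shape.
            }

            if (!DateTimeOffset.TryParse(parts[1], CultureInfo.InvariantCulture, DateTimeStyles.None, out var date))
            {
                continue; // Skip lines whose date field cannot be parsed.
            }

            entries.Add(new GitLogEntry(parts[0], date, parts[2]));
        }

        return entries;
    }
}
```

The string passed to Parse would be the captured standard output of the git process, for example the value returned by a helper like the RunGitAsync method posted later in this thread. Lines that do not match the expected shape are skipped silently here; logging them may be preferable in practice.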
I came across this issue as well while writing ElasticsearchCodeSearch. I started by implementing everything using libgit2sharp, but it ended up being too slow for large repositories. I've played around with
// Licensed under the MIT license. See LICENSE file in the project root for full license information.
// ...

namespace ElasticsearchCodeSearch.Git
{
    /// <summary>
    /// Exposes various GIT commands useful for indexing files.
    /// </summary>
    public class GitExecutor
    {
        /// <summary>
        /// Logger.
        /// </summary>
        private readonly ILogger<GitExecutor> _logger;

        /// <summary>
        /// Creates a new GitExecutor.
        /// </summary>
        /// <param name="logger">Logger</param>
        public GitExecutor(ILogger<GitExecutor> logger)
        {
            _logger = logger;
        }

        // ...

        /// <summary>
        /// Gets the latest commit date for a file, which is the following git command:
        ///
        ///     log -1 --date=iso-strict --format=\"%ad\" -- "{path}"
        ///
        /// </summary>
        /// <param name="repositoryDirectory">The Working Directory</param>
        /// <param name="path">Relative path to the file</param>
        /// <param name="cancellationToken">Cancellation Token</param>
        /// <returns>Latest Commit Date for a file</returns>
        public async Task<DateTime> LatestCommitDate(string repositoryDirectory, string path, CancellationToken cancellationToken)
        {
            var result = await RunGitAsync($"log -1 --date=iso-strict --format=\"%ad\" -- \"{path}\"", repositoryDirectory, cancellationToken).ConfigureAwait(false);

            if (!DateTime.TryParse(result, out var date))
            {
                // Log the path parameter; the original referenced an undefined 'absoluteFilename'.
                _logger.LogWarning("Could not convert the Latest Commit Date to a DateTime for '{File}'. Raw Git Output was: '{GitOutput}'", path, result);

                return default;
            }

            return date;
        }

        public async Task<string[]> ListFiles(string repositoryDirectory, CancellationToken cancellationToken)
        {
            var result = await RunGitAsync("ls-files", repositoryDirectory, cancellationToken).ConfigureAwait(false);

            // The output was assembled with AppendLine, so lines are separated by Environment.NewLine.
            var files = result
                .Split(Environment.NewLine)
                .ToArray();

            return files;
        }

        public async Task<string> RunGitAsync(string arguments, string workingDirectory, CancellationToken cancellationToken)
        {
            var result = await RunProcessAsync("git", arguments, workingDirectory, cancellationToken).ConfigureAwait(false);

            if (result.ExitCode != 0)
            {
                throw new GitException(result.ExitCode, result.Errors);
            }

            return result.Output;
        }

        private async Task<(int ExitCode, string Output, string Errors)> RunProcessAsync(string application, string arguments, string workingDirectory, CancellationToken cancellationToken)
        {
            _logger.TraceMethodEntry();

            using (var process = new Process())
            {
                process.StartInfo = new ProcessStartInfo
                {
                    CreateNoWindow = true,
                    UseShellExecute = false,
                    RedirectStandardError = true,
                    RedirectStandardOutput = true,
                    FileName = application,
                    Arguments = arguments,
                    WorkingDirectory = workingDirectory,
                };

                var outputBuilder = new StringBuilder();
                var errorsBuilder = new StringBuilder();

                process.OutputDataReceived += (_, args) => outputBuilder.AppendLine(args.Data);
                process.ErrorDataReceived += (_, args) => errorsBuilder.AppendLine(args.Data);

                process.Start();

                process.BeginOutputReadLine();
                process.BeginErrorReadLine();

                await process
                    .WaitForExitAsync(cancellationToken)
                    .ConfigureAwait(false);

                var exitCode = process.ExitCode;
                var output = outputBuilder.ToString().Trim();
                var errors = errorsBuilder.ToString().Trim();

                return (exitCode, output, errors);
            }
        }
    }
}
Reproduction steps
1) Clone a large repo.
2) Run this function on that repo with some random file:
Expected behavior
I expect TestSlow and the git log command above to take a similar amount of time.
Actual behavior
The git log command finishes in about 1.6 ms on my repo.
The TestSlow function takes about 70 seconds.
Here is what I see in my profiler:

Version of LibGit2Sharp (release number or SHA1)
0.27.0-preview-0017
0.26.1
0.24.1
Operating system(s) tested; .NET runtime tested
.NET Framework 4.7.2 on Windows 10