Skip to content

Feature/introduce in memory git metadata #1243

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

JakeGinnivan
Copy link
Contributor

My goal here is to move GitVersion into more distinct stages.

  1. Load all the information it needs from git, this would be branches, tags, merge bases, parent branches etc.
    • Ideally this would source from both local and remote branches so we don't need to normalise
    • Afaik we introduced normalisation in to simplify the codebase and remove the concept of remote and local branches, it just makes everything local.
  2. Calculate version, this would now be entirely in memory and not touch libgit2
  3. We could then switch out the provider which actually loads the information (for instance, libgit2 or github api or git command line)

After this happens the slow part should be step 1, which should be way easier to speed up because all git operations will be in one place and we can make better decisions how to load it all.

Keen for feedback @gep13 @asbjornu

This hopefully will end up with us getting rid of normalisation and making GitVersion much faster
This reduces scanning the branch commits by one pass
…ersionStrategy to be fixed after changes to TaggedCommitVersionStrategy
@JakeGinnivan
Copy link
Contributor Author

In terms of performance, this should have reduced the number of times the current branch commits are scanned by at least one pass.

Copy link
Member

@asbjornu asbjornu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome, @JakeGinnivan! I really like where this is going and expected the task to be much more massive than this appears to be.

Not sure how far we should take this, but I would personally love to have a GitVersion.Core project completely independent of LibGit2 with its own home-grown, well-nurtured and fine-tuned domain model.

I think this is a huge step in the right direction, I'm just not sure how many steps in the same direction we should take yet and if this suffices for now.

I just noticed #1244 as well; I'll hopefully have time to look at that tomorrow.

@@ -0,0 +1,76 @@
using System.Collections.Generic;

namespace GitVersion.GitRepoInformation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is going to be GitVersion's "Core", i.e. Domain Model (in DDD terms), wouldn't a more authoritative sounding namespace like GitVersion.Repositories, GitVersion.Graph be good?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like GitVersion.Graph! Will keep that in mind as a progress

public List<MTag> Tags { get; private set; }
}

public class MTag
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the M prefix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than aliasing in a bunch of files which reference both this namespace and libgit2 the metadata related ones are prefixed with M. I expect this to go at some point

public MBranch MasterBranch { get; }
public List<MBranch> ReleaseBranches { get; }
public MBranch CurrentBranch { get; }
public List<MTag> Tags { get; private set; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the private setter?

}

// Wonder if this can be 'mainline'
public MBranch MasterBranch { get; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the immutability here. 👍

public string Name { get; }
public string TipSha { get; }
public MParent Parent { get; }
public List<MergeMessage> MergeMessages { get; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it matters much, I've just got a habit of using IList<T> instead of List<T> in public declarations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


public Commit MergeBase { get; private set; }
public string MergeBase { get; private set; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the change from Commit to string here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just trying to reduce the number of libgit2 references

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Good idea.


namespace GitVersion.GitRepoInformation
{
public class Libgit2RepoMetadataProvider
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If LibGit2 is going to become one of many repository metadata providers for GitVersion, perhaps this should be moved to its own GitVersion.LibGit2 project?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, will do once I get the structure correct

{
public class Libgit2RepoMetadataProvider
{
public static GitRepoMetadata ReadMetadata(GitVersionContext context)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect each metadata provider implementation to adhere to some sort of centrally defined interface, not a static class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think so, just wasn't sure the shape at the time, this allowed me to just hook it in.

I am trying to make seams at the moment which will simplify code elsewhere, then can put the right abstraction in place


namespace GitVersion.GitRepoInformation
{
public class GitRepoMetadata
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the repository metadata or data? Since GitVersion's job is to walk the graph, count commits, etc., I'd say that anything required to calculate the version is data, not metadata. I think this class can be called just Repository, no? Any collisions with LibGit2 should be resolved within the GitVersion.LibGit2 project and not exposed elsewhere, so it shouldn't be much of a problem, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure at the moment to be honest. I don't want to re-implement git data structures, this really is information about the commit graph with a bunch of pre-calculated stuff. Lets see where it goes then fix naming next

/// We should move to use this, that way libgit2 is used when bootstrapping
/// this will be far easier to optimise then all the logic for gitversion will be entirely in memory
/// </summary>
public GitRepoMetadata RepositoryMetadata { get; private set; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we agree that GitRepoMetadata should be called just Repository, the same goes for this property.

@erikbra
Copy link
Contributor

erikbra commented Oct 16, 2019

@JakeGinnivan sorry to post on a closed PR, but I'm really interested in speeding up this stuff as well. Would you care to share some thoughts on why this PR was closed? And similar with #1244, Was it due to lack of time, or do you think they are a step in the wrong direction? I've only just started looking at GitVersion, so I haven't looked at this in detail. I just wanted to know if this was a good place to start if wanting to work on the perf wrt the integration with LibGit2.

@JakeGinnivan
Copy link
Contributor Author

Hey, all good.

Was a few things, I kinda lost context around where I was and how to finish it off mainly though from memory.

Generally I think the approach should be:

  1. Create a single interface which all interactions with libgit2 is put behind. The methods on this should be pretty specific like getPossibleParentBranches() or something (and could be memoized like you suggested in the other thread).
  2. Once this has been refactored, try to implement a in memory version of this interface which the in memory structure is populated in the most efficient way possible ahead of time.

Doing this would also allow you to implement say a github api implementation, then gitversion as a service could be created :P

@JakeGinnivan
Copy link
Contributor Author

It's been a while, but if there are other questions I'll try to answer best I can @erikbra

@erikbra
Copy link
Contributor

erikbra commented Oct 16, 2019

Thanks a lot for your replies, @JakeGinnivan. I'm only just starting to look at this, so I don't have a good understanding of the structures required yet. I'll read more of the original source code, and fork off this PR to get an understanding of the thoughts behind it.

I really appreciate if you would be able to answer some questions when the time comes. I'll get back to you on that one.

I might be jumping ahead here, but C# has gotten a lot of perf-focused new features lately, so we might be able to do some of the reading in pure C# with Memory spans, etc. in the future. But let's work on the separation first :)

Really like the idea of GitVersion-as-a-service ;)

@arturcic arturcic deleted the feature/IntroduceInMemoryGitMetadata branch March 13, 2024 07:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants