Skip to content

Provide fast variants of operations #7

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
epage opened this issue Mar 21, 2022 · 4 comments
Open

Provide fast variants of operations #7

epage opened this issue Mar 21, 2022 · 4 comments
Labels
enhancement Improve the expected

Comments

@epage
Copy link
Contributor

epage commented Mar 21, 2022

See https://github.com/arxanas/git-branchless/blob/a4bc3c0c31b7be5cdd95ed8cfe4385e384482a99/src/git/repo.rs#L914-L924

@epage epage added the enhancement Improve the expected label Mar 21, 2022
@epage
Copy link
Contributor Author

epage commented Mar 21, 2022

Looks like branchless uses this for

  • amend (we don't have this)
  • get_patch_for_commit (we don't have this)
  • cherry pick (though we use a different implementation for the actual cherry pick)

The basic idea for this is to create internal-only commits that restrict processing to the changed paths.

  • @arxanas under what condition should the parent be dehydrated as well? I see this is done for everything but the target commit. I assume this is just a matter of "when it will need to diff between two things, ensure both of those diffed items are dehydrated" but wanted to double check.
  • @arxanas have you looked into heuristics for when to use the normal operations?
    • I assume for small repos, it'd be faster to not go through the extra hoops but I'm guessing its also fast enough to go through the extra hoops in those cases to not matter
    • What about large changes like re-formatting code?

@arxanas
Copy link

arxanas commented Mar 21, 2022

Please feel free to copy any amount of code directly from git-branchless. The executable as a whole is licensed under GPL-3 because it uses a GPL library, but I'm happy to license all code under MIT+Apache. I'd also recommend you copy the appropriate benches: https://github.com/arxanas/git-branchless/blob/85692c507a672f7c0df9f65436ed80f441bf308c/benches/benches.rs#L83-L118

under what condition should the parent be dehydrated as well? I see this is done for everything but the target commit. I assume this is just a matter of "when it will need to diff between two things, ensure both of those diffed items are dehydrated" but wanted to double check.

Yes, that's correct. The patch's parent commit should be dehydrated so that the diff is correct, i.e. it doesn't report a million deletion operations. The target commit's parent doesn't need to be touched because it's not diffing the target commit.

A better API might be to have the user pass the parent commit explicitly. That would also be helpful for implementing squashes, etc.

have you looked into heuristics for when to use the normal operations?

I haven't looked explicitly, but I think it's pretty much always faster. The index data structure always requires a linear scan to write all the objects and build the trees, which requires a bunch of potentially-spurious memory allocations and disk writes, whereas the cherry_pick_fast code can directly reuse any common trees. (I think Git, but not libgit2, supports some optimizations to skip building existing trees for the index, but the cherry_pick_fast path should still be the same or better.) You can also try the benchmarking code linked above to confirm.

@arxanas
Copy link

arxanas commented Mar 21, 2022

What about large changes like re-formatting code?

I haven't tested this, but I expect it would be no slower via cherry_pick_fast, since the worst case simply degrades to rebuilding all the trees like you would in a regular cherry-pick.

@epage
Copy link
Contributor Author

epage commented Mar 21, 2022

Huh, I realized this won't quite work for the current algorithm that git2-ext uses. I based squash and cherry-pick off of rebase to reduce the chance of conflicts since rebase will automatically go through and check for when things apply cleanly.

epage added a commit that referenced this issue Oct 6, 2023
README.md list indentation and no bare URLs, as per Markdown Lint VS Code extension
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improve the expected
Projects
None yet
Development

No branches or pull requests

2 participants