Importing projects with external continuity #596

Closed
tazjin opened this issue Jan 18, 2022 · 17 comments
Comments

@tazjin
Contributor

tazjin commented Jan 18, 2022

I've been trying to figure out if there is a way to import an external project into a subtree and keep history continuous for external consumers of that repo (i.e. let git pull work with a new josh-backed upstream on old checkouts). Ideally all commit hashes would also be preserved.

I've tried a couple of things, such as:

  1. Clone an empty josh workspace at the target path. Add the original remote there, reset to its HEAD, and push back to josh with -o create. This creates a history where all the original commits are rebased onto the history of the repository, so pulling is possible as long as pull-with-rebase is enabled.
    This is as close as I've gotten to doing what I want, but it loses all the original commit IDs, which is not ideal.
  2. Clone an empty workspace at the target path. Create an empty base commit, add the original remote, create a merge commit with it. Push back to josh with -o create. Josh flattens the merge commit and original history is lost.
  3. Create a placeholder file in the monorepo to ensure the path has an existing history. Clone it with josh, merge the history from the remote as above. History is lost again in the same way.

What I sort of expected: In the first version, I expected the push back to synthesize a merge commit with the original history instead of rebasing the commits on top of it.

Any ideas on how to accomplish this? If we figure it out I'll contribute some docs for it.

@christian-schilling
Member

Good that you ask. This exact use case is supposed to be one of the main selling points of josh and shame on us for not documenting it better.
The easy way is to pass -o merge instead of -o create (which should really be called -o rebase) and that should do what you want.
I have also been doing it slightly differently in the past when importing branches with very long history (because forward transform is a bit faster than backward): If you can access the to-be-imported repo through josh, you can use :prefix=where/it/should/go when fetching it, and then create a merge in the unfiltered canonical target repo.

Hope that helps.

@tazjin
Contributor Author

tazjin commented Jan 18, 2022

Thanks, I'll try this and contribute the missing docs page for it once I've got it figured out.

because forward transform is a bit faster than backward

Shouldn't a merge of a full history using -o merge also be a "forward transform"? (I'm assuming this means "rewriting only future commits"?)

Might be misunderstanding this point :)

@christian-schilling
Member

christian-schilling commented Jan 18, 2022

It does not mean rewriting only future commits. It means applying the backward transform of :/subdir to all to-be-imported commits. That way, when you apply the forward transform later, you get back the same sha1 that you put in.
In principle in this case the backward transform of :/subdir is equivalent to forward :prefix=subdir and josh could probably be taught to figure that out, but as of now it does not.

@tazjin
Contributor Author

tazjin commented Jan 18, 2022

Okay, understood. I tried it the way you described and it works! 💯

@bjeanes
Contributor

bjeanes commented Aug 17, 2024

Okay, understood. I tried it the way you described and it works! 💯

What did you run?

Thanks, I'll try this and contribute the missing docs page for it once I've got it figured out.

I'm guessing this didn't come to be, because while the docs link to this issue, they do not clarify this.

I've tried all the different documented ways and tried to follow what was described here, but regarding this suggestion:

If you can access the to-be-imported repo through josh, you can use :prefix=where/it/should/go when fetching it, and then create a merge in the unfiltered canonical target repo.

I wasn't sure what you meant exactly here and went through a few passes before figuring it out.

Initially, I tried the following:

$ git clone $josh/$monorepo mono && cd mono
$ git fetch $josh/$subrepo:prefix=sub.git
$ git merge --allow-unrelated FETCH_HEAD

This does create a merge commit with the full history, but when doing a git push to JOSH, it seems to show a flat history. In a fresh git clone $josh/$monorepo:/sub.git there is a single commit (named "Merge ..." but with no second parent).

image

In the existing checkout where I did git merge, I see both histories, though.
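To tell a real merge from a flattened one, counting the parents of the merge commit is a quick check. The sketch below uses plain git only, with no josh involved: the `prefixed` repository is a hypothetical stand-in for what `$josh/$subrepo:prefix=sub.git` would serve, and all names and paths are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: merge an unrelated, already-prefixed history and verify
# that the resulting merge commit still has two parents.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Hypothetical stand-in for the prefixed subrepo: files already live under sub/.
git init -q "$work/prefixed"
cd "$work/prefixed"
mkdir sub
echo "hello" > sub/file.txt
git add sub
git commit -q -m "Add file in subrepo"
imported=$(git rev-parse HEAD)

# Hypothetical stand-in for the monorepo.
git init -q "$work/mono"
cd "$work/mono"
echo "monorepo" > README
git add README
git commit -q -m "Start monorepo"

# Fetch the prefixed history and merge it as an unrelated history.
git fetch -q "$work/prefixed" HEAD
git merge -q --allow-unrelated-histories -m "Merge sub subrepo" FETCH_HEAD

# rev-list --parents prints "<commit> <parent>..."; subtract one for the commit itself.
words=$(git rev-list --parents -n 1 HEAD | wc -w)
parent_count=$((words - 1))
echo "merge parents: $parent_count"

# The imported commit should be reachable from HEAD under its original hash.
if git merge-base --is-ancestor "$imported" HEAD; then
  echo "imported commit $imported is reachable from HEAD"
fi
```

Running the same `git rev-list --parents -n 1` check in a fresh clone after pushing shows whether the second parent survived the push.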

What did work is the following:

$ git clone $josh/$subrepo:prefix=$prefix.git sub && cd sub
$ git fetch $josh/$monorepo
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
Unpacking objects: 100% (3/3), 853 bytes | 426.00 KiB/s, done.
From $josh/$monorepo
 * branch            HEAD       -> FETCH_HEAD
$ git push -o merge $josh/$monorepo # output edited to hide URLs
To $josh/$monorepo
 ! [rejected]        main -> main (fetch first)
error: failed to push some refs to '$josh/$monorepo'
hint: Updates were rejected because the remote contains work that you do not
hint: have locally. This is usually caused by another repository pushing to
hint: the same ref. If you want to integrate the remote changes, use
hint: 'git pull' before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

As you can see, the push is rejected, but force pushing did end up doing the right thing here (I assume because it's a force push to JOSH, but not necessarily a force push to the upstream with -o merge):

$ git push -o merge -f $josh/$monorepo
Enumerating objects: 696, done.
Counting objects: 100% (696/696), done.
Delta compression using up to 16 threads
Compressing objects: 100% (211/211), done.
Writing objects: 100% (696/696), 300.98 KiB | 300.98 MiB/s, done.
Total 696 (delta 360), reused 692 (delta 359), pack-reused 0 (from 0)
remote: Resolving deltas: 100% (360/360), done.
remote: josh-proxy: pre-receive hook
remote: upstream: response status: 200 OK
remote: upstream: response body:
remote: 
remote: To https://github.com/$monorepo
remote:    d9c13b812f..9d8ebd45ca  JOSH_PUSH -> main
remote: REWRITE(0b438bb071b40b5b9489bd0885607e77604ff868 -> 9d8ebd45caba01ce76530796e540ad185c24fb4b)
To $josh/$monorepo
 + d9c13b8...0b438bb main -> main (forced update)

I was able to confirm that a later clone with :/subpath filter preserved the original SHAs. Until more commits are added to the existing repositories, I cannot confirm if the bidirectional SHA preservation continues to work or if it only applies to history. I am hoping it continues, so that it eases a transition towards a monorepo.

In any case, I just wanted to document what worked for me here so that others might get a head start on this.

@bjeanes
Contributor

bjeanes commented Aug 19, 2024

I was able to confirm that a later clone with :/subpath filter preserved the original SHAs. Until more commits are added to the existing repositories, I cannot confirm if the bidirectional SHA preservation continues to work or if it only applies to history. I am hoping it continues, so that it eases a transition towards a monorepo.

Hmm this seems to be true for some of the imported repos. However, with others the checkouts from the subpath have totally different shas than the upstream. They were all imported in precisely the same way (it was a script in a loop), so I know that the method didn't vary.

Is this a bug or a caveat I don't understand?

@tazjin
Contributor Author

tazjin commented Aug 19, 2024

Okay, understood. I tried it the way you described and it works! 💯

What did you run?

According to my commit from back then, things were pretty simple.

The repository I imported was https://github.com/google/nixery (which has since moved to my Github). I cloned that repo and ran josh-filter ':prefix=tools/nixery' to create a commit where all the history is present, but everything is located at a subfolder.

I then merged this commit into the monorepo. Ever since then the commits are exported back to Github (without force-pushing, so the history is compatible) from the monorepo on every change. We configured that in this commit (the filter field there is the verbatim josh filter, see the code).

@bjeanes
Contributor

bjeanes commented Aug 19, 2024

I then merged this commit into the monorepo

As I did, with git push -o merge $josh/$monorepo or git merge --allow-unrelated FILTERED_HEAD?

The latter didn't seem to work for me and the former seems to have preserved history for 2 out of 4 repositories:

For subrepo=account/a.git/prefix=a:

cd $src; git clone $josh/$monorepo:/$prefix.git ${prefix}-from-mono
git --git-dir=./$prefix-from-mono/.git log -n1 HEAD --format=format:%H
7efedb2dde6ed9c92094f982ebaaee23e39ac731 Bump braces from 3.0.2 to 3.0.3 in /submit (#317)
cd $src; git clone https://github.com/$subrepo ${prefix}-from-orig && cd ${prefix}-from-origin
git --git-dir=./$prefix-from-origin/.git log -n1 HEAD --format=format:%H
7efedb2dde6ed9c92094f982ebaaee23e39ac731 Bump braces from 3.0.2 to 3.0.3 in /submit (#317)

Yet, for subrepo=account/b.git/prefix=b:

cd $src; git clone $josh/$monorepo:/$prefix.git ${prefix}-from-mono
git --git-dir=./$prefix-from-mono/.git log -n1 HEAD --format=format:%H
6b2011704fd5da24db146129c71b34b2571e962f Update package caches
cd $src; git clone https://github.com/$subrepo ${prefix}-from-orig && cd ${prefix}-from-origin
git --git-dir=./$prefix-from-origin/.git log -n1 HEAD --format=format:%H
4e43fc04145f9f7e1c371ce48e4f8a624598f0cd Update package caches

Both of these repos were created in a loop (changing the $prefix and $subrepo) in the same way:

cd $src && git clone $josh/$subrepo:prefix=$prefix.git ${prefix}-prefixed && cd ${prefix}-prefixed
git push -o merge -f $josh/$monorepo

The -f seemed necessary but perhaps there is a step I am missing to make this work. And yet, 2 out of 4 imported repositories have the right SHAs, so...

@bjeanes
Contributor

bjeanes commented Aug 19, 2024

The repository I imported was google/nixery

This is very cool, btw. I stumbled across this a few weeks ago while learning about and playing with Nix, and this is such a slick concept.

@tazjin
Contributor Author

tazjin commented Aug 19, 2024

I created a merge commit, I believe with unrelated histories. In what way does that not work for you?

seems to have preserved history for 2 out of 4 repositories

Can you check if there are any commits with GPG signatures in the histories that didn't get preserved cleanly? I remember something related to that, will try to dig it out.
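One way to run that check without needing GnuPG is to look for a `gpgsig` header in each raw commit object. This is only a sketch: the helper name `list_signed_commits` is made up, and the demo fabricates a commit with a fake, non-verifiable signature header purely so there is something to find.

```shell
#!/bin/sh
# Sketch: list commits whose raw header block contains a gpgsig header.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper. Arguments: <repo> <ref>
list_signed_commits() {
  repo=$1
  ref=$2
  git -C "$repo" rev-list "$ref" | while read -r c; do
    # sed stops at the first blank line, so only headers are inspected,
    # never the commit message.
    if git -C "$repo" cat-file commit "$c" | sed '/^$/q' | grep -q '^gpgsig'; then
      echo "$c"
    fi
  done
}

# Demo repository with one unsigned commit.
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git commit -q --allow-empty -m "Unsigned commit"

# Fabricate a copy of that commit carrying a fake signature header
# (illustration only; it would not verify).
signed=$(git cat-file commit HEAD |
  awk '!done && /^$/ {
         print "gpgsig -----BEGIN PGP SIGNATURE-----"
         print " "
         print " not-a-real-signature"
         print " -----END PGP SIGNATURE-----"
         done = 1
       }
       { print }' |
  git hash-object -t commit -w --stdin)
git update-ref refs/heads/fake-signed "$signed"

echo "signed commits on HEAD:"
list_signed_commits . HEAD
echo "signed commits on fake-signed:"
list_signed_commits . fake-signed
```

Against real clones, the helper would be pointed at the upstream checkout and at the checkout obtained through the `:/subpath` filter.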

@bjeanes
Contributor

bjeanes commented Aug 19, 2024

Yeah there are definitely GPG-signed commits in one of the repos which didn't preserve SHAs. They are broken, as I would expect, but I'll have to look a bit more closely tomorrow as I think there would be GPG-signed commits in the ones which did preserve SHAs too.

I created a merge commit, I believe with unrelated histories. In what way does that not work for you?

When I pushed it back up, it just created broken merge commits and showed a completely flat history in GitHub.

However, I just did a test again and realised this might only be true when I push up the monorepo via Josh. If I push the created monorepo directly to GitHub, the merge commits seem intact.

I used this script to re-create the monorepo just now:

mkdir monorepo && cd monorepo && git init && touch README && git add . && git commit -m "Start monorepo"

for rp in {prefix1:user/subrepo1,prefix2:user/subrepo2}; do 
  IFS=':' read prefix subrepo <<< $rp
  origin="$josh/$subrepo:prefix=$prefix.git"
  git remote add $prefix $origin || git remote set-url $prefix $origin
  git fetch $prefix main
  git merge --allow-unrelated -m "Merge $prefix subrepo" FETCH_HEAD main
done

If I run git push $josh/$monorepo HEAD:main, then GitHub shows each directory as introduced by the "Merge $prefix subrepo" commit (expected), but every file in each of those directories is also newly introduced by that commit, with no history. It's as if the merge commit was flattened into a regular single-parent commit. I wouldn't have expected this to be any different from pushing directly to GitHub, given there are no filters applied to the origin in this push.

It's late here in my timezone, so more testing tomorrow may be required. I'll re-do this from scratch, push via GitHub, then try cloning a subrepo from that via JOSH and see if the SHAs are intact in all cases.

However, I am not hopeful, given the way that I tried it earlier did preserve SHAs in some cases, but not in all. Perhaps the GPG issue or some other non-reversible aspect of the commit content is at fault here. Maybe I can write a script to compare the histories of the two repos and see if there is a specific commit where the history diverged and the SHAs changed, as the content of that commit might yield some clues.
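Such a comparison script could look roughly like the following. It is a sketch under stated assumptions: the function name `first_divergence` is made up, it walks only first-parent history from oldest to newest, and the demo uses two branches in one throwaway repository instead of two real clones.

```shell
#!/bin/sh
# Sketch: find the first position at which two histories stop sharing hashes.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper. Arguments: <repo A> <ref A> <repo B> <ref B>
# Prints "<position>: <hash A> vs <hash B>" for the first mismatch, or nothing.
first_divergence() {
  list_a=$(mktemp)
  list_b=$(mktemp)
  git -C "$1" rev-list --reverse --first-parent "$2" > "$list_a"
  git -C "$3" rev-list --reverse --first-parent "$4" > "$list_b"
  paste "$list_a" "$list_b" |
    awk -F '\t' '$1 != $2 { print NR ": " $1 " vs " $2; exit }'
  rm -f "$list_a" "$list_b"
}

# Demo: two branches share their first two commits, then differ.
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git commit -q --allow-empty -m "one"
git commit -q --allow-empty -m "two"
git branch from-mono
git commit -q --allow-empty -m "three (original)"
git branch from-origin
git checkout -q from-mono
git commit -q --allow-empty -m "three (rewritten)"

result=$(first_divergence . from-origin . from-mono)
echo "first divergence at position $result"
```

With real data, the two repositories would be the upstream clone and the clone fetched through josh, and the reported hashes are the pair to inspect further.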

@tazjin
Contributor Author

tazjin commented Aug 19, 2024

Oh, I haven't tried to push back a merge through josh. That might cause issues. We don't use Github, and I'm not sure what "When I pushed it back up, it just created broken merge commits" means (why does pushing create new commits?), but I'd expect you'd have to push the raw repo after creating the merge locally.

As for the signatures, there's a filter called :unsign which removes signatures, but apparently by default they're now left intact (which means that filtered commits have invalid signatures until filtered back).

@bjeanes
Contributor

bjeanes commented Aug 19, 2024

"When I pushed it back up, it just created broken merge commits" means (why does pushing create new commits?), but I'd expect you'd have to push the raw repo after creating the merge locally.

Yeah that wasn't very clear. What I mean is that locally the commit which merges the subrepo in is indeed a merge commit with 2 parents. On GitHub, that same commit only lists one parent and all files in the repo are seen as introduced directly by that commit. The only parent it has is the monorepo root commit (or the previous merge for another subrepo).

I am wondering if what is happening here is that pushing through JOSH is corrupting that merge in some way, or is rewriting the commits the merge's parents point to without rewriting the merge commit to point to the new (rewritten) parents. In other words, one of the two parents points to an unresolvable commit, so GitHub just treats it as a non-merge commit.

As for the signatures, there's a filter called :unsign which removes signatures, but apparently by default they're now left intact (which means that filtered commits have invalid signatures until filtered back).

That is extremely good to know. I am not sure whether I'd want this or not, but it's something to experiment with.

@bjeanes
Contributor

bjeanes commented Aug 20, 2024

OK looking at this again today:

mkdir monorepo && cd monorepo && git init && touch README && git add . && git commit -m "Start monorepo"

for rp in {prefix1:user/subrepo1,prefix2:user/subrepo2,...}; do 
  IFS=':' read prefix subrepo <<< $rp
  import_origin="$josh/$subrepo:prefix=$prefix.git"
  git remote add $prefix $import_origin
  git fetch $prefix main HEAD
  git merge --allow-unrelated -m "Merge $prefix subrepo" FETCH_HEAD main
done

This is nicer than the approach I took initially, but is ending up with the same outcome. Changing to git fetch $prefix HEAD used the correct heads but then I once again have incorrect SHAs when re-cloning.

However, surprisingly, they are consistent for 7 years of the commit history and then suddenly they diverge only 2 weeks ago:

image

This is not a signed commit and nothing in particular stands out about it.

Any ideas of what I could explore to explain this divergence? It seems like it's possibly a JOSH bug a la #1345.

@tazjin
Contributor Author

tazjin commented Aug 20, 2024

Diff the raw commit objects, diff their raw tree objects etc. and see what is different; that might give us a clue!

@bjeanes
Contributor

bjeanes commented Aug 20, 2024

Great idea. I didn't think about --pretty=raw.

Anyway, it seems this is squarely a GitButler thing:

diff <(git show --pretty=raw 8f49df1dc) <(git show --pretty=raw 34d061bdc)
1c1
< commit 8f49df1dc0acf9389c7aa712847cc8309f01755d
---
> commit 34d061bdca18e243ccc8bfe53333037ce17500f9
6,7d5
< gitbutler-headers-version 2
< gitbutler-change-id baefc019-8ef0-40f2-9f1f-dc92b9984387

It seems to be adding its own headers.
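The effect can be reproduced with plain git. The sketch below assumes a made-up header name, `example-change-id`, standing in for the GitButler headers; it shows that adding a header changes the commit hash and that re-serializing the commit without it yields the other hash. It makes no claim about how josh itself handles unknown headers.

```shell
#!/bin/sh
# Sketch: an extra commit header changes the commit hash.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git commit -q --allow-empty -m "Example commit"
plain=$(git rev-parse HEAD)

# Insert a made-up header just before the blank line that ends the header block.
with_header=$(git cat-file commit "$plain" |
  awk '!done && /^$/ { print "example-change-id 1234"; done = 1 } { print }' |
  git hash-object -t commit -w --stdin)

# Simulate a rewrite that drops the extra header again.
# (grep would also drop a message line with this prefix; fine for this demo.)
stripped=$(git cat-file commit "$with_header" |
  grep -v '^example-change-id ' |
  git hash-object -t commit --stdin)

echo "plain:       $plain"
echo "with header: $with_header"
echo "stripped:    $stripped"

# Same kind of comparison as above, using files instead of process substitution.
git cat-file commit "$plain" > plain.txt
git cat-file commit "$with_header" > with_header.txt
diff plain.txt with_header.txt || true
```

Here `plain` and `stripped` are identical while `with_header` differs, which matches the pattern of two otherwise equal commits ending up with different SHAs.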

@bjeanes
Contributor

bjeanes commented Aug 20, 2024

I think at this point this is becoming off-topic, though. It's no longer about clarification on how to import projects with external continuity. I'll open a new issue.
