Multi repo folders #22588

Closed
wants to merge 4 commits into from

@juur juur commented Jan 23, 2023

First attempt at extending the backend storage of Gitea so that it can support multiple folders (i.e. mount points) for repos, enabling multiple filesystems (local or remote) with repo storage locations hashed across them. This enables more horizontal scaling of storage.

Further work is required to enable the varying of the number of folders as a day-2 operational task.
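
To illustrate the idea of hashing repo storage locations across folders, here is a minimal sketch. It is not the code from this PR: the function names, the folder paths, and the choice to hash on the lower-cased owner name are all assumptions made for the example. It also shows why changing the number of folders later is hard with a plain modulo scheme, since most owners then map to a different folder.

```python
import hashlib


def pick_folder(owner: str, folders: list[str]) -> str:
    """Map an owner name to one of the configured folders (hypothetical scheme)."""
    # Hash the lower-cased owner name so placement is stable across runs.
    digest = hashlib.sha256(owner.lower().encode("utf-8")).digest()
    # Read the first 8 bytes as an integer and reduce it modulo the folder count.
    index = int.from_bytes(digest[:8], "big") % len(folders)
    return folders[index]


def count_moved(owners: list[str], old: list[str], new: list[str]) -> int:
    """Count owners whose folder changes when the folder list changes."""
    return sum(1 for o in owners if pick_folder(o, old) != pick_folder(o, new))


if __name__ == "__main__":
    # Hypothetical mount points, used only for illustration.
    three = ["/mnt/repos0", "/mnt/repos1", "/mnt/repos2"]
    four = three + ["/mnt/repos3"]
    owners = [f"user{i}" for i in range(1000)]
    print(pick_folder("kousu", three))
    print(count_moved(owners, three, four), "of", len(owners), "owners would move")
```

With this scheme, going from three to four folders relocates roughly three quarters of the owners, which is presumably why varying the folder count needs separate work.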

Most of the logic is in cmd/serv.go and models/user/user.go (this could perhaps be merged), the rest is to enable configuration and installation tasks.

Contributor

kousu commented Jan 23, 2023

(with the caveat that I'm just a user and sysadmin of Gitea, not a dev) This is a neat feature, but personally, I would solve this with LVM on Linux or ZFS on FreeBSD, or maybe even something like Ceph for a bigger system. Expandable storage is neat, but is already solved reliably at lower levels. Teaching Gitea this is going to introduce all kinds of subtle bugs; for example, forking a repo can no longer be done efficiently with hardlinks.

GiteaBot added the "lgtm/need 2" label (This PR needs two approvals by maintainers to be considered for merging.) Jan 23, 2023
Author

juur commented Jan 23, 2023

> (with the caveat that I'm just a user and sysadmin of Gitea, not a dev) This is a neat feature, but personally, I would solve this with LVM on Linux or ZFS on FreeBSD, or maybe even something like Ceph for a bigger system. Expandable storage is neat, but is already solved reliably at lower levels. Teaching Gitea this is going to introduce all kinds of subtle bugs; for example, forking a repo can no longer be done efficiently with hardlinks.

How does forking use hardlinks? I didn't think you could have hardlinks to a folder under Linux...

Contributor

kousu commented Jan 23, 2023

I'm running a local build of Gitea:

[Screenshot 2023-01-23 at 16-28-05: fork-test]

Internally, this repo looks like:

kousu@dev:~/src/neurogitea/gitea/data/gitea-repositories$ ls -l kousu/fork-test.git/
total 32
drwx------ 2 p115628 domain users 4096 jan 23 16:28 branches
-rw------- 1 p115628 domain users   66 jan 23 16:28 config
-rw------- 1 p115628 domain users   73 jan 23 16:28 description
-rw------- 1 p115628 domain users    0 jan 23 16:28 git-daemon-export-ok
-rw------- 1 p115628 domain users   21 jan 23 16:28 HEAD
drwx------ 6 p115628 domain users 4096 jan 23 16:28 hooks
drwx------ 2 p115628 domain users 4096 jan 23 16:28 info
drwx------ 7 p115628 domain users 4096 jan 23 16:28 objects
drwx------ 4 p115628 domain users 4096 jan 23 16:28 refs

Then I make a fork under a different user:

[Screenshot 2023-01-23 at 16-31-20: Gitea: Git with a cup of tea]

The folders aren't hardlinked, but their contents are:

kousu@dev:~/src/neurogitea/gitea/data/gitea-repositories$ find . -links 2
./kousu/fork-test.git/info
./kousu/fork-test.git/refs/tags
./kousu/fork-test.git/refs/heads
./kousu/fork-test.git/objects/72
./kousu/fork-test.git/objects/72/204f33b1245c4b3e83f197b579cc3cfb2ddf7c
./kousu/fork-test.git/objects/info
./kousu/fork-test.git/objects/info/packs
./kousu/fork-test.git/objects/15
./kousu/fork-test.git/objects/15/b38bf32dd48c9ff186ce024f420420f1cb1ed7
./kousu/fork-test.git/objects/c3
./kousu/fork-test.git/objects/c3/262db3180bdff653b6bba525f542372f0e19e5
./kousu/fork-test.git/objects/pack
./kousu/fork-test.git/branches
./kousu/fork-test.git/hooks/proc-receive.d
./kousu/fork-test.git/hooks/update.d
./kousu/fork-test.git/hooks/pre-receive.d
./kousu/fork-test.git/hooks/post-receive.d
./bats/fork-test.git/info
./bats/fork-test.git/refs/tags
./bats/fork-test.git/refs/heads
./bats/fork-test.git/objects/72
./bats/fork-test.git/objects/72/204f33b1245c4b3e83f197b579cc3cfb2ddf7c
./bats/fork-test.git/objects/info
./bats/fork-test.git/objects/info/packs
./bats/fork-test.git/objects/15
./bats/fork-test.git/objects/15/b38bf32dd48c9ff186ce024f420420f1cb1ed7
./bats/fork-test.git/objects/c3
./bats/fork-test.git/objects/c3/262db3180bdff653b6bba525f542372f0e19e5
./bats/fork-test.git/objects/pack
./bats/fork-test.git/branches
./bats/fork-test.git/hooks/proc-receive.d
./bats/fork-test.git/hooks/update.d
./bats/fork-test.git/hooks/pre-receive.d
./bats/fork-test.git/hooks/post-receive.d

For example, these are both inode 12231819:

kousu@dev:~/src/neurogitea/gitea/data/gitea-repositories$ stat ./kousu/fork-test.git/objects/72/204f33b1245c4b3e83f197b579cc3cfb2ddf7c   ./bats/fork-test.git/objects/72/204f33b1245c4b3e83f197b579cc3cfb2ddf7c 
  File: ./kousu/fork-test.git/objects/72/204f33b1245c4b3e83f197b579cc3cfb2ddf7c
  Size: 131             Blocks: 8          IO Block: 4096   regular file
Device: 801h/2049d      Inode: 12231819    Links: 2
Access: (0400/-r--------)  Uid: (703204575/ p115628)   Gid: (703200513/domain users)
Access: 2023-01-23 16:31:24.567587456 -0500
Modify: 2023-01-23 16:28:00.482664206 -0500
Change: 2023-01-23 16:31:24.563587515 -0500
 Birth: 2023-01-23 16:28:00.482664206 -0500
  File: ./bats/fork-test.git/objects/72/204f33b1245c4b3e83f197b579cc3cfb2ddf7c
  Size: 131             Blocks: 8          IO Block: 4096   regular file
Device: 801h/2049d      Inode: 12231819    Links: 2
Access: (0400/-r--------)  Uid: (703204575/ p115628)   Gid: (703200513/domain users)
Access: 2023-01-23 16:31:24.567587456 -0500
Modify: 2023-01-23 16:28:00.482664206 -0500
Change: 2023-01-23 16:31:24.563587515 -0500
 Birth: 2023-01-23 16:28:00.482664206 -0500

This is because, per the description of --local in git-clone(1):

> When the repository to clone from is on a local machine, this flag bypasses the normal "Git aware" transport mechanism and clones the repository by making a copy of HEAD and everything under objects and refs directories. The files under .git/objects/ directory are hardlinked to save space when possible.
>
> If the repository is specified as a local path (e.g., /path/to/repo), this is the default, and --local is essentially a no-op.
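
The hardlink behaviour, and why it cannot survive a split across mount points, can be sketched in a few lines of Python. This is an illustration only, not what git or Gitea actually runs; `link_or_copy` and `same_inode` are hypothetical helper names. `os.link` fails with `EXDEV` when source and destination are on different filesystems, so the only option left is a full copy.

```python
import errno
import os
import shutil
import tempfile


def link_or_copy(src: str, dst: str) -> bool:
    """Hardlink src to dst; copy instead if they are on different filesystems.

    Returns True if a hardlink was created, False if the file was copied.
    """
    try:
        os.link(src, dst)
        return True
    except OSError as exc:
        # EXDEV means "cross-device link": a hardlink cannot span filesystems.
        if exc.errno != errno.EXDEV:
            raise
        shutil.copy2(src, dst)
        return False


def same_inode(a: str, b: str) -> bool:
    """True if both paths refer to the same inode on the same device."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "object")
        dst = os.path.join(tmp, "object-in-fork")
        with open(src, "wb") as f:
            f.write(b"x" * 131)
        linked = link_or_copy(src, dst)
        print("hardlinked:", linked, "same inode:", same_inode(src, dst))
```

Within one filesystem both paths share an inode, as in the `stat` output above; across two mount points the fallback copy doubles the space used.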

Author

juur commented Jan 23, 2023

> This is because, per the description of --local in git-clone(1):
>
> > When the repository to clone from is on a local machine, this flag bypasses the normal "Git aware" transport mechanism and clones the repository by making a copy of HEAD and everything under objects and refs directories. The files under .git/objects/ directory are hardlinked to save space when possible.
> >
> > If the repository is specified as a local path (e.g., /path/to/repo), this is the default, and --local is essentially a no-op.

That's pretty neat, I didn't know git did that. I guess because I rarely clone a local repo it's never come up!
That said, any backend abstraction layer is likely to defeat that.

Contributor

kousu commented Jan 23, 2023

I'm sorry to burst your bubble! This seems like a really cool idea. I've only administered Gitea using basic POSIX filesystems, and since I understand them best I plan to stick with them. But if there are already alternate backends in Gitea anyway, then maybe this would be handy for others.

Contributor

kousu commented Jan 23, 2023

But I would be careful: hardlinks aren't the only thing that breaks if you start splitting up onto multiple mount points. Repo size counting might get bonked on the nose. And permissions might get weird, since different filesystems sometimes interpret UIDs and modes differently (I've seen this lots of times with CIFS, aka Samba). I've also seen git hooks get skipped when a repo is on certain remote filesystems, and Gitea relies on git hooks to enforce consistency between git and Gitea when stuff gets uploaded. So I'm worried about a situation where some repos end up on a remote filesystem with subtly different rules, with no clear reason for users why their content or LFS files are misbehaving, but only sometimes.
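
One way to see the size-counting concern is a small sketch that sums file sizes while counting each (device, inode) pair only once. This is not how Gitea computes repo sizes; `disk_usage` is a hypothetical helper written for this example. When a fork shares hardlinked objects with its parent, the per-repo totals add up to more than the space actually used, and once the two repos sit on different mount points the sharing disappears entirely.

```python
import os
import tempfile


def disk_usage(root: str, seen: set) -> int:
    """Sum file sizes under root, skipping inodes already recorded in `seen`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if key in seen:
                # Another path to the same inode was already counted.
                continue
            seen.add(key)
            total += st.st_size
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        parent = os.path.join(tmp, "kousu")
        fork = os.path.join(tmp, "bats")
        os.mkdir(parent)
        os.mkdir(fork)
        with open(os.path.join(parent, "obj"), "wb") as f:
            f.write(b"x" * 131)
        os.link(os.path.join(parent, "obj"), os.path.join(fork, "obj"))
        # A fresh set per repo gives the per-repo view.
        print("per repo:", disk_usage(parent, set()), disk_usage(fork, set()))
        # A shared set gives the space actually occupied on disk.
        shared = set()
        print("combined:", disk_usage(parent, shared) + disk_usage(fork, shared))
```

Passing a fresh set for each repo reports the size a user would expect to see, while passing one shared set reports what the filesystem actually stores.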

Contributor

fnetX commented Jan 25, 2023

Hi, sysadmin from Codeberg here. While I appreciate the effort, I can only second that this is not an easy job and prone to all kinds of possible bugs. You'd likely need a system that comes close to GitLab / Gitaly for distributed Git storage.

What will happen if a user or repo gets renamed, or transferred? What about forks, like mentioned above?

At Codeberg, we are using Ceph. Our experience is that it works well, but Git operations need to be really low-latency. So just adding an NFS mount or something like this for remote storage will not work anyway (I tried this locally once, and even basic operations took minutes from time to time).

So I really think that this responsibility should not be part of Gitea, but on a lower level.

@juur juur closed this Mar 30, 2023
Member

lunny commented Apr 1, 2023

I think #22775 will be helpful for this.

@juur juur deleted the multi-repo-folders branch April 1, 2023 13:42
@go-gitea go-gitea locked and limited conversation to collaborators May 3, 2023

5 participants