
sparse index support #562


Open
8 of 17 tasks
SidneyDouw opened this issue Oct 20, 2022 · 3 comments
Labels
C-tracking-issue An issue to track the progress of multiple PRs or issues

Comments

@SidneyDouw
Contributor

SidneyDouw commented Oct 20, 2022

Quick recap: What is a sparse index?

Instead of containing one entry for every file in the worktree ("regular" index structure), a sparse index only contains a subset of these. Additionally, it contains entries for directories that are marked with the SKIP_WORKTREE flag. All files within these directories can be skipped by functions that read / update the index, which improves performance.

If the index file contains the "Sparse Directory Entries" extension, marked by the signature sdir, it is classified as a sparse index.
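As a minimal sketch of the two properties described above, the following models an index with simplified, hypothetical types (`Entry`, `Index`, `extension_signatures`); these are illustrative assumptions and not gitoxide's actual API. It assumes a sparse directory entry is stored with the tree mode `040000` plus the SKIP_WORKTREE flag, and that the presence of the `sdir` signature is what classifies the index as sparse.

```rust
/// Mode bits git uses for a tree (directory) entry.
pub const MODE_DIR: u32 = 0o040000;

/// Hypothetical, stripped-down index entry (not gitoxide's real type).
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub path: String,
    pub mode: u32,
    pub skip_worktree: bool,
}

impl Entry {
    /// A sparse directory entry is a directory that carries SKIP_WORKTREE.
    pub fn is_sparse_dir(&self) -> bool {
        self.mode == MODE_DIR && self.skip_worktree
    }
}

/// Hypothetical decoded index: its entries plus the 4-byte signatures
/// of the extensions that were found in the index file.
pub struct Index {
    pub entries: Vec<Entry>,
    pub extension_signatures: Vec<[u8; 4]>,
}

impl Index {
    /// The index counts as sparse if the `sdir` extension is present.
    pub fn is_sparse(&self) -> bool {
        self.extension_signatures.iter().any(|sig| sig == b"sdir")
    }

    /// Entries a worktree-related operation still has to look at:
    /// everything that is not flagged SKIP_WORKTREE.
    pub fn entries_to_visit(&self) -> impl Iterator<Item = &Entry> {
        self.entries.iter().filter(|e| !e.skip_worktree)
    }
}
```

With an index holding `README.md` and a skipped `docs/` directory entry, `entries_to_visit()` yields only `README.md`, no matter how many files live below `docs/`.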

Motivation

The goal of this issue is to keep track of the requirements necessary to eventually fully integrate sparse index support for gitoxide.

This issue does not yet contain all the tasks and considerations by any means, but the goal is to add new knowledge and keep everything up to date as I go along and things become more clear.

Tasks

  • reading
    • "regular" index with files containing the SKIP_WORKTREE flag
    • sparse index with directories containing the SKIP_WORKTREE flag, in cone mode
    • write specific tests to verify those behaviours
  • Write sparse index #563
    • Tree extension order in gitoxide differs from git's, which prevents raw byte comparisons
    • configure index version via write::Options
  • find out what options in git-config influence / configure sparse index related tasks to better understand what capabilities are needed
    • update gix progress with those findings
  • git-repository loads worktree configs #635
  • implement functionality similar to ensure_full_index()
    • scan index for sparse directory entries (trees) and expand them into a full list of filepaths (regular index structure), mutating the current index State, for use in subsequent functions that don't support working with sparse indexes yet
    • find out where and how it makes sense to use that function
  • matching logic of .git/info/sparse-checkout for cone mode
    • cone mode
    • non-cone mode (inverted .gitignore); this functionality is deprecated in git
  • restore DIR information during writing or as separate step as indicated here
  • command similar to `git sparse-checkout set / add`
    • support --cone and --no-cone flags
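The expansion step from the `ensure_full_index()` task above can be sketched as follows. This is only an illustration under stated assumptions: `Entry`, `Trees`, and this `ensure_full_index` signature are hypothetical, and the `Trees` map stands in for the object database that a real implementation would query to read tree objects.

```rust
use std::collections::BTreeMap;

const MODE_DIR: u32 = 0o040000;

/// Hypothetical, stripped-down index entry (not gitoxide's real type).
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub path: String,
    pub mode: u32,
    pub skip_worktree: bool,
}

/// Stand-in for the object database: maps a directory path with a
/// trailing slash to its direct children as (name, mode) pairs.
pub type Trees = BTreeMap<String, Vec<(String, u32)>>;

/// Recursively collect all files below `dir` as skipped file entries.
fn expand_dir(dir: &str, trees: &Trees, out: &mut Vec<Entry>) -> Result<(), String> {
    let children = trees
        .get(dir)
        .ok_or_else(|| format!("tree for '{dir}' not found"))?;
    for (name, mode) in children {
        if *mode == MODE_DIR {
            expand_dir(&format!("{dir}{name}/"), trees, out)?;
        } else {
            // Expanded files stay outside the sparse checkout.
            out.push(Entry { path: format!("{dir}{name}"), mode: *mode, skip_worktree: true });
        }
    }
    Ok(())
}

/// Replace every sparse directory entry with the files it stands for,
/// leaving `entries` untouched if any tree cannot be found.
pub fn ensure_full_index(entries: &mut Vec<Entry>, trees: &Trees) -> Result<(), String> {
    let mut full = Vec::with_capacity(entries.len());
    for entry in entries.iter() {
        if entry.mode == MODE_DIR && entry.skip_worktree {
            expand_dir(&entry.path, trees, &mut full)?;
        } else {
            full.push(entry.clone());
        }
    }
    // Index entries are kept sorted by path in byte order.
    full.sort_by(|a, b| a.path.cmp(&b.path));
    *entries = full;
    Ok(())
}
```

Building the expanded list separately and swapping it in only on success is a design choice of this sketch, so a failed lookup does not leave a half-expanded index behind.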

Notes

  • the git sparse-checkout set / add commands modify the list of patterns contained in .git/info/sparse-checkout, which uses the same syntax as a .gitignore file. Cone mode and non-cone mode decide how this file gets interpreted. Cone mode will match only directories, while non-cone mode will use the same matching logic used for .gitignore files.
  • non-cone mode and sparse index are incompatible with each other
    This makes sense: sparse indexes mark entire directories as SKIP_WORKTREE, which is what cone mode matches on, while non-cone mode can also match single files, which does not reduce the number of entries in the index.
  • non-cone mode is now deprecated
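To make the directory-only matching of cone mode concrete, here is a minimal sketch. It assumes the cone has already been reduced to a list of directory paths, so it does not parse .git/info/sparse-checkout itself, and `is_included` is a hypothetical helper rather than an existing gitoxide function. It follows the cone-mode rule that top-level files are always included, that everything below a cone directory is included recursively, and that files sitting directly in an ancestor of a cone directory are included as well.

```rust
/// Decide whether the file at `path` is part of the sparse checkout.
/// `cone_dirs` holds directory paths without leading or trailing
/// slashes, for example "src/lib".
pub fn is_included(path: &str, cone_dirs: &[&str]) -> bool {
    // Directory part of the file path; top-level files have none
    // and are always included in cone mode.
    let parent = match path.rfind('/') {
        Some(pos) => &path[..pos],
        None => return true,
    };
    cone_dirs.iter().any(|dir| {
        // Recursive match: the file lives in or below the cone directory.
        parent == *dir
            || parent.starts_with(&format!("{dir}/"))
            // Parent match: the file sits directly in an ancestor
            // of the cone directory.
            || dir.starts_with(&format!("{parent}/"))
    })
}
```

With `cone_dirs = ["src/lib"]`, the files `README.md`, `src/main.rs` and `src/lib/x/y.rs` are included, while `src/bin/tool.rs` and `docs/a.md` are not. Because only whole directories are ever decided on, each excluded directory can be represented by a single sparse directory entry.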

References

@Byron Byron added the C-tracking-issue An issue to track to track the progress of multiple PRs or issues label Oct 21, 2022
@Byron
Member

Byron commented Oct 21, 2022

Thanks so much for setting up this tracking issue and taking the lead on this! I can't wait to see more and more of these boxes ticked.

inverted .gitignore matching logic
does this need to be supported with non-cone mode being deprecated?

I think it's OK to focus on cone mode but fail gracefully in non-cone mode from day one. From there we can decide if it's worth maintaining non-cone mode as well, probably based on people actually requesting it to be supported.

@Byron
Member

Byron commented Nov 22, 2022

For posterity, since I keep finding myself puzzled about what states sparse indices exist in, here is an analysis in code that sums it up.

@Byron
Member

Byron commented Nov 25, 2022

Interesting bits of this recently added technical document

  • rename 'sparse directory' to 'skipped directory' - I think we should do that too.
  • reading about partial clones and automatic on-demand downloads of packs makes me afraid of all the added complexity that will be needed to handle all of that.
  • this portion makes me think that within gitoxide, probably the Repository instance, there should be settings for how sparsity should affect operations to be adjustable on a case-by-case basis.
  • partial clone is mentioned multiple times, and I think there is a lot that I don't know about that.
  • I absolutely think that turning off dynamic downloads/partial clones on demand is going to be part of a first implementation to support fully offline use of git (everything else seems like a 'perversion'), which is a bit different from what git plans to do
  • whether or not commands see the sparse index as sparse should ultimately be configurable to ideally handle this VFC (behaviour C) usecase as well.
  • loving this oversimplified listing of behaviours and what they mean, along with the A* part of not auto-downloading objects.
  • really helpful to have a list of commands that need to be sparsity-aware.
  • a nice summary of command-behaviours based on an analysis of all git commands
  • it's interesting to learn that merges operate on all files and thus might conflict and temporarily 'vivify' these conflicts into the worktree despite otherwise being skipped.
  • there is a nice list with suggestions on how to name these 'sparsity' related flags on commands
  • The Known Bugs section is probably good to learn what to avoid early on, or the traps that implementing sparsity correctly might contain. It's also good to know for the time when we try to learn from git code, and wonder why it doesn't handle things we think it should handle - some like read-tree don't do it correctly when sparsity is involved. We should do better from day one.
  • On the mailing list there is a nice sample repo from which to build a test-case that has terrible performance characteristics with some git operations. Can we one day run this and see what happens?
