forked from git/git
-
Notifications
You must be signed in to change notification settings - Fork 146
[FOR TESTS ONLY] Check sparse-checkou with en/clean-nested-with-ignored #341
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
derrickstolee
wants to merge
1,979
commits into
gitgitgadget:maint-2.20
from
derrickstolee:sparse-checkout/nested-with-ignored
Closed
[FOR TESTS ONLY] Check sparse-checkou with en/clean-nested-with-ignored #341
derrickstolee
wants to merge
1,979
commits into
gitgitgadget:maint-2.20
from
derrickstolee:sparse-checkout/nested-with-ignored
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
There is no reason to allow %00 to terminate a string, so do not allow it. Otherwise, we end up returning arbitrary content in the string (that which is after the %00) which is effectively hidden from callers and can escape sanity checks and validation, and possible be used in tandem with a security vulnerability to introduce a payload. Helped-by: brian m. carlson <[email protected]> Signed-off-by: Matthew DeVore <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
The ForEachMacros list can reasonably be generated grepping the C source code for macros with 'for_each' in their name. Taken almost verbatim from the .clang-format file in the Linux kernel. Signed-off-by: Miguel Ojeda <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Signed-off-by: Christopher Diaz Riveros <[email protected]>
* 'it-l10n-wip' of github.com:AlessandroMenti/git-po: l10n: it.po: Updated Italian translation
* '2.22' of https://github.com/ChrisADR/git-po: l10n: es: 2.22.0 round 3
Translate 274 new messages (4581t0f0u) for git 2.22.0. Signed-off-by: Jiang Xin <[email protected]>
Revise 51 translations, improving consistency for some phrased. Update email address for Fangyi Zhou Signed-off-by: Fangyi Zhou <[email protected]> Signed-off-by: Jiang Xin <[email protected]>
Signed-off-by: Alexander Shopov <[email protected]>
In commit 8daec1d ("merge-recursive: switch from (oid,mode) pairs to a diff_filespec", 2019-04-05), we actually switched from (oid,mode,path) triplets to a diff_filespec -- but most callsites in the patch only needed to worry about oid and mode so the commit message focused on that. The oversight in the commit message apparently spilled over to the code as well; one of the dozen or so callsites accidentally dropped the setting of the path in the conversion. Restore the path setting in that location. Also, this pointed out that our testsuite was lacking a good rename/add test, at least one that involved the need for merge content with the rename. Add such a test, and since rename/add vs. add/rename could possibly be important, redo the merge the opposite direction to make sure we don't have issues with the direction of the merge. These testcases failed before restoring the setting of path, but with the paths appropriately set the testcases both pass. Reported-by: Ben Humphreys <[email protected]> Based-on-patch-by: SZEDER Gábor <[email protected]> Tested-by: Ben Humphreys <[email protected]> Signed-off-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Acked-by: Ralf Thielow <[email protected]> Signed-off-by: Matthias Rüster <[email protected]>
Signed-off-by: Ralf Thielow <[email protected]>
Reviewed-by: Ralf Thielow <[email protected]> Signed-off-by: Matthias Rüster <[email protected]>
Before, the documentation would mix " and ' for code and config snippets. Change these instances to ` so that they are marked up in monospace. Signed-off-by: Denton Liu <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
One can see that an alias that begins with a non-command first word, such as `loud-rebase = -c commit.verbose=true rebase`, is permitted. However, this isn't immediately obvious to users as alias instances typically begin with a command. Document the fact that an alias can begin with a non-command first word so that users will be able to discover that this is a feature. Signed-off-by: Denton Liu <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Recent code restructuring of merge-recursive engine introduced a regression dealing with rename/add conflict. * en/merge-directory-renames-fix: merge-recursive: restore accidentally dropped setting of path
Signed-off-by: Cédric Malard <[email protected]> Signed-off-by: Jean-Noel Avila <[email protected]>
* 'master' of git://github.com/alshopov/git-po: l10n: bg.po: Updated Bulgarian translation (4581t)
* 'fr_review' of git://github.com/jnavila/git: l10n: fr.po: Review French translation
l10n-2.22.0-rnd3 * tag 'l10n-2.22.0-rnd3' of git://github.com/git-l10n/git-po: (25 commits) l10n: fr.po: Review French translation l10n: de.po: Update German translation l10n: de.po: improve description of 'git reset --quiet' l10n: TEAMS: Change German translation team leader l10n: bg.po: Updated Bulgarian translation (4581t) l10n: zh_CN: Revision for git v2.22.0 l10n l10n: zh_CN: for git v2.22.0 l10n round 1~3 l10n: es: 2.22.0 round 3 l10n: it.po: Updated Italian translation l10n: fr v2.22.0 rnd 3 l10n: vi.po(4581t): Updated Vietnamese translation for v2.22.0 round 3 l10n: git.pot: v2.22.0 round 3 (3 new, 2 removed) l10n: es: 2.22.0 round 2 l10n: bg.po: Updated Bulgarian translation (4580t) l10n: vi.po(4580t): Updated Vietnamese translation for v2.22.0 round 2 l10n: fr.po v2.22.0 round 2 l10n: git.pot: v2.22.0 round 2 (6 new, 3 removed) l10n: bg.po: Updated Bulgarian translation (4577t) l10n: es: 2.22.0 round 1 l10n: vi.po(4577t): Updated Vietnamese translation for v2.22.0 round 1 ...
Signed-off-by: Junio C Hamano <[email protected]>
If we want to check whether an object is missing, the correct flag to pass to rev-list is --ignore-missing; --exclude-promisor-objects will exclude any object that came from the promisor remote, whether it is present or missing. Use the correct flag. Signed-off-by: Jonathan Tan <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
When fetching into a partial clone, Git first prefetches missing REF_DELTA bases from the promisor remote. (This feature was introduced in [1].) But as can be seen in a recent test coverage report [2], the case in which a REF_DELTA base is already present is not covered by tests. Extend the tests slightly to cover this case. [1] 8a30a1e ("index-pack: prefetch missing REF_DELTA bases", 2019-05-15). [2] https://public-inbox.org/git/[email protected]/ Signed-off-by: Jonathan Tan <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
"git <cmd> --git-completion-helper" could fail if the command checks for a repo before parse_options(). If the result is cached, later on when the user moves to a worktree with repo, tab completion will still fail. Avoid this by detecting errors and not cache the completion output. We can try again and hopefully succeed next time (e.g. when a repo is found). Of course if --git-completion-helper fails permanently because of other reasons (*), this will slow down completion. But I don't see any better option to handle that case. (*) one of those cases is if __gitcomp_builtin is called on a command that does not support --git-completion-helper. And we do have a generic call __git_complete_common "$command" but this case is protected with __git_support_parseopt_helper so we're good. Reported-by: Felipe Contreras <[email protected]> Signed-off-by: Nguyễn Thái Ngọc Duy <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
The write_commit_graph() method uses die() to report failure and exit when confronted with an unexpected condition. This use of die() in a library function is incorrect and is now replaced by error() statements and an int return type. Return zero on success and a negative value on failure. Now that we use 'goto cleanup' to jump to the terminal condition on an error, we have new paths that could lead to uninitialized values. New initializers are added to correct for this. The builtins 'commit-graph', 'gc', and 'commit' call these methods, so update them to check the return value. Test that 'git commit-graph write' returns a proper error code when hitting a failure condition in write_commit_graph(). Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
The write_commit_graph() and write_commit_graph_reachable() methods currently take two boolean parameters: 'append' and 'report_progress'. As we update these methods, adding more parameters this way becomes cluttered and hard to maintain. Collapse these parameters into a 'flags' parameter, and adjust the callers to provide flags as necessary. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
The commit-graph feature began with a long list of planned benefits, most of which are now complete. The future work section has only a few items left. As for making more algorithms aware of generation numbers, some are only waiting for generation number v2 to ensure the performance matches the existing behavior using commit date. It is unlikely that we will ever send a commit-graph file as part of the protocol, since we would need to verify the data, and that is expensive. If we want to start trusting remote content, then that item can be investigated again. While there is more work to be done on the feature, having a section of the docs devoted to a TODO list is wasteful and hard to keep up-to-date. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
The write_commit_graph() method is too large and complex. To simplify it, we should extract several helper functions. However, we will risk repeating a lot of declarations related to progress incidators and object id or commit lists. Create a new write_commit_graph_context struct that contains the core data structures used in this process. Replace the other local variables with the values inside the context object. Following this change, we will start to lift code segments wholesale out of the write_commit_graph() method and into helper functions. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
The write_commit_graph() method is too complex, so we are extracting helper functions one by one. This extracts fill_oids_from_packs() that reads the given pack-file list and fills the oid list in the context. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
The write_commit_graph() method is too complex, so we are extracting helper functions one by one. Extract fill_oids_from_commit_hex() that reads the given commit id list and fille the oid list in the context. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
The write_commit_graph() method is too complex, so we are extracting helper functions one by one. Extract fill_oids_from_all_packs() that reads all pack-files for commits and fills the oid list in the context. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
The first consumer of pattern-matching filenames was the .gitignore feature. In that context, storing a list of patterns as a 'struct exclude_list' makes sense. However, the sparse-checkout feature then adopted these structures and methods, but with the opposite meaning: these patterns match the files that should be included! Now that this library is renamed to use 'struct pattern_list' and 'struct pattern', we can now rename the method used by the sparse-checkout feature to determine which paths should appear in the working directory. The method is_excluded_from_list() is only used by the sparse-checkout logic in unpack-trees and list-objects-filter. The confusing part is that it returned 1 for "excluded" (i.e. it matches the list of exclusions) but that really manes that the path matched the list of patterns for _inclusion_ in the working directory. Rename the method to be path_matches_pattern_list() and have it return an explicit 'enum pattern_match_result'. Here, the values MATCHED = 1, UNMATCHED = 0, and UNDECIDED = -1 agree with the previous integer values. This shift allows future consumers to better understand what the retur values mean, and provides more type checking for handling those values. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Someone brought me a testcase where multiple git-clean invocations were required to clean out unwanted files: mkdir d{1,2} touch d{1,2}/ut touch d1/t && git add d1/t With this setup, the user would need to run git clean -ffd */ut twice to delete both ut files. A little testing showed some interesting variants: * If only one of those two ut files existed (either one), then only one clean command would be necessary. * If both directories had tracked files, then only one git clean would be necessary to clean both files. * If both directories had no tracked files then the clean command above would never clean either of the untracked files despite the pathspec explicitly calling both of them out. A bisect showed that the failure to clean out the files started with commit cf424f5 ("clean: respect pathspecs with "-d", 2014-03-10). However, that pointed to a separate issue: while the "-d" flag was used by the original user who showed me this problem, that flag should have been irrelevant to this problem. Testing again without the "-d" flag showed that the same buggy behavior exists without using that flag, and has in fact existed since before cf424f5. Although these problems at first are perceived to be different (e.g. never clearing out the requested files vs. taking multiple invocations to get everything cleared out), they are actually just different manifestations of the same problem. The case with multiple directories that have no tracked files is the more general case; solving it will solve all the others. So, I concentrate on it. Add testcases showing that multiple untracked files within entirely untracked directories cannot be cleaned when specifying these files to git clean via pathspecs. Signed-off-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Signed-off-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
For a pathspec like 'foo/bar' comparing against a path named "foo/", namelen will be 4, and match[namelen] will be 'b'. The correct location of the directory separator is namelen-1. However, other callers of match_pathspec_item() such as builtin/grep.c's submodule_path_match() will compare against a path named "foo" instead of "foo/". It might be better to change all the callers to be consistent, as discussed at https://public-inbox.org/git/[email protected]/ and https://public-inbox.org/git/CABPp-BERWUPCPq-9fVW1LNocqkrfsoF4BPj3gJd9+En43vEkTQ@mail.gmail.com/ but there are many cases to audit, so for now just make sure we handle both cases with and without a trailing slash. The reason the code worked despite this sometimes-off-by-one error was that the subsequent code immediately checked whether the first matchlen characters matched (which they do) and then bailed and return MATCHED_RECURSIVELY anyway since wildmatch doesn't have the ability to check if "name" can be matched as a directory (or prefix) against the pathspec. Signed-off-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Even if a directory doesn't match a pathspec, it is possible, depending on the precise pathspecs, that some file underneath it might. So we special case and recurse into the directory for such situations. However, we previously always added any untracked directory that we recursed into to the list of untracked paths, regardless of whether the directory itself matched the pathspec. For the case of git-clean and a set of pathspecs of "dir/file" and "more", this caused a problem because we'd end up with dir entries for both of "dir" "dir/file" Then correct_untracked_entries() would try to helpfully prune duplicates for us by removing "dir/file" since it's under "dir", leaving us with "dir" Since the original pathspec only had "dir/file", the only entry left doesn't match and leaves nothing to be removed. (Note that if only one pathspec was specified, e.g. only "dir/file", then the common_prefix_len optimizations in fill_directory would cause us to bypass this problem, making it appear in simple tests that we could correctly remove manually specified pathspecs.) Fix this by actually checking whether the directory we are about to add to the list of dir entries actually matches the pathspec; only do this matching check after we have already returned from recursing into the directory. Signed-off-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
The specific checks done in match_pathspec_item for the DO_MATCH_SUBMODULE case are useful for other cases which have nothing to do with submodules. Rename this constant; a subsequent commit will make use of this change. Signed-off-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
For git clean, if a directory is entirely untracked and the user did not specify -d (corresponding to DIR_SHOW_IGNORED_TOO), then we usually do not want to remove that directory and thus do not recurse into it. However, if the user manually specified specific (or even globbed) paths somewhere under that directory to remove, then we need to recurse into the directory to make sure we remove the relevant paths under that directory as the user requested. Note that this does not mean that the recursed-into directory will be added to dir->entries for later removal; as of a few commits earlier in this series, there is another more strict match check that is run after returning from a recursed-into directory before deciding to add it to the list of entries. Therefore, this will only result in files underneath the given directory which match one of the pathspecs being added to the entries list. Two notes of potential interest to future readers: * If we wanted to only recurse into a directory when it is specifically matched rather than matched-via-glob (e.g. '*.c'), then we could do so via making the final non-zero return in match_pathspec_item be MATCHED_RECURSIVELY instead of MATCHED_RECURSIVELY_LEADING_PATHSPEC. (Note that the relative order of MATCHED_RECURSIVELY_LEADING_PATHSPEC and MATCHED_RECURSIVELY are important for such a change.) I was leaving open that possibility while writing an RFC asking for the behavior we want, but even though we don't want it, that knowledge might help you understand the code flow better. * There is a growing amount of logic in read_directory_recursive() for deciding whether to recurse into a subdirectory. However, there is a comment immediately preceding this logic that says to recurse if instructed by treat_path(). It may be better for the logic in read_directory_recursive() to ultimately be moved to treat_path() (or another function it calls, such as treat_directory()), but I have left that for someone else to tackle in the future. Signed-off-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
The way match_pathspec_item() handles names and pathspecs with trailing slash characters, in conjunction with special options like DO_MATCH_DIRECTORY and DO_MATCH_LEADING_PATHSPEC were non-obvious, and broken until this patch series. Add a table in a comment explaining the intent of how these work. Signed-off-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
It appears that the wrong option got included in the list of what will cause git-clean to actually take action. Correct the list. Signed-off-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
The -d flag pre-dated git-clean's ability to have paths specified. As such, the default for git-clean was to only remove untracked files in the current directory, and -d existed to allow it to recurse into subdirectories. The interaction of paths and the -d option appears to not have been carefully considered, as evidenced by numerous bugs and a dearth of tests covering such pairings in the testsuite. The definition turns out to be important, so let's look at some of the various ways one could interpret the -d option: A) Without -d, only look in subdirectories which contain tracked files under them; with -d, also look in subdirectories which are untracked for files to clean. B) Without specified paths from the user for us to delete, we need to have some kind of default, so...without -d, only look in subdirectories which contain tracked files under them; with -d, also look in subdirectories which are untracked for files to clean. The important distinction here is that choice B says that the presence or absence of '-d' is irrelevant if paths are specified. The logic behind option B is that if a user explicitly asked us to clean a specified pathspec, then we should clean anything that matches that pathspec. Some examples may clarify. Should git clean -f untracked_dir/file remove untracked_dir/file or not? It seems crazy not to, but a strict reading of option A says it shouldn't be removed. How about git clean -f untracked_dir/file1 tracked_dir/file2 or git clean -f untracked_dir_1/file1 untracked_dir_2/file2 ? Should it remove either or both of these files? Should it require multiple runs to remove both the files listed? (If this sounds like a crazy question to even ask, see the commit message of "t7300: Add some testcases showing failure to clean specified pathspecs" added earlier in this patch series.) What if -ffd were used instead of -f -- should that allow these to be removed? Should it take multiple invocations with -ffd? What if a glob (such as '*tracked*') were used instead of spelling out the directory names? What if the filenames involved globs, such as git clean -f '*.o' or git clean -f '*/*.o' ? The current documentation actually suggests a definition that is slightly different than choice A, and the implementation prior to this series provided something radically different than either choices A or B. (The implementation, though, was clearly just buggy). There may be other choices as well. However, for almost any given choice of definition for -d that I can think of, some of the examples above will appear buggy to the user. The only case that doesn't have negative surprises is choice B: treat a user-specified path as a request to clean all untracked files which match that path specification, including recursing into any untracked directories. Change the documentation and basic implementation to use this definition. There were two regression tests that indirectly depended on the current implementation, but neither was about subdirectory handling. These two tests were introduced in commit 5b7570c ("git-clean: add tests for relative path", 2008-03-07) which was solely created to add coverage for the changes in commit fb328947c8e ("git-clean: correct printing relative path", 2008-03-07). Both tests specified a directory that happened to have an untracked subdirectory, but both were only checking that the resulting printout of a file that was removed was shown with a relative path. Update these tests appropriately. Signed-off-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Users expect files in a nested git repository to be left alone unless sufficiently forced (with two -f's). Unfortunately, in certain circumstances, git would delete both tracked (and possibly dirty) files and untracked files within a nested repository. To explain how this happens, let's contrast a couple cases. First, take the following example setup (which assumes we are already within a git repo): git init nested cd nested >tracked git add tracked git commit -m init >untracked cd .. In this setup, everything works as expected; running 'git clean -fd' will result in fill_directory() returning the following paths: nested/ nested/tracked nested/untracked and then correct_untracked_entries() would notice this can be compressed to nested/ and then since "nested/" is a directory, we would call remove_dirs("nested/", ...), which would check is_nonbare_repository_dir() and then decide to skip it. However, if someone also creates an ignored file: >nested/ignored then running 'git clean -fd' would result in fill_directory() returning the same paths: nested/ nested/tracked nested/untracked but correct_untracked_entries() will notice that we had ignored entries under nested/ and thus simplify this list to nested/tracked nested/untracked Since these are not directories, we do not call remove_dirs() which was the only place that had the is_nonbare_repository_dir() safety check -- resulting in us deleting both the untracked file and the tracked (and possibly dirty) file. One possible fix for this issue would be walking the parent directories of each path and checking if they represent nonbare repositories, but that would be wasteful. Even if we added caching of some sort, it's still a waste because we should have been able to check that "nested/" represented a nonbare repository before even descending into it in the first place. Add a DIR_SKIP_NESTED_GIT flag to dir_struct.flags and use it to prevent fill_directory() and friends from descending into nested git repos. With this change, we also modify two regression tests added in commit 91479b9 ("t7300: add tests to document behavior of clean and nested git", 2015-06-15). That commit, nor its series, nor the six previous iterations of that series on the mailing list discussed why those tests coded the expectation they did. In fact, it appears their purpose was simply to test _existing_ behavior to make sure that the performance changes didn't change the behavior. However, these two tests directly contradicted the manpage's claims that two -f's were required to delete files/directories under a nested git repository. While one could argue that the user gave an explicit path which matched files/directories that were within a nested repository, there's a slippery slope that becomes very difficult for users to understand once you go down that route (e.g. what if they specified "git clean -f -d '*.c'"?) It would also be hard to explain what the exact behavior was; avoid such problems by making it really simple. Also, clean up some grammar errors describing this functionality in the git-clean manpage. Finally, there are still a couple bugs with -ffd not cleaning out enough (e.g. missing the nested .git) and with -ffdX possibly cleaning out the wrong files (paying attention to outer .gitignore instead of inner). This patch does not address these cases at all (and does not change the behavior relative to those flags), it only fixes the handling when given a single -f. See https://public-inbox.org/git/[email protected]/ for more discussion of the -ffd[X?] bugs. Signed-off-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Signed-off-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
cmd_clean() had the following code structure: struct strbuf abs_path = STRBUF_INIT; for_each_string_list_item(item, &del_list) { strbuf_addstr(&abs_path, prefix); strbuf_addstr(&abs_path, item->string); PROCESS(&abs_path); strbuf_reset(&abs_path); } where I've elided a bunch of unnecessary details and PROCESS(&abs_path) represents a big chunk of code rather than an actual function call. One piece of PROCESS was: if (lstat(abs_path.buf, &st)) continue; which would cause the strbuf_reset() to be missed -- meaning that the next path to be handled would have two paths concatenated. This path used to use die_errno() instead of continue prior to commit 396049e ("git-clean: refactor git-clean into two phases", 2013-06-25), but my understanding of how correct_untracked_entries() works is that it will prevent both dir/ and dir/file from being in the list to clean so this should be dead code and the die_errno() should be safe. But I hesitate to remove it since I am not certain. However, we can fix both this bug and possible similar future bugs by simply moving the strbuf_reset(&abs_path) to the beginning of the loop. It'll result in N calls to strbuf_reset() instead of N-1, but that's a small price to pay to avoid sneaky bugs like this. Signed-off-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
The sparse-checkout feature is mostly hidden to users, as its only documentation is supplementary information in the docs for 'git read-tree'. In addition, users need to know how to edit the .git/info/sparse-checkout file with the right patterns, then run the appropriate 'git read-tree -mu HEAD' command. Keeping the working directory in sync with the sparse-checkout file requires care. Begin an effort to make the sparse-checkout feature a porcelain feature by creating a new 'git sparse-checkout' builtin. This builtin will be the preferred mechanism for manipulating the sparse-checkout file and syncing the working directory. The `$GIT_DIR/info/sparse-checkout` file defines the skip- worktree reference bitmap. When Git updates the working directory, it updates the skip-worktree bits in the index based on this file and removes or restores files in the working copy to match. The documentation provided is adapted from the "git read-tree" documentation with a few edits for clarity in the new context. Extra sections are added to hint toward a future change to a more restricted pattern set. Helped-by: Elijah Newren <[email protected]> Signed-off-by: Derrick Stolee <[email protected]>
Getting started with a sparse-checkout file can be daunting. Help users start their sparse enlistment using 'git sparse-checkout init'. This will set 'core.sparseCheckout=true' in their config, write an initial set of patterns to the sparse-checkout file, and update their working directory. Using 'git read-tree' to clear directories does not work cleanly on Windows, so manually delete directories that are tracked by Git before running read-tree. The use of running another process for 'git read-tree' is likely suboptimal, but that can be improved in a later change, if valuable. Signed-off-by: Derrick Stolee <[email protected]>
When someone wants to clone a large repository, but plans to work using a sparse-checkout file, they either need to do a full checkout first and then reduce the patterns they included, or clone with --no-checkout, set up their patterns, and then run a checkout manually. This requires knowing a lot about the repo shape and how sparse-checkout works. Add a new '--sparse' option to 'git clone' that initializes the sparse-checkout file to include the following patterns: /* !/*/* These patterns include every file in the root directory, but no directories. This allows a repo to include files like a README or a bootstrapping script to grow enlistments from that point. During the 'git sparse-checkout init' call, we must first look to see if HEAD is valid, or else we will fail while trying to update the working directory. The first checkout will actually update the working directory correctly. Signed-off-by: Derrick Stolee <[email protected]>
The 'git sparse-checkout set' subcommand takes a list of patterns as arguments and writes them to the sparse-checkout file. Then, it updates the working directory using 'git read-tree -mu HEAD'. The 'set' subcommand will replace the entire contents of the sparse-checkout file. The write_patterns_and_update() method is extracted from cmd_sparse_checkout() to make it easier to implement 'add' and/or 'remove' subcommands in the future. Signed-off-by: Derrick Stolee <[email protected]>
The 'git sparse-checkout set' subcommand takes a list of patterns and places them in the sparse-checkout file. Then, it updates the working directory to match those patterns. For a large list of patterns, the command-line call can get very cumbersome. Add a '--stdin' option to instead read patterns over standard in. Signed-off-by: Derrick Stolee <[email protected]>
The instructions for disabling a sparse-checkout to a full working directory are complicated and non-intuitive. Add a subcommand, 'git sparse-checkout disable', to perform those steps for the user. Signed-off-by: Derrick Stolee <[email protected]>
When Git updates the working directory with the sparse-checkout feature enabled, the unpack_trees() method calls clear_ce_flags() to update the skip-wortree bits on the cache entries. This check can be expensive, depending on the patterns used. Add trace2 regions around the method, including some flag information, so we can get granular performance data during experiments. This data will be used to measure improvements to the pattern-matching algorithms for sparse-checkout. Signed-off-by: Jeff Hostetler <[email protected]> Signed-off-by: Derrick Stolee <[email protected]>
The sparse-checkout feature can have quadratic performance as the number of patterns and number of entries in the index grow. If there are 1,000 patterns and 1,000,000 entries, this time can be very significant. Create a new Boolean config option, core.sparseCheckoutCone, to indicate that we expect the sparse-checkout file to contain a more limited set of patterns. This is a separate config setting from core.sparseCheckout to avoid breaking older clients by introcuding a tri-state option. The config option does nothing right now, but will be expanded upon in a later commit. Signed-off-by: Derrick Stolee <[email protected]>
The parent and recursive patterns allowed by the "cone mode" option in sparse-checkout are restrictive enough that we can avoid using the regex parsing. Everything is based on prefix matches, so we can use hashsets to store the prefixes from the sparse-checkout file. When checking a path, we can strip path entries from the path and check the hashset for an exact match. As a test, I created a cone-mode sparse-checkout file for the Linux repository that actually includes every file. This was constructed by taking every folder in the Linux repo and creating the pattern pairs here: /$folder/* !/$folder/*/* This resulted in a sparse-checkout file sith 8,296 patterns. Running 'git read-tree -mu HEAD' on this file had the following performance: core.sparseCheckout=false: 0.21 s (0.00 s) core.sparseCheckout=true: 3.75 s (3.50 s) core.sparseCheckout=cone: 0.23 s (0.01 s) The times in parentheses above correspond to the time spent in the first clear_ce_flags() call, according to the trace2 performance traces. While this example is contrived, it demonstrates how these patterns can slow the sparse-checkout feature. Signed-off-by: Derrick Stolee <[email protected]>
To make the cone pattern set easy to use, update the behavior of 'git sparse-checkout [init|set]'. Add '--cone' flag to 'git sparse-checkout init' to set the config option 'core.sparseCheckoutCone=true'. When running 'git sparse-checkout set' in cone mode, a user only needs to supply a list of recursive folder matches. Git will automatically add the necessary parent matches for the leading directories. Signed-off-by: Derrick Stolee <[email protected]>
The sparse-checkout feature in "cone mode" can use the fact that the recursive patterns are "connected" to the root via parent patterns to decide if a directory is entirely contained in the sparse-checkout or entirely removed. In these cases, we can skip hashing the paths within those directories and simply set the skipworktree bit to the correct value. Signed-off-by: Derrick Stolee <[email protected]>
…o sparse-checkout/nested-with-ignored
Signed-off-by: Derrick Stolee <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.