fixup! trace2: collect Windows-specific process information #123
In protocol v2, instead of just running "upload-pack", we have a generic "serve" loop which runs command requests from the client. What used to be "upload-pack" is now generally split into two operations: "ls-refs" and "fetch". The latter knows it must respect uploadpack.* config, but the former is not actually specific to a fetch operation (we do not yet do v2 receive-pack, but eventually we may, and ls-refs would support both operations). However, ls-refs does need to know which operation we're performing, in order to read the correct config (e.g., uploadpack.hiderefs versus receive.hiderefs; we don't read _either_ right now, but this is the first step to fixing that). In the generic "git-serve" program, we don't have that information. But in the protocol as it is actually used by clients, the client still asks to run "git-upload-pack", and then issues an "ls-refs" from there. So we _do_ know at that point that "uploadpack" is the right config context. This patch teaches the protocol v2 "serve" code to pass that context through to the individual commands (a future patch will teach ls-refs to actually use it). Signed-off-by: Jeff King <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
This helper function looks for config in two places: transfer.hiderefs, or $section.hiderefs, where $section is passed in by the caller (and is "uploadpack" or "receive", depending on the context). In preparation for callers which do not even have that context (namely the "git-serve" command), let's handle a NULL by looking only at transfer.hiderefs (as opposed to segfaulting). Signed-off-by: Jeff King <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
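The key-matching rule described above can be sketched as a standalone C helper. This is an illustration only, not Git's actual code: the name `is_hiderefs_key` is hypothetical, and the sketch assumes the variable name has already been normalized to lowercase before it is compared.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not Git's implementation): return 1 if `var`
 * names a hiderefs setting that applies in the given context.
 *
 * `section` is "uploadpack", "receive", or NULL when the caller has no
 * context. With a NULL section only transfer.hiderefs matches, so the
 * function never dereferences the missing section name.
 */
static int is_hiderefs_key(const char *var, const char *section)
{
	size_t len;

	assert(var != NULL);

	/* transfer.hiderefs applies in every context. */
	if (!strcmp(var, "transfer.hiderefs"))
		return 1;

	/* No context: nothing section-specific can match. */
	if (!section)
		return 0;

	/* Match exactly "<section>.hiderefs". */
	len = strlen(section);
	return !strncmp(var, section, len) && !strcmp(var + len, ".hiderefs");
}
```

Under these assumptions, `is_hiderefs_key("uploadpack.hiderefs", NULL)` is false, while the same key with the section "uploadpack" matches.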
In the v2 protocol, upload-pack's advertisement has been moved to the "ls-refs" command. That command does not respect hidden-ref config (like transfer.hiderefs) at all, and advertises everything. While there are some features that are not supported in v2 (e.g., v2 always allows fetching any sha1 without respect to advertisements), the lack of this feature is not documented and is likely just a bug. Let's make it work, as otherwise upgrading a server to a v2-capable git will start exposing these refs that the repository admin has asked to remain hidden. Note that we depend on the config_context from the caller here to realize that we should respect uploadpack.hiderefs. When run from the "git-serve" tool, we won't have that context and will only respect transfer.hiderefs. This should be OK, as git-serve is not actually used for normal protocol operations. Signed-off-by: Jeff King <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
More _("i18n") markings. * nd/i18n: fsck: mark strings for translation fsck: reduce word legos to help i18n parse-options.c: mark more strings for translation parse-options.c: turn some die() to BUG() parse-options: replace opterror() with optname() repack: mark more strings for translation remote.c: mark messages for translation remote.c: turn some error() or die() to BUG() reflog: mark strings for translation read-cache.c: add missing colon separators read-cache.c: mark more strings for translation read-cache.c: turn die("internal error") to BUG() attr.c: mark more string for translation archive.c: mark more strings for translation alias.c: mark split_cmdline_strerror() strings for translation git.c: mark more strings for translation
Updates for corner cases in merge-recursive. * en/merge-path-collision: t6036: avoid non-portable "cp -a" merge-recursive: combine error handling t6036, t6043: increase code coverage for file collision handling merge-recursive: improve rename/rename(1to2)/add[/add] handling merge-recursive: use handle_file_collision for add/add conflicts merge-recursive: improve handling for rename/rename(2to1) conflicts merge-recursive: fix rename/add conflict handling merge-recursive: new function for better colliding conflict resolutions merge-recursive: increase marker length with depth of recursion t6036, t6042: testcases for rename collision of already conflicting files t6042: add tests for consistency in file collision conflict handling
The "http.version" configuration variable can be used with a recent enough cURL library to force the version of HTTP used when fetching and pushing. * fc/http-version: http: add support selecting http version
Code clean-up with optimization for the codepath that checks (non-)existence of loose objects. * jk/loose-object-cache: odb_load_loose_cache: fix strbuf leak fetch-pack: drop custom loose object cache sha1-file: use loose object cache for quick existence check object-store: provide helpers for loose_objects_cache sha1-file: use an object_directory for the main object dir handle alternates paths the same as the main object dir sha1_file_name(): overwrite buffer instead of appending rename "alternate_object_database" to "object_directory" submodule--helper: prefer strip_suffix() to ends_with() fsck: do not reuse child_process structs
More codepaths become aware of working with in-core repository instance other than the default "the_repository". * nd/the-index: (22 commits) rebase-interactive.c: remove the_repository references rerere.c: remove the_repository references pack-*.c: remove the_repository references pack-check.c: remove the_repository references notes-cache.c: remove the_repository references line-log.c: remove the_repository reference diff-lib.c: remove the_repository references delta-islands.c: remove the_repository references cache-tree.c: remove the_repository references bundle.c: remove the_repository references branch.c: remove the_repository reference bisect.c: remove the_repository reference blame.c: remove implicit dependency the_repository sequencer.c: remove implicit dependency on the_repository sequencer.c: remove implicit dependency on the_index transport.c: remove implicit dependency on the_index notes-merge.c: remove implicit dependency the_repository notes-merge.c: remove implicit dependency on the_index list-objects.c: reduce the_repository references list-objects-filter.c: remove implicit dependency on the_index ...
Small fixes and features for fast-export and fast-import, mostly on the fast-export side. * en/fast-export-import: fast-export: add a --show-original-ids option to show original names fast-import: remove unmaintained duplicate documentation fast-export: add --reference-excluded-parents option fast-export: ensure we export requested refs fast-export: when using paths, avoid corrupt stream with non-existent mark fast-export: move commit rewriting logic into a function for reuse fast-export: avoid dying when filtering by paths and old tags exist fast-export: use value from correct enum git-fast-export.txt: clarify misleading documentation about rev-list args git-fast-import.txt: fix documentation for --quiet option fast-export: convert sha1 to oid
"git push $there $src:$dst" rejects when $dst is not a fully qualified refname and it is not clear what the end user meant. The codepath has been taught to give a clearer error message, and also to guess where the push should go by taking the type of the pushed object into account (e.g. a tag object would want to go under refs/tags/). * ab/push-dwim-dst: push doc: document the DWYM behavior pushing to unqualified <dst> push: test that <src> doesn't DWYM if <dst> is unqualified push: add an advice on unqualified <dst> push push: move unqualified refname error into a function push: improve the error shown on unqualified <dst> push i18n: remote.c: mark error(...) messages for translation remote.c: add braces in anticipation of a follow-up change
"git checkout frotz" (without any double-dash) avoids ambiguity by making sure 'frotz' cannot be interpreted as a revision and as a path at the same time. This safety has been updated to check also a unique remote-tracking branch 'frotz' in a remote, when dwimming to create a local branch 'frotz' out of a remote-tracking branch 'frotz' from a remote. * nd/checkout-dwim-fix: checkout: disambiguate dwim tracking branches and local files
Refspecs configured with "git -c var=val clone" did not propagate to the resulting repository, which has been corrected. * sg/clone-initial-fetch-configuration: Documentation/clone: document ignored configuration variables clone: respect additional configured fetch refspecs during initial fetch clone: use a more appropriate variable name for the default refspec
A properly configured username/email is required under user.useConfigOnly in order to create commits; now the "git stash" command (even though it creates commit objects to represent stash entries) is exempt from the requirement. * sd/stash-wo-user-name: stash: tolerate missing user identity
The http-backend CGI process did not correctly clean up the child processes it spawns to run upload-pack etc. when it dies itself, which has been corrected. * mk/http-backend-kill-children-before-exit: http-backend: enable cleaning up forked upload/receive-pack on exit
"git rev-list --exclude-promisor-objects" had to take an object that does not exist locally (and is lazily available) from the command line without barfing, but the code dereferenced NULL. * md/list-lazy-objects-fix: list-objects.c: don't segfault for missing cmdline objects
The traversal over tree objects has learned to honor ":(attr:label)" pathspec match, which has been implemented only for enumerating paths on the filesystem. * nd/attr-pathspec-in-tree-walk: tree-walk: support :(attr) matching dir.c: move, rename and export match_attrs() pathspec.h: clean up "extern" in function declarations tree-walk.c: make tree_entry_interesting() take an index tree.c: make read_tree*() take 'struct repository *'
* ab/commit-graph-progress-fix: commit-graph: split up close_reachable() progress output
"git checkout [<tree-ish>] path..." learned to report the number of paths that have been checked out of the index or the tree-ish, which gives it the same degree of noisy-ness as the case in which the command checks out a branch. * nd/checkout-noisy: t0027: squelch checkout path run outside test_expect_* block checkout: print something when checking out paths
Test update. * hb/t0061-dot-in-path-fix: t0061: do not fail test if '.' is part of $PATH
BSD port update. * cb/openbsd-allows-reading-directory: config.mak.uname: OpenBSD uses BSD semantics with fread for directories
BSD port update. * cb/t5004-empty-tar-archive-fix: t5004: avoid using tar for empty packages
BSD port update. * cb/test-lint-cp-a: tests: add lint for non portable cp -a
Lines that begin with a certain keyword that come over the wire, as well as lines that consist only of one of these keywords, ought to be painted in color for easier eyeballing, but the latter was broken ever since the feature was introduced in 2.19, which has been corrected. * hn/highlight-sideband-keywords: sideband: color lines with keyword only
Minor inconsistency fix. * sb/diff-color-moved-config-option-fixup: diff: align move detection error handling with other options
"git log -G<regex>" looked for a hunk in the "git log -p" patch output that contained a string that matches the given pattern. Optimize this code to ignore binary files, which by default will not show any hunk that would match any pattern (unless textconv or the --text option is in effect, that is). * tb/log-G-binary: log -G: ignore binary files
Code clean-up. * md/exclude-promisor-objects-fix-cleanup: revision.c: put promisor option in specialized struct
Doc update. * km/rebase-doc-typofix: rebase docs: drop stray word in merge command description
Cygwin update. * tb/use-common-win32-pathfuncs-on-cygwin: git clone <url> C:\cygwin\home\USER\repo' is working (again)
Doc update. * fd/gitweb-snapshot-conf-doc-fix: docs/gitweb.conf: config variable typo
* bw/mailmap: mailmap: update brandon williams's email address
The list of conflicted paths shown in the editor while concluding a conflicted merge was shown above the scissors line when the clean-up mode is set to "scissors", even though it was commented out just like the list of updated paths and other information to help the user explain the merge better. * dl/merge-cleanup-scissors-fix: merge: add scissors line on merge conflict merge: cleanup messages like commit t7600: clean up 'merge --squash c3 with c7' test commit: extract cleanup_mode functions to sequencer
"git checkout [<tree-ish>] <pathspec>" started reporting the number of paths that have got updated recently, but the same messages were given when "git checkout -m <pathspec>" to unresolve conflicts that have just been resolved. The message now reports these unresolved paths separately from the paths that are checked out from the index. * nd/checkout-noisy-unmerge: checkout: count and print -m paths separately checkout: update count-checkouts messages
For "rebase -i --reschedule-failed-exec", we do not want the "-y" shortcut after all. * js/rebase-i-redo-exec-fix: Revert "rebase: introduce a shortcut for --reschedule-failed-exec"
A flakey "p4" test has been removed. * ld/git-p4-remove-flakey-test: git-p4: remove ticket expiry test
Four new configuration variables {author,committer}.{name,email} have been introduced to override user.{name,email} in more specific cases. * wh/author-committer-ident-config: config: allow giving separate author and committer idents
The %(trailers) formatter in "git log --format=..." now optionally allows picking trailers selectively by keyword, showing only values, etc. * aw/pretty-trailers: pretty: add support for separator option in %(trailers) strbuf: separate callback for strbuf_expand:ing literals pretty: add support for "valueonly" option in %(trailers) pretty: allow showing specific trailers pretty: single return path in %(trailers) handling pretty: allow %(trailers) options with explicit value doc: group pretty-format.txt placeholders descriptions
Command-line completion (in contrib/) learned to tab-complete the "git submodule absorbgitdirs" subcommand. * dl/complete-submodule-absorbgitdirs: completion: complete git submodule absorbgitdirs
Build update. * sg/ci-parallel-build: ci: clear and mark MAKEFLAGS exported just once ci: make sure we build Git parallel
Doc update. * kl/pretty-doc-markup-fix: doc: prevent overflowing <code> tag in rendered HTML
Update to the fuzzer. * js/fuzz-commit-graph-update: object: fix leak of shallow_stat fuzz-commit-graph: initialize repo object
Windows update. * js/mingw-host-cpu: mingw: fix CPU reporting in `git version --build-options`
Test fix. * os/rebase-runs-post-checkout-hook: t5403: correct bash ambiguous redirect error in subtest 8 by quoting $GIT_DIR
Test fix. * tz/gpg-test-fix: t/lib-gpg: drop redundant killing of gpg-agent t/lib-gpg: quote path to ${GNUPGHOME}/trustlist.txt
"git branch" learned a new subcommand "--show-current". * du/branch-show-current: branch: introduce --show-current display option
Update the implementation of pack-redundant for performance in a repository with many packfiles. * sc/pack-redundant: pack-redundant: consistent sort method pack-redundant: rename pack_list.all_objects pack-redundant: new algorithm to find min packs pack-redundant: delete redundant code pack-redundant: delay creation of unique_objects t5323: test cases for git-pack-redundant
Test improvement. * sg/stress-test: test-lib: make '--stress' more bisect-friendly
* master: .mailmap: map Clemens Buchacher's mail addresses
A more structured way to obtain execution trace has been added. * jh/trace2: trace2: add for_each macros to clang-format trace2: t/helper/test-trace2, t0210.sh, t0211.sh, t0212.sh trace2:data: add subverb for rebase trace2:data: add subverb to reset command trace2:data: add subverb to checkout command trace2:data: pack-objects: add trace2 regions trace2:data: add trace2 instrumentation to index read/write trace2:data: add trace2 hook classification trace2:data: add trace2 transport child classification trace2:data: add trace2 sub-process classification trace2:data: add editor/pager child classification trace2:data: add trace2 regions to wt-status trace2: collect Windows-specific process information trace2: create new combined trace facility trace2: Documentation/technical/api-trace2.txt
Output from "diff --cc" did not show the original paths when the merge involved renames. A new option adds the paths in the original trees to the output. * en/combined-all-paths: log,diff-tree: add --combined-all-paths option
* master: Seventh batch for 2.21
Guard against infinite loop while computing the parent process hierarchy.

CreateToolhelp32Snapshot() is used to get a list of all processes on the system. Each process entry contains the process PID and PPID (alive at the time of the snapshot). We compute the set of ancestors of the current process by repeated searches on this list.

Testing revealed that the snapshot can contain PPID cycles. This causes an infinite loop during the set construction and causes the git.exe command to hang.

Testing found an instance where 3 processes were in a PPID cycle. The snapshot implied that each of these processes was its own great-grandparent. This should not be possible unless a PID was recycled and just happened to match up.

For full disclosure, the Windows "System Idle Process" has PID and PPID 0. If it were to launch a Git command, it could cause a similar infinite loop. Or more properly, if any ancestor of the current Git command has PPID 0, it will appear to be a descendant of the idle process and trigger the problem.

This commit fixes both cases by maintaining a list of the PIDs seen during the ancestor walk and stopping if a cycle is detected.

Additionally, code was added to truncate the search after a reasonable fixed depth limit.

Signed-off-by: Jeff Hostetler <[email protected]>
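The guarded ancestor walk can be tried outside Windows with a portable sketch. Everything here other than the NR_PIDS_LIMIT name and the "(cycle)" and "(truncated)" markers is an assumption made for illustration: `struct proc_entry` stands in for a PROCESSENTRY32 row, `find_entry()` stands in for the snapshot search, and the result is written to a comma-separated string rather than a JSON array. The order of the checks follows the patch under review.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NR_PIDS_LIMIT 42

/* Hypothetical stand-in for one PROCESSENTRY32 row of a snapshot. */
struct proc_entry {
	unsigned long pid;
	unsigned long ppid;
	const char *exe;
};

/* Linear search of the snapshot for `pid`; NULL if it is not listed. */
static const struct proc_entry *find_entry(const struct proc_entry *snap,
					    size_t nr, unsigned long pid)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (snap[i].pid == pid)
			return &snap[i];
	return NULL;
}

/* Append `name` to the comma-separated list in `out` (truncating safely). */
static void append_name(char *out, size_t cap, const char *name)
{
	size_t len = strlen(out);

	snprintf(out + len, cap - len, "%s%s", len ? "," : "", name);
}

/*
 * List the ancestors of `self`, nearest first. The walk stops when a
 * PID is missing from the snapshot, when a PID repeats ("(cycle)"),
 * or when NR_PIDS_LIMIT PIDs have been visited ("(truncated)").
 */
static void list_ancestors(const struct proc_entry *snap, size_t nr,
			   unsigned long self, char *out, size_t cap)
{
	unsigned long seen[NR_PIDS_LIMIT];
	size_t nr_seen = 0, k;
	unsigned long pid = self;
	const struct proc_entry *e;

	assert(out != NULL && cap > 0);
	out[0] = '\0';

	while ((e = find_entry(snap, nr, pid)) != NULL) {
		/* Only report parents; the first entry visited is self. */
		if (nr_seen)
			append_name(out, cap, e->exe);

		/* A PID seen before means the snapshot has a PPID cycle. */
		for (k = 0; k < nr_seen; k++)
			if (seen[k] == pid) {
				append_name(out, cap, "(cycle)");
				return;
			}

		if (nr_seen == NR_PIDS_LIMIT) {
			append_name(out, cap, "(truncated)");
			return;
		}

		seen[nr_seen++] = pid;
		pid = e->ppid;
	}
}
```

With a made-up three-process ring (10 -> 20 -> 30 -> 10, named a.exe, b.exe, c.exe) and a walk starting at PID 10, the sketch yields `b.exe,c.exe,a.exe,(cycle)`: because the name is reported before the cycle check, the starting process appears once before the marker. A process whose parent is an entry with PID and PPID 0 ends the same way after that entry has been reported twice.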
/submit
Submitted as [email protected]
@@ -3,6 +3,8 @@
#include <Psapi.h>
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Jeff Hostetler via GitGitGadget" <[email protected]> writes:
> From: Jeff Hostetler <[email protected]>
>
> Guard against infinite loop while computing the parent process hierarchy.
>
> CreateToolhelp32Snapshot() is used to get a list of all processes on the
> system. Each process entry contains the process PID and PPID (alive at the
> time of the snapshot). We compute the set of ancestors of the current process
> by repeated searches on this list.
>
> Testing revealed that the snapshot can contain PPID cycles. This causes an
> infinite loop during the set construction and causes the git.exe command to
> hang.
>
> Testing found an instance where 3 processes were in a PPID cycle. The
> snapshot implied that each of these processes was its own great-grandparent.
> This should not be possible unless a PID was recycled and just happened to
> match up.
>
> For full disclosure, the Windows "System Idle Process" has PID and PPID 0.
> If it were to launch a Git command, it could cause a similar infinite loop.
> Or more properly, if any ancestor of the current Git command has PPID 0, it
> will appear to be a descendant of the idle process and trigger the problem.
>
> This commit fixes both cases by maintaining a list of the PIDs seen during
> the ancestor walk and stopping if a cycle is detected.
>
> Additionally, code was added to truncate the search after a reasonable fixed
> depth limit.
>
> Signed-off-by: Jeff Hostetler <[email protected]>
> ---
> compat/win32/trace2_win32_process_info.c | 32 ++++++++++++++++++------
> 1 file changed, 25 insertions(+), 7 deletions(-)
Thanks. Will queue for now, but shortly after the final, I expect
the whole topic to be eligible to graduate to the 'master' track.
At that point, you may want to help rephrase the log message of the
original when this gets squashed in. Alternatively, we could leave
these two separate patches.
>
> diff --git a/compat/win32/trace2_win32_process_info.c b/compat/win32/trace2_win32_process_info.c
> index 253199f812..751b366470 100644
> --- a/compat/win32/trace2_win32_process_info.c
> +++ b/compat/win32/trace2_win32_process_info.c
> @@ -3,6 +3,8 @@
> #include <Psapi.h>
> #include <tlHelp32.h>
>
> +#define NR_PIDS_LIMIT 42
> +
;-)
Any backstory about the choice of this number we want to share with
our future selves?
> /*
> * Find the process data for the given PID in the given snapshot
> * and update the PROCESSENTRY32 data.
> @@ -21,13 +23,17 @@ static int find_pid(DWORD pid, HANDLE hSnapshot, PROCESSENTRY32 *pe32)
> }
>
> /*
> - * Accumulate JSON array:
> + * Accumulate JSON array of our parent processes:
> * [
> * exe-name-parent,
> * exe-name-grand-parent,
> * ...
> * ]
> *
> + * We artificially limit this to NR_PIDS_LIMIT to quickly guard against cycles
> + * in the parent PIDs without a lot of fuss and because we just want some
> + * context and don't need an absolute answer.
> + *
> * Note: we only report the filename of the process executable; the
> * only way to get its full pathname is to use OpenProcess()
> * and GetModuleFileNameEx() or QueryfullProcessImageName()
> @@ -38,16 +44,28 @@ static void get_processes(struct json_writer *jw, HANDLE hSnapshot)
> {
> PROCESSENTRY32 pe32;
> DWORD pid;
> + DWORD pid_list[NR_PIDS_LIMIT];
> + int k, nr_pids = 0;
>
> pid = GetCurrentProcessId();
> + while (find_pid(pid, hSnapshot, &pe32)) {
> + /* Only report parents. Omit self from the JSON output. */
> + if (nr_pids)
> + jw_array_string(jw, pe32.szExeFile);
>
> - /* We only want parent processes, so skip self. */
> - if (!find_pid(pid, hSnapshot, &pe32))
> - return;
> - pid = pe32.th32ParentProcessID;
> + /* Check for cycle in snapshot. (Yes, it happened.) */
> + for (k = 0; k < nr_pids; k++)
> + if (pid == pid_list[k]) {
> + jw_array_string(jw, "(cycle)");
> + return;
> + }
>
> - while (find_pid(pid, hSnapshot, &pe32)) {
> - jw_array_string(jw, pe32.szExeFile);
> + if (nr_pids == NR_PIDS_LIMIT) {
> + jw_array_string(jw, "(truncated)");
> + return;
> + }
> +
> + pid_list[nr_pids++] = pid;
>
> pid = pe32.th32ParentProcessID;
> }
On the Git mailing list, Jeff Hostetler wrote (reply to this):
On 2/11/2019 6:19 PM, Junio C Hamano wrote:
> "Jeff Hostetler via GitGitGadget" <[email protected]> writes:
>
>> From: Jeff Hostetler <[email protected]>
>>
>> Guard against infinite loop while computing the parent process hierarchy.
...
>> ---
>> compat/win32/trace2_win32_process_info.c | 32 ++++++++++++++++++------
>> 1 file changed, 25 insertions(+), 7 deletions(-)
>
> Thanks. Will queue for now, but shortly after the final, I expect
> the whole topic to be eligible to graduate to the 'master' track.
> At that point, you may want to help rephrase the log message of the
> original when this gets squashed in. Alternatively, we could leave
> these two separate patches.
I'll squash this into the next version of my main patch series
and update the in-code comments to explain.
I thought I saw a note that you were moving my V6 into "next" and
didn't want this patch to "cross in the mail" if I quickly re-rolled
a V7.
Thanks,
Jeff
Fix infinite loop observed on Windows computing process ancestry.
This should be applied to V6 of "jh/trace2" currently in next.
See: #108
Thanks
Jeff
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]