Skip to content

fsmonitor: start using an opaque token for last update #510

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

kewillford
Copy link

@kewillford kewillford commented Jan 2, 2020

Only change for this version is to sync the watchman script that is in templates with the one in t/t7519 since it had some commented out debug code that could be useful.

Also haven't received any feedback on this patch series which would be good to make sure I'm not making any obvious mistakes.

Thanks!

…token

Some file system monitors might not use or take a timestamp for processing
and in the case of watchman could have race conditions with using a
timestamp. Watchman uses something called a clockid that is used for race
free queries to it. The clockid for watchman is simply a string.

Change the fsmonitor_last_update from being a uint64_t to a char pointer
so that any arbitrary data can be stored in it and passed back to the
fsmonitor.

Signed-off-by: Kevin Willford <[email protected]>
@kewillford kewillford force-pushed the fsmonitor_opaque_token branch 4 times, most recently from 3ef0288 to a12bbc2 Compare January 6, 2020 17:19
Some file monitors like watchman will use something other than a timestamp
to keep better track of what changes happen in between calls to query
the fsmonitor. The clockid in watchman is a string. Now that the index
is storing an opaque token for the last update the code needs to be
updated to pass that opaque token to a verion 2 of the fsmonitor hook.

Because there are repos that already have version 1 of the hook and we
want them to continue to work when git is updated, we need to handle
both version 1 and version 2 of the hook. In order to do that a
config value is being added core.fsmonitorHookVersion to force what
version of the hook should be used.  When this is not set it will default
to -1 and then the code will attempt to call version 2 of the hook first.
If that fails it will fallback to trying version 1.

Signed-off-by: Kevin Willford <[email protected]>
@kewillford kewillford force-pushed the fsmonitor_opaque_token branch from a12bbc2 to 8d381b7 Compare January 7, 2020 17:57
@kewillford
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 7, 2020

Submitted as [email protected]

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 7, 2020

This branch is now known as kw/fsmonitor-watchman-racefix.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 7, 2020

This patch series was integrated into pu via git@f93c640.

@gitgitgadget gitgitgadget bot added the pu label Jan 7, 2020
@gitgitgadget
Copy link

gitgitgadget bot commented Jan 8, 2020

This patch series was integrated into pu via git@0d875df.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 9, 2020

This patch series was integrated into pu via git@da075a8.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 9, 2020

This patch series was integrated into pu via git@86e79c5.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 10, 2020

This patch series was integrated into pu via git@c10cd17.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 10, 2020

This patch series was integrated into pu via git@9277609.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 13, 2020

This patch series was integrated into pu via git@145a106.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 13, 2020

This patch series was integrated into pu via git@2d2d8b9.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 13, 2020

This patch series was integrated into pu via git@ba3eb76.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 15, 2020

This patch series was integrated into pu via git@49b9202.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 17, 2020

This patch series was integrated into pu via git@d53ae78.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 17, 2020

This patch series was integrated into pu via git@c175555.

derrickstolee added a commit to microsoft/scalar that referenced this pull request Jan 22, 2020
Resolves #291.

Mostly a code move from `CloneVerb` to `ConfigStep`. We lose some output during clone, like "Watchman configured!" as we don't have access to `Output` during the `ConfigStep`. The `CloneVerb` runs the `ConfigStep` directly, but so does the `RegisterStep` and background maintenance.

There is a benefit, though: we will try configuring Watchman on every upgrade and once a day. If a user installs Watchman after cloning, then we will configure it automatically.

One remaining thing to think about: we are currently _copying_ the existing hook template from the hooks directory. As this changes (see microsoft/git#225 and gitgitgadget/git#510), we will need a different way to populate the hook. Perhaps we should store the hook contents in the Scalar codebase to avoid using a stale (or malicious) hook. Cc @kewillford for thoughts on this.
@gitgitgadget
Copy link

gitgitgadget bot commented Jan 22, 2020

This patch series was integrated into pu via git@80f169c.

Version 2 of the fsmonitor hooks is passed the version and an update
token and must pass back a last update token to use for subsequent calls
to the hook.

Signed-off-by: Kevin Willford <[email protected]>
A new config value for core.fsmonitorHookVersion was added to be able
to force the version of the fsmonitor hook.  Possible values are 1 or 2.
When this is not set the code will use a value of -1 and attempt to use
version 2 of the hook first and if that fails will attempt version 1.

Signed-off-by: Kevin Willford <[email protected]>
@kewillford kewillford force-pushed the fsmonitor_opaque_token branch from 8d381b7 to 1db2a69 Compare January 23, 2020 00:09
@kewillford
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 23, 2020

Submitted as [email protected]

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 24, 2020

This patch series was integrated into pu via git@886c35f.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 27, 2020

This patch series was integrated into pu via git@8f5f596.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 30, 2020

This patch series was integrated into pu via git@49db93f.

@@ -324,7 +324,7 @@ struct index_state {
struct hashmap dir_hash;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Kevin,

On Thu, 23 Jan 2020, Kevin Willford via GitGitGadget wrote:

> diff --git a/fsmonitor.c b/fsmonitor.c
> index 868cca01e2..9860587225 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -89,11 +98,12 @@ void write_fsmonitor_extension(struct strbuf *sb, struct index_state *istate)
>  		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
>  		    (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
>
> -	put_be32(&hdr_version, INDEX_EXTENSION_VERSION);
> +	put_be32(&hdr_version, INDEX_EXTENSION_VERSION2);
>  	strbuf_add(sb, &hdr_version, sizeof(uint32_t));
>
> -	put_be64(&tm, istate->fsmonitor_last_update);
> -	strbuf_add(sb, &tm, sizeof(uint64_t));
> +	strbuf_addstr(sb, istate->fsmonitor_last_update);
> +	strbuf_addch(sb, 0); /* Want to keep a NUL */

I have a slight preference to use '\0' here, which my brain somehow reads
as `NUL`.

> +
>  	fixup = sb->len;
>  	strbuf_add(sb, &ewah_size, sizeof(uint32_t)); /* we'll fix this up later */
>
> @@ -110,9 +120,9 @@ void write_fsmonitor_extension(struct strbuf *sb, struct index_state *istate)
>  }
>
>  /*
> - * Call the query-fsmonitor hook passing the time of the last saved results.
> + * Call the query-fsmonitor hook passing the last update token of the saved results.
>   */
> -static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *query_result)
> +static int query_fsmonitor(int version, const char *last_update, struct strbuf *query_result)
>  {
>  	struct child_process cp = CHILD_PROCESS_INIT;
>
> @@ -121,7 +131,7 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que
>
>  	argv_array_push(&cp.args, core_fsmonitor);
>  	argv_array_pushf(&cp.args, "%d", version);
> -	argv_array_pushf(&cp.args, "%" PRIuMAX, (uintmax_t)last_update);
> +	argv_array_pushf(&cp.args, "%s", last_update);

Maybe `argv_array_push(&cp.args, last_update)`?

>  	cp.use_shell = 1;
>  	cp.dir = get_git_work_tree();
>
> @@ -151,6 +161,7 @@ void refresh_fsmonitor(struct index_state *istate)
>  	int query_success = 0;
>  	size_t bol; /* beginning of line */
>  	uint64_t last_update;
> +	struct strbuf last_update_token = STRBUF_INIT;
>  	char *buf;
>  	unsigned int i;
>
> @@ -164,6 +175,7 @@ void refresh_fsmonitor(struct index_state *istate)
>  	 * should be inclusive to ensure we don't miss potential changes.
>  	 */
>  	last_update = getnanotime();
> +	strbuf_addf(&last_update_token, "%"PRIu64"", last_update);
>
>  	/*
>  	 * If we have a last update time, call query_fsmonitor for the set of
> @@ -217,18 +229,21 @@ void refresh_fsmonitor(struct index_state *istate)
>  	}
>  	strbuf_release(&query_result);
>
> -	/* Now that we've updated istate, save the last_update time */
> -	istate->fsmonitor_last_update = last_update;
> +	/* Now that we've updated istate, save the last_update_token */
> +	FREE_AND_NULL(istate->fsmonitor_last_update);
> +	istate->fsmonitor_last_update = strbuf_detach(&last_update_token, NULL);

I see quite a few `strbuf_detach()` calls in this patch, and could imagine
that this is a good indicator that the `fsmonitor_last_update` attribute
of `struct index_state` could be a `struct strbuf` instead, that is
`strbuf_reset()`ed and `strbuf_addf()`ed to, rather than having the
strbufs as local variables.

Other than that, this patch looks very good to me.

Thanks!
Dscho

>  }
>
>  void add_fsmonitor(struct index_state *istate)
>  {
>  	unsigned int i;
> +	struct strbuf last_update = STRBUF_INIT;
>
>  	if (!istate->fsmonitor_last_update) {
>  		trace_printf_key(&trace_fsmonitor, "add fsmonitor");
>  		istate->cache_changed |= FSMONITOR_CHANGED;
> -		istate->fsmonitor_last_update = getnanotime();
> +		strbuf_addf(&last_update, "%"PRIu64"", getnanotime());
> +		istate->fsmonitor_last_update = strbuf_detach(&last_update, NULL);
>
>  		/* reset the fsmonitor state */
>  		for (i = 0; i < istate->cache_nr; i++)
> @@ -250,7 +265,7 @@ void remove_fsmonitor(struct index_state *istate)
>  	if (istate->fsmonitor_last_update) {
>  		trace_printf_key(&trace_fsmonitor, "remove fsmonitor");
>  		istate->cache_changed |= FSMONITOR_CHANGED;
> -		istate->fsmonitor_last_update = 0;
> +		FREE_AND_NULL(istate->fsmonitor_last_update);
>  	}
>  }
>
> diff --git a/t/helper/test-dump-fsmonitor.c b/t/helper/test-dump-fsmonitor.c
> index 2786f47088..975f0ac890 100644
> --- a/t/helper/test-dump-fsmonitor.c
> +++ b/t/helper/test-dump-fsmonitor.c
> @@ -13,7 +13,7 @@ int cmd__dump_fsmonitor(int ac, const char **av)
>  		printf("no fsmonitor\n");
>  		return 0;
>  	}
> -	printf("fsmonitor last update %"PRIuMAX"\n", (uintmax_t)istate->fsmonitor_last_update);
> +	printf("fsmonitor last update %s\n", istate->fsmonitor_last_update);
>
>  	for (i = 0; i < istate->cache_nr; i++)
>  		printf((istate->cache[i]->ce_flags & CE_FSMONITOR_VALID) ? "+" : "-");
> --
> gitgitgadget
>
>

@@ -6,8 +6,10 @@
#include "run-command.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Kevin,

On Thu, 23 Jan 2020, Kevin Willford via GitGitGadget wrote:

> diff --git a/fsmonitor.c b/fsmonitor.c
> index 9860587225..932bd9012d 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -175,27 +195,60 @@ void refresh_fsmonitor(struct index_state *istate)
>  	 * should be inclusive to ensure we don't miss potential changes.
>  	 */
>  	last_update = getnanotime();
> -	strbuf_addf(&last_update_token, "%"PRIu64"", last_update);
> +	if (hook_version == HOOK_INTERFACE_VERSION1)
> +		strbuf_addf(&last_update_token, "%"PRIu64"", last_update);
>
>  	/*
> -	 * If we have a last update time, call query_fsmonitor for the set of
> -	 * changes since that time, else assume everything is possibly dirty
> +	 * If we have a last update token, call query_fsmonitor for the set of
> +	 * changes since that token, else assume everything is possibly dirty
>  	 * and check it all.
>  	 */
>  	if (istate->fsmonitor_last_update) {
> -		query_success = !query_fsmonitor(HOOK_INTERFACE_VERSION,
> -			istate->fsmonitor_last_update, &query_result);
> +		if (hook_version == -1 || hook_version == HOOK_INTERFACE_VERSION2) {
> +			query_success = !query_fsmonitor(HOOK_INTERFACE_VERSION2,
> +				istate->fsmonitor_last_update, &query_result);
> +
> +			if (query_success) {
> +				if (hook_version < 0)
> +					hook_version = HOOK_INTERFACE_VERSION2;
> +
> +				/*
> +				 * First entry will be the last update token
> +				 * Need to use a char * variable because static
> +				 * analysis was suggesting to use strbuf_addbuf
> +				 * but we don't want to copy the entire strbuf
> +				 * only the the chars up to the first NUL
> +				 */
> +				buf = query_result.buf;
> +				strbuf_addstr(&last_update_token, buf);
> +				if (!last_update_token.len) {
> +					warning("Empty last update token.");
> +					query_success = 0;
> +				} else {
> +					bol = last_update_token.len + 1;
> +				}
> +			} else if (hook_version < 0) {
> +				hook_version = HOOK_INTERFACE_VERSION1;
> +				if (!last_update_token.len)
> +					strbuf_addf(&last_update_token, "%"PRIu64"", last_update);
> +			}
> +		}
> +
> +		if (hook_version == HOOK_INTERFACE_VERSION1) {
> +			query_success = !query_fsmonitor(HOOK_INTERFACE_VERSION1,
> +				istate->fsmonitor_last_update, &query_result);

I suspect that `istate->fsmonitor_last_update` might be incorrect here and
that you need `last_update_token.buf` instead. Besides...

> +		}

I could imagine that this would be easier to read if you initialized

	int interface_version = HOOK_INTERFACE_VERSION2;
	const char *token = istate->fsmonitor_last_update;

and then, at the beginning of this hunk, where you already added an `if`,
extend it to

	if (hook_version == HOOK_INTERFACE_VERSION1) {
		interface_version = HOOK_INTERFACE_VERSION1;
		strbuf_addf(&last_update_token, "%"PRIu64"", last_update);
		token = last_update_token.buf;
	}

Now, you can call

		query_success = !query_fsmonitor(interface_version,
			token, &query_result);
		if (!query_success && hook_version < 0) {
			hook_version = HOOK_INTERFACE_VERSION1;
			strbuf_reset(&last_update_token);
			strbuf_addf(&last_update_token, "%"PRIu64"", last_update);
			token = last_update_token.buf;
			query_success = !query_fsmonitor(hook_version,
				token, &query_result);
		}

		if (query_success && interface_version == HOOK_INTERFACE_VERSION2)
			bol = last_update_token.len + 1;


Technically, you could force `hook_version` to non-negative, via:

		if (hook_version < 0) {
			if (query_success)
				hook_version = HOOK_INTERFACE_VERSION2;
			else {
				hook_version = HOOK_INTERFACE_VERSION1;
				strbuf_reset(&last_update_token);
				strbuf_addf(&last_update_token, "%"PRIu64"", last_update);
				token = last_update_token.buf;
				query_success = !query_fsmonitor(hook_version,
					token, &query_result);
			}
		}

but that would not make anything quicker, and would make the code more
convoluted.

It is good that you keep the `fsmonitor-all` and `fsmonitor-watchman`
versions at 1, to verify that this fall-back mechanism works. I wonder
whether we should add one explicit test for that, so that those two hooks
can be upgraded to the interface v2.

Thanks,
Dscho

> +
>  		trace_performance_since(last_update, "fsmonitor process '%s'", core_fsmonitor);
>  		trace_printf_key(&trace_fsmonitor, "fsmonitor process '%s' returned %s",
>  			core_fsmonitor, query_success ? "success" : "failure");
>  	}
>
>  	/* a fsmonitor process can return '/' to indicate all entries are invalid */
> -	if (query_success && query_result.buf[0] != '/') {
> +	if (query_success && query_result.buf[bol] != '/') {
>  		/* Mark all entries returned by the monitor as dirty */
>  		buf = query_result.buf;
> -		bol = 0;
> -		for (i = 0; i < query_result.len; i++) {
> +		for (i = bol; i < query_result.len; i++) {
>  			if (buf[i] != '\0')
>  				continue;
>  			fsmonitor_refresh_callback(istate, buf + bol);
> diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
> index cf0fda2d5a..fbfdcca000 100755
> --- a/t/t7519-status-fsmonitor.sh
> +++ b/t/t7519-status-fsmonitor.sh
> @@ -32,11 +32,12 @@ write_integration_script () {
>  		echo "$0: exactly 2 arguments expected"
>  		exit 2
>  	fi
> -	if test "$1" != 1
> +	if test "$1" != 2
>  	then
>  		echo "Unsupported core.fsmonitor hook version." >&2
>  		exit 1
>  	fi
> +	printf "last_update_token\0"
>  	printf "untracked\0"
>  	printf "dir1/untracked\0"
>  	printf "dir2/untracked\0"
> @@ -107,6 +108,7 @@ EOF
>  # test that "update-index --fsmonitor-valid" sets the fsmonitor valid bit
>  test_expect_success 'update-index --fsmonitor-valid" sets the fsmonitor valid bit' '
>  	write_script .git/hooks/fsmonitor-test<<-\EOF &&
> +		printf "last_update_token\0"
>  	EOF
>  	git update-index --fsmonitor &&
>  	git update-index --fsmonitor-valid dir1/modified &&
> @@ -167,6 +169,7 @@ EOF
>  # test that newly added files are marked valid
>  test_expect_success 'newly added files are marked valid' '
>  	write_script .git/hooks/fsmonitor-test<<-\EOF &&
> +		printf "last_update_token\0"
>  	EOF
>  	git add new &&
>  	git add dir1/new &&
> @@ -207,6 +210,7 @@ EOF
>  # test that *only* files returned by the integration script get flagged as invalid
>  test_expect_success '*only* files returned by the integration script get flagged as invalid' '
>  	write_script .git/hooks/fsmonitor-test<<-\EOF &&
> +	printf "last_update_token\0"
>  	printf "dir1/modified\0"
>  	EOF
>  	clean_repo &&
> @@ -276,6 +280,7 @@ do
>  		# (if enabled) files unless it is told about them.
>  		test_expect_success "status doesn't detect unreported modifications" '
>  			write_script .git/hooks/fsmonitor-test<<-\EOF &&
> +			printf "last_update_token\0"
>  			:>marker
>  			EOF
>  			clean_repo &&
> diff --git a/t/t7519/fsmonitor-all b/t/t7519/fsmonitor-all
> index 691bc94dc2..94ab66bd3d 100755
> --- a/t/t7519/fsmonitor-all
> +++ b/t/t7519/fsmonitor-all
> @@ -17,7 +17,6 @@ fi
>
>  if test "$1" != 1
>  then
> -	echo "Unsupported core.fsmonitor hook version." >&2
>  	exit 1
>  fi
>
> diff --git a/t/t7519/fsmonitor-watchman b/t/t7519/fsmonitor-watchman
> index d8e7a1e5ba..264b9daf83 100755
> --- a/t/t7519/fsmonitor-watchman
> +++ b/t/t7519/fsmonitor-watchman
> @@ -26,8 +26,7 @@ if ($version == 1) {
>  	# subtract one second to make sure watchman will return all changes
>  	$time = int ($time / 1000000000) - 1;
>  } else {
> -	die "Unsupported query-fsmonitor hook version '$version'.\n" .
> -	    "Falling back to scanning...\n";
> +	exit 1;
>  }
>
>  my $git_work_tree;
> --
> gitgitgadget
>
>

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 5, 2020

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Kevin,

On Thu, 23 Jan 2020, Kevin Willford via GitGitGadget wrote:

> Only change for this version is to sync the watchman script that is in
> templates with the one in t/t7519 since it had some commented out debug code
> that could be useful.
>
> Also haven't received any feedback on this patch series which would be good
> to make sure I'm not making any obvious mistakes.

I just read through the series (focusing on the C part) and offered a
couple of suggestions, nothing major.

While I would love to see the fallback-mechanism simplified a little I
think this patch series is already in a pretty good shape.

Thanks,
Dscho

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 5, 2020

This patch series was integrated into pu via git@03d594d.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 7, 2020

This patch series was integrated into pu via git@45de354.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 7, 2020

This patch series was integrated into next via git@3c42195.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 11, 2020

This patch series was integrated into pu via git@8096fd0.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 12, 2020

This patch series was integrated into pu via git@37bd9e9.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 14, 2020

This patch series was integrated into pu via git@c9a33e5.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 14, 2020

This patch series was integrated into next via git@c9a33e5.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 14, 2020

This patch series was integrated into master via git@c9a33e5.

@gitgitgadget gitgitgadget bot added the master label Feb 14, 2020
@gitgitgadget gitgitgadget bot closed this Feb 14, 2020
@gitgitgadget
Copy link

gitgitgadget bot commented Feb 14, 2020

Closed via c9a33e5.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant