Skip to content

Namespace prebuild tracing tags #8648

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 8 additions & 6 deletions components/server/ee/src/prebuilds/prebuild-manager.ts
Original file line number Diff line number Diff line change
Expand Up @@ -69,9 +69,11 @@ export class PrebuildManager {

async startPrebuild(ctx: TraceContext, { contextURL, cloneURL, commit, branch, project, user }: StartPrebuildParams): Promise<StartPrebuildResult> {
const span = TraceContext.startSpan("startPrebuild", ctx);
span.setTag("contextURL", contextURL);
span.setTag("cloneURL", cloneURL);
span.setTag("commit", commit);
span.setTag("prebuild.contextURL", contextURL);
span.setTag("prebuild.cloneURL", cloneURL);
span.setTag("prebuild.commit", commit);
span.setTag("prebuild.branch", branch);
span.setTag("prebuild.project", project);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Project, for example, is a first class domain property which exists in other features. It feels odd to namespace it with prebuild. This effectively prevents tracing across services and features.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@easyCZ I agree. But it's hard to get this right if you want to follow the "separate by domain" pattern. Like, in this case that would give:

context.contextURL
context.cloneURL
context.commit
context.branch
project

But:

  • our internal shapes are different
  • there are times where it's not clear what "the" commit of a request is
  • impossible to enforce correctness, and hard to get right because it's not function-local

IMO it's easier to follow the "namespace by span/function" pattern, except for very generic use cases (we do that for Owner/Workspace/Instance in GitpodServerImpl, for instance). Lookup works surprisingly well - you have the code open for that mostly anyways.

BTW, here's a document started by @mads-hartmann on the topic of naming convetions. I bet he has an opinion on this as well 😉

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to namespace your attributes depends on how you'd like to query for it.

  1. Find all spans related to a specific entity (e.g. user). Then you'd do something like COUNT where user.id = "foobar" group by name. A query like that is only possible if we use user.id for the user id everywhere.
  2. Filter/group a specific span based on attributes. For this it's a much nicer experience in Honeycomb if the attributes are namespaced using the name of the span. E.g. in this case startPrebuild.cloneURL. When you write your Honeycomb query you can then do COUNT where name = startPrebuild, startPrebuild.<autocomplete> and it will show all the attributes that it has ever seen for that span. If we didn't namespace it then you'd have to know attributes might be available on the span, e.g. that context.cloneURL happens to be a valid attribute for that span ☺️

So for this PR I would use the startPrebuild namespace for all the attributes (so rename prebuild to startPrebuild). If you believe the use-case (1) is valid for any of the attributes then just add those as well. A bit of duplication is fine, we pay per event, not by the number of attributes.

If you do add a new top-level namespace please do update the document though! ☺️

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my experience, I'd actually expect both to work. Let me elaborate:

I would indeed expect to be able to find all interactions by a given user, or everything related to a given workspace. Here I'd use the user.id or workspace.id fields.

But I'd also expect to be able to drill deeper and get all prebuild (startPrebuild) traces and do operations on them which aggregate over (or away) the user.id or workspace.id fields. I'd expect to query this with a more specific tag - component: prebuild or something along the lines. This would be yet another field which helps me go deeper. Each span could then also have very custom attributes of course but those would be required to be namespaced as I wouldn't expect these to be cross-cutting, or have semantics outside of a given component: prebuild trace.

I'm happy to follow whichever guidence you provide as I don't yet have enough context but it feels we're preventing ourselves from leveraging some features of distributed tracing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to follow whichever guidence you provide as I don't yet have enough context but it feels we're preventing ourselves from leveraging some features of distributed tracing.

We are not, it's a trade-off. If you want to get discoverability of what attributes are available on a specific span as you write your query for that specific span, then have to go with (2).

If you want to find all spans for a specific entity go with (1).

If you need both, go with (1) and (2).

In my experience, when you're debugging you're mostly trying to figure out why a specific operation (span) is slow. For that you want to run some aggregates over all of those spans and do some grouping on the attributes to help you drill down. So providing a good experience when writing queries of the form <AGGREGATE> WHERE name = startPrebuild GROUP BY startPrebuild.XXX <CODE COMPLETE> is a worthwhile thing to optimise for.

And as I said, you can just add both startPrebuild.context.id and context.id. A bit of duplication doesn't hurt. ☺️


try {
if (user.blocked) {
Expand Down Expand Up @@ -134,16 +136,16 @@ export class PrebuildManager {
prebuild.error = "Prebuild is rate limited. Please contact Gitpod if you believe this happened in error.";

await this.workspaceDB.trace({ span }).storePrebuiltWorkspace(prebuild);
span.setTag("starting", false);
span.setTag("ratelimited", true);
span.setTag("prebuild.starting", false);
span.setTag("prebuild.ratelimited", true);
return {
wsid: workspace.id,
prebuildId: prebuild.id,
done: false,
};
}

span.setTag("starting", true);
span.setTag("prebuild.starting", true);
const projectEnvVars = await projectEnvVarsPromise;
await this.workspaceStarter.startWorkspace({ span }, workspace, user, [], projectEnvVars, {excludeFeatureFlags: ['full_workspace_backup']});
return { prebuildId: prebuild.id, wsid: workspace.id, done: false };
Expand Down