Skip to content

Namespace prebuild tracing tags #8648

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Namespace prebuild tracing tags #8648

wants to merge 1 commit into from

Conversation

easyCZ
Copy link
Member

@easyCZ easyCZ commented Mar 7, 2022

Description

Namespace startPrebuild tags as mentioned in #8568 (comment)

Open Questions

  • Fields like contextURL, project and few others feel like shared domain tags between many RPCs - do we have any unified tagging for domain properties which are wide-reaching? In practice, we want to be able to find relevant traces for any of these properties and for that we need a common tag schema.

How to test

Release Notes

Namespace prebuild tags

@easyCZ easyCZ force-pushed the af/rate-limits branch 2 times, most recently from 5abcbe1 to 519dbe9 Compare March 8, 2022 11:51
Base automatically changed from af/rate-limits to main March 8, 2022 14:10
@roboquat roboquat added size/L and removed size/S labels Mar 8, 2022
@easyCZ easyCZ force-pushed the mp/prebuilds-tags branch from 024b4a5 to ac3c87e Compare March 8, 2022 14:26
@roboquat roboquat added size/S and removed size/L labels Mar 8, 2022
span.setTag("prebuild.cloneURL", cloneURL);
span.setTag("prebuild.commit", commit);
span.setTag("prebuild.branch", branch);
span.setTag("prebuild.project", project);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Project, for example, is a first class domain property which exists in other features. It feels odd to namespace it with prebuild. This effectively prevents tracing across services and features.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@easyCZ I agree. But it's hard to get this right if you want to follow the "separate by domain" pattern. Like, in this case that would give:

context.contextURL
context.cloneURL
context.commit
context.branch
project

But:

  • our internal shapes are different
  • there are times where it's not clear what "the" commit of a request is
  • impossible to enforce correctness, and hard to get right because it's not function-local

IMO it's easier to follow the "namespace by span/function" pattern, except for very generic use cases (we do that for Owner/Workspace/Instance in GitpodServerImpl, for instance). Lookup works surprisingly well - you have the code open for that mostly anyways.

BTW, here's a document started by @mads-hartmann on the topic of naming convetions. I bet he has an opinion on this as well 😉

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to namespace your attributes depends on how you'd like to query for it.

  1. Find all spans related to a specific entity (e.g. user). Then you'd do something like COUNT where user.id = "foobar" group by name. A query like that is only possible if we use user.id for the user id everywhere.
  2. Filter/group a specific span based on attributes. For this it's a much nicer experience in Honeycomb if the attributes are namespaced using the name of the span. E.g. in this case startPrebuild.cloneURL. When you write your Honeycomb query you can then do COUNT where name = startPrebuild, startPrebuild.<autocomplete> and it will show all the attributes that it has ever seen for that span. If we didn't namespace it then you'd have to know attributes might be available on the span, e.g. that context.cloneURL happens to be a valid attribute for that span ☺️

So for this PR I would use the startPrebuild namespace for all the attributes (so rename prebuild to startPrebuild). If you believe the use-case (1) is valid for any of the attributes then just add those as well. A bit of duplication is fine, we pay per event, not by the number of attributes.

If you do add a new top-level namespace please do update the document though! ☺️

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my experience, I'd actually expect both to work. Let me elaborate:

I would indeed expect to be able to find all interactions by a given user, or everything related to a given workspace. Here I'd use the user.id or workspace.id fields.

But I'd also expect to be able to drill deeper and get all prebuild (startPrebuild) traces and do operations on them which aggregate over (or away) the user.id or workspace.id fields. I'd expect to query this with a more specific tag - component: prebuild or something along the lines. This would be yet another field which helps me go deeper. Each span could then also have very custom attributes of course but those would be required to be namespaced as I wouldn't expect these to be cross-cutting, or have semantics outside of a given component: prebuild trace.

I'm happy to follow whichever guidence you provide as I don't yet have enough context but it feels we're preventing ourselves from leveraging some features of distributed tracing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to follow whichever guidence you provide as I don't yet have enough context but it feels we're preventing ourselves from leveraging some features of distributed tracing.

We are not, it's a trade-off. If you want to get discoverability of what attributes are available on a specific span as you write your query for that specific span, then have to go with (2).

If you want to find all spans for a specific entity go with (1).

If you need both, go with (1) and (2).

In my experience, when you're debugging you're mostly trying to figure out why a specific operation (span) is slow. For that you want to run some aggregates over all of those spans and do some grouping on the attributes to help you drill down. So providing a good experience when writing queries of the form <AGGREGATE> WHERE name = startPrebuild GROUP BY startPrebuild.XXX <CODE COMPLETE> is a worthwhile thing to optimise for.

And as I said, you can just add both startPrebuild.context.id and context.id. A bit of duplication doesn't hurt. ☺️

@easyCZ easyCZ marked this pull request as ready for review March 8, 2022 14:28
@easyCZ easyCZ requested a review from a team March 8, 2022 14:28
@easyCZ easyCZ changed the title Namespace prebuild tags Namespace prebuild tracing tags Mar 8, 2022
@github-actions github-actions bot added the team: webapp Issue belongs to the WebApp team label Mar 8, 2022
@geropl geropl marked this pull request as draft March 11, 2022 16:46
@codecov
Copy link

codecov bot commented Mar 11, 2022

Codecov Report

Merging #8648 (ac3c87e) into main (54a765d) will decrease coverage by 3.83%.
The diff coverage is n/a.

@@            Coverage Diff            @@
##             main   #8648      +/-   ##
=========================================
- Coverage   11.17%   7.34%   -3.84%     
=========================================
  Files          18      32      +14     
  Lines         993    2234    +1241     
=========================================
+ Hits          111     164      +53     
- Misses        880    2067    +1187     
- Partials        2       3       +1     
Flag Coverage Δ
components-gitpod-cli-app 11.17% <ø> (ø)
install-installer-raw-app 4.27% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
install/installer/pkg/common/ca.go 0.00% <0.00%> (ø)
...l/installer/pkg/components/ws-manager/configmap.go 23.75% <0.00%> (ø)
install/installer/pkg/common/render.go 0.00% <0.00%> (ø)
...installer/pkg/components/ws-manager/rolebinding.go 0.00% <0.00%> (ø)
install/installer/pkg/common/storage.go 0.00% <0.00%> (ø)
install/installer/pkg/common/objects.go 0.00% <0.00%> (ø)
install/installer/pkg/common/networkpolicies.go 0.00% <0.00%> (ø)
.../installer/pkg/components/ws-manager/deployment.go 0.00% <0.00%> (ø)
...staller/pkg/components/ws-manager/networkpolicy.go 0.00% <0.00%> (ø)
...components/ws-manager/unpriviledged-rolebinding.go 0.00% <0.00%> (ø)
... and 4 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@stale
Copy link

stale bot commented Mar 22, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the meta: stale This issue/PR is stale and will be closed soon label Mar 22, 2022
@stale stale bot closed this Mar 27, 2022
@easyCZ easyCZ deleted the mp/prebuilds-tags branch March 27, 2022 18:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/work-in-progress meta: stale This issue/PR is stale and will be closed soon release-note size/S team: webapp Issue belongs to the WebApp team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants