Blazor - rendering metrics and tracing #61609
You're adding a lot of metrics here. I think you should do some performance testing. Metrics have a performance overhead: they require some synchronization when incrementing counters and recording values. Having many low-level metrics could cause performance issues.
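For reference, one common way to keep that overhead near zero when nobody is collecting is to check `Instrument.Enabled` before reading timestamps or building tags. This is a minimal sketch, assuming .NET 8 or later; `HandleEvent` and the tag value are hypothetical, and this is a general `System.Diagnostics.Metrics` pattern, not necessarily how the PR records its instruments.

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;

using var meter = new Meter("Microsoft.AspNetCore.Components");
Histogram<double> eventDuration = meter.CreateHistogram<double>(
    "aspnetcore.components.event.duration", "s", "Duration of processing a browser event.");

int recorded = 0;

// Hypothetical event-dispatch wrapper, not the PR's code.
void HandleEvent(Action handler, string componentType)
{
    // Enabled stays false until a listener subscribes, so the timestamp and
    // TagList work below is skipped entirely when nobody is collecting.
    bool measure = eventDuration.Enabled;
    long start = measure ? Stopwatch.GetTimestamp() : 0;
    handler();
    if (measure)
    {
        eventDuration.Record(
            Stopwatch.GetElapsedTime(start).TotalSeconds,
            new TagList { { "component.type", componentType } });
        recorded++;
    }
}

HandleEvent(() => { }, "MyApp.Pages.Counter"); // no listener yet: nothing recorded

using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, meterListener) =>
{
    if (instrument.Meter.Name == "Microsoft.AspNetCore.Components")
        meterListener.EnableMeasurementEvents(instrument);
};
listener.Start();

HandleEvent(() => { }, "MyApp.Pages.Counter"); // listener attached: one measurement

Console.WriteLine($"recorded: {recorded}");
```

The remaining cost when a listener is attached is the synchronization inside `Record`, which is what the performance testing suggested above would need to measure.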
I removed a few and kept only the most useful ones. I have two remaining issues.
I don't know how Blazor circuits are created, but if it's from a Hub method then Activity.Current won't be the HTTP activity. We hop off the HTTP activity on purpose in SignalR (see aspnetcore/src/SignalR/server/Core/src/Internal/DefaultHubDispatcher.cs, lines 398 to 403 at commit 9f2b088).
Is that because the HTTP request is still running? I don't think activities show up in the dashboard until they're stopped, and if you're using SignalR you're likely using a WebSocket request, which is long-running.
I'm capturing
This is it, thank you @BrennanConroy!
It's also a topic to discuss for long-running activities in Blazor.
I think we have two ways to deal with them.
Right now I have a short+links implementation. I guess developers mostly use OTel in production, so even the long-running traces would already be recorded. But maybe developers also use it in the inner dev loop? In that case it would be great to have a "trace preview" for things that have started but not stopped yet, so that people don't get confused the same way I did.
Adding a general naming note here:
    description: "Total number of exceptions during browser event processing.");

_parametersDuration = _meter.CreateHistogram(
    "aspnetcore.components.parameters.duration",
As a neophyte Blazor developer, I don't quite understand why you'd want metrics broken down to the level of parameters. I am probably misunderstanding what this represents?
Blazor "parameters" are properties of a component, marked with the [Parameter] attribute, that can receive values from its parent component. They enable data to flow down from parent to child components. When Blazor parameters change, the component goes through a re-rendering cycle. I think they are a well-defined term.
The duration measured here covers parameter propagation and the user business logic that it triggers.
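As a rough illustration of what such a duration covers, here is a minimal sketch, assuming .NET 8 or later. It times an awaited piece of work end to end, so any awaited HTTP or database call inside the triggered user code lands in the measurement. `MeasureAsync` and the tag value are hypothetical; this is not how the PR implements the instrument.

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;
using System.Threading.Tasks;

// Hypothetical helper: records how long an awaited operation takes, in seconds,
// including everything it awaits (for example a database query in user code).
static async Task<T> MeasureAsync<T>(Histogram<double> histogram, Func<Task<T>> work, TagList tags)
{
    long start = Stopwatch.GetTimestamp();
    try
    {
        return await work();
    }
    finally
    {
        // Recorded even if the work throws, so failed updates are still timed.
        histogram.Record(Stopwatch.GetElapsedTime(start).TotalSeconds, tags);
    }
}

using var meter = new Meter("Microsoft.AspNetCore.Components");
Histogram<double> updateParametersDuration = meter.CreateHistogram<double>(
    "aspnetcore.components.update_parameters.duration", "s", "Duration of processing component parameters.");

// Simulated parameter update whose user code awaits a slow "database" call.
int result = await MeasureAsync(
    updateParametersDuration,
    async () => { await Task.Delay(20); return 42; },
    new TagList { { "component.type", "MyApp.Pages.Counter" } });

Console.WriteLine(result);
```

Because the stopwatch spans the whole `await`, a SELECT N+1 pattern in user code shows up directly as a longer duration.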
I think your feedback is that the meaning of individual diagnostic instruments needs to be documented once we are done here.
cc @guardrex
Ideally an experienced Blazor developer should be able to make a good guess at the meaning of the metric given just its name. I'm fine assuming those devs understand the meaning of 'parameters'. As described above, it sounded like 'parameters' is a noun that doesn't inherently have a notion of time duration associated with it? Perhaps we could name this something like aspnetcore.components.update_parameters.duration? I'm not sure if there is a better term than "update" that Blazor uses.
aspnetcore.components.update_parameters.duration
Sounds good to me.
I think the granularity here is probably too fine. I don't think we need to be tracking this on a per-component/parameter basis; there could potentially be hundreds of those on a page. I would suggest that we focus on what the end user will see, which is that they take an action and that results in an update to the page. That will admittedly include a network round trip, but understanding it from the server level is probably sufficient, as that is what is in the developer's control (unlike the network from the browser).
The user code running in the triggered events could make async HTTP or database calls. If it follows the SELECT N+1 anti-pattern, that would be visible here.
Those problems are currently not easy to diagnose, especially if the components come from different vendors or teams.
I think it's good to know which component was rendered when state changed, how many times, and how long it took.
Based on this data, developers could cache or redesign data acquisition in their components, or reduce the number of components or the tree depth. They might also reduce the percentage of cases in which the subtree is re-rendered on data propagation.
Maybe we could have a separate meter called Microsoft.AspNetCore.Components.Lifecycle with this finer granularity, while Microsoft.AspNetCore.Components could be for the big events.
@danroth27 thoughts?
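A minimal sketch of why a separate meter helps, assuming .NET 8 or later: collection is opted into by meter name, so a listener (or an OpenTelemetry `AddMeter(...)` call) can enable the coarse meter while the fine-grained Lifecycle meter stays off. The meter and instrument names come from this discussion; the recorded values are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

using var coarse = new Meter("Microsoft.AspNetCore.Components");
using var lifecycle = new Meter("Microsoft.AspNetCore.Components.Lifecycle");

Counter<long> navigations = coarse.CreateCounter<long>("aspnetcore.components.navigation.count");
Histogram<double> updateParameters =
    lifecycle.CreateHistogram<double>("aspnetcore.components.update_parameters.duration");

var enabled = new List<string>();
long navigationTotal = 0;
long lifecycleMeasurements = 0;

using var listener = new MeterListener();
// Opt in by meter name: only the coarse meter is enabled here; the Lifecycle meter
// would need its own explicit opt-in.
listener.InstrumentPublished = (instrument, meterListener) =>
{
    if (instrument.Meter.Name == "Microsoft.AspNetCore.Components")
    {
        enabled.Add(instrument.Name);
        meterListener.EnableMeasurementEvents(instrument);
    }
};
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) => navigationTotal += value);
listener.SetMeasurementEventCallback<double>((instrument, value, tags, state) => lifecycleMeasurements++);
listener.Start();

navigations.Add(1);
navigations.Add(1);
updateParameters.Record(0.0004); // dropped: no listener enabled this instrument

Console.WriteLine($"enabled: {string.Join(", ", enabled)}; navigations: {navigationTotal}; lifecycle measurements: {lifecycleMeasurements}");
```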
I think it's good to know which component was rendered when state changed. How many times and how long it took.
I agree
I'm a bit confused by the distributed tracing part of this. Partly that might be my limited experience with Blazor. I'm not sure what period of time is being measured by the different OnCircuit, OnRoute, and OnEvent spans. For example, I know what a Blazor circuit is, but I don't know what 'OnCircuit' is measuring. Is this a span that measures the entire duration of one Blazor circuit? I'll probably have more questions once I understand what each of the spans represents.
I don't think we currently define in our public Blazor docs what a render batch is. The only public mention of render batches that I could find is the CircuitOptions.MaxBufferedUnacknowledgedRenderBatches property. So, it's not clear to me what this value represents or whether it is useful. Should we be measuring something else that is more directly correlated with the publicly documented component lifecycle? Or do we want to introduce the concept of a render batch in our docs?
Again, since render batch isn't currently a publicly defined concept, should we be counting the number of exceptions per some other period?
I assume this is an average duration of all browser event handlers across the entire app regardless of render mode. That seems reasonable as a high-level view of the responsiveness of the app. But what does the "asynchronously" imply? Are synchronous event handlers not included in this metric?
I'm not sure what's included in the "processing" of component parameters. Is this the duration of the
Is this a total count of all page navigations across the entire app regardless of render mode? What would that be used for?
I was under the (perhaps mistaken?) impression that the scenarios using the metrics had already been looked at and appropriate metrics identified. If not, perhaps a good starting point is to identify what diagnostic questions we'd like users to be able to solve here. Usually I'd recommend:
Co-authored-by: Noah Falk
P0-P2: this is a useful angle, thanks!
Note, I also mention
This goes back to my questions about long running activities. We can definitely improve naming.
Right now, the short circuit and route activities mostly serve as something that click event activities can link to, for context.
Maybe we just need to rename it? Anyway, this is more on the troubleshooting side, for a misbehaving component. Producing long diffs/batches leads to network traffic, latency, and slow rendering. As I suggested above, we could have a separate namespace for it with a separate opt-in.
We also count exceptions per click/event, but I need to check whether exceptions from batch-related problems would appear there.
At the moment this works only for SignalR interactive. I think we could also make it work for form-submit.
I already renamed this and dropped "async". It includes your DB requests or any other async business logic.
Yes, or
Except WASM.
It has the route pattern as a tag/dimension that you can use as a filter. It's more of a business-oriented KPI: which of my pages are hot?
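To make the "which of my pages are hot?" use concrete, here is a minimal sketch assuming .NET 8 or later: the navigation counter carries the route pattern as a tag, and a consumer aggregates by that tag. In a real app an OpenTelemetry exporter and a dashboard would do the grouping; the `MeterListener` below only stands in for that, and the `route` tag name follows the PR description rather than a final API.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

using var meter = new Meter("Microsoft.AspNetCore.Components");
Counter<long> navigationCount = meter.CreateCounter<long>("aspnetcore.components.navigation.count");

// Stand-in for a metrics backend: sums navigations per route pattern.
var perRoute = new Dictionary<string, long>();
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, meterListener) =>
{
    if (instrument.Meter.Name == "Microsoft.AspNetCore.Components")
        meterListener.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
{
    foreach (KeyValuePair<string, object?> tag in tags)
    {
        if (tag.Key == "route" && tag.Value is string routePattern)
            perRoute[routePattern] = perRoute.GetValueOrDefault(routePattern) + value;
    }
});
listener.Start();

// The route pattern (not the concrete URL) keeps the tag's cardinality bounded.
string[] visited = ["/counter", "/weather", "/counter", "/product/{id}", "/counter"];
foreach (string visitedRoute in visited)
    navigationCount.Add(1, new KeyValuePair<string, object?>("route", visitedRoute));

foreach (var (route, count) in perRoute)
    Console.WriteLine($"{route}: {count}");
```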
Making the circuit/route activity/trace long-lived causes trouble with re-installing them into
If we keep them short, maybe they should be literally 0 ms long: just a context anchor grouping other traces.
Re Activity names: they are not very visible in the Aspire UI, and the Circuit Activity/trace is created in internal
The Route Activity/trace is created in
Regarding click/event: we already have the concept of an event. The activity should be active through the whole duration of
I would like the event Activity to also trigger for form submit, interop calls from JS, and enhanced navigation. Maybe we can change it to
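The short+links approach discussed here can be sketched with `System.Diagnostics` alone, assuming .NET 8 or later: the circuit activity is started and stopped almost immediately, so exporters receive it right away, and a later event activity starts its own trace and carries an `ActivityLink` back to the circuit's context. Activity names follow the PR description; the tag value and the listener setup are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

using var source = new ActivitySource("Microsoft.AspNetCore.Components");
var stopped = new List<Activity>();
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == source.Name,
    Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = activity => stopped.Add(activity),
};
ActivitySource.AddActivityListener(listener);

// 1. Near-zero-length anchor: started and stopped at once, so it is exported immediately.
ActivityContext circuitContext;
using (Activity? circuit = source.StartActivity("Microsoft.AspNetCore.Components.OnCircuit"))
{
    circuit?.SetTag("circuit.id", "hypothetical-circuit-id");
    circuitContext = circuit?.Context ?? default;
}

// 2. A later event activity: no current activity, so it starts a new trace,
//    and it links back to the anchor instead of being its child.
using (Activity? evt = source.StartActivity(
    "Microsoft.AspNetCore.Components.OnEvent",
    ActivityKind.Internal,
    parentContext: default,
    tags: null,
    links: new[] { new ActivityLink(circuitContext) }))
{
    evt?.SetTag("circuit.id", "hypothetical-circuit-id");
}

foreach (Activity a in stopped)
    Console.WriteLine($"{a.OperationName} trace={a.TraceId} links={a.Links.Count()}");
```

The trade-off is that the anchor carries no duration of its own; it only gives backends a shared context to group the linked event traces.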
Sam: Is circuitId useful for the DisplayName of the Circuit activity? What else could we display there instead? Could we use the IP address?
I met with @pavelsavara and I now understand what everything is for; it looks great. I think customers will be really happy with this. My concerns about granularity in terms of sending too much data have been mitigated.
My mindset: if a customer is having an issue with your site and calls IT to complain, how do they match the traces to the user? Is there somewhere that ID gets displayed to them? If we don't put that data in (which is good from a secure-by-default standpoint), is there a hook point we can document, before the activity is finished, where the customer can access the activity and add their own tags to it? As the circuit is created as an instantaneous activity, it may not be on the stack for many calls to user code where they could access it.
I think this problem of mapping traces to users for a support hotline is not specific to Blazor. I'm linking the HTTP Activity/trace that created the circuit, so if there are more tags on the HTTP trace, they could use those (after the session and the long-running HTTP/WS connection are finished). Sometimes there is an authenticated user in the HTTP context, but I think it's probably not good for us to expose any PII. If the app developer wanted to add more tags, I believe that they could capture
Maybe the app developer could also add a "show my circuit ID" item to the application settings menu, and the IT call center could ask users to click it. The circuit ID is a random number and it's not a secret from a security perspective.
The activity is created in CircuitFactory at line 127 and is stopped at line 173. Is there any user code executed between these points, while the activity is active, so that users can retrieve it and add tags? Once it's stopped, AFAIK it's too late to add anything to it.
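One candidate hook point, sketched under assumptions (.NET 8 or later): an `ActivityListener.ActivityStarted` callback runs synchronously inside `Activity.Start()`, so app code can add tags before the activity is stopped. The thread does not say in what order Blazor sets its own tags relative to `Start()`, and `app.user_ref` is a hypothetical tag name; the create/start/stop lines only stand in for the framework code.

```csharp
using System;
using System.Diagnostics;

using var source = new ActivitySource("Microsoft.AspNetCore.Components");
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Microsoft.AspNetCore.Components",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllDataAndRecorded,
    // Runs synchronously inside Activity.Start(), while the activity can still be enriched.
    ActivityStarted = activity =>
    {
        if (activity.OperationName == "Microsoft.AspNetCore.Components.OnCircuit")
        {
            // Hypothetical app-specific tag; keep PII out of it.
            activity.SetTag("app.user_ref", "support-ticket-correlation-id");
        }
    },
};
ActivitySource.AddActivityListener(listener);

// Stand-in for the framework code: create, start, then stop almost immediately.
Activity? circuit = source.CreateActivity("Microsoft.AspNetCore.Components.OnCircuit", ActivityKind.Internal);
circuit?.SetTag("circuit.id", "hypothetical-circuit-id");
circuit?.Start();
circuit?.Stop();

Console.WriteLine(circuit?.GetTagItem("app.user_ref"));
```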
Some naming suggestions.
For sessions, I wonder whether, when log messages are fired, the activity context is going to be the session activity, and if not, whether there is a way to force it to be. @noahfalk this is an issue with essentially zero-length spans: you might want to force log messages to be parented to it, but something else is the current activity at that point?
            activity.AddLink(new ActivityLink(httpContext));
        }
    }
    activity.DisplayName = $"CIRCUIT {circuitId ?? ""}";
Suggested change:
activity.DisplayName = $"Circuit {circuitId ?? ""}";
It's in all caps for HTTP because it's the method name; this is our own term.
    _httpContext = CaptureHttpContext();
}

var activity = ActivitySource.CreateActivity(OnRouteName, ActivityKind.Server, parentId: null, null, null);
In talking with Dan Roth, the ActivityKind should probably be Internal when running in WASM. It's really neither client nor server in that case.
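The suggestion could look roughly like this sketch, assuming .NET 8 or later. `OperatingSystem.IsBrowser()` is a real runtime check that returns true on WebAssembly in the browser; `RouteActivityKind` is a hypothetical helper name, and where the PR would actually branch is not shown in this thread.

```csharp
using System;
using System.Diagnostics;

// Internal on WebAssembly (in the browser), Server everywhere else.
static ActivityKind RouteActivityKind() =>
    OperatingSystem.IsBrowser() ? ActivityKind.Internal : ActivityKind.Server;

using var source = new ActivitySource("Microsoft.AspNetCore.Components");
// CreateActivity returns null unless some listener samples this source.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Microsoft.AspNetCore.Components",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(listener);

Activity? activity = source.CreateActivity("Microsoft.AspNetCore.Components.OnRoute", RouteActivityKind());
Console.WriteLine(activity?.Kind); // Server when run on a regular (non-browser) .NET host
```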
    }
}

activity.DisplayName = $"ROUTE {route ?? "unknown"} -> {componentType ?? "unknown"}";
Suggested change:
activity.DisplayName = $"Route {route ?? "[unknown path]"} -> {componentType ?? "[unknown component]"}";
    }
}

activity.DisplayName = $"EVENT {attributeName ?? "unknown"} -> {componentType ?? "unknown"}.{methodName ?? "unknown"}";
Suggested change:
activity.DisplayName = $"Event {attributeName ?? "[unknown]"} -> {componentType ?? "[unknown]"}.{methodName ?? "[unknown]"}";
    "aspnetcore.components.update_parameters.duration",
    unit: "s",
    description: "Duration of processing component parameters.",
    advice: new InstrumentAdvice<double> { HistogramBucketBoundaries = MetricsConstants.ShortSecondsBucketBoundaries });
ShortSecondsBucketBoundaries is designed for network-level latencies such as an HTTP request. I suspect that components may need buckets more focused on sub-millisecond timings?
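A sketch of what finer buckets might look like, assuming .NET 9 or later (where `InstrumentAdvice<T>` and the `advice` parameter of `Meter.CreateHistogram` are available). The boundary values are hypothetical, chosen only to be denser below 1 ms; advice is a hint, and whether a given SDK or exporter honors it depends on that SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical boundaries in seconds, from 10 µs up to 1 s.
IReadOnlyList<double> componentSecondsBucketBoundaries =
[
    0.00001, 0.000025, 0.00005, 0.0001, 0.00025, 0.0005,
    0.001, 0.0025, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1,
];

using var meter = new Meter("Microsoft.AspNetCore.Components.Lifecycle");
Histogram<double> updateParametersDuration = meter.CreateHistogram<double>(
    "aspnetcore.components.update_parameters.duration",
    unit: "s",
    description: "Duration of processing component parameters.",
    tags: null,
    advice: new InstrumentAdvice<double> { HistogramBucketBoundaries = componentSecondsBucketBoundaries });

// 300 µs falls into the (0.00025, 0.0005] bucket instead of the lowest network-oriented one.
updateParametersDuration.Record(0.0003);
Console.WriteLine($"{componentSecondsBucketBoundaries.Count} boundaries, first {componentSecondsBucketBoundaries[0]} s");
```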
{
    { "component.type", componentType ?? "unknown" },
    { "component.method", methodName ?? "unknown" },
    { "attribute.name", attributeName ?? "unknown" }
I understand why this is based on the attribute, but it's really also the event name. Would that make more sense as "event"?
var tags = new TagList
{
    { "component.type", componentType ?? "unknown" },
code.class.name? That would match https://github.com/open-telemetry/semantic-conventions/blob/main/docs/attributes-registry/code.md
var tags = new TagList
{
    { "component.type", componentType ?? "unknown" },
    { "component.method", methodName ?? "unknown" },
code.function.name is the semconv for this; it may mean we don't need a separate one for the component?
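If these naming suggestions were adopted, the event tags might look like the sketch below. This is only one reading of the review: `code.function.name` carries the fully qualified method name (so a separate component tag may be unnecessary) and `event` replaces `attribute.name`. `EventTags` is a hypothetical helper, and none of this is the PR's final naming.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper; tag names follow the review suggestions, not a shipped API.
static TagList EventTags(string? componentType, string? methodName, string? attributeName)
{
    var result = new TagList
    {
        // "Namespace.Type.Method", so the component type travels inside code.function.name.
        { "code.function.name", $"{componentType ?? "unknown"}.{methodName ?? "unknown"}" },
        { "event", attributeName ?? "unknown" },
    };
    return result;
}

TagList eventTags = EventTags("MyApp.Pages.Counter", "IncrementCount", "onclick");
foreach (KeyValuePair<string, object?> tag in eventTags)
    Console.WriteLine($"{tag.Key} = {tag.Value}");
```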
{
    var tags = new TagList
    {
        { "diff.length.bucket", BucketEditLength(diffLength) }
diff.approx_length? I was a bit confused by "bucket" in the name. Bucketizing the sizes is totally the right thing to do, as that eliminates dimension explosion, but the name confused me as to whether the feature has buckets or something.
@lmolkova - do you know of a precedent for where this is done elsewhere?
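To illustrate why bucketizing prevents dimension explosion, here is a hypothetical stand-in for `BucketEditLength`; the PR's actual thresholds are not shown in this thread. Each exact diff length is rounded up to one of a few fixed boundaries, so a tag such as the suggested `diff.approx_length` can only ever take about ten distinct values. Assumes C# 9 or later for relational patterns.

```csharp
using System;
using System.Linq;

// Hypothetical thresholds, not the PR's: round the exact length up to a fixed boundary.
static int BucketEditLength(int diffLength) => diffLength switch
{
    <= 1 => 1,
    <= 2 => 2,
    <= 5 => 5,
    <= 10 => 10,
    <= 20 => 20,
    <= 50 => 50,
    <= 100 => 100,
    <= 500 => 500,
    <= 1000 => 1000,
    _ => int.MaxValue, // open-ended overflow bucket: more than 1000 edits
};

// However many distinct lengths occur, the tag's value set stays small.
int distinct = Enumerable.Range(0, 100_000).Select(n => BucketEditLength(n)).Distinct().Count();
Console.WriteLine($"37 -> {BucketEditLength(37)}, 5000 -> {BucketEditLength(5000)}, distinct values: {distinct}");
```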
// For Blazor/signalR sessions, which can last a long time.
public static readonly IReadOnlyList<double> VeryLongSecondsBucketBoundaries = [0.5, 1, 2, 5, 10, 30, 60, 120, 300, 600, 1500, 60*60, 2 * 60 * 60, 4 * 60 * 60];
// For blazor circuit sessions, which can last a long time.
public static readonly IReadOnlyList<double> VeryLongSecondsBucketBoundaries = [1, 10, 30, 1 * 60, 2 * 60, 3 * 60, 4 * 60, 5 * 60, 6 * 60, 7 * 60, 8 * 60, 9 * 60, 10 * 60, 1 * 60 * 60, 2 * 60 * 60, 3 * 60 * 60, 6 * 60 * 60, 12 * 60 * 60, 24 * 60 * 60];
These should ideally be exponential, e.g.:
1, 3, 10, 30,
1*60, 3*60, 10*60, 30*60,
1*60*60, 3*60*60, 10*60*60, 24*60*60
This uses an approximate multiplier of 3 while keeping the boundaries at values that make sense to users.
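The reviewer's suggestion, written in the same shape as the PR's `VeryLongSecondsBucketBoundaries` field, with a quick check of the step between neighbouring boundaries. This sketch assumes C# 12 collection expressions (.NET 8 or later). The steps alternate between roughly 3x and 3.3x, dropping to 2x and 2.4x where the list snaps to 1 minute, 1 hour, and 1 day, which is the trade-off between a strict exponential series and human-friendly values.

```csharp
using System;
using System.Collections.Generic;

IReadOnlyList<double> veryLongSecondsBucketBoundaries =
[
    1, 3, 10, 30,                                          // seconds
    1 * 60, 3 * 60, 10 * 60, 30 * 60,                      // minutes
    1 * 60 * 60, 3 * 60 * 60, 10 * 60 * 60, 24 * 60 * 60,  // hours, capped at one day
];

// Print the growth factor between neighbouring boundaries.
for (int i = 1; i < veryLongSecondsBucketBoundaries.Count; i++)
{
    double ratio = veryLongSecondsBucketBoundaries[i] / veryLongSecondsBucketBoundaries[i - 1];
    Console.WriteLine($"{veryLongSecondsBucketBoundaries[i - 1],6} -> {veryLongSecondsBucketBoundaries[i],6}: x{ratio:F2}");
}
```

Twelve boundaries instead of nineteen also means fewer histogram buckets to store and export per series.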
Better rendering metrics
New meter Microsoft.AspNetCore.Components:
- aspnetcore.components.navigation.count - Total number of route changes.
- aspnetcore.components.event.duration - Duration of processing a browser event.
- aspnetcore.components.event.exceptions - Total number of exceptions during browser event processing.
New meter Microsoft.AspNetCore.Components.Lifecycle:
- aspnetcore.components.update_parameters.duration - Duration of processing component parameters.
- aspnetcore.components.update_parameters.exceptions - Total number of exceptions during processing of component parameters.
- aspnetcore.components.rendering.batch.duration - Duration of rendering a batch.
- aspnetcore.components.rendering.batch.exceptions - Total number of exceptions during batch rendering.
Blazor activity tracing
Microsoft.AspNetCore.Components:
- Microsoft.AspNetCore.Components.OnCircuit, display name "CIRCUIT {circuitId}", tags: circuit.id
- Microsoft.AspNetCore.Components.OnRoute, display name "ROUTE {route} -> {componentType}", tags: circuit.id, component.type, route
- Microsoft.AspNetCore.Components.OnEvent, display name "EVENT {attributeName} -> {componentType}.{methodName}", tags: circuit.id, component.type, component.method, attribute.name
Feedback
- IMeterFactory to be available in DI
TODO: Metrics need to be documented at https://learn.microsoft.com/en-us/aspnet/core/log-mon/metrics/built-in
Out of scope
Contributes to #53613
Contributes to #29846
Feedback for #61516