Proposed Telemetry
To support projects such as YARP, we plan to add new counters (EventCounter, PollingCounter, etc.) and EventSource events. The main goal of this new telemetry is to support automated tools that measure the performance and health of the overall system.
Note: This top post will be edited and kept up to date with changes.
Goals
- For 'dotnet-counters' scenarios, add some basic counters that don't have any dimensions. The EventCounter, PollingCounter, etc. classes provide in-memory aggregation and are generally lightweight.
- For consumption via automated tools where developers want to do their own custom aggregation, emit EventSource events that include both numeric data and some fixed dimension name/value pairs. Initially, the fixed dimensions will be things like 'host name', 'ip address', 'port', etc.
- Include some EventSource event names with the 'Start' and 'Stop' suffixes, which automatically create in-process activity-ID tracing for subsequent events. This makes it easy to correlate many events with a higher-level activity, for example attributing all SslStream operations to the HttpClient request that triggered them. See Vance's blog for background.
Non-Goals (for .NET 5)
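The 'Start'/'Stop' naming convention from the last goal can be sketched as follows. This is a minimal illustration, not the proposed implementation: the source name `Demo-Net-Http`, both class names, and the event payload (`host`, `port`) are hypothetical. EventSource infers the Start and Stop opcodes from the method-name suffixes. Whether activity IDs are actually generated also depends on how the consumer enables activity tracking, which is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Tracing;

// Hypothetical event source; the name and payload fields are assumptions.
[EventSource(Name = "Demo-Net-Http")]
public sealed class DemoHttpEventSource : EventSource
{
    public static readonly DemoHttpEventSource Log = new DemoHttpEventSource();

    // The "Start" suffix lets EventSource infer the Start opcode.
    // The payload carries fixed dimensions such as host name and port.
    [Event(1, Level = EventLevel.Informational)]
    public void RequestStart(string host, int port) => WriteEvent(1, host, port);

    // The "Stop" suffix lets EventSource infer the Stop opcode. The Stop event
    // uses the ID immediately after its Start event so the two can be paired.
    [Event(2, Level = EventLevel.Informational)]
    public void RequestStop() => WriteEvent(2);
}

// In-process listener that records each event name with its opcode.
public sealed class OpcodeRecorder : EventListener
{
    public readonly List<string> Seen = new List<string>();

    protected override void OnEventSourceCreated(EventSource source)
    {
        if (source.Name == "Demo-Net-Http")
        {
            EnableEvents(source, EventLevel.Informational);
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs e)
    {
        Seen.Add($"{e.EventName}:{e.Opcode}");
    }
}
```

To try it, create an `OpcodeRecorder`, call `DemoHttpEventSource.Log.RequestStart("example.com", 443)` followed by `DemoHttpEventSource.Log.RequestStop()`, and inspect `Seen`.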
- Distributed tracing (cross-process) using OpenTelemetry. Note that the automatic generation of activity-ids (GUIDs) with EventSource 'Start', 'Stop' events can be used in the future to map into OpenTelemetry APIs.
- Adding dimensions directly to EventCounters. The current APIs for EventCounter don't work well for dimensions.
Prior Art
- .NET Framework networking performance counters.
- ASP.NET Core counters: vision, coding pattern, recent PR.
Overall Design
New EventSource classes and namespaces
We currently have an existing 'NetEventSource' class which is used for all current System.Net.* event tracing. The current design uses the same class name factored into partial classes per System.Net.* assembly.
The primary purpose of these existing events is diagnostics and debugging. They were added en masse when .NET Core was first ported from .NET Framework: all the existing System.Net logging was copied and used as the original basis for .NET Core events. While this tracing is useful for debugging, it is very verbose and not well suited for consumption by automated tools.
These existing EventSource namespaces have a prefix of the form "Microsoft-System-Net-". This naming pattern was based on historical Windows conventions for ETW events. Newer EventSource and EventCounter classes in .NET Core, including those in ASP.NET Core, follow a .NET namespace naming pattern instead. After discussions with the .NET Diagnostics team, it was suggested that we start new namespaces for telemetry-focused events and counters.
The plan for these telemetry-focused events and counters is to create new EventSource namespaces matching the namespace names of the .NET types (Sockets, NameResolution, Http, etc.). We should also treat all these new events and counters like APIs. We should be rigorous in documenting them, since customer software will consume them. We should also prioritize end-to-end scenarios as we add new events and counters.
Proposed Counters
Counters are listed below under each new EventSource namespace. They are described in the following form:
EventCounter Name
(Display Name)
Detailed information
The actual .NET class used for implementation might be EventCounter, PollingCounter, etc., depending on implementation requirements.
In addition to the counters listed below, there will be some corresponding EventSource events that will add dimensional information such as 'host name', 'ip address', 'port' etc. The name of the EventSource event will be similar to the counter name. Actual details will be provided in subsequent PRs.
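As a rough sketch of how the counter classes mentioned above differ, the following hypothetical source wires one value of each kind. The source name `Demo-Net-Counters`, the class name, and the helper methods are assumptions for illustration. Only the counter names are taken from the list below, and nothing here is meant as the final implementation.

```csharp
using System;
using System.Diagnostics.Tracing;
using System.Threading;

// Hypothetical source; the name and members are illustrative only.
[EventSource(Name = "Demo-Net-Counters")]
public sealed class DemoCounterSource : EventSource
{
    public static readonly DemoCounterSource Log = new DemoCounterSource();

    private long _requestsStarted;
    private EventCounter _queueDuration;             // aggregates written samples per interval
    private PollingCounter _startedTotal;            // samples a cumulative value on each interval
    private IncrementingPollingCounter _startedRate; // reports the change in a value per interval

    public long RequestsStarted => Interlocked.Read(ref _requestsStarted);

    // [NonEvent] keeps EventSource from treating these helpers as events.
    [NonEvent]
    public void RequestStarted() => Interlocked.Increment(ref _requestsStarted);

    [NonEvent]
    public void RequestLeftQueue(double milliseconds) => _queueDuration?.WriteMetric(milliseconds);

    protected override void OnEventCommand(EventCommandEventArgs command)
    {
        if (command.Command != EventCommand.Enable)
        {
            return;
        }

        // Counters are created lazily, the first time a consumer enables the source.
        _queueDuration ??= new EventCounter("requests-queue-duration", this)
        {
            DisplayName = "HTTP Requests Queue Duration"
        };
        _startedTotal ??= new PollingCounter("requests-started", this,
            () => Interlocked.Read(ref _requestsStarted))
        {
            DisplayName = "HTTP Requests Started"
        };
        _startedRate ??= new IncrementingPollingCounter("requests-started-per-second", this,
            () => Interlocked.Read(ref _requestsStarted))
        {
            DisplayName = "HTTP Requests Started Rate",
            DisplayRateTimeScale = TimeSpan.FromSeconds(1)
        };
    }
}
```

The polling counters read a plain `long` field on demand, so the hot path only pays for an interlocked increment. The EventCounter does its aggregation at the point where `WriteMetric` is called.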
System.Net.Http
requests-queue-duration
(HTTP Requests Queue Duration)
The time on queue (min/max/mean) for all HTTP request objects that left the queue during the last reporting interval.
HTTP request objects are measured as HttpRequestMessage objects being processed by SocketsHttpHandler.Send(Async).
requests-started
(HTTP Requests Started)
The cumulative number of HTTP request objects started since the process started.
requests-completed
(HTTP Requests Completed)
The cumulative number of HTTP request objects completed since the process started. Completed means that an HTTP response was received regardless of status code value.
requests-aborted
(HTTP Requests Aborted)
The cumulative number of HTTP request objects aborted since the process started. Aborted means that an Exception occurred during the SocketsHttpHandler.Send(Async) call.
requests-started-per-second
(HTTP Requests Started Rate)
The number of HTTP request objects started per second since the process started.
requests-completed-per-second
(HTTP Requests Completed Rate)
The number of HTTP request objects completed per second since the process started.
requests-aborted-per-second
(HTTP Requests Aborted Rate)
The number of HTTP request objects aborted per second since the process started.
http11-connections-single-pool-max
(HTTP 1.1 connections per pool)
The maximum number (high-water mark) of TCP connections across any HTTP 1.1 connection pool in SocketsHttpHandler.
http20-streams-single-connection-max
(HTTP 2.0 streams per connection)
The maximum number (high-water mark) of HTTP/2 streams across any HTTP/2 connection in SocketsHttpHandler.
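One way to maintain a high-water mark such as the two "max" counters above without taking a lock is a compare-and-swap loop. The sketch below is a generic illustration under that assumption. The `HighWaterMark` type is hypothetical and is not part of SocketsHttpHandler.

```csharp
using System.Threading;

// Hypothetical helper: tracks the largest value ever observed, lock-free.
public sealed class HighWaterMark
{
    private long _max;

    public long Value => Interlocked.Read(ref _max);

    // Records 'current' if it exceeds the stored maximum.
    public void Observe(long current)
    {
        long seen = Interlocked.Read(ref _max);
        while (current > seen)
        {
            // Swap only if no other thread changed the maximum in the meantime.
            long prior = Interlocked.CompareExchange(ref _max, current, seen);
            if (prior == seen)
            {
                break;
            }
            seen = prior; // lost the race; re-check against the newer maximum
        }
    }
}
```

A pool or connection would call `Observe` with its current connection or stream count whenever that count grows, and a PollingCounter could then report `Value`.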
Update 7/1/2020
Current connection counters as suggested below
http11-connections-current-total
(HTTP 1.1 current total connections)
The total number of open TCP connections across all HTTP 1.1 connection pools in SocketsHttpHandler.
http20-connections-current-total
(HTTP 2.0 current total connections)
The total number of open TCP connections across all HTTP/2 connection pools in SocketsHttpHandler.
Connection lifecycle events
Http11ConnectionOpened
New HTTP 1.1 connection established.
Http11ConnectionAborted
HTTP 1.1 connection abruptly terminated.
Http11ConnectionClosed
Existing HTTP 1.1 connection closed.
Http2ConnectionOpened
New HTTP/2 connection established.
Http2ConnectionAborted
HTTP/2 connection abruptly terminated.
Http2ConnectionClosed
Existing HTTP/2 connection closed.
Request timings not covered by other events as requested in #827
ResponseBegin
Start receiving response. Response end matches RequestStop, so no extra event required.
System.Net.Security
tls-handshakes-per-second
(TLS Handshake Rate)
The number of TLS handshakes completed per second since the process started.
total-tls-handshakes
(Total TLS Handshakes)
The total number of TLS handshakes attempted since the process started. This includes all handshakes, whether they completed successfully or failed.
current-tls-handshakes
(Current TLS Handshakes)
The current number of TLS handshakes started but not yet completed.
failed-tls-handshakes
(Failed TLS Handshakes)
The number of TLS handshakes that have failed since the process started.
tls10-connections-open
(TLS 1.0 Connections Open)
The current number of active TLS 1.0 connections opened by SslStream objects.
tls11-connections-open
(TLS 1.1 Connections Open)
The current number of active TLS 1.1 connections opened by SslStream objects.
tls12-connections-open
(TLS 1.2 Connections Open)
The current number of active TLS 1.2 connections opened by SslStream objects.
tls13-connections-open
(TLS 1.3 Connections Open)
The current number of active TLS 1.3 connections opened by SslStream objects.
tls10-handshake-duration
(TLS 1.0 Handshake Duration)
The duration for completion (min/max/mean) for all TLS 1.0 handshakes since the process started.
tls11-handshake-duration
(TLS 1.1 Handshake Duration)
The duration for completion (min/max/mean) for all TLS 1.1 handshakes since the process started.
tls12-handshake-duration
(TLS 1.2 Handshake Duration)
The duration for completion (min/max/mean) for all TLS 1.2 handshakes since the process started.
tls13-handshake-duration
(TLS 1.3 Handshake Duration)
The duration for completion (min/max/mean) for all TLS 1.3 handshakes since the process started.
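The relationship between the total, current, and failed handshake counters above can be sketched with three interlocked fields. This is an illustration only: the `TlsHandshakeStats` type is hypothetical, and counting a handshake toward the total at the moment it starts is an assumption about what "attempted" means.

```csharp
using System.Threading;

// Hypothetical bookkeeping for the handshake counters; not SslStream code.
public sealed class TlsHandshakeStats
{
    private long _total;   // handshakes attempted (assumed: counted at start)
    private long _current; // handshakes started but not yet finished
    private long _failed;  // handshakes that finished unsuccessfully

    public long Total => Interlocked.Read(ref _total);
    public long Current => Interlocked.Read(ref _current);
    public long Failed => Interlocked.Read(ref _failed);

    public void HandshakeStarted()
    {
        Interlocked.Increment(ref _total);
        Interlocked.Increment(ref _current);
    }

    public void HandshakeFinished(bool succeeded)
    {
        Interlocked.Decrement(ref _current);
        if (!succeeded)
        {
            Interlocked.Increment(ref _failed);
        }
    }
}
```

With this accounting, `Total - Current - Failed` gives the number of handshakes that completed successfully, so no separate success counter is needed.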
System.Net.NameResolution
dns-lookups-requested
(DNS Lookups Requested)
The cumulative number of DNS lookups requested since the process started.
dns-lookups-duration
(DNS Lookups Duration)
The duration for completion (min/max/mean) for all DNS name lookup queries since the process started.
System.Net.Sockets
bytes-received
(Bytes Received)
The cumulative total number of bytes received by all Socket objects since the process started.
bytes-sent
(Bytes Sent)
The cumulative number of bytes sent by all Socket objects since the process started.
connections-established
(Connections Established)
The cumulative total number of Socket objects for stream sockets that were ever connected since the process started.
datagrams-received
(Datagrams Received)
The cumulative total number of datagram packets received by all Socket objects since the process started.
datagrams-sent
(Datagrams Sent)
The cumulative total number of datagram packets sent by all Socket objects since the process started.
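For automated tools that run in-process, the counters above could be consumed with an EventListener along the following lines. This is a sketch under assumptions: the source names follow the proposal in this issue and may differ from what finally ships, the one-second interval is arbitrary, and the `NetCounterListener` type is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Tracing;

public sealed class NetCounterListener : EventListener
{
    // Assumed source names, taken from the namespaces proposed above.
    private static readonly HashSet<string> s_sources = new HashSet<string>
    {
        "System.Net.Http",
        "System.Net.Security",
        "System.Net.NameResolution",
        "System.Net.Sockets"
    };

    protected override void OnEventSourceCreated(EventSource source)
    {
        if (!s_sources.Contains(source.Name))
        {
            return;
        }

        // Ask the source to publish counter values once per second.
        EnableEvents(source, EventLevel.Informational, EventKeywords.All,
            new Dictionary<string, string> { ["EventCounterIntervalSec"] = "1" });
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        if (eventData.EventName != "EventCounters" || eventData.Payload == null)
        {
            return;
        }

        foreach (object item in eventData.Payload)
        {
            if (item is IDictionary<string, object> payload &&
                TryReadCounter(payload, out string name, out double value))
            {
                Console.WriteLine($"{eventData.EventSource.Name}/{name} = {value}");
            }
        }
    }

    // Mean-style counters expose "Mean"; incrementing counters expose "Increment".
    public static bool TryReadCounter(IDictionary<string, object> payload,
        out string name, out double value)
    {
        name = payload.TryGetValue("Name", out object n) ? n as string : null;
        value = 0;
        if (name == null)
        {
            return false;
        }
        if (payload.TryGetValue("Mean", out object mean))
        {
            value = Convert.ToDouble(mean);
            return true;
        }
        if (payload.TryGetValue("Increment", out object increment))
        {
            value = Convert.ToDouble(increment);
            return true;
        }
        return false;
    }
}
```

Creating one `NetCounterListener` at startup and keeping it alive for the lifetime of the process is enough to start receiving values. Out-of-process tools such as 'dotnet-counters' need no code in the application.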