-
Notifications
You must be signed in to change notification settings - Fork 3.9k
otel: subchannel metrics #12202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
otel: subchannel metrics #12202
Conversation
eabbf58
to
9595507
Compare
9595507
to
c713561
Compare
@@ -415,7 +415,7 @@ void exitIdleMode() { | |||
LbHelperImpl lbHelper = new LbHelperImpl(); | |||
lbHelper.lb = loadBalancerFactory.newLoadBalancer(lbHelper); | |||
// Delay setting lbHelper until fully initialized, since loadBalancerFactory is user code and | |||
// may throw. We don't want to confuse our state, even if we will enter panic mode. | |||
// may throw. We don't want to confuse our state, even if we enter panic mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the comment is more accurate as it is. After entering panic mode there is nothing much we can do about state, the comment implies delaying entering that panic mode and maintaining a sane state for the channel for as long as possible before bringing in potential user code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a grammar change...
addressIndex.getCurrentEagAttributes(), NameResolver.ATTR_BACKEND_SERVICE), | ||
getAttributeOrDefault( | ||
addressIndex.getCurrentEagAttributes(), LoadBalancer.ATTR_LOCALITY_NAME), | ||
"Peer Pressure", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not in the gRFC? Instead there is a "List of allowed values for grpc.disconnect_error".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Phase 1 we won't be plumbing disconnect_error, will raise another PR with this as the base branch for the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So? We can use a valid value now: unknown
. Using "unknown" means we are technically implementing the gRFC with this change, and it is just an optimization to reduce the amount of unknowns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
core/src/main/java/io/grpc/internal/PickFirstLeafLoadBalancer.java
Outdated
Show resolved
Hide resolved
core/src/main/java/io/grpc/internal/DelayedClientTransport.java
Outdated
Show resolved
Hide resolved
* @param optionalLabelValues the optional label values for the metric. | ||
*/ | ||
@Override | ||
public void addLongUpDownCounter(LongUpDownCounterMetricInstrument metricInstrument, long value, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are just wrappers on the underlying OTel API for UpDownCounter...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but I do see logic here and there are still things to check. See that we have tests in MetricRecorderImplTest for addLongCounter(), which is very similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -117,6 +119,22 @@ public void addLongCounter(LongCounterMetricInstrument metricInstrument, long va | |||
counter.add(value, attributes); | |||
} | |||
|
|||
@Override | |||
public void addLongUpDownCounter(LongUpDownCounterMetricInstrument metricInstrument, long value, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added for the one above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about in OpenTelemetryMetricsSinkTest for the updown metric ? Existing tests also check for metric enabled and disabled cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
* @return the newly created LongUpDownCounterMetricInstrument | ||
* @throws IllegalStateException if a metric with the same name already exists | ||
*/ | ||
public LongUpDownCounterMetricInstrument registerLongUpDownCounter(String name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added for the one above
* @param optionalLabelValues the optional label values for the metric. | ||
*/ | ||
@Override | ||
public void addLongUpDownCounter(LongUpDownCounterMetricInstrument metricInstrument, long value, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but I do see logic here and there are still things to check. See that we have tests in MetricRecorderImplTest for addLongCounter(), which is very similar.
addressIndex.getCurrentEagAttributes(), NameResolver.ATTR_BACKEND_SERVICE), | ||
getAttributeOrDefault( | ||
addressIndex.getCurrentEagAttributes(), LoadBalancer.ATTR_LOCALITY_NAME), | ||
"Peer Pressure", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So? We can use a valid value now: unknown
. Using "unknown" means we are technically implementing the gRFC with this change, and it is just an optimization to reduce the amount of unknowns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.
@@ -117,6 +119,22 @@ public void addLongCounter(LongCounterMetricInstrument metricInstrument, long va | |||
counter.add(value, attributes); | |||
} | |||
|
|||
@Override | |||
public void addLongUpDownCounter(LongUpDownCounterMetricInstrument metricInstrument, long value, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about in OpenTelemetryMetricsSinkTest for the updown metric ? Existing tests also check for metric enabled and disabled cases.
@@ -75,7 +76,7 @@ public SubchannelMetrics(MetricRecorder metricRecorder) { | |||
); | |||
} | |||
|
|||
public void recordConnectionAttemptSucceeded(OtelMetricsAttributes labelSet) { | |||
public void recordConnectionAttemptSucceeded(MetricsAttributes labelSet) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest you consider inlining this class into InternalSubchannel. MetricsAttributes
is awkward, as it isn't at all clear what values are used when. The caller is basically expected to know the implementation here, yet each method has at most two lines. We're not making anything simpler by adding this abstraction. Inlining these would seem to make things much more obvious. It's not like this has tests either; any change here will require a change to InternalSubchannelTest. I'd agree it is disappointing that registering the metrics is so long, but they aren't complicated lines. So complexity is low. (If we needed this API for real, I'd seriously consider having a separate class type passed into each of the methods.)
DisconnectError will end up needing to be public, in order to be used by transports. DisconnectError could be its own file, but really I'd put it in ManagedClientTransport
, as it is really a part of the transportShutdown() API. But either way, this doesn't seem like a good place for it, as this class is just a helper for InternalSubchannel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about it more, having this utility with just a ton of String arguments is really little different than the metric recorder API itself. In both cases we have to line up the arguments. So we could keep this utility class and just pass in a bunch of arguments instead of trying to package it in MetricsAttributes.
Thinking even more, we can sort of have our cake and eat it too without making a bunch of builders. If use the "lots o arguments" approach, but we put comments in the calling code (InternalSubchannel) for each argument, we can see that the argument names match. And ErrorProne will check that the comments match the variable names in SubchannelMetrics. See https://errorprone.info/bugpattern/ParameterName
(There's still the question of "how much value does the utility provide" and we could still choose to ditch it. But if you do want to keep it, I think this turns out better for this very specific problem at hand.)
* The name of the locality that this EquivalentAddressGroup is in. | ||
*/ | ||
public static final Attributes.Key<String> ATTR_LOCALITY_NAME = | ||
Attributes.Key.create("io.grpc.lb.locality"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The string here is just for debugging. We typically try to match the API where the value is accessible. See ATTR_AUTHORITY_OVERRIDE just above, for example.
Implements A94