Skip to content

Conversation

@jsm
Copy link
Contributor

@jsm jsm commented Jul 23, 2024

Note: this is a degradation introduced in this PR: #5192

Which used to wrap calls to addChannelzTraceEvent in a if channelz.IsOn() {

Problem

Large and frequent calls to this method in UpdateState can cause an extremely high usage of CPU time and Memory due to underlying json.Marshal calls that are not used or needed if channelz is not on.

See alternative approach here: #7437

Pprofs

image

image

Large and frequent calls to this method in UpdateState can cause an
extremely high usage of CPU time and Memory due to underlying json.Marshal
calls that are not used or needed if channelz is not on.
@codecov
Copy link

codecov bot commented Jul 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.44%. Comparing base (ac5a7fe) to head (a99bc46).
Report is 11 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7436      +/-   ##
==========================================
+ Coverage   81.33%   81.44%   +0.10%     
==========================================
  Files         352      352              
  Lines       26919    26921       +2     
==========================================
+ Hits        21895    21926      +31     
+ Misses       3824     3804      -20     
+ Partials     1200     1191       -9     
Files Coverage Δ
resolver_wrapper.go 85.41% <100.00%> (+0.31%) ⬆️

... and 18 files with indirect coverage changes

@jsm
Copy link
Contributor Author

jsm commented Jul 29, 2024

Note: this is a degradation introduced in this PR: #5192

Which used to wrap calls to addChannelzTraceEvent in a if channelz.IsOn() {

@aranjans aranjans requested review from arjan-bal and removed request for aranjans July 30, 2024 08:16
@aranjans aranjans assigned arjan-bal and unassigned aranjans Jul 30, 2024
@arjan-bal arjan-bal added the Type: Performance Performance improvements (CPU, network, memory, etc) label Jul 30, 2024
@arjan-bal arjan-bal added this to the 1.66 Release milestone Jul 30, 2024
@dfawley
Copy link
Member

dfawley commented Jul 30, 2024

@jsm,

Large and frequent calls to this method in UpdateState

Do you understand why this would be happening? These types of calls are generally expected to be infrequent, and are often expensive as they result in a full state update of the channel.

@jsm
Copy link
Contributor Author

jsm commented Jul 30, 2024

@jsm,

Large and frequent calls to this method in UpdateState

Do you understand why this would be happening? These types of calls are generally expected to be infrequent, and are often expensive as they result in a full state update of the channel.

We call this every 10 seconds to update the list of addresses, since they change frequently. That seems like a reasonable usage of this, but perhaps I'm missing something?

@jsm
Copy link
Contributor Author

jsm commented Jul 30, 2024

These types of calls are generally expected to be infrequent, and are often expensive as they result in a full state update of the channel.

I think regardless, my CPU profile shows that most of it's expense is coming from the addChannelZTraceEvent call (99.7% of the UpdateState call's CPU time in this particular example to be exact), rather than from actually updating the state?

@arjan-bal
Copy link
Contributor

Responded in #7437, we can move forward with the other PR.

@arjan-bal arjan-bal closed this Jul 30, 2024
@dfawley
Copy link
Member

dfawley commented Jul 30, 2024

We call this every 10 seconds to update the list of addresses, since they change frequently. That seems like a reasonable usage of this, but perhaps I'm missing something?

That doesn't seem particularly bad.

my CPU profile shows that most of it's expense is coming from the addChannelZTraceEvent call

That is quite bad, but out of curiosity, what is the total effect of this on the binary when you call it every 10s? Are you seeing it causing other problems, or does total CPU usage spike to a visible amount, etc?

@jsm
Copy link
Contributor Author

jsm commented Aug 5, 2024

We call this every 10 seconds to update the list of addresses, since they change frequently. That seems like a reasonable usage of this, but perhaps I'm missing something?

That doesn't seem particularly bad.

my CPU profile shows that most of it's expense is coming from the addChannelZTraceEvent call

That is quite bad, but out of curiosity, what is the total effect of this on the binary when you call it every 10s? Are you seeing it causing other problems, or does total CPU usage spike to a visible amount, etc?

image

This was a large upgrade jump, from v1.29.0 to v.1.64.0

You can see CPU utilization almost tripled for one of our services. That comes down after about 30 minutes, I think due to the optimizations in #1455

Note: we also double the CPU/Memory of this service to accommodate for the increased memory usage as well. The CPU utilization tripled despite having twice as many vCPUs to use.

@dfawley
Copy link
Member

dfawley commented Aug 5, 2024

You can see CPU utilization almost tripled for one of our services

But being 30+ versions apart, these graphs don't help prove that the log you are concerned about is related.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 2, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Type: Performance Performance improvements (CPU, network, memory, etc)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants