Description
This is a fun one....
When load testing a new app using...
- Spring Boot 2.4.2
- WebFlux 5.3.3
- Spring Security 5.4.2
- Tomcat 9.0.41
- tomcat-native 1.2.25
- APR 1.6.5
- OpenSSL 1.1.1f
...CPU utilization will max out, and stay maxed out even after the load test completes.
When investigating, I found that threads in the global `boundedElastic` Scheduler are consuming the entire CPU, as seen in `top` (broken out by threads)...

```
top - 17:56:42 up 12 min, 0 users, load average: 0.61, 0.24, 0.15
Threads: 112 total, 3 running, 109 sleeping, 0 stopped, 0 zombie
%Cpu(s): 53.8 us, 0.3 sy, 0.0 ni, 45.7 id, 0.0 wa, 0.0 hi, 0.1 si, 0.1 st
MiB Mem : 11852.9 total, 8478.8 free, 1267.9 used, 2106.2 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 10278.6 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
212 bogus 20 0 6635460 493008 18064 R 99.9 4.1 0:07.44 boundedElastic- <--- CPU maxed out
170 bogus 20 0 6635460 493008 18064 R 99.7 4.1 0:21.02 boundedElastic- <--- CPU maxed out
17 bogus 20 0 6635460 493008 18064 S 9.6 4.1 0:26.19 C2 CompilerThre
235 bogus 20 0 6635460 493008 18064 S 1.3 4.1 0:00.13 boundedElastic-
18 bogus 20 0 6635460 493008 18064 S 0.7 4.1 0:04.61 C1 CompilerThre
234 bogus 20 0 6635460 493008 18064 S 0.7 4.1 0:00.79 boundedElastic-
85 bogus 20 0 9416 2340 1468 R 0.7 0.0 0:00.43 top
36 bogus 20 0 6635460 493008 18064 S 0.3 4.1 0:00.39 https-openssl-n
47 bogus 20 0 6635460 493008 18064 S 0.3 4.1 0:01.70 https-openssl-n
166 bogus 20 0 6635460 493008 18064 S 0.3 4.1 0:00.77 parallel-4
167 bogus 20 0 6635460 493008 18064 S 0.3 4.1 0:00.15 https-openssl-n
175 bogus 20 0 6635460 493008 18064 S 0.3 4.1 0:00.11 https-openssl-n
179 bogus 20 0 6635460 493008 18064 S 0.3 4.1 0:00.09 https-openssl-n
184 bogus 20 0 6635460 493008 18064 S 0.3 4.1 0:00.11 https-openssl-n
192 bogus 20 0 6635460 493008 18064 S 0.3 4.1 0:00.09 https-openssl-n
204 bogus 20 0 6635460 493008 18064 S 0.3 4.1 0:00.09 https-openssl-n
210 bogus 20 0 6635460 493008 18064 S 0.3 4.1 0:00.10 https-openssl-n
...
```
Taking a stack dump of the process reveals that the two threads are inside Tomcat's `OpenSSLEngine` (note that `nid=0xd4` is hexadecimal for 212, i.e. PID 212 above)...

```
"boundedElastic-8" #86 daemon prio=5 os_prio=0 cpu=128128.31ms elapsed=215.53s allocated=28746K defined_classes=1 tid=0x00007fee00035800 nid=0xd4 runnable [0x00007fed91cbe000]
java.lang.Thread.State: RUNNABLE
at org.apache.tomcat.util.net.openssl.OpenSSLEngine.unwrap(OpenSSLEngine.java:603)
- locked <0x00000007576d7df8> (a org.apache.tomcat.util.net.openssl.OpenSSLEngine)
at javax.net.ssl.SSLEngine.unwrap([email protected]/SSLEngine.java:637)
at org.apache.tomcat.util.net.SecureNioChannel.read(SecureNioChannel.java:617)
at org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.fillReadBuffer(NioEndpoint.java:1229)
at org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.read(NioEndpoint.java:1141)
at org.apache.coyote.http11.Http11InputBuffer.fill(Http11InputBuffer.java:795)
at org.apache.coyote.http11.Http11InputBuffer.available(Http11InputBuffer.java:675)
at org.apache.coyote.http11.Http11Processor.available(Http11Processor.java:1201)
at org.apache.coyote.AbstractProcessor.isReadyForRead(AbstractProcessor.java:838)
at org.apache.coyote.AbstractProcessor.action(AbstractProcessor.java:577)
at org.apache.coyote.Request.action(Request.java:432)
at org.apache.catalina.connector.InputBuffer.isReady(InputBuffer.java:305)
at org.apache.catalina.connector.CoyoteInputStream.isReady(CoyoteInputStream.java:201)
at org.springframework.http.server.reactive.ServletServerHttpRequest$RequestBodyPublisher.checkOnDataAvailable(ServletServerHttpRequest.java:295)
at org.springframework.http.server.reactive.AbstractListenerReadPublisher.changeToDemandState(AbstractListenerReadPublisher.java:222)
at org.springframework.http.server.reactive.AbstractListenerReadPublisher.access$1000(AbstractListenerReadPublisher.java:48)
at org.springframework.http.server.reactive.AbstractListenerReadPublisher$State$2.request(AbstractListenerReadPublisher.java:333)
at org.springframework.http.server.reactive.AbstractListenerReadPublisher$ReadSubscription.request(AbstractListenerReadPublisher.java:260)
...
```
In a debug session, I discovered that an infinite loop is executing in Tomcat's `OpenSSLEngine.unwrap` with the following state (a schematic of why this cannot make progress follows the list):
- `pendingApp = 2`
- `idx = 1`
- `endOffset = 1`
- `capacity = 16384`
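To make the stall concrete, here is a schematic sketch — emphatically not Tomcat's actual `unwrap` source — of one plausible reading of those values: the engine still reports pending plaintext, but there is no destination buffer left to drain it into, so the drain loop spins without ever reducing `pendingApp`.

```java
public class UnwrapStallSketch {

    public static void main(String[] args) {
        // Values observed in the debugger (see the list above).
        int pendingApp = 2;   // engine believes plaintext is still pending
        int idx = 1;          // index of the current destination buffer
        int endOffset = 1;    // one past the last destination buffer => none left

        // Schematic drain loop, capped here so this demo terminates; in the
        // observed hang there was no way to make progress, so it spun forever.
        long spins = 0;
        while (pendingApp > 0 && spins < 1_000_000) {
            if (idx < endOffset) {
                // normally: read plaintext into dsts[idx], decrement
                // pendingApp, advance idx when the buffer fills
                pendingApp--;
            }
            spins++; // with idx == endOffset, no progress is ever made
        }
        System.out.println("iterations without progress: " + spins
                + ", pendingApp still = " + pendingApp);
    }
}
```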
I would have expected the OpenSSL I/O code to execute on one of the `https-openssl-nio-*` threads, not the `boundedElastic` Scheduler. Therefore, I started investigating why this code was executing on the `boundedElastic` Scheduler.

After more debugging I narrowed it down to `InMemoryWebSessionStore.createWebSession()`. This is the only location in this particular app that uses the `boundedElastic` Scheduler.
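As background for why that single call matters: once one step of a reactive chain is subscribed on `boundedElastic`, every downstream operator keeps running on that `boundedElastic` thread unless something explicitly moves execution back. A minimal, self-contained Reactor example of the effect — not code from the app or from `InMemoryWebSessionStore` itself:

```java
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class BoundedElasticLeakDemo {

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a session-store style call that runs its work on
        // boundedElastic (e.g. to keep potentially blocking work off the
        // request-processing threads).
        Mono<String> session = Mono.fromSupplier(() -> "session-id")
                .subscribeOn(Schedulers.boundedElastic());

        // Everything chained after it is emitted on that same boundedElastic
        // thread, which is how unrelated downstream work (here, just a map)
        // ends up on the boundedElastic Scheduler.
        session.map(id -> Thread.currentThread().getName())
                .subscribe(thread -> System.out.println("downstream ran on: " + thread));
        // prints something like: downstream ran on: boundedElastic-1

        Thread.sleep(500); // allow the async pipeline to complete before exiting
    }
}
```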
The `WebSession` is being created because Spring Security's `WebSessionServerRequestCache` is being used, which persists requests in the `WebSession`.
If I disable the request cache (which removes the usage of `WebSession`, which removes the call to `InMemoryWebSessionStore.createWebSession()`, which removes the usage of `boundedElastic`), then all I/O is performed on the `https-openssl-nio-*` threads, and the infinite loop does not occur.
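For reference, disabling the request cache for that experiment was done with something along these lines — a sketch assuming a standard `SecurityWebFilterChain` bean, not necessarily the exact configuration in the attached app:

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.annotation.web.reactive.EnableWebFluxSecurity;
import org.springframework.security.config.web.server.ServerHttpSecurity;
import org.springframework.security.web.server.SecurityWebFilterChain;
import org.springframework.security.web.server.savedrequest.NoOpServerRequestCache;

@Configuration
@EnableWebFluxSecurity
public class SecurityConfig {

    @Bean
    SecurityWebFilterChain securityWebFilterChain(ServerHttpSecurity http) {
        return http
                // Swap out the default WebSessionServerRequestCache so no
                // WebSession (and no createWebSession() call) is needed to
                // save the original request.
                .requestCache(cache -> cache.requestCache(NoOpServerRequestCache.getInstance()))
                // ...rest of the app's security configuration unchanged...
                .build();
    }
}
```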
I haven't fully investigated why the infinite loop occurs, but I assume there is a thread-safety bug somewhere in Tomcat's `OpenSSLEngine` (either that, or it was never intended to be used from multiple threads). Having said that, I don't think the I/O should be occurring on a `boundedElastic` thread in the first place, so I did not investigate further.
In other words, in my opinion, using `InMemoryWebSessionStore` should not cause the OpenSSL I/O to occur on a `boundedElastic` thread.
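To illustrate the shape I would expect instead — purely a sketch with a hypothetical `newSession()` supplier, not a patch against the real `InMemoryWebSessionStore` — the blocking-friendly step could stay on `boundedElastic` while the result is handed back to another Scheduler before anything downstream runs:

```java
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class SessionHopSketch {

    // Hypothetical stand-in for the real session construction.
    static String newSession() {
        return "new-session";
    }

    // Do the potentially blocking work on boundedElastic, then hop back to
    // the parallel Scheduler so downstream operators (and any I/O they
    // trigger) are not left running on a boundedElastic thread.
    static Mono<String> createWebSession() {
        return Mono.fromSupplier(SessionHopSketch::newSession)
                .subscribeOn(Schedulers.boundedElastic())
                .publishOn(Schedulers.parallel());
    }

    public static void main(String[] args) throws InterruptedException {
        createWebSession()
                .map(s -> Thread.currentThread().getName())
                .subscribe(t -> System.out.println("downstream ran on: " + t));
        // expected output: a parallel-N thread, not boundedElastic-N
        Thread.sleep(500);
    }
}
```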
I have attached a simple application that can be used to reproduce the problem. After extracting it, use `docker-compose up` to build and start a container running the Spring Boot app with the configuration listed above.
Sending a lot of load (>= 2000 calls per second) to the `/echo` endpoint will reproduce the infinite loop. However, you can see OpenSSL I/O occurring on the `boundedElastic` threads with any amount of load.