Description
When using a shared dict where all nginx workers repeatedly call :get() on the same one or two keys under a high number of simultaneous requests, some nginx processes seem to get stuck in a deadlock caused by locking. They stay at 100% CPU load even after the request rate has subsided (all remaining connections are idle keepalive ones).
`stub_status` output:

```
Active connections: 102
server accepts handled requests
39308 39308 339293
Reading: 0 Writing: 287 Waiting: 93
```
strace shows the following. However, it takes up to a minute for another line to appear, which further indicates that the time is not spent in nginx syscalls but in userland (Lua) code:

```
futex(0x7f7ad4c37080, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = -1 EAGAIN (Resource temporarily unavailable)
```
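The `EAGAIN` in this line is informative: per the futex(2) man page, `FUTEX_WAIT` and `FUTEX_WAIT_BITSET` fail with `EAGAIN` immediately when the futex word no longer holds the expected value (`0` here), so this particular call did not sleep at all. Assuming the usual 64-bit `ngx_shmtx_t` layout with POSIX semaphores enabled, `0x7f7ad4c37080` is 0x18 bytes past the `mtx=0x7f7ad4c37068` shown in the pstack backtrace, which would put it on that mutex's semaphore; this is an inference, not something the trace proves. A minimal Linux-only sketch of the same raw syscall follows; `futex_wait_if` is a hypothetical helper name, not part of nginx or glibc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issue the same futex call strace printed.
 * The kernel sleeps only if *word still equals `expected`; otherwise it
 * returns -1 with errno == EAGAIN right away.
 * Returns 0 after a wakeup, or the errno value on failure. */
static int futex_wait_if(uint32_t *word, uint32_t expected)
{
    assert(word != NULL);
    long rc = syscall(SYS_futex, word,
                      FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME,
                      expected,
                      NULL,                     /* no timeout */
                      NULL,                     /* uaddr2 unused */
                      FUTEX_BITSET_MATCH_ANY);  /* wake on any bitset */
    return rc == -1 ? errno : 0;
}
```

Calling `futex_wait_if(&word, 0)` with `word == 1` returns `EAGAIN` without blocking, which is the pattern in the strace output: the waiter saw a changed value and went back to retrying in userland instead of sleeping.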
https://mailman.nginx.org/pipermail/nginx/2017-September/054687.html reports a similar issue with nginx. However, nginx does not natively use a mutex there, and that issue could clearly be traced back to Lua code.
Checking the nginx processes with pstack gives:

```
#0 0x0000000000438c86 in ngx_shmtx_lock (mtx=0x7f7ad4c37068) at src/core/ngx_shmtx.c:86
#1 0x0000000000527937 in ngx_http_lua_ffi_shdict_get (zone=0xbd5cc0, key=0x7f7ace894b60 "REDACTED-2", key_len=12, value_type=0x7f7ad4099928, str_value_buf=0x7f7ad409c340, str_value_len=0x7f7ad40b8850, num_value=0x7f7ad4093b40, user_flags=0x7f7ad4093b20, get_stale=0, is_stale=0x7f7ad409c300, err=0x7f7ad40bc318) at ../ngx_lua-0.10.26/src/ngx_http_lua_shdict.c:1593
#2 0x00007f7ad635dd19 in ?? ()
#3 0x00007f7ad4093b40 in ?? ()
#4 0x00007f7ad4093b20 in ?? ()
#5 0x0000000000000000 in ?? ()
```
https://github.com/openresty/lua-nginx-module/blob/master/src/ngx_http_lua_shdict.c#L1568 (which calls into nginx's https://github.com/nginx/nginx/blob/master/src/core/ngx_shmtx.c#L70C1-L70C15) shows that :get() acquires a lock.
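One possible reason a read needs the zone's exclusive lock can be shown with a simplified, hypothetical model. It assumes, without verifying it against the module source here, that a lookup also refreshes the entry's position in a shared LRU list, as LRU caches commonly do; if so, every get writes shared memory and must hold the same lock as a set. `zone_t`, `zone_lock`, `zone_unlock` and `zone_get` are made-up names, and the compare-and-swap plus `sched_yield()` loop only loosely resembles `ngx_shmtx_lock` (which also stores the owner's pid in the lock word, but spins with CPU pauses and may sleep on a semaphore).

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <string.h>

#define NKEYS 4

/* Hypothetical shared zone: one lock word guards both the entries
 * and their LRU order (keys[0] is the most recently used entry). */
typedef struct {
    atomic_long lock;          /* 0 = free, otherwise owner id (e.g. a pid) */
    const char *keys[NKEYS];
    int values[NKEYS];
    int count;
} zone_t;

static void zone_lock(zone_t *z, long owner)
{
    for (;;) {
        long expected = 0;     /* reset: a failed CAS overwrites it */
        if (atomic_compare_exchange_weak(&z->lock, &expected, owner))
            return;
        sched_yield();         /* an owner that never unlocks keeps us here */
    }
}

static void zone_unlock(zone_t *z)
{
    atomic_store(&z->lock, 0);
}

/* A "read" that still mutates shared state: a hit is moved to the LRU head. */
static int zone_get(zone_t *z, long owner, const char *key, int *out)
{
    int found = 0;
    assert(z->count <= NKEYS);
    zone_lock(z, owner);
    for (int i = 0; i < z->count; i++) {
        if (strcmp(z->keys[i], key) == 0) {
            const char *k = z->keys[i];
            int v = z->values[i];
            /* shift entries 0..i-1 down by one slot, then put the hit first */
            memmove(&z->keys[1], &z->keys[0], (size_t)i * sizeof z->keys[0]);
            memmove(&z->values[1], &z->values[0], (size_t)i * sizeof z->values[0]);
            z->keys[0] = k;
            z->values[0] = v;
            *out = v;
            found = 1;
            break;
        }
    }
    zone_unlock(z);
    return found;
}
```

In this model, getting `"c"` from `{"a", "b", "c"}` reorders the list to `{"c", "a", "b"}`, so two concurrent gets without the lock could corrupt the order. It also shows the cost: if an owner never reaches `zone_unlock`, every other caller stays in the retry loop and burns CPU.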
Is the lock for "get" really necessary? Is there a way to disable it?
Any ideas what could cause this? Is it possibly not related to :get() at all?