-
Notifications
You must be signed in to change notification settings - Fork 83
Closed
Labels
bugSomething isn't workingSomething isn't working
Description
Bug
As the title says. The issue is intermittent in that it doesn't happen for every job, but once a job freezes, no other query jobs can proceed suggesting the query scheduler has frozen.
After running the following command twice:
sbin/search.sh "DESERIALIZE_ERRORS" --count-by-time $((60 * 60 * 1000))The second job froze. Relevant logs are below.
query_scheduler.log
2025-07-02 13:37:07,828 search-job-handler [INFO] Connected to archive database 10.1.0.22:5101.
2025-07-02 13:37:07,828 search-job-handler [INFO] Query scheduler started.
2025-07-02 13:38:53,213 search-job-handler [INFO] Got reducer for job 1 at 10.1.0.22:5200
2025-07-02 13:38:53,462 search-job-handler [INFO] Dispatched job 1 with 8 archives to search.
2025-07-02 13:38:55,400 search-job-handler [INFO] Search task job-1-task-1 succeeded in 1.009902 second(s).
2025-07-02 13:38:55,400 search-job-handler [INFO] Search task job-1-task-2 succeeded in 0.904199 second(s).
2025-07-02 13:38:55,400 search-job-handler [INFO] Search task job-1-task-3 succeeded in 1.006122 second(s).
2025-07-02 13:38:55,401 search-job-handler [INFO] Search task job-1-task-4 succeeded in 1.329241 second(s).
2025-07-02 13:38:55,401 search-job-handler [INFO] Search task job-1-task-5 succeeded in 1.327435 second(s).
2025-07-02 13:38:55,401 search-job-handler [INFO] Search task job-1-task-6 succeeded in 1.342147 second(s).
2025-07-02 13:38:55,402 search-job-handler [INFO] Search task job-1-task-7 succeeded in 1.402149 second(s).
2025-07-02 13:38:55,402 search-job-handler [INFO] Search task job-1-task-8 succeeded in 1.393261 second(s).
2025-07-02 13:38:55,428 search-job-handler [INFO] Completed job 1.
2025-07-02 13:39:00,437 search-job-handler [INFO] Got reducer for job 2 at 10.1.0.22:5201
2025-07-02 13:39:00,519 search-job-handler [INFO] Dispatched job 2 with 8 archives to search.
worker.log
[2025-07-02 13:38:53,451: INFO/ForkPoolWorker-1] job_orchestration.executor.query.fs_search_task.search[72655868-72e4-410e-9c81-b4e83d96ec7b]: Started search task for job 1
[2025-07-02 13:38:53,451: INFO/ForkPoolWorker-7] job_orchestration.executor.query.fs_search_task.search[743541cb-4794-4a7b-9ce2-63c1edcca999]: Started search task for job 1
[2025-07-02 13:38:53,452: INFO/ForkPoolWorker-8] job_orchestration.executor.query.fs_search_task.search[f2bfb3c5-1b7d-4030-bc36-73454c328cbe]: Started search task for job 1
[2025-07-02 13:38:53,454: INFO/ForkPoolWorker-2] job_orchestration.executor.query.fs_search_task.search[16884c46-b5fc-437b-9088-6811b5b6f29a]: Started search task for job 1
[2025-07-02 13:38:53,455: INFO/ForkPoolWorker-3] job_orchestration.executor.query.fs_search_task.search[3281c3da-2ede-441c-ad75-bfcf2fbbc0e2]: Started search task for job 1
[2025-07-02 13:38:53,456: INFO/ForkPoolWorker-4] job_orchestration.executor.query.fs_search_task.search[56e4984c-9b14-40f3-9348-ada588486174]: Started search task for job 1
[2025-07-02 13:38:53,458: INFO/ForkPoolWorker-5] job_orchestration.executor.query.fs_search_task.search[96cafa6e-79a2-416c-8216-0102295db779]: Started search task for job 1
[2025-07-02 13:38:53,460: INFO/ForkPoolWorker-6] job_orchestration.executor.query.fs_search_task.search[78f32ef5-faac-42c4-a0cc-3215f2cc24d5]: Started search task for job 1
[2025-07-02 13:38:53,464: INFO/ForkPoolWorker-7] job_orchestration.executor.query.fs_search_task.search[743541cb-4794-4a7b-9ce2-63c1edcca999]: Running: /opt/clp/bin/clo s /home/rodri228/projects/log-compressor/open-source/clp/build/clp-package/var/data/archives/dde43976-1c9f-4ecc-872e-b69cc5ffebd8 DESERIALIZE_ERRORS --count-by-time 3600000 reducer --host 10.1.0.22 --port 5200 --job-id 1
[2025-07-02 13:38:53,465: INFO/ForkPoolWorker-1] job_orchestration.executor.query.fs_search_task.search[72655868-72e4-410e-9c81-b4e83d96ec7b]: Running: /opt/clp/bin/clo s /home/rodri228/projects/log-compressor/open-source/clp/build/clp-package/var/data/archives/6f04f109-1e1b-440e-aad8-6d96206efe9c DESERIALIZE_ERRORS --count-by-time 3600000 reducer --host 10.1.0.22 --port 5200 --job-id 1
[2025-07-02 13:38:53,466: INFO/ForkPoolWorker-8] job_orchestration.executor.query.fs_search_task.search[f2bfb3c5-1b7d-4030-bc36-73454c328cbe]: Running: /opt/clp/bin/clo s /home/rodri228/projects/log-compressor/open-source/clp/build/clp-package/var/data/archives/1f4f2e2a-86d2-49dd-aa7a-8df70200003e DESERIALIZE_ERRORS --count-by-time 3600000 reducer --host 10.1.0.22 --port 5200 --job-id 1
[2025-07-02 13:38:53,467: INFO/ForkPoolWorker-2] job_orchestration.executor.query.fs_search_task.search[16884c46-b5fc-437b-9088-6811b5b6f29a]: Running: /opt/clp/bin/clo s /home/rodri228/projects/log-compressor/open-source/clp/build/clp-package/var/data/archives/a78eed10-790e-424b-8af7-64ebb03cb933 DESERIALIZE_ERRORS --count-by-time 3600000 reducer --host 10.1.0.22 --port 5200 --job-id 1
[2025-07-02 13:38:53,470: INFO/ForkPoolWorker-3] job_orchestration.executor.query.fs_search_task.search[3281c3da-2ede-441c-ad75-bfcf2fbbc0e2]: Running: /opt/clp/bin/clo s /home/rodri228/projects/log-compressor/open-source/clp/build/clp-package/var/data/archives/a4696c40-afb8-472e-bf68-28c59b198a06 DESERIALIZE_ERRORS --count-by-time 3600000 reducer --host 10.1.0.22 --port 5200 --job-id 1
[2025-07-02 13:38:53,470: INFO/ForkPoolWorker-7] job_orchestration.executor.query.fs_search_task.search[743541cb-4794-4a7b-9ce2-63c1edcca999]: Waiting for search to finish
[2025-07-02 13:38:53,471: INFO/ForkPoolWorker-5] job_orchestration.executor.query.fs_search_task.search[96cafa6e-79a2-416c-8216-0102295db779]: Running: /opt/clp/bin/clo s /home/rodri228/projects/log-compressor/open-source/clp/build/clp-package/var/data/archives/6502cbf3-0657-482b-8528-468a00bc5fc5 DESERIALIZE_ERRORS --count-by-time 3600000 reducer --host 10.1.0.22 --port 5200 --job-id 1
[2025-07-02 13:38:53,472: INFO/ForkPoolWorker-1] job_orchestration.executor.query.fs_search_task.search[72655868-72e4-410e-9c81-b4e83d96ec7b]: Waiting for search to finish
[2025-07-02 13:38:53,472: INFO/ForkPoolWorker-4] job_orchestration.executor.query.fs_search_task.search[56e4984c-9b14-40f3-9348-ada588486174]: Running: /opt/clp/bin/clo s /home/rodri228/projects/log-compressor/open-source/clp/build/clp-package/var/data/archives/3b0e8c32-cd78-449c-bd70-9c6c144fd011 DESERIALIZE_ERRORS --count-by-time 3600000 reducer --host 10.1.0.22 --port 5200 --job-id 1
[2025-07-02 13:38:53,472: INFO/ForkPoolWorker-8] job_orchestration.executor.query.fs_search_task.search[f2bfb3c5-1b7d-4030-bc36-73454c328cbe]: Waiting for search to finish
[2025-07-02 13:38:53,473: INFO/ForkPoolWorker-2] job_orchestration.executor.query.fs_search_task.search[16884c46-b5fc-437b-9088-6811b5b6f29a]: Waiting for search to finish
[2025-07-02 13:38:53,476: INFO/ForkPoolWorker-3] job_orchestration.executor.query.fs_search_task.search[3281c3da-2ede-441c-ad75-bfcf2fbbc0e2]: Waiting for search to finish
[2025-07-02 13:38:53,476: INFO/ForkPoolWorker-6] job_orchestration.executor.query.fs_search_task.search[78f32ef5-faac-42c4-a0cc-3215f2cc24d5]: Running: /opt/clp/bin/clo s /home/rodri228/projects/log-compressor/open-source/clp/build/clp-package/var/data/archives/55cbc4cb-dfb1-42d1-8c16-94a615651b83 DESERIALIZE_ERRORS --count-by-time 3600000 reducer --host 10.1.0.22 --port 5200 --job-id 1
[2025-07-02 13:38:53,476: INFO/ForkPoolWorker-5] job_orchestration.executor.query.fs_search_task.search[96cafa6e-79a2-416c-8216-0102295db779]: Waiting for search to finish
[2025-07-02 13:38:53,478: INFO/ForkPoolWorker-4] job_orchestration.executor.query.fs_search_task.search[56e4984c-9b14-40f3-9348-ada588486174]: Waiting for search to finish
[2025-07-02 13:38:53,481: INFO/ForkPoolWorker-6] job_orchestration.executor.query.fs_search_task.search[78f32ef5-faac-42c4-a0cc-3215f2cc24d5]: Waiting for search to finish
[2025-07-02 13:38:54,356: INFO/ForkPoolWorker-8] job_orchestration.executor.query.fs_search_task.search[f2bfb3c5-1b7d-4030-bc36-73454c328cbe]: search task 2 completed for job 1
[2025-07-02 13:38:54,457: INFO/ForkPoolWorker-1] job_orchestration.executor.query.fs_search_task.search[72655868-72e4-410e-9c81-b4e83d96ec7b]: search task 3 completed for job 1
[2025-07-02 13:38:54,461: INFO/ForkPoolWorker-7] job_orchestration.executor.query.fs_search_task.search[743541cb-4794-4a7b-9ce2-63c1edcca999]: search task 1 completed for job 1
[2025-07-02 13:38:54,783: INFO/ForkPoolWorker-3] job_orchestration.executor.query.fs_search_task.search[3281c3da-2ede-441c-ad75-bfcf2fbbc0e2]: search task 5 completed for job 1
[2025-07-02 13:38:54,784: INFO/ForkPoolWorker-2] job_orchestration.executor.query.fs_search_task.search[16884c46-b5fc-437b-9088-6811b5b6f29a]: search task 4 completed for job 1
[2025-07-02 13:38:54,798: INFO/ForkPoolWorker-4] job_orchestration.executor.query.fs_search_task.search[56e4984c-9b14-40f3-9348-ada588486174]: search task 6 completed for job 1
[2025-07-02 13:38:54,853: INFO/ForkPoolWorker-6] job_orchestration.executor.query.fs_search_task.search[78f32ef5-faac-42c4-a0cc-3215f2cc24d5]: search task 8 completed for job 1
[2025-07-02 13:38:54,860: INFO/ForkPoolWorker-5] job_orchestration.executor.query.fs_search_task.search[96cafa6e-79a2-416c-8216-0102295db779]: search task 7 completed for job 1
[2025-07-02 13:39:00,487: INFO/ForkPoolWorker-7] job_orchestration.executor.query.fs_search_task.search[b6c2edd9-91f4-4d66-b76f-05bf6b8f78c6]: Started search task for job 2
[2025-07-02 13:39:00,489: INFO/ForkPoolWorker-8] job_orchestration.executor.query.fs_search_task.search[3f70a1ac-1b40-42a6-a4a6-a57f90174ff8]: Started search task for job 2
[2025-07-02 13:39:00,492: INFO/ForkPoolWorker-1] job_orchestration.executor.query.fs_search_task.search[f307b9a8-6b15-4c30-9885-6a795a42fb19]: Started search task for job 2
[2025-07-02 13:39:00,495: INFO/ForkPoolWorker-2] job_orchestration.executor.query.fs_search_task.search[5e382c71-2e1c-4fa8-bd34-f215a4248d4b]: Started search task for job 2
[2025-07-02 13:39:00,498: INFO/ForkPoolWorker-3] job_orchestration.executor.query.fs_search_task.search[e434b617-0890-4384-b302-c98c92c9edfc]: Started search task for job 2
[2025-07-02 13:39:00,501: INFO/ForkPoolWorker-4] job_orchestration.executor.query.fs_search_task.search[8353005e-d544-4b68-90c1-f4318f1605f3]: Started search task for job 2
[2025-07-02 13:39:00,505: INFO/ForkPoolWorker-5] job_orchestration.executor.query.fs_search_task.search[fe01886d-5e17-4aae-8d07-0aba0fa8c207]: Started search task for job 2
[2025-07-02 13:39:00,505: INFO/ForkPoolWorker-8] job_orchestration.executor.query.fs_search_task.search[3f70a1ac-1b40-42a6-a4a6-a57f90174ff8]: Running: /opt/clp/bin/clo s /home/rodri228/projects/log-compressor/open-source/clp/build/clp-package/var/data/archives/1f4f2e2a-86d2-49dd-aa7a-8df70200003e DESERIALIZE_ERRORS --count-by-time 3600000 reducer --host 10.1.0.22 --port 5201 --job-id 2
[2025-07-02 13:39:00,506: INFO/ForkPoolWorker-7] job_orchestration.executor.query.fs_search_task.search[b6c2edd9-91f4-4d66-b76f-05bf6b8f78c6]: Running: /opt/clp/bin/clo s /home/rodri228/projects/log-compressor/open-source/clp/build/clp-package/var/data/archives/dde43976-1c9f-4ecc-872e-b69cc5ffebd8 DESERIALIZE_ERRORS --count-by-time 3600000 reducer --host 10.1.0.22 --port 5201 --job-id 2
[2025-07-02 13:39:00,510: INFO/ForkPoolWorker-2] job_orchestration.executor.query.fs_search_task.search[5e382c71-2e1c-4fa8-bd34-f215a4248d4b]: Running: /opt/clp/bin/clo s /home/rodri228/projects/log-compressor/open-source/clp/build/clp-package/var/data/archives/a78eed10-790e-424b-8af7-64ebb03cb933 DESERIALIZE_ERRORS --count-by-time 3600000 reducer --host 10.1.0.22 --port 5201 --job-id 2
[2025-07-02 13:39:00,512: INFO/ForkPoolWorker-8] job_orchestration.executor.query.fs_search_task.search[3f70a1ac-1b40-42a6-a4a6-a57f90174ff8]: Waiting for search to finish
[2025-07-02 13:39:00,512: INFO/ForkPoolWorker-7] job_orchestration.executor.query.fs_search_task.search[b6c2edd9-91f4-4d66-b76f-05bf6b8f78c6]: Waiting for search to finish
[2025-07-02 13:39:00,513: INFO/ForkPoolWorker-4] job_orchestration.executor.query.fs_search_task.search[8353005e-d544-4b68-90c1-f4318f1605f3]: Running: /opt/clp/bin/clo s /home/rodri228/projects/log-compressor/open-source/clp/build/clp-package/var/data/archives/3b0e8c32-cd78-449c-bd70-9c6c144fd011 DESERIALIZE_ERRORS --count-by-time 3600000 reducer --host 10.1.0.22 --port 5201 --job-id 2
[2025-07-02 13:39:00,513: INFO/ForkPoolWorker-6] job_orchestration.executor.query.fs_search_task.search[43430b10-d9cf-4e0a-9d6f-111d25d9b877]: Started search task for job 2
[2025-07-02 13:39:00,514: INFO/ForkPoolWorker-3] job_orchestration.executor.query.fs_search_task.search[e434b617-0890-4384-b302-c98c92c9edfc]: Running: /opt/clp/bin/clo s /home/rodri228/projects/log-compressor/open-source/clp/build/clp-package/var/data/archives/a4696c40-afb8-472e-bf68-28c59b198a06 DESERIALIZE_ERRORS --count-by-time 3600000 reducer --host 10.1.0.22 --port 5201 --job-id 2
[2025-07-02 13:39:00,516: INFO/ForkPoolWorker-1] job_orchestration.executor.query.fs_search_task.search[f307b9a8-6b15-4c30-9885-6a795a42fb19]: Running: /opt/clp/bin/clo s /home/rodri228/projects/log-compressor/open-source/clp/build/clp-package/var/data/archives/6f04f109-1e1b-440e-aad8-6d96206efe9c DESERIALIZE_ERRORS --count-by-time 3600000 reducer --host 10.1.0.22 --port 5201 --job-id 2
[2025-07-02 13:39:00,516: INFO/ForkPoolWorker-5] job_orchestration.executor.query.fs_search_task.search[fe01886d-5e17-4aae-8d07-0aba0fa8c207]: Running: /opt/clp/bin/clo s /home/rodri228/projects/log-compressor/open-source/clp/build/clp-package/var/data/archives/6502cbf3-0657-482b-8528-468a00bc5fc5 DESERIALIZE_ERRORS --count-by-time 3600000 reducer --host 10.1.0.22 --port 5201 --job-id 2
[2025-07-02 13:39:00,516: INFO/ForkPoolWorker-2] job_orchestration.executor.query.fs_search_task.search[5e382c71-2e1c-4fa8-bd34-f215a4248d4b]: Waiting for search to finish
[2025-07-02 13:39:00,520: INFO/ForkPoolWorker-4] job_orchestration.executor.query.fs_search_task.search[8353005e-d544-4b68-90c1-f4318f1605f3]: Waiting for search to finish
[2025-07-02 13:39:00,520: INFO/ForkPoolWorker-3] job_orchestration.executor.query.fs_search_task.search[e434b617-0890-4384-b302-c98c92c9edfc]: Waiting for search to finish
[2025-07-02 13:39:00,522: INFO/ForkPoolWorker-6] job_orchestration.executor.query.fs_search_task.search[43430b10-d9cf-4e0a-9d6f-111d25d9b877]: Running: /opt/clp/bin/clo s /home/rodri228/projects/log-compressor/open-source/clp/build/clp-package/var/data/archives/55cbc4cb-dfb1-42d1-8c16-94a615651b83 DESERIALIZE_ERRORS --count-by-time 3600000 reducer --host 10.1.0.22 --port 5201 --job-id 2
[2025-07-02 13:39:00,524: INFO/ForkPoolWorker-1] job_orchestration.executor.query.fs_search_task.search[f307b9a8-6b15-4c30-9885-6a795a42fb19]: Waiting for search to finish
[2025-07-02 13:39:00,524: INFO/ForkPoolWorker-5] job_orchestration.executor.query.fs_search_task.search[fe01886d-5e17-4aae-8d07-0aba0fa8c207]: Waiting for search to finish
[2025-07-02 13:39:00,527: INFO/ForkPoolWorker-6] job_orchestration.executor.query.fs_search_task.search[43430b10-d9cf-4e0a-9d6f-111d25d9b877]: Waiting for search to finish
[2025-07-02 13:39:01,390: INFO/ForkPoolWorker-8] job_orchestration.executor.query.fs_search_task.search[3f70a1ac-1b40-42a6-a4a6-a57f90174ff8]: search task 10 completed for job 2
[2025-07-02 13:39:01,525: INFO/ForkPoolWorker-7] job_orchestration.executor.query.fs_search_task.search[b6c2edd9-91f4-4d66-b76f-05bf6b8f78c6]: search task 9 completed for job 2
[2025-07-02 13:39:01,527: INFO/ForkPoolWorker-1] job_orchestration.executor.query.fs_search_task.search[f307b9a8-6b15-4c30-9885-6a795a42fb19]: search task 11 completed for job 2
[2025-07-02 13:39:01,798: INFO/ForkPoolWorker-3] job_orchestration.executor.query.fs_search_task.search[e434b617-0890-4384-b302-c98c92c9edfc]: search task 13 completed for job 2
[2025-07-02 13:39:01,830: INFO/ForkPoolWorker-2] job_orchestration.executor.query.fs_search_task.search[5e382c71-2e1c-4fa8-bd34-f215a4248d4b]: search task 12 completed for job 2
[2025-07-02 13:39:01,851: INFO/ForkPoolWorker-4] job_orchestration.executor.query.fs_search_task.search[8353005e-d544-4b68-90c1-f4318f1605f3]: search task 14 completed for job 2
[2025-07-02 13:39:01,877: INFO/ForkPoolWorker-6] job_orchestration.executor.query.fs_search_task.search[43430b10-d9cf-4e0a-9d6f-111d25d9b877]: search task 16 completed for job 2
[2025-07-02 13:39:01,920: INFO/ForkPoolWorker-5] job_orchestration.executor.query.fs_search_task.search[fe01886d-5e17-4aae-8d07-0aba0fa8c207]: search task 15 completed for job 2
reducer-*.log
2025-07-02T13:37:13.172+00:00 [info] Starting on host 10.1.0.22 port 5200
2025-07-02T13:37:13.172+00:00 [info] Acceptor socket listening successfully
2025-07-02T13:37:13.172+00:00 [info] Waiting for job...
2025-07-02T13:38:53.212+00:00 [info] Setting up pipeline for job 1
2025-07-02T13:38:55.402+00:00 [info] Accept task cancelled
2025-07-02T13:38:55.408+00:00 [info] Job 1 finished successfully
2025-07-02T13:38:55.408+00:00 [info] Acceptor socket listening successfully
2025-07-02T13:38:55.408+00:00 [info] Waiting for job...
2025-07-02T13:37:13.172+00:00 [info] Starting on host 10.1.0.22 port 5201
2025-07-02T13:37:13.172+00:00 [info] Acceptor socket listening successfully
2025-07-02T13:37:13.172+00:00 [info] Waiting for job...
2025-07-02T13:39:00.437+00:00 [info] Setting up pipeline for job 2
2025-07-02T13:37:13.172+00:00 [info] Starting on host 10.1.0.22 port 5202
2025-07-02T13:37:13.172+00:00 [info] Acceptor socket listening successfully
2025-07-02T13:37:13.172+00:00 [info] Waiting for job...
2025-07-02T13:37:13.172+00:00 [info] Starting on host 10.1.0.22 port 5203
2025-07-02T13:37:13.172+00:00 [info] Acceptor socket listening successfully
2025-07-02T13:37:13.172+00:00 [info] Waiting for job...
2025-07-02T13:37:13.174+00:00 [info] Starting on host 10.1.0.22 port 5204
2025-07-02T13:37:13.174+00:00 [info] Acceptor socket listening successfully
2025-07-02T13:37:13.174+00:00 [info] Waiting for job...
2025-07-02T13:37:13.174+00:00 [info] Starting on host 10.1.0.22 port 5205
2025-07-02T13:37:13.174+00:00 [info] Acceptor socket listening successfully
2025-07-02T13:37:13.174+00:00 [info] Waiting for job...
2025-07-02T13:37:13.174+00:00 [info] Starting on host 10.1.0.22 port 5206
2025-07-02T13:37:13.174+00:00 [info] Acceptor socket listening successfully
2025-07-02T13:37:13.174+00:00 [info] Waiting for job...
2025-07-02T13:37:13.175+00:00 [info] Starting on host 10.1.0.22 port 5207
2025-07-02T13:37:13.175+00:00 [info] Acceptor socket listening successfully
2025-07-02T13:37:13.176+00:00 [info] Waiting for job...
reducer.log
2025-07-02 13:37:13,168 reducer [INFO] Reducers started.
2025-07-02 13:37:13,168 reducer [INFO] Host=10.1.0.22 Base port=5200 Concurrency=8 Upsert Interval=100
CLP version
Environment
- Ubuntu 22.04
- Docker 27.5.0, build a187fa5
Reproduction steps
sbin/start-clp.sh
sbin/compress.sh logs/hive-24hrs/
sbin/search.sh "DESERIALIZE_ERRORS" --count-by-time $((60 * 60 * 1000))
sbin/search.sh "DESERIALIZE_ERRORS" --count-by-time $((60 * 60 * 1000))Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working