Description
Name and Version
$ ./llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090 D, compute capability 8.9, VMM: yes
Device 1: NVIDIA GeForce RTX 4090 D, compute capability 8.9, VMM: yes
version: 6019 (8ad7b3e6)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
RTX 4090D x 2
Models
medgemma-27b-text-it-Q4_K_M.gguf
Problem description & steps to reproduce
CUDA_VISIBLE_DEVICES="0,1" ./llama-server -m /data/Modules/medgemma-27b-text-it-GGUF/medgemma-27b-text-it-Q4_K_M.gguf --port 8080 --host 0.0.0.0 -ngl 99 -dev cuda0,cuda1 -sm row -fa --jinja -c 50000 -v -n 32768 --no-context-shift
I started the API service with the command above and then noticed that, for some prompts, the generated token repeats as ''; the output also changes to '' for new requests.
Yesterday I saw #14888, which looks very similar to my issue. Based on @lcarrere's description, the problem seemed to be related to the -fa parameter. I was using version: 6004 (bbfc849) at the time. I then removed the -fa parameter and restarted the server; all requests responded normally, but responses were slower.
Today I saw from the latest reply that #14916 had fixed the problem, so I pulled the latest code and built a new llama-server. However, starting it with -fa still produces the same error, so the problem does not appear to be completely fixed.
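For anyone comparing runs with and without -fa, here is a minimal sketch of how the symptom could be checked from a verbose (-v) server log. It is not part of llama.cpp. The helper name summarize_log and the min_run threshold are hypothetical, and the regular expressions assume the exact line formats quoted under "Relevant log output" below, which other builds may print differently.

```python
import re
from itertools import groupby

# Patterns mirror the log lines quoted in this report (assumption: other
# llama-server builds may format these messages differently).
NEXT_TOKEN_RE = re.compile(r"n_decoded = \d+, n_remaining = \d+, next token:\s+(\d+)")
KV_FAIL_RE = re.compile(r"failed to find a memory slot for batch of size \d+")


def summarize_log(lines, min_run=8):
    """Return (kv_failures, runs).

    kv_failures: number of "failed to find a memory slot" messages.
    runs: list of (token_id, run_length) for every run of at least
          min_run consecutive identical sampled token ids.
    """
    kv_failures = 0
    token_ids = []
    for line in lines:
        if KV_FAIL_RE.search(line):
            kv_failures += 1
        match = NEXT_TOKEN_RE.search(line)
        if match:
            token_ids.append(int(match.group(1)))

    runs = []
    for token_id, group in groupby(token_ids):
        length = len(list(group))
        if length >= min_run:
            runs.append((token_id, length))
    return kv_failures, runs


# A few lines copied from the log in this report.
sample = [
    "decode: failed to find a memory slot for batch of size 1901",
    "slot process_toke: id 0 | task 1112 | n_decoded = 1, n_remaining = 32767, next token: 38 ''",
    "slot process_toke: id 0 | task 1112 | n_decoded = 2, n_remaining = 32766, next token: 38 ''",
    "slot process_toke: id 0 | task 1112 | n_decoded = 3, n_remaining = 32765, next token: 38 ''",
]
print(summarize_log(sample, min_run=3))  # prints (1, [(38, 3)])
```

Running the same function over a full log from a run with -fa and one without should make it easy to see whether the repeated token id appears only in the former.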
First Bad Commit
No response
Relevant log output
slot launch_slot_: id 0 | task 1112 | processing task
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1113, front = 0
slot update_slots: id 0 | task 1112 | new prompt, n_ctx_slot = 50176, n_keep = 0, n_prompt_tokens = 24441
slot update_slots: id 0 | task 1112 | kv cache rm [12, end)
slot update_slots: id 0 | task 1112 | prompt processing progress, n_past = 2060, n_tokens = 2048, progress = 0.083794
srv update_slots: decoding batch, n_tokens = 2048
clear_adapter_lora: call
set_embeddings: value = 0
decode: failed to find a memory slot for batch of size 2048
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 1024, ret = 1
decode: failed to find a memory slot for batch of size 1024
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 1024, n_batch = 1024, ret = 1
decode: failed to find a memory slot for batch of size 1024
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 1024, n_batch = 512, ret = 1
decode: failed to find a memory slot for batch of size 512
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 1024, n_batch = 256, ret = 1
decode: failed to find a memory slot for batch of size 768
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 1280, n_batch = 1024, ret = 1
decode: failed to find a memory slot for batch of size 768
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 1280, n_batch = 512, ret = 1
decode: failed to find a memory slot for batch of size 512
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 1280, n_batch = 256, ret = 1
decode: failed to find a memory slot for batch of size 512
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 1536, n_batch = 1024, ret = 1
decode: failed to find a memory slot for batch of size 512
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 1536, n_batch = 512, ret = 1
decode: failed to find a memory slot for batch of size 512
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 1536, n_batch = 256, ret = 1
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1113
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1114, front = 0
slot update_slots: id 0 | task 1112 | kv cache rm [2060, end)
slot update_slots: id 0 | task 1112 | prompt processing progress, n_past = 4108, n_tokens = 2048, progress = 0.167587
srv update_slots: decoding batch, n_tokens = 2048
clear_adapter_lora: call
set_embeddings: value = 0
decode: failed to find a memory slot for batch of size 2048
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 1024, ret = 1
decode: failed to find a memory slot for batch of size 1024
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 512, ret = 1
decode: failed to find a memory slot for batch of size 512
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 256, ret = 1
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1114
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1115, front = 0
slot update_slots: id 0 | task 1112 | kv cache rm [4108, end)
slot update_slots: id 0 | task 1112 | prompt processing progress, n_past = 6156, n_tokens = 2048, progress = 0.251381
srv update_slots: decoding batch, n_tokens = 2048
clear_adapter_lora: call
set_embeddings: value = 0
decode: failed to find a memory slot for batch of size 2048
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 1024, ret = 1
decode: failed to find a memory slot for batch of size 1024
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 512, ret = 1
decode: failed to find a memory slot for batch of size 512
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 256, ret = 1
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1115
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1116, front = 0
slot update_slots: id 0 | task 1112 | kv cache rm [6156, end)
slot update_slots: id 0 | task 1112 | prompt processing progress, n_past = 8204, n_tokens = 2048, progress = 0.335175
srv update_slots: decoding batch, n_tokens = 2048
clear_adapter_lora: call
set_embeddings: value = 0
decode: failed to find a memory slot for batch of size 2048
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 1024, ret = 1
decode: failed to find a memory slot for batch of size 1024
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 1024, n_batch = 1024, ret = 1
decode: failed to find a memory slot for batch of size 1024
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 1024, n_batch = 512, ret = 1
decode: failed to find a memory slot for batch of size 512
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 1024, n_batch = 256, ret = 1
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1116
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1117, front = 0
slot update_slots: id 0 | task 1112 | kv cache rm [8204, end)
slot update_slots: id 0 | task 1112 | prompt processing progress, n_past = 10252, n_tokens = 2048, progress = 0.418968
srv update_slots: decoding batch, n_tokens = 2048
clear_adapter_lora: call
set_embeddings: value = 0
decode: failed to find a memory slot for batch of size 2048
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 1024, ret = 1
decode: failed to find a memory slot for batch of size 1024
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 512, ret = 1
decode: failed to find a memory slot for batch of size 1536
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 512, n_batch = 1024, ret = 1
decode: failed to find a memory slot for batch of size 1024
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 512, n_batch = 512, ret = 1
decode: failed to find a memory slot for batch of size 512
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 512, n_batch = 256, ret = 1
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1117
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1118, front = 0
slot update_slots: id 0 | task 1112 | kv cache rm [10252, end)
slot update_slots: id 0 | task 1112 | prompt processing progress, n_past = 12300, n_tokens = 2048, progress = 0.502762
srv update_slots: decoding batch, n_tokens = 2048
clear_adapter_lora: call
set_embeddings: value = 0
decode: failed to find a memory slot for batch of size 2048
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 1024, ret = 1
decode: failed to find a memory slot for batch of size 1024
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 512, ret = 1
decode: failed to find a memory slot for batch of size 512
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 256, ret = 1
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1118
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1119, front = 0
slot update_slots: id 0 | task 1112 | kv cache rm [12300, end)
slot update_slots: id 0 | task 1112 | prompt processing progress, n_past = 14348, n_tokens = 2048, progress = 0.586555
srv update_slots: decoding batch, n_tokens = 2048
clear_adapter_lora: call
set_embeddings: value = 0
decode: failed to find a memory slot for batch of size 2048
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 1024, ret = 1
decode: failed to find a memory slot for batch of size 1024
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 1024, n_batch = 1024, ret = 1
decode: failed to find a memory slot for batch of size 1024
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 1024, n_batch = 512, ret = 1
decode: failed to find a memory slot for batch of size 512
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 1024, n_batch = 256, ret = 1
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1119
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1120, front = 0
slot update_slots: id 0 | task 1112 | kv cache rm [14348, end)
slot update_slots: id 0 | task 1112 | prompt processing progress, n_past = 16396, n_tokens = 2048, progress = 0.670349
srv update_slots: decoding batch, n_tokens = 2048
clear_adapter_lora: call
set_embeddings: value = 0
decode: failed to find a memory slot for batch of size 2048
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 1024, ret = 1
decode: failed to find a memory slot for batch of size 1024
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 512, ret = 1
decode: failed to find a memory slot for batch of size 1536
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 512, n_batch = 1024, ret = 1
decode: failed to find a memory slot for batch of size 1024
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 512, n_batch = 512, ret = 1
decode: failed to find a memory slot for batch of size 512
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 512, n_batch = 256, ret = 1
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1120
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1121, front = 0
slot update_slots: id 0 | task 1112 | kv cache rm [16396, end)
slot update_slots: id 0 | task 1112 | prompt processing progress, n_past = 18444, n_tokens = 2048, progress = 0.754143
srv update_slots: decoding batch, n_tokens = 2048
clear_adapter_lora: call
set_embeddings: value = 0
decode: failed to find a memory slot for batch of size 2048
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 1024, ret = 1
decode: failed to find a memory slot for batch of size 1024
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 512, ret = 1
decode: failed to find a memory slot for batch of size 512
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 256, ret = 1
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1121
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1122, front = 0
slot update_slots: id 0 | task 1112 | kv cache rm [18444, end)
slot update_slots: id 0 | task 1112 | prompt processing progress, n_past = 20492, n_tokens = 2048, progress = 0.837936
srv update_slots: decoding batch, n_tokens = 2048
clear_adapter_lora: call
set_embeddings: value = 0
decode: failed to find a memory slot for batch of size 2048
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 1024, ret = 1
decode: failed to find a memory slot for batch of size 1024
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 1024, n_batch = 1024, ret = 1
decode: failed to find a memory slot for batch of size 1024
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 1024, n_batch = 512, ret = 1
decode: failed to find a memory slot for batch of size 512
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 1024, n_batch = 256, ret = 1
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1122
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1123, front = 0
slot update_slots: id 0 | task 1112 | kv cache rm [20492, end)
slot update_slots: id 0 | task 1112 | prompt processing progress, n_past = 22540, n_tokens = 2048, progress = 0.921730
srv update_slots: decoding batch, n_tokens = 2048
clear_adapter_lora: call
set_embeddings: value = 0
decode: failed to find a memory slot for batch of size 2048
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 1024, ret = 1
decode: failed to find a memory slot for batch of size 1024
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 512, ret = 1
decode: failed to find a memory slot for batch of size 1536
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 512, n_batch = 1024, ret = 1
decode: failed to find a memory slot for batch of size 1024
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 512, n_batch = 512, ret = 1
decode: failed to find a memory slot for batch of size 512
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 512, n_batch = 256, ret = 1
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1123
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1124, front = 0
slot update_slots: id 0 | task 1112 | kv cache rm [22540, end)
slot update_slots: id 0 | task 1112 | prompt processing progress, n_past = 24441, n_tokens = 1901, progress = 0.999509
slot update_slots: id 0 | task 1112 | prompt done, n_past = 24441, n_tokens = 1901
srv update_slots: decoding batch, n_tokens = 1901
clear_adapter_lora: call
set_embeddings: value = 0
decode: failed to find a memory slot for batch of size 1901
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 1024, ret = 1
decode: failed to find a memory slot for batch of size 1024
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 512, ret = 1
decode: failed to find a memory slot for batch of size 512
srv update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 256, ret = 1
slot process_toke: id 0 | task 1112 | n_decoded = 1, n_remaining = 32767, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1124
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1125, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24442, n_cache_tokens = 24442, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 2, n_remaining = 32766, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1125
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1126, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24443, n_cache_tokens = 24443, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 3, n_remaining = 32765, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1126
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1127, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24444, n_cache_tokens = 24444, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 4, n_remaining = 32764, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1127
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1128, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24445, n_cache_tokens = 24445, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 5, n_remaining = 32763, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1128
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1129, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24446, n_cache_tokens = 24446, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 6, n_remaining = 32762, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1129
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1130, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24447, n_cache_tokens = 24447, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 7, n_remaining = 32761, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1130
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1131, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24448, n_cache_tokens = 24448, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 8, n_remaining = 32760, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1131
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1132, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24449, n_cache_tokens = 24449, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 9, n_remaining = 32759, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1132
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1133, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24450, n_cache_tokens = 24450, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 10, n_remaining = 32758, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1133
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1134, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24451, n_cache_tokens = 24451, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 11, n_remaining = 32757, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1134
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1135, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24452, n_cache_tokens = 24452, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 12, n_remaining = 32756, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1135
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1136, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24453, n_cache_tokens = 24453, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 13, n_remaining = 32755, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1136
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1137, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24454, n_cache_tokens = 24454, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 14, n_remaining = 32754, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1137
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1138, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24455, n_cache_tokens = 24455, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 15, n_remaining = 32753, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1138
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1139, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24456, n_cache_tokens = 24456, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 16, n_remaining = 32752, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1139
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1140, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24457, n_cache_tokens = 24457, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 17, n_remaining = 32751, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1140
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1141, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24458, n_cache_tokens = 24458, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 18, n_remaining = 32750, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1141
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1142, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24459, n_cache_tokens = 24459, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 19, n_remaining = 32749, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1142
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1143, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24460, n_cache_tokens = 24460, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 20, n_remaining = 32748, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1143
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1144, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24461, n_cache_tokens = 24461, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 21, n_remaining = 32747, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1144
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1145, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24462, n_cache_tokens = 24462, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 22, n_remaining = 32746, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1145
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1146, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24463, n_cache_tokens = 24463, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 23, n_remaining = 32745, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1146
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1147, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24464, n_cache_tokens = 24464, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 24, n_remaining = 32744, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1147
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1148, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24465, n_cache_tokens = 24465, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 25, n_remaining = 32743, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1148
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1149, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24466, n_cache_tokens = 24466, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 26, n_remaining = 32742, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1149
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1150, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24467, n_cache_tokens = 24467, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 27, n_remaining = 32741, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1150
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1151, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24468, n_cache_tokens = 24468, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 28, n_remaining = 32740, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1151
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1152, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24469, n_cache_tokens = 24469, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 29, n_remaining = 32739, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1152
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1153, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24470, n_cache_tokens = 24470, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 30, n_remaining = 32738, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1153
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1154, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24471, n_cache_tokens = 24471, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 31, n_remaining = 32737, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1154
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1155, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24472, n_cache_tokens = 24472, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 32, n_remaining = 32736, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1155
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1156, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24473, n_cache_tokens = 24473, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 33, n_remaining = 32735, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1156
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1157, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24474, n_cache_tokens = 24474, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 34, n_remaining = 32734, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1157
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1158, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24475, n_cache_tokens = 24475, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 35, n_remaining = 32733, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1158
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1159, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24476, n_cache_tokens = 24476, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 36, n_remaining = 32732, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1159
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1160, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24477, n_cache_tokens = 24477, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 37, n_remaining = 32731, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1160
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1161, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24478, n_cache_tokens = 24478, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 38, n_remaining = 32730, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1161
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1162, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24479, n_cache_tokens = 24479, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 39, n_remaining = 32729, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1162
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1163, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24480, n_cache_tokens = 24480, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 40, n_remaining = 32728, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1163
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1164, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24481, n_cache_tokens = 24481, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 41, n_remaining = 32727, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1164
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1165, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24482, n_cache_tokens = 24482, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 42, n_remaining = 32726, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1165
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1166, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24483, n_cache_tokens = 24483, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 43, n_remaining = 32725, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1166
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1167, front = 0
slot update_slots: id 0 | task 1112 | slot decode token, n_ctx = 50176, n_past = 24484, n_cache_tokens = 24484, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
slot process_toke: id 0 | task 1112 | n_decoded = 44, n_remaining = 32724, next token: 38 ''
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1167
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE