
Server always incorrectly reports 1 for prompt_n, tokens_evaluated, and n_prompt_tokens_processed when using LLaVA 1.6. #5863


Closed
chigkim opened this issue Mar 4, 2024 · 3 comments


chigkim commented Mar 4, 2024

Commit 67be2ce
Windows 10, CPU only.

Server always returns 1 for prompt_n, tokens_evaluated, and n_prompt_tokens_processed when using LLaVA 1.6.
llava-cli returns the correct prompt token count.

From llava-cli:

llama_print_timings:        load time =   25007.57 ms
llama_print_timings:      sample time =      68.54 ms /   256 runs   (    0.27 ms per token,  3734.94 tokens per second)
llama_print_timings: prompt eval time =  421164.62 ms /  2902 tokens (  145.13 ms per token,     6.89 tokens per second)
llama_print_timings:        eval time =   66393.95 ms /   257 runs   (  258.34 ms per token,     3.87 tokens per second)
llama_print_timings:       total time =  511967.49 ms /  3159 tokens

From server through API:

{
	......
	"timings": {
		"predicted_ms": 57040.203,
		"predicted_n": 233,
		"predicted_per_second": 4.084838197367565,
		"predicted_per_token_ms": 244.8077381974249,
		"prompt_ms": 429987.864,
		"prompt_n": 1,
		"prompt_per_second": 0.0023256470326799734,
		"prompt_per_token_ms": 429987.864
	},
	"tokens_cached": 3129,
	"tokens_evaluated": 1,
	"tokens_predicted": 233,
	"truncated": false
}
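
For reference, a response like the one above can be produced with a request along these lines. This is a minimal sketch: the host/port, prompt, and image path are placeholders, and the image_data/[img-1] fields follow the server's documented multimodal API around this commit.

# Minimal sketch of a /completion request that exercises the LLaVA path.
# Assumes the server was started with a LLaVA 1.6 model and its --mmproj
# projector, listening on the default localhost:8080.
import base64
import json
import urllib.request

with open("image.jpg", "rb") as f:  # placeholder image
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    # "[img-1]" marks where the image embedding with id 1 is spliced in
    "prompt": "USER: [img-1] Describe this image in detail.\nASSISTANT:",
    "image_data": [{"data": image_b64, "id": 1}],
    "n_predict": 256,
}

req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# Both of these report 1, even though thousands of prompt tokens
# (mostly the image embedding) were actually evaluated.
print(body["timings"]["prompt_n"], body["tokens_evaluated"])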

From server console:

encode_image_with_clip: 5 segments encoded in 22462.62 ms
encode_image_with_clip: image embedding created: 2880 tokens

encode_image_with_clip: image encoded in 22495.54 ms by CLIP (    7.81 ms per image patch)
{"function":"print_timings","level":"INFO","line":260,"msg":"prompt eval time     =  429987.86 ms /     1 tokens (429987.86 ms per token,     0.00 tokens per second)","n_prompt_tokens_processed":1,"n_tokens_second":0.0023256470326799734,"slot_id":0,"t_prompt_processing":429987.864,"t_token":429987.864,"task_id":0,"tid":"8368","timestamp":1709356420}
{"function":"print_timings","level":"INFO","line":274,"msg":"generation eval time =   57040.20 ms /   233 runs   (  244.81 ms per token,     4.08 tokens per second)","n_decoded":233,"n_tokens_second":4.084838197367565,"slot_id":0,"t_token":244.8077381974249,"t_token_generation":57040.203,"task_id":0,"tid":"8368","timestamp":1709356420}
{"function":"print_timings","level":"INFO","line":283,"msg":"          total time =  487028.07 ms","slot_id":0,"t_prompt_processing":429987.864,"t_token_generation":57040.203,"t_total":487028.067,"task_id":0,"tid":"8368","timestamp":1709356420}
{"function":"update_slots","level":"INFO","line":1626,"msg":"slot released","n_cache_tokens":234,"n_ctx":4096,"n_past":3129,"n_system_tokens":0,"slot_id":0,"task_id":0,"tid":"8368","timestamp":1709356420,"truncated":false}
{"function":"log_server_request","level":"INFO","line":2693,"method":"POST","msg":"request","params":{},"path":"/completion","remote_addr":"127.0.0.1","remote_port":55351,"status":200,"tid":"7172","timestamp":1709356420}

cjpais commented Mar 5, 2024

I'll provide a branch/PR this evening (PST) with a proposed fix.


cjpais commented Mar 6, 2024

PR #5896 should address this issue.

Let me know if you can test on your side as well.
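
For anyone testing, re-running the request sketched earlier against a build with the PR applied should report a realistic prompt size instead of 1. A sketch of the check, continuing from the earlier snippet and assuming both fields keep mirroring n_prompt_tokens_processed:

# After rebuilding with the PR, the same request should report the true
# prompt size (text tokens plus image embedding tokens) instead of 1.
assert body["timings"]["prompt_n"] > 1
assert body["tokens_evaluated"] == body["timings"]["prompt_n"]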

github-actions bot added the stale label on Apr 6, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
