
Server always incorrectly reports 1 for prompt_n, tokens_evaluated, and n_prompt_tokens_processed when using LLaVA 1.6. #5863


Closed
chigkim opened this issue Mar 4, 2024 · 3 comments


chigkim commented Mar 4, 2024

Commit 67be2ce
Windows 10, CPU only.

Server always returns 1 for prompt_n, tokens_evaluated, and n_prompt_tokens_processed when using LLaVA 1.6.
llava-cli returns the correct prompt token count.

From llava-cli:

llama_print_timings:        load time =   25007.57 ms
llama_print_timings:      sample time =      68.54 ms /   256 runs   (    0.27 ms per token,  3734.94 tokens per second)
llama_print_timings: prompt eval time =  421164.62 ms /  2902 tokens (  145.13 ms per token,     6.89 tokens per second)
llama_print_timings:        eval time =   66393.95 ms /   257 runs   (  258.34 ms per token,     3.87 tokens per second)
llama_print_timings:       total time =  511967.49 ms /  3159 tokens

From server through API:

{
	......
	"timings": {
		"predicted_ms": 57040.203,
		"predicted_n": 233,
		"predicted_per_second": 4.084838197367565,
		"predicted_per_token_ms": 244.8077381974249,
		"prompt_ms": 429987.864,
		"prompt_n": 1,
		"prompt_per_second": 0.0023256470326799734,
		"prompt_per_token_ms": 429987.864
	},
	"tokens_cached": 3129,
	"tokens_evaluated": 1,
	"tokens_predicted": 233,
	"truncated": false
}
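
For reference, a response like the one above can be produced with a request along these lines. This is a minimal sketch: the host/port, prompt, and image path are placeholders, and the image_data/[img-1] fields follow the server's documented multimodal API around this commit.

# Minimal sketch of a /completion request that exercises the LLaVA path.
# Assumes the server was started with a LLaVA 1.6 model and its --mmproj
# projector, listening on the default localhost:8080.
import base64
import json
import urllib.request

with open("image.jpg", "rb") as f:  # placeholder image
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    # "[img-1]" marks where the image embedding with id 1 is spliced in
    "prompt": "USER: [img-1] Describe this image in detail.\nASSISTANT:",
    "image_data": [{"data": image_b64, "id": 1}],
    "n_predict": 256,
}

req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# Both of these report 1, even though thousands of prompt tokens
# (mostly the image embedding) were actually evaluated.
print(body["timings"]["prompt_n"], body["tokens_evaluated"])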

From server console:

encode_image_with_clip: 5 segments encoded in 22462.62 ms
encode_image_with_clip: image embedding created: 2880 tokens

encode_image_with_clip: image encoded in 22495.54 ms by CLIP (    7.81 ms per image patch)
{"function":"print_timings","level":"INFO","line":260,"msg":"prompt eval time     =  429987.86 ms /     1 tokens (429987.86 ms per token,     0.00 tokens per second)","n_prompt_tokens_processed":1,"n_tokens_second":0.0023256470326799734,"slot_id":0,"t_prompt_processing":429987.864,"t_token":429987.864,"task_id":0,"tid":"8368","timestamp":1709356420}
{"function":"print_timings","level":"INFO","line":274,"msg":"generation eval time =   57040.20 ms /   233 runs   (  244.81 ms per token,     4.08 tokens per second)","n_decoded":233,"n_tokens_second":4.084838197367565,"slot_id":0,"t_token":244.8077381974249,"t_token_generation":57040.203,"task_id":0,"tid":"8368","timestamp":1709356420}
{"function":"print_timings","level":"INFO","line":283,"msg":"          total time =  487028.07 ms","slot_id":0,"t_prompt_processing":429987.864,"t_token_generation":57040.203,"t_total":487028.067,"task_id":0,"tid":"8368","timestamp":1709356420}
{"function":"update_slots","level":"INFO","line":1626,"msg":"slot released","n_cache_tokens":234,"n_ctx":4096,"n_past":3129,"n_system_tokens":0,"slot_id":0,"task_id":0,"tid":"8368","timestamp":1709356420,"truncated":false}
{"function":"log_server_request","level":"INFO","line":2693,"method":"POST","msg":"request","params":{},"path":"/completion","remote_addr":"127.0.0.1","remote_port":55351,"status":200,"tid":"7172","timestamp":1709356420}

cjpais commented Mar 5, 2024

I'll provide a branch/PR this evening (PST) with a proposed fix.


cjpais commented Mar 6, 2024

PR #5896 should address this issue.

Let me know if you can test on your side as well.
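
For anyone testing, re-running the request sketched earlier against a build with the PR applied should report a realistic prompt size instead of 1. A sketch of the check, continuing from the earlier snippet and assuming both fields keep mirroring n_prompt_tokens_processed:

# After rebuilding with the PR, the same request should report the true
# prompt size (text tokens plus image embedding tokens) instead of 1.
assert body["timings"]["prompt_n"] > 1
assert body["tokens_evaluated"] == body["timings"]["prompt_n"]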

github-actions bot added the stale label on Apr 6, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
