
Don't crash when prompt cannot be tokenized #2580


Closed
jrudolph opened this issue Aug 10, 2023 · 4 comments

@jrudolph
Contributor

As observed in #2379 (comment), custom vocabularies might not include tokens to represent all prompts. In the case above, the static instruction-mode prefix/suffix could not be represented, even though it was not used.

In that situation, main is killed by a std::out_of_range exception thrown from the tokenizer. It might make sense to give a better error message in that case and/or clarify the assumptions llama.cpp makes about the vocabulary.

Backtrace
system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352595264) at ./nptl/pthread_kill.c:44
44	./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352595264) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737352595264) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737352595264, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7842476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff78287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7ca2b9e in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff7cae20c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007ffff7cae277 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007ffff7cae4d8 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007ffff7ca54a0 in std::__throw_out_of_range(char const*) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00005555555c9a2c in std::__detail::_Map_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true>, true>::at (this=0x55555571f398, __k="\n")
    at /usr/include/c++/11/bits/hashtable_policy.h:776
#11 0x00005555555c2a0f in std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > >::at (this=0x55555571f398, __k="\n") at /usr/include/c++/11/bits/unordered_map.h:1001
#12 0x00005555555bfd10 in llama_tokenizer::tokenize (this=0x7fffffffbea0, text="\n\n### Instruction:\n\n", output=std::vector of length 1, capacity 1 = {...}) at llama.cpp:2024
#13 0x00005555555ac971 in llama_tokenize (vocab=..., text="\n\n### Instruction:\n\n", bos=true) at llama.cpp:2077
#14 0x00005555555b658a in llama_tokenize_with_model (model=0x55555571f2b0, text=0x5555557b3c30 "\n\n### Instruction:\n\n", tokens=0x5555557bca00, n_max_tokens=21, add_bos=true) at llama.cpp:4115
#15 0x00005555555b6716 in llama_tokenize (ctx=0x5555557b44c0, text=0x5555557b3c30 "\n\n### Instruction:\n\n", tokens=0x5555557bca00, n_max_tokens=21, add_bos=true) at llama.cpp:4135
#16 0x00005555555ed7cb in llama_tokenize (ctx=0x5555557b44c0, text="\n\n### Instruction:\n\n", add_bos=true) at examples/common.cpp:640
#17 0x000055555555c7d2 in main (argc=9, argv=0x7fffffffdb88) at examples/main/main.cpp:259
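
The frames above show the crash coming from an unchecked unordered_map::at() call inside llama_tokenizer::tokenize when a symbol (here "\n") has no vocabulary entry. Below is a minimal sketch of the failure mode and of a checked lookup that would allow a proper error message instead of an abort; the token_to_id map is a hypothetical stand-in, not llama.cpp's actual data structure.

#include <cstdio>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the tokenizer's symbol -> token id map.
// "\n" is deliberately missing, as with the custom vocabulary above.
static std::unordered_map<std::string, int> token_to_id = {
    {"你", 1234},
};

// token_to_id.at(symbol) would throw std::out_of_range for unknown symbols
// and abort main. A checked lookup lets the caller report the problem instead.
static std::optional<int> lookup_token(const std::string & symbol) {
    auto it = token_to_id.find(symbol);
    if (it == token_to_id.end()) {
        return std::nullopt;
    }
    return it->second;
}

int main() {
    for (const std::string symbol : {"你", "\n"}) {
        if (auto id = lookup_token(symbol)) {
            std::printf("token id %d\n", *id);
        } else {
            std::fprintf(stderr, "error: symbol %s not found in vocabulary\n",
                         symbol == "\n" ? "\\n" : symbol.c_str());
        }
    }
    return 0;
}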
@saltyduckegg

#2379 (comment)

With your modification it no longer crashes.

It is useful for me!

But it seems your modification has not been merged yet?

before

$ ./main -m ./xs --prompt "你"
main: build = 0 (unknown)
main: seed  = 1691805675
llama.cpp: loading model from ./xs
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 8000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 288
llama_model_load_internal: n_mult     = 32
llama_model_load_internal: n_head     = 6
llama_model_load_internal: n_head_kv  = 6
llama_model_load_internal: n_layer    = 6
llama_model_load_internal: n_rot      = 48
llama_model_load_internal: n_gqa      = 1
llama_model_load_internal: rnorm_eps  = 5.0e-06
llama_model_load_internal: n_ff       = 768
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 0 (all F32)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.02 MB
llama_model_load_internal: mem required  =   40.39 MB (+    3.38 MB per state)
llama_new_context_with_model: kv self size  =    3.38 MB
llama_new_context_with_model: compute buffer total size =   17.53 MB

system_info: n_threads = 28 / 56 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at
Aborted (core dumped)

now

$ /mnt/sdb/lizz/project/003.lizz/16.llama/llama2cpp.other/llama.cpp-lzz/bin/main  -m ./xs --prompt "你"
main: build = 0 (unknown)
main: seed  = 1691806660
llama.cpp: loading model from ./xs
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 8000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 288
llama_model_load_internal: n_mult     = 32
llama_model_load_internal: n_head     = 6
llama_model_load_internal: n_head_kv  = 6
llama_model_load_internal: n_layer    = 6
llama_model_load_internal: n_rot      = 48
llama_model_load_internal: n_gqa      = 1
llama_model_load_internal: rnorm_eps  = 5.0e-06
llama_model_load_internal: n_ff       = 768
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 0 (all F32)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.02 MB
llama_model_load_internal: mem required  =   40.39 MB (+    3.38 MB per state)
llama_new_context_with_model: kv self size  =    3.38 MB
llama_new_context_with_model: compute buffer total size =   17.53 MB

system_info: n_threads = 28 / 56 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0



<s>
 你喜欢 \n\n- 要需要的任务和建议\n# 我在我们的技术可以能够获得一些更关注。请帮助我们考虑一些学习领域,并是在个人的公司中选择的情况,并且也将能够了解你所地帮助,因此将你不不<unk>时。\n小明: 他能在她去, 'input': '', 'output': '\nA: 好的,我是我,我们已经有你的人到,我我向,你应该从我想为你的团队。我们没有看了。但是要感到努力和市场,他也可以更加改进的帮助,因为你在这个公司对自己是学习的。\n\n请续写他们的对话内容。', 'input': '', 'output': 'The 当然'}
<s>
 {'instruction': '根据给定的句子,给出三个主题。\\n\n\\n“AI””?\n', 'input': '', 'output': '\n\n\n\n以下是一篇单词: \n\n在Python的中文日期和列表。你是一个关于新的主题的一个句子,并生成三个时间\n\n1. 计算: \n-

@saltyduckegg

Oh, I got something wrong: when I use special Chinese symbols, similar errors still occur.

For example, the Chinese symbol “?” is not in the vocabulary, and the new main program still crashes.

before

$ ./bin/main  -m ./xs --prompt "A:你?"
main: build = 0 (unknown)
main: seed  = 1691807956
llama.cpp: loading model from ./xs
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 8000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 288
llama_model_load_internal: n_mult     = 32
llama_model_load_internal: n_head     = 6
llama_model_load_internal: n_head_kv  = 6
llama_model_load_internal: n_layer    = 6
llama_model_load_internal: n_rot      = 48
llama_model_load_internal: n_gqa      = 1
llama_model_load_internal: rnorm_eps  = 5.0e-06
llama_model_load_internal: n_ff       = 768
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 0 (all F32)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.02 MB
llama_model_load_internal: mem required  =   40.39 MB (+    3.38 MB per state)
llama_new_context_with_model: kv self size  =    3.38 MB
llama_new_context_with_model: compute buffer total size =   17.53 MB

system_info: n_threads = 28 / 56 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at
Aborted (core dumped)

now

$ /mnt/sdb/lizz/project/003.lizz/16.llama/llama2cpp.other/llama.cpp-lzz/bin/main  -m ./xs --prompt "A:你?"
main: build = 0 (unknown)
main: seed  = 1691807973
llama.cpp: loading model from ./xs
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 8000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 288
llama_model_load_internal: n_mult     = 32
llama_model_load_internal: n_head     = 6
llama_model_load_internal: n_head_kv  = 6
llama_model_load_internal: n_layer    = 6
llama_model_load_internal: n_rot      = 48
llama_model_load_internal: n_gqa      = 1
llama_model_load_internal: rnorm_eps  = 5.0e-06
llama_model_load_internal: n_ff       = 768
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 0 (all F32)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.02 MB
llama_model_load_internal: mem required  =   40.39 MB (+    3.38 MB per state)
llama_new_context_with_model: kv self size  =    3.38 MB
llama_new_context_with_model: compute buffer total size =   17.53 MB

system_info: n_threads = 28 / 56 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at
Aborted (core dumped)
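
When a character such as “?” has no vocabulary entry, the tokenizer's lookup throws instead of degrading gracefully. One common mitigation in SentencePiece-style tokenizers is byte fallback: represent a symbol that is missing from the vocabulary with per-byte tokens rather than aborting. A minimal sketch of that idea follows; the token_to_id map, the byte_token() helper, and the reserved byte-token range are hypothetical, not llama.cpp's actual tokenizer.

#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical vocabulary map; "?" is assumed to be missing, as above.
static std::unordered_map<std::string, int> token_to_id = {
    {"A", 65}, {":", 58}, {"你", 1234},
};

// Hypothetical byte fallback: map each raw byte to an assumed reserved token
// id range, so any UTF-8 input can still be represented instead of throwing.
static int byte_token(uint8_t b) {
    const int byte_token_offset = 3;  // assumed ids for <0x00>..<0xFF>
    return byte_token_offset + b;
}

// Tokenize one symbol (a UTF-8 character already split out of the prompt):
// use the vocabulary entry if present, otherwise emit per-byte fallback tokens.
static void tokenize_symbol(const std::string & symbol, std::vector<int> & out) {
    auto it = token_to_id.find(symbol);
    if (it != token_to_id.end()) {
        out.push_back(it->second);
        return;
    }
    for (unsigned char b : symbol) {
        out.push_back(byte_token(b));
    }
}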

@igoforth

I'm getting the same error with the DeepSeek Coder Instruct 7B and 33B variants when trying to use a grammar file.

./main -f /home/igoforth/.local/src/sd_gpu/src/prompt.txt -m /home/igoforth/.local/src/ai_detect/deepseek-coder-33B-instruct-GGUF/deepseek-coder-33b-instruct.Q4_K_M.gguf --grammar-file /home/igoforth/.local/src/sd_gpu/src/vuln.gbnf -nommq -ngl 64 -c 512
### Response:terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at
[1]    3675739 IOT instruction (core dumped)

I believe my problem was mentioned here by TB: #3633

Worth noting that I get this crash when running DeepSeek on the latest master when I attempt to offload all layers to the GPU (no grammar):

./main -f /home/igoforth/.local/src/sd_gpu/src/prompt.txt -m /home/igoforth/.local/src/ai_detect/deepseek-coder-33B-instruct-GGUF/deepseek-coder-33b-instruct.Q4_K_M.gguf -nommq -ngl 65 -c 512
CUDA error 700 at /home/igoforth/.local/src/llama.cpp.cublas/ggml-cuda.cu:7546: an illegal memory access was encountered
current device: 0

But it runs fine when I offload all but one layer. The commits at #3633 allow me to offload all layers, but the grammar problem remains.

@github-actions github-actions bot added the stale label Mar 25, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

anisse added a commit to anisse/llama.cpp that referenced this issue Apr 29, 2024
This program verifies that a given gguf model file can tokenize all potentially valid characters. Since llama.cpp currently raises an exception when tokenization is not possible [1], this tool helps verify that valid ASCII and UTF-8 input will always be properly tokenized.

[1] ggml-org#2580
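
The commit message above describes a vocabulary-coverage check. A minimal sketch of that idea is shown below, with a stub tokenize() that throws from unordered_map::at just like the crash in this issue; a real tool would call into the gguf model's tokenizer and walk every valid UTF-8 sequence rather than just printable ASCII.

#include <cstdio>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the real tokenizer; it throws from
// unordered_map::at, like the crash above, when a symbol is not in the vocab.
// In an actual tool this would call the gguf model's tokenizer instead.
static std::vector<int> tokenize(const std::string & text) {
    static const std::unordered_map<std::string, int> vocab = {
        {"A", 65}, {"\n", 13},
    };
    return { vocab.at(text) };  // throws std::out_of_range for unknown input
}

int main() {
    int failures = 0;
    // A real check would walk every valid UTF-8 sequence; this sketch only
    // walks printable ASCII to keep the demo small.
    for (char c = 0x20; c <= 0x7E; ++c) {
        const std::string s(1, c);
        try {
            tokenize(s);
        } catch (const std::exception & e) {
            std::fprintf(stderr, "'%c' cannot be tokenized: %s\n", c, e.what());
            ++failures;
        }
    }
    std::printf("%d characters failed to tokenize\n", failures);
    return failures == 0 ? 0 : 1;
}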