Bug: Adreno740 GPU device can't load model in Android system #8965

@FranzKafkaYu

Description

What happened?

I tried to run llama.cpp on a Samsung Galaxy Tab S9 Ultra running Android 13. I compiled the libraries according to the guide and used them in my APK; when I load the model, the app hits a fatal crash.

Name and Version

tag:3400,commit:97bdd26e,support GPU acceleration:true

What operating system are you seeing the problem on?

Other? (Please let us know in description)

Relevant log output

08-10 16:06:07.269 30852 30926 I LLama-android: build info:tag:3400,commit:97bdd26e,support GPU acceleration:true
08-10 16:06:07.334 30852 30926 I LLama-android: llama_model_loader: loaded meta data with 20 key-value pairs and 290 tensors from /data/user/0/com.set.ai/files/ai_model.gguf (version GGUF V3 (latest))
08-10 16:06:07.334 30852 30926 I LLama-android: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
08-10 16:06:07.334 30852 30926 I LLama-android: llama_model_loader: - kv   0:                       general.architecture str              = qwen2
08-10 16:06:07.334 30852 30926 I LLama-android: llama_model_loader: - kv   1:                               general.name str              = seres_model
08-10 16:06:07.334 30852 30926 I LLama-android: llama_model_loader: - kv   2:                          qwen2.block_count u32              = 24
08-10 16:06:07.334 30852 30926 I LLama-android: llama_model_loader: - kv   3:                       qwen2.context_length u32              = 32768
08-10 16:06:07.334 30852 30926 I LLama-android: llama_model_loader: - kv   4:                     qwen2.embedding_length u32              = 896
08-10 16:06:07.334 30852 30926 I LLama-android: llama_model_loader: - kv   5:                  qwen2.feed_forward_length u32              = 4864
08-10 16:06:07.334 30852 30926 I LLama-android: llama_model_loader: - kv   6:                 qwen2.attention.head_count u32              = 14
08-10 16:06:07.334 30852 30926 I LLama-android: llama_model_loader: - kv   7:              qwen2.attention.head_count_kv u32              = 2
08-10 16:06:07.334 30852 30926 I LLama-android: llama_model_loader: - kv   8:                       qwen2.rope.freq_base f32              = 1000000.000000
08-10 16:06:07.334 30852 30926 I LLama-android: llama_model_loader: - kv   9:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
08-10 16:06:07.334 30852 30926 I LLama-android: llama_model_loader: - kv  10:                          general.file_type u32              = 2
08-10 16:06:07.334 30852 30926 I LLama-android: llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = gpt2
08-10 16:06:07.334 30852 30926 I LLama-android: llama_model_loader: - kv  12:                         tokenizer.ggml.pre str              = qwen2
08-10 16:06:07.362 30852 30926 I LLama-android: llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
08-10 16:06:07.371 30852 30926 I LLama-android: llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
08-10 16:06:07.402 30852 30926 I LLama-android: llama_model_loader: - kv  15:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
08-10 16:06:07.402 30852 30926 I LLama-android: llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 151643
08-10 16:06:07.402 30852 30926 I LLama-android: llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 151645
08-10 16:06:07.402 30852 30926 I LLama-android: llama_model_loader: - kv  18:                    tokenizer.chat_template str              = {% for message in messages %}{% if lo...
08-10 16:06:07.402 30852 30926 I LLama-android: llama_model_loader: - kv  19:               general.quantization_version u32              = 2
08-10 16:06:07.402 30852 30926 I LLama-android: llama_model_loader: - type  f32:  121 tensors
08-10 16:06:07.402 30852 30926 I LLama-android: llama_model_loader: - type q4_0:  168 tensors
08-10 16:06:07.402 30852 30926 I LLama-android: llama_model_loader: - type q8_0:    1 tensors
08-10 16:06:07.562 30852 30926 I LLama-android: llm_load_vocab: special tokens cache size = 293
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_vocab: token to piece cache size = 0.9338 MB
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: format           = GGUF V3 (latest)
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: arch             = qwen2
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: vocab type       = BPE
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: n_vocab          = 151936
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: n_merges         = 151387
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: vocab_only       = 0
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: n_ctx_train      = 32768
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: n_embd           = 896
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: n_layer          = 24
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: n_head           = 14
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: n_head_kv        = 2
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: n_rot            = 64
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: n_swa            = 0
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: n_embd_head_k    = 64
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: n_embd_head_v    = 64
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: n_gqa            = 7
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: n_embd_k_gqa     = 128
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: n_embd_v_gqa     = 128
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: f_norm_eps       = 0.0e+00
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: f_logit_scale    = 0.0e+00
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: n_ff             = 4864
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: n_expert         = 0
08-10 16:06:07.616 30852 30926 I LLama-android: llm_load_print_meta: n_expert_used    = 0
08-10 16:06:07.617 30852 30926 I LLama-android: llm_load_print_meta: causal attn      = 1
08-10 16:06:07.617 30852 30926 I LLama-android: llm_load_print_meta: pooling type     = 0
08-10 16:06:07.617 30852 30926 I LLama-android: llm_load_print_meta: rope type        = 2
08-10 16:06:07.617 30852 30926 I LLama-android: llm_load_print_meta: rope scaling     = linear
08-10 16:06:07.617 30852 30926 I LLama-android: llm_load_print_meta: freq_base_train  = 1000000.0
08-10 16:06:07.617 30852 30926 I LLama-android: llm_load_print_meta: freq_scale_train = 1
08-10 16:06:07.617 30852 30926 I LLama-android: llm_load_print_meta: n_ctx_orig_yarn  = 32768
08-10 16:06:07.617 30852 30926 I LLama-android: llm_load_print_meta: rope_finetuned   = unknown
08-10 16:06:07.617 30852 30926 I LLama-android: llm_load_print_meta: ssm_d_conv       = 0
08-10 16:06:07.617 30852 30926 I LLama-android: llm_load_print_meta: ssm_d_inner      = 0
08-10 16:06:07.617 30852 30926 I LLama-android: llm_load_print_meta: ssm_d_state      = 0
08-10 16:06:07.617 30852 30926 I LLama-android: llm_load_print_meta: ssm_dt_rank      = 0
08-10 16:06:07.618 30852 30926 I LLama-android: llm_load_print_meta: model type       = 1B
08-10 16:06:07.618 30852 30926 I LLama-android: llm_load_print_meta: model ftype      = Q4_0
08-10 16:06:07.618 30852 30926 I LLama-android: llm_load_print_meta: model params     = 494.03 M
08-10 16:06:07.618 30852 30926 I LLama-android: llm_load_print_meta: model size       = 330.17 MiB (5.61 BPW) 
08-10 16:06:07.618 30852 30926 I LLama-android: llm_load_print_meta: general.name     = ai_model
08-10 16:06:07.618 30852 30926 I LLama-android: llm_load_print_meta: BOS token        = 151643 '<|endoftext|>'
08-10 16:06:07.618 30852 30926 I LLama-android: llm_load_print_meta: EOS token        = 151645 '<|im_end|>'
08-10 16:06:07.618 30852 30926 I LLama-android: llm_load_print_meta: LF token         = 148848 'ÄĬ'
08-10 16:06:07.618 30852 30926 I LLama-android: llm_load_print_meta: EOT token        = 151645 '<|im_end|>'
08-10 16:06:07.618 30852 30926 I LLama-android: llm_load_print_meta: max token length = 256
08-10 16:06:07.624 30852 30926 D vulkan  : searching for layers in '/data/app/~~OvYsMz18c3DQFfK8i-sPtQ==/com.set.ai-gU7EJsFpEOK5rgbEU08wQw==/lib/arm64'
08-10 16:06:07.624 30852 30926 D vulkan  : searching for layers in '/data/app/~~OvYsMz18c3DQFfK8i-sPtQ==/com.set.ai-gU7EJsFpEOK5rgbEU08wQw==/base.apk!/lib/arm64-v8a'
08-10 16:06:07.627 30852 30926 W Adreno-AppProfiles: Could not find QSPM HAL service. Skipping adreno profile processing.
08-10 16:06:07.627 30852 30926 I AdrenoVK-0: ===== BEGIN DUMP OF OVERRIDDEN SETTINGS =====
08-10 16:06:07.627 30852 30926 I AdrenoVK-0: ===== END DUMP OF OVERRIDDEN SETTINGS =====
08-10 16:06:07.628 30852 30926 I AdrenoVK-0: QUALCOMM build          : d44197479c, I2991b7e11e
08-10 16:06:07.628 30852 30926 I AdrenoVK-0: Build Date              : 05/31/23
08-10 16:06:07.628 30852 30926 I AdrenoVK-0: Shader Compiler Version : E031.41.03.36
08-10 16:06:07.628 30852 30926 I AdrenoVK-0: Local Branch            : 
08-10 16:06:07.628 30852 30926 I AdrenoVK-0: Remote Branch           : 
08-10 16:06:07.628 30852 30926 I AdrenoVK-0: Remote Branch           : 
08-10 16:06:07.628 30852 30926 I AdrenoVK-0: Reconstruct Branch      : 
08-10 16:06:07.628 30852 30926 I AdrenoVK-0: Build Config            : S P 14.1.4 AArch64
08-10 16:06:07.628 30852 30926 I AdrenoVK-0: Driver Path             : /vendor/lib64/hw/vulkan.adreno.so
08-10 16:06:07.628 30852 30926 I AdrenoVK-0: Driver Version          : 0676.32
08-10 16:06:07.628 30852 30926 I AdrenoVK-0: PFP                     : 0x01740158
08-10 16:06:07.628 30852 30926 I AdrenoVK-0: ME                      : 0x00000000
08-10 16:06:07.628 30852 30926 I AdrenoVK-0: Application Name    : ggml-vulkan
08-10 16:06:07.628 30852 30926 I AdrenoVK-0: Application Version : 0x00000001
08-10 16:06:07.628 30852 30926 I AdrenoVK-0: Engine Name         : (null)
08-10 16:06:07.628 30852 30926 I AdrenoVK-0: Engine Version      : 0x00000000
08-10 16:06:07.628 30852 30926 I AdrenoVK-0: Api Version         : 0x00402000
08-10 16:06:09.099 30852 30926 I AdrenoVK-0: Failed to link shaders.
08-10 16:06:09.099 30852 30926 I AdrenoVK-0: Pipeline create failed
08-10 16:06:09.108 30852 30926 E LLama-android: llama_model_load: error loading model: vk::Device::createComputePipeline: ErrorUnknown
08-10 16:06:09.108 30852 30926 E LLama-android: llama_load_model_from_file: failed to load model
08-10 16:06:09.132 30852 30926 E LLama-android: llama_new_context_with_model: model cannot be NULL
08-10 16:06:09.132 30852 30926 F libc    : exiting due to SIG_DFL handler for signal 11, ucontext 0x7317ea5e20

Metadata

Assignees

No one assigned

    Labels

    bug-unconfirmed · critical severity (used to report critical severity bugs in llama.cpp, e.g. crashing, corruption, data loss) · stale
