Hi @TerryT9, I encountered the same issue and I'm not sure how to resolve it. Could you share how you solved it?
Hi @Gianthard-cyh, would you mind saying which model you were using?
I'm using Llama-3.2-1B-Instruct-f16.gguf. By the way, I got the model running successfully after increasing the size limit of the MUL_MAT op and changing the NPU precision option (the Convert op fails to execute without this).
However, as stated in previous issues, the NPU backend achieves only around 1/3 of the CPU backend's performance, so I think more profiling and optimization work could be done. I'm happy to help with that.
My device is a OnePlus Ace 3 with a Snapdragon 8 Gen 2.
```diff
--- a/ggml/src/ggml-qnn/graph.cpp
+++ b/ggml/src/ggml-qnn/graph.cpp
@@ -192,8 +192,15 @@ qnn_graph::qnn_graph(const std::string &graph_name, QNNBackend device, std::shar
         graph_vtcm_config.option = QNN_GRAPH_CONFIG_OPTION_CUSTOM;
         graph_vtcm_config.customConfig = &vtcm_config;
 
+        QnnHtpGraph_CustomConfig_t precision_config;
+        precision_config.option = QNN_HTP_GRAPH_CONFIG_OPTION_PRECISION;
+        precision_config.precision = QNN_PRECISION_FLOAT16;
+        QnnGraph_Config_t graph_precision_config;
+        graph_precision_config.option = QNN_GRAPH_CONFIG_OPTION_CUSTOM;
+        graph_precision_config.customConfig = &precision_config;
+
         const QnnGraph_Config_t *graph_configs[] = {&graph_hvx_config, &graph_dlbc_config, &graph_vtcm_config,
-                                                    &graph_opt_config, nullptr};
+                                                    &graph_opt_config, &graph_precision_config, nullptr};
         error = qnn_interface->qnn_graph_create(qnn_context, graph_name.c_str(), graph_configs, &graph_handle);
     } else {
         error = qnn_interface->qnn_graph_create(qnn_context, graph_name.c_str(), nullptr, &graph_handle);
```
Originally posted by @Gianthard-cyh in #20
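For context, the quoted diff covers only the precision change; the MUL_MAT size-limit increase mentioned in the same comment is not shown. Below is a minimal, self-contained sketch of the kind of per-dimension gate such a limit usually implements, to make the effect of raising it easier to see. The function name and both thresholds are hypothetical and not taken from this fork's sources.

```cpp
// Sketch only: the real check lives in the ggml-qnn backend's op-support
// logic. The function name and both thresholds below are hypothetical,
// chosen to illustrate the mechanism, not copied from this repository.
#include <cstdint>
#include <cstdio>

// Gate a MUL_MAT of shape (m x k) * (k x n): every dimension must stay
// under the backend's per-dimension limit, otherwise the op is rejected
// and ggml falls back to the CPU for that node.
static bool mul_mat_within_limit(int64_t m, int64_t n, int64_t k, int64_t max_dim) {
    return m <= max_dim && n <= max_dim && k <= max_dim;
}

int main() {
    constexpr int64_t kOldLimit = 8192;    // hypothetical original limit
    constexpr int64_t kNewLimit = 131072;  // hypothetical raised limit

    // Llama-3.2-1B output projection: hidden size 2048, vocab size 128256.
    const int64_t m = 1, n = 128256, k = 2048;

    std::printf("old limit: %s\n", mul_mat_within_limit(m, n, k, kOldLimit) ? "offload to NPU" : "cpu fallback");
    std::printf("new limit: %s\n", mul_mat_within_limit(m, n, k, kNewLimit) ? "offload to NPU" : "cpu fallback");
    return 0;
}
```

Whatever the actual check in this fork looks like, the trade-off is the same: a higher limit keeps more matmuls on the HTP instead of bouncing them back to the CPU, at the cost of larger graphs on the NPU.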