[bug] the NPU backend achieves around 1/3 performance of CPU #34

@chraac

Hi @TerryT9, I encountered the same issue and I'm not sure how to resolve it. Could you share how you solved it?

Hi @Gianthard-cyh, would you mind sharing which model you were using?

I'm using Llama-3.2-1B-Instruct-f16.gguf. By the way, I ran the model successfully after increasing the size limit of the MUL_MAT op and changing the NPU precision option (without this change, the Convert op fails to execute).

However, as stated in previous issues, the NPU backend achieves only around 1/3 of the CPU's performance. I think more profiling and optimization work could be done, and I'm happy to help with that.

My device is a OnePlus Ace 3 with Snapdragon 8 Gen 2.

--- a/ggml/src/ggml-qnn/graph.cpp
+++ b/ggml/src/ggml-qnn/graph.cpp
@@ -192,8 +192,15 @@ qnn_graph::qnn_graph(const std::string &graph_name, QNNBackend device, std::shar
         graph_vtcm_config.option = QNN_GRAPH_CONFIG_OPTION_CUSTOM;
         graph_vtcm_config.customConfig = &vtcm_config;

+        QnnHtpGraph_CustomConfig_t precision_config;
+        precision_config.option = QNN_HTP_GRAPH_CONFIG_OPTION_PRECISION;
+        precision_config.precision = QNN_PRECISION_FLOAT16;
+        QnnGraph_Config_t graph_precision_config;
+        graph_precision_config.option = QNN_GRAPH_CONFIG_OPTION_CUSTOM;
+        graph_precision_config.customConfig = &precision_config;
+
         const QnnGraph_Config_t *graph_configs[] = {&graph_hvx_config, &graph_dlbc_config, &graph_vtcm_config,
-                                                    &graph_opt_config, nullptr};
+                                                    &graph_opt_config, &graph_precision_config, nullptr};
         error = qnn_interface->qnn_graph_create(qnn_context, graph_name.c_str(), graph_configs, &graph_handle);
     } else {
         error = qnn_interface->qnn_graph_create(qnn_context, graph_name.c_str(), nullptr, &graph_handle);

Originally posted by @Gianthard-cyh in #20
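
For readers unfamiliar with the pattern in the patch above: the graph configs are passed as an array of pointers terminated by `nullptr`, so adding the precision option means adding one more entry before the terminator. Below is a minimal, self-contained sketch of that pattern. The `Precision`, `CustomConfig`, and `GraphConfig` types and the `make_config_list` / `count_configs` helpers are hypothetical stand-ins made up for illustration; they are not the QNN SDK types or API, and the real structs have more fields.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the SDK structs (assumption, not the real QNN types).
enum class Precision { Default, Float16 };

struct CustomConfig {
    int option = 0;                            // which custom option this struct carries
    Precision precision = Precision::Default;  // value-initialized, never left indeterminate
};

struct GraphConfig {
    int option = 0;
    const void *custom_config = nullptr;       // points at a CustomConfig owned by the caller
};

// Builds a nullptr-terminated pointer list over `configs`.
// The pointers stay valid only while `configs` is alive and not resized.
std::vector<const GraphConfig *> make_config_list(const std::vector<GraphConfig> &configs) {
    std::vector<const GraphConfig *> list;
    list.reserve(configs.size() + 1);
    for (const auto &config : configs) {
        list.push_back(&config);
    }
    list.push_back(nullptr);  // terminator expected by C-style consumers
    return list;
}

// Counts entries the way a C-style consumer would: walk until the nullptr terminator.
std::size_t count_configs(const GraphConfig *const *list) {
    assert(list != nullptr);
    std::size_t count = 0;
    while (list[count] != nullptr) {
        ++count;
    }
    return count;
}
```

Building the list from a container, rather than a hand-written array literal, makes it harder to forget the terminator when a new option such as the precision config is appended. Note also that the sketch value-initializes every struct field, whereas the locals in the patch are declared without an initializer.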
