Qualcomm AI Engine Direct - Add QNN support for to_edge_transform_and_lower #9643


Merged

Conversation

shewu-quic
Collaborator

Summary:

  • Support to_edge_transform_and_lower

    • Replace capture_program with new API to_edge_transform_and_lower_to_qnn
    • Replace capture_program with to_edge_transform_and_lower_to_qnn for unit_test
    • Replace capture_program with to_edge_transform_and_lower_to_qnn for examples
    • Replace capture_program with to_edge_transform_and_lower_to_qnn for llama
  • Add QnnPassManager to manage all passes in different stages

    • Deprecate _transform in export_llama_lib in favor of qnn_pass_manager
    • Add transform_for_export_pipeline for LiftConstantScalarOperands to avoid creating temporary tensors in the operation builder. Note that this pass creates a get_attr node, which must be converted into a lifted tensor constant by lift_constant_tensor_pass. If placed in to_edge_transform_passes, it would run after lift_constant_tensor_pass, and the operation builder would then fail to retrieve the parameter via get_parameter for the get_attr node.
  • Refactor the passes

    • Fix an output dtype mismatch at runtime after building quant IO
    • Combine constant_i64_to_i32 and tensor_i64_to_i32 into i64_to_i32
    • Replace convert_to_linear pass with fixed_linear_keep_dim pass
      • Since the QNN linear op does not keep dims, squeeze and unsqueeze nodes need to be added around the linear node
    • Add TagQuantIO pass to tag io nodes to avoid inserting q/dq in qnn_preprocess
    • Add prelu, leaky_relu, linear, rms_norm into decompose_table
      • Remove recompose_prelu.py
    • Remove unused variables in insert_requantize.py and replace_index_put_input.py
  • Support aten.split_with_sizes_copy.default

  • Support leaky_relu with inplace=True
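
The stage-based pass management described above can be sketched in plain Python. This is a minimal illustration of the idea only, not the actual QnnPassManager API; the class, stage, and pass names below are hypothetical:

```python
# Minimal sketch of a stage-based pass manager, loosely mirroring the
# QnnPassManager idea described above. All names are illustrative.

class PassManager:
    """Registers graph passes per pipeline stage and runs them in order."""

    def __init__(self):
        self._stages = {}  # stage name -> ordered list of pass functions

    def register(self, stage, pass_fn):
        self._stages.setdefault(stage, []).append(pass_fn)
        return pass_fn

    def run(self, stage, graph):
        # Apply every pass registered for this stage, in registration order.
        for pass_fn in self._stages.get(stage, []):
            graph = pass_fn(graph)
        return graph


pm = PassManager()

# Ordering matters: a scalar-lifting pass must run in the export stage,
# before constant lifting happens in the to_edge stage (echoing why
# LiftConstantScalarOperands cannot live in to_edge_transform_passes).
pm.register("export", lambda g: g + ["lift_scalar_operands"])
pm.register("to_edge", lambda g: g + ["lift_constant_tensors"])
pm.register("to_edge", lambda g: g + ["i64_to_i32"])

graph = pm.run("to_edge", pm.run("export", []))
print(graph)  # ['lift_scalar_operands', 'lift_constant_tensors', 'i64_to_i32']
```

Centralizing passes in one manager keeps the stage ordering explicit, which is the main benefit over scattering transform calls across the export code.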


pytorch-bot bot commented Mar 26, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9643

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3732242 with merge base 1ea101e (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 26, 2025
@shewu-quic
Collaborator Author

Hi @cccclai ,
This PR is to support the API to_edge_transform_and_lower and refactor the pass management.
Could you please help take a look?

Thanks!

@cccclai
Contributor

cccclai commented Mar 26, 2025

Hi, thank you so much for adding support for to_edge_transform_and_lower; we will merge it soon. One question: it looks like there are a lot of changes in this refactor. Is there any feedback for us on how to reduce the engineering work and make your lives easier?

@shewu-quic
Collaborator Author

Hi, thank you so much for adding support for to_edge_transform_and_lower; we will merge it soon. One question: it looks like there are a lot of changes in this refactor. Is there any feedback for us on how to reduce the engineering work and make your lives easier?

Thank you for your effort. We are trying to align closely with the official API rather than relying on the wrapper API to_edge_transform_and_lower_to_qnn. Therefore, we have revisited our passes and are attempting to either remove them or move them into the QNN preprocess or QNN partitioner. I have the following points:

  1. Is there an official way to perform decomposition, such as custom decomposition, exception lists, or something similar?
    We currently have some ops that need to be decomposed into a group of nodes before partitioning so they can be delegated to QNN, such as DecomposeScaledDotProductAttention, DecomposeLinalgVectorNorm, and DecomposeAny.

  2. Changes to the graph are not allowed in qnn_partitioner.
    We would like to move some passes to qnn_partitioner to avoid the wrapper API, but there are some passes that insert or remove nodes in the graph, such as decomposition passes and i64_to_i32.

  3. Can I get the edge graph module in the new API?
    Our debugger might need to compare it with the graph module before to_backend.

  4. Is it possible to move lift_constant_tensor_pass after edge_manager.transform(transform_passes)?
    In the LiftConstantScalarOperands pass, we create a getattr node to lift scalars. However, if we apply this pass after to_edge, we will get a getattr node in qnn_partitioner, and it seems to miss the constant value in qnn_preprocess. From my investigation, lift_constant_tensor_pass in to_edge converts the get_attr node to a placeholder node and adds the constant into the state_dict in exported_program. Therefore, we need to call LiftConstantScalarOperands before that. Alternatively, is there another way to lift scalars instead of using the getattr node?
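
For point 1, a custom decomposition hook can be sketched as a plain dispatch from operator names to rewrite functions applied before partitioning. This only illustrates the concept being asked about; the op names, decompositions, and helper names below are hypothetical, not an actual partitioner API:

```python
# Illustrative pre-partition decomposition table: each entry rewrites one
# composite op into simpler ops the backend can delegate. All names are
# hypothetical; graphs are modeled as flat lists of op names for brevity.

def decompose_prelu(op):
    # prelu(x, w) == max(0, x) + w * min(0, x)
    return ["maximum", "minimum", "mul", "add"]

def decompose_rms_norm(op):
    # rms_norm(x) == x / sqrt(mean(x^2) + eps) * weight
    return ["pow", "mean", "add", "rsqrt", "mul", "mul"]

DECOMPOSE_TABLE = {
    "aten.prelu.default": decompose_prelu,
    "aten.rms_norm.default": decompose_rms_norm,
}

def decompose_graph(graph):
    """Replace each op that has a table entry with its decomposition."""
    out = []
    for op in graph:
        if op in DECOMPOSE_TABLE:
            out.extend(DECOMPOSE_TABLE[op](op))
        else:
            out.append(op)
    return out

print(decompose_graph(["aten.add.Tensor", "aten.prelu.default"]))
# ['aten.add.Tensor', 'maximum', 'minimum', 'mul', 'add']
```

An official hook of roughly this shape (an exception list plus per-op rewrites, applied before the partitioner runs) would let a backend decide per op whether to delegate the composite form or its decomposition.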

@shewu-quic
Collaborator Author

It seems there are some conflicts. I will rebase this PR ASAP.

@shewu-quic shewu-quic force-pushed the dev1/hutton/qnn-partitioner-update branch from 7b367bd to db112a0 Compare March 27, 2025 17:34
@shewu-quic shewu-quic force-pushed the dev1/hutton/qnn-partitioner-update branch from db112a0 to 3732242 Compare March 27, 2025 17:39
@shewu-quic
Collaborator Author

shewu-quic commented Mar 27, 2025

I have rebased the branch. Additionally, I tested static_llama with story llama and confirmed that we get the same results before and after this PR.

INFO:root:Results[0]:
Once upon a time, there was a little girl named Lily. She loved to play with her toys and her favorite toy was a big, red ball. One day, Lily's mom asked her to help her with the laundry. Lily was happy to help and she put all the clothes in the washing machine. 
After the clothes were washed, Lily's mom asked her to help her hang them up to dry. Lily saw a big, black iron on the counter and asked her mom what it was for. Her mom explained that it was used to make clothes smooth

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cccclai
Contributor

cccclai commented Mar 27, 2025

Thank you for the detailed notes! We'll work on those.

@cccclai cccclai added the release notes: qualcomm Changes to the Qualcomm backend delegate label Mar 27, 2025
Contributor

@cccclai cccclai left a comment


Looks good. There are some internal errors and I'll send a forward fix.

@cccclai cccclai merged commit 2f408dd into pytorch:main Apr 2, 2025
87 of 90 checks passed
cccclai added a commit to cccclai/executorch-1 that referenced this pull request Apr 3, 2025
Summary: forward fix for pytorch#9643

Reviewed By: kirklandsign

Differential Revision: D72353830
@cccclai cccclai mentioned this pull request Apr 3, 2025
cccclai added a commit to cccclai/executorch-1 that referenced this pull request Apr 3, 2025
Summary:
Pull Request resolved: pytorch#9864

forward fix for pytorch#9643

Reviewed By: kirklandsign

Differential Revision: D72353830
@cccclai
Contributor

cccclai commented Apr 3, 2025

Hello! It looks like there are some CI failures from this PR: https://hud.pytorch.org/pytorch/executorch/commit/2f408dd79d9656c8bfb90b1e8fd990ed326ea36f. Can you take a look? These are trunk jobs (which take longer to run), so they weren't triggered right away on the PR.

cccclai added a commit to cccclai/executorch-1 that referenced this pull request Apr 4, 2025
Summary: As title, it's broken in pytorch#9643

Differential Revision: D72472098
cccclai added a commit to cccclai/executorch-1 that referenced this pull request Apr 4, 2025
Summary: As title, it's broken in pytorch#9643

Differential Revision: D72472098
cccclai added a commit to cccclai/executorch-1 that referenced this pull request Apr 4, 2025
Summary:

As title, it's broken in pytorch#9643

Differential Revision: D72472098
cccclai added a commit to cccclai/executorch-1 that referenced this pull request Apr 5, 2025
Summary:

As title, it's broken in pytorch#9643

Differential Revision: D72472098
shewu-quic added a commit to CodeLinaro/executorch that referenced this pull request Apr 7, 2025
cccclai pushed a commit that referenced this pull request Apr 7, 2025
kirklandsign pushed a commit that referenced this pull request Apr 11, 2025
kirklandsign pushed a commit that referenced this pull request Apr 11, 2025
keyprocedure pushed a commit to keyprocedure/executorch that referenced this pull request Apr 21, 2025