Qualcomm AI Engine Direct - Add QNN support for to_edge_transform_and_lower #9643
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9643
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit 3732242 with merge base 1ea101e. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @cccclai, Thanks!
Hi, thank you so much for adding the support for `to_edge_transform_and_lower`!
Thank you for your effort. We are trying to align closely with the official API instead of calling the wrapper API `to_edge_transform_and_lower_to_qnn`. Therefore, we have revisited our passes and are attempting to either remove them or move them into the QNN preprocess or QNN partitioner. I have the following points:
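The `QnnPassManager` this PR introduces, which runs different sets of passes at different pipeline stages, can be illustrated with a simplified, hypothetical sketch. The class and stage names below are illustrative only; the real pass manager operates on ExportedProgram graphs, not strings:

```python
# Minimal, hypothetical sketch of a stage-aware pass manager.
# Real graph passes transform an ExportedProgram; here toy string
# transforms stand in for them so the example is self-contained.
from collections import defaultdict


class SimplePassManager:
    def __init__(self):
        # stage name -> ordered list of passes for that stage
        self._passes = defaultdict(list)

    def register(self, stage, pass_fn):
        self._passes[stage].append(pass_fn)

    def transform(self, stage, module):
        # Apply each pass registered for the stage, in order.
        for pass_fn in self._passes[stage]:
            module = pass_fn(module)
        return module


# Usage: stages mirror the idea of export-time vs. to_edge-time passes.
pm = SimplePassManager()
pm.register("export", lambda m: m + ":lift_constants")
pm.register("to_edge", lambda m: m + ":i64_to_i32")
result = pm.transform("export", "graph")  # → "graph:lift_constants"
```

Keeping passes grouped by stage is what lets export-only transforms (like lifting constant scalars) run before graph-lifting steps, as discussed later in this thread.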
It seems there are some conflicts. I will rebase this PR ASAP.
Force-pushed from 7b367bd to db112a0
Qualcomm AI Engine Direct - Add QNN support for to_edge_transform_and_lower

Summary:
- Support `to_edge_transform_and_lower`
- Replace capture_program with the new API `to_edge_transform_and_lower_to_qnn`
- Replace capture_program with to_edge_transform_and_lower_to_qnn for unit_test
- Replace capture_program with to_edge_transform_and_lower_to_qnn for examples
- Replace capture_program with to_edge_transform_and_lower_to_qnn for llama
- Add QnnPassManager to manage all passes in different stages
- Deprecate _transform in export_llama_lib in favor of qnn_pass_manager
- Add transform_for_export_pipeline for LiftConstantScalarOperands to avoid creating temporary tensors in the operation builder. However, this pass will create a get_attr node, which should be converted into a lifted tensor constant by the lift_constant_tensor_pass. If placed in the to_edge_transform_passes, it would run after the lift_constant_tensor_pass, causing the operation builder to fail to retrieve the parameter via get_parameter for the get_attr node.
- Refactor the passes
- Fix the output dtype mismatch at runtime after build quant io
- Combine constant_i64_to_i32 and tensor_i64_to_i32 into i64_to_i32
- Replace the convert_to_linear pass with the fixed_linear_keep_dim pass: since QNN has no keep-dims for the linear op, we need to add squeeze and unsqueeze around the linear node
- Add TagQuantIO pass to tag io nodes to avoid inserting q/dq in qnn_preprocess
- Add prelu, leaky_relu, linear, rms_norm into decompose_table
- Remove recompose_prelu.py
- Remove unused variables in insert_requantize.py and replace_index_put_input.py
- Support aten.split_with_sizes_copy.default
- Support leaky_relu with inplace=True
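The fixed_linear_keep_dim item above can be illustrated with plain shape arithmetic: when a backend's fully-connected op only accepts rank-2 inputs, a higher-rank input to `linear` is squeezed down to 2-D for the matmul and unsqueezed back afterwards. The function below is an illustrative pure-Python sketch of that shape bookkeeping, not the actual pass:

```python
def linear_keep_dim_shapes(input_shape, out_features):
    """Illustrate the reshapes inserted around a linear node when the
    backend only supports rank-2 inputs (as described for QNN).

    input_shape: e.g. (batch, seq, in_features)
    Returns (squeezed_shape, restored_shape).
    """
    *leading, in_features = input_shape
    # "Squeeze": fold all leading dims into one batch dim for a 2-D matmul.
    flat = 1
    for d in leading:
        flat *= d
    squeezed_shape = (flat, in_features)
    # "Unsqueeze": after the matmul, restore the original leading dims,
    # with the feature dim replaced by the linear's output features.
    restored_shape = (*leading, out_features)
    return squeezed_shape, restored_shape


# A (2, 8, 16) input through a linear with 32 output features:
print(linear_keep_dim_shapes((2, 8, 16), 32))  # → ((16, 16), (2, 8, 32))
```

The pass itself would insert the corresponding squeeze/unsqueeze (reshape) nodes around each linear node in the graph; this sketch only shows why the shapes round-trip correctly.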
Force-pushed from db112a0 to 3732242
I have rebased the branch. Additionally, I tested static_llama with story llama and confirmed that we get the same results both before and after this PR.
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Thank you for the detailed notes! We'll work on those. |
Looks good. There are some internal errors and I'll send a forward fix.
Summary: Forward fix for pytorch#9643. Pull Request resolved: pytorch#9864. Reviewed By: kirklandsign. Differential Revision: D72353830
hello! Looks like there are some CI failures from this PR https://hud.pytorch.org/pytorch/executorch/commit/2f408dd79d9656c8bfb90b1e8fd990ed326ea36f, can you take a look? These are trunk jobs (longer time to run), so they weren't triggered by the PR right away.
Summary: As title, it's broken in pytorch#9643 Differential Revision: D72472098