Add vectorization in elementwise_util #9432
base: gh/swolchok/439/head
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9432
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit c0ad8ac with merge base 45008d6.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Overall this makes sense, but I left some questions around why we need the can_use_vectorized-based approach.
template <typename T>                                                    \
auto func_name(at::vec::Vectorized<T> vec) {                             \
  if constexpr (!::executorch::runtime::is_floating_point<T>::value) {   \
    return at::vec::convert<float>(vec).func_name();                     \
Is this a valid thing to do? That is, convert (say) an instance of at::vec::Vectorized<int8_t> to float and apply func_name? Maybe I am misunderstanding how this works.
Hmm, if I'm reading https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/cpu/vec/vec_convert.h correctly, this is not quite correct. Good catch! (It will silently drop the int8_t elements that don't fit in the float vector; the right thing to do is convert to VectorizedN, do the computation, and then convert back.) Better make sure we have tests here.
this should be fixed in the latest update.
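For readers following the thread, here is a conceptual sketch of the issue and the fix (plain C++ with std::array standing in for SIMD registers; this is not the actual at::vec API): a register holds more int8_t lanes than float lanes, so a one-to-one convert drops the tail lanes, while the widen/compute/narrow pattern covers all of them.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

constexpr int kBytes = 32;                       // e.g. one AVX2 register
using Int8Vec = std::array<int8_t, kBytes>;      // 32 int8_t lanes
using FloatVec = std::array<float, kBytes / 4>;  // only 8 float lanes fit

// Lossy: a single 8-lane float vector can only hold the first 8 of the
// 32 int8_t lanes; the remaining 24 lanes are silently dropped.
inline FloatVec convert_lossy(const Int8Vec& v) {
  FloatVec out{};
  for (int i = 0; i < static_cast<int>(out.size()); ++i) {
    out[i] = static_cast<float>(v[i]);
  }
  return out;
}

// The fix described above: widen into a group of float vectors (a
// VectorizedN-style group), apply the float function to every lane,
// then narrow back so no lane is lost.
template <typename FloatFn>
inline Int8Vec widen_compute_narrow(const Int8Vec& v, FloatFn fn) {
  std::array<FloatVec, 4> wide{};  // 4 * 8 float lanes == 32 int8_t lanes
  for (int i = 0; i < kBytes; ++i) {
    wide[i / 8][i % 8] = static_cast<float>(v[i]);
  }
  for (auto& fv : wide) {
    for (auto& x : fv) {
      x = fn(x);
    }
  }
  Int8Vec out{};
  for (int i = 0; i < kBytes; ++i) {
    out[i] = static_cast<int8_t>(wide[i / 8][i % 8]);
  }
  return out;
}

int main() {
  Int8Vec v{};
  for (int i = 0; i < kBytes; ++i) v[i] = static_cast<int8_t>(i - 16);
  Int8Vec out = widen_compute_narrow(v, [](float x) { return std::abs(x); });
  return out[31] == 15 ? 0 : 1;  // the last lane is preserved, unlike convert_lossy
}
```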
 */
#define ET_INTERNAL_VECTORIZED_FLOAT_UNARY_FUNC(func_name) \
  namespace executorch {                                    \
  inline namespace math {                                   \
What does `inline` on a namespace do?
In short, it makes the namespace mostly transparent: in most circumstances its members can be used as if they were declared directly in the enclosing namespace. See https://en.cppreference.com/w/cpp/language/namespace
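A minimal standalone illustration of `inline namespace` (the `half` helper is made up for the example and is not part of ExecuTorch): members of an inline namespace are reachable through the enclosing namespace.

```cpp
namespace executorch {
inline namespace math {
// Hypothetical helper for illustration only.
inline float half(float x) {
  return x * 0.5f;
}
} // namespace math
} // namespace executorch

int main() {
  // Both spellings refer to the same function because `math` is inline.
  float a = executorch::math::half(2.0f);
  float b = executorch::half(2.0f);
  return (a == b) ? 0 : 1;
}
```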
 * corresponding operator is a "float op" in TensorIterator parlance
 * (i.e., uses something like build_borrowing_binary_float_op()),
I don't know if anyone reading this code would understand what this means.
But this provides an answer to my earlier question.
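For readers unfamiliar with the TensorIterator term: a "float op" computes in floating point even when its inputs are integral (for example, sigmoid of an int tensor produces a float tensor). A rough sketch of that dtype rule, with made-up names:

```cpp
// Hypothetical sketch of the "float op" dtype rule: integral/bool inputs are
// promoted so the op computes (and typically outputs) in floating point.
enum class ScalarType { Float, Double, Int, Long, Bool };

inline bool is_floating_point(ScalarType t) {
  return t == ScalarType::Float || t == ScalarType::Double;
}

inline ScalarType float_op_compute_type(ScalarType common_input_type) {
  return is_floating_point(common_input_type) ? common_input_type
                                              : ScalarType::Float;
}

int main() {
  // e.g. atan2(int, int) computes in float; float/double inputs stay as-is.
  return float_op_compute_type(ScalarType::Int) == ScalarType::Float ? 0 : 1;
}
```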
kernels/portable/cpu/op_where.cpp
@@ -47,7 +47,7 @@ Tensor& where_out(
       CTYPE_COMPUTE,
       op_name,
       utils::SupportedTensorDtypes::SAME_AS_COMMON>(
-      [](const auto val_a, const auto val_b, const auto val_c) {
+      [](const CTYPE_COMPUTE val_a, const CTYPE_COMPUTE val_b, const CTYPE_COMPUTE val_c) {
Is this an unrelated change?
No, it's related: we can't vectorize this op, so spelling out CTYPE_COMPUTE keeps its lambda off the vectorized path.
@@ -51,6 +56,34 @@ inline int64_t scalar_to<int64_t>(const Scalar& s) {
 }
 
+namespace internal {
+template <typename Ignore, typename T>
+using ignore_first_yield_second = T;
I like these names
if constexpr (std::is_invocable_v<
                  Op,
                  ignore_first_yield_second<Args, Vec>...>) {
  // For bool, we will get a false positive if we rely on only the
you mean bool return type?
A bool CTYPE_COMPUTE.
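To make the detection trick concrete, here is a simplified, self-contained sketch (names like FakeVec and can_use_vectorized_v are made up; this is not the actual source) of how ignore_first_yield_second plus std::is_invocable_v lets the framework ask "can this scalar lambda also be called with Vectorized arguments?":

```cpp
#include <type_traits>

template <typename Ignore, typename T>
using ignore_first_yield_second = T;

struct FakeVec {  // stand-in for at::vec::Vectorized<CTYPE_COMPUTE>
  FakeVec operator+(FakeVec) const { return {}; }
};

// Args... are the scalar argument types the lambda was written against; we ask
// whether the same callable can also be invoked with one FakeVec per argument.
template <typename Op, typename... Args>
constexpr bool can_use_vectorized_v =
    std::is_invocable_v<Op, ignore_first_yield_second<Args, FakeVec>...>;

int main() {
  auto generic = [](auto a, auto b) { return a + b; };        // works for FakeVec too
  auto scalar_only = [](float a, float b) { return a + b; };  // floats only
  static_assert(can_use_vectorized_v<decltype(generic), float, float>);
  static_assert(!can_use_vectorized_v<decltype(scalar_only), float, float>);
  return 0;
}
```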
((inputs.first->scalar_type() ==
  CppTypeToScalarType<CTYPE_COMPUTE>::value) &&
 ...));
Man, this is a good reminder of all the template metaprogramming magic.
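As a refresher on the fold-expression pattern used above, a tiny standalone example (FakeTensor and the enum are hypothetical) that checks a property across every element of a parameter pack:

```cpp
#include <cstdio>

enum class ScalarType { Float, Int, Bool };

struct FakeTensor {
  ScalarType dtype;
  ScalarType scalar_type() const { return dtype; }
};

template <typename... Inputs>
bool all_inputs_are_float(const Inputs&... inputs) {
  // Fold over &&: expands to
  // (inputs1.scalar_type() == Float) && (inputs2.scalar_type() == Float) && ...
  return ((inputs.scalar_type() == ScalarType::Float) && ...);
}

int main() {
  FakeTensor a{ScalarType::Float}, b{ScalarType::Float}, c{ScalarType::Int};
  std::printf("%d %d\n", all_inputs_are_float(a, b), all_inputs_are_float(a, b, c));  // 1 0
  return 0;
}
```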
    ...);
if (!any_is_broadcasted) {
  using Vec = at::vec::Vectorized<CTYPE_COMPUTE>;
  ::executorch::extension::parallel_for(
I think doing this blindly for each op is a bit risky in that not all multithreading is always better: some ops benefit from a smaller grain size while others from a larger one.
IIRC xnnpack blindly parallelizes absolutely everything; we're doing strictly better here
Yeah, I am not comparing with xnnpack. In fact, one big part of the reason we ended up leveraging the optimized op lib for some of the llama stuff was exactly that: it blindly parallelized everything, and that actually hurt perf.
const auto vectorized_begin =
    begin + (Vec::size() - begin % Vec::size()) % Vec::size();
Feels like something that has a good chance of hiding a bug. I hope we test this enough; I doubt our existing test cases will exercise this code path.
Although I do see you treat the scalar leftovers at both the head and the tail separately.
Hope we test this enough.
Good point; adding tests for each affected op with lengths sufficient to hit the vectorized path.
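To make the head/tail arithmetic easy to check, here is a small standalone sketch (kVecSize is a made-up stand-in for Vec::size()) of the same rounding expressions; a test can sweep many (begin, end) pairs against exactly this logic:

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t kVecSize = 8;  // stand-in for Vec::size()

int64_t round_up(int64_t begin) {
  // (kVecSize - begin % kVecSize) % kVecSize is 0 when begin is already aligned.
  return begin + (kVecSize - begin % kVecSize) % kVecSize;
}

int64_t round_down(int64_t end) {
  return end - end % kVecSize;
}

int main() {
  assert(round_up(0) == 0);      // already aligned: no scalar head
  assert(round_up(3) == 8);      // elements [3, 8) handled as the scalar head
  assert(round_up(8) == 8);
  assert(round_down(21) == 16);  // elements [16, 21) handled as the scalar tail
  // The body [round_up(begin), round_down(end)) is processed Vec::size() at a time.
  return 0;
}
```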
        inputs.first->sizes(), out.sizes()) &&
    ...);
if (!any_is_broadcasted) {
  using Vec = at::vec::Vectorized<CTYPE_COMPUTE>;
So the one point of contention for me is why we need vectorized_math.h, which largely just trampolines to the underlying Vectorized methods. Mainly, you don't even need to use can_use_vectorized, because on non-accelerated platforms Vectorized falls back to a scalar impl even if Vec::size() != …. Maybe you said that the generated code would be worse if we forced Vectorized, but I am not sure why. The rest makes sense.
However, a place where I can potentially see that being useful is for a dtype that Vectorized doesn't support; but for float I am not sure. So it would help if you could clarify that.
why do we need vectorized_math.h which largely is doing trampoline to underlying vectorized methods
Without it, you can't take the same lambda you already wrote for scalars and reuse it for Vectorized. (The change isn't zero, because you have to point at executorch::math, but crucially it doesn't require separate code.)
OK, we synced offline. The major value prop as I understood it: you can write your lambdas without having to explicitly use Vectorized. Why might explicitly using Vectorized not be good? Because Vectorized may not have everything you need to implement your lambda. So as an op author you don't have to worry about Vectorized when writing your lambda (although you do have to use the executorch::math ops). And later, if Vectorized adds support for the executorch::math ops you used, you get vectorization for free without having to go back and rewrite your lambda with Vectorized. So this is nice.
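A tiny sketch of that value proposition (mymath and FakeVec are made-up stand-ins for executorch::math and at::vec::Vectorized, and the "vector" has a single lane just to keep the example short): the op author writes one generic lambda against the math trampolines, and it works for scalars now and for the vector type wherever an overload exists.

```cpp
#include <cmath>

struct FakeVec {  // stand-in for at::vec::Vectorized<float>
  float lane;     // pretend single-lane "vector" to keep the sketch tiny
};

namespace mymath {  // stand-in for executorch::math
inline float exp(float x) { return std::exp(x); }
inline FakeVec exp(FakeVec v) { return FakeVec{std::exp(v.lane)}; }
}  // namespace mymath

int main() {
  // The op author writes this lambda once, against mymath:: functions.
  auto op = [](auto x) { return mymath::exp(x); };

  float scalar_result = op(1.0f);             // scalar path
  FakeVec vector_result = op(FakeVec{1.0f});  // vectorized path, for free
  return (scalar_result == vector_result.lane) ? 0 : 1;
}
```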
Mixed dtype should be uncommon. Here is how we can specialize for the common case. Prepares us to tackle #9241.

Test Plan: automated tests on this PR verify we didn't break the now-deprecated runtime_out_dtypes mode; tests on the next PR will verify that everything works after migration. Also included migration for exactly one operator, op_mul, to verify that the new code compiles.

To check performance, I edited examples/models/toy_model/model.py so that MulModule used inputs of size 3000, 2000 instead of 3, 2. I exported it with `python3 -m examples.portable.scripts.export --model_name mul` and saved the resulting `mul.pte`. Then I built in release mode with optimized kernels on, but with mul.out removed from kernels/optimized/optimized.yaml, so that we would use the optimized_portable_kernels build of kernels/portable/op_mul.cpp. Finally, I ran 3 trials on my M1 MacBook Pro using `cmake-out/executor_runner --model_path mul3kby2k.pte --num_executions 1000 --cpu_threads 2`.

Resulting times for 1000 iterations in ms:
Previous diff: 8295, 8187, 8139
This diff: 2953, 2806, 2861

(For comparison, the actual optimized mul kernel took around 1000 ms to run 1000 iterations, and #9432 later in the stack arrived at similar numbers.)
This is a first cut at #9241. In this PR I've vectorized a small initial set of ops: atan2, clamp, fmod_Scalar, maximum, minimum, mul, pow, and sigmoid. In addition, the following ops should have gotten vectorized automatically because they already used generic lambdas: add, div, rsub, sub. I've left covering ops that use the unary_ufunc_* utilities in pattern.h for a follow-up push, because pattern.h and elementwise_util need some work before we can migrate pattern.h's utilities to be backed by elementwise_util.

This PR adds an interesting testing problem: in theory, all operators might need test cases long enough to tickle vectorization, because we might accidentally vectorize ops unexpectedly and break their lambdas due to unanticipated differences in semantics. I address this issue by using Vectorized for the scalar prologue/epilogue in debug mode (we run tests in both debug and release) so that we can detect broken lambdas. I additionally intentionally introduced a bug in the vectorized path in elementwise_util and manually verified that we saw test failures for each vectorized op called out above.