Skip to content

[partial-apply-combiner] Fix partial_apply combiner implementation for indirect parameters #29365

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

gottesmm
Copy link
Contributor

I do not believe that this code was ever correct and we just got lucky since in
the vast majority of cases where this optimization kicks in we also eliminate
the underlying partial apply/allocations.

To make sure that we can properly test this code and make sure that we do not
run into this problem again, I added an option to SILCombinerApplyVisitors that
disables partial apply deletion so we can verify that we do not rely on
eliminating the underlying partial apply for correctness.

rdar://problem/57855082


To be more specific, partial_apply expects that it will take all of its input
arguments at +1. This includes parameters that are passed @in or @in_guaranteed
to the underlying function. Just to summarize the expected semantics are that:

  1. If the underlying function takes the parameter @in then the partial_apply
    forwarder allows the underlying function to consume the argument.

  2. If the underlying function takes the parameter @in_guaranteed then the
    partial_apply forwarder itself will destroy the underlying value.

In terms of the partial_apply combiner this means that unless we can guarantee
that the partial_apply is eliminated as a result of our transformation (which we
can not givne the way this utility is written), we must introduce new copies of
@in, @in_guaranteed partial applied arguments and lifetime extend those copies
to the new direct apply that we are creating. In the case of the parameter being
in_guaranteed we insert a destroy after that call site to ensure that our new
copy is cleaned up. If we have an @in parameter then we allow the underlying
function to clean up the value.


NOTE: the 2nd commit here just clarifies the documentation around partial_apply to make it much more explicit what the actual semantics are of partial_apply arguments.

I do not believe that this code was ever correct and we just got lucky since in
the vast majority of cases where this optimization kicks in we also eliminate
the underlying partial apply/allocations.

To make sure that we can properly test this code and make sure that we do not
run into this problem again, I added an option to SILCombinerApplyVisitors that
disables partial apply deletion so we can verify that we do not rely on
eliminating the underlying partial apply for correctness.

<rdar://problem/57855082>

----

To be more specific, partial_apply expects that it will take all of its input
arguments at +1. This includes parameters that are passed @in or @in_guaranteed
to the underlying function. Just to summarize the expected semantics are that:

1. If the underlying function takes the parameter @in then the partial_apply
forwarder allows the underlying function to consume the argument.

2. If the underlying function takes the parameter @in_guaranteed then the
partial_apply forwarder itself will destroy the underlying value.

In terms of the partial_apply combiner this means that unless we can guarantee
that the partial_apply is eliminated as a result of our transformation (which we
can not givne the way this utility is written), we must introduce new copies of
@in, @in_guaranteed partial applied arguments and lifetime extend those copies
to the new direct apply that we are creating. In the case of the parameter being
in_guaranteed we insert a destroy after that call site to ensure that our new
copy is cleaned up. If we have an @in parameter then we allow the underlying
function to clean up the value.
@gottesmm gottesmm requested a review from jckarter January 22, 2020 22:12
@gottesmm
Copy link
Contributor Author

@swift-ci test

@gottesmm
Copy link
Contributor Author

@swift-ci benchmark

@gottesmm gottesmm force-pushed the pr-298ec13864bf3ed6e84dd4e1b422207940f19783 branch from 1525bdc to 1b807b8 Compare January 22, 2020 22:20
@gottesmm
Copy link
Contributor Author

@swift-ci test

3 similar comments
@gottesmm
Copy link
Contributor Author

@swift-ci test

@gottesmm
Copy link
Contributor Author

@swift-ci test

@gottesmm
Copy link
Contributor Author

@swift-ci test

@gottesmm
Copy link
Contributor Author

@swift-ci benchmark

2 similar comments
@gottesmm
Copy link
Contributor Author

@swift-ci benchmark

@gottesmm
Copy link
Contributor Author

@swift-ci benchmark

@swift-ci
Copy link
Contributor

Performance: -O

Improvement OLD NEW DELTA RATIO
PrefixArrayLazy 15 14 -6.7% 1.07x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
FlattenListFlatMap 2889 3896 +34.9% 0.74x (?)
SuffixCountableRange 4 5 +25.0% 0.80x (?)

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

@aschwaighofer
Copy link
Contributor

Does this code handle the following situation correctly? (I left out alloc_stack and instead use a value representation)

// f(arg0: @in SomeThing, arg1: SomeClass)
%context = partial_apply f(%arg0)

for i in 0..<100 {
   apply %context(%arg1)
}

destroy %context

This should be transformed to:

// f(arg0: @in SomeThing, arg1: SomeClass)
// context = partial_apply %f(%arg0)
%context_arg0_lifetime_extend =  copy %arg0


for i in 0..<100 {
   // apply context(arg1)
   %copy_for_@in_arg0 = copy %context_arg0_lifetime_extend
   apply f(%copy_for_@in_arg0, %arg1)
}


//destroy context
destroy %context_arg0_lifetime_extend

We need to handle the lifetime extension for the values captured in the closure's context separately from the handling of the convention in the partial_apply forwarder at the call site.

@gottesmm
Copy link
Contributor Author

Arnold and I spoke offline. I am pretty sure that this code assumes that the partial apply site and the actual call site are strongly control equivalent. This is not true if the partial apply site is out side of a loop and the call site are in a loop.

That being said, the thing I am fixing in this PR is actually a case where we are wrong even in the strongly control equivalent case, so I am going to fix the weakly control equivalent case in a follow on PR.

Assuming again that I am correct, in terms of propagating the fix/cherry-picking, I am going to get this one in and then have two follow on commits:

  1. A commit that disables the optimization if we do not have strong control equivalence. This can be cherry-picked.
  2. A follow on commit that will only be on master that extends the optimization so that we handle the loop case correctly.

@aschwaighofer
Copy link
Contributor

sgtm

@gottesmm
Copy link
Contributor Author

Found some more problems latent in the implementation here. I am going to split this into a few different PRs. Specifically:

  1. This code has a bunch of implicit assumptions in it. As a simple example, it assumes that a partial_applies lifetime can be completely discovered by its immediate users which is not a safe assumption to make. It is non-obvious that this is true especially since the code is structured with a first check users are safe check. Clearly the recursive “check users are safe check” should provide the list of such users to the lifetime analysis.

  2. In the object case, this code is relying on the underlying partial apply keeping the object alive until its call site. This is an unsafe assumption to make since we could eliminate that partial_apply. Instead we need to copy the value at the partial_apply site, lifetime extend it over all call sites that we found, and then at each call site, copy as needed.

  3. In the address case, it is creating a new stack allocated copy for all potential invocations of the partial_apply and isn’t creating new copies per apply site. That suggests to me that this optimization in the indirect case is also inherently unsafe.

I am going to fix the optimization by moving to a model where at the partial apply site, we only once create a copy of each partial applied argument (noting that alloc_stack, dealloc_stack are still created at the begin/end of the function and we would copy indirect arguments into those) and lifetime extend over all call sites. Then at each call site that we are eliminating, we create a new separate copy if the call site needs to consume the argument. Otherwise, we just pass in our original copy.

@gottesmm gottesmm closed this Jan 24, 2020
@gottesmm gottesmm deleted the pr-298ec13864bf3ed6e84dd4e1b422207940f19783 branch July 23, 2021 21:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants