Fix bug in local_elemwise_alloc and add tests #767

Merged

Conversation

@ricardoV94 (Contributor) commented Jan 19, 2022

The rewrite would return a new graph identical to the original, resulting in an endless rewrite cycle:

import aesara
import aesara.tensor as at

x = at.matrix("x")
y = at.matrix("y")
z = at.mul(at.alloc(x, 1, 5), y)

with aesara.config.change_flags(optimizer_verbose=True):
    f = aesara.function([x, y], z, mode="FAST_RUN")
optimizer: rewrite local_elemwise_alloc replaces Elemwise{mul,no_inplace}.0 with Elemwise{mul,no_inplace}.0
optimizer: rewrite local_elemwise_alloc replaces Elemwise{mul,no_inplace}.0 with Elemwise{mul,no_inplace}.0
optimizer: rewrite local_elemwise_alloc replaces Elemwise{mul,no_inplace}.0 with Elemwise{mul,no_inplace}.0
optimizer: rewrite local_elemwise_alloc replaces Elemwise{mul,no_inplace}.0 with Elemwise{mul,no_inplace}.0
optimizer: rewrite local_elemwise_alloc replaces Elemwise{mul,no_inplace}.0 with Elemwise{mul,no_inplace}.0
optimizer: rewrite local_elemwise_alloc replaces Elemwise{mul,no_inplace}.0 with Elemwise{mul,no_inplace}.0
optimizer: rewrite local_elemwise_alloc replaces Elemwise{mul,no_inplace}.0 with Elemwise{mul,no_inplace}.0
optimizer: rewrite local_elemwise_alloc replaces Elemwise{mul,no_inplace}.0 with Elemwise{mul,no_inplace}.0
...

There are still flaws in the rewrite logic, as indicated by the newly triggered assertion.

First showed up in pymc-devs/pymc#5369

TODO:

  • Rewrite the docstrings to make clear what is going on in this rewrite
  • Make sure the tests cover all rewrite cases
  • Consider removing the special DimShuffle-of-Alloc logic


codecov bot commented Jan 19, 2022

Codecov Report

Merging #767 (7e3792c) into main (240827c) will increase coverage by 0.00%.
The diff coverage is 95.00%.


@@           Coverage Diff           @@
##             main     #767   +/-   ##
=======================================
  Coverage   78.35%   78.35%           
=======================================
  Files         152      152           
  Lines       47685    47687    +2     
  Branches    10881    10880    -1     
=======================================
+ Hits        37364    37367    +3     
+ Misses       7773     7772    -1     
  Partials     2548     2548           
Impacted Files Coverage Δ
aesara/tensor/basic_opt.py 85.22% <95.00%> (+0.07%) ⬆️

@brandonwillard added the "bug" and "important" labels on Jan 19, 2022

@brandonwillard (Member) left a comment

Can you add a MWE that reproduces the issue in the description (and explicitly shows the problematic graphs)?

@ricardoV94 (Contributor, Author)

Can you add a MWE that reproduces the issue in the description?

Done

@brandonwillard (Member) commented Jan 19, 2022

I'm starting to see that local_elemwise_alloc is one of those old rewrites that needs to be completely rewritten.

For instance, some of its DimShuffle handling seems like it should be split off into independent rewrites (or may already be covered by existing ones).

@ricardoV94 (Contributor, Author) commented Jan 19, 2022

I don't quite get how this rewrite is supposed to work, now that I think about it. If we start with Elemwise(Alloc1(x, shape), Alloc2(y, shape)) and then remove Alloc1 because we have Alloc2, giving Elemwise(x, Alloc2(y, shape)), how do we prevent Alloc2 from being removed in the next iteration? And if we do remove it, we lose all the shape information, no? In other words, these new tests should actually fail:

(lambda x, y: at.mul(at.alloc(x, 15, 1), at.alloc(y, 15, 1))),
(lambda x, y: at.mul(at.alloc(x, 15, 2), at.alloc(y, 15, 2))),

Never mind, that's what the assert_op_in does
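
For illustration, a small script for the double-Alloc case discussed above (the scalar inputs, shapes, and the expected form of the rewritten graph are assumptions for this sketch, not taken from the PR):

import aesara
import aesara.tensor as at

x = at.scalar("x")
y = at.scalar("y")
# Both inputs are broadcast to the same shape via Alloc
z = at.mul(at.alloc(x, 15, 2), at.alloc(y, 15, 2))

f = aesara.function([x, y], z, mode="FAST_RUN")
# Inspect the rewritten graph: per the discussion above, one Alloc can be
# dropped while an Assert around the remaining input should preserve the
# shape information.
aesara.dprint(f)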

@brandonwillard (Member) commented Jan 19, 2022

I don't quite get how this rewrite is supposed to work, now that I think about it. If we start with Elemwise(Alloc1(x, shape), Alloc2(y, shape)) and then remove Alloc1 because we have Alloc2, giving Elemwise(x, Alloc2(y, shape)), how do we prevent Alloc2 from being removed in the next iteration? And if we do remove it, we lose all the shape information, no? In other words, these new tests should actually fail:

(lambda x, y: at.mul(at.alloc(x, 15, 1), at.alloc(y, 15, 1))),
(lambda x, y: at.mul(at.alloc(x, 15, 2), at.alloc(y, 15, 2))),

Never mind, that's what the assert_op_in does

Yeah, the docstring and logic are both unnecessarily confusing and convoluted. That—combined with a lack of quality (e.g. direct) unit tests—almost necessarily gives rise to these kinds of issues.

This is yet another example of why we need to move toward formalizing our rewrites and their descriptions. In this case, the docstring uses its own notation (e.g. y.TensorType(BROADCAST CONDITION)), and that only serves to make things more confusing.

@ricardoV94 (Contributor, Author) commented Jan 19, 2022

I cleaned up the test logic a bit in a separate commit. The two test conditions with the dimshuffle fail (now that I removed "specialize" from the include), because those Allocs remain in the Assert that is added around y.

This is not necessarily a failure, but it suggests the original rewrite was not quite able to deliver what it was meant to?
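
For reference, restricting which rewrites run, e.g. including local_elemwise_alloc while excluding "specialize" as mentioned above, can be done through Mode.including/excluding. A minimal sketch, with shapes and the exact setup assumed rather than copied from the PR's tests:

import aesara
import aesara.tensor as at
from aesara.compile.mode import get_default_mode

x = at.scalar("x")
y = at.scalar("y")
z = at.mul(at.alloc(x, 15, 2), at.alloc(y, 15, 2))

# Apply local_elemwise_alloc but skip the "specialize" stage, so that any
# Alloc kept inside the added Assert remains visible in the final graph.
mode = get_default_mode().including("local_elemwise_alloc").excluding("specialize")
f = aesara.function([x, y], z, mode=mode)
aesara.dprint(f)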

@brandonwillard (Member)

This is not necessarily a failure, but it suggests the original rewrite was not quite able to deliver what it was meant to?

Wouldn't surprise me, but, while we're fixing this, let's at least make our assumptions/goals very clear in the docstring.

Also, we need to do something about that DimShuffle complexity. I think it's nearly doubling the logic in this rewrite and it doesn't even need to be there.

Is there a DimShuffle lift/removal that can be applied to Allocs independently? For example, DimShuffle{"x", ...}(Alloc(y, s1, ...)) -> Alloc(y, 1, s1, ...).
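
Purely as an illustration of that suggestion, an independent lift could look roughly like the sketch below. Nothing here is from the PR: the rewriter name, the restriction to DimShuffles that only prepend broadcastable dimensions, the import paths (assuming the aesara.graph.opt module of this era), and the lack of registration are all assumptions.

import aesara.tensor as at
from aesara.graph.opt import local_optimizer
from aesara.tensor.basic import Alloc
from aesara.tensor.elemwise import DimShuffle


@local_optimizer([DimShuffle])
def local_dimshuffle_alloc_lift(fgraph, node):
    # Hypothetical sketch: DimShuffle{"x", ..., 0, 1, ...}(Alloc(y, s1, ...))
    # -> Alloc(y, 1, ..., s1, ...), only when the DimShuffle does nothing
    # but prepend broadcastable dimensions.
    alloc_var = node.inputs[0]
    if not (alloc_var.owner and isinstance(alloc_var.owner.op, Alloc)):
        return None

    new_order = tuple(node.op.new_order)
    ndim = alloc_var.ndim
    n_new = len(new_order) - ndim
    if n_new <= 0 or new_order != ("x",) * n_new + tuple(range(ndim)):
        return None

    value, *shape = alloc_var.owner.inputs
    return [at.alloc(value, *([1] * n_new), *shape)]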

@ricardoV94 (Contributor, Author) commented Jan 19, 2022

This is not necessarily a failure, but it suggests the original rewrite was not quite able to deliver what it was meant to?

Wouldn't surprise me, but, while we're fixing this, let's at least make our assumptions/goals very clear in the docstring.

Also, we need to do something about that DimShuffle complexity. I think it's nearly doubling the logic in this rewrite and it doesn't even need to be there.

Is there a DimShuffle lift/removal that can be applied to Allocs independently? For example, DimShuffle{"x", ...}(Alloc(y, s1, ...)) -> Alloc(y, 1, s1, ...).

Yes, that could be done, but according to the comments in the docstrings, that would be less efficient in the general case (where the Allocs stay in the graph). I have no basis to evaluate that statement, though.

@brandonwillard (Member) commented Jan 19, 2022

Yes, that could be done, but according to the comments in the docstrings, that would be less efficient in the general case (where the Allocs stay in the graph)

I don't see how that could be a real problem (for at least the case of adding broadcastable dimensions to an Alloc), and, if it actually was realistically more efficient, wouldn't we want an independent "specialization" rewrite that changes all such Allocs into DimShuffles? It wouldn't make any sense to limit that to the context of this rewrite only.

Here's a quick and dirty check:

import numpy as np


def alloc(v, shape):
    # Mimic Alloc: allocate an array of the given shape and fill it with `v`
    res = np.empty(shape)
    res[...] = v
    return res


%timeit alloc(2, (1000, 1000))
258 µs ± 8.04 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit alloc(2, (1, 1, 1, 1, 1000, 1000, 1, 1, 1, 1))
278 µs ± 19.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.expand_dims(alloc(2, (1000, 1000)), (0, 1, 2, 3, -1, -2, -3, -4))
282 µs ± 10.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


%timeit alloc(2, (5000, 5000))
21.7 ms ± 2.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit alloc(2, (1, 1, 1, 1, 5000, 5000, 1, 1, 1, 1))
21 ms ± 166 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit np.expand_dims(alloc(2, (5000, 5000)), (0, 1, 2, 3, -1, -2, -3, -4))
21.4 ms ± 764 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

The old codebase has some confusing "efficiency"-based comments like that—ones that I'm not sure were ever necessarily true or relevant.

Regardless of the practical truth of these kinds of statements, if we consider the entire context (e.g. the ability to add complementary rewrites/canonicalizations, the fact that we do multiple passes and have distinct rewrite stages, etc.), they may be completely irrelevant.

@brandonwillard (Member) commented Jan 19, 2022

The docstring of local_canonicalize_alloc mentions this Alloc-to-DimShuffle topic and says Theano/Theano#4072 provides reasons for why it's needed. I don't see anything about performance—or any real justifications other than that it might've helped resolve a different problem by generating graphs that some logic could already handle.

In other words, the rewrites and canonical form for Alloc are a mess.

Update: Yes, that's definitely the reason, because local_useless_dimshuffle_in_reshape won't be applied unless those Allocs are (partially) converted to DimShuffles.
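
As a hedged illustration of that dependency (the shapes and the claim about which rewrites end up firing are inferred from the comment above, not verified against the codebase):

import aesara
import aesara.tensor as at

x = at.vector("x")
# The Alloc only prepends broadcastable (length-1) dimensions, and the
# Reshape makes them irrelevant; local_useless_dimshuffle_in_reshape can only
# clean this up once the Alloc has been (partially) turned into a DimShuffle.
y = at.reshape(at.alloc(x, 1, 1, x.shape[0]), (x.shape[0],))

f = aesara.function([x], y)
aesara.dprint(f)  # ideally neither an Alloc nor a DimShuffle survives here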

The rewrite would sometimes return a new graph identical to the original,
resulting in divergence.
@brandonwillard force-pushed the fix_local_elemwise_alloc_bug branch from e4c025f to 7e3792c on January 20, 2022 at 02:13
@brandonwillard merged commit 7c1558a into aesara-devs:main on Jan 20, 2022