Add an AccessPath abstraction and formalize memory access #33121

atrick · 2020-07-25T20:38:26Z

Begin to formalize the model for valid memory access in SIL. Ignoring ownership, every access is a def-use chain in three parts:

object root -> formal access base -> memory operation address

AccessPath abstracts over this path and standardizes the identity of a
memory access throughout the optimizer. This abstraction is the basis for a new AccessPathVerification.

With that verification, we now have all the properties we need for the type of analysis requires for exclusivity enforcement, but now generalized for any memory analysis. This is suitable for an extremely lightweight analysis with no side data structures. We currently have a massive amount of ad-hoc memory analysis throughout SIL, which is incredibly unmaintainable, bug-prone, and not performance-robust. We can begin taking advantage of this verifably complete model to solve that problem.

The properties this gives us are:

Access analysis must be complete over memory operations: every memory operation needs a recognizable valid access. An access can be unidentified only to the extent that it is rooted in some non-address type and we can prove that it is at least not part of an access to a nominal class or global property. Pointer provenance is also required for future IRGen-level bitfield optimizations.

Access analysis must be complete over address users: for an identified object root all memory accesses including subobjects must be discoverable.

Access analysis must be symmetric: use-def and def-use analysis must
be consistent.

AccessPath is merely a wrapper around the existing accessed-storage utilities and IndexTrieNode. Existing passes already very succesfully use this approach, but in an ad-hoc way. With a general utility we can:

update passes to use this approach to identify memory access,
reducing the space and time complexity of those algorithms.
implement an inexpensive on-the-fly, debug mode address lifetime analysis
implement a lightweight debug mode alias analysis
ultimately improve the power, efficiency, and maintainability of
full alias analysis
make our type-based alias analysis sensistive to the access path

varungandhi-apple · 2020-07-26T23:58:20Z

Would it be helpful to have a high-level description of access paths in docs/SIL.rst (or somewhere else)?

atrick · 2020-07-29T05:34:27Z

@varungandhi-apple yes, to the extent that I want to add constraints to SIL, that should be documented in SIL.rst. Independently, we should have documentation as a starting point for the central abstractions we use in the SIL optimizer. I haven't been doing that, but it looks like SIL programmers manual is the right place https://github.com/apple/swift/blob/master/docs/SILProgrammersManual.md. I'll add something there.

All of this is secondary to good file-level comments though, which is where I think anything of substantial detail should go. We're sorely lacking adequate comments IMO

eeckstein · 2020-08-03T12:24:21Z

@atrick Somehow I'm missing the essentials here.

Introducing new abstractions always comes with a cost. It's very important for an abstraction like this to have a small and clean API. Is the API here the ==/!= operators? So that one can use it to check if two accesses are aliased or not?
Maybe I just missed it, but it's definitely worth documenting that.

How does AccessPath relate to ProjectionPath? Should it be a replacement for ProjectionPath?

If yes, then I'm strongly in favor of replacing ProjectionPath with AccessPath before this PR is merged. Because otherwise it will never be done and we already have too many utilities which are piling up in the code base.

If no, it would be worth documenting what's the difference and when to use which utility.

atrick · 2020-09-14T07:42:42Z

@eeckstein @varungandhi-apple @airspeedswift
I wrote documentation in SILProgrammersManual.md.
https://github.com/apple/swift/blob/4bd8dfdb9a46c6eb471571ac8798c955c0941ca3/docs/SILProgrammersManual.md#accessedstorage-and-accesspath
Although, I think the file and class-level comments in MemAccessUtils.h are quite clear now.

Yes, the basic API for the data type is == and !=; it simply identifies a memory location within an access. Importantly, it's a 3-word value that functions as a hash key. Most of the important API comes from the AccessedStorage that it contains. However, as you can read in documentation, or from the class definition, other important APIs are: AccessPath::contains(subPath), AccessPath::mayOverlap(otherPath), and AccessPath::collectUses().

We currently have nothing like AccessPath::collectUses, which makes it possible to verify the model and will allow us to replace ad-hoc, unstable, and bug-prone code throughout the compiler.

AccessPath is also a complete abstraction, unlike anything we have today. It handles all SIL patterns that I consider to be well-formed. This includes dynamic index offsets, negative index offsets, and arbitrary index chains. It also models all types of pointer casting and boxing/unboxing patterns that should be supported in SIL.

The goal of this PR is not to add another utility. It's about creating a verified correct and complete model for SIL memory access. This needs to exist so we can make progress on several things that have come up in the past few months (at least if we don't want to make the current mess worse):

Making SemanticARC-related analyses and verification complete as we
continue to write more and move them later in the pipeline.
Adding new forms AliasAnalysis without creating even more holes in the logic.
Fixing LICM perforance regressions (I went ahead and implemented this as an
example in a separate PR, since it's similar to how AccessPath could be used in RLE/DSE).
Fixing holes in the RLE/DSE logic (I've been pushing back on this
until AccessPath is checked in).
Writing new diagnostic passes that have a light-weight flow-sensitive
escape analysis.

I don't think RLE/DSE should ever have been checked in with the current design for efficiency reasons, and I made that clear at the time. It's true that simply substituting AccessPath would reduce the space and time complexity, but others have been interested in working on it, and my current priority is not fixing compile-time and memory usage issues (the arbitrary thresholds that are in place limit that overhead).

ProjectionPath is different in that it tries to fully abstract over the SIL instructions instead of simply augmenting SIL. I don't think that goal fits well with any of the current uses. So, I agree it should be straightforward to remove ProjectionPath over time. I believe the additional functionality of working with projections should be built alongside and on-top of AccessPath, rather than building it into AccessPath. For example, I implemented a simple projection cloner cleanly on top of AccessPath as a separate utility.

The less urgent but more interesting need for AccessPath is not to replace ProjectionPath, but to replace all of the ad-hoc anlayses scattered throughout the code that rely on forming a complete transitive closure of non-escaping uses of memory locations. For example, if StackPromotion used this, then we wouldn't need all the complexity of use-points in EscapeAnalysis. Passes that badly need a standard, verified model of known-memory-uses are CopyForwarding, COWArrayOpt, ArrayPropertyOpt. However, my current interest is less about optimization and more about writing light-weight diagnostic passes.

I don't think that the AccessPath should directly provide an alias analysis interface. I actually think it should remain minimal. The key is to define a simple model that can be fully verified, then build efficient, purposeful utilities on top of that. The problem with most of our existing utilites is that they aren't proper abstractions at all. They don't model any well-defined concepts and aren't complete or verified. They try to do everything for everyone without really knowing what they're doing. All passes have slightly different needs, so the utilities become misleading for most uses by hiding all of their assumptions behind an interface.

It's very straightforward to provide an alias analysis interface on top of AccessPath::contains() and AccessPath::mayOverlap(). Various passes already do their own light-weight alias analysis and flow-sensitive escape analysis (exclusivity passes do this using AccessedStorage::isDistinctFrom). AccessPath is a big step in standardizing that. In the near term, I agree we should provide a separate light-weight alias/escape analysis interface for diagnostics as a thin layer on top of AccessPath.

atrick · 2020-09-14T07:45:57Z

@swift-ci test

swift-ci · 2020-09-14T07:48:52Z

Build failed
Swift Test Linux Platform
Git Sha - 4bd8dfdb9a46c6eb471571ac8798c955c0941ca3

swift-ci · 2020-09-14T07:49:14Z

Build failed
Swift Test OS X Platform
Git Sha - 4bd8dfdb9a46c6eb471571ac8798c955c0941ca3

atrick · 2020-09-14T22:45:31Z

Something important I forgot to mention: one of the original motivations for implementing AccessPathVerification is to get closer to enabling access marker verification by default. Currently exclusivity verification is under a special flag, only run by a special bot that no one looks at.

...and avoid reallocation. This is immediately necessary for LICM, in addition to its current uses. I suspect this could be used by many passes that work with addresses. RLE/DSE should absolutely migrate to it.

Change ProjectionIndex for ref_tail_addr to std::numeric_limits<int>::max(); This is necessary to disambiguate the tail elements from ref_element_addr field zero.

To clarify and unify logic, improve precision, and behave consistently with other code that does the same thing.

Things that have come up recently but are somewhat blocked on this: - Moving AccessMarkerElimination down in the pipeline - SemanticARCOpts correctness and improvements - AliasAnalysis improvements - LICM performance regressions - RLE/DSE improvements Begin to formalize the model for valid memory access in SIL. Ignoring ownership, every access is a def-use chain in three parts: object root -> formal access base -> memory operation address AccessPath abstracts over this path and standardizes the identity of a memory access throughout the optimizer. This abstraction is the basis for a new AccessPathVerification. With that verification, we now have all the properties we need for the type of analysis requires for exclusivity enforcement, but now generalized for any memory analysis. This is suitable for an extremely lightweight analysis with no side data structures. We currently have a massive amount of ad-hoc memory analysis throughout SIL, which is incredibly unmaintainable, bug-prone, and not performance-robust. We can begin taking advantage of this verifably complete model to solve that problem. The properties this gives us are: Access analysis must be complete over memory operations: every memory operation needs a recognizable valid access. An access can be unidentified only to the extent that it is rooted in some non-address type and we can prove that it is at least *not* part of an access to a nominal class or global property. Pointer provenance is also required for future IRGen-level bitfield optimizations. Access analysis must be complete over address users: for an identified object root all memory accesses including subobjects must be discoverable. Access analysis must be symmetric: use-def and def-use analysis must be consistent. AccessPath is merely a wrapper around the existing accessed-storage utilities and IndexTrieNode. Existing passes already very succesfully use this approach, but in an ad-hoc way. With a general utility we can: - update passes to use this approach to identify memory access, reducing the space and time complexity of those algorithms. - implement an inexpensive on-the-fly, debug mode address lifetime analysis - implement a lightweight debug mode alias analysis - ultimately improve the power, efficiency, and maintainability of full alias analysis - make our type-based alias analysis sensistive to the access path

The main goal of this checker is to find the access scope. To do that, it needs to use `getAccessBegin`, which in turn needs to handle address_to_pointer. Use isAccessedStorageCast to handle more cases, do a small refactoring, and fix minor issues.

Add a flag: enable-accessed-storage-dump-uses

atrick · 2020-09-29T21:46:41Z

@swift-ci test

atrick · 2020-09-29T21:46:49Z

@swift-ci benchmark

atrick · 2020-09-29T21:46:59Z

@swift-ci test source compatibility

atrick · 2020-09-29T21:49:29Z

These docs have gone through one round of review. Now I want to get this AccessPathVerification checked in ASAP. I have a set of new kinds of verification to land, but don't want to do more than one per week (starting with this one) to give them each plenty of exposure.

atrick · 2020-09-29T21:50:29Z

Some of the functionality discussed in the docs will be landed in a follow-up PR to avoid introducing too many deep functional changes at once.

swift-ci · 2020-09-29T22:24:07Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
FlattenListFlatMap	2523	4571	+81.2%	0.55x (?)
FlattenListLoop	1031	1568	+52.1%	0.66x (?)

Improvement	OLD	NEW	DELTA	RATIO
UTF8Decode_InitFromBytes	176	164	-6.8%	1.07x (?)

Code size: -O

Performance: -Osize

Improvement	OLD	NEW	DELTA	RATIO
FlattenListFlatMap	3217	2741	-14.8%	1.17x (?)
RemoveWhereMoveInts	10	9	-10.0%	1.11x (?)
ObjectiveCBridgeStubToNSStringRef	82	76	-7.3%	1.08x (?)
DropWhileCountableRange	15	14	-6.7%	1.07x (?)
PolymorphicCalls	15	14	-6.7%	1.07x (?)

Code size: -Osize

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
UTF8Decode_InitFromData	155	172	+11.0%	0.90x (?)
UTF8Decode_InitFromBytes	160	177	+10.6%	0.90x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

swift-ci · 2020-09-29T23:21:16Z

Build failed
Swift Test Linux Platform
Git Sha - d5590801e44b51f96c371dc8d81e619687ae71b0

swift-ci · 2020-09-30T00:00:42Z

Build failed
Swift Test OS X Platform
Git Sha - d5590801e44b51f96c371dc8d81e619687ae71b0

atrick force-pushed the add-accesspath branch from d4d85f9 to b6d7e36 Compare September 4, 2020 19:00

atrick force-pushed the add-accesspath branch from b6d7e36 to 4bd8dfd Compare September 14, 2020 06:52

atrick changed the title ~~[WIP] Add an AccessPath abstraction and formalize memory access~~ Add an AccessPath abstraction and formalize memory access Sep 14, 2020

atrick force-pushed the add-accesspath branch from 4bd8dfd to f9a407f Compare September 14, 2020 07:45

atrick force-pushed the add-accesspath branch from f9a407f to f635c3d Compare September 14, 2020 07:50

atrick mentioned this pull request Sep 18, 2020

Use AccessPath in LICM and split loads for combine load/store hoisting #33987

Merged

atrick force-pushed the add-accesspath branch from f635c3d to 26ce46b Compare September 29, 2020 18:54

atrick added 10 commits September 29, 2020 11:57

Add indexTrieRoot to the SILModule to share across Analyses.

15ca864

...and avoid reallocation. This is immediately necessary for LICM, in addition to its current uses. I suspect this could be used by many passes that work with addresses. RLE/DSE should absolutely migrate to it.

Change IndexTrie to allow signed indices

d97acdd

Change Projection to support signed indices.

1280630

Change ProjectionIndex for ref_tail_addr to std::numeric_limits<int>::max(); This is necessary to disambiguate the tail elements from ref_element_addr field zero.

Fix a pointer addition bug in SwiftReflectionTest.

2d35653

Fix ValueTracking isUniquelyIdentified to use AccessedStorage.

1803083

To clarify and unify logic, improve precision, and behave consistently with other code that does the same thing.

Update LoadCopyToLoadBorrow for MemAccessUtils API.

ff076ee

Add AccessPath support to the AccessedStorageDumper pass.

b3b7f34

Add a flag: enable-accessed-storage-dump-uses

Add AccessPathVerification pass and run it in the pipeline.

0690ef7

atrick force-pushed the add-accesspath branch 3 times, most recently from 3c6148a to d559080 Compare September 29, 2020 21:45

atrick added 5 commits September 29, 2020 22:26

accessed_storage.sil tests

2888598

accesspath_uses.sil tests

f82a83a

alias-analysis test

7b208ae

RLE tests for boundary Projection indices.

43d21cc

AccessedStorage and AccessPath documentation.

0fd6be8

atrick force-pushed the add-accesspath branch from d559080 to 0fd6be8 Compare September 30, 2020 05:33

atrick closed this Sep 30, 2020

atrick mentioned this pull request Sep 30, 2020

Add an AccessPath abstraction and formalize memory access #34126

Merged

Add an AccessPath abstraction and formalize memory access #33121

Add an AccessPath abstraction and formalize memory access #33121

Uh oh!

Conversation

atrick commented Jul 25, 2020

Uh oh!

varungandhi-apple commented Jul 26, 2020

Uh oh!

atrick commented Jul 29, 2020

Uh oh!

eeckstein commented Aug 3, 2020

Uh oh!

atrick commented Sep 14, 2020

Uh oh!

atrick commented Sep 14, 2020

Uh oh!

swift-ci commented Sep 14, 2020

Uh oh!

swift-ci commented Sep 14, 2020

Uh oh!

atrick commented Sep 14, 2020

Uh oh!

atrick commented Sep 29, 2020

Uh oh!

atrick commented Sep 29, 2020

Uh oh!

atrick commented Sep 29, 2020

Uh oh!

atrick commented Sep 29, 2020

Uh oh!

atrick commented Sep 29, 2020

Uh oh!

swift-ci commented Sep 29, 2020

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

Uh oh!

swift-ci commented Sep 29, 2020

Uh oh!

swift-ci commented Sep 30, 2020

Uh oh!

Uh oh!