Skip to content

Overhaul interning #481

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
3 tasks done
nnethercote opened this issue Feb 4, 2022 · 4 comments
Closed
3 tasks done

Overhaul interning #481

nnethercote opened this issue Feb 4, 2022 · 4 comments
Labels
major-change A proposal to make a major change to rustc major-change-accepted A major change proposal that was accepted T-compiler Add this label so rfcbot knows to poll the compiler team

Comments

@nnethercote
Copy link

nnethercote commented Feb 4, 2022

Proposal

The compiler uses interning a lot, especially within rustc_middle::ty.

Pros of interning

  • It saves lots of memory by avoiding duplicated values.
  • Speedups by using pointer-based Eq and Hash, i.e. compare/hash the addresses of the interned value (which is guaranteed to be unique) instead of the contents. (I recently tried changing some of the highly used types like Ty and List to use contents-based Eq and Hash, and got widespread instruction count increases of up to 10%, which is a lot.)

Cons of interning

  • Interning values involves hashing their contents, which isn't free, but hopefully is balanced out by the pros above.
  • Pointer-based hashing increases non-determinism.

An obvious way to deal with interned values is to use Rust's type system to encapsulate things. Surprisingly, this isn't currently done.

  • There is rustc_data_structures::ptr_key::PtrKey<T>, which is a smart pointer that does pointer-based Eq and Hash. But it's barely used, barely documented, has a non-descriptive name, and has no protection against misuse.
  • Several frequently used types are interned and use pointer-based Eq and Hash, e.g. List<T>, Ty, Predicate.
  • Other frequently used types are interned but don't use pointer-based Eq and Hash, e.g. Region, Const.
  • It's not obvious that these types are interned from their type declaration; you have to look around nearby code to realize this. There's also not much protection against accidental misuse, e.g. comparing an interned value against a non-interned value.

I propose renaming ptr_key::PtrKey<T> as intern::Interned, improving its documentation, and providing some protection against misuse:

  • Have a zero-sized private type as a field so you can pattern match an Interned<T> but not construct an Interned<T>.
  • Have a method Interned::new_unchecked for constructing values, the name of which makes it obvious that some care is required.
  • Eventually, moving some of the interning structures (e.g. InternedSet, ShardedHashMap::intern) into this module so that most places don't even need to use Interned::new_unchecked, but instead just call an intern method directly.

I then propose using Interned<T> for most/all of the interned types in rustc_middle::ty: Ty, Predicate, Region, Const, List, etc. For some of these it will involve using a newtype instead of a typedef. E.g. this

pub type Ty<'tcx> = &'tcx TyS<'tcx>;

will become this:

pub struct Ty<'tcx>(Interned<'tcx, TyS<'tcx>>);

Pros of this change

  • Clarity: it'll be obvious which types are interned and thus use pointer-based Eq and Hash. Also, the use of newtypes allows methods to be impl'd directly on the frequently used types like Ty, rather than on the infrequently used inner types like TyS.
  • Correctness: because of the improve clarity, it'll be much harder to mess things up, e.g. comparing an interned value with a non-interned value.
  • Consistency: all the interned types will look and act the same.
  • Performance: more pointer-based Eq and Hash will give some small wins.

Cons of this change

  • It'll touch a lot of lines of code. Most of the changes are tedious; lots of sigil fiddling, pattern matching fiddling, stuff like that.
  • More layering of types means pattern matching is a bit uglier in places, where you have to unwrap an additional layer or two of newtypes.
  • Non-determinism: slightly increased due to more pointer hashing, but we're already doing a bunch of that anyway.

I think this is a clear improvement. TBH, I'm a bit surprised that the code isn't already like this. It's no fun to have to schlep through the code to work out which types are interned and whether they're using pointer-based Eq and Hash. Because of this, it took me some time to reach my current level of understanding about interning. E.g. see rust-lang/rust#91617 where I added documentation to List<T>; prior to that PR list.rs had no mention of interning or uniqueness of values, and I had to ask a bunch of questions on Zulip to understand how the type worked.

I also think this it's borderline whether this change needs an MCP. It will touch a lot of code but not in a deep way; it's mostly just using types to more clearly encode the existing behaviour. There are no user-visible changes, existing major types are not renamed, and it's not really an architectural change. But an MCP was requested so here it is 😄

I have already written rust-lang/rust#93148 that introduces Interned and changes Ty, Predicate, Region, and Const. I pretty much had to write the code to make sure that the proposal was reasonable.

Mentors or Reviewers

I'm planning to do the work myself, and I've already done a big chunk of it. @fee1-dead or @michaelwoerister are likely reviewers.

Process

The main points of the Major Change Process are as follows:

  • File an issue describing the proposal.
  • A compiler team member or contributor who is knowledgeable in the area can second by writing @rustbot second.
    • Finding a "second" suffices for internal changes. If however, you are proposing a new public-facing feature, such as a -C flag, then full team check-off is required.
    • Compiler team members can initiate a check-off via @rfcbot fcp merge on either the MCP or the PR.
  • Once an MCP is seconded, the Final Comment Period begins. If no objections are raised after 10 days, the MCP is considered approved.

You can read more about Major Change Proposals on forge.

Comments

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

@nnethercote nnethercote added major-change A proposal to make a major change to rustc T-compiler Add this label so rfcbot knows to poll the compiler team labels Feb 4, 2022
@rustbot
Copy link
Collaborator

rustbot commented Feb 4, 2022

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

cc @rust-lang/compiler @rust-lang/compiler-contributors

@rustbot rustbot added the to-announce Announce this issue on triage meeting label Feb 4, 2022
@oli-obk
Copy link
Contributor

oli-obk commented Feb 5, 2022

@rustbot second

@rustbot rustbot added the final-comment-period The FCP has started, most (if not all) team members are in agreement label Feb 5, 2022
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Feb 10, 2022
@nnethercote
Copy link
Author

The 10 days has passed. My first PR has r+, I will rebase and put it in the merge queue shortly.

@apiraino
Copy link
Contributor

@rustbot label -final-comment-period +major-change-accepted

@rustbot rustbot added major-change-accepted A major change proposal that was accepted to-announce Announce this issue on triage meeting and removed final-comment-period The FCP has started, most (if not all) team members are in agreement labels Feb 17, 2022
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Feb 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
major-change A proposal to make a major change to rustc major-change-accepted A major change proposal that was accepted T-compiler Add this label so rfcbot knows to poll the compiler team
Projects
None yet
Development

No branches or pull requests

4 participants