Skip to content

Remove redundant integer suffixes. #22501

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
eddyb opened this issue Feb 18, 2015 · 22 comments
Closed

Remove redundant integer suffixes. #22501

eddyb opened this issue Feb 18, 2015 · 22 comments
Labels
C-cleanup Category: PRs that clean code up or issues documenting cleanup. P-low Low priority

Comments

@eddyb
Copy link
Member

eddyb commented Feb 18, 2015

As #21511 has shown, there are lots of cases in which integer suffixes aren't needed at all.
Usually audits are needed to make sure the intended type is inferred, and disabling the fallback to i32 wouldn't really work for tests and whatnot.

It should be easy to modify the compiler to dump the equivalent of --pretty=typed (with pretty-printing showing no suffixes on expressions) into some temporary directory, then compare the results from two make check runs, one with all the suffixes removed.

@eddyb eddyb added E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. C-cleanup Category: PRs that clean code up or issues documenting cleanup. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. labels Feb 18, 2015
@gchp
Copy link
Contributor

gchp commented Feb 19, 2015

I'd like to look into this one, though I'm not sure how to modify the compiler in the way that you mention. Any suggestions on where to start?

@eddyb
Copy link
Member Author

eddyb commented Feb 20, 2015

The entry-point into the pretty printer seems to be this call in driver::pretty::pretty_print_input.
call_with_pp_support handles --pretty=typed by running analysis passes itself, which isn't exactly what we want.
By combining those two, we get

let src_name = source_name(input);
let src = sess.codemap().get_filemap(&src_name)
                        .src.as_bytes().to_vec();
// For this you'd have to make TypedAnnotation's analysis field public.
let annotation = TypedAnnotation { analysis: analysis };
pprust::print_crate(sess.codemap(),
                    sess.diagnostic(),
                    ast_map.krate(),
                    src_name.to_string(),
                    &mut MemReader::new(src),
                    box old_io::File::create(&pp_out).unwrap(),
                    &annotation
                    true).unwrap();
let analysis = annotation.analysis;

which can be inserted at this point in compile_input.

pp_out_path is still missing, but I think the following method should produce an unique path for each rustc invocation during the build:

let pp_out = outputs.with_extension("rs");
let pp_out_file = pp_out.as_str().expect("non-utf8 pp output file").replace("/", "--");
// If this doesn't work, hardcode an absolute path here.
let pp_out_path = Path::new(concat!(env!(CFG_BUILD_DIR), "int-suffix-pp")).join(pp_out_file);

Aside from that, you would also need to hack the pretty-printer so it doesn't output integer suffixes - should be easy to treat them just like the unsuffixed cases.

@eddyb
Copy link
Member Author

eddyb commented Feb 27, 2015

Since I didn't hear back from @gchp, I implemented what I mentioned in a branch of mine - I intend to open a PR soon.
I did go a bit crazy and cleaned up a few things in the process. I test it like so:

rm -rf test-pretty-dump; mkdir test-pretty-dump
# compile stage1 rustc first, with no flags, to avoid errors as stage0 doesn't have them
make x86_64-unknown-linux-gnu/stage1/bin/rustc
make check-stage1 RUSTFLAGS="-Z unstable-options -Z pretty-keep-going -Z pretty-dump-dir=test-pretty-dump --xpretty=typed,unsuffixed_literals"

This gives me a bunch of files (I didn't run this build for long - an almost full one with tests had over 2600):

home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--alloc-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--arena-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--collections-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--core-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--flate-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--getopts-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--graphviz-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--libc-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--log-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--rand-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--rustc_bitflags-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--serialize-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--std-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--term-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--unicode-4e7c5e5c.rs

An example from libcollections:

// source
static TAG_CONT_U8: u8 = 128u8;
// pretty-printed
static TAG_CONT_U8: u8 = (128: u8);

The flag-based approach has two big problems, though: it messes up pretty tests and it wouldn't work with cargo.
I'll attempt adding support for an env var of the form RUSTC_PRETTY_DUMP=mode:dir that is equivalent to --xpretty=mode -Z pretty-keep-going -Z pretty-dump-dir=dir, would be ignored when --pretty is already passed, and it would also work with cargo.

My invocation would thus become:

# no bootstrapping issues!
make check-stage1 RUSTC_PRETTY_DUMP="typed,unsuffixed_literals:test-pretty-dump"

@eddyb
Copy link
Member Author

eddyb commented Mar 1, 2015

I've rebased my branch and added the env var functionality - oh and I think I have an almost-perfect way to remove integer literal suffixes everywhere:

sed -i -r -e 's/\b(0x[0-9a-fA-F_]*[0-9a-fA-F]|0o[0-7_]*[0-7]|0b[01_]*[01]|[0-9]([0-9_]*[0-9])?)_*([ui](8|16|32|64|s(ize)?)?|uint|int)/\1/g' **.rs

@eddyb eddyb removed the E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. label Mar 1, 2015
@eddyb
Copy link
Member Author

eddyb commented Mar 1, 2015

This is working out! I have some stats (cc @huonw @aturon @alexcrichton):

  • same type before and after suffix removal: 1120 lines (eddyb/rust@19354d9)
  • harmless fallback to i32: 302 lines (eddyb/rust@3586312)
  • remaining (dangerous to remove) suffixes: 627 lines

How I did the diffs:

# make a fresh git working dir
cd ..; mkdir rust-unsuffix-data; cd rust-unsuffix-data; git init
# add a commit for the "before" dataset (sed removes comments which are only diff noise)
cp ../rust/before-int-suffix-removal/* .; sed -i -e 's|//.*$||' *.rs; git add *; git commit -m 'before'
# add a commit for the "after" dataset, on top of "before" (same sed to remove comments)
git rm *; cp ../rust/after-int-suffix-removal/* .; sed -i -e 's|//.*$||' *.rs; git add *; git commit -m 'after'

@aturon
Copy link
Member

aturon commented Mar 2, 2015

@eddyb Just want to say, this is awesome 👍

Manishearth added a commit to Manishearth/rust that referenced this issue Mar 5, 2015
…chton

 Automatic has-same-types testing methodology can be found in rust-lang#22501.
Because most of them don't work with `--pretty=typed`, compile-fail tests were manually audited.

r? @aturon
Manishearth added a commit to Manishearth/rust that referenced this issue Mar 5, 2015
…chton

 Automatic has-same-types testing methodology can be found in rust-lang#22501.
Because most of them don't work with `--pretty=typed`, compile-fail tests were manually audited.

r? @aturon
@tamird
Copy link
Contributor

tamird commented Mar 26, 2015

closed by #22994 yeah?

@joliv
Copy link
Contributor

joliv commented Apr 11, 2015

I would think. Could someone with authority close this?

@huonw
Copy link
Member

huonw commented Apr 12, 2015

Thanks!

@huonw huonw closed this as completed Apr 12, 2015
@eddyb
Copy link
Member Author

eddyb commented Apr 16, 2015

Somehow I forgot to comment that this was never finished, #22994 contained merely the easy cases.

@eddyb eddyb reopened this Apr 16, 2015
@Havvy
Copy link
Contributor

Havvy commented Aug 12, 2015

So, what are the hard cases? Is this bug still E-Easy?

@cramertj
Copy link
Member

cramertj commented May 5, 2016

@eddyb Just found this-- what's the current status? Still easy?

@eddyb
Copy link
Member Author

eddyb commented May 5, 2016

@cramertj Well, I gave up on trying to remove integer suffixes from the language, before 1.0 I was convinced we could replace them with type ascription.
And the tools I used to automatically check actual impact are beyond rebasing.

One can still clean up existing code by hand, of course.

@brson brson removed the E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. label Jun 27, 2016
@brson
Copy link
Contributor

brson commented Jun 27, 2016

Making any more progress on this will requiring hacking up new tools in the compiler to detect them. It would be a nice generally to be able to detect unused suffixes (maybe clippy could do it cc @llogiq @Manishearth).

@brson brson added the P-low Low priority label Jun 27, 2016
@Manishearth
Copy link
Member

Works for me in clippy. @eddyb, could you file an issue there with heuristics on how to detect an unnecessary suffix?

@eddyb
Copy link
Member Author

eddyb commented Jun 28, 2016

@brson It doesn't really work out within a lint. Although it could be possible to take out each suffix and pass the function through typeck to see how it infers, but that seems slow and painful.

@llogiq
Copy link
Contributor

llogiq commented Jun 28, 2016

I see two options:

  1. Heuristics for easy cases – for example with binary ops and integers, we know that the types are (mostly) the same (well, as long as Rust keeps its stance on integer types vs. ops), we know that comparisons return bools etc. We know that within arrays, all elements share the same type. Within struct literals, fields have a known type (however substitutions can make it generic). Same goes for function arguments. So by implementing some simple heuristics, we can have some quick wins without requiring full inference. Not a 100% solution, though.
  2. I wonder if type inference stores the constraints it uses to determine the type somewhere (e.g. "this binding must be usize because it's used in a function foo(..) that takes usize and is added to bar elsewhere which is also usize"). It's at least present during error reporting. Making this information available to late lints would enable us to implement needless integer literal suffix detection (or even needless type annotation) without incurring the cost of building an artificial method and feeding it to the type checker.

@eddyb
Copy link
Member Author

eddyb commented Jun 28, 2016

@llogiq That information is not there during error reporting. Obligations are put in a graph of sorts so you can tell what caused an obligation to be in the graph, but sources of type inference information are lost, as the set of inference variables is directly updated.

@brson
Copy link
Contributor

brson commented Jun 28, 2016

Although it could be possible to take out each suffix and pass the function through typeck to see how it infers, but that seems slow and painful.

This is what I was imagining. Doesn't matter if it's slow if it's the only way to get that useful info!

@llogiq
Copy link
Contributor

llogiq commented Jun 28, 2016

@eddyb: Then it's probably heuristics to speed up easy cases and type inference for harder ones.

@steveklabnik
Copy link
Member

Triage: it seems this issue has mutated since the original issue, is it still relevant? I know that the idea, long ago, was that type ascription could replace suffixes in general, but that never came to fruition.

@Mark-Simulacrum
Copy link
Member

Since it seems this issue has migrated to a feature request for a lint that warns on integer suffixes that aren't used in code, I'm going to close it as we want those to go through RFCs and it also seems like without that lint there's not going to be traction within the compiler codebase either.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-cleanup Category: PRs that clean code up or issues documenting cleanup. P-low Low priority
Projects
None yet
Development

No branches or pull requests