Fix #6542: Pickle line sizes in TASTy #10363
(force-pushed from b256b21 to 6e47481)
Review on compiler/src/dotty/tools/dotc/core/tasty/LineSizesPickler.scala (outdated, resolved)
test performance please
(force-pushed from 71a6c12 to 65afab0)
test performance please
I don't know whether it would be better or worse in terms of size, but it might be worth trying to store the line number as part of the PositionPickler, using delta coding as it already does for the start/end/span.
Delta coding wouldn't save us anything, as we are already close to one byte per line. We could save on the section name and size if we join them.
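To see why a line size usually costs one byte, here is a minimal sketch of a big-endian base-128 natural-number encoding in which only the last byte of each number has its high bit set, which is how I understand TASTy's Nat format. The object and method names (LineSizesEncoding, writeNat, readNat, encode, decode) are hypothetical, and the layout (line count, then one size per line) is assumed from the PR description rather than taken from the actual pickler.

```scala
import scala.collection.mutable.ArrayBuffer

object LineSizesEncoding {
  // Hypothetical Nat writer: big-endian base 128, with the high bit (0x80)
  // set only on the final byte to mark the end of the number.
  def writeNat(out: ArrayBuffer[Byte], x: Int): Unit = {
    require(x >= 0, "only natural numbers are supported")
    def writePrefix(v: Int): Unit = {
      val rest = v >>> 7
      if (rest != 0) writePrefix(rest)
      out += (v & 0x7f).toByte // continuation byte: high bit clear
    }
    val rest = x >>> 7
    if (rest != 0) writePrefix(rest)
    out += ((x & 0x7f) | 0x80).toByte // final byte: high bit set
  }

  // Reads one Nat starting at `start`; returns the value and the next position.
  def readNat(in: Array[Byte], start: Int): (Int, Int) = {
    var i = start
    var x = 0
    var done = false
    while (!done) {
      val b = in(i) & 0xff
      i += 1
      x = (x << 7) | (b & 0x7f)
      done = (b & 0x80) != 0
    }
    (x, i)
  }

  // Assumed layout: number of lines, then one size per line.
  def encode(lineSizes: Array[Int]): Array[Byte] = {
    val out = ArrayBuffer.empty[Byte]
    writeNat(out, lineSizes.length)
    lineSizes.foreach(size => writeNat(out, size))
    out.toArray
  }

  def decode(bytes: Array[Byte]): Array[Int] = {
    val (count, afterCount) = readNat(bytes, 0)
    val sizes = new Array[Int](count)
    var pos = afterCount
    for (k <- 0 until count) {
      val (size, next) = readNat(bytes, pos)
      sizes(k) = size
      pos = next
    }
    sizes
  }
}
```

With this encoding, values from 0 to 127 take one byte and values up to 16,383 take two, so a typical source file costs roughly one byte per line plus the line count itself.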
The second commit puts the line sizes directly in the … This considerably simplifies the code required to implement this addition. The jar of the standard library increased from …
(force-pushed from 45a2479 to 5db4c1f)
test performance please
performance test scheduled: 3 job(s) in queue, 1 running.
Since the line number delta between two trees should be 0 or 1 in most cases, we might be able to find an encoding that uses less than one byte per line, but I'm not sure.
One thing to pay attention to, if not already done: the different kinds of newlines and how they interact with offsets. In particular, the Windows …
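To make the newline concern concrete, here is a sketch under one assumed convention: each line size excludes exactly one terminating line-break character, and a '\r' immediately followed by '\n' is counted as ordinary line content. Under that convention the start of line k+1 is always start(k) + size(k) + 1, for LF and CRLF files alike. The names (LineSizesFromText, lineSizes, lineStarts) are hypothetical, and this is not necessarily how the compiler's SourceFile classifies line breaks.

```scala
import scala.collection.mutable.ArrayBuffer

object LineSizesFromText {
  // Hypothetical convention: LF, CR and FF each end a line, except a CR that is
  // immediately followed by LF, which is kept as ordinary line content.
  private def isLineBreak(text: String, i: Int): Boolean = {
    val ch = text.charAt(i)
    if (ch == '\r') i + 1 == text.length || text.charAt(i + 1) != '\n'
    else ch == '\n' || ch == '\f'
  }

  // Size of each line, excluding its single terminating line-break character.
  def lineSizes(text: String): Array[Int] = {
    val sizes = ArrayBuffer.empty[Int]
    var lineStart = 0
    var i = 0
    while (i < text.length) {
      if (isLineBreak(text, i)) {
        sizes += i - lineStart
        lineStart = i + 1
      }
      i += 1
    }
    sizes += text.length - lineStart // the last line has no terminator
    sizes.toArray
  }

  // Line start offsets computed directly from the text, for comparison.
  def lineStarts(text: String): Array[Int] = {
    val starts = ArrayBuffer(0)
    for (i <- 0 until text.length if isLineBreak(text, i)) starts += i + 1
    starts.toArray
  }
}
```

For "a\r\nbc" this yields sizes 2 and 2: the '\r' is counted in the first line, and the `+ 1` accounts for the '\n'. A convention that dropped both characters from the size while still adding only 1 would drift by one offset per CRLF line.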
It would be good to add a neg test for inlined trees from separately compiled files, to make sure the error points to the correct source file and line.
Performance test finished successfully: Visit http://dotty-bench.epfl.ch/10363/ to see the changes. Benchmarks are based on merging with master (bbbcfde).
(force-pushed from 5db4c1f to 9c326f2)
(force-pushed from 9c326f2 to c7106ef)
Added a regression test in …
```scala
sourceFile.setLineIndices(lineSizesUnpickler.lineIndices)
posUnpicklerOpt match
  case Some(posUnpickler) =>
    sourceFile.setLineIndicesFromLineSizes(posUnpickler.lineSizes)
```
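The hunk above calls setLineIndicesFromLineSizes. As a rough illustration of what rebuilding line indices from line sizes involves (not the actual SourceFile code), here is a standalone sketch that turns sizes into line start offsets and then maps a span offset to a zero-based line/column pair with a binary search. It assumes each size excludes one terminating line-break character and that the last line has none; LineIndex and its methods are hypothetical names.

```scala
object LineIndex {
  // Rebuilds line start offsets from line sizes. Assumes each size excludes one
  // terminating line-break character, and the last line has no terminator.
  // The result has one entry per line plus a final sentinel (the text length).
  def lineIndicesFromSizes(sizes: Array[Int]): Array[Int] = {
    val indices = new Array[Int](sizes.length + 1)
    var i = 0
    while (i < sizes.length) {
      val terminator = if (i < sizes.length - 1) 1 else 0
      indices(i + 1) = indices(i) + sizes(i) + terminator
      i += 1
    }
    indices
  }

  // Zero-based line containing `offset`: the last line whose start is <= offset.
  def offsetToLine(indices: Array[Int], offset: Int): Int = {
    var lo = 0
    var hi = indices.length - 2 // index of the last real line (skips the sentinel)
    while (lo < hi) {
      val mid = (lo + hi + 1) / 2
      if (indices(mid) <= offset) lo = mid else hi = mid - 1
    }
    lo
  }

  // Zero-based (line, column) for an offset into the source text.
  def lineAndColumn(indices: Array[Int], offset: Int): (Int, Int) = {
    val line = offsetToLine(indices, offset)
    (line, offset - indices(line))
  }
}
```

For example, the text "val x = 1\n  x + 2\n" has line sizes 9, 7 and 0, giving indices 0, 10, 18, 18; offset 12 (the second `x`) maps to line 1, column 2.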
There is a potential issue here: the relative path of a file is not a unique identifier for the file; there might be path conflicts in the Scala ecosystem.
It would be better if we also stored a hash of each source file.
How would you use that hash?
Maybe ctx.getSource(path) could take the hash as an argument? I haven't thought through the details.
Would the hash be computed over the contents of the source? Then we would need to read all the sources eagerly when we add them to the context, which may be quite expensive.
When we compile files, we already have the file contents in memory.
Actually, it is only loaded when we first access something that depends on the source: https://github.com/lampepfl/dotty/blob/master/compiler/src/dotty/tools/dotc/util/SourceFile.scala#L43.
When we compile Scala files, the contents of those files are forced anyway.
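One way to picture the hash idea from this thread is a lookup key made of the relative path plus a digest of the file contents, so that two libraries shipping the same relative path no longer collide. This is a hypothetical sketch (SourceKey and SourceKey.of are invented names, not part of the compiler, and SHA-256 is an arbitrary choice); it also shows the cost concern raised above, since computing the key requires the full file contents.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical key: the relative path alone may collide across libraries,
// so pair it with a digest of the file contents.
final case class SourceKey(relativePath: String, contentHash: String)

object SourceKey {
  // Hex-encoded SHA-256 of the text, encoded as UTF-8.
  private def sha256Hex(text: String): String = {
    val digest = MessageDigest.getInstance("SHA-256")
      .digest(text.getBytes(StandardCharsets.UTF_8))
    digest.map(b => f"${b & 0xff}%02x").mkString
  }

  def of(relativePath: String, contents: String): SourceKey =
    SourceKey(relativePath, sha256Hex(contents))
}
```

Two files with the same path but different contents get different keys, while recompiling an unchanged file reproduces the same key.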
Could someone with a Windows machine help me check whether the following behaves the same? Reproduction steps: …
I just sent it to you by email.
Executing the test on Windows yielded the same result. Reproduction steps (Windows): …
Otherwise LGTM
(force-pushed from a9976e9 to 361a680)
@odersky I also changed …
This description is outdated; see the comment about the second commit below.
Line Sizes section
This PR adds a new section called LineSizes to the TASTy format. This section starts with an Int containing the number of lines, followed by an Int with the size of each line. It requires around one byte per line (two bytes if the line is longer than 127 characters) plus some extra section info. This information is then used to compute the line offsets in SourceFile, so that Span offsets can be translated into line/column numbers.

TASTy file size increase
The jar of the standard library increased from 6,647,453 bytes to 6,893,064 bytes. This is roughly a 3.7% increase in file size.

For sources in scala.collection.immutable, the cumulative size of the uncompressed TASTy files increased from 964,421 bytes to 1,100,330 bytes (file sizes as reported by the filesystem on macOS). This is roughly a 14% increase in file size.

Here are the byte sizes of a subset of sections (not including headers and names):