Fix #6542: Pickle line sizes in TASTy #10363

nicolasstucki · 2020-11-17T16:39:05Z

This description is outdated, see comment for second commit bellow.

Line Sizes section

This PR add a new section called LineSizes to the tasty format. ~~This section starts with an Int containing the number of lines followed by an~~ This section h sequence of Int with the size of each line. It requires around one byte for each line (2 bytes if line longer than 127 characters) and extra section info.

This information is then used to compute the line offsets in SourceFile to be able to translate Span offsets into line/column numbers.

TASTy file size increase

Numbers before removing the initial Int for line count

The jar of the standard library increased form 6,647,453 bytes to 6,893,064 bytes. This is roughly a 3.7% increase in file size.

For sources in scala.collection.immutable the non-compressed TASTy files increased cumulative sizes increased from 964,421 bytes to 1,100,330 bytes (filesystem wise in Mac). This is roughly a 14% increase in file size.

Here is a subset of byte sizes of some sections (not including headers and names)

Source	AST	Positions	Line sizes	Docs
List.tasty	7343	7671	735	4609
Vector.tsaty	5542	5192	2617	2158
Map.tasty	14939	13566	844	1143
SortedSet.tasty	509	389	64	263
Set.tasty	6657	6237	456	675
Range.tasty	9321	8830	798	4531
Iterable.tasty	413	328	45	205
ArraySeq.tasty	15879	14674	851	1917

compiler/src/dotty/tools/dotc/core/tasty/LineSizesPickler.scala

nicolasstucki · 2020-11-19T15:37:50Z

test performance please

nicolasstucki · 2020-11-20T12:54:59Z

test performance please

smarter · 2020-11-21T01:49:47Z

I don't know if it would be better or worse in term of size, but it might be worth trying to store the line number as part of the PositionPickler, using delta-coding like it's already doing for the start/end/span.

nicolasstucki · 2020-11-21T07:56:40Z

The delta coding wouldn't save us anything as we are already close to one byte per line. We could save on the section name and size of we join them.

nicolasstucki · 2020-11-21T10:58:59Z

The second commit puts the line sizes directly in the Positions section. In this version, we pickle the number of lines followed by the line sizes at the start of the Positions section. It requires around one byte for each line (2 bytes if line longer than 127 characters) and usually needs 2 bytes for the number of lines (1 if the source file has 127 lines or less).

This simplifies considerably the code required to implement this addition.

The jar of the standard library increased from 6,647,453 bytes 6,851,555 bytes (instead of the previous 6,893,064 bytes). This is a 3.1% (instead of 3.7%) increase in jar file size.

nicolasstucki · 2020-11-23T13:03:18Z

test performance please

dottybot · 2020-11-23T13:04:59Z

performance test scheduled: 3 job(s) in queue, 1 running.

smarter · 2020-11-23T14:23:25Z

The delta coding wouldn't save us anything as we are already close to one byte per line.

Since the lien number delta between two trees should be 0 or 1 in most cases, it's possible we could find a better encoding that uses less than one byte per line, but I'm not sure.

sjrd · 2020-11-23T16:28:09Z

One thing to pay attention to, if not already done: the different kinds of newlines and how they interact with offsets. In particular, the Windows \r\n takes 2 bytes, so it's a single line jump but with a shift of 2 bytes/Chars in offsets.

liufengyun · 2020-11-23T16:28:09Z

It would be good to add neg test for inlined trees from separately compiled files to make sure it points to the correct source file and line.

dottybot · 2020-11-23T17:50:39Z

Performance test finished successfully:

Visit http://dotty-bench.epfl.ch/10363/ to see the changes.

Benchmarks is based on merging with master (bbbcfde)

nicolasstucki · 2020-11-24T16:54:34Z

Added a regression test in project/scripts/cmdTests.

liufengyun · 2020-11-24T17:09:21Z

compiler/src/dotty/tools/dotc/core/tasty/TreeUnpickler.scala

-            sourceFile.setLineIndices(lineSizesUnpickler.lineIndices)
+        posUnpicklerOpt match
+          case Some(posUnpickler) =>
+            sourceFile.setLineIndicesFromLineSizes(posUnpickler.lineSizes)


There is a potential issue here: the relative path of a file is not a unique identifier of the file, there might be path conflicts in the Scala ecosystem.

If we also store the hash of source files, that would be better.

How would you use that hash?

Maybe ctx.getSource(path) can take hash as an argument? I haven't thought through the details.

The hash would be on the contents of the source? Then we would need to read all the sources eagerly when we add them to the context. That may be quite expensive.

When we compile files, we already have the file contents in memory.

Actually, it is only loaded when we first access something the depends on the source https://github.com/lampepfl/dotty/blob/master/compiler/src/dotty/tools/dotc/util/SourceFile.scala#L43.

When we compile Scala files, the contents for the Scala files will be forced.

nicolasstucki · 2020-11-24T17:16:29Z

Could someone with a Windows machine help me check if the following behaves the same?

Reproduction steps

# cp tests/neg/i6371/A_1.scala A.scala     
# cp tests/neg/i6371/B_2.scala B.scala    
# mkdir outA
# sbt   
sbt:scala3> scalac -d outA A.scala
sbt:scala3> scalac -print-tasty outA/A.tasty
Names:
  ...
  ...

Trees:
start = Addr(0), base = 300, current = Addr(0), end = Addr(146)
146 bytes of AST, base = Addr(0)
  ...
  ...

 97 position bytes:
   lines: 7
   line sizes: 10, 42, 18, 13, 3, 1, 0
   positions:
         0: 0 .. 92
         5: 0 .. 0
         8: 0 .. 0
        24: 0 .. 92
        27: 13 .. 90
        34: 7 .. 7
        41: 13 .. 13
        45: 13 .. 13
        49: 13 .. 13
        54: 7 .. 7
        58: 7 .. 7
        67: 7 .. 7
        71: 7 .. 7
        76: 13 .. 90
        79: 28 .. 34
        82: 31 .. 34
        88: 37 .. 41
        96: 44 .. 90
        98: 44 .. 45
       100: 58 .. 72
       104: 63 .. 64
       110: 66 .. 69
       116: 70 .. 72
       117: 77 .. 86
       119: 82 .. 83
       125: 84 .. 86


 0 comment bytes:

sbt:scala3> scalac -classpath outA -d outA B.scala
-- Error: B.scala:2:7 ----------------------------------------------------------
2 |  A.foo("aa") // error
  |  ^^^^^^^^^^^
  |this case is unreachable since type String is not a subclass of class Integer
  | This location contains code that was inlined from A.scala:3
1 error found

sbt:scala3> exit
# rm A.scala 
# sbt
sbt:scala3> scalac -classpath outA -d outA B.scala
-- Error: B.scala:2:7 ----------------------------------------------------------
2 |  A.foo("aa") // error
  |  ^^^^^^^^^^^
  |this case is unreachable since type String is not a subclass of class Integer
  | This location contains code that was inlined from A.scala:3
1 error found

liufengyun · 2020-11-24T17:17:40Z

Could someone with a Windows machine help me check if the following behaves the same?

Reproduction steps

I just sent you by email.

nicolasstucki · 2020-11-25T07:43:01Z

Executing the test on Windows yielded the same result

Reproduction steps (Windows)

dotty-2>copy tests\neg\i6371\A_1.scala A.scala
        1 file(s) copied.
dotty-2>copy tests\neg\i6371\B_2.scala B.scala
        1 file(s) copied.
dotty-2>mkdir outA
dotty-2>sbt
sbt:scala3> scalac -d outA A.scala

Names:
   
   ...

Trees:
start = Addr(0), base = 300, current = Addr(0), end = Addr(146)
146 bytes of AST, base = Addr(0)

   ...

 97 position bytes:
   lines: 7
   line sizes: 10, 42, 18, 13, 3, 1, 0
   positions:
         0: 0 .. 92
         5: 0 .. 0
         8: 0 .. 0
        24: 0 .. 92
        27: 13 .. 90
        34: 7 .. 7
        41: 13 .. 13
        45: 13 .. 13
        49: 13 .. 13
        54: 7 .. 7
        58: 7 .. 7
        67: 7 .. 7
        71: 7 .. 7
        76: 13 .. 90
        79: 28 .. 34
        82: 31 .. 34
        88: 37 .. 41
        96: 44 .. 90
        98: 44 .. 45
       100: 58 .. 72
       104: 63 .. 64
       110: 66 .. 69
       116: 70 .. 72
       117: 77 .. 86
       119: 82 .. 83
       125: 84 .. 86


 0 comment bytes:

sbt:scala3> scalac -classpath outA -d outA B.scala
-- Error: B.scala:2:7 ----------------------------------------------------------
2 |  A.foo("aa") // error
  |  ^^^^^^^^^^^
  |this case is unreachable since type String is not a subclass of class Integer
  | This location contains code that was inlined from A.scala:3
1 error found

sbt:scala3> exit

dotty-2>del A.scala

dotty-2>sbt
-- Error: B.scala:2:7 ----------------------------------------------------------
2 |  A.foo("aa") // error
  |  ^^^^^^^^^^^
  |this case is unreachable since type String is not a subclass of class Integer
  | This location contains code that was inlined from A.scala:3
1 error found

odersky

Otherwise LGTM

tasty/src/dotty/tools/tasty/TastyFormat.scala

compiler/src/dotty/tools/dotc/core/tasty/PositionPickler.scala

nicolasstucki · 2020-11-27T09:51:03Z

@odersky I also changed writeInt/readInt to writeNat/readNat, not only in the docs.

nicolasstucki self-assigned this Nov 17, 2020

nicolasstucki force-pushed the add-line-sizes-in-TASTy branch 2 times, most recently from b256b21 to 6e47481 Compare November 18, 2020 10:06

nicolasstucki changed the title ~~Pickle line sizes in TASTy~~ Fix #6542: Pickle line sizes in TASTy Nov 18, 2020

nicolasstucki linked an issue Nov 18, 2020 that may be closed by this pull request

Line numbers from TASTy #6542

Closed

nicolasstucki requested a review from odersky November 18, 2020 12:16

nicolasstucki assigned odersky and unassigned nicolasstucki Nov 18, 2020

nicolasstucki marked this pull request as ready for review November 18, 2020 12:16

anatoliykmetyuk added this to the 3.0.0-RC1 milestone Nov 18, 2020

nicolasstucki commented Nov 18, 2020

View reviewed changes

compiler/src/dotty/tools/dotc/core/tasty/LineSizesPickler.scala Outdated Show resolved Hide resolved

nicolasstucki force-pushed the add-line-sizes-in-TASTy branch from 71a6c12 to 65afab0 Compare November 20, 2020 08:26

nicolasstucki force-pushed the add-line-sizes-in-TASTy branch 3 times, most recently from 45a2479 to 5db4c1f Compare November 21, 2020 16:02

nicolasstucki force-pushed the add-line-sizes-in-TASTy branch from 5db4c1f to 9c326f2 Compare November 24, 2020 16:51

nicolasstucki added 2 commits November 24, 2020 17:52

Fix scala#6542: Pickle line sizes in TASTy

c36a3e0

Put line sizes in the Positions section

c7106ef

nicolasstucki force-pushed the add-line-sizes-in-TASTy branch from 9c326f2 to c7106ef Compare November 24, 2020 16:53

liufengyun reviewed Nov 24, 2020

View reviewed changes

odersky approved these changes Nov 26, 2020

View reviewed changes

tasty/src/dotty/tools/tasty/TastyFormat.scala Outdated Show resolved Hide resolved

compiler/src/dotty/tools/dotc/core/tasty/PositionPickler.scala Outdated Show resolved Hide resolved

Use Nats to pickle line sizes in TASTy

361a680

nicolasstucki force-pushed the add-line-sizes-in-TASTy branch from a9976e9 to 361a680 Compare November 27, 2020 09:30

nicolasstucki merged commit a129373 into scala:master Nov 30, 2020

nicolasstucki deleted the add-line-sizes-in-TASTy branch November 30, 2020 08:27

griggt mentioned this pull request Feb 9, 2021

Fix #10883: Override line and column in NoSourcePosition #11362

Merged

olhotak mentioned this pull request Apr 6, 2022

IndexOutOfBounds in SourceFile.scala contents() #14795

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix #6542: Pickle line sizes in TASTy #10363

Fix #6542: Pickle line sizes in TASTy #10363

nicolasstucki commented Nov 17, 2020 •

edited

Loading

nicolasstucki commented Nov 19, 2020

nicolasstucki commented Nov 20, 2020

smarter commented Nov 21, 2020

nicolasstucki commented Nov 21, 2020

nicolasstucki commented Nov 21, 2020 •

edited

Loading

nicolasstucki commented Nov 23, 2020

dottybot commented Nov 23, 2020

smarter commented Nov 23, 2020

sjrd commented Nov 23, 2020

liufengyun commented Nov 23, 2020

dottybot commented Nov 23, 2020

nicolasstucki commented Nov 24, 2020

liufengyun Nov 24, 2020

nicolasstucki Nov 25, 2020

liufengyun Nov 25, 2020

nicolasstucki Nov 25, 2020

liufengyun Nov 25, 2020

nicolasstucki Nov 26, 2020

liufengyun Nov 26, 2020

nicolasstucki commented Nov 24, 2020

liufengyun commented Nov 24, 2020

nicolasstucki commented Nov 25, 2020

odersky left a comment

nicolasstucki commented Nov 27, 2020

Fix #6542: Pickle line sizes in TASTy #10363

Fix #6542: Pickle line sizes in TASTy #10363

Conversation

nicolasstucki commented Nov 17, 2020 • edited Loading

Line Sizes section

TASTy file size increase

nicolasstucki commented Nov 19, 2020

nicolasstucki commented Nov 20, 2020

smarter commented Nov 21, 2020

nicolasstucki commented Nov 21, 2020

nicolasstucki commented Nov 21, 2020 • edited Loading

nicolasstucki commented Nov 23, 2020

dottybot commented Nov 23, 2020

smarter commented Nov 23, 2020

sjrd commented Nov 23, 2020

liufengyun commented Nov 23, 2020

dottybot commented Nov 23, 2020

nicolasstucki commented Nov 24, 2020

liufengyun Nov 24, 2020

Choose a reason for hiding this comment

nicolasstucki Nov 25, 2020

Choose a reason for hiding this comment

liufengyun Nov 25, 2020

Choose a reason for hiding this comment

nicolasstucki Nov 25, 2020

Choose a reason for hiding this comment

liufengyun Nov 25, 2020

Choose a reason for hiding this comment

nicolasstucki Nov 26, 2020

Choose a reason for hiding this comment

liufengyun Nov 26, 2020

Choose a reason for hiding this comment

nicolasstucki commented Nov 24, 2020

liufengyun commented Nov 24, 2020

nicolasstucki commented Nov 25, 2020

odersky left a comment

Choose a reason for hiding this comment

nicolasstucki commented Nov 27, 2020

nicolasstucki commented Nov 17, 2020 •

edited

Loading

nicolasstucki commented Nov 21, 2020 •

edited

Loading