Skip to content

More integer allocations from StringLike#times after scala 2.12.8 upgrade #11547

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jyanJing opened this issue May 29, 2019 · 4 comments
Closed
Milestone

Comments

@jyanJing
Copy link

jyanJing commented May 29, 2019

Hi Scala team,

We did a recent upgrade from scala 2.11.12 to 2.12.8, and we noticed that by running on 2.12.8 library, a lot more integers are allocated from calling StringLike#times method on top of a high stack trace(the stack trace we have seen by running async profiler has 46 levels, please see attachment 1).

However, when we try to run bench marks of just the times method in scala 2.11.12 and scala 2.12.8, scala 2.12.8 gave slightly better allocation rate and no integers allocated from running async profiler(please see attachment 2). This issue only happens when the times method is called at the end of a high stack trace.

Do you have any insights on what might cause this issue?

Thanks,
Jing


Attachment 1: Allocation profiling from calling StringLike#times class on top of a high stack trace(by Async Profiler)

--- **5636448352 bytes (31.43%), 3604 samples**
-----  ----------------
  [ 0] java.lang.Integer
  [ 1] java.lang.Integer.valueOf
  [ 2] scala.runtime.BoxesRunTime.boxToInteger
  [ 3] scala.collection.immutable.Range.foreach
  [ 4] scala.collection.immutable.StringLike.$times
  [ 5] scala.collection.immutable.StringLike.$times$
  [ 6] scala.collection.immutable.StringOps.$times
  [ 7] com.twitter.finagle.integ.sim.HttpStreamingClientThroughputSim.$anonfun$request$1
  [ 8] com.twitter.finagle.integ.sim.HttpStreamingClientThroughputSim$$Lambda$1140.1636868723.apply
  [ 9] scala.collection.immutable.Stream$.$anonfun$fill$1
  [10] scala.collection.immutable.Stream$$$Lambda$2178.1984970133.apply
  [11] scala.collection.immutable.Stream$Cons.tail
  [12] scala.collection.immutable.Stream$Cons.tail
  [13] scala.collection.$plus$colon$.unapply
  [14] com.twitter.io.SeqReader.read
  [15] com.twitter.io.Reader$$anon$1.read
  [16] com.twitter.finagle.netty4.http.Netty4StreamTransport$.continue$1
  [17] com.twitter.finagle.netty4.http.Netty4StreamTransport$.$anonfun$streamOut$3
  [18] com.twitter.finagle.netty4.http.Netty4StreamTransport$$$Lambda$2233.1867897542.apply
  [19] com.twitter.util.Future.$anonfun$before$1
  [20] com.twitter.util.Future$$Lambda$897.199150625.apply
  [21] com.twitter.util.Promise$FutureTransformer.liftedTree1$1
  [22] com.twitter.util.Promise$FutureTransformer.k
  [23] com.twitter.util.Promise$Transformer.apply
  [24] com.twitter.util.Promise$WaitQueue.com$twitter$util$Promise$WaitQueue$$run
  [25] com.twitter.util.Promise$WaitQueue$$anon$1.run
  [26] com.twitter.concurrent.LocalScheduler$Activation.run
  [27] com.twitter.concurrent.LocalScheduler$Activation.submit
  [28] com.twitter.concurrent.LocalScheduler.submit
  [29] com.twitter.concurrent.Scheduler$.submit
  [30] com.twitter.util.Promise$WaitQueue.runInScheduler
  [31] com.twitter.util.Promise.continue
  [32] com.twitter.util.Promise$Responder.respond
  [33] com.twitter.util.Promise$Responder.respond$
  [34] com.twitter.util.Promise.respond
  [35] com.twitter.util.Future.proxyTo
  [36] com.twitter.finagle.http2.transport.StreamTransportFactory$StreamTransport$$anon$8.run
  [37] io.netty.util.concurrent.AbstractEventExecutor.safeExecute
  [38] io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks
  [39] io.netty.channel.epoll.EpollEventLoop.run
  [40] io.netty.util.concurrent.SingleThreadEventExecutor$5.run
  [41] io.netty.util.internal.ThreadExecutorMap$2.run
  [42] java.util.concurrent.ThreadPoolExecutor.runWorker
  [43] java.util.concurrent.ThreadPoolExecutor$Worker.run
  [44] com.twitter.finagle.util.BlockingTimeTrackingThreadFactory$$anon$1.run
  [45] io.netty.util.concurrent.FastThreadLocalRunnable.run
  [46] java.lang.Thread.run

Attachment 2: Scala 2.11.12 vs. Scala 2.12.8 allocation profiling of calling StringLike#times alone(by Async Profiler)

[Scala 2.11.12]

      **bytes  percent  samples  top**
  ----------  -------  -------  ---
  5989679736   58.85%     5418  char[]
  4056970512   39.86%     3662  byte[]
    50281024    0.49%       49  java.lang.String
    17885768    0.18%       16  com.twitter.io.Buf$ByteArray
    16199176    0.16%       15  scala.collection.immutable.Stream$$anonfun$fill$1
    13047024    0.13%       12  java.lang.StringBuilder
     7424176    0.07%        7  scala.Tuple2
     6471168    0.06%        6  com.twitter.io.SeqReader
     5770056    0.06%        6  com.twitter.util.Promise
     5005960    0.05%        5  scala.Some
     4565120    0.04%        4  com.twitter.util.ConstFuture
     1905920    0.02%        2  scala.collection.immutable.StringOps
     1037272    0.01%      493  java.util.TreeMap$Entry
      756656    0.01%      361  java.util.TreeMap$KeyIterator
      387136    0.00%      184  java.lang.management.MemoryUsage[]
      250560    0.00%      120  java.util.Collections$UnmodifiableCollection$1
      202440    0.00%      119  char[] (out)
      152512    0.00%       60  byte[] (out)
       33280    0.00%       16  java.util.TreeMap$KeySet
        3024    0.00%       63  java.util.TreeMap (out)
        2112    0.00%        1  java.util.TreeMap
          32    0.00%        1  com.twitter.io.Buf$ByteArray (out)
          24    0.00%        1  com.twitter.io.SeqReader (out)
          24    0.00%        1  java.lang.String (out)
          24    0.00%        1  scala.Tuple2 (out)
          16    0.00%        1  scala.collection.immutable.StringOps (out)

[Scala 2.12.8]

 **bytes  percent  samples  top**
  ----------  -------  -------  ---
  4938207856   57.09%     4449  char[]
  3599997040   41.62%     3229  byte[]
    22220392    0.26%       13  scala.collection.immutable.Range
    20024544    0.23%       19  com.twitter.io.Buf$ByteArray
    19762472    0.23%       19  java.lang.StringBuilder
    15934968    0.18%        9  com.twitter.util.Promise
    12808456    0.15%       11  scala.collection.immutable.StringOps
    10042952    0.12%        5  scala.collection.immutable.Stream$Cons
     7823048    0.09%        9  scala.collection.mutable.StringBuilder
     1049896    0.01%      499  java.util.TreeMap$Entry
      645568    0.01%      308  java.util.TreeMap$KeyIterator
      326120    0.00%      155  java.lang.management.MemoryUsage[]
      177600    0.00%       95  char[] (out)
      143280    0.00%       59  byte[] (out)
      125280    0.00%       60  java.util.Collections$UnmodifiableCollection$1
        8256    0.00%        3  java.lang.String
        4560    0.00%       95  java.util.TreeMap (out)
          24    0.00%        1  scala.collection.immutable.StringLike$$Lambda$7 (out)
@hrhino hrhino added this to the Backlog milestone May 29, 2019
@eed3si9n
Copy link
Member

I don't know if the behavior changed in 2.12, but if you're concerned about range allocation, it's been fixed in 2.13 as scala/scala#6535. Unless there's something actionable, I think we should close this issue before it gets buried and forgotten.

@SethTisue
Copy link
Member

(someone motivated could look into whether a 2.12.x backport is feasible)

@SethTisue SethTisue modified the milestones: Backlog, 2.13.0-M4 May 31, 2019
@jyanJing
Copy link
Author

jyanJing commented Jun 1, 2019

Hi Scala team,

I am able to reproduce it by running a micro benchmark, it turns out that this may not be related to the depth of the stack trace. Below is a copy of the StringLikeBenchmark result for scala 2.11.12 and scala 2.12.8.

Do you think this is fixed in Scala 2.13?

Scala 2.12.8

[info] Benchmark                                                                       Mode  Cnt      Score      Error   Units
[info] StringLikeBenchmark.testScalaTimesAllocation                                   thrpt    3  40473.556 ±  359.551   ops/s
[info] StringLikeBenchmark.testScalaTimesAllocation:·gc.alloc.rate                    thrpt    3   1561.913 ±   17.292  MB/sec
[info] StringLikeBenchmark.testScalaTimesAllocation:·gc.alloc.rate.norm               thrpt    3  42504.001 ±    0.001    B/op
[info] StringLikeBenchmark.testScalaTimesAllocation:·gc.churn.PS_Eden_Space           thrpt    3   1562.997 ±  116.615  MB/sec
[info] StringLikeBenchmark.testScalaTimesAllocation:·gc.churn.PS_Eden_Space.norm      thrpt    3  42533.441 ± 2706.615    B/op
[info] StringLikeBenchmark.testScalaTimesAllocation:·gc.churn.PS_Survivor_Space       thrpt    3      0.168 ±    0.682  MB/sec
[info] StringLikeBenchmark.testScalaTimesAllocation:·gc.churn.PS_Survivor_Space.norm  thrpt    3      4.560 ±   18.573    B/op
[info] StringLikeBenchmark.testScalaTimesAllocation:·gc.count                         thrpt    3    531.000             counts
[info] StringLikeBenchmark.testScalaTimesAllocation:·gc.time                          thrpt    3    272.000                 ms

Scala 2.11.12

[info] Benchmark                                                                       Mode  Cnt      Score        Error   Units
[info] StringLikeBenchmark.testScalaTimesAllocation                                   thrpt    3  36641.087 ± 216107.718   ops/s
[info] StringLikeBenchmark.testScalaTimesAllocation:·gc.alloc.rate                    thrpt    3    452.909 ±   2674.209  MB/sec
[info] StringLikeBenchmark.testScalaTimesAllocation:·gc.alloc.rate.norm               thrpt    3  13616.001 ±      0.008    B/op
[info] StringLikeBenchmark.testScalaTimesAllocation:·gc.churn.PS_Eden_Space           thrpt    3    453.514 ±   2645.985  MB/sec
[info] StringLikeBenchmark.testScalaTimesAllocation:·gc.churn.PS_Eden_Space.norm      thrpt    3  13641.822 ±   2214.770    B/op
[info] StringLikeBenchmark.testScalaTimesAllocation:·gc.churn.PS_Survivor_Space       thrpt    3      0.063 ±      0.490  MB/sec
[info] StringLikeBenchmark.testScalaTimesAllocation:·gc.churn.PS_Survivor_Space.norm  thrpt    3      1.860 ±      4.258    B/op
[info] StringLikeBenchmark.testScalaTimesAllocation:·gc.count                         thrpt    3    287.000               counts
[info] StringLikeBenchmark.testScalaTimesAllocation:·gc.time                          thrpt    3    235.000                   ms

To reproduce:

git clone https://github.com/jyanJing/Scala_StringLike_Bug.git
cd Scala_StringLike_Bug
sbt "+jmh:run -wi 3 -i 3 -f 1 -prof gc StringLikeBenchmark"

@som-snytt
Copy link

som-snytt commented Feb 17, 2023

For the record, the supplied benchmark shows that 2.13 improves on 2.11.

The performance problem on 2.12 is that it no longer inlined specialized Range.foreach, so you get boxed indexes passed to the unspecialized function. (Probably that was a known issue by someone.) Also, what function is that?

scala> for (i <- 0 until n) buf append toString // print
   scala.Predef.intWrapper(0).until(n).foreach[StringBuilder](((i: Int) => buf.append(toString()))) // : Unit

scala> (0 until n).foreach(buf append toString) // print
   scala.Predef.intWrapper(0).until(n).foreach[Char](buf.append(toString())) // : Unit

The second one is https://scalapuzzlers.com/#pzzlr-027

I wonder if there is a lint for that yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants