Increase initialCapacity for HashSet in ExtractDependencies.scala #18219

Closed
wants to merge 2 commits into from
@@ -447,7 +447,7 @@ private class ExtractDependenciesCollector extends tpd.TreeTraverser { thisTreeT
 // Avoid cycles by remembering both the types (testcase:
 // tests/run/enum-values.scala) and the symbols of named types (testcase:
 // tests/pos-java-interop/i13575) we've seen before.
-val seen = new mutable.HashSet[Symbol | Type]
+val seen = new mutable.HashSet[Symbol | Type](initialCapacity = 128, loadFactor = mutable.HashSet.defaultLoadFactor)
Contributor

@nicolasstucki nicolasstucki Jul 17, 2023

128 seems excessive. If we look at the histogram of maximum sizes, we see that 64 is enough to cover most cases (99%). Even with 32 we would cover 87% of cases and incur an extra allocation only for the remaining 13% (the larger cases).

[Screenshot: histogram of maximum sizes of the seen set]

I measured it when running scala3-bootstrapped/compile after a clean.

Contributor Author

@nicolasstucki Nice histogram! I wanted to build one too but gave up eventually. How did you do it?
Do your percentages consider the loadFactor of 75%? Should I update the initialCapacity to 64?

Contributor

  def addTypeDependency(tpe: Type)(using Context): Unit = {
    val traverser = new TypeDependencyTraverser {
      def addDependency(symbol: Symbol) = addMemberRefDependency(symbol)
    }
    traverser.traverse(tpe)
    println("seen: " + traverser.seen.size)
  }

  def addPatMatDependency(tpe: Type)(using Context): Unit = {
    val traverser = new TypeDependencyTraverser {
      def addDependency(symbol: Symbol) =
        if (!ignoreDependency(symbol) && symbol.is(Sealed)) {
          val usedName = symbol.zincMangledName
          addUsedName(usedName, UseScope.PatMatTarget)
        }
    }
    traverser.traverse(tpe)
    println("seen: " + traverser.seen.size)

  }

Run `sbt "clean; scala3-bootstrapped/compile" > sbtOutput.txt`, then filter the `seen:` lines out of that file and import the numbers into Google Sheets.
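
For anyone who prefers to skip the spreadsheet, the filtering and counting step can be sketched in Scala. This is only an illustration: the object `SeenSizes`, its helper names, and the sample lines are hypothetical and not part of this PR, and the sample numbers are made up rather than measured.

```scala
// Sketch: summarize "seen: N" lines such as those printed by the
// instrumentation above. All names here are hypothetical helpers.
object SeenSizes {
  // Unanchored so that a line with a prefix (e.g. a log tag) still matches.
  private val SeenLine = raw"seen: (\d+)".r.unanchored

  /** Extract N from every line containing "seen: N"; other lines are ignored. */
  def parseSizes(lines: Iterable[String]): Vector[Int] =
    lines.iterator.collect { case SeenLine(n) => n.toInt }.toVector

  /** Fraction of samples whose size is at most `capacity`. */
  def coverage(sizes: Seq[Int], capacity: Int): Double =
    if (sizes.isEmpty) 0.0
    else sizes.count(_ <= capacity).toDouble / sizes.size

  def main(args: Array[String]): Unit = {
    // Made-up sample lines, NOT measurements from the compiler.
    // To use real data, read the lines of sbtOutput.txt instead,
    // e.g. with scala.io.Source.fromFile("sbtOutput.txt").getLines().
    val sample = Seq("seen: 3", "[info] compiling", "seen: 70", "seen: 10", "seen: 40")
    val sizes = parseSizes(sample)
    for (capacity <- Seq(32, 64, 128))
      println(s"size <= $capacity: ${coverage(sizes, capacity) * 100}% of samples")
  }
}
```

Like the percentages quoted in this thread, `coverage` compares raw set sizes with a candidate capacity and ignores the load factor.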

Contributor

I did not consider the load factor. My percentages only take into account the size of the set. I would go for 64 if we want to minimize those allocations.
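
To make the load-factor question concrete, here is a rough sketch of the arithmetic. It assumes the table grows once the element count reaches about `capacity * loadFactor`; the exact boundary and any rounding of the capacity are implementation details of `mutable.HashSet` that this sketch does not model. The object and method names are hypothetical.

```scala
// Sketch of the load-factor arithmetic under the assumption stated above.
object LoadFactorSketch {
  /** Approximate element count at which a table of `capacity` slots would grow. */
  def resizeThreshold(capacity: Int, loadFactor: Double): Int =
    (capacity * loadFactor).toInt

  /** Smallest capacity whose approximate threshold is at least `elements`. */
  def minCapacityFor(elements: Int, loadFactor: Double): Int =
    math.ceil(elements / loadFactor).toInt

  def main(args: Array[String]): Unit = {
    val loadFactor = 0.75 // the 75% load factor mentioned in this thread
    for (capacity <- Seq(32, 64, 128))
      println(s"capacity $capacity: grows at about ${resizeThreshold(capacity, loadFactor)} elements")
    println(s"holding 64 elements without growing needs capacity >= ${minCapacityFor(64, loadFactor)}")
  }
}
```

Under this assumption, a capacity of 64 would start growing at roughly 48 elements, so the share of cases that avoid an extra allocation would be somewhat lower than the size-only percentages suggest.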

 def traverse(tp: Type): Unit = if (!seen.contains(tp)) {
   seen += tp
   tp match {