add Spark #763

Closed
SethTisue opened this issue Aug 6, 2018 · 10 comments

@SethTisue
Member

suggested by @soronpo at
https://contributors.scala-lang.org/t/spark-as-a-scala-gateway-drug-and-the-2-12-failure/1747/44

I had listed Spark at #161 as "probably out of scope", but perhaps that could change now that Spark is on Scala 2.12

some concerns I can think of are:

  • build tool — I think Spark is built with Maven?
    • currently literally everything else in the community build is sbt based
    • dbuild is supposed to support multiple build tools, but I'm not sure what the status of Maven support is
    • does an sbt build exist?
  • overall size, complexity, length of build? I've heard it's a big build, am I wrong? two concerns here:
    • possible difficulty adding & maintaining it
    • possibly bloating the community build runtimes if the tests take a long time to run
  • would we even meaningfully be testing much unless we do cluster-based tests, which would be out of scope for the community build?

note that an alternative approach would be for Spark to add "latest Scala nightly" to their own CI matrix. this might be more practical than taking this on at our end
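
for an sbt-based project, picking up a nightly is basically a resolver plus a version override; the sketch below is illustrative only (the resolver URL is written from memory, and the version string is a placeholder that a real CI job would have to fill in with the current nightly):

// build.sbt (sketch): add the repository where Scala 2 integration/nightly builds
// are published and override scalaVersion; "2.12.8-bin-0123abc" is a fake version,
// e.g. a CI job could pass the real one via -Dscala.nightly.version=...
ThisBuild / resolvers += "scala-integration" at
  "https://scala-ci.typesafe.com/artifactory/scala-integration/"
ThisBuild / scalaVersion :=
  sys.props.getOrElse("scala.nightly.version", "2.12.8-bin-0123abc")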

@SethTisue
Member Author

in a reply to the above-linked post, @adriaanm wrote "I don’t think we will have time to tackle this ourselves in the next 6 months." I agree, but perhaps there's someone in the community who would like to work on this (either adding it here, or going the other route of adding latest-Scala-nightly to Spark's own testing setup).

@soronpo
Contributor

soronpo commented Aug 6, 2018

note that an alternative approach would be for Spark to add "latest Scala nightly" to their own CI matrix. this might be more practical than taking this on at our end

I agree, but I'm wondering if there is some sort of middle ground. The great thing about a large community build is that you can change the compiler and know that if the community build went through OK, then likely no bugs were introduced. If there are projects that are outside of the build but important to the community, do we refine the scalac release process to say that after the community build is green, we also check "this", "that", and "the other thing"...? That seems kind of strange to me. Furthermore, the community build can currently point to a specific stable branch of a library. If we instead rely on another project's CI, who says its master is stable?

@SethTisue
Member Author

If there are projects outside of the build and are important to the community, do we refine the scalac release process

ideally probably yes, but in practice that would be an expansion of scope our team can't afford. (as it stands, the community build already uses a sizable chunk of my time, and I am a sizable chunk of our 5-person team)

I'm wondering if there is some sort of middle-ground

not sure what you mean...?

or maybe you don't know either :-) but do you have any ideas?

@SethTisue
Member Author

Furthermore, currently the community build can point to a specific stable branch of a library. If now we rely on CI of a different project, who says the master is stable?

at our current ambition level for the community build, our goal is to have some reasonably achievable level of validation of the Scala compiler and standard library.

as a side effect, we do happen to get a great deal of validation of a great deal of the library ecosystem, somewhat duplicating everyone's own CI, but that's gravy, not the goal.

at our current ambition level, all we need is a really really really big pile of Scala code and a really really really big pile of tests to run. and we have that regardless of exactly what versions of things we're building.

it's definitely natural to want something more complete, more rigorous, etc...!

@jvican
Member

jvican commented Aug 7, 2018

We use Spark in our custom community build in Bloop, so I feel entitled to answer some of your questions, Seth 😄

build tool — I think Spark is built with Maven?
does an sbt build exist?

They primarily use Maven, but they maintain an sbt build thanks to the awesome https://github.com/sbt/sbt-pom-reader (a rough sketch of that setup is at the end of this comment).

overall size, complexity, length of build? I've heard it's a big build, am I wrong? two concerns here

I don't think it's that big. In general, you can expect it to be about the size of akka/akka.

I have a suspicion that adding it to the community build would be easier than it looks, especially because Scala dependencies in Spark are limited and cherry-picked (heck, I don't know if they even depend on a library that is not already in the community build).
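
For context, that setup looks roughly like the sketch below; the plugin coordinates and version are from memory and may need checking, and the build object is a bare-bones outline rather than what Spark's SparkBuild.scala actually does:

// project/plugins.sbt (sketch; exact coordinates/version may be off)
addSbtPlugin("com.typesafe.sbt" % "sbt-pom-reader" % "2.1.0")

// project/MyBuild.scala (sketch): PomBuild reads the Maven pom.xml module tree
// and generates one sbt subproject per Maven module
import sbt._
import java.io.File
import com.typesafe.sbt.pom.PomBuild

object MyBuild extends PomBuild {
  // a build like Spark's overrides projectDefinitions to customize the
  // subprojects that sbt-pom-reader generated from the poms
  override def projectDefinitions(baseDirectory: File): Seq[Project] =
    super.projectDefinitions(baseDirectory) // tweak the generated projects here
}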

@cunei

cunei commented Aug 7, 2018

dbuild is supposed to support multiple build tools, but I'm not sure what the status of Maven support is

There used to be an embryonic build module for Maven, but it was never fully functional; the last update was in July 2012, and it was removed altogether due to bitrot in Oct 2014: lightbend-labs/dbuild@05ef639
It would be absolutely possible to revive it, but it would require a non-negligible effort, maybe a couple of months of dedicated work. It was never completed since nearly everything in the Scala ecosystem moved to sbt at some point, and Aether support allowed us to grab pre-built jars for the rest.

@SethTisue
Member Author

I have a suspicion adding it to the community build would be easier than it looks like

okay, sounds like I ought to take an initial stab at it and see if I encounter dragons or not

@SethTisue self-assigned this on Jun 4, 2019
@SethTisue
Member Author

I have resumed work on this. I'll report back here once I know more

initially I'm just going to grab binary JARs of dependencies from Maven Central as needed, and then later go back and see what we could be building from source instead

(oh, we can do that? we sure can! we just try hard not to, since if we do it that way, it doesn't help us test Scala prereleases, where everything really must be built from source)

@SethTisue changed the title from "add Spark?" to "add Spark" on Jun 4, 2019
@SethTisue
Member Author

SethTisue commented Jun 5, 2019

the good news: though Spark's primary build is Maven-based, Spark also has an sbt build, and it seems to actually be maintained

the bad news: the sbt build isn't a vanilla sbt build; it uses sbt-pom-reader to define its subprojects

this causes dbuild to choke during dependency extraction with the following mysterious error message (@cunei, does this ring any bells? does working around this somehow seem plausible?):

[spark:error] java.lang.ClassCastException: Cannot cast org.apache.maven.repository.internal.DefaultVersionResolver to org.sonatype.aether.impl.VersionResolver
[spark:error] 	at java.lang.Class.cast(Class.java:3369)
[spark:error] 	at org.sonatype.aether.impl.internal.DefaultServiceLocator.getServices(DefaultServiceLocator.java:161)
[spark:error] 	at org.sonatype.aether.impl.internal.DefaultServiceLocator.getService(DefaultServiceLocator.java:133)
[spark:error] 	at org.sonatype.aether.impl.internal.DefaultRepositorySystem.initService(DefaultRepositorySystem.java:142)
[spark:error] 	at org.sonatype.aether.impl.internal.DefaultServiceLocator.getServices(DefaultServiceLocator.java:181)
[spark:error] 	at org.sonatype.aether.impl.internal.DefaultServiceLocator.getService(DefaultServiceLocator.java:133)
[spark:error] 	at com.typesafe.sbt.pom.package$.newRepositorySystemImpl(package.scala:26)
[spark:error] 	at com.typesafe.sbt.pom.MvnPomResolver$.<init>(MavenPomResolver.scala:20)
[spark:error] 	at com.typesafe.sbt.pom.MvnPomResolver$.<clinit>(MavenPomResolver.scala)
[spark:error] 	at com.typesafe.sbt.pom.package$.loadEffectivePom(package.scala:41)
[spark:error] 	at com.typesafe.sbt.pom.MavenProjectHelper$.makeProjectTree(MavenProjectHelper.scala:128)
[spark:error] 	at com.typesafe.sbt.pom.MavenProjectHelper$.makeReactorProject(MavenProjectHelper.scala:49)
[spark:error] 	at com.typesafe.sbt.pom.PomBuild$class.projectDefinitions(PomBuild.scala:28)
[spark:error] 	at SparkBuild$.projectDefinitions(SparkBuild.scala:76)

as per sbt/sbt-pom-reader#51, sbt-pom-reader doesn't offer the ability to export a vanilla sbt build
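
as an aside, the exception itself is just a service locator being handed an implementation class that doesn't implement the interface being asked for. a toy reproduction of that failure mode, with stand-in names rather than the real Aether classes:

// sketch only: mimics what DefaultServiceLocator.getServices does when the
// registered implementation doesn't implement the requested service interface
object ServiceCastDemo extends App {
  trait VersionResolverApi // stands in for org.sonatype.aether.impl.VersionResolver
  class UnrelatedResolver  // stands in for Maven's DefaultVersionResolver

  // the locator maps a requested service interface to whatever implementation
  // ended up registered for it
  val registry: Map[Class[_], AnyRef] =
    Map(classOf[VersionResolverApi] -> new UnrelatedResolver)

  def getService[T](service: Class[T]): T =
    service.cast(registry(service)) // the Class.cast call seen in the stack trace

  // throws java.lang.ClassCastException, analogous to the extraction failure above
  getService(classOf[VersionResolverApi])
}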

@SethTisue
Member Author

abandoning this effort. given the nature of the Spark build, it's easier to just test it with a Scala nightly outside of the dbuild context. that will use dependencies retrieved from Maven Central rather than dbuild-built dependencies, but that's fine; the likelihood of some obscure bug being missed that way is very low
