@Bidek56
Contributor

@Bidek56 Bidek56 commented Jun 16, 2022

Describe your changes

Upgrading:

  • Spark -> 3.3
  • Hadoop -> 3.3
  • Scala -> 2.13
  • Java -> 17

Breaking change: spylon-kernel no longer works because the package is no longer maintained.
For anyone looking for a Scala kernel, I would suggest looking at Almond.

Checklist (especially for first-time contributors)

  • I have performed a self-review of my code
  • If it is a core feature, I have added thorough tests
  • I will try not to use force-push to make the review process easier for reviewers
  • I have updated the documentation for significant changes

@Bidek56 Bidek56 changed the title from "Spark->3.3,Hadoop->3,Scala->2.13,Java->17" to "Upgrading Spark->3.3,Hadoop->3,Scala->2.13,Java->17" Jun 16, 2022
@Bidek56
Contributor Author

Bidek56 commented Jun 17, 2022

@mathbunnyru Let me know what you think we should do with spylon-kernel which is causing tests to fail.
Because spylon-kernel is a dead package, I think we should remove it and update our documentation to use Almond instead.
Thanks

@bjornjorgensen
Contributor

bjornjorgensen commented Jun 17, 2022

Almond doesn't support Spark 3.x; see almond-sh/almond#929.

And you can remove lines 51 to 55.

@bjornjorgensen
Contributor

@Bidek56 here I made some changes to this PR.

Member

@mathbunnyru mathbunnyru left a comment

I would suggest doing it this way:

  1. Remove Scala in a separate PR first. We can't keep using software that is no longer updated and starts breaking things. If possible, it would be nice to have a replacement, such as Almond or something else (once Almond or something similar works in our environment).

  2. Then we can merge this PR.

@bjornjorgensen
Contributor

@Bidek56 Will you do this? If not, then I can do it. If you want to try, I can help you; just ping me.
@mathbunnyru Nice plan. I haven't found any Scala kernel that works with Spark 3.x.

@bjornjorgensen
Contributor

I have created a branch for this upgrade just to test it.

Docker image for pyspark
and
Docker image for all-spark-notebook

I tested the Scala kernel and only get: Intitializing Scala interpreter ...

@Bidek56
Contributor Author

Bidek56 commented Jun 21, 2022

Just a note that this PR will break the "Build an Image with a Different Version of Spark" recipe for Spark < 3.2, since Apache did not provide tgz files with the Scala version appended for those releases.
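
To illustrate the naming difference, here is a minimal Python sketch. The helper name `spark_archive_name` is hypothetical and is not part of this PR. The two name patterns are taken from the directory names quoted in this thread (`spark-3.2.1-bin-hadoop3.2` and `spark-3.3.0-bin-hadoop3-scala2.13`); that the archive is simply the directory name plus `.tgz` is an assumption.

```python
from typing import Optional


def spark_archive_name(
    spark_version: str,
    hadoop_version: str,
    scala_version: Optional[str] = None,
) -> str:
    """Build a Spark binary archive name (hypothetical helper).

    When scala_version is given, a "-scala<version>" suffix is appended,
    matching the directory name seen in this thread. Without it, the
    older un-suffixed pattern is produced.
    """
    name = f"spark-{spark_version}-bin-hadoop{hadoop_version}"
    if scala_version:
        name += f"-scala{scala_version}"
    return f"{name}.tgz"


# Pattern with the Scala version appended (as in this PR).
print(spark_archive_name("3.3.0", "3", "2.13"))
# Older pattern without a Scala suffix.
print(spark_archive_name("3.2.1", "3.2"))
```

A build that always appends the Scala suffix would request a file name that does not follow the older pattern, which is one way to read the breakage described above.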

@rhazegh rhazegh mentioned this pull request Jul 2, 2022
Member

@mathbunnyru mathbunnyru left a comment

@Bidek56 as you mentioned, could you please update the docs, so they will work after this change is merged?

@mathbunnyru
Member

I merged the removal of spylon-kernel. I will have to do a few full rebuilds to make sure it has been removed from all images.
Please, merge master to this branch as well.

@Bidek56
Contributor Author

Bidek56 commented Jul 4, 2022

The build has failed because it is running Spark 3.2.1, not 3.3.
Does anyone know where it's getting the wrong version of Spark from? Thx

The error message references Spark 3.2.1
file:/usr/local/spark-3.2.1-bin-hadoop3.2/jars/spark-unsafe_2.12-3.2.1.jar

but in the container using the PR code, I see Spark 3.3.0 and my tests pass

(base) jovyan@30cf4fa1f45a:~$ ls -l /usr/local/  

total 40 
drwxr-xr-x 1 root root 4096 Jul  4 14:54 bin
drwxr-xr-x 2 root root 4096 May 31 15:43 etc
drwxr-xr-x 2 root root 4096 May 31 15:43 games
drwxr-xr-x 2 root root 4096 May 31 15:43 include
drwxr-xr-x 1 root root 4096 Jul  4 11:45 lib
lrwxrwxrwx 1 root root    9 May 31 15:43 man -> share/man
drwxr-xr-x 2 root root 4096 May 31 15:46 sbin
drwxr-xr-x 1 root root 4096 Jul  4 11:46 share
lrwxrwxrwx 1 root root   33 Jul  4 14:54 spark -> spark-3.3.0-bin-hadoop3-scala2.13
drwxr-xr-x 1 root root 4096 Jun  9 18:47 spark-3.3.0-bin-hadoop3-scala2.13
drwxr-xr-x 2 root root 4096 May 31 15:43 src
(base) jovyan@30cf4fa1f45a:~$ ls -l /usr/local/spark  
lrwxrwxrwx 1 root root 33 Jul  4 14:54 /usr/local/spark -> spark-3.3.0-bin-hadoop3-scala2.13  
(base) jovyan@30cf4fa1f45a:~$  
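
The same check can be scripted. Below is a minimal Python sketch that reads the `/usr/local/spark` symlink shown above and splits the target directory name into its version parts. The function names and the regular expression are illustrative assumptions; the expression is only fitted to the two directory names that appear in this thread.

```python
import os
import re
from typing import Dict, Optional

# Fitted to the two names seen in this thread (an assumption, not a spec):
#   spark-3.3.0-bin-hadoop3-scala2.13
#   spark-3.2.1-bin-hadoop3.2
SPARK_DIR_RE = re.compile(
    r"^spark-(?P<spark>\d+(?:\.\d+)*)"
    r"-bin-hadoop(?P<hadoop>\d+(?:\.\d+)*)"
    r"(?:-scala(?P<scala>\d+(?:\.\d+)*))?$"
)


def parse_spark_dir(name: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a Spark install directory name into version components."""
    match = SPARK_DIR_RE.match(name)
    return match.groupdict() if match else None


def installed_spark(link: str = "/usr/local/spark") -> Optional[Dict[str, Optional[str]]]:
    """Resolve the symlink and parse its target; None if it is not a symlink."""
    if not os.path.islink(link):
        return None
    target = os.readlink(link).rstrip("/")
    return parse_spark_dir(os.path.basename(target))


print(parse_spark_dir("spark-3.3.0-bin-hadoop3-scala2.13"))
print(parse_spark_dir("spark-3.2.1-bin-hadoop3.2"))
print(installed_spark())
```

Running `installed_spark()` inside both the locally built container and the image used by CI would show whether they point at different Spark directories.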

@mathbunnyru
Member

@Bidek56 this happens because, when building all-spark-notebook, we use an old image that we pull instead of the freshly built pyspark-notebook.
This is true for other images as well and it's my nightmare 😂
I've just finished rewriting the CI system from scratch.
Please, take a look here:
#1703

After I merge that PR, I think we will be able to merge yours easily.
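
The stale-parent problem described here can be sketched in a few lines of Python. This does not describe the actual CI rewrite in #1703; it only illustrates the idea that a child image picks up a fresh parent when images are built in dependency order within one run. The function names are hypothetical, and the `scipy-notebook` parent is an assumption that is not stated in this thread. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Optional, Tuple

# child image -> parent image
PARENT_OF: Dict[str, str] = {
    "pyspark-notebook": "scipy-notebook",  # assumption, not from this thread
    "all-spark-notebook": "pyspark-notebook",  # from this thread
}


def build_order(parent_of: Dict[str, str]) -> List[str]:
    """Return the images ordered so that every parent precedes its child."""
    graph = {child: {parent} for child, parent in parent_of.items()}
    return list(TopologicalSorter(graph).static_order())


def plan_builds(
    targets: List[str], parent_of: Dict[str, str]
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """For each target, report whether its parent was built in this run.

    "local" means the parent was built earlier in the same run;
    "pulled" means an already published (possibly stale) parent is used.
    """
    built = set()
    steps = []
    for image in targets:
        parent = parent_of.get(image)
        source = None if parent is None else ("local" if parent in built else "pulled")
        steps.append((image, parent, source))
        built.add(image)
    return steps


# Building only the child: the parent comes from a pulled image.
print(plan_builds(["all-spark-notebook"], PARENT_OF))
# Building in dependency order: each parent is the fresh local build.
print(plan_builds(build_order(PARENT_OF), PARENT_OF))
```

In the first plan, all-spark-notebook would sit on top of whatever pyspark-notebook was last published, which matches the Spark 3.2.1 path seen in the failing build.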

@bjornjorgensen
Contributor

Yes, you don't have to do anything with the tests.
And you can remove

# Fix Spark installation for Java 11 and Apache Arrow library
# see: https://github.com/apache/spark/pull/27356, https://spark.apache.org/docs/latest/#downloading
RUN cp -p "${SPARK_HOME}/conf/spark-defaults.conf.template" "${SPARK_HOME}/conf/spark-defaults.conf" && \
    echo 'spark.driver.extraJavaOptions -Dio.netty.tryReflectionSetAccessible=true' >> "${SPARK_HOME}/conf/spark-defaults.conf" && \
    echo 'spark.executor.extraJavaOptions -Dio.netty.tryReflectionSetAccessible=true' >> "${SPARK_HOME}/conf/spark-defaults.conf"

@mathbunnyru
Member

@Bidek56 could you merge master once again?
I merged my PR, so theoretically, everything should work fine in yours.

Member

@mathbunnyru mathbunnyru left a comment

Some minor documentation comments only.

Member

@mathbunnyru mathbunnyru left a comment

I will merge this one when tests pass.
Thank you for your contribution @Bidek56 👍

@alexander-manley
Contributor

All checks have passed. Excellent!
33 successful checks
