Adding a working Docker setup for developing sparkmagic #361
Merged

Changes from all commits (6 commits):
- b3aa0ee Adding a working Docker setup for developing sparkmagic (apetresc)
- 89947f9 Pre-configure the ~/.sparkmagic/config.json (apetresc)
- e8042ad Add R to Livy container (apetresc)
- f1e5490 Add more detail to the README container section (apetresc)
- 663b5d3 Add dev_mode build-arg. (apetresc)
- 649538a Adding missing kernels (apetresc)
`Dockerfile.jupyter` (new file, +31 lines):

```dockerfile
FROM jupyter/base-notebook:d0b2d159cc6c

ARG dev_mode=false

USER $NB_USER

# Install sparkmagic - if dev_mode is set, use the one in the host directory.
# Otherwise, just install from pip.
COPY hdijupyterutils hdijupyterutils/
COPY autovizwidget autovizwidget/
COPY sparkmagic sparkmagic/
RUN if [ "$dev_mode" = "true" ]; then \
        cd hdijupyterutils && pip install . && cd ../ && \
        cd autovizwidget && pip install . && cd ../ && \
        cd sparkmagic && pip install . && cd ../ ; \
    else pip install sparkmagic ; fi

RUN mkdir /home/$NB_USER/.sparkmagic
COPY sparkmagic/example_config.json /home/$NB_USER/.sparkmagic/config.json
RUN sed -i 's/localhost/spark/g' /home/$NB_USER/.sparkmagic/config.json
RUN jupyter nbextension enable --py --sys-prefix widgetsnbextension
RUN jupyter-kernelspec install --user $(pip show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/kernels/sparkkernel
RUN jupyter-kernelspec install --user $(pip show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/kernels/pysparkkernel
RUN jupyter-kernelspec install --user $(pip show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/kernels/pyspark3kernel
RUN jupyter-kernelspec install --user $(pip show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/kernels/sparkrkernel
RUN jupyter serverextension enable --py sparkmagic

USER root
RUN chown $NB_USER /home/$NB_USER/.sparkmagic/config.json
RUN rm -rf hdijupyterutils/ autovizwidget/ sparkmagic/
USER $NB_USER
```
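The four `jupyter-kernelspec install` lines all rely on the same trick: sparkmagic ships its wrapper kernelspecs inside the installed Python package, so the package's location is recovered from `pip show`. A quick way to see what that subshell evaluates to (the exact path varies with the base image; the output below is only illustrative):

```bash
# Print the site-packages directory pip reports for sparkmagic; the
# kernelspecs live under <location>/sparkmagic/kernels/.
pip show sparkmagic | grep Location | cut -d" " -f2
# Illustrative output (depends on the base image's Python prefix):
# /opt/conda/lib/python3.5/site-packages
```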
`Dockerfile.spark` (new file, +35 lines):

```dockerfile
FROM gettyimages/spark:2.1.0-hadoop-2.7

RUN apt-get update && apt-get install -yq --no-install-recommends --force-yes \
    git \
    openjdk-7-jdk \
    maven \
    python2.7 \
    python3.4 \
    r-base \
    r-base-core && \
    rm -rf /var/lib/apt/lists/*

ENV LIVY_BUILD_VERSION livy-server-0.3.0
ENV LIVY_APP_PATH /apps/$LIVY_BUILD_VERSION
ENV LIVY_BUILD_PATH /apps/build/livy
ENV PYSPARK_PYTHON python2.7
ENV PYSPARK3_PYTHON python3.4

RUN mkdir -p /apps/build && \
    cd /apps/build && \
    git clone https://github.com/cloudera/livy.git && \
    cd $LIVY_BUILD_PATH && \
    git checkout v0.3.0 && \
    mvn -DskipTests -Dspark.version=$SPARK_VERSION clean package && \
    ls -al $LIVY_BUILD_PATH && ls -al $LIVY_BUILD_PATH/assembly && ls -al $LIVY_BUILD_PATH/assembly/target && \
    unzip $LIVY_BUILD_PATH/assembly/target/$LIVY_BUILD_VERSION.zip -d /apps && \
    rm -rf $LIVY_BUILD_PATH && \
    mkdir -p $LIVY_APP_PATH/upload && \
    mkdir -p $LIVY_APP_PATH/logs

EXPOSE 8998

CMD ["/apps/livy-server-0.3.0/bin/livy-server"]
```
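With this container running, Livy listens on the exposed port 8998, so a quick smoke test is possible from the host. The JSON shown is the typical response of an idle Livy 0.3 server, not a guaranteed format:

```bash
# List active Livy sessions; a freshly started server should have none.
curl http://localhost:8998/sessions
# Typical response: {"from":0,"total":0,"sessions":[]}
```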
`README.md` (the first hunk, `@@ -54,6 +54,39 @@`, adds a new Docker section after the existing install instructions, which end with `jupyter serverextension enable --py sparkmagic`):

```markdown
## Docker

The included `docker-compose.yml` file will let you spin up a full
sparkmagic stack that includes a Jupyter notebook with the appropriate
extensions installed, and a Livy server backed by a local-mode Spark instance.
(This is just for testing and developing sparkmagic itself; in reality,
sparkmagic is not very useful if your Spark instance is on the same machine!)

In order to use it, make sure you have [Docker](https://docker.com) and
[Docker Compose](https://docs.docker.com/compose/) both installed, and
then simply run:

    docker-compose build
    docker-compose up
```
An inline review comment on these lines: "I think it would also be a good idea to add instructions for exiting..." The author replied: "Done."
The section continues:

```markdown
You will then be able to access the Jupyter notebook in your browser at
http://localhost:8888. Inside this notebook, you can configure a
sparkmagic endpoint at http://spark:8998. This endpoint is able to
launch both Scala and Python sessions. You can also choose to start a
wrapper kernel for Scala, Python, or R from the list of kernels.
```
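To try the endpoint out from the plain Python kernel, the usual sparkmagic flow applies; a minimal sketch, assuming the stack is up and the magics are installed as in `Dockerfile.jupyter`:

```python
# In a notebook cell on the regular Python kernel (not a wrapper kernel):
%load_ext sparkmagic.magics

# Opens sparkmagic's endpoint/session management widget, where the
# http://spark:8998 Livy endpoint can be added and a session started.
%manage_spark
```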
```markdown
To shut down the containers, you can interrupt `docker-compose` with
`Ctrl-C`, and optionally remove the containers with `docker-compose
down`.
```
```markdown
If you are developing sparkmagic and want to test out your changes in
the Docker container without needing to push a version to PyPI, you can
set the `dev_mode` build arg in `docker-compose.yml` to `true`, and then
re-build the container. This will cause the container to install your
local version of autovizwidget, hdijupyterutils, and sparkmagic. Make
sure to re-run `docker-compose build` before each test run.
```
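Rather than editing `docker-compose.yml` each time, the same build arg can be passed on the command line, assuming a docker-compose version that supports `--build-arg` (the `jupyter` service name comes from the compose file below):

```bash
# Rebuild only the jupyter image with the local source trees installed:
docker-compose build --build-arg dev_mode=true jupyter
docker-compose up
```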
The hunk then runs into the existing `### Server extension API` / `#### /reconnectsparkmagic:` headings as unchanged context. A second hunk (`@@ -125,4 +158,4 @@`, under "To run unit tests, run:" and the `nosetests hdijupyterutils autovizwidget sparkmagic` command) rewrites the README's final line without changing its text, apparently only to fix the trailing newline:

    If you want to see an enhancement made but don't have time to work on it yourself, feel free to submit an [issue](https://github.com/jupyter-incubator/sparkmagic/issues) for us to deal with.
`docker-compose.yml` (new file, +21 lines):

```yaml
version: "3"
services:
  spark:
    image: jupyter/sparkmagic-livy
    build:
      context: .
      dockerfile: Dockerfile.spark
    hostname: spark
    ports:
      - "8998:8998"
  jupyter:
    image: jupyter/sparkmagic
    build:
      context: .
      dockerfile: Dockerfile.jupyter
      args:
        dev_mode: "false"
    links:
      - spark
    ports:
      - "8888:8888"
```
Review comments:

Reviewer: We need to create a python3 environment and set the `PYSPARK3_PYTHON` variable as explained in https://github.com/cloudera/livy/blob/511a05f2282cd85a457017cc5a739672aaed5238/README.rst#pyspark3. I'd recommend installing Anaconda to create the environment.
Reviewer: I guess it would be the same for `PYSPARK_PYTHON`, but if it's not set, it will just take the system's `python`, which is fine as long as it's a `2.7.X` version.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something similar to this:
apetresc: I've made the other changes, but I'm not sure I understand the need for this one. The base image I'm using for the Livy container (`gettyimages/spark:2.1.0-hadoop-2.7`) already has a Python 3 installation which seems to be working just fine with Livy and sparkmagic as-is. Do you just mean creating a virtualenv for it? I can do that, though I'm not sure it's necessary if we're just running inside a single-purpose container.
Reviewer: Sorry I wasn't clear. I meant we need to install the following two kernels, besides the kernels already being installed:

You can just add `sparkkernel` to the list of kernels to install, and it should just work. `pysparkkernel` is different in that it should run in a python 2.7 environment whereas `pyspark3kernel` would run in a python 3 environment. Because the container's python installation is python 3, we need to tell livy where to find the python 2 installation in the image (created via virtualenv or anaconda or something else). The way livy finds the two installations is via the env variables I mentioned above.

The point would be to try to show users that one can have two python versions running fine side by side and how to set it up. This is some extra work, and I would be OK with you just adding `sparkkernel` for now and me creating an issue to add `pysparkkernel` to the docker image to tackle later.
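For concreteness, a hypothetical sketch of the kind of two-interpreter setup the reviewer describes for `Dockerfile.spark`; this is not from the actual PR, and the paths and the use of virtualenv are assumptions for illustration:

```dockerfile
# Hypothetical sketch (assumes virtualenv is available in the image):
# one dedicated interpreter per Python major version, so that Livy can
# route pysparkkernel and pyspark3kernel sessions to the right one via
# the two environment variables discussed above.
RUN virtualenv -p python2.7 /opt/py2 && \
    virtualenv -p python3.4 /opt/py3
ENV PYSPARK_PYTHON /opt/py2/bin/python
ENV PYSPARK3_PYTHON /opt/py3/bin/python
```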
apetresc: Ah, I see what you mean then :) Okay, let me add that real quick.