31 changes: 31 additions & 0 deletions Dockerfile.jupyter
@@ -0,0 +1,31 @@
FROM jupyter/base-notebook:d0b2d159cc6c

ARG dev_mode=false

USER $NB_USER

# Install sparkmagic - if the dev_mode build arg is true, install the local copies
# from the build context. Otherwise, just install the released version from pip.
COPY hdijupyterutils hdijupyterutils/
COPY autovizwidget autovizwidget/
COPY sparkmagic sparkmagic/
RUN if [ "$dev_mode" = "true" ]; then \
cd hdijupyterutils && pip install . && cd ../ && \
cd autovizwidget && pip install . && cd ../ && \
cd sparkmagic && pip install . && cd ../ ; \
else pip install sparkmagic ; fi

RUN mkdir /home/$NB_USER/.sparkmagic
COPY sparkmagic/example_config.json /home/$NB_USER/.sparkmagic/config.json
RUN sed -i 's/localhost/spark/g' /home/$NB_USER/.sparkmagic/config.json
RUN jupyter nbextension enable --py --sys-prefix widgetsnbextension
RUN jupyter-kernelspec install --user $(pip show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/kernels/sparkkernel
RUN jupyter-kernelspec install --user $(pip show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/kernels/pysparkkernel
RUN jupyter-kernelspec install --user $(pip show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/kernels/pyspark3kernel
RUN jupyter-kernelspec install --user $(pip show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/kernels/sparkrkernel
RUN jupyter serverextension enable --py sparkmagic

USER root
RUN chown $NB_USER /home/$NB_USER/.sparkmagic/config.json
RUN rm -rf hdijupyterutils/ autovizwidget/ sparkmagic/
USER $NB_USER
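
As a quick sanity check, this image can also be built and run on its own, outside of docker-compose; the tag below is just an illustrative name, and the `dev_mode` build arg is only needed when testing local changes:

    docker build -f Dockerfile.jupyter --build-arg dev_mode=true -t sparkmagic-jupyter .
    docker run --rm -p 8888:8888 sparkmagic-jupyter
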
35 changes: 35 additions & 0 deletions Dockerfile.spark
@@ -0,0 +1,35 @@
FROM gettyimages/spark:2.1.0-hadoop-2.7

RUN apt-get update && apt-get install -yq --no-install-recommends --force-yes \
git \
openjdk-7-jdk \
maven \
python2.7 \
python3.4 \
r-base \
r-base-core && \
rm -rf /var/lib/apt/lists/*

ENV LIVY_BUILD_VERSION livy-server-0.3.0
ENV LIVY_APP_PATH /apps/$LIVY_BUILD_VERSION
ENV LIVY_BUILD_PATH /apps/build/livy
Contributor:
We need to create a python3 environment and set the PYSPARK3_PYTHON variable as explained in https://github.com/cloudera/livy/blob/511a05f2282cd85a457017cc5a739672aaed5238/README.rst#pyspark3

I'd recommend installing Anaconda to create the environment.

Contributor:
I guess it would be the same for PYSPARK_PYTHON, but if it's not set, it will just take the system's python, which is fine as long as it's a 2.7.X version.

Contributor:
Something similar to this:

ANACONDA_DEST_PATH=/usr/bin/anaconda
CONDA_PATH=$ANACONDA_DEST_PATH/bin/conda

# TODO wget anaconda to ANACONDA_DEST

bash $INSTALLER_PATH -p $ANACONDA_DEST_PATH -b -f

# Create an env for Python 3.
$CONDA_PATH create -n py35 python=3.5 anaconda

Member Author:
I've made the other changes, but I'm not sure I understand the need for this one. The base image I'm using for the Livy container (gettyimages/spark:2.1.0-hadoop-2.7) already has a Python 3 installation, which seems to be working just fine with Livy and sparkmagic as-is.

Do you just mean creating a virtualenv for it? I can do that, though I'm not sure it's necessary if we're just running inside a single-purpose container.

Contributor:
Sorry I wasn't clear. I meant we need to install the following two kernels, besides the kernels already being installed:

  • pysparkkernel
  • sparkkernel

You can just add sparkkernel to the list of kernels to install, and it should just work.

pysparkkernel is different in that it should run in a Python 2.7 environment, whereas pyspark3kernel would run in a Python 3 environment. Because the container's Python installation is Python 3, we need to tell Livy where to find the Python 2 installation in the image (created via virtualenv, Anaconda, or something else). The way Livy finds the two installations is via the env variables I mentioned above.

The point would be to show users that one can have two Python versions running fine side by side and how to set it up. This is some extra work, and I would be OK with you just adding sparkkernel for now and me creating an issue to add pysparkkernel to the Docker image to tackle later.

Member Author:
Ah, I see what you mean then :) Okay, let me add that real quick.

ENV PYSPARK_PYTHON python2.7
ENV PYSPARK3_PYTHON python3.4

RUN mkdir -p /apps/build && \
cd /apps/build && \
git clone https://github.com/cloudera/livy.git && \
cd $LIVY_BUILD_PATH && \
git checkout v0.3.0 && \
mvn -DskipTests -Dspark.version=$SPARK_VERSION clean package && \
ls -al $LIVY_BUILD_PATH && ls -al $LIVY_BUILD_PATH/assembly && ls -al $LIVY_BUILD_PATH/assembly/target && \
unzip $LIVY_BUILD_PATH/assembly/target/$LIVY_BUILD_VERSION.zip -d /apps && \
rm -rf $LIVY_BUILD_PATH && \
mkdir -p $LIVY_APP_PATH/upload && \
mkdir -p $LIVY_APP_PATH/logs


EXPOSE 8998

CMD ["/apps/livy-server-0.3.0/bin/livy-server"]

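Once this container is running, one way to smoke-test the Livy server (a suggested check, not part of the image itself) is to query its REST API from the host:

    # An empty session list means Livy is up and reachable on the mapped port.
    curl http://localhost:8998/sessions
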
35 changes: 34 additions & 1 deletion README.md
@@ -54,6 +54,39 @@ See [Pyspark](examples/Pyspark Kernel.ipynb) and [Spark](examples/Spark Kernel.i

jupyter serverextension enable --py sparkmagic


## Docker

The included `docker-compose.yml` file will let you spin up a full
sparkmagic stack that includes a Jupyter notebook with the appropriate
extensions installed, and a Livy server backed by a local-mode Spark instance.
(This is just for testing and developing sparkmagic itself; in reality,
sparkmagic is not very useful if your Spark instance is on the same machine!)

In order to use it, make sure you have [Docker](https://docker.com) and
[Docker Compose](https://docs.docker.com/compose/) both installed, and
then simply run:

docker-compose build
docker-compose up
Contributor:
I think it would also be a good idea to add instructions for exiting...

Ctrl-C + docker-compose down?

Member Author:
Done


You will then be able to access the Jupyter notebook in your browser at
http://localhost:8888. Inside this notebook, you can configure a
sparkmagic endpoint at http://spark:8998. This endpoint is able to
launch both Scala and Python sessions. You can also choose to start a
wrapper kernel for Scala, Python, or R from the list of kernels.
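
For example, from a plain Python notebook you can load the sparkmagic magics and open the endpoint/session management widget (a minimal sketch; adding the http://spark:8998 endpoint is then done through the widget UI):

    %load_ext sparkmagic.magics
    %manage_spark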

To shut down the containers, you can interrupt `docker-compose` with
`Ctrl-C`, and optionally remove the containers with `docker-compose
down`.

If you are developing sparkmagic and want to test out your changes in
the Docker container without needing to push a version to PyPI, you can
set the `dev_mode` build arg in `docker-compose.yml` to `true`, and then
re-build the container. This will cause the container to install your
local version of autovizwidget, hdijupyterutils, and sparkmagic. Make
sure to re-run `docker-compose build` before each test run.
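
For reference, the relevant fragment of `docker-compose.yml` would then look roughly like this (a sketch of the existing file with only the build arg changed):

    jupyter:
      build:
        context: .
        dockerfile: Dockerfile.jupyter
        args:
          dev_mode: "true"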

### Server extension API

#### `/reconnectsparkmagic`:
@@ -125,4 +158,4 @@ To run unit tests, run:

nosetests hdijupyterutils autovizwidget sparkmagic

If you want to see an enhancement made but don't have time to work on it yourself, feel free to submit an [issue](https://github.com/jupyter-incubator/sparkmagic/issues) for us to deal with.
21 changes: 21 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,21 @@
version: "3"
services:
spark:
image: jupyter/sparkmagic-livy
build:
context: .
dockerfile: Dockerfile.spark
hostname: spark
ports:
- "8998:8998"
jupyter:
image: jupyter/sparkmagic
build:
context: .
dockerfile: Dockerfile.jupyter
args:
dev_mode: "false"
links:
- spark
ports:
- "8888:8888"
5 changes: 5 additions & 0 deletions sparkmagic/example_config.json
@@ -10,6 +10,11 @@
"password": "",
"url": "http://localhost:8998"
},
"kernel_r_credentials": {
"username": "",
"password": "",
"url": "http://localhost:8998"
},

"logging_config": {
"version": 1,