Build and Deploy
docker run -it uscdatascience/sparkler
docker tag uscdatascience/sparkler sparkler-local
# Tagging lets bin/dockler.sh use this downloaded image instead of rebuilding from scratch
If you prefer to build the latest image from source code, use the instructions below.
cd to the root directory of the project and issue the following commands:
$ bin/dockler.sh
When the script asks 'Y/N', press 'Y'. This script will do the following:
- Builds this project (mvn and git are required)
- Builds a docker image named sparkler-local (docker command is required)
- Starts a docker container
- Starts Solr
- Gives you a bash shell inside the docker container
  - /data/solr/bin/solr - start / stop solr using this tool
  - /data/sparkler/bin/sparkler.sh - cli interface to sparkler
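Since the script needs mvn, git, and docker, it can help to check for them before starting a long build. The following is a minimal sketch, not part of Sparkler; the function name missing_tools is a hypothetical helper introduced only for this example.

```shell
# Sketch: report which of the tools needed by bin/dockler.sh are missing.
# missing_tools is a hypothetical helper, not part of Sparkler.
missing_tools() {
  missing=""
  for tool in "$@"; do
    # command -v succeeds only if the tool can be found on PATH
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  # Print the missing tools separated by single spaces (empty line if none)
  echo "${missing# }"
}

# Usage, from the project root:
#   missing=$(missing_tools mvn git docker)
#   if [ -z "$missing" ]; then bin/dockler.sh; else echo "Please install: $missing"; fi
```

The helper only checks that each command is on PATH; it does not verify versions.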
# inject a seed url, assign a job id to it
/data/sparkler/bin/sparkler.sh inject -id sjob-1 -su https://isi.edu
# Crawl it
/data/sparkler/bin/sparkler.sh crawl -id sjob-1
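The two commands above share the same job id, so they can be wrapped in one small function. This is only a sketch under assumptions: sparkler_job, SPARKLER, and DRY_RUN are hypothetical names chosen for this example, and only the inject and crawl flags shown above are used.

```shell
# Sketch: inject a seed URL and then crawl it under a single job id.
# sparkler_job, SPARKLER and DRY_RUN are hypothetical names for this example.
SPARKLER="${SPARKLER:-/data/sparkler/bin/sparkler.sh}"

sparkler_job() {
  job_id="$1"
  seed_url="$2"
  if [ -z "$job_id" ] || [ -z "$seed_url" ]; then
    echo "usage: sparkler_job <job-id> <seed-url>" >&2
    return 1
  fi
  if [ -n "${DRY_RUN:-}" ]; then
    # Only print the commands that would be run
    echo "$SPARKLER inject -id $job_id -su $seed_url"
    echo "$SPARKLER crawl -id $job_id"
    return 0
  fi
  # Crawl only if the inject step succeeded
  "$SPARKLER" inject -id "$job_id" -su "$seed_url" && "$SPARKLER" crawl -id "$job_id"
}

# Usage inside the container:
#   sparkler_job sjob-1 https://isi.edu
```

Setting DRY_RUN=1 prints the two commands instead of running them, which is a quick way to check the arguments.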
NOTE: If you would like to build the docker image directly, run:
docker build -f sparkler-deployment/docker/Dockerfile . -t sparkler-local
docker run -it -p 8984:8983 sparkler-local
# inside
sparkler@inside # /data/solr/bin/solr start
sparkler@inside # /data/sparkler/bin/sparkler.sh [crawl|inject] -h
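The -p 8984:8983 option above publishes container port 8983 on host port 8984, so Solr started inside the container is reached from the host on port 8984. The sketch below makes that mapping explicit; solr_host_url is a hypothetical helper, and the /solr/ path assumes the image uses Solr's default context path.

```shell
# Sketch: derive the host-side Solr URL from a docker -p HOST:CONTAINER mapping.
# solr_host_url is a hypothetical helper; /solr/ assumes Solr's default context path.
solr_host_url() {
  mapping="$1"
  host_port="${mapping%%:*}"   # text before the first colon is the host port
  echo "http://localhost:${host_port}/solr/"
}

# With the mapping used above:
solr_host_url 8984:8983   # prints http://localhost:8984/solr/

# To probe it from the host once Solr is running (requires curl):
#   curl -sf "$(solr_host_url 8984:8983)" >/dev/null && echo "Solr is reachable"
```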
Building from source requires:
- Apache Maven (Tested on v3.3.x)
- JDK (Tested on Oracle JDK 1.8)
- Working internet connection to retrieve maven dependencies
The following dependencies will be downloaded from Maven central. Feel free to look inside the pom.xml for the current versions being used.
- Apache Spark
- Apache Nutch
- Apache Kafka Client
- Apache Solr Client
- Scala
Note that libraries such as the Solr client, Spark, and Kafka should match the versions used in your own deployment. For instance, if you have a Spark cluster deployment of v1.6 with Scala 2.11, make sure to set the same versions for the client libraries in pom.xml.
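To compare the client library versions with your deployment, it can help to list the version properties declared in pom.xml. The following is a rough sketch: pom_versions is a hypothetical helper, and it assumes versions are declared as single-line elements whose tag name ends in "version", which may not cover everything in the actual pom.xml.

```shell
# Sketch: list version-like elements declared in a pom.xml as name=value pairs.
# pom_versions is a hypothetical helper; it assumes single-line elements
# whose tag name ends in "version" (for example <some.version>x.y</some.version>).
pom_versions() {
  pom="${1:-pom.xml}"
  # Capture the tag name and its text, then require the matching closing tag
  sed -n 's/^[[:space:]]*<\([A-Za-z0-9._-]*version\)>\([^<]*\)<\/\1>.*$/\1=\2/p' "$pom"
}

# Usage, from the project root:
#   pom_versions pom.xml
```

Versions that are inherited from a parent pom or spread over several lines will not show up in this listing, so treat the output as a starting point rather than a complete inventory.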
mvn clean compile package
This should produce a build directory that has everything (except Solr) required to run Sparkler.
For solr setup see this page.