The cloud-integration repository provides modules to improve Apache Spark's integration with cloud infrastructures.

The spark-cloud-integration module contains classes and tools to make Spark work better in-cloud:
- Committer integration with the s3a committers.
- Proof of concept cloud-first distcp replacement.
- Serialization for Hadoop `Configuration`: class `ConfigSerDeser`. Use this to get a configuration into an RDD method (a sketch follows this list).
- Trait `HConf` to manipulate the Hadoop options in a Spark configuration.
- Anything else which turns out to be useful.
- Variant of `FileInputStream` for cloud storage: `org.apache.spark.streaming.hortonworks.CloudInputDStream`.
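
The sketch below shows the idea behind `ConfigSerDeser`: wrap the non-serializable Hadoop `Configuration` in a `Serializable` holder so an RDD closure can carry it to the executors. This is a minimal illustration of the pattern, not the repository's actual class; the example object and its use of `fs.defaultFS` are invented for demonstration.

```scala
import java.io.{ObjectInputStream, ObjectOutputStream}

import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.SparkSession

// Serializable wrapper around a Hadoop Configuration. Configuration
// implements Writable, so serialization delegates to write()/readFields().
class ConfigSerDeser(@transient private var conf: Configuration)
    extends Serializable {

  def get(): Configuration = conf

  private def writeObject(out: ObjectOutputStream): Unit =
    conf.write(out)

  private def readObject(in: ObjectInputStream): Unit = {
    conf = new Configuration(false)
    conf.readFields(in)
  }
}

// Hypothetical usage: capture the driver's Hadoop configuration in an
// RDD closure and read an option from it on the executors.
object ConfigSerDeserExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ConfigSerDeserExample").getOrCreate()
    val serdes = new ConfigSerDeser(spark.sparkContext.hadoopConfiguration)
    spark.sparkContext.parallelize(1 to 4)
      .map(_ => serdes.get().get("fs.defaultFS"))
      .collect()
      .foreach(println)
    spark.stop()
  }
}
```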
A companion module provides the packaging/integration tests for Spark and cloud storage against AWS, Azure, and OpenStack.

These are basic tests of the core functionality of I/O and streaming, and they verify that the committers work in the presence of inconsistent object storage. As well as running as unit tests, they have CLI entry points which can be used for scalable functional testing, as sketched below.
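
The dual unit-test/CLI pattern can be as simple as sharing one workload method between the test harness and a `main()` invokable via spark-submit. Everything in this sketch is hypothetical (the object name, its arguments, and the workload); it only illustrates the shape of such an entry point.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical test workload with a CLI entry point: a unit test can call
// run() directly, while spark-submit reaches it through main().
object S3AReadWriteTest {
  // Core workload, shared by unit tests and the CLI.
  def run(spark: SparkSession, dest: String, rows: Int): Unit = {
    spark.range(rows).write.mode("overwrite").csv(dest)
    assert(spark.read.csv(dest).count() == rows)
  }

  // CLI entry point for scalable functional testing:
  // args(0) = destination path, args(1) = row count.
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("S3AReadWriteTest").getOrCreate()
    try run(spark, args(0), args(1).toInt)
    finally spark.stop()
  }
}
```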
The minimal-integration-test module builds a minimal JAR for integration tests, intended to work against all Spark 2.2 versions. Because Spark 2.1 keeps Spark's Logging class private, this module reinstates its own log API, `CloudLogging`, which is used instead; it then copies in the relevant operations from spark-cloud-integration with their logging fixed up.
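
A minimal SLF4J-backed stand-in for Spark's private Logging trait would look roughly like this; the method set mirrors Spark's convention but is an assumption about `CloudLogging`'s actual surface.

```scala
import org.slf4j.{Logger, LoggerFactory}

// Rough sketch of a Logging-style trait: Spark's own
// org.apache.spark.internal.Logging is private[spark], so the module
// carries an SLF4J-backed equivalent of its own.
trait CloudLogging {
  @transient private lazy val log: Logger =
    LoggerFactory.getLogger(this.getClass.getName.stripSuffix("$"))

  protected def logDebug(msg: => String): Unit =
    if (log.isDebugEnabled) log.debug(msg)

  protected def logInfo(msg: => String): Unit =
    if (log.isInfoEnabled) log.info(msg)

  protected def logWarning(msg: => String): Unit =
    if (log.isWarnEnabled) log.warn(msg)

  protected def logError(msg: => String, e: Throwable = null): Unit =
    if (e == null) log.error(msg) else log.error(msg, e)
}
```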
Usage
```
spark-submit --class com.hortonworks.spark.cloud.integration.Generator \
  --master yarn \
  --num-executors 2 \
  --driver-memory 512m \
  --executor-memory 512m \
  --executor-cores 1 \
  minimal-integration-test-1.0-SNAPSHOT.jar \
  adl://example.azuredatalakestore.net/output/dest/1 \
  2 2 15
```