@@ -156,7 +156,29 @@ A few suggestions have been made regarding using Docker Stacks with spark.

### Using PySpark with AWS S3

Using Spark session for hadoop 2.7.3

```py
import os

# Run `!ls /usr/local/spark/jars/hadoop*` to check which Hadoop version is bundled
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages "org.apache.hadoop:hadoop-aws:2.7.3" pyspark-shell'

import pyspark

# Prompt for the AWS credentials so they are not hard-coded
myAccessKey = input()
mySecretKey = input()

spark = pyspark.sql.SparkSession.builder \
    .master("local[*]") \
    .config("spark.hadoop.fs.s3a.access.key", myAccessKey) \
    .config("spark.hadoop.fs.s3a.secret.key", mySecretKey) \
    .getOrCreate()

# The fs.s3a.* keys above apply to the s3a:// scheme
df = spark.read.parquet("s3a://myBucket/myKey")
```
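
The `hadoop-aws` version in `PYSPARK_SUBMIT_ARGS` has to match the Hadoop version bundled with Spark, which the snippet above checks by listing the jars by hand. A minimal sketch of automating that check follows. The helper names `detect_hadoop_version` and `build_submit_args` are hypothetical, and it assumes the jars directory contains a file named `hadoop-common-<version>.jar`, which may not hold for every Spark build.

```python
import glob
import os
import re


def detect_hadoop_version(jars_dir="/usr/local/spark/jars"):
    """Return the version in hadoop-common-<version>.jar, or None if absent."""
    pattern = os.path.join(jars_dir, "hadoop-common-*.jar")
    for path in sorted(glob.glob(pattern)):
        # Assumption: the jar is named hadoop-common-<dotted version>.jar
        match = re.fullmatch(
            r"hadoop-common-(\d+(?:\.\d+)*)\.jar", os.path.basename(path)
        )
        if match:
            return match.group(1)
    return None


def build_submit_args(hadoop_version):
    """Build a PYSPARK_SUBMIT_ARGS value requesting the matching hadoop-aws."""
    return f"--packages org.apache.hadoop:hadoop-aws:{hadoop_version} pyspark-shell"


if __name__ == "__main__":
    version = detect_hadoop_version()
    if version is None:
        print("No hadoop-common jar found; set the version by hand.")
    else:
        # Set this before pyspark is imported and a session is created.
        os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(version)
        print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

If the detected version needs extra packages, they can be appended to the comma-separated `--packages` list by hand.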

Using Spark context for hadoop 2.6.0

```py
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk:1.10.34,org.apache.hadoop:hadoop-aws:2.6.0 pyspark-shell'
