Conversation

@ying-w (Contributor) commented Mar 29, 2019

The instructions seemed outdated, so I've updated them to use SparkSession.

I couldn't find a way to set hadoopConf directly from the SparkSession.

Setting aws-java-sdk in --packages shouldn't be necessary, since dependency resolution of hadoop-aws should pull in the proper version; however, the version required is quite old (March 2014). I tried a newer jar but got an error that looked like this
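
A minimal sketch of the kind of setup described above, assuming the jupyter/pyspark-notebook image; the hadoop-aws version, bucket, and credential values are placeholders, and reaching the Hadoop configuration through sparkContext._jsc relies on a private attribute, so treat it as a workaround rather than a stable API:

    from pyspark.sql import SparkSession

    # hadoop-aws must match the Hadoop jars bundled with the image;
    # 2.7.3 here is an assumption -- check /usr/local/spark/jars/hadoop* first.
    spark = (
        SparkSession.builder
        .appName("s3-read-example")
        .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:2.7.3")
        .getOrCreate()
    )

    # SparkSession exposes no public setter for Hadoop options; the common
    # workaround is to go through the underlying SparkContext.
    hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
    hadoop_conf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")  # placeholder
    hadoop_conf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")  # placeholder

    # Bucket and object key are placeholders.
    df = spark.read.text("s3a://some-bucket/some-file.txt")
    df.show()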

@parente (Member) commented Apr 1, 2019

LGTM though I don't have a way to verify the steps. Let's merge it and fix over time if people find problems. 🍰

@parente merged commit 6576148 into jupyter:master on Apr 1, 2019
@ogierpaul commented
Well, thank you so much @ying-w for adding this comment:

    # !ls /usr/local/spark/jars/hadoop* # to figure out what version of hadoop

That just saved my life.
I was trying to connect to AWS with jupyter/pyspark-notebook, and all the example code I found referenced Hadoop 2.7.3. It turned out that the Hadoop version in the latest Docker image was 3.2.0.

Thank you again!
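
As a hedged sketch, the version check quoted above could even be automated so that hadoop-aws is pinned to the matching release; the jar path and the hadoop-common naming pattern are assumptions based on the jupyter/pyspark-notebook layout:

    import glob
    import re

    # Derive the bundled Hadoop version from the jar file names.
    jars = sorted(glob.glob("/usr/local/spark/jars/hadoop-common-*.jar"))
    version = re.search(r"hadoop-common-(.+)\.jar", jars[0]).group(1)
    print(version)  # e.g. "2.7.3" on older images, "3.2.0" on newer ones

    # Use this coordinate in spark.jars.packages or --packages.
    packages = "org.apache.hadoop:hadoop-aws:" + version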
