Description
An API call is made to submit the job.
The response states that it is Running.
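For reference, this submission roughly corresponds to posting a CreateSubmissionRequest to Spark's standalone REST gateway. A minimal sketch in Python, assuming the gateway is enabled on port 6066 and that the submit API being called (here via Postman) wraps something like this - the host, file paths, and the mainClass/appResource combination used for a Python app are assumptions, not a confirmed description of the actual API:

# Sketch only: submit over the standalone REST gateway (assumed port 6066).
# Host, paths, and property values below are illustrative.
import json
import requests

payload = {
    "action": "CreateSubmissionRequest",
    "clientSparkVersion": "1.5.0",
    "appResource": "file:/root/wordcount.py",
    "mainClass": "org.apache.spark.deploy.SparkSubmit",  # assumption for a .py app
    "appArgs": ["/root/wordcount.py"],
    "environmentVariables": {"SPARK_ENV_LOADED": "1"},
    "sparkProperties": {
        "spark.app.name": "MyApp",
        "spark.master": "spark://ec2-54-209-108-127.compute-1.amazonaws.com:6066",
        "spark.submit.deployMode": "cluster",
    },
}

resp = requests.post(
    "http://ec2-54-209-108-127.compute-1.amazonaws.com:6066/v1/submissions/create",
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"},
)
print(resp.json())  # typically includes a submissionId and a success flag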
On the cluster UI:
Worker (slave) - worker-20160712083825-172.31.17.189-59433 is Alive
1 of 2 cores used
1 GB of 6 GB memory used
Running application:
app-20160713130056-0020 - WAITING for 5 hours
Cores - unlimited
Job description of the application:
Active stage - reduceByKey at /root/wordcount.py:23
Pending stage - takeOrdered at /root/wordcount.py:26
Running driver:
The stderr log page for driver-20160713130051-0025 shows:
WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
According to "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources", this error appears when the slaves have not been started, so the cluster has no resources to offer.
However, in my case slave 1 is alive and working.
Following "Unable to Execute More than a spark Job 'Initial job has not accepted any resources'", I am using deploy-mode = cluster (not client), since I have 1 master and 1 slave and the submit API is called externally (e.g. via Postman).
Also, the cluster has cores, RAM, and memory available, yet the job still throws the error shown on the UI.
Following "TaskSchedulerImpl: Initial job has not accepted any resources", I assigned these Spark environment variables in ~/spark-1.5.0/conf/spark-env.sh:
SPARK_WORKER_INSTANCES=1
SPARK_WORKER_MEMORY=1000m
SPARK_WORKER_CORES=2
I then replicated them across the slaves with:
sudo /root/spark-ec2/copy-dir /root/spark/conf/spark-env.sh
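To confirm that the workers actually registered with the intended cores and memory after restarting them, the master's JSON status view can be queried. A small sketch, assuming the master web UI is reachable on port 8080 and exposes /json (hostname and exact field names may differ by version):

# Sketch only: ask the standalone master what its workers offer.
# Assumes the master web UI on port 8080 exposes a /json view.
import requests

state = requests.get(
    "http://ec2-54-209-108-127.compute-1.amazonaws.com:8080/json").json()

for w in state.get("workers", []):
    print("%s %s cores %s/%s memory %s/%s MB" % (
        w["id"], w["state"],
        w["coresused"], w["cores"],
        w["memoryused"], w["memory"]))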
All of the cases in the answers to the questions above were applicable, yet no solution was found. Since I am working with both the API and Apache Spark, perhaps some other kind of assistance is required.
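One thing I have not ruled out: in cluster deploy mode the driver itself occupies a core and memory on the worker, so the executor request may no longer fit. A minimal sketch of capping what the application asks for (the specific values are assumptions, not a verified fix):

# Sketch only: cap the application's request so an executor can still fit
# next to the cluster-mode driver on a 2-core worker. Values are illustrative.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("MyApp")
        .setMaster("spark://ec2-54-209-108-127.compute-1.amazonaws.com:7077")
        .set("spark.cores.max", "1")            # leave a core for the driver
        .set("spark.executor.memory", "512m")   # stay within SPARK_WORKER_MEMORY
        .set("spark.driver.memory", "256m"))
sc = SparkContext(conf=conf)
# ... the rest of the wordcount pipeline stays the same ...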
wordcount.py - my PySpark application code:
import time, re
from pyspark import SparkContext, SparkConf

def linesToWordsFunc(line):
    wordsList = line.split()
    wordsList = [re.sub(r'\W+', '', word) for word in wordsList]
    filtered = filter(lambda word: re.match(r'\w+', word), wordsList)
    return filtered

def wordsToPairsFunc(word):
    return (word, 1)

def reduceToCount(a, b):
    return (a + b)

def main():
    conf = SparkConf().setAppName("MyApp").setMaster("spark://ec2-54-209-108-127.compute-1.amazonaws.com:7077")
    sc = SparkContext(conf=conf)

    rdd = sc.textFile("/user/root/In/a.txt")
    words = rdd.flatMap(linesToWordsFunc)
    pairs = words.map(wordsToPairsFunc)
    counts = pairs.reduceByKey(reduceToCount)

    # Get the first top 100 words
    output = counts.takeOrdered(100, lambda (k, v): -v)
    for (word, count) in output:
        print word + ': ' + str(count)

    sc.stop()

if __name__ == "__main__":
    main()
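For what it is worth, a quick way to separate the code from the cluster is to run the same logic against a local master; if this completes, the problem is resource allocation on the standalone cluster rather than the application itself (a sketch, using inline test data instead of the HDFS file):

# Sketch only: local sanity check with a local[2] master and inline data.
import re
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("MyApp-local-test").setMaster("local[2]")
sc = SparkContext(conf=conf)

lines = sc.parallelize(["a quick brown fox", "a lazy dog", "a quick dog"])
counts = (lines.flatMap(lambda line: re.findall(r"\w+", line))
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
print(counts.takeOrdered(5, key=lambda kv: -kv[1]))
sc.stop()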