
Spark Job submitted - Waiting (TaskSchedulerImpl : Initial job not accepted) #12

@ChaiBapchya

Description


An API call is made to submit the job.
The response states that it is Running.

On the cluster UI:

Worker (slave) worker-20160712083825-172.31.17.189-59433 is Alive

Cores: 1 out of 2 used

Memory: 1 GB out of 6 GB used

Running application:

app-20160713130056-0020 - WAITING for 5 hours

Cores: unlimited

Job Description of the Application

Active Stage

reduceByKey at /root/wordcount.py:23

Pending Stage

takeOrdered at /root/wordcount.py:26

Running Driver -

stderr log page for driver-20160713130051-0025

WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

According to the question "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources", the usual cause is that the slaves haven't been started, hence the cluster has no resources.

However, in my case Slave 1 is running (it shows as Alive in the UI).

According to the question "Unable to Execute More than a spark Job 'Initial job has not accepted any resources'", I am using deploy-mode = cluster (not client), since I have 1 master and 1 slave and the submit API is called via Postman (or from anywhere else). A rough sketch of such a submission call is shown below.
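For context, this is roughly the shape a submission request can take against the standalone master's REST submission endpoint (port 6066). It is only a sketch using the documented jar-style payload; the jar path, class name, and property values are illustrative assumptions, and my actual call is made from Postman and submits wordcount.py.

import json
import requests

# Sketch only: documented payload shape for the standalone REST submission
# endpoint (port 6066). Values below are illustrative, not my exact call.
submit_url = "http://ec2-54-209-108-127.compute-1.amazonaws.com:6066/v1/submissions/create"

payload = {
    "action": "CreateSubmissionRequest",
    "appResource": "file:/root/my-app.jar",   # illustrative jar-based example
    "appArgs": [],
    "mainClass": "com.example.MyApp",          # hypothetical class name
    "clientSparkVersion": "1.5.0",
    "environmentVariables": {"SPARK_ENV_LOADED": "1"},
    "sparkProperties": {
        "spark.app.name": "MyApp",
        "spark.master": "spark://ec2-54-209-108-127.compute-1.amazonaws.com:6066",
        "spark.submit.deployMode": "cluster",
        "spark.driver.supervise": "false"
    }
}

resp = requests.post(submit_url, data=json.dumps(payload),
                     headers={"Content-Type": "application/json"})
print(resp.text)   # response includes a submissionId and a success flag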

Also, the cluster has available cores, RAM, and memory, yet the job still throws the error as conveyed by the UI. (A programmatic check of the master's status is sketched after this paragraph.)
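To double-check that, the master's JSON status page can be read programmatically. A minimal sketch, assuming the default master web UI port 8080 and its /json endpoint; the field names are those exposed by the Spark 1.5 master UI and may differ in other versions.

import requests

# Query the standalone master's JSON status endpoint and report what each
# registered worker still has free (cores and memory, in MB).
master_json_url = "http://ec2-54-209-108-127.compute-1.amazonaws.com:8080/json"

status = requests.get(master_json_url).json()
for worker in status["workers"]:
    free_cores = worker["cores"] - worker["coresused"]
    free_mem_mb = worker["memory"] - worker["memoryused"]
    print("%s  state=%s  free cores=%d  free memory=%d MB"
          % (worker["id"], worker["state"], free_cores, free_mem_mb))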

According to the question "TaskSchedulerImpl: Initial job has not accepted any resources", I assigned the following Spark environment variables in ~/spark-1.5.0/conf/spark-env.sh:

SPARK_WORKER_INSTANCES=1    # one worker process on the slave
SPARK_WORKER_MEMORY=1000m   # total memory the worker can offer to executors
SPARK_WORKER_CORES=2        # total cores the worker can offer to executors

I replicated these settings across the slaves:

sudo /root/spark-ec2/copy-dir /root/spark/conf/spark-env.sh

All the cases in the answer to the above question were applicable, yet no solution was found. Since I am working with APIs and Apache Spark together, perhaps some other assistance is required.

wordcount.py - my PySpark application code:
import time, re
from pyspark import SparkContext, SparkConf

def linesToWordsFunc(line):
    wordsList = line.split()
    wordsList = [re.sub(r'\W+', '', word) for word in wordsList]
    filtered = filter(lambda word: re.match(r'\w+', word), wordsList)
    return filtered

def wordsToPairsFunc(word):
    return (word, 1)

def reduceToCount(a, b):
    return (a + b)

def main():
    conf = SparkConf().setAppName("MyApp").setMaster("spark://ec2-54-209-108-127.compute-1.amazonaws.com:7077")
    sc = SparkContext(conf=conf)
    rdd = sc.textFile("/user/root/In/a.txt")

    words = rdd.flatMap(linesToWordsFunc)
    pairs = words.map(wordsToPairsFunc)
    counts = pairs.reduceByKey(reduceToCount)

    # Get the top 100 words, ordered by descending count
    output = counts.takeOrdered(100, lambda (k, v): -v)

    for (word, count) in output:
        print word + ': ' + str(count)

    sc.stop()

if __name__ == "__main__":
    main()
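For completeness, one variation of the conf that could rule out a resource mismatch is to cap the application's own request explicitly, so that an executor can still be scheduled on the single worker next to the cluster-mode driver. A minimal sketch; the 512m / 1-core values are illustrative assumptions, not tested settings.

from pyspark import SparkContext, SparkConf

# Sketch only: explicitly cap what the application asks for, so an executor can
# fit on the worker (2 cores / 1000 MB) alongside the cluster-mode driver.
conf = (SparkConf()
        .setAppName("MyApp")
        .setMaster("spark://ec2-54-209-108-127.compute-1.amazonaws.com:7077")
        .set("spark.executor.memory", "512m")   # default is 1g per executor
        .set("spark.cores.max", "1"))           # leave a core free for the driver

sc = SparkContext(conf=conf)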

