
Spark Job submitted - Waiting (TaskSchedulerImpl : Initial job not accepted) #12

@ChaiBapchya

Description


An API call is made to submit the job.
The response states that it is Running.

On the cluster UI:

Worker (slave) worker-20160712083825-172.31.17.189-59433 is Alive

Cores: 1 out of 2 used

Memory: 1 GB out of 6 GB used

Running application:

app-20160713130056-0020 - WAITING for 5 hours

Cores: unlimited

Job Description of the Application

Active Stage

reduceByKey at /root/wordcount.py:23

Pending Stage

takeOrdered at /root/wordcount.py:26

Running Driver -

stderr log page for driver-20160713130051-0025

WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

According to the question "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources", the usual cause is that the slaves haven't been started, hence the cluster has no resources.

However, in my case Slave 1 is running (it shows as Alive in the UI).

According to the question "Unable to Execute More than a spark Job 'Initial job has not accepted any resources'", I am using deploy-mode = cluster (not client), since I have 1 master and 1 slave and the submit API is called via Postman (or from anywhere else). A rough sketch of such a submission call is shown below.
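For context, this is roughly the shape a submission request can take against the standalone master's REST submission endpoint (port 6066). It is only a sketch using the documented jar-style payload; the jar path, class name, and property values are illustrative assumptions, and my actual call is made from Postman and submits wordcount.py.

import json
import requests

# Sketch only: documented payload shape for the standalone REST submission
# endpoint (port 6066). Values below are illustrative, not my exact call.
submit_url = "http://ec2-54-209-108-127.compute-1.amazonaws.com:6066/v1/submissions/create"

payload = {
    "action": "CreateSubmissionRequest",
    "appResource": "file:/root/my-app.jar",   # illustrative jar-based example
    "appArgs": [],
    "mainClass": "com.example.MyApp",          # hypothetical class name
    "clientSparkVersion": "1.5.0",
    "environmentVariables": {"SPARK_ENV_LOADED": "1"},
    "sparkProperties": {
        "spark.app.name": "MyApp",
        "spark.master": "spark://ec2-54-209-108-127.compute-1.amazonaws.com:6066",
        "spark.submit.deployMode": "cluster",
        "spark.driver.supervise": "false"
    }
}

resp = requests.post(submit_url, data=json.dumps(payload),
                     headers={"Content-Type": "application/json"})
print(resp.text)   # response includes a submissionId and a success flag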

Also, the cluster has available cores, RAM, and memory, yet the job still throws the error as conveyed by the UI. (A programmatic check of the master's status is sketched after this paragraph.)
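To double-check that, the master's JSON status page can be read programmatically. A minimal sketch, assuming the default master web UI port 8080 and its /json endpoint; the field names are those exposed by the Spark 1.5 master UI and may differ in other versions.

import requests

# Query the standalone master's JSON status endpoint and report what each
# registered worker still has free (cores and memory, in MB).
master_json_url = "http://ec2-54-209-108-127.compute-1.amazonaws.com:8080/json"

status = requests.get(master_json_url).json()
for worker in status["workers"]:
    free_cores = worker["cores"] - worker["coresused"]
    free_mem_mb = worker["memory"] - worker["memoryused"]
    print("%s  state=%s  free cores=%d  free memory=%d MB"
          % (worker["id"], worker["state"], free_cores, free_mem_mb))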

According to the question "TaskSchedulerImpl: Initial job has not accepted any resources", I assigned the following Spark environment variables in ~/spark-1.5.0/conf/spark-env.sh:

SPARK_WORKER_INSTANCES=1    # one worker process on the slave
SPARK_WORKER_MEMORY=1000m   # total memory the worker can offer to executors
SPARK_WORKER_CORES=2        # total cores the worker can offer to executors

I replicated these settings across the slaves:

sudo /root/spark-ec2/copy-dir /root/spark/conf/spark-env.sh

All the cases in the answer to the above question were applicable, yet no solution was found. Since I am working with APIs and Apache Spark together, perhaps some other assistance is required.

wordcount.py - my PySpark application code:
import time, re
from pyspark import SparkContext, SparkConf

def linesToWordsFunc(line):
    wordsList = line.split()
    wordsList = [re.sub(r'\W+', '', word) for word in wordsList]
    filtered = filter(lambda word: re.match(r'\w+', word), wordsList)
    return filtered

def wordsToPairsFunc(word):
    return (word, 1)

def reduceToCount(a, b):
    return (a + b)

def main():
    conf = SparkConf().setAppName("MyApp").setMaster("spark://ec2-54-209-108-127.compute-1.amazonaws.com:7077")
    sc = SparkContext(conf=conf)
    rdd = sc.textFile("/user/root/In/a.txt")

    words = rdd.flatMap(linesToWordsFunc)
    pairs = words.map(wordsToPairsFunc)
    counts = pairs.reduceByKey(reduceToCount)

    # Get the top 100 words, ordered by descending count
    output = counts.takeOrdered(100, lambda (k, v): -v)

    for (word, count) in output:
        print word + ': ' + str(count)

    sc.stop()

if __name__ == "__main__":
    main()
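For completeness, one variation of the conf that could rule out a resource mismatch is to cap the application's own request explicitly, so that an executor can still be scheduled on the single worker next to the cluster-mode driver. A minimal sketch; the 512m / 1-core values are illustrative assumptions, not tested settings.

from pyspark import SparkContext, SparkConf

# Sketch only: explicitly cap what the application asks for, so an executor can
# fit on the worker (2 cores / 1000 MB) alongside the cluster-mode driver.
conf = (SparkConf()
        .setAppName("MyApp")
        .setMaster("spark://ec2-54-209-108-127.compute-1.amazonaws.com:7077")
        .set("spark.executor.memory", "512m")   # default is 1g per executor
        .set("spark.cores.max", "1"))           # leave a core free for the driver

sc = SparkContext(conf=conf)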

