EMR compatibility issue with OnnxWrapper #14057

@anqini

Description

Is there an existing issue for this?

  • I have searched the existing issues and did not find a match.

Who can help?

I can contribute a fix for this issue in the short term and collaborate in the long term.

What are you working on?

Testing Spark NLP functionality with ONNX and TensorFlow components in an ML pipeline on EMR with multiple task nodes. We hope to onboard a few models we trained ourselves into a Spark NLP pipeline via the ONNX and TensorFlow integrations.

Current Behavior

I ran into the "SparkSession should only be created on the driver" issue caused by OnnxWrapper when testing spark-nlp on EMR, and I suspect there is a bug in the code. Essentially, a Spark worker node tries to read Hadoop configuration in order to initialize the OnnxWrapper session, but it cannot obtain a SparkSession because it is not the driver.

Caused by: java.lang.IllegalStateException: SparkSession should only be created and accessed on the driver.
	at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$assertOnDriver(SparkSession.scala:1158)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:927)
	at com.johnsnowlabs.nlp.util.io.ResourceHelper$.$anonfun$getActiveSparkSession$1(ResourceHelper.scala:56)
	at scala.Option.getOrElse(Option.scala:189)
	at com.johnsnowlabs.nlp.util.io.ResourceHelper$.getActiveSparkSession(ResourceHelper.scala:57)
	at com.johnsnowlabs.nlp.util.io.ResourceHelper$.spark$lzycompute(ResourceHelper.scala:104)
	at com.johnsnowlabs.nlp.util.io.ResourceHelper$.spark(ResourceHelper.scala:104)
	at com.johnsnowlabs.util.ConfigHelper$.sparkSession$lzycompute(ConfigHelper.scala:23)
	at com.johnsnowlabs.util.ConfigHelper$.sparkSession(ConfigHelper.scala:23)
	at com.johnsnowlabs.util.ConfigHelper$.getConfigValueOrElse(ConfigHelper.scala:79)
	at com.johnsnowlabs.util.ConfigLoader$.getConfigInfo(ConfigLoader.scala:70)
	at com.johnsnowlabs.util.ConfigLoader$.configData$lzycompute(ConfigLoader.scala:36)
	at com.johnsnowlabs.util.ConfigLoader$.configData(ConfigLoader.scala:60)
	at com.johnsnowlabs.util.ConfigLoader$.getConfigIntValue(ConfigLoader.scala:79)
	at com.johnsnowlabs.ml.onnx.OnnxWrapper$.getCPUSessionConfig(OnnxWrapper.scala:195)
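
The stack trace above suggests the failure mode below, sketched here as a hypothetical, simplified Python model (not the actual spark-nlp source): configuration is resolved lazily, so the first read can happen inside a task on an executor, where Spark forbids creating a SparkSession. The class and key names are illustrative only.

```python
class DriverOnlySession:
    """Stand-in for SparkSession: may only be created on the driver."""

    def __init__(self, on_driver):
        if not on_driver:
            raise RuntimeError(
                "SparkSession should only be created and accessed on the driver.")
        # Hypothetical config key, for illustration only.
        self.conf = {"onnx.intraOpNumThreads": "4"}


def get_config_lazily(on_driver):
    # Anti-pattern: build the session at the point of first access,
    # wherever that happens to be.
    return DriverOnlySession(on_driver).conf["onnx.intraOpNumThreads"]


# On the driver this works...
print(get_config_lazily(on_driver=True))
# ...but the same lazy read inside an executor task raises, which matches
# the ConfigLoader -> ConfigHelper -> ResourceHelper chain above.
```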

Expected Behavior

The pipeline should run without exceptions on both worker and driver nodes.
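
One possible fix direction, sketched below with hypothetical helper names: only look up an already-active session (mimicking what SparkSession.getActiveSession does on an executor) and fall back to a hard-coded default when none exists, instead of trying to create one off-driver.

```python
DEFAULT_INTRA_OP_THREADS = 6  # hypothetical default value


def get_active_conf(on_driver):
    """Return the active session's conf on the driver, else None
    (mimicking SparkSession.getActiveSession on an executor)."""
    return {"onnx.intraOpNumThreads": "4"} if on_driver else None


def get_config_or_default(on_driver):
    conf = get_active_conf(on_driver)
    if conf is None:
        # No session available on an executor: use the default
        # rather than trying to create a SparkSession.
        return DEFAULT_INTRA_OP_THREADS
    return int(conf["onnx.intraOpNumThreads"])


print(get_config_or_default(on_driver=True))   # driver path
print(get_config_or_default(on_driver=False))  # executor path
```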

Steps To Reproduce

Step 1 create EMR cluster

We recommend creating the cluster with an EC2 RSA key pair; if you don't have one, create it first.

aws emr create-cluster \
    --release-label emr-6.8.0 \
    --use-default-roles \
    --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=1,InstanceType=m4.large InstanceGroupType=TASK,InstanceCount=5,InstanceType=m4.large \
    --applications Name=Spark Name=Zeppelin \
    --ec2-attributes KeyName=<yourKeyName>

Step 2 Find DNS in AWS Console

Find the primary node public DNS of your cluster. It looks like hadoop@*.compute-1.amazonaws.com.

Step 3 SSH into the cluster

SSH into the EMR cluster:

ssh -i <key_path> <nodepublicDNS>

Step 4

Install spark-nlp and start the PySpark shell:

pip install spark-nlp==5.1.4

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.4

Step 5

Run the following code:

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentence = SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

tokenizer = Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

# Use the transformer embeddings
embeddings = BertEmbeddings.pretrained(name='bert_base_cased', lang='en') \
    .setInputCols(['document', 'token']) \
    .setOutputCol('embeddings')

# This pretrained model requires those specific transformer embeddings
ner_model = NerDLModel.pretrained("ner_dl_bert", "en") \
    .setInputCols(["document", "token", "embeddings"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([
    documentAssembler,
    sentence,
    tokenizer,
    embeddings,
    ner_model
])

data = spark.createDataFrame([["U.N. official Ekeus heads for Baghdad."]]).toDF("text")
result = pipeline.fit(data).transform(data)

result.select("ner.result").show(truncate=False)

The exception is thrown when executing the last line of code, since the show action is what triggers pipeline execution on the workers.

Spark NLP version and Apache Spark

'5.1.4'
'3.3.0-amzn-0'

Type of Spark Application

No response

Java Version

No response

Java Home Directory

No response

Setup and installation

No response

Operating System and Version

No response

Link to your project (if available)

No response

Additional Information

No response
