Commit c85c730

Merge pull request #14084 from JohnSnowLabs/release/520-release-candidate
520-release-candidate
2 parents 2851925 + 8ebcfbe commit c85c730

1,595 files changed: +64196 -10243 lines changed

CHANGELOG

Lines changed: 30 additions & 0 deletions
@@ -1,3 +1,33 @@
+========
+5.2.0
+========
+----------------
+New Features & Enhancements
+----------------
+* **NEW:** Introducing the `CLIPForZeroShotClassification` for Zero-Shot Image Classification using OpenAI's CLIP models
+* **NEW:** Introducing the `DocumentTokenSplitter` which allows users to split large documents into smaller chunks to be used in RAG with LLM models
+* **NEW:** Introducing support for ONNX Runtime in T5Transformer annotator
+* **NEW:** Introducing support for ONNX Runtime in MarianTransformer annotator
+* **NEW:** Introducing support for ONNX Runtime in BertSentenceEmbeddings annotator
+* **NEW:** Introducing support for ONNX Runtime in XlmRoBertaSentenceEmbeddings annotator
+* **NEW:** Introducing support for ONNX Runtime in CamemBertForQuestionAnswering, CamemBertForTokenClassification, and CamemBertForSequenceClassification annotators
+* Add caching support for newly imported T5 models in TF format so that performance is competitive with the ONNX version
+* Improve ZIP util and add tests for both ZipArchiveUtil and OnnxWrapper
+* Refactor ONNX and add OnnxSession to broadcast
+* Update ONNX Runtime to 1.16.3
+* Add a new notebook for Structured Streaming
+
+----------------
+Bug Fixes
+----------------
+* Fix random dimension mismatch in E5Embeddings and MPNetEmbeddings due to a missing average_pool after last_hidden_state in the output
+* Fix batching exception in E5 and MPNet embeddings annotators failing when sentence is used instead of document
+* Fix chunk construction when an entity is found
+* Fix a bug in the library's version in Scala
+* Fix Whisper models not downloading due to the wrong library version
+* Fix and refactor saving best model based on given metrics during NerDL training
+
+
 ========
 5.1.4
 ========
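
The first bug fix above attributes the dimension mismatch to a missing `average_pool` step after `last_hidden_state`. As a rough illustration of what such a step usually does, the sketch below averages per-token vectors into one fixed-size sentence vector, skipping padded positions. It is plain Python written for this note, not Spark NLP's actual code; the function name, its arguments, and the toy inputs are all assumptions.

```python
from typing import List


def average_pool(last_hidden_state: List[List[float]],
                 attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1 (hypothetical helper)."""
    if len(last_hidden_state) != len(attention_mask):
        raise ValueError("one mask entry is required per token vector")
    # Keep only real tokens; padded positions carry a mask value of 0.
    kept = [vec for vec, m in zip(last_hidden_state, attention_mask) if m == 1]
    if not kept:
        raise ValueError("at least one unmasked token is required")
    dim = len(kept[0])
    # Column-wise mean over the kept token vectors.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Two toy inputs with different numbers of real tokens (hidden size 2).
short = average_pool([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]], [1, 1, 0])
long = average_pool([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]], [1, 1, 1])
print(short, long)
```

Both results have the same length as the hidden size regardless of how many tokens went in, which is the property a pooled embedding output needs.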

README.md

Lines changed: 53 additions & 50 deletions
Large diffs are not rendered by default.

build.sbt

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ name := getPackageName(is_silicon, is_gpu, is_aarch64)
 
 organization := "com.johnsnowlabs.nlp"
 
-version := "5.1.4"
+version := "5.2.0"
 
 (ThisBuild / scalaVersion) := scalaVer

docs/_includes/head.html

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@
 {%- assign _article_pagetitle = __return -%}
 
 {%- if page.layout == "landing" -%}
-<title>Spark NLP - State of the Art NLP</title>
+<title>Spark NLP - State of the Art NLP Library for Large Language Models (LLMs)</title>
 {%- elsif page.layout == "model" -%}
 <title>{%- include snippets/get-article-modeltitle.html article=page -%}</title>
 {%- elsif _pagetitle -%}

docs/_layouts/landing.html

Lines changed: 5 additions & 3 deletions
@@ -201,7 +201,7 @@ <h3 class="grey h3_title">{{ _section.title }}</h3>
 <div class="highlight-box">
 {% highlight bash %}
 # Using PyPI
-$ pip install spark-nlp==5.1.4
+$ pip install spark-nlp==5.2.0
 
 # Using Anaconda/Conda
 $ conda install -c johnsnowlabs spark-nlp
@@ -336,12 +336,14 @@ <h4 class="blue h4_title">NLP Features</h4>
 <li>Vision Transformer (Google ViT) <strong>Image Classification</strong></li>
 <li>Microsoft Swin Transformer <strong>Image Classification</strong></li>
 <li>Facebook ConvNext <strong>Image Classification</strong></li>
+<li>Image to Text <strong>Image Captioning</strong></li>
+<li>Zero-Shot <strong>Image Classification (OpenAI CLIP)</strong></li>
 <li>Automatic Speech Recognition <strong>(OpenAI Whisper, Wav2Vec2 & HuBERT)</strong></li>
 <li>Easy <strong>ONNX</strong> and <strong>TensorFlow</strong> integrations</li>
 <li><strong>GPU</strong> Support</li>
 <li>Full integration with <strong>Spark ML</strong> functions</li>
-<li><strong>16800+</strong> pre-trained <strong>models </strong> in <strong>200+ languages! </strong>
-<li><strong>5900+</strong> pre-trained <strong>pipelines </strong> in <strong>200+ languages! </strong>
+<li><strong>24000+</strong> pre-trained <strong>models </strong> in <strong>200+ languages! </strong>
+<li><strong>6000+</strong> pre-trained <strong>pipelines </strong> in <strong>200+ languages! </strong>
 </ul>
 </div>
 {% highlight python %}

docs/api/com/index.html

Lines changed: 4 additions & 4 deletions
@@ -3,9 +3,9 @@
 <head>
 <meta http-equiv="X-UA-Compatible" content="IE=edge" />
 <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
-<title>Spark NLP 5.1.4 ScalaDoc - com</title>
-<meta name="description" content="Spark NLP 5.1.4 ScalaDoc - com" />
-<meta name="keywords" content="Spark NLP 5.1.4 ScalaDoc com" />
+<title>Spark NLP 5.2.0 ScalaDoc - com</title>
+<meta name="description" content="Spark NLP 5.2.0 ScalaDoc - com" />
+<meta name="keywords" content="Spark NLP 5.2.0 ScalaDoc com" />
 <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
 
 
@@ -28,7 +28,7 @@
 </head>
 <body>
 <div id="search">
-<span id="doc-title">Spark NLP 5.1.4 ScalaDoc<span id="doc-version"></span></span>
+<span id="doc-title">Spark NLP 5.2.0 ScalaDoc<span id="doc-version"></span></span>
 <span class="close-results"><span class="left">&lt;</span> Back</span>
 <div id="textfilter">
 <span class="input">

docs/api/com/johnsnowlabs/client/CloudClient.html

Lines changed: 4 additions & 4 deletions
@@ -3,9 +3,9 @@
 <head>
 <meta http-equiv="X-UA-Compatible" content="IE=edge" />
 <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
-<title>Spark NLP 5.1.4 ScalaDoc - com.johnsnowlabs.client.CloudClient</title>
-<meta name="description" content="Spark NLP 5.1.4 ScalaDoc - com.johnsnowlabs.client.CloudClient" />
-<meta name="keywords" content="Spark NLP 5.1.4 ScalaDoc com.johnsnowlabs.client.CloudClient" />
+<title>Spark NLP 5.2.0 ScalaDoc - com.johnsnowlabs.client.CloudClient</title>
+<meta name="description" content="Spark NLP 5.2.0 ScalaDoc - com.johnsnowlabs.client.CloudClient" />
+<meta name="keywords" content="Spark NLP 5.2.0 ScalaDoc com.johnsnowlabs.client.CloudClient" />
 <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
 
 
@@ -28,7 +28,7 @@
 </head>
 <body>
 <div id="search">
-<span id="doc-title">Spark NLP 5.1.4 ScalaDoc<span id="doc-version"></span></span>
+<span id="doc-title">Spark NLP 5.2.0 ScalaDoc<span id="doc-version"></span></span>
 <span class="close-results"><span class="left">&lt;</span> Back</span>
 <div id="textfilter">
 <span class="input">

docs/api/com/johnsnowlabs/client/CloudManager.html

Lines changed: 4 additions & 4 deletions
@@ -3,9 +3,9 @@
 <head>
 <meta http-equiv="X-UA-Compatible" content="IE=edge" />
 <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
-<title>Spark NLP 5.1.4 ScalaDoc - com.johnsnowlabs.client.CloudManager</title>
-<meta name="description" content="Spark NLP 5.1.4 ScalaDoc - com.johnsnowlabs.client.CloudManager" />
-<meta name="keywords" content="Spark NLP 5.1.4 ScalaDoc com.johnsnowlabs.client.CloudManager" />
+<title>Spark NLP 5.2.0 ScalaDoc - com.johnsnowlabs.client.CloudManager</title>
+<meta name="description" content="Spark NLP 5.2.0 ScalaDoc - com.johnsnowlabs.client.CloudManager" />
+<meta name="keywords" content="Spark NLP 5.2.0 ScalaDoc com.johnsnowlabs.client.CloudManager" />
 <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
 
 
@@ -28,7 +28,7 @@
 </head>
 <body>
 <div id="search">
-<span id="doc-title">Spark NLP 5.1.4 ScalaDoc<span id="doc-version"></span></span>
+<span id="doc-title">Spark NLP 5.2.0 ScalaDoc<span id="doc-version"></span></span>
 <span class="close-results"><span class="left">&lt;</span> Back</span>
 <div id="textfilter">
 <span class="input">

docs/api/com/johnsnowlabs/client/CloudResources$.html

Lines changed: 4 additions & 4 deletions
@@ -3,9 +3,9 @@
 <head>
 <meta http-equiv="X-UA-Compatible" content="IE=edge" />
 <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
-<title>Spark NLP 5.1.4 ScalaDoc - com.johnsnowlabs.client.CloudResources</title>
-<meta name="description" content="Spark NLP 5.1.4 ScalaDoc - com.johnsnowlabs.client.CloudResources" />
-<meta name="keywords" content="Spark NLP 5.1.4 ScalaDoc com.johnsnowlabs.client.CloudResources" />
+<title>Spark NLP 5.2.0 ScalaDoc - com.johnsnowlabs.client.CloudResources</title>
+<meta name="description" content="Spark NLP 5.2.0 ScalaDoc - com.johnsnowlabs.client.CloudResources" />
+<meta name="keywords" content="Spark NLP 5.2.0 ScalaDoc com.johnsnowlabs.client.CloudResources" />
 <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
 
 
@@ -28,7 +28,7 @@
 </head>
 <body>
 <div id="search">
-<span id="doc-title">Spark NLP 5.1.4 ScalaDoc<span id="doc-version"></span></span>
+<span id="doc-title">Spark NLP 5.2.0 ScalaDoc<span id="doc-version"></span></span>
 <span class="close-results"><span class="left">&lt;</span> Back</span>
 <div id="textfilter">
 <span class="input">

docs/api/com/johnsnowlabs/client/CloudStorage.html

Lines changed: 4 additions & 4 deletions
@@ -3,9 +3,9 @@
 <head>
 <meta http-equiv="X-UA-Compatible" content="IE=edge" />
 <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
-<title>Spark NLP 5.1.4 ScalaDoc - com.johnsnowlabs.client.CloudStorage</title>
-<meta name="description" content="Spark NLP 5.1.4 ScalaDoc - com.johnsnowlabs.client.CloudStorage" />
-<meta name="keywords" content="Spark NLP 5.1.4 ScalaDoc com.johnsnowlabs.client.CloudStorage" />
+<title>Spark NLP 5.2.0 ScalaDoc - com.johnsnowlabs.client.CloudStorage</title>
+<meta name="description" content="Spark NLP 5.2.0 ScalaDoc - com.johnsnowlabs.client.CloudStorage" />
+<meta name="keywords" content="Spark NLP 5.2.0 ScalaDoc com.johnsnowlabs.client.CloudStorage" />
 <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
 
 
@@ -28,7 +28,7 @@
 </head>
 <body>
 <div id="search">
-<span id="doc-title">Spark NLP 5.1.4 ScalaDoc<span id="doc-version"></span></span>
+<span id="doc-title">Spark NLP 5.2.0 ScalaDoc<span id="doc-version"></span></span>
 <span class="close-results"><span class="left">&lt;</span> Back</span>
 <div id="textfilter">
 <span class="input">
