diff --git a/docs/speech-to-text/batch/batch_diarization.mdx b/docs/speech-to-text/batch/batch_diarization.mdx
index 52f583a..90bbcbd 100644
--- a/docs/speech-to-text/batch/batch_diarization.mdx
+++ b/docs/speech-to-text/batch/batch_diarization.mdx
@@ -25,11 +25,11 @@ To learn more about diarization as a feature, check out the [diarization](../fea
Batch diarization offers the following ways to separate speakers in audio:
-- [**Speaker diarization**](#speaker-diarization) — Identifies each speaker by their voice.
- Useful when there are multiple speakers in the same audio stream.
+- [**Speaker diarization**](#speaker-diarization) — Identifies each speaker by their voice.
+ Useful when there are multiple speakers in the same audio stream.
-- [**Channel diarization**](#channel-diarization) — Transcribes each audio channel separately.
- Useful when each speaker is recorded on their own channel.
+- [**Channel diarization**](#channel-diarization) — Transcribes each audio channel separately.
+ Useful when each speaker is recorded on their own channel.
## Speaker diarization
@@ -170,23 +170,34 @@ You can reduce the likelihood of incorrectly switching between similar sounding
}
}
```
-By default this flag is `false`. When this flag is set to `true`, the system will stay with the speaker of the previous word, if they closely match the speaker of the new word.
+By default this is `false`. When this is set to `true`, the system will stay with the speaker of the previous word, if they closely match the speaker of the new word.
+
+This can reduce instances where the system inadvertently alternates between different speaker labels within a single-speaker audio segment.
+
+However, it may also result in some shorter speaker turn changes between similar speakers being missed.
-This may result in some shorter speaker turn changes between similar speakers being missed.
### Speaker diarization and punctuation
-Speaker diarization uses punctuation to improve accuracy. Small corrections are applied to speaker labels based on sentence boundaries.
+Speaker diarization uses punctuation to improve the accuracy of speaker change points. Small adjustments to speaker labels may be applied based on sentence boundaries.
+
+For example, consider a case where the diarization marks a speaker change one word after a full stop:
+
+> **S1:** Hello my name is John. And **S2:** my name is Alice.
+
+In this case, the speaker change point would be moved to match the end of the sentence:
+
+> **S1:** Hello my name is John. **S2:** And my name is Alice.
-For example, if the system initially assigns 9 words in a sentence to S1 and 1 word to S2, the lone S2 word may be corrected to S1.
+Speaker diarization may also insert punctuation when a speaker change occurs without a corresponding sentence-ending punctuation mark in the transcription result.
-This adjustment only works when punctuation is enabled. Disabling punctuation via the `permitted_marks` setting in `punctuation_overrides` can reduce diarization accuracy.
+These adjustments are only applied when punctuation is enabled. Disabling punctuation via the `permitted_marks` setting in `punctuation_overrides` can reduce diarization accuracy.
Adjusting punctuation sensitivity can also affect how accurately speakers are identified.
### Speaker change (legacy)
-The speaker change detection feature was removed in July 2024. The `speaker_change` and `channel_and_speaker_change` parameters are no longer supported. Use the [speaker diarization](#speaker-diarization) feature for speaker labeling.
+The speaker change detection feature was removed in July 2024. The `speaker_change` and `channel_and_speaker_change` parameters are no longer supported. Use the [speaker diarization](#speaker-diarization) feature for speaker labeling.
For API-related questions, contact [Support](https://support.speechmatics.com).
diff --git a/docs/speech-to-text/realtime/realtime_diarization.mdx b/docs/speech-to-text/realtime/realtime_diarization.mdx
index 0b9e0d9..7218e83 100644
--- a/docs/speech-to-text/realtime/realtime_diarization.mdx
+++ b/docs/speech-to-text/realtime/realtime_diarization.mdx
@@ -28,18 +28,18 @@ To learn more about diarization as a feature, check out the [diarization](../fea
Real-time diarization offers the following ways to separate speakers in audio:
-- [**Speaker diarization**](#speaker-diarization) — Identifies each speaker by their voice.
- Useful when there are multiple speakers in the same audio stream.
+- [**Speaker diarization**](#speaker-diarization) — Identifies each speaker by their voice.
+ Useful when there are multiple speakers in the same audio stream.
-- [**Channel diarization**](#channel-diarization) — Transcribes each audio channel separately.
- Useful when each speaker is recorded on their own channel.
+- [**Channel diarization**](#channel-diarization) — Transcribes each audio channel separately.
+ Useful when each speaker is recorded on their own channel.
-- [**Channel & speaker diarization**](#channel-and-speaker-diarization) — Combines both methods.
- Each channel is transcribed separately, with unique speakers identified within each channel.
- Useful when multiple speakers are present across multiple channels.
+- [**Channel & speaker diarization**](#channel-and-speaker-diarization) — Combines both methods.
+ Each channel is transcribed separately, with unique speakers identified within each channel.
+ Useful when multiple speakers are present across multiple channels.
## Speaker diarization
-
+
Speaker diarization picks out different speakers from the audio stream based on acoustic matching.
@@ -169,7 +169,7 @@ Transcripts are returned independently for each channel, with the `channel` prop
```
:::warning
-The `channel` property will be returned for `AddTranscript` and `AddPartialTranscript` messages only.
+The `channel` property will be returned for `AddTranscript` and `AddPartialTranscript` messages only.
Features such as [audio events](/speech-to-text/features/audio-events), [translation](/speech-to-text/features/translation) and [end of turn detection](/speech-to-text/realtime/end-of-turn) do not currently include this property. To request this feature, please contact [support](https://support.speechmatics.com).
:::
@@ -179,7 +179,7 @@ Channel and speaker diarization combines speaker diarization and channel diariza
To enable this mode, follow the steps in [speaker diarization](#speaker-diarization) and set the `diarization` mode to `channel_and_speaker`.
-To send audio to a channel, follow the instructions in [send audio to a channel](#send-audio-to-a-channel).
+To send audio to a channel, follow the instructions in [send audio to a channel](#send-audio-to-a-channel).
Transcripts are returned in the same way as channel diarization, but with individual speakers identified:
@@ -221,7 +221,7 @@ For SaaS customers, the maximum number of channels is 2.
For On-prem Container customers, the maximum number of channels depends on your [Multi-session container's](../../deployments/container/cpu-speech-to-text.mdx#multi-session-containers) maximum number of connections.
-The Speechmatics Python client CLI is currently limited to transcribing multi-channel audio in via files and not streaming/raw audio.
+The Speechmatics Python client CLI is currently limited to transcribing multi-channel audio from files and does not support streaming/raw audio.
## Configuration
@@ -229,7 +229,6 @@ You can customize diarization to match your use case by adjusting settings for s
### Speaker sensitivity
-
You can configure the sensitivity of speaker detection by using the `speaker_sensitivity` setting in the `speaker_diarization_config` section of the job config object as shown below:
```json
@@ -250,7 +249,7 @@ You can configure the sensitivity of speaker detection by using the `speaker_sen
This takes a value between 0 and 1 (the default is 0.5). A higher sensitivity will
increase the likelihood of more unique speakers returning.
-### Prefer Current Speaker
+### Prefer current speaker
You can reduce the likelihood of incorrectly switching between similar sounding speakers by setting the `prefer_current_speaker` flag in the `speaker_diarization_config`:
@@ -270,9 +269,11 @@ You can reduce the likelihood of incorrectly switching between similar sounding
```
By default this is `false`. When this is set to `true`, the system will stay with the speaker of the previous word, if they closely match the speaker of the new word.
-This may result in some shorter speaker turn changes between similar speakers being missed.
+This can reduce instances where the system inadvertently alternates between different speaker labels within a single-speaker audio segment.
+
+However, it may also result in some shorter speaker turn changes between similar speakers being missed.
-### Max. Speakers
+### Max. speakers
You can prevent too many speakers from being detected by using the `max_speakers` setting in the `StartRecognition` message as shown below:
@@ -299,19 +300,31 @@ You can prevent too many speakers from being detected by using the `max_speakers
The default value is 50, but it can take any integer value between 2 and 100 inclusive.
-### Punctuation
+This restricts the number of unique speaker labels that may be output by the system.
+
+Note that accuracy may decline once this limit is reached. It is advisable to set the value to at least the expected number of speakers, and preferably slightly higher.
+
+### Speaker diarization and punctuation
+
+Speaker diarization uses punctuation to improve the accuracy of speaker change points. Small adjustments to speaker labels may be applied based on sentence boundaries.
+
+For example, consider a case where the diarization marks a speaker change one word after a full stop:
+
+> **S1:** Hello my name is John. And **S2:** my name is Alice.
+
+In this case, the speaker change point would be moved to match the end of the sentence:
-Speaker diarization uses punctuation to improve accuracy. Small corrections are applied to speaker labels based on sentence boundaries.
+> **S1:** Hello my name is John. **S2:** And my name is Alice.
-For example, if the system initially assigns 9 words in a sentence to S1 and 1 word to S2, the lone S2 word may be corrected to S1.
+Speaker diarization may also insert punctuation when a speaker change occurs without a corresponding sentence-ending punctuation mark in the transcription result.
-This adjustment only works when punctuation is enabled. Disabling punctuation via the `permitted_marks` setting in `punctuation_overrides` can reduce diarization accuracy.
+These adjustments are only applied when punctuation is enabled. Disabling punctuation via the `permitted_marks` setting in `punctuation_overrides` can reduce diarization accuracy.
Adjusting punctuation sensitivity can also affect how accurately speakers are identified.
### Speaker change (legacy)
-The Speaker Change Detection feature was removed in July 2024. The `speaker_change` and `channel_and_speaker_change` parameters are no longer supported. Use the [Speaker diarization](#speaker-diarization) feature for speaker labeling.
+The speaker change detection feature was removed in July 2024. The `speaker_change` and `channel_and_speaker_change` parameters are no longer supported. Use the [speaker diarization](#speaker-diarization) feature for speaker labeling.
For API-related questions, contact [support](https://support.speechmatics.com).
@@ -319,7 +332,7 @@ For API-related questions, contact [support](https://support.speechmatics.com).
To run `channel` or `channel_and_speaker` diarization with an on-prem deployment, configure your environment as follows:
-- Use a [GPU Speech-to-Text container](../../deployments/container/gpu-speech-to-text.mdx). Handling multiple audio streams is computationally intensive and benefits from GPU acceleration.
-- Set the `SM_MAX_CONCURRENT_CONNECTIONS` environment variable to match the number of channels you want to process.
+- Use a [GPU Speech-to-Text container](../../deployments/container/gpu-speech-to-text.mdx). Handling multiple audio streams is computationally intensive and benefits from GPU acceleration.
+- Set the `SM_MAX_CONCURRENT_CONNECTIONS` environment variable to match the number of channels you want to process.
For more details on container setup, see the [on-prem deployment docs](../../deployments/index.md).