Commit 552c779

Updated Nvidia models (#204)
1 parent 00ce4cc commit 552c779

3 files changed, +61 -95 lines changed

nvidia_deeplearningexamples_ssd.md

+13 -25
@@ -17,19 +17,6 @@ order: 10
 ---
 
 
-```python
-import torch
-precision = 'fp32'
-ssd_model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd', model_math=precision)
-```
-
-will load an SSD model pretrained on COCO dataset from Torch Hub.
-
-Setting precision='fp16' will load a checkpoint trained with [mixed precision](https://arxiv.org/abs/1710.03740) into architecture enabling execution on [Tensor Cores](https://developer.nvidia.com/tensor-cores).
-Handling mixed precision data requires [Apex](https://github.com/NVIDIA/apex) library.
-
-
-
 ### Model Description
 
 This SSD300 model is based on the
@@ -56,17 +43,17 @@ they are enhanced by additional BatchNorm layers after each convolution.
 
 ### Example
 
-In the example below we will use the pretrained SSD model loaded from Torch Hub to detect objects in sample images and visualize the result.
-
-To run the example you need some extra python packages installed.
-These are needed for preprocessing images and visualization.
+In the example below we will use the pretrained SSD model to detect objects in sample images and visualize the result.
 
+To run the example you need some extra python packages installed. These are needed for preprocessing images and visualization.
 ```bash
 pip install numpy scipy scikit-image matplotlib
 ```
 
-For convenient and comprehensive formatting of input and output of the model, load a set of utility methods.
+Load an SSD model pretrained on COCO dataset, as well as a set of utility methods for convenient and comprehensive formatting of input and output of the model.
 ```python
+import torch
+ssd_model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd')
 utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd_processing_utils')
 ```
 
@@ -76,7 +63,6 @@ ssd_model.to('cuda')
 ssd_model.eval()
 ```
 
-
 Prepare input images for object detection.
 (Example links below correspond to first few test images from the COCO dataset, but you can also specify paths to your local images here)
 ```python
@@ -86,30 +72,33 @@ uris = [
     'http://images.cocodataset.org/val2017/000000252219.jpg'
 ]
 ```
+
 Format the images to comply with the network input and convert them to tensor.
 ```python
 inputs = [utils.prepare_input(uri) for uri in uris]
-tensor = utils.prepare_tensor(inputs, precision == 'fp16')
+tensor = utils.prepare_tensor(inputs)
 ```
 
-
 Run the SSD network to perform object detection.
 ```python
 with torch.no_grad():
     detections_batch = ssd_model(tensor)
 ```
+
 By default, raw output from SSD network per input image contains
 8732 boxes with localization and class probability distribution.
 Let's filter this output to only get reasonable detections (confidence>40%) in a more comprehensive format.
 ```python
 results_per_input = utils.decode_results(detections_batch)
 best_results_per_input = [utils.pick_best(results, 0.40) for results in results_per_input]
 ```
+
 The model was trained on COCO dataset, which we need to access in order to translate class IDs into object names.
 For the first time, downloading annotations may take a while.
 ```python
 classes_to_labels = utils.get_coco_object_dictionary()
 ```
+
 Finally, let's visualize our detections
 ```python
 from matplotlib import pyplot as plt
@@ -131,16 +120,15 @@ for image_idx in range(len(best_results_per_input)):
 plt.show()
 ```
 
-
 ### Details
 For detailed information on model input and output,
 training recipies, inference and performance visit:
 [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Detection/SSD)
-and/or [NGC](https://ngc.nvidia.com/catalog/model-scripts/nvidia:ssd_for_pytorch)
+and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:ssd_for_pytorch)
 
 ### References
 
 - [SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325) paper
 - [Speed/accuracy trade-offs for modern convolutional object detectors](https://arxiv.org/abs/1611.10012) paper
-- [SSD on NGC](https://ngc.nvidia.com/catalog/model-scripts/nvidia:ssd_for_pytorch)
-- [SSD on github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Detection/SSD)
+- [SSD on NGC](https://ngc.nvidia.com/catalog/resources/nvidia:ssd_for_pytorch)
+- [SSD on github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Detection/SSD)
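
Note on the SSD example in this file: the text mentions 8732 boxes per image and a 40% confidence cut-off. The sketch below is only an illustration of where those numbers could come from and what the cut-off does. The feature-map layout is an assumption (the usual SSD300 configuration), `count_default_boxes` and `keep_confident` are hypothetical helpers, and none of this is the actual `utils.decode_results` or `utils.pick_best` code.

```python
# Assumed SSD300 default-box layout: (feature-map side, boxes per cell).
# These values are an assumption for illustration, not read from the model.
FEATURE_MAPS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def count_default_boxes(feature_maps):
    """Total number of default boxes summed over all feature maps."""
    return sum(side * side * per_cell for side, per_cell in feature_maps)


def keep_confident(detections, threshold=0.40):
    """Keep (box, class_id, confidence) triples whose confidence exceeds threshold."""
    return [d for d in detections if d[2] > threshold]


# Hypothetical decoded detections: (left, bottom, right, top), class id, confidence.
sample = [
    ((0.10, 0.20, 0.50, 0.80), 1, 0.92),
    ((0.30, 0.30, 0.40, 0.45), 18, 0.41),
    ((0.05, 0.05, 0.15, 0.10), 3, 0.12),
]

print(count_default_boxes(FEATURE_MAPS))
print(len(keep_confident(sample)))
```

Under these assumptions the per-map counts sum to the 8732 boxes quoted in the text, and only the detections above the threshold survive.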

nvidia_deeplearningexamples_tacotron2.md

+25 -38
@@ -16,19 +16,6 @@ accelerator: cuda
 order: 10
 ---
 
-To run the example you need some extra python packages installed.
-These are needed for preprocessing the text and audio, as well as for display and input / output.
-
-```bash
-pip install numpy scipy librosa unidecode inflect librosa
-```
-
-```python
-import torch
-tacotron2 = torch.hub.load('nvidia/DeepLearningExamples:torchhub', 'nvidia_tacotron2')
-```
-
-will load the Tacotron2 model pre-trained on [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/)
 
 ### Model Description
 
@@ -40,72 +27,72 @@ This implementation of Tacotron 2 model differs from the model described in the
 
 In the example below:
 - pretrained Tacotron2 and Waveglow models are loaded from torch.hub
-- Tacotron2 generates mel spectrogram given tensor represantation of an input text ("Hello world, I missed you")
+- Tacotron2 generates mel spectrogram given tensor represantation of an input text ("Hello world, I missed you so much")
 - Waveglow generates sound given the mel spectrogram
 - the output sound is saved in an 'audio.wav' file
 
-```python
-import numpy as np
-from scipy.io.wavfile import write
+To run the example you need some extra python packages installed.
+These are needed for preprocessing the text and audio, as well as for display and input / output.
+```bash
+pip install numpy scipy librosa unidecode inflect librosa
+apt-get update
+apt-get install -y libsndfile1
 ```
 
-Prepare tacotron2 for inference
-
+Load the Tacotron2 model pre-trained on [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/) and prepare it for inference:
 ```python
+import torch
+tacotron2 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tacotron2', model_math='fp16')
 tacotron2 = tacotron2.to('cuda')
 tacotron2.eval()
 ```
 
-Load waveglow from PyTorch Hub
-
+Load pretrained WaveGlow model
 ```python
-waveglow = torch.hub.load('nvidia/DeepLearningExamples:torchhub', 'nvidia_waveglow')
+waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_waveglow', model_math='fp16')
 waveglow = waveglow.remove_weightnorm(waveglow)
 waveglow = waveglow.to('cuda')
 waveglow.eval()
 ```
 
-Now, let's make the model say *"hello world, I missed you"*
-
+Now, let's make the model say:
 ```python
-text = "hello world, I missed you"
+text = "Hello world, I missed you so much."
 ```
 
-Now chain pre-processing -> tacotron2 -> waveglow
-
+Format the input using utility methods
 ```python
-# preprocessing
-sequence = np.array(tacotron2.text_to_sequence(text, ['english_cleaners']))[None, :]
-sequence = torch.from_numpy(sequence).to(device='cuda', dtype=torch.int64)
+utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tts_utils')
+sequences, lengths = utils.prepare_input_sequence([text])
+```
 
-# run the models
+Run the chained models:
+```python
 with torch.no_grad():
-    _, mel, _, _ = tacotron2.infer(sequence)
+    mel, _, _ = tacotron2.infer(sequences, lengths)
     audio = waveglow.infer(mel)
 audio_numpy = audio[0].data.cpu().numpy()
 rate = 22050
 ```
 
 You can write it to a file and listen to it
-
 ```python
+from scipy.io.wavfile import write
 write("audio.wav", rate, audio_numpy)
 ```
 
-
 Alternatively, play it right away in a notebook with IPython widgets
-
 ```python
 from IPython.display import Audio
 Audio(audio_numpy, rate=rate)
 ```
 
 ### Details
-For detailed information on model input and output, training recipies, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2) and/or [NGC](https://ngc.nvidia.com/catalog/model-scripts/nvidia:tacotron_2_and_waveglow_for_pytorch)
+For detailed information on model input and output, training recipies, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2) and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:tacotron_2_and_waveglow_for_pytorch)
 
 ### References
 
 - [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884)
 - [WaveGlow: A Flow-based Generative Network for Speech Synthesis](https://arxiv.org/abs/1811.00002)
-- [Tacotron2 and WaveGlow on NGC](https://ngc.nvidia.com/catalog/model-scripts/nvidia:tacotron_2_and_waveglow_for_pytorch)
-- [Tacotron2 and Waveglow on github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2)
+- [Tacotron2 and WaveGlow on NGC](https://ngc.nvidia.com/catalog/resources/nvidia:tacotron_2_and_waveglow_for_pytorch)
+- [Tacotron2 and Waveglow on github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2)
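
Note on the Tacotron2 example in this file: the new code calls `utils.prepare_input_sequence([text])` and receives `sequences` and `lengths`, but the diff does not show what that helper does. The sketch below is a guess at the general shape of such a step, assuming it maps characters to integer IDs and pads a batch to a common length. The symbol table, `encode`, and `prepare_batch` are all hypothetical and are not the hub utility's real cleaners, symbol set, or return types.

```python
# Hypothetical symbol table for illustration only; the real utility uses its
# own text cleaners and symbol set.
SYMBOLS = "_ abcdefghijklmnopqrstuvwxyz,.!?'"
SYMBOL_TO_ID = {ch: i for i, ch in enumerate(SYMBOLS)}
PAD_ID = SYMBOL_TO_ID["_"]


def encode(text):
    """Map each known character of the lowercased text to its integer ID."""
    return [SYMBOL_TO_ID[ch] for ch in text.lower() if ch in SYMBOL_TO_ID]


def prepare_batch(texts):
    """Encode texts, sort longest first, and right-pad to a common length."""
    encoded = sorted((encode(t) for t in texts), key=len, reverse=True)
    lengths = [len(seq) for seq in encoded]
    max_len = lengths[0] if lengths else 0
    padded = [seq + [PAD_ID] * (max_len - len(seq)) for seq in encoded]
    return padded, lengths


sequences, lengths = prepare_batch(["Hi", "Hello world"])
print(lengths)
```

Returning the lengths alongside the padded sequences lets a downstream model ignore the padding, which is one plausible reason `tacotron2.infer` now takes both arguments.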

nvidia_deeplearningexamples_waveglow.md

+23 -32
@@ -13,15 +13,9 @@ github-id: NVIDIA/DeepLearningExamples
 featured_image_1: waveglow_diagram.png
 featured_image_2: no-image
 accelerator: cuda
-order: 3
-demo-model-link: https://colab.research.google.com/drive/1omouh8c4XIoZR1vw91X5AQUY2_nIKeNz?usp=sharing
+order: 10
 ---
 
-```python
-import torch
-waveglow = torch.hub.load('nvidia/DeepLearningExamples:torchhub', 'nvidia_waveglow')
-```
-will load the WaveGlow model pre-trained on [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/)
 
 ### Model Description
 
@@ -31,79 +25,76 @@ The Tacotron 2 and WaveGlow model form a text-to-speech system that enables user
 
 In the example below:
 - pretrained Tacotron2 and Waveglow models are loaded from torch.hub
-- Tacotron2 generates mel spectrogram given tensor represantation of an input text ("Hello world, I missed you")
+- Tacotron2 generates mel spectrogram given tensor represantation of an input text ("Hello world, I missed you so much")
 - Waveglow generates sound given the mel spectrogram
 - the output sound is saved in an 'audio.wav' file
 
 To run the example you need some extra python packages installed.
 These are needed for preprocessing the text and audio, as well as for display and input / output.
-
 ```bash
 pip install numpy scipy librosa unidecode inflect librosa
+apt-get update
+apt-get install -y libsndfile1
 ```
 
+Load the WaveGlow model pre-trained on [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/)
 ```python
-import numpy as np
-from scipy.io.wavfile import write
+import torch
+waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_waveglow', model_math='fp32')
 ```
 
-Prepare the waveglow model for inference
-
+Prepare the WaveGlow model for inference
 ```python
 waveglow = waveglow.remove_weightnorm(waveglow)
 waveglow = waveglow.to('cuda')
 waveglow.eval()
 ```
 
-Load tacotron2 from PyTorch Hub
-
+Load a pretrained Tacotron2 model
 ```python
-tacotron2 = torch.hub.load('nvidia/DeepLearningExamples:torchhub', 'nvidia_tacotron2')
+tacotron2 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tacotron2', model_math='fp32')
 tacotron2 = tacotron2.to('cuda')
 tacotron2.eval()
 ```
 
-Now, let's make the model say *"hello world, I missed you"*
-
+Now, let's make the model say:
 ```python
-text = "hello world, I missed you"
+text = "hello world, I missed you so much"
 ```
 
-Now chain pre-processing -> tacotron2 -> waveglow
-
+Format the input using utility methods
 ```python
-# preprocessing
-sequence = np.array(tacotron2.text_to_sequence(text, ['english_cleaners']))[None, :]
-sequence = torch.from_numpy(sequence).to(device='cuda', dtype=torch.int64)
+utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tts_utils')
+sequences, lengths = utils.prepare_input_sequence([text])
+```
 
-# run the models
+Run the chained models
+```python
 with torch.no_grad():
-    _, mel, _, _ = tacotron2.infer(sequence)
+    mel, _, _ = tacotron2.infer(sequences, lengths)
     audio = waveglow.infer(mel)
 audio_numpy = audio[0].data.cpu().numpy()
 rate = 22050
 ```
 
 You can write it to a file and listen to it
-
 ```python
+from scipy.io.wavfile import write
 write("audio.wav", rate, audio_numpy)
 ```
 
-
 Alternatively, play it right away in a notebook with IPython widgets
-
 ```python
 from IPython.display import Audio
 Audio(audio_numpy, rate=rate)
 ```
 
 ### Details
-For detailed information on model input and output, training recipies, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2) and/or [NGC](https://ngc.nvidia.com/catalog/model-scripts/nvidia:tacotron_2_and_waveglow_for_pytorch)
+For detailed information on model input and output, training recipies, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2) and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:tacotron_2_and_waveglow_for_pytorch)
 
 ### References
 
 - [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884)
 - [WaveGlow: A Flow-based Generative Network for Speech Synthesis](https://arxiv.org/abs/1811.00002)
-- [Tacotron2 and WaveGlow on NGC](https://ngc.nvidia.com/catalog/model-scripts/nvidia:tacotron_2_and_waveglow_for_pytorch)
-- [Tacotron2 and Waveglow on github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2)
+- [Tacotron2 and WaveGlow on NGC](https://ngc.nvidia.com/catalog/resources/nvidia:tacotron_2_and_waveglow_for_pytorch)
+- [Tacotron2 and Waveglow on github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2)
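
Note on the WaveGlow example in this file: the example saves the generated audio at a rate of 22050 with `scipy.io.wavfile.write`. For readers who want to try the file-writing step without a GPU or the models, the sketch below writes a short synthetic tone as 16-bit mono PCM using only the standard library. `write_pcm16` is a hypothetical helper, the sine tone stands in for `audio_numpy`, and the assumption that samples lie in [-1.0, 1.0] is mine, not something the diff states.

```python
import math
import os
import struct
import tempfile
import wave


def write_pcm16(path, rate, samples):
    """Write float samples (assumed to lie in [-1.0, 1.0]) as 16-bit mono PCM."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against overflow
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(rate)
        wav_file.writeframes(bytes(frames))


rate = 22050
# A 0.1 s, 440 Hz sine tone as a stand-in for model output.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 10)]
path = os.path.join(tempfile.gettempdir(), "tone_example.wav")
write_pcm16(path, rate, tone)
```

Clipping before scaling keeps out-of-range samples from wrapping around when converted to integers.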
