Updated Nvidia models (#204)

nv-kkudrynski · web-flow · commit 552c77959ae3 · 2021-06-12T10:29:28.000-04:00
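
Note for reviewers: the NGC links touched in this change appear to follow a single pattern, where the `catalog/model-scripts/` path segment becomes `catalog/resources/`. Below is a minimal sketch of that rewrite, assuming the pattern holds for every NGC link in these files. The helper name `migrate_ngc_link` is hypothetical and is not part of the repository; the diff itself makes these edits by hand.

```python
# Hypothetical helper (not part of this repository): rewrites the old NGC
# catalog path segment to the new one, mirroring the manual edits in this diff.
OLD_SEGMENT = "/catalog/model-scripts/"
NEW_SEGMENT = "/catalog/resources/"


def migrate_ngc_link(url: str) -> str:
    """Return the URL with the old NGC path segment replaced.

    URLs that are not NGC catalog links are returned unchanged.
    """
    if url.startswith("https://ngc.nvidia.com") and OLD_SEGMENT in url:
        return url.replace(OLD_SEGMENT, NEW_SEGMENT, 1)
    return url


old_link = "https://ngc.nvidia.com/catalog/model-scripts/nvidia:ssd_for_pytorch"
print(migrate_ngc_link(old_link))
```

A helper like this could be used to check that no `model-scripts` links remain in the affected markdown files; GitHub links pass through untouched.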
diff --git a/nvidia_deeplearningexamples_ssd.md b/nvidia_deeplearningexamples_ssd.md
@@ -17,19 +17,6 @@ order: 10
 ---
 
 
-```python
-import torch
-precision = 'fp32'
-ssd_model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd', model_math=precision)
-```
-
-will load an SSD model pretrained on COCO dataset from Torch Hub.
-
-Setting precision='fp16' will load a checkpoint trained with [mixed precision](https://arxiv.org/abs/1710.03740) into architecture enabling execution on [Tensor Cores](https://developer.nvidia.com/tensor-cores).
-Handling mixed precision data requires [Apex](https://github.com/NVIDIA/apex) library.
-
-
-
 ### Model Description
 
 This SSD300 model is based on the
@@ -56,17 +43,17 @@ they are enhanced by additional BatchNorm layers after each convolution.
 
 ### Example
 
-In the example below we will use the pretrained SSD model loaded from Torch Hub to detect objects in sample images and visualize the result.
-
-To run the example you need some extra python packages installed.
-These are needed for preprocessing images and visualization.
+In the example below we will use the pretrained SSD model to detect objects in sample images and visualize the result.
 
+To run the example you need some extra Python packages installed. These are needed for preprocessing images and visualization.
 ```bash
 pip install numpy scipy scikit-image matplotlib
 ```
 
-For convenient and comprehensive formatting of input and output of the model, load a set of utility methods.
+Load an SSD model pretrained on the COCO dataset, along with a set of utility methods for convenient formatting of the model's input and output.
 ```python
+import torch
+ssd_model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd')
 utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd_processing_utils')
 ```
 
@@ -76,7 +63,6 @@ ssd_model.to('cuda')
 ssd_model.eval()
 ```
 
-
 Prepare input images for object detection.
 (Example links below correspond to first few test images from the COCO dataset, but you can also specify paths to your local images here)
 ```python
@@ -86,30 +72,33 @@ uris = [
     'http://images.cocodataset.org/val2017/000000252219.jpg'
 ]
 ```
+
 Format the images to comply with the network input and convert them to tensor.
 ```python
 inputs = [utils.prepare_input(uri) for uri in uris]
-tensor = utils.prepare_tensor(inputs, precision == 'fp16')
+tensor = utils.prepare_tensor(inputs)
 ```
 
-
 Run the SSD network to perform object detection.
 ```python
 with torch.no_grad():
     detections_batch = ssd_model(tensor)
 ```
+
 By default, raw output from SSD network per input image contains
 8732 boxes with localization and class probability distribution.
 Let's filter this output to only get reasonable detections (confidence>40%) in a more comprehensive format.
 ```python
 results_per_input = utils.decode_results(detections_batch)
 best_results_per_input = [utils.pick_best(results, 0.40) for results in results_per_input]
 ```
+
 The model was trained on COCO dataset, which we need to access in order to translate class IDs into object names.
 For the first time, downloading annotations may take a while.
 ```python
 classes_to_labels = utils.get_coco_object_dictionary()
 ```
+
 Finally, let's visualize our detections
 ```python
 from matplotlib import pyplot as plt
@@ -131,16 +120,15 @@ for image_idx in range(len(best_results_per_input)):
 plt.show()
 ```
 
-
 ### Details
 For detailed information on model input and output,
 training recipies, inference and performance visit:
 [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Detection/SSD)
-and/or [NGC](https://ngc.nvidia.com/catalog/model-scripts/nvidia:ssd_for_pytorch)
+and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:ssd_for_pytorch)
 
 ### References
 
  - [SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325) paper
  - [Speed/accuracy trade-offs for modern convolutional object detectors](https://arxiv.org/abs/1611.10012) paper
- - [SSD on NGC](https://ngc.nvidia.com/catalog/model-scripts/nvidia:ssd_for_pytorch)
- - [SSD on github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Detection/SSD)
+ - [SSD on NGC](https://ngc.nvidia.com/catalog/resources/nvidia:ssd_for_pytorch)
+ - [SSD on github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Detection/SSD)
diff --git a/nvidia_deeplearningexamples_tacotron2.md b/nvidia_deeplearningexamples_tacotron2.md
@@ -16,19 +16,6 @@ accelerator: cuda
 order: 10
 ---
 
-To run the example you need some extra python packages installed.
-These are needed for preprocessing the text and audio, as well as for display and input / output.
-
-```bash
-pip install numpy scipy librosa unidecode inflect librosa
-```
-
-```python
-import torch
-tacotron2 = torch.hub.load('nvidia/DeepLearningExamples:torchhub', 'nvidia_tacotron2')
-```
-
-will load the Tacotron2 model pre-trained on [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/)
 
 ### Model Description
 
@@ -40,72 +27,72 @@ This implementation of Tacotron 2 model differs from the model described in the
 
 In the example below:
 - pretrained Tacotron2 and Waveglow models are loaded from torch.hub
-- Tacotron2 generates mel spectrogram given tensor represantation of an input text ("Hello world, I missed you")
+- Tacotron2 generates mel spectrogram given tensor representation of an input text ("Hello world, I missed you so much")
 - Waveglow generates sound given the mel spectrogram
 - the output sound is saved in an 'audio.wav' file
 
-```python
-import numpy as np
-from scipy.io.wavfile import write
+To run the example you need some extra Python packages installed.
+These are needed for preprocessing the text and audio, as well as for display and input / output.
+```bash
+pip install numpy scipy librosa unidecode inflect
+apt-get update
+apt-get install -y libsndfile1
 ```
 
-Prepare tacotron2 for inference
-
+Load the Tacotron2 model pre-trained on [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/) and prepare it for inference:
 ```python
+import torch
+tacotron2 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tacotron2', model_math='fp16')
 tacotron2 = tacotron2.to('cuda')
 tacotron2.eval()
 ```
 
-Load waveglow from PyTorch Hub
-
+Load the pretrained WaveGlow model:
 ```python
-waveglow = torch.hub.load('nvidia/DeepLearningExamples:torchhub', 'nvidia_waveglow')
+waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_waveglow', model_math='fp16')
 waveglow = waveglow.remove_weightnorm(waveglow)
 waveglow = waveglow.to('cuda')
 waveglow.eval()
 ```
 
-Now, let's make the model say *"hello world, I missed you"*
-
+Now, let's make the model say:
 ```python
-text = "hello world, I missed you"
+text = "Hello world, I missed you so much."
 ```
 
-Now chain pre-processing -> tacotron2 -> waveglow
-
+Format the input using utility methods
 ```python
-# preprocessing
-sequence = np.array(tacotron2.text_to_sequence(text, ['english_cleaners']))[None, :]
-sequence = torch.from_numpy(sequence).to(device='cuda', dtype=torch.int64)
+utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tts_utils')
+sequences, lengths = utils.prepare_input_sequence([text])
+```
 
-# run the models
+Run the chained models:
+```python
 with torch.no_grad():
-    _, mel, _, _ = tacotron2.infer(sequence)
+    mel, _, _ = tacotron2.infer(sequences, lengths)
     audio = waveglow.infer(mel)
 audio_numpy = audio[0].data.cpu().numpy()
 rate = 22050
 ```
 
 You can write it to a file and listen to it
-
 ```python
+from scipy.io.wavfile import write
 write("audio.wav", rate, audio_numpy)
 ```
 
-
 Alternatively, play it right away in a notebook with IPython widgets
-
 ```python
 from IPython.display import Audio
 Audio(audio_numpy, rate=rate)
 ```
 
 ### Details
-For detailed information on model input and output, training recipies, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2) and/or [NGC](https://ngc.nvidia.com/catalog/model-scripts/nvidia:tacotron_2_and_waveglow_for_pytorch)
+For detailed information on model input and output, training recipes, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2) and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:tacotron_2_and_waveglow_for_pytorch)
 
 ### References
 
  - [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884)
  - [WaveGlow: A Flow-based Generative Network for Speech Synthesis](https://arxiv.org/abs/1811.00002)
- - [Tacotron2 and WaveGlow on NGC](https://ngc.nvidia.com/catalog/model-scripts/nvidia:tacotron_2_and_waveglow_for_pytorch)
- - [Tacotron2 and Waveglow on github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2)
+ - [Tacotron2 and WaveGlow on NGC](https://ngc.nvidia.com/catalog/resources/nvidia:tacotron_2_and_waveglow_for_pytorch)
+ - [Tacotron2 and Waveglow on github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2)
diff --git a/nvidia_deeplearningexamples_waveglow.md b/nvidia_deeplearningexamples_waveglow.md
@@ -13,15 +13,9 @@ github-id: NVIDIA/DeepLearningExamples
 featured_image_1: waveglow_diagram.png
 featured_image_2: no-image
 accelerator: cuda
-order: 3
-demo-model-link: https://colab.research.google.com/drive/1omouh8c4XIoZR1vw91X5AQUY2_nIKeNz?usp=sharing
+order: 10
 ---
 
-```python
-import torch
-waveglow = torch.hub.load('nvidia/DeepLearningExamples:torchhub', 'nvidia_waveglow')
-```
-will load the WaveGlow model pre-trained on [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/)
 
 ### Model Description
 
@@ -31,79 +25,76 @@ The Tacotron 2 and WaveGlow model form a text-to-speech system that enables user
 
 In the example below:
 - pretrained Tacotron2 and Waveglow models are loaded from torch.hub
-- Tacotron2 generates mel spectrogram given tensor represantation of an input text ("Hello world, I missed you")
+- Tacotron2 generates mel spectrogram given tensor representation of an input text ("Hello world, I missed you so much")
 - Waveglow generates sound given the mel spectrogram
 - the output sound is saved in an 'audio.wav' file
 
 To run the example you need some extra python packages installed.
 These are needed for preprocessing the text and audio, as well as for display and input / output.
-
 ```bash
 pip install numpy scipy librosa unidecode inflect librosa
+apt-get update
+apt-get install -y libsndfile1
 ```
 
+Load the WaveGlow model pre-trained on [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/)
 ```python
-import numpy as np
-from scipy.io.wavfile import write
+import torch
+waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_waveglow', model_math='fp32')
 ```
 
-Prepare the waveglow model for inference
-
+Prepare the WaveGlow model for inference
 ```python
 waveglow = waveglow.remove_weightnorm(waveglow)
 waveglow = waveglow.to('cuda')
 waveglow.eval()
 ```
 
-Load tacotron2 from PyTorch Hub
-
+Load a pretrained Tacotron2 model
 ```python
-tacotron2 = torch.hub.load('nvidia/DeepLearningExamples:torchhub', 'nvidia_tacotron2')
+tacotron2 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tacotron2', model_math='fp32')
 tacotron2 = tacotron2.to('cuda')
 tacotron2.eval()
 ```
 
-Now, let's make the model say *"hello world, I missed you"*
-
+Now, let's make the model say:
 ```python
-text = "hello world, I missed you"
+text = "hello world, I missed you so much"
 ```
 
-Now chain pre-processing -> tacotron2 -> waveglow
-
+Format the input using utility methods
 ```python
-# preprocessing
-sequence = np.array(tacotron2.text_to_sequence(text, ['english_cleaners']))[None, :]
-sequence = torch.from_numpy(sequence).to(device='cuda', dtype=torch.int64)
+utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tts_utils')
+sequences, lengths = utils.prepare_input_sequence([text])
+```
 
-# run the models
+Run the chained models
+```python
 with torch.no_grad():
-    _, mel, _, _ = tacotron2.infer(sequence)
+    mel, _, _ = tacotron2.infer(sequences, lengths)
     audio = waveglow.infer(mel)
 audio_numpy = audio[0].data.cpu().numpy()
 rate = 22050
 ```
 
 You can write it to a file and listen to it
-
 ```python
+from scipy.io.wavfile import write
 write("audio.wav", rate, audio_numpy)
 ```
 
-
 Alternatively, play it right away in a notebook with IPython widgets
-
 ```python
 from IPython.display import Audio
 Audio(audio_numpy, rate=rate)
 ```
 
 ### Details
-For detailed information on model input and output, training recipies, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2) and/or [NGC](https://ngc.nvidia.com/catalog/model-scripts/nvidia:tacotron_2_and_waveglow_for_pytorch)
+For detailed information on model input and output, training recipes, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2) and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:tacotron_2_and_waveglow_for_pytorch)
 
 ### References
 
  - [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884)
  - [WaveGlow: A Flow-based Generative Network for Speech Synthesis](https://arxiv.org/abs/1811.00002)
- - [Tacotron2 and WaveGlow on NGC](https://ngc.nvidia.com/catalog/model-scripts/nvidia:tacotron_2_and_waveglow_for_pytorch)
- - [Tacotron2 and Waveglow on github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2)
+ - [Tacotron2 and WaveGlow on NGC](https://ngc.nvidia.com/catalog/resources/nvidia:tacotron_2_and_waveglow_for_pytorch)
+ - [Tacotron2 and Waveglow on github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2)