diff --git a/nvidia_deeplearningexamples_ssd.md b/nvidia_deeplearningexamples_ssd.md
index d6fa159d..8c66d771 100644
--- a/nvidia_deeplearningexamples_ssd.md
+++ b/nvidia_deeplearningexamples_ssd.md
@@ -17,19 +17,6 @@ order: 10
 ---
 
-```python
-import torch
-precision = 'fp32'
-ssd_model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd', model_math=precision)
-```
-
-will load an SSD model pretrained on COCO dataset from Torch Hub.
-
-Setting precision='fp16' will load a checkpoint trained with [mixed precision](https://arxiv.org/abs/1710.03740) into architecture enabling execution on [Tensor Cores](https://developer.nvidia.com/tensor-cores).
-Handling mixed precision data requires [Apex](https://github.com/NVIDIA/apex) library.
-
-
-
 ### Model Description
 
 This SSD300 model is based on the
@@ -56,17 +43,17 @@ they are enhanced by additional BatchNorm layers after each convolution.
 
 ### Example
 
-In the example below we will use the pretrained SSD model loaded from Torch Hub to detect objects in sample images and visualize the result.
-
-To run the example you need some extra python packages installed.
-These are needed for preprocessing images and visualization.
+In the example below, we will use the pretrained SSD model to detect objects in sample images and visualize the result.
+To run the example, you need some extra Python packages installed. These are needed for preprocessing images and visualization.
 
 ```bash
 pip install numpy scipy scikit-image matplotlib
 ```
 
-For convenient and comprehensive formatting of input and output of the model, load a set of utility methods.
+Load an SSD model pretrained on the COCO dataset, as well as a set of utility methods for convenient and comprehensive formatting of the model's input and output.
 
 ```python
+import torch
+ssd_model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd')
 utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd_processing_utils')
 ```
@@ -76,7 +63,6 @@ ssd_model.to('cuda')
 ssd_model.eval()
 ```
 
-
 Prepare input images for object detection. (Example links below correspond to first few test images from the COCO dataset, but you can also specify paths to your local images here)
 
 ```python
@@ -86,18 +72,19 @@ uris = [
     'http://images.cocodataset.org/val2017/000000397133.jpg',
     'http://images.cocodataset.org/val2017/000000037777.jpg',
     'http://images.cocodataset.org/val2017/000000252219.jpg'
 ]
 ```
+
 Format the images to comply with the network input and convert them to tensor.
 ```python
 inputs = [utils.prepare_input(uri) for uri in uris]
-tensor = utils.prepare_tensor(inputs, precision == 'fp16')
+tensor = utils.prepare_tensor(inputs)
 ```
 
-
 Run the SSD network to perform object detection.
 ```python
 with torch.no_grad():
     detections_batch = ssd_model(tensor)
 ```
+
 By default, raw output from SSD network per input image contains 8732 boxes with localization and class probability distribution.
 Let's filter this output to only get reasonable detections (confidence>40%) in a more comprehensive format.
@@ -105,11 +92,13 @@ results_per_input = utils.decode_results(detections_batch)
 best_results_per_input = [utils.pick_best(results, 0.40) for results in results_per_input]
 ```
+
 The model was trained on COCO dataset, which we need to access in order to translate class IDs into object names.
 For the first time, downloading annotations may take a while.
 ```python
 classes_to_labels = utils.get_coco_object_dictionary()
 ```
+
 Finally, let's visualize our detections
 ```python
 from matplotlib import pyplot as plt
 import matplotlib.patches as patches
@@ -131,16 +120,15 @@ for image_idx in range(len(best_results_per_input)):
     plt.show()
 ```
 
-
 ### Details
 
 For detailed information on model input and output, training recipes, inference and performance visit:
 [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Detection/SSD)
-and/or [NGC](https://ngc.nvidia.com/catalog/model-scripts/nvidia:ssd_for_pytorch)
+and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:ssd_for_pytorch)
 
 ### References
 
 - [SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325) paper
 - [Speed/accuracy trade-offs for modern convolutional object detectors](https://arxiv.org/abs/1611.10012) paper
- - [SSD on NGC](https://ngc.nvidia.com/catalog/model-scripts/nvidia:ssd_for_pytorch)
- - [SSD on github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Detection/SSD)
+ - [SSD on NGC](https://ngc.nvidia.com/catalog/resources/nvidia:ssd_for_pytorch)
+ - [SSD on github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Detection/SSD)
\ No newline at end of file
diff --git a/nvidia_deeplearningexamples_tacotron2.md b/nvidia_deeplearningexamples_tacotron2.md
index 9b1bfd6f..fffc3ead 100644
--- a/nvidia_deeplearningexamples_tacotron2.md
+++ b/nvidia_deeplearningexamples_tacotron2.md
@@ -16,19 +16,6 @@ accelerator: cuda
 order: 10
 ---
 
-To run the example you need some extra python packages installed.
-These are needed for preprocessing the text and audio, as well as for display and input / output.
-
-```bash
-pip install numpy scipy librosa unidecode inflect librosa
-```
-
-```python
-import torch
-tacotron2 = torch.hub.load('nvidia/DeepLearningExamples:torchhub', 'nvidia_tacotron2')
-```
-
-will load the Tacotron2 model pre-trained on [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/)
 
 ### Model Description
 
@@ -40,72 +27,72 @@ This implementation of Tacotron 2 model differs from the model described in the
 
 In the example below:
 - pretrained Tacotron2 and Waveglow models are loaded from torch.hub
-- Tacotron2 generates mel spectrogram given tensor represantation of an input text ("Hello world, I missed you")
+- Tacotron2 generates a mel spectrogram given a tensor representation of an input text ("Hello world, I missed you so much")
 - Waveglow generates sound given the mel spectrogram
 - the output sound is saved in an 'audio.wav' file
 
-```python
-import numpy as np
-from scipy.io.wavfile import write
+To run the example, you need some extra Python packages installed.
+These are needed for preprocessing the text and audio, as well as for display and input / output.
+```bash
+pip install numpy scipy librosa unidecode inflect
+apt-get update
+apt-get install -y libsndfile1
 ```
 
-Prepare tacotron2 for inference
-
+Load the Tacotron2 model pre-trained on [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/) and prepare it for inference:
 ```python
+import torch
+tacotron2 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tacotron2', model_math='fp16')
 tacotron2 = tacotron2.to('cuda')
 tacotron2.eval()
 ```
 
-Load waveglow from PyTorch Hub
-
+Load the pretrained WaveGlow model:
 ```python
-waveglow = torch.hub.load('nvidia/DeepLearningExamples:torchhub', 'nvidia_waveglow')
+waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_waveglow', model_math='fp16')
 waveglow = waveglow.remove_weightnorm(waveglow)
 waveglow = waveglow.to('cuda')
 waveglow.eval()
 ```
 
-Now, let's make the model say *"hello world, I missed you"*
-
+Now, let's make the model say:
 ```python
-text = "hello world, I missed you"
+text = "Hello world, I missed you so much."
 ```
 
-Now chain pre-processing -> tacotron2 -> waveglow
-
+Format the input using utility methods:
 ```python
-# preprocessing
-sequence = np.array(tacotron2.text_to_sequence(text, ['english_cleaners']))[None, :]
-sequence = torch.from_numpy(sequence).to(device='cuda', dtype=torch.int64)
+utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tts_utils')
+sequences, lengths = utils.prepare_input_sequence([text])
+```
 
-# run the models
+Run the chained models:
+```python
 with torch.no_grad():
-    _, mel, _, _ = tacotron2.infer(sequence)
+    mel, _, _ = tacotron2.infer(sequences, lengths)
     audio = waveglow.infer(mel)
 audio_numpy = audio[0].data.cpu().numpy()
 rate = 22050
 ```
 
 You can write it to a file and listen to it
-
 ```python
+from scipy.io.wavfile import write
 write("audio.wav", rate, audio_numpy)
 ```
 
-
 Alternatively, play it right away in a notebook with IPython widgets
-
 ```python
 from IPython.display import Audio
 Audio(audio_numpy, rate=rate)
 ```
 
 ### Details
 
-For detailed information on model input and output, training recipies, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2) and/or [NGC](https://ngc.nvidia.com/catalog/model-scripts/nvidia:tacotron_2_and_waveglow_for_pytorch)
+For detailed information on model input and output, training recipes, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2) and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:tacotron_2_and_waveglow_for_pytorch)
 
 ### References
 
 - [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884)
 - [WaveGlow: A Flow-based Generative Network for Speech Synthesis](https://arxiv.org/abs/1811.00002)
- - [Tacotron2 and WaveGlow on NGC](https://ngc.nvidia.com/catalog/model-scripts/nvidia:tacotron_2_and_waveglow_for_pytorch)
- - [Tacotron2 and Waveglow on github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2)
+ - [Tacotron2 and WaveGlow on NGC](https://ngc.nvidia.com/catalog/resources/nvidia:tacotron_2_and_waveglow_for_pytorch)
+ - [Tacotron2 and WaveGlow on github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2)
\ No newline at end of file
diff --git a/nvidia_deeplearningexamples_waveglow.md b/nvidia_deeplearningexamples_waveglow.md
index bb5a42a8..f71da055 100644
--- a/nvidia_deeplearningexamples_waveglow.md
+++ b/nvidia_deeplearningexamples_waveglow.md
@@ -13,15 +13,9 @@ github-id: NVIDIA/DeepLearningExamples
 featured_image_1: waveglow_diagram.png
 featured_image_2: no-image
 accelerator: cuda
-order: 3
-demo-model-link: https://colab.research.google.com/drive/1omouh8c4XIoZR1vw91X5AQUY2_nIKeNz?usp=sharing
+order: 10
 ---
 
-```python
-import torch
-waveglow = torch.hub.load('nvidia/DeepLearningExamples:torchhub', 'nvidia_waveglow')
-```
-will load the WaveGlow model pre-trained on [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/)
 
 ### Model Description
 
@@ -31,79 +25,76 @@ The Tacotron 2 and WaveGlow model form a text-to-speech system that enables user
 
 In the example below:
 - pretrained Tacotron2 and Waveglow models are loaded from torch.hub
-- Tacotron2 generates mel spectrogram given tensor represantation of an input text ("Hello world, I missed you")
+- Tacotron2 generates a mel spectrogram given a tensor representation of an input text ("Hello world, I missed you so much")
 - Waveglow generates sound given the mel spectrogram
 - the output sound is saved in an 'audio.wav' file
 
 To run the example you need some extra python packages installed.
 These are needed for preprocessing the text and audio, as well as for display and input / output.
-
 ```bash
 pip install numpy scipy librosa unidecode inflect
+apt-get update
+apt-get install -y libsndfile1
 ```
 
+Load the WaveGlow model pre-trained on [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/):
 ```python
-import numpy as np
-from scipy.io.wavfile import write
+import torch
+waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_waveglow', model_math='fp32')
 ```
 
-Prepare the waveglow model for inference
-
+Prepare the WaveGlow model for inference:
 ```python
 waveglow = waveglow.remove_weightnorm(waveglow)
 waveglow = waveglow.to('cuda')
 waveglow.eval()
 ```
 
-Load tacotron2 from PyTorch Hub
-
+Load a pretrained Tacotron2 model:
 ```python
-tacotron2 = torch.hub.load('nvidia/DeepLearningExamples:torchhub', 'nvidia_tacotron2')
+tacotron2 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tacotron2', model_math='fp32')
 tacotron2 = tacotron2.to('cuda')
 tacotron2.eval()
 ```
 
-Now, let's make the model say *"hello world, I missed you"*
-
+Now, let's make the model say:
 ```python
-text = "hello world, I missed you"
+text = "Hello world, I missed you so much."
 ```
 
-Now chain pre-processing -> tacotron2 -> waveglow
-
+Format the input using utility methods:
 ```python
-# preprocessing
-sequence = np.array(tacotron2.text_to_sequence(text, ['english_cleaners']))[None, :]
-sequence = torch.from_numpy(sequence).to(device='cuda', dtype=torch.int64)
+utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tts_utils')
+sequences, lengths = utils.prepare_input_sequence([text])
+```
 
-# run the models
+Run the chained models:
+```python
 with torch.no_grad():
-    _, mel, _, _ = tacotron2.infer(sequence)
+    mel, _, _ = tacotron2.infer(sequences, lengths)
     audio = waveglow.infer(mel)
 audio_numpy = audio[0].data.cpu().numpy()
 rate = 22050
 ```
 
 You can write it to a file and listen to it
-
 ```python
+from scipy.io.wavfile import write
 write("audio.wav", rate, audio_numpy)
 ```
 
-
 Alternatively, play it right away in a notebook with IPython widgets
-
 ```python
 from IPython.display import Audio
 Audio(audio_numpy, rate=rate)
 ```
 
 ### Details
 
-For detailed information on model input and output, training recipies, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2) and/or [NGC](https://ngc.nvidia.com/catalog/model-scripts/nvidia:tacotron_2_and_waveglow_for_pytorch)
+For detailed information on model input and output, training recipes, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2) and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:tacotron_2_and_waveglow_for_pytorch)
 
 ### References
 
 - [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884)
 - [WaveGlow: A Flow-based Generative Network for Speech Synthesis](https://arxiv.org/abs/1811.00002)
- - [Tacotron2 and WaveGlow on NGC](https://ngc.nvidia.com/catalog/model-scripts/nvidia:tacotron_2_and_waveglow_for_pytorch)
- - [Tacotron2 and Waveglow on github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2)
+ - [Tacotron2 and WaveGlow on NGC](https://ngc.nvidia.com/catalog/resources/nvidia:tacotron_2_and_waveglow_for_pytorch)
+ - [Tacotron2 and WaveGlow on github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2)
\ No newline at end of file
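Taken together, the Tacotron2 and WaveGlow changes above converge on a single updated flow: load both models and the `nvidia_tts_utils` helpers from Torch Hub, preprocess the text with `prepare_input_sequence`, and chain Tacotron2 into WaveGlow. Below is a minimal end-to-end sketch of that flow, assuming a CUDA device and the Torch Hub entrypoints shown in the diffs; the `synthesize` wrapper is a hypothetical helper, not part of the tutorials.

```python
import torch
from scipy.io.wavfile import write

def synthesize(text, precision='fp16', out_path='audio.wav'):
    # Hypothetical wrapper around the updated tutorial steps.
    # Load the pretrained models and text utilities from Torch Hub,
    # using the model_math argument introduced in the updated tutorials.
    hub = 'NVIDIA/DeepLearningExamples:torchhub'
    tacotron2 = torch.hub.load(hub, 'nvidia_tacotron2', model_math=precision)
    waveglow = torch.hub.load(hub, 'nvidia_waveglow', model_math=precision)
    utils = torch.hub.load(hub, 'nvidia_tts_utils')

    tacotron2 = tacotron2.to('cuda').eval()
    waveglow = waveglow.remove_weightnorm(waveglow).to('cuda').eval()

    # Preprocess the text, then chain Tacotron2 -> WaveGlow.
    sequences, lengths = utils.prepare_input_sequence([text])
    with torch.no_grad():
        mel, _, _ = tacotron2.infer(sequences, lengths)
        audio = waveglow.infer(mel)

    # Models trained on LJ Speech generate audio at a 22050 Hz sampling rate.
    write(out_path, 22050, audio[0].data.cpu().numpy())

synthesize("Hello world, I missed you so much.")
```

Passing `model_math='fp32'` instead, as the WaveGlow page does, would exercise the full-precision checkpoints with the same call chain.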