Commit 08ac8f5

Update deepsparse-readme.md (#848)
* Update deepsparse-readme.md
* Update deepsparse-readme.md
* Add files via upload
* Delete performance-chart.png
* Delete cluster-bar.png
* Delete sparse-network.svg
* Add files via upload
* Update deepsparse-readme.md
* Update deepsparse-readme.md
* Add files via upload
* Update deepsparse-readme.md
* Update deepsparse-readme.md
* Add files via upload
* Add files via upload
* Add files via upload
* Add files via upload
* Delete performance-chart.png
* Add files via upload
* Add files via upload
* Update deepsparse-readme.md
* Delete performance-chart.png

Co-authored-by: Mark Kurtz <[email protected]>
1 parent 38fb244 commit 08ac8f5

File tree

4 files changed: +108 −95 lines changed

Binary file not shown.

examples/ultralytics-yolo/ultralytics-readmes/deepsparse-readme.md

Lines changed: 108 additions & 94 deletions
@@ -24,10 +24,10 @@ Welcome to software-delivered AI.

This guide explains how to deploy YOLOv5 with Neural Magic's DeepSparse.

DeepSparse is an inference runtime with exceptional performance on CPUs. For instance, compared to the ONNX Runtime baseline, DeepSparse offers a 5.8x speed-up for YOLOv5s, running on the same machine!

<p align="center">
  <img width="60%" src="performance-chart-5.8x.png">
</p>

For the first time, your deep learning workloads can meet the performance demands of production without the complexity and costs of hardware accelerators.
@@ -77,111 +77,21 @@ DeepSparse accepts a model in the ONNX format, passed either as:
- A SparseZoo stub which identifies an ONNX file in the SparseZoo
- A local path to an ONNX model in a filesystem

The examples below use the standard dense and pruned-quantized YOLOv5s checkpoints, identified by the following SparseZoo stubs:
```bash
zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none
zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none
```

### Deploy a Model

DeepSparse offers convenient APIs for integrating your model into an application.

To try the deployment examples below, pull down a sample image and save it as `basilica.jpg` with the following:
```bash
wget -O basilica.jpg https://raw.githubusercontent.com/neuralmagic/deepsparse/main/src/deepsparse/yolo/sample_images/basilica.jpg
```

#### Python API

`Pipelines` wrap pre-processing and output post-processing around the runtime, providing a clean interface for adding DeepSparse to an application.
@@ -239,6 +149,110 @@ bounding_boxes = annotations["boxes"]
labels = annotations["labels"]
```
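
Conceptually, a `Pipeline` is just pre-processing and post-processing wrapped around a raw inference callable. A minimal plain-Python sketch of that pattern (illustrative names only, not the actual deepsparse API):

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Detection:
    box: Tuple[int, int, int, int]
    label: str


class ToyPipeline:
    """Sketch of the pipeline pattern: preprocess -> engine -> postprocess."""

    def __init__(self, engine: Callable):
        self.engine = engine  # stand-in for the compiled runtime

    def preprocess(self, image):
        # a real pipeline resizes/normalizes the image; pass through here
        return image

    def postprocess(self, raw) -> List[Detection]:
        # a real pipeline decodes boxes and runs NMS; map raw tuples here
        return [Detection(box=b, label=lbl) for b, lbl in raw]

    def __call__(self, image) -> List[Detection]:
        return self.postprocess(self.engine(self.preprocess(image)))


# dummy engine that "detects" one object
pipeline = ToyPipeline(engine=lambda img: [((0, 0, 10, 10), "person")])
print(pipeline("basilica.jpg"))  # [Detection(box=(0, 0, 10, 10), label='person')]
```

The real API hides the model-specific pre/post-processing behind the same call-the-pipeline shape.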

#### Annotate CLI
You can also use the `annotate` command to have the engine save an annotated photo on disk. Try `--source 0` to annotate your live webcam feed!
```bash
deepsparse.object_detection.annotate --model_filepath zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none --source basilica.jpg
```

Running the above command will create an `annotation-results` folder and save the annotated image inside.

<p align="center">
  <img src="https://github.com/neuralmagic/deepsparse/blob/d31f02596ebff2ec62761d0bc9ca14c4663e8858/src/deepsparse/yolo/sample_images/basilica-annotated.jpg" alt="annotated" width="60%"/>
</p>

## Benchmarking Performance

We will compare DeepSparse's throughput to ONNX Runtime's throughput on YOLOv5s, using DeepSparse's benchmarking script.

The benchmarks were run on an AWS `c6i.8xlarge` instance (16 cores).

### Batch 32 Performance Comparison

#### ONNX Runtime Baseline

At batch 32, ONNX Runtime achieves 42 images/sec with the standard dense YOLOv5s:

```bash
deepsparse.benchmark zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none -s sync -b 32 -nstreams 1 -e onnxruntime

> Original Model Path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none
> Batch Size: 32
> Scenario: sync
> Throughput (items/sec): 41.9025
```

#### DeepSparse Dense Performance

While DeepSparse offers its best performance with optimized sparse models, it also performs well with the standard dense YOLOv5s.

At batch 32, DeepSparse achieves 70 images/sec with the standard dense YOLOv5s, a **1.7x performance improvement over ORT**!

```bash
deepsparse.benchmark zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none -s sync -b 32 -nstreams 1

> Original Model Path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none
> Batch Size: 32
> Scenario: sync
> Throughput (items/sec): 69.5546
```

#### DeepSparse Sparse Performance

When sparsity is applied to the model, DeepSparse's performance gain over ONNX Runtime is even stronger.

At batch 32, DeepSparse achieves 241 images/sec with the pruned-quantized YOLOv5s, a **5.8x performance improvement over ORT**!

```bash
deepsparse.benchmark zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none -s sync -b 32 -nstreams 1

> Original Model Path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none
> Batch Size: 32
> Scenario: sync
> Throughput (items/sec): 241.2452
```

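The speed-up figures above are just the ratio of the reported throughputs; a quick sanity check (numbers copied from the benchmark output above):

```python
# throughputs (items/sec) reported by deepsparse.benchmark at batch 32
ort_dense = 41.9025   # ONNX Runtime, dense YOLOv5s
ds_dense = 69.5546    # DeepSparse, dense YOLOv5s
ds_sparse = 241.2452  # DeepSparse, pruned-quantized YOLOv5s

print(f"dense speed-up:  {ds_dense / ort_dense:.1f}x")   # 1.7x
print(f"sparse speed-up: {ds_sparse / ort_dense:.1f}x")  # 5.8x
```
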
### Batch 1 Performance Comparison

DeepSparse is also able to gain a speed-up over ONNX Runtime in the latency-sensitive, batch 1 scenario.

#### ONNX Runtime Baseline

At batch 1, ONNX Runtime achieves 48 images/sec with the standard dense YOLOv5s:

```bash
deepsparse.benchmark zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none -s sync -b 1 -nstreams 1 -e onnxruntime

> Original Model Path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none
> Batch Size: 1
> Scenario: sync
> Throughput (items/sec): 48.0921
```

#### DeepSparse Sparse Performance

At batch 1, DeepSparse achieves 135 items/sec with a pruned-quantized YOLOv5s, **a 2.8x performance gain over ONNX Runtime!**

```bash
deepsparse.benchmark zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none -s sync -b 1 -nstreams 1

> Original Model Path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none
> Batch Size: 1
> Scenario: sync
> Throughput (items/sec): 134.9468
```

Since `c6i.8xlarge` instances have VNNI instructions, DeepSparse's throughput can be pushed further if weights are pruned in blocks of 4.

At batch 1, DeepSparse achieves 180 items/sec with a 4-block pruned-quantized YOLOv5s, a **3.7x performance gain over ONNX Runtime!**

```bash
deepsparse.benchmark zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned35_quant-none-vnni -s sync -b 1 -nstreams 1

> Original Model Path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned35_quant-none-vnni
> Batch Size: 1
> Scenario: sync
> Throughput (items/sec): 179.7375
```

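Because the sync scenario at batch 1 processes one image at a time, the reported throughput also implies a per-image latency of roughly 1000 / (items per second) milliseconds. A quick check of the quoted speed-ups and implied latencies:

```python
# batch 1 throughputs (items/sec) reported by deepsparse.benchmark above
ort = 48.0921      # ONNX Runtime, dense YOLOv5s
pruned = 134.9468  # DeepSparse, pruned65_quant
vnni = 179.7375    # DeepSparse, pruned35_quant (4-block, VNNI)

for name, ips in [("pruned65_quant", pruned), ("pruned35_quant-vnni", vnni)]:
    # sync scenario, batch 1: latency_ms ~= 1000 / throughput
    print(f"{name}: {ips / ort:.1f}x over ORT, ~{1000 / ips:.1f} ms/image")
# pruned65_quant: 2.8x over ORT, ~7.4 ms/image
# pruned35_quant-vnni: 3.7x over ORT, ~5.6 ms/image
```
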
## Get Started With DeepSparse

**Research or Testing?** DeepSparse Community is free for research and testing. Get started with our [Documentation](https://docs.neuralmagic.com/).

examples/ultralytics-yolo/ultralytics-readmes/sparse-network.svg

Lines changed: 0 additions & 1 deletion
This file was deleted.
