Commit 31cb58c
[ET-VK][ez] Allow logit linear layer to be lowered to Vulkan
## Context
Due to the poor performance of Vulkan's int4 linear operator, the final logit layer of the transformer model was not delegated to Vulkan; it was instead quantized and executed with the XNNPACK delegate.
However, with D72412950 / #9883, decent performance can now be achieved with Vulkan's int4 linear op. Therefore, the final logit layer can be lowered to Vulkan.
## Changes
* Remove limit from `VkInt4WeightOnlyQuantizer` that was causing it to ignore the logit layer of the transformer
* Do not apply XNNPACK partitioner and quantizer when lowering with Vulkan
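
The effect of the two changes above can be sketched in plain Python. This is an illustration only, not the code in this commit: `LinearSpec`, `select_for_int4`, `pick_partitioners`, and the `max_out_features` cap are hypothetical names, and the real form of the limit in `VkInt4WeightOnlyQuantizer` is not stated in this message, so a cap on output width is assumed purely for demonstration.

```python
from dataclasses import dataclass


@dataclass
class LinearSpec:
    """Hypothetical stand-in for a linear layer: a name and its weight shape."""
    name: str
    in_features: int
    out_features: int


def select_for_int4(layers, max_out_features=None):
    """Return the names of layers a quantizer would convert to int4.

    max_out_features is an ASSUMED model of the removed limit: when set,
    layers wider than the cap (such as the logit layer) are skipped.
    """
    selected = []
    for layer in layers:
        if max_out_features is not None and layer.out_features > max_out_features:
            continue  # skipped layers would fall back to another delegate
        selected.append(layer.name)
    return selected


def pick_partitioners(use_vulkan, use_xnnpack):
    """Assumed selection rule: Vulkan lowering does not also add XNNPACK."""
    if use_vulkan:
        return ["vulkan"]
    return ["xnnpack"] if use_xnnpack else []


# Illustrative shapes only; they are not taken from the commit.
layers = [
    LinearSpec("layers.0.wq", 2048, 2048),
    LinearSpec("output", 2048, 128256),  # final logit layer
]
before = select_for_int4(layers, max_out_features=32000)
after = select_for_int4(layers)
```

With the assumed cap in place, only the attention projection is selected; without it, the logit layer is selected as well, so the whole model can be handed to a single delegate.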
Differential Revision: [D72480177](https://our.internmc.facebook.com/intern/diff/D72480177/)
ghstack-source-id: 276219519
Pull Request resolved: #9918
1 parent 9809360 commit 31cb58c
File tree
4 files changed, +3 -28 lines changed
- backends/vulkan
  - _passes
- examples/models/llama
  - source_transformation