Commit 5172c0c
Optimize sparse 2:4 compression performance (3.69x speedup)
- Implement GPU-accelerated bit packing in pack_bitmasks()
- Remove unnecessary CPU transfers in sparse compression pipeline
- Optimize topk operation with sorted=False parameter
Achieves 3.69x speedup (22.57s → 6.12s) for 8B parameter models by keeping operations on GPU and eliminating device transfers.

1 parent 3fb2844 commit 5172c0c
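
The three changes listed in the commit message can be illustrated with a minimal sketch. This is not the code from this commit: the function names `mask_2_of_4` and `pack_bits_little` are hypothetical, the little-endian bit order is an assumption, and the sketch assumes the weight's element count is divisible by 4. It only shows the general idea of building a 2:4 mask with `topk(..., sorted=False)` and packing it into bytes without leaving the tensor's device.

```python
import torch
import torch.nn.functional as F


def mask_2_of_4(weight: torch.Tensor) -> torch.Tensor:
    """Boolean mask keeping the 2 largest-magnitude values in each group of 4.

    Hypothetical helper; assumes weight.numel() is divisible by 4.
    """
    groups = weight.abs().reshape(-1, 4)
    # sorted=False: only the set of winning indices matters, not their order,
    # so the extra sort of the top-k results can be skipped.
    idx = groups.topk(2, dim=1, sorted=False).indices
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(1, idx, torch.ones_like(idx, dtype=torch.bool))
    return mask.reshape(weight.shape)


def pack_bits_little(mask: torch.Tensor) -> torch.Tensor:
    """Pack a 2-D boolean mask into uint8, 8 mask bits per byte.

    Bit order is assumed little-endian (first column -> least significant bit).
    Every intermediate tensor is created on mask.device, so nothing is copied
    to the CPU when the mask lives on a GPU.
    """
    rows, cols = mask.shape
    bits = mask.to(torch.uint8)
    pad = (-cols) % 8  # zero-pad each row up to a multiple of 8 bits
    if pad:
        bits = F.pad(bits, (0, pad))
    bits = bits.reshape(rows, -1, 8).to(torch.int32)
    # Per-bit place values 1, 2, 4, ..., 128 on the same device as the mask.
    place = torch.tensor(
        [1, 2, 4, 8, 16, 32, 64, 128], dtype=torch.int32, device=mask.device
    )
    return (bits * place).sum(dim=-1).to(torch.uint8)
```

Calling `pack_bits_little(mask_2_of_4(w))` on a 2-D weight `w` that already lives on a CUDA device keeps the whole pipeline on that device; the same code runs unchanged on CPU tensors.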
File tree
2 files changed, +49 -9 lines changed
- src/compressed_tensors
  - compressors/sparse_compressors
  - utils

Lines changed: 9 additions & 5 deletions
| Hunk | Original file lines | Diff lines | Additions | Deletions |
|---|---|---|---|---|
| 1 | 90-102 | 90-104 | 5 | 3 |
| 2 | 233-242 | 235-246 | 4 | 2 |

Lines changed: 40 additions & 4 deletions
| Hunk | Original file lines | Diff lines | Additions | Deletions |
|---|---|---|---|---|
| 1 | 301-310 | 301-346 | 40 | 4 |