Commit 0ca8628
Fix garbled output with REPACK at high thread counts (ggml-org#16956)
* Fix garbled output with REPACK at high thread counts
Fixed a race condition in the REPACK matrix multiplication code that caused garbled output when using 26 or more threads (the exact threshold is model-dependent). At high thread counts, the code forced the chunk count to equal the thread count, creating many small chunks. After these chunks were aligned to NB_COLS boundaries, adjacent chunks could overlap, causing data corruption and race conditions. The fix enforces a minimum chunk size based on NB_COLS and caps the maximum chunk count so that too many tiny chunks are never created, ensuring proper alignment without overlaps.
* Update ggml/src/ggml-cpu/repack.cpp
Co-authored-by: Georgi Gerganov <[email protected]>
* Update ggml/src/ggml-cpu/repack.cpp
Co-authored-by: Georgi Gerganov <[email protected]>
---------
Co-authored-by: Georgi Gerganov <[email protected]>
1 parent c29136a commit 0ca8628
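
The chunking rule described in the commit message can be illustrated with a minimal sketch. This is not the code from repack.cpp: the helper names `align_up`, `clamp_chunk_count`, and `aligned_chunks` are hypothetical, and the sketch assumes that chunk boundaries must fall on multiples of `nb_cols` (standing in for NB_COLS). It only shows the idea of capping the chunk count so that every chunk spans at least `nb_cols` rows.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not from repack.cpp): round x up to a multiple of nb_cols.
static int64_t align_up(int64_t x, int64_t nb_cols) {
    return (x + nb_cols - 1) / nb_cols * nb_cols;
}

// Hypothetical helper: start from one chunk per thread, then cap the count so
// that no chunk is smaller than nb_cols rows.
static int64_t clamp_chunk_count(int64_t nr, int64_t nth, int64_t nb_cols) {
    const int64_t nchunk     = nth;
    const int64_t max_nchunk = std::max<int64_t>(1, (nr + nb_cols - 1) / nb_cols);
    return std::min(nchunk, max_nchunk);
}

// Hypothetical helper: split [0, nr) into at most nchunk half-open ranges whose
// interior boundaries are multiples of nb_cols. Requires nchunk >= 1.
static std::vector<std::pair<int64_t, int64_t>> aligned_chunks(int64_t nr, int64_t nchunk, int64_t nb_cols) {
    std::vector<std::pair<int64_t, int64_t>> out;
    // Rows per chunk, rounded up to the alignment unit.
    const int64_t dr = align_up((nr + nchunk - 1) / nchunk, nb_cols);
    for (int64_t start = 0; start < nr; start += dr) {
        out.emplace_back(start, std::min(start + dr, nr));
    }
    return out;
}
```

For example, with 64 rows, 32 threads, and `nb_cols = 8`, one chunk per thread would give 2 rows per chunk, which is smaller than the alignment unit. `clamp_chunk_count(64, 32, 8)` reduces the count to 8, and `aligned_chunks(64, 8, 8)` then yields eight contiguous, non-overlapping ranges of 8 rows each.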
1 file changed, +25 -0 lines changed

| Diff line numbers | Change |
|---|---|
| 1681-1687 | + 7 lines |
| 1692-1698 | + 7 lines |
| 1712-1714 | + 3 lines |
| 1717-1720 | + 4 lines |
| 1832 | + 1 line |
| 1835-1837 | + 3 lines |