The existing dlight optimization handles only NT matmul, not NN. As a
result, the new `nn.Module`-based implementation, which uses NN matmul,
currently fails to compile at HEAD. This PR fixes the issue by tweaking
`k` to the preferred layout.
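For readers unfamiliar with the NN/NT naming, here is a minimal pure-Python sketch of the two layouts. It is only an illustration of the convention, not the dlight schedule or the code changed in this PR; the helper names `matmul_nn`, `matmul_nt`, and `transpose` are hypothetical. In NN, the second operand is stored as `(K, N)`; in NT, it is stored transposed as `(N, K)`, so the reduction axis `k` is the innermost axis of both operands.

```python
def matmul_nn(a, b):
    """NN layout: C[i][j] = sum_k A[i][k] * B[k][j], with B stored as (K, N)."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def matmul_nt(a, b_t):
    """NT layout: C[i][j] = sum_k A[i][k] * B_t[j][k], with B stored as (N, K)."""
    m, n, k = len(a), len(b_t), len(b_t[0])
    return [[sum(a[i][p] * b_t[j][p] for p in range(k)) for j in range(n)]
            for i in range(m)]


def transpose(x):
    """Swap the two axes of a list-of-lists matrix."""
    return [list(row) for row in zip(*x)]


a = [[1, 2, 3], [4, 5, 6]]    # shape (M=2, K=3)
b = [[1, 0], [0, 1], [2, 2]]  # shape (K=3, N=2)

# Both layouts compute the same product; only the storage of B differs.
assert matmul_nn(a, b) == matmul_nt(a, transpose(b))
```

The two functions differ only in how the second operand is indexed, which is why a schedule written for one layout does not automatically apply to the other.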
The following commands now work with the new compilation pipeline:
```bash
python -m mlc_chat.cli.compile --config llama2_7b --quantization q4f16_1 -o /tmp/1.so
python -m mlc_chat.cli.compile --config llama2_13b --quantization q4f16_1 -o /tmp/1.so
python -m mlc_chat.cli.compile --config llama2_70b --quantization q4f16_1 -o /tmp/1.so
```
Note that the quantization algorithm itself, `q4f16_1`, has not been
implemented yet, so this code path is not ready for use.