- [ ] FP8 KV cache
- [ ] KV-cache prefix reuse
- [ ] Grammar-constrained speedup
- [ ] `torch.compile`-like speedups
- [ ] Simple one-liner `pip install`
- [ ] Multi-LoRA support (LoRAX-style)
- [x] Marlin quantization
- [x] EXL2 quants (non-GPTQ 2, 3, 4.5 bpw)
- [ ] More documentation/help on easily setting up production Grafana dashboards