Planned/Potential of significant work

- [ ] Fp8 kv-cache
- [ ] Kv-cache prefix reuse
- [ ] Grammar constrained speedup
- [ ] `torch.compile` like speedups
- [ ] Simple one-liner `pip install`
- [ ] Multi lora support (lorax kind of)
- [x] Marlin quantization
- [x] Exl2 quants (non gptq 2, 3, 4.5 bpw)
- [ ] Adding more documentation/help regarding getting production grafana dashboards easily setup.