1. Design and implement a standard interface, similar to [TextGeneration](https://github.com/TabbyML/tabby/blob/main/crates/tabby-inference/src/lib.rs#L24).
2. Consider adding BERT-based embedding support to upstream llama.cpp, in order to integrate encoder-decoder models.

Related:
- https://github.com/ggerganov/llama.cpp/blob/master/examples/embedding/embedding.cpp
- https://github.com/ggerganov/llama.cpp/issues/2872
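
To make item 1 more concrete, here is a rough sketch of what such an interface could look like. Everything in it is an assumption for discussion: the `Embedding` trait, its `dim` and `embed` methods, and the `HashingEmbedding` backend are hypothetical names, not existing Tabby or llama.cpp APIs. The sketch is synchronous to stay dependency-free, and the real trait may well need to be async to line up with `TextGeneration`.

```rust
/// Hypothetical interface (not an existing Tabby API): maps a prompt to a
/// fixed-size vector, independent of the backend that computes it.
pub trait Embedding {
    /// Number of dimensions in every vector returned by `embed`.
    fn dim(&self) -> usize;

    /// Compute an embedding for `prompt`.
    fn embed(&self, prompt: &str) -> Vec<f32>;
}

/// Toy stand-in backend, used only to exercise the trait. A real backend
/// would call into a model (for example through llama.cpp) instead.
pub struct HashingEmbedding {
    dim: usize,
}

impl HashingEmbedding {
    pub fn new(dim: usize) -> Self {
        assert!(dim > 0, "dim must be positive");
        Self { dim }
    }
}

impl Embedding for HashingEmbedding {
    fn dim(&self) -> usize {
        self.dim
    }

    fn embed(&self, prompt: &str) -> Vec<f32> {
        // Count bytes into `dim` buckets.
        let mut v = vec![0.0f32; self.dim];
        for b in prompt.bytes() {
            v[b as usize % self.dim] += 1.0;
        }
        // L2-normalize so non-empty prompts yield unit-length vectors.
        let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 {
            for x in v.iter_mut() {
                *x /= norm;
            }
        }
        v
    }
}
```

Callers would then depend only on the trait, e.g. `let backend: &dyn Embedding = &HashingEmbedding::new(8);`, so a llama.cpp-backed implementation could be swapped in later without touching call sites.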