Multi-GPU inference is essential for GPUs with small VRAM: a 13B LLaMA model cannot fit on a single 3090 without quantization. llama.cpp merged its multi-GPU branch yesterday (ggerganov/llama.cpp#1703), which lets us deploy LLMs across several small-VRAM GPUs. I hope llama-cpp-python can support multi-GPU inference in the future as well. Many thanks!!!
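
For illustration, here is a minimal sketch of what such a Python-side interface might look like, assuming parameters that mirror llama.cpp's multi-GPU options (`--n-gpu-layers`, `--tensor-split`, `--main-gpu`). The parameter names and the model path below are hypothetical placeholders, not an existing llama-cpp-python API:

```python
from llama_cpp import Llama

# Hypothetical multi-GPU parameters mirroring llama.cpp's CLI flags;
# names and semantics are illustrative only.
llm = Llama(
    model_path="./models/13B/ggml-model-q4_0.bin",  # placeholder path
    n_gpu_layers=-1,          # offload all layers to the GPUs
    tensor_split=[0.5, 0.5],  # split the weights evenly across two GPUs
    main_gpu=0,               # GPU used for small tensors / scratch buffers
)

out = llm("Q: Why is multi-GPU inference useful? A:", max_tokens=64)
print(out["choices"][0]["text"])
```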