Description
I used the Google Colab CPU runtime to try it. This is my code. I am trying to use a larger context, but it takes very long to generate the text:
from llama_cpp import Llama

llm = Llama(
    model_path="/content/ggml-vicuna-13b-4bit-rev1.bin",
    n_ctx=2048,      # context window size in tokens
    n_threads=20,    # CPU threads to use for inference
    n_batch=2048,    # prompt-processing batch size
)
output = llm(
    "Question: Any suggestion for delicious food? Answer:",
    max_tokens=1024,
    temperature=0.9,
    top_p=0.95,
    repeat_penalty=1.2,
    top_k=50,
    echo=True,       # include the prompt in the returned text
)
print(output)
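
For comparison, here is a minimal sketch of a more CPU-friendly configuration, assuming the same model file and the standard llama-cpp-python API. Two assumptions are baked in: the free Colab CPU runtime exposes only a couple of vCPUs, so n_threads=20 mostly adds scheduling overhead rather than speed, and a 13B model on CPU will be slow no matter what, so streaming the tokens at least makes progress visible. The values n_batch=512 and max_tokens=256 are illustrative, not a confirmed fix.

import os
from llama_cpp import Llama

# Assumption: the free Colab CPU runtime typically has ~2 vCPUs; match the
# thread count to the real core count instead of hardcoding 20.
n_threads = os.cpu_count() or 2

llm = Llama(
    model_path="/content/ggml-vicuna-13b-4bit-rev1.bin",
    n_ctx=2048,
    n_threads=n_threads,
    n_batch=512,  # smaller prompt batch; 2048 mainly raises memory use on CPU
)

# stream=True yields completion chunks as tokens are generated, so the long
# wall-clock time of 13B CPU inference shows up as incremental output.
for chunk in llm(
    "Question: Any suggestion for delicious food? Answer:",
    max_tokens=256,  # fewer tokens per call keeps each run bounded
    temperature=0.9,
    top_p=0.95,
    repeat_penalty=1.2,
    top_k=50,
    stream=True,
):
    print(chunk["choices"][0]["text"], end="", flush=True)

Even with these settings, prompt processing still scales with how much of the 2048-token context is actually filled, so long prompts on a CPU-only runtime are expected to stay slow.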