llama-cli: command not found #1
Comments
Hi @tjthejuggler! Indeed, the first command starts a new instance each time; it is meant for troubleshooting and/or automating things by piping commands to it. To have a long-running instance, start it in API mode:

```sh
docker run -p 8080:8080 -ti --rm quay.io/go-skynet/llama-cli:v0.1 api
```

Then, in another terminal, run inferences with:

```sh
curl --location --request POST 'http://localhost:8080/predict' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "text": "What is an alpaca?",
    "topP": 0.8,
    "topK": 50,
    "temperature": 0.7,
    "tokens": 100
  }'
```

The API keeps the model loaded in memory, and it's a long-running process.
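For repeated queries, the curl call can be wrapped in a small shell function. This is only a sketch based on the command above: the `predict` helper is a hypothetical convenience, the endpoint, port, and parameter names are copied verbatim from the curl example, and the shape of the JSON response may vary between versions.

```sh
#!/bin/sh
# predict.sh - minimal wrapper around the llama-cli API started above.
# Assumes the API is listening on localhost:8080.
predict() {
  curl -s --location --request POST 'http://localhost:8080/predict' \
    --header 'Content-Type: application/json' \
    --data-raw "{
      \"text\": \"$1\",
      \"topP\": 0.8,
      \"topK\": 50,
      \"temperature\": 0.7,
      \"tokens\": 100
    }"
}

predict "What is an alpaca?"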
hey @mudler, thanks so much for the help! I've got another question: every time I run it, it makes me download that 3.839 GB file again. I don't know where it is downloading it to; I can't seem to find any file that size on my HD. Given its size, I assumed it was the 7B model, so I tried pointing it at that model, which I already have downloaded, but it still wants to download the 3.839 GB file again.

```sh
$ sudo docker run -p 8080:8080 -ti --rm quay.io/go-skynet/llama-cli:v0.1 api --model /models/ggml-alpaca-7b-q4.bin
```

Thanks again, I really appreciate your time and effort!
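One detail not spelled out in the thread: `/models/ggml-alpaca-7b-q4.bin` is a path inside the container, so a model already downloaded on the host is invisible there unless its directory is mounted in. A hedged sketch, assuming the file lives in the user's home directory as in the transcript further down:

```sh
# Mount the host directory holding the model into the container at /models,
# so that --model /models/ggml-alpaca-7b-q4.bin resolves to the host file.
sudo docker run -p 8080:8080 -ti --rm \
  -v "$HOME":/models \
  quay.io/go-skynet/llama-cli:v0.1 \
  api --model /models/ggml-alpaca-7b-q4.bin
```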
Looks like there is something wrong in your docker installation; images shouldn't be cleaned up between calls. How did you install docker?
I have no experience with it; I hadn't even heard of it until setting up your project. All I did to set it up was follow these instructions exactly: https://docs.docker.com/engine/install/ubuntu/ I will look into debugging it, knowing that the issue is that the image is being cleaned up between calls. Thank you!
The issue has been solved! When I ran 'sudo docker images' I saw that the image was listed there and tagged 'latest', but the command I was running had ':v0.1' at the end. I switched it to ':latest' and it worked beautifully. Thanks so much, I really appreciate it!
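For anyone hitting the same thing, the fix amounts to checking which tag is actually present locally and using that one. A sketch reconstructed from the commands above:

```sh
# List the locally available tags for the image.
sudo docker images quay.io/go-skynet/llama-cli

# Run with the tag that is actually present (here: latest).
sudo docker run -p 8080:8080 -ti --rm quay.io/go-skynet/llama-cli:latest api
```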
Thanks so much for making and sharing this!
The first command works perfectly, but when I do the one that starts llama-cli I get 'command not found':

```sh
bossbaby@Will-of-Steve:~/projects/llama-cli$ sudo docker run -ti --rm quay.io/go-skynet/llama-cli:latest --instruction "What's an alpaca?" --topk 10000
Alpacas are domesticated animals that are closely related to llamas and camels. They are native to the Andes Mountains in South America, where they were first domesticated by the Incas.
bossbaby@Will-of-Steve:~/projects/llama-cli$ llama-cli --model ~/ggml-alpaca-7b-q4.bin --instruction "What's an alpaca?"
llama-cli: command not found
```
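The 'command not found' here happens because the `llama-cli` binary only exists inside the container image, not on the host. One possible workaround, purely as a sketch (the alias and the `/models` mount point are assumptions, not something from the project docs), is to forward the command through docker:

```sh
# Hypothetical convenience alias: 'llama-cli' on the host forwards to the
# container. Host paths passed to --model must live under the mounted
# directory ($HOME here, visible as /models inside the container).
alias llama-cli='sudo docker run -ti --rm -v "$HOME":/models quay.io/go-skynet/llama-cli:latest'

llama-cli --model /models/ggml-alpaca-7b-q4.bin --instruction "What's an alpaca?"
```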
Also, I saw from the issue post in the alpaca.cpp GitHub that with this project alpaca should be running in memory all the time, but it seems like it has to start up a new instance every time I run that first command. When I do 'ps aux | grep alpaca' after the first command has completed, there seems to be no process with 'alpaca' running. Is it possible with this to get responses as fast as in the original alpaca.cpp, but with this awesome single-command API-style system?