ERR Failed starting/connecting to the gRPC service #1721

@doug-wade

Description

LocalAI version:
2.8.2

Environment, CPU architecture, OS, and Version:

» uname -a
Darwin Dougs-MacBook-Air.local 23.3.0 Darwin Kernel Version 23.3.0: Wed Dec 20 21:33:31 PST 2023; root:xnu-10002.81.5~7/RELEASE_ARM64_T8112 arm64

Describe the bug

ERR Failed starting/connecting to the gRPC service: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35053: connect: connection refused"

To Reproduce

» docker run -ti --platform linux/amd64 -p 8080:8080 localai/localai:v2.8.2-ffmpeg-core codellama-7b-gguf
» curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "codellama-7b-gguf",
     "messages": [{"role": "user", "content": "Please write a function that calculates the first n prime numbers."}],
     "temperature": 0.9
   }'
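
Before sending the chat request, it can help to confirm the API is up and the model finished downloading; a quick sanity check using the OpenAI-compatible models endpoint, assuming the default port mapping above:

» curl http://localhost:8080/v1/models

If the model shows up in the returned list, the HTTP server itself is healthy and the failure is isolated to the backend gRPC process.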

Expected behavior
The request should return a chat completion containing the requested code snippet.

Logs

» docker run -ti --platform linux/amd64 -p 8080:8080 localai/localai:v2.8.2-ffmpeg-core codellama-7b-gguf --debug
@@@@@
Skipping rebuild
@@@@@
If you are experiencing issues with the pre-compiled builds, try setting REBUILD=true
If you are still experiencing issues with the build, try setting CMAKE_ARGS and disable the instructions set as needed:
CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF"
see the documentation at: https://localai.io/basics/build/index.html
Note: See also https://github.com/go-skynet/LocalAI/issues/288
@@@@@
CPU info:
CPU: no AVX    found
CPU: no AVX2   found
CPU: no AVX512 found
@@@@@
6:55PM DBG no galleries to load
6:55PM INF Starting LocalAI using 4 threads, with models path: /build/models
6:55PM INF LocalAI version: v2.8.2 (e690bf387a27de277368e2f742a616e1b2600d5b)
6:55PM WRN [startup] failed resolving model '--debug'
6:55PM INF Preloading models from /build/models
6:55PM INF Downloading "https://huggingface.co/TheBloke/CodeLlama-7B-GGUF/resolve/main/codellama-7b.Q4_K_M.gguf"
6:55PM INF Downloading /build/models/232692e1614183192beee756c58afefc.partial: 321.4 MiB/3.8 GiB (8.26%) ETA: 55.553743882s
6:55PM INF Downloading /build/models/232692e1614183192beee756c58afefc.partial: 711.8 MiB/3.8 GiB (18.29%) ETA: 44.680439228s
6:55PM INF Downloading /build/models/232692e1614183192beee756c58afefc.partial: 1.0 GiB/3.8 GiB (27.58%) ETA: 39.397551322s
6:55PM INF Downloading /build/models/232692e1614183192beee756c58afefc.partial: 1.4 GiB/3.8 GiB (37.36%) ETA: 33.538813589s
6:55PM INF Downloading /build/models/232692e1614183192beee756c58afefc.partial: 1.8 GiB/3.8 GiB (47.02%) ETA: 28.171281943s
6:55PM INF Downloading /build/models/232692e1614183192beee756c58afefc.partial: 2.2 GiB/3.8 GiB (56.74%) ETA: 22.877781731s
6:55PM INF Downloading /build/models/232692e1614183192beee756c58afefc.partial: 2.5 GiB/3.8 GiB (66.86%) ETA: 17.34760848s
6:55PM INF Downloading /build/models/232692e1614183192beee756c58afefc.partial: 2.9 GiB/3.8 GiB (76.16%) ETA: 12.522402138s
6:55PM INF Downloading /build/models/232692e1614183192beee756c58afefc.partial: 3.3 GiB/3.8 GiB (86.14%) ETA: 7.240267439s
6:55PM INF Downloading /build/models/232692e1614183192beee756c58afefc.partial: 3.6 GiB/3.8 GiB (95.45%) ETA: 2.38336908s
6:55PM INF File "/build/models/232692e1614183192beee756c58afefc" downloaded and verified
6:55PM INF Model name: codellama-7b-gguf
6:55PM INF Model usage:
curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
    "model": "codellama-7b-gguf",
    "prompt": "import socket\n\ndef ping_exponential_backoff(host: str):"
}'

 ┌───────────────────────────────────────────────────┐
 │                   Fiber v2.50.0                   │
 │               http://127.0.0.1:8080               │
 │       (bound on host 0.0.0.0 and port 8080)       │
 │                                                   │
 │ Handlers ............ 73  Processes ........... 1 │
 │ Prefork ....... Disabled  PID ................ 53 │
 └───────────────────────────────────────────────────┘

6:55PM INF Loading model '232692e1614183192beee756c58afefc' with backend transformers
6:56PM ERR Failed starting/connecting to the gRPC service: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:32949: connect: connection refused"
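
The startup banner above suggests setting REBUILD=true when the pre-compiled backends misbehave, and the warning shows that '--debug' was parsed as a model name rather than a flag. A sketch of a retry that follows those hints, assuming LocalAI honors the REBUILD and DEBUG environment variables as described in its build docs (https://localai.io/basics/build/index.html):

» docker run -ti --platform linux/amd64 -p 8080:8080 \
    -e DEBUG=true -e REBUILD=true \
    localai/localai:v2.8.2-ffmpeg-core codellama-7b-gguf

Rebuilding inside the container is slow but produces binaries matched to the emulated CPU, which may matter here since the log reports no AVX/AVX2/AVX512 support under linux/amd64 emulation on this arm64 host.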

Additional context
This is my first time trying to start the project.
