Clean install fails to run any model #5225

Open
tescophil opened this issue Apr 20, 2025 · 9 comments
Labels
bug (Something isn't working), unconfirmed

Comments


tescophil commented Apr 20, 2025

LocalAI version:

Latest version, installed yesterday; no idea how to get the tag/commit.

Environment, CPU architecture, OS, and Version:
Linux desktop-garage 4.19.0-12-amd64 #1 SMP Debian 4.19.152-1 (2020-10-18) x86_64 GNU/Linux

Describe the bug
Clean install and download of several models; all of them fail to load.

To Reproduce
Install and attempt to run a model
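
For reference, the failing request is just a normal chat completion against the OpenAI-compatible endpoint. A minimal repro sketch (the port assumes LocalAI's default of 8080, and the model name is one of the models I downloaded):

```bash
# Repro sketch: assumes LocalAI on its default port 8080 and a gallery
# model named 'minicpm-v-2_6' already installed.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "minicpm-v-2_6",
        "messages": [{"role": "user", "content": "hello"}]
      }'
```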

Expected behavior
I expect a model to load

Logs
10:14AM INF Trying to load the model 'minicpm-v-2_6' with the backend '[llama-cpp llama-cpp-fallback piper silero-vad stablediffusion-ggml whisper bark-cpp huggingface]'
10:14AM INF [llama-cpp] Attempting to load
10:14AM INF BackendLoader starting backend=llama-cpp modelID=minicpm-v-2_6 o.model=minicpm-v-2_6-Q4_K_M.gguf
10:14AM INF [llama-cpp] attempting to load with AVX variant
10:14AM INF Success ip=10.8.1.10 latency=767.980683ms method=POST status=200 url=/v1/chat/completions
10:14AM INF Success ip=10.8.1.10 latency="29.688µs" method=GET status=200 url=/static/favicon.svg
10:15AM INF Trying to load the model 'minicpm-v-2_6' with the backend '[llama-cpp llama-cpp-fallback whisper bark-cpp piper silero-vad stablediffusion-ggml huggingface]'
10:15AM INF [llama-cpp] Attempting to load
10:15AM INF BackendLoader starting backend=llama-cpp modelID=minicpm-v-2_6 o.model=minicpm-v-2_6-Q4_K_M.gguf
10:15AM ERR [llama-cpp] Failed loading model, trying with fallback 'llama-cpp-fallback', error: failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
10:15AM INF [llama-cpp] attempting to load with AVX variant
10:15AM ERR [llama-cpp] Failed loading model, trying with fallback 'llama-cpp-fallback', error: failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
10:15AM INF [llama-cpp] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
10:15AM INF [llama-cpp-fallback] Attempting to load
10:15AM INF BackendLoader starting backend=llama-cpp-fallback modelID=minicpm-v-2_6 o.model=minicpm-v-2_6-Q4_K_M.gguf
10:16AM INF [llama-cpp] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
10:16AM INF [llama-cpp-fallback] Attempting to load
10:16AM INF BackendLoader starting backend=llama-cpp-fallback modelID=minicpm-v-2_6 o.model=minicpm-v-2_6-Q4_K_M.gguf
10:16AM INF [llama-cpp-fallback] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
10:16AM INF [piper] Attempting to load
10:16AM INF BackendLoader starting backend=piper modelID=minicpm-v-2_6 o.model=minicpm-v-2_6-Q4_K_M.gguf
10:16AM INF [llama-cpp-fallback] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
10:16AM INF [whisper] Attempting to load
10:16AM INF BackendLoader starting backend=whisper modelID=minicpm-v-2_6 o.model=minicpm-v-2_6-Q4_K_M.gguf
10:17AM ERR failed starting/connecting to the gRPC service error="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:36117: connect: connection refused""
10:17AM INF [piper] Fails: failed to load model with internal loader: grpc service not ready
10:17AM INF [silero-vad] Attempting to load
10:17AM INF BackendLoader starting backend=silero-vad modelID=minicpm-v-2_6 o.model=minicpm-v-2_6-Q4_K_M.gguf
10:17AM ERR failed starting/connecting to the gRPC service error="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:42421: connect: connection refused""
10:17AM INF [whisper] Fails: failed to load model with internal loader: grpc service not ready
10:17AM INF [bark-cpp] Attempting to load
10:17AM INF BackendLoader starting backend=bark-cpp modelID=minicpm-v-2_6 o.model=minicpm-v-2_6-Q4_K_M.gguf
10:18AM ERR failed starting/connecting to the gRPC service error="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:39127: connect: connection refused""
10:18AM INF [silero-vad] Fails: failed to load model with internal loader: grpc service not ready
10:18AM INF [stablediffusion-ggml] Attempting to load
10:18AM INF BackendLoader starting backend=stablediffusion-ggml modelID=minicpm-v-2_6 o.model=minicpm-v-2_6-Q4_K_M.gguf
10:19AM ERR failed starting/connecting to the gRPC service error="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45537: connect: connection refused""
10:19AM INF [bark-cpp] Fails: failed to load model with internal loader: grpc service not ready
10:19AM INF [piper] Attempting to load
10:19AM INF BackendLoader starting backend=piper modelID=minicpm-v-2_6 o.model=minicpm-v-2_6-Q4_K_M.gguf
10:19AM INF [stablediffusion-ggml] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
10:19AM INF [whisper] Attempting to load
10:19AM INF BackendLoader starting backend=whisper modelID=minicpm-v-2_6 o.model=minicpm-v-2_6-Q4_K_M.gguf
10:19AM ERR failed starting/connecting to the gRPC service error="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:39659: connect: connection refused""
10:19AM INF [piper] Fails: failed to load model with internal loader: grpc service not ready
10:19AM INF [silero-vad] Attempting to load
10:19AM INF BackendLoader starting backend=silero-vad modelID=minicpm-v-2_6 o.model=minicpm-v-2_6-Q4_K_M.gguf

Additional context
The basic documentation is very sparse, and the default localai.env that gets installed does not match current master on GitHub: in the documentation all the variables are prefixed with LOCALAI_, but the file installed by default omits the prefix, e.g. LOCALAI_THREADS vs. THREADS (see the sketch below). Also, the installer sets LocalAI up as a service, but the docs don't mention this anywhere; instead they say to run 'local-ai run' from the command line, which fails because the service is already running.
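
To make the mismatch concrete, a minimal sketch (the unprefixed form is what the installed localai.env actually contains; the systemd unit name local-ai is an assumption based on what the installer registered):

```bash
# Env naming mismatch, documentation vs. installed default:
#   documentation:         LOCALAI_THREADS=4
#   installed localai.env: THREADS=4

# The installer also starts LocalAI as a service, so a second foreground
# instance cannot bind the port. Stopping the (assumed) unit first works:
sudo systemctl stop local-ai
local-ai run
```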

I think this may be the same issue as #5216

tescophil added the bug (Something isn't working) and unconfirmed labels on Apr 20, 2025
@bstone108

I seem to be having the same problem: no models, not even the included ones, are working. I'm using v2.28.0 with an NVIDIA GPU (12 GB VRAM) on Unraid, running the all-in-one NVIDIA CUDA 12 Docker image; a rough sketch of the setup follows. Here's quite a lot of log, everything that showed in the log window. It seems the model failed to load with the internal loader, so it tried the other backends in turn and they failed too.
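
For context, the container setup is roughly the following (a hedged sketch rather than my exact Unraid template; the image tag and host path are assumptions, though /build/models matches the path in the logs):

```bash
# Assumed all-in-one NVIDIA CUDA 12 run configuration (illustrative only).
docker run -d --name localai \
  --gpus all \
  -p 8080:8080 \
  -v /mnt/user/appdata/localai/models:/build/models \
  localai/localai:latest-aio-gpu-nvidia-cuda-12
```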

6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr runtime.netpollblock(0x4dd3d8?, 0x41dce6?, 0x0?)
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr /root/go/pkg/mod/golang.org/[email protected]/src/runtime/netpoll.go:575 +0xf7 fp=0xc00006da50 sp=0xc00006da18 pc=0x44a3b7
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr internal/poll.runtime_pollWait(0x1504a3138540, 0x72)
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr /root/go/pkg/mod/golang.org/[email protected]/src/runtime/netpoll.go:351 +0x85 fp=0xc00006da70 sp=0xc00006da50 pc=0x484aa5
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr internal/poll.(*pollDesc).wait(0xc000090000?, 0xc000092000?, 0x0)
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr /root/go/pkg/mod/golang.org/[email protected]/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00006da98 sp=0xc00006da70 pc=0x4fa5a7
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr internal/poll.(*pollDesc).waitRead(...)
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr /root/go/pkg/mod/golang.org/[email protected]/src/internal/poll/fd_poll_runtime.go:89
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr internal/poll.(*FD).Read(0xc000090000, {0xc000092000, 0x8000, 0x8000})
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr /root/go/pkg/mod/golang.org/[email protected]/src/internal/poll/fd_unix.go:165 +0x27a fp=0xc00006db30 sp=0xc00006da98 pc=0x4fb89a
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr net.(*netFD).Read(0xc000090000, {0xc000092000?, 0x1060100000000?, 0x8?})
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr /root/go/pkg/mod/golang.org/[email protected]/src/net/fd_posix.go:55 +0x25 fp=0xc00006db78 sp=0xc00006db30 pc=0x63b945
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr net.(*conn).Read(0xc00005a008, {0xc000092000?, 0x800010601?, 0x0?})
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr /root/go/pkg/mod/golang.org/[email protected]/src/net/net.go:189 +0x45 fp=0xc00006dbc0 sp=0xc00006db78 pc=0x64b605
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr net.(*TCPConn).Read(0x0?, {0xc000092000?, 0xc00006dc18?, 0x48932d?})
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr <autogenerated>:1 +0x25 fp=0xc00006dbf0 sp=0xc00006dbc0 pc=0x65f6c5
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr bufio.(*Reader).Read(0xc00008a0c0, {0xc0000a4040, 0x9, 0xc00005d508?})
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr /root/go/pkg/mod/golang.org/[email protected]/src/bufio/bufio.go:241 +0x197 fp=0xc00006dc28 sp=0xc00006dbf0 pc=0x5a2497
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr io.ReadAtLeast({0xc36620, 0xc00008a0c0}, {0xc0000a4040, 0x9, 0x9}, 0x9)
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr /root/go/pkg/mod/golang.org/[email protected]/src/io/io.go:335 +0x90 fp=0xc00006dc70 sp=0xc00006dc28 pc=0x4d3970
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr io.ReadFull(...)
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr /root/go/pkg/mod/golang.org/[email protected]/src/io/io.go:354
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr golang.org/x/net/http2.readFrameHeader({0xc0000a4040, 0x9, 0xc00002a090?}, {0xc36620?, 0xc00008a0c0?})
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr /root/go/pkg/mod/golang.org/x/[email protected]/http2/frame.go:237 +0x65 fp=0xc00006dcc0 sp=0xc00006dc70 pc=0x77f465
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr golang.org/x/net/http2.(*Framer).ReadFrame(0xc0000a4000)
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr /root/go/pkg/mod/golang.org/x/[email protected]/http2/frame.go:501 +0x85 fp=0xc00006dd68 sp=0xc00006dcc0 pc=0x77fba5
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr google.golang.org/grpc/internal/transport.(*http2Server).HandleStreams(0xc0000261a0, {0xc39dd8, 0xc000024240}, 0xc000024270)
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr /root/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_server.go:644 +0x107 fp=0xc00006de88 sp=0xc00006dd68 pc=0x7abae7
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr google.golang.org/grpc.(*Server).serveStreams(0xc00015a800, {0xc39d30?, 0x139b440?}, {0xc3d6e0, 0xc0000261a0}, {0xc3cfb8?, 0xc00005a008?})
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr /root/go/pkg/mod/google.golang.org/[email protected]/server.go:1023 +0x396 fp=0xc00006df70 sp=0xc00006de88 pc=0x819316
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr google.golang.org/grpc.(*Server).handleRawConn.func1()
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr /root/go/pkg/mod/google.golang.org/[email protected]/server.go:958 +0x56 fp=0xc00006dfe0 sp=0xc00006df70 pc=0x818ab6
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr runtime.goexit({})
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr /root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00006dfe8 sp=0xc00006dfe0 pc=0x48d1c1
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr created by google.golang.org/grpc.(*Server).handleRawConn in goroutine 3
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr /root/go/pkg/mod/google.golang.org/[email protected]/server.go:957 +0x1c6
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr rax 0xc5a15e
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr rbx 0x13ba
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr rcx 0x13ba3a8
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr rdx 0x1504ea468528
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr rdi 0x13ba
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr rsi 0xc57eff
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr rbp 0xc57eff
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr rsp 0x1504a312cc70
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr r8 0xf
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr r9 0x0
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr r10 0x1504ea252b80
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr r11 0x1504ea2f0340
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr r12 0x13ba3a8
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr r13 0x1504ea468528
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr r14 0xc5a15e
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr r15 0xc5a15e
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr rip 0x1504ea2f036c
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr rflags 0x10202
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr cs 0x33
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr fs 0x0
6:52PM DBG GRPC(gpt-4-127.0.0.1:40951): stderr gs 0x0
6:52PM INF [stablediffusion-ggml] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
6:52PM INF [whisper] Attempting to load
6:52PM INF BackendLoader starting backend=whisper modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: whisper): {backendString:whisper model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0004d0008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:52PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/whisper
6:52PM DBG GRPC Service for gpt-4 will be running at: '127.0.0.1:37735'
6:52PM DBG GRPC Service state dir: /tmp/go-processmanager477766507
6:52PM DBG GRPC Service Started
6:52PM DBG Wait for the service to start up
6:52PM DBG Options: ContextSize:4096 Seed:288213495 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:4 LibrarySearchPath:"/tmp/localai/backend_data/backend-assets/espeak-ng-data"
6:52PM DBG GRPC(gpt-4-127.0.0.1:37735): stderr 2025/04/21 18:52:26 gRPC Server listening at 127.0.0.1:37735
6:52PM DBG GRPC Service Ready
6:52PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc0007b9958} sizeCache:0 unknownFields:[] Model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf ContextSize:4096 Seed:288213495 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/espeak-ng-data RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
6:52PM INF [whisper] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = stat /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf: no such file or directory
6:52PM INF [bark-cpp] Attempting to load
6:52PM INF BackendLoader starting backend=bark-cpp modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: bark-cpp): {backendString:bark-cpp model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0004d0008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:52PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/bark-cpp
6:52PM DBG GRPC Service for gpt-4 will be running at: '127.0.0.1:35571'
6:52PM DBG GRPC Service state dir: /tmp/go-processmanager1670450128
6:52PM DBG GRPC Service Started
6:52PM DBG Wait for the service to start up
6:52PM DBG Options: ContextSize:4096 Seed:288213495 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:4 LibrarySearchPath:"/tmp/localai/backend_data/backend-assets/espeak-ng-data"
6:52PM DBG GRPC(gpt-4-127.0.0.1:35571): stderr 2025/04/21 18:52:28 gRPC Server listening at 127.0.0.1:35571
6:52PM DBG GRPC Service Ready
6:52PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc0007b9958} sizeCache:0 unknownFields:[] Model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf ContextSize:4096 Seed:288213495 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/espeak-ng-data RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
6:52PM DBG GRPC(gpt-4-127.0.0.1:35571): stderr bark_load_model_from_file: failed to open '/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf'
6:52PM DBG GRPC(gpt-4-127.0.0.1:35571): stderr bark_load_model: failed to load model weights from '/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf'
6:52PM DBG GRPC(gpt-4-127.0.0.1:35571): stderr load_model: Could not load model
6:52PM INF [bark-cpp] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = inference failed
6:52PM INF [huggingface] Attempting to load
6:52PM INF BackendLoader starting backend=huggingface modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: huggingface): {backendString:huggingface model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0004d0008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:52PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/huggingface
6:52PM DBG GRPC Service for gpt-4 will be running at: '127.0.0.1:35957'
6:52PM DBG GRPC Service state dir: /tmp/go-processmanager3434427151
6:52PM DBG GRPC Service Started
6:52PM DBG Wait for the service to start up
6:52PM DBG Options: ContextSize:4096 Seed:288213495 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:4 LibrarySearchPath:"/tmp/localai/backend_data/backend-assets/espeak-ng-data"
6:52PM DBG GRPC(gpt-4-127.0.0.1:35957): stderr 2025/04/21 18:52:30 gRPC Server listening at 127.0.0.1:35957
6:52PM DBG GRPC Service Ready
6:52PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc0007b9958} sizeCache:0 unknownFields:[] Model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf ContextSize:4096 Seed:288213495 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/espeak-ng-data RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
6:52PM INF [huggingface] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = no huggingface token provided
6:52PM INF [/build/backend/python/bark/run.sh] Attempting to load
6:52PM INF BackendLoader starting backend=/build/backend/python/bark/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/bark/run.sh): {backendString:/build/backend/python/bark/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0004d0008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:52PM INF [/build/backend/python/bark/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/bark/run.sh
6:52PM INF [/build/backend/python/vllm/run.sh] Attempting to load
6:52PM INF BackendLoader starting backend=/build/backend/python/vllm/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/vllm/run.sh): {backendString:/build/backend/python/vllm/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0004d0008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:52PM INF [/build/backend/python/vllm/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/vllm/run.sh
6:52PM INF [/build/backend/python/exllama2/run.sh] Attempting to load
6:52PM INF BackendLoader starting backend=/build/backend/python/exllama2/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/exllama2/run.sh): {backendString:/build/backend/python/exllama2/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0004d0008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:52PM INF [/build/backend/python/exllama2/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/exllama2/run.sh
6:52PM INF [/build/backend/python/kokoro/run.sh] Attempting to load
6:52PM INF BackendLoader starting backend=/build/backend/python/kokoro/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/kokoro/run.sh): {backendString:/build/backend/python/kokoro/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0004d0008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:52PM INF [/build/backend/python/kokoro/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/kokoro/run.sh
6:52PM INF [/build/backend/python/transformers/run.sh] Attempting to load
6:52PM INF BackendLoader starting backend=/build/backend/python/transformers/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/transformers/run.sh): {backendString:/build/backend/python/transformers/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0004d0008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:52PM INF [/build/backend/python/transformers/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/transformers/run.sh
6:52PM INF [/build/backend/python/rerankers/run.sh] Attempting to load
6:52PM INF BackendLoader starting backend=/build/backend/python/rerankers/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/rerankers/run.sh): {backendString:/build/backend/python/rerankers/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0004d0008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:52PM INF [/build/backend/python/rerankers/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/rerankers/run.sh
6:52PM INF [/build/backend/python/faster-whisper/run.sh] Attempting to load
6:52PM INF BackendLoader starting backend=/build/backend/python/faster-whisper/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/faster-whisper/run.sh): {backendString:/build/backend/python/faster-whisper/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0004d0008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:52PM INF [/build/backend/python/faster-whisper/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/faster-whisper/run.sh
6:52PM INF [/build/backend/python/coqui/run.sh] Attempting to load
6:52PM INF BackendLoader starting backend=/build/backend/python/coqui/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/coqui/run.sh): {backendString:/build/backend/python/coqui/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0004d0008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:52PM INF [/build/backend/python/coqui/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/coqui/run.sh
6:52PM INF [/build/backend/python/autogptq/run.sh] Attempting to load
6:52PM INF BackendLoader starting backend=/build/backend/python/autogptq/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/autogptq/run.sh): {backendString:/build/backend/python/autogptq/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0004d0008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:52PM INF [/build/backend/python/autogptq/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/autogptq/run.sh
6:52PM INF [/build/backend/python/diffusers/run.sh] Attempting to load
6:52PM INF BackendLoader starting backend=/build/backend/python/diffusers/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:52PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/diffusers/run.sh): {backendString:/build/backend/python/diffusers/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0004d0008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:52PM INF [/build/backend/python/diffusers/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/diffusers/run.sh
6:53PM INF Success ip=127.0.0.1 latency="11.216µs" method=GET status=200 url=/readyz
6:53PM INF Success ip=192.168.1.104 latency=2.054562ms method=GET status=200 url=/
6:53PM INF Success ip=192.168.1.104 latency="17.842µs" method=GET status=200 url=/static/assets/highlightjs.css
6:53PM INF Success ip=192.168.1.104 latency="19.579µs" method=GET status=200 url=/static/assets/purify.js
6:53PM INF Success ip=192.168.1.104 latency="30.406µs" method=GET status=200 url=/static/assets/highlightjs.js
6:53PM INF Success ip=192.168.1.104 latency="13.346µs" method=GET status=200 url=/static/assets/alpine.js
6:53PM INF Success ip=192.168.1.104 latency="19.996µs" method=GET status=200 url=/static/general.css
6:53PM INF Success ip=192.168.1.104 latency="14.751µs" method=GET status=200 url=/static/assets/marked.js
6:53PM INF Success ip=192.168.1.104 latency="18.784µs" method=GET status=200 url=/static/assets/font1.css
6:53PM INF Success ip=192.168.1.104 latency="19.604µs" method=GET status=200 url=/static/assets/font2.css
6:53PM INF Success ip=192.168.1.104 latency="17.79µs" method=GET status=200 url=/static/assets/tw-elements.css
6:53PM INF Success ip=192.168.1.104 latency="17.486µs" method=GET status=200 url=/static/assets/tailwindcss.js
6:53PM INF Success ip=192.168.1.104 latency="15.917µs" method=GET status=200 url=/static/assets/fontawesome/css/fontawesome.css
6:53PM INF Success ip=192.168.1.104 latency="17.134µs" method=GET status=200 url=/static/assets/fontawesome/css/brands.css
6:53PM INF Success ip=192.168.1.104 latency="18.627µs" method=GET status=200 url=/static/assets/fontawesome/css/solid.css
6:53PM INF Success ip=192.168.1.104 latency="17.126µs" method=GET status=200 url=/static/assets/flowbite.min.js
6:53PM INF Success ip=192.168.1.104 latency="13.712µs" method=GET status=200 url=/static/assets/htmx.js
6:53PM INF Success ip=192.168.1.104 latency="16.916µs" method=GET status=200 url=/static/assets/tw-elements.js
6:53PM INF Success ip=192.168.1.104 latency="18.17µs" method=GET status=200 url=/static/logo_horizontal.png
6:53PM INF Success ip=192.168.1.104 latency="19.457µs" method=GET status=200 url=/static/assets/UcCO3FwrK3iLTeHuS_fvQtMwCp50KnMw2boKoduKmMEVuLyfMZg.ttf
6:53PM INF Success ip=192.168.1.104 latency="8.465µs" method=GET status=200 url=/static/assets/fontawesome/webfonts/fa-solid-900.woff2
6:53PM INF Success ip=192.168.1.104 latency="14.613µs" method=GET status=200 url=/static/assets/UcCO3FwrK3iLTeHuS_fvQtMwCp50KnMw2boKoduKmMEVuFuYMZg.ttf
6:53PM INF Success ip=192.168.1.104 latency="15.084µs" method=GET status=200 url=/static/assets/fontawesome/webfonts/fa-brands-400.woff2
6:53PM INF Success ip=192.168.1.104 latency=1.43733ms method=GET status=200 url=/chat/gpt-4
6:53PM INF Success ip=192.168.1.104 latency="21.385µs" method=GET status=200 url=/static/assets/highlightjs.css
6:53PM INF Success ip=192.168.1.104 latency="9.33µs" method=GET status=200 url=/static/assets/highlightjs.js
6:53PM INF Success ip=192.168.1.104 latency="11.464µs" method=GET status=200 url=/static/assets/alpine.js
6:53PM INF Success ip=192.168.1.104 latency="18.972µs" method=GET status=200 url=/static/assets/marked.js
6:53PM INF Success ip=192.168.1.104 latency="9.888µs" method=GET status=200 url=/static/assets/purify.js
6:53PM INF Success ip=192.168.1.104 latency="22.264µs" method=GET status=200 url=/static/general.css
6:53PM INF Success ip=192.168.1.104 latency="18.306µs" method=GET status=200 url=/static/assets/font1.css
6:53PM INF Success ip=192.168.1.104 latency="15.525µs" method=GET status=200 url=/static/assets/font2.css
6:53PM INF Success ip=192.168.1.104 latency="9.921µs" method=GET status=200 url=/static/assets/tailwindcss.js
6:53PM INF Success ip=192.168.1.104 latency="8.594µs" method=GET status=200 url=/static/assets/tw-elements.css
6:53PM INF Success ip=192.168.1.104 latency="10.085µs" method=GET status=200 url=/static/assets/fontawesome/css/fontawesome.css
6:53PM INF Success ip=192.168.1.104 latency="7.927µs" method=GET status=200 url=/static/assets/fontawesome/css/brands.css
6:53PM INF Success ip=192.168.1.104 latency="20.709µs" method=GET status=200 url=/static/assets/fontawesome/css/solid.css
6:53PM INF Success ip=192.168.1.104 latency="12.032µs" method=GET status=200 url=/static/assets/flowbite.min.js
6:53PM INF Success ip=192.168.1.104 latency="12.346µs" method=GET status=200 url=/static/assets/htmx.js
6:53PM INF Success ip=192.168.1.104 latency="14.568µs" method=GET status=200 url=/static/chat.js
6:53PM INF Success ip=192.168.1.104 latency="17.516µs" method=GET status=200 url=/static/logo_horizontal.png
6:53PM INF Success ip=192.168.1.104 latency="18.502µs" method=GET status=200 url=/static/assets/UcCO3FwrK3iLTeHuS_fvQtMwCp50KnMw2boKoduKmMEVuLyfMZg.ttf
6:53PM INF Success ip=192.168.1.104 latency="19.507µs" method=GET status=200 url=/static/assets/fontawesome/webfonts/fa-solid-900.woff2
6:53PM INF Success ip=192.168.1.104 latency="8.254µs" method=GET status=200 url=/static/assets/KFOmCnqEu92Fr1Mu4mxP.ttf
6:53PM INF Success ip=192.168.1.104 latency="18.775µs" method=GET status=200 url=/static/assets/UcCO3FwrK3iLTeHuS_fvQtMwCp50KnMw2boKoduKmMEVuGKYMZg.ttf
6:54PM DBG context local model name not found, setting to the first model first model name=gpt-4
6:54PM DBG Chat endpoint configuration read: &{PredictionOptions:{BasicModelRequest:{Model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf} Language: Translate:false N:0 TopP:0xc001d7d510 TopK:0xc001d7d518 Temperature:0xc001d7d520 Maxtokens:0xc001d7d550 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc001d7d548 TypicalP:0xc001d7d540 Seed:0xc001d7d568 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:gpt-4 F16:0xc001d7d398 Threads:0xc001d7d500 Debug:0xc0008c7320 Roles:map[] Embeddings:0xc001d7d561 Backend: TemplateConfig:{Chat:{{.Input -}}
<|im_start|>assistant
ChatMessage:<|im_start|>{{ .RoleName }}
{{ if .FunctionCall -}}
Function call:
{{ else if eq .RoleName "tool" -}}
Function response:
{{ end -}}
{{ if .Content -}}
{{.Content }}
{{ end -}}
{{ if .FunctionCall -}}
{{toJson .FunctionCall}}
{{ end -}}<|im_end|>
Completion:{{.Input}}
Edit: Functions:<|im_start|>system
You are an AI assistant that executes function calls, and these are the tools at your disposal:
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
<|im_end|>
{{.Input -}}
<|im_start|>assistant
UseTokenizerTemplate:false JoinChatMessagesByCharacter: Multimodal: JinjaTemplate:false ReplyPrefix:} KnownUsecaseStrings:[FLAG_COMPLETION FLAG_CHAT FLAG_ANY] KnownUsecases: PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder:name,arguments SchemaType: GrammarTriggers:[]} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[(?s)(.*?)] ArgumentRegex:[] ArgumentRegexKey: ArgumentRegexValue: ReplaceFunctionResults:[] ReplaceLLMResult:[{Key:(?s)(.*?) Value:}] CaptureLLMResult:[(?s)(.*?)] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc001d7d538 MirostatTAU:0xc001d7d530 Mirostat:0xc001d7d528 NGPULayers:0xc001d7d558 MMap:0xc001d7d492 MMlock:0xc001d7d561 LowVRAM:0xc001d7d561 Grammar: StopWords:[<|im_end|> ] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc001d7d388 NUMA:false LoraAdapter: LoraBase: LoraAdapters:[] LoraScales:[] LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: LoadFormat: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 DisableLogStatus:false DType: LimitMMPerPrompt:{LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0} MMProj: FlashAttention:false NoKVOffloading:false CacheTypeK: CacheTypeV: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 CFGScale:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: AudioPath:} CUDA:false DownloadFiles:[{Filename:localai-functioncall-phi-4-v0.3-q4_k_m.gguf SHA256:23fee048ded2a6e2e1a7b6bbefa6cbf83068f194caa9552aecbaa00fec8a16d5 URI:huggingface://mudler/LocalAI-functioncall-phi-4-v0.3-Q4_K_M-GGUF/localai-functioncall-phi-4-v0.3-q4_k_m.gguf}] Description: Usage: Options:[]}
6:54PM DBG Parameters: &{PredictionOptions:{BasicModelRequest:{Model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf} Language: Translate:false N:0 TopP:0xc001d7d510 TopK:0xc001d7d518 Temperature:0xc001d7d520 Maxtokens:0xc001d7d550 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc001d7d548 TypicalP:0xc001d7d540 Seed:0xc001d7d568 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:gpt-4 F16:0xc001d7d398 Threads:0xc001d7d500 Debug:0xc0008c7320 Roles:map[] Embeddings:0xc001d7d561 Backend: TemplateConfig:{Chat:{{.Input -}}
<|im_start|>assistant
ChatMessage:<|im_start|>{{ .RoleName }}
{{ if .FunctionCall -}}
Function call:
{{ else if eq .RoleName "tool" -}}
Function response:
{{ end -}}
{{ if .Content -}}
{{.Content }}
{{ end -}}
{{ if .FunctionCall -}}
{{toJson .FunctionCall}}
{{ end -}}<|im_end|>
Completion:{{.Input}}
Edit: Functions:<|im_start|>system
You are an AI assistant that executes function calls, and these are the tools at your disposal:
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
<|im_end|>
{{.Input -}}
<|im_start|>assistant
UseTokenizerTemplate:false JoinChatMessagesByCharacter: Multimodal: JinjaTemplate:false ReplyPrefix:} KnownUsecaseStrings:[FLAG_COMPLETION FLAG_CHAT FLAG_ANY] KnownUsecases: PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder:name,arguments SchemaType: GrammarTriggers:[]} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[(?s)(.*?)] ArgumentRegex:[] ArgumentRegexKey: ArgumentRegexValue: ReplaceFunctionResults:[] ReplaceLLMResult:[{Key:(?s)(.*?) Value:}] CaptureLLMResult:[(?s)(.*?)] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc001d7d538 MirostatTAU:0xc001d7d530 Mirostat:0xc001d7d528 NGPULayers:0xc001d7d558 MMap:0xc001d7d492 MMlock:0xc001d7d561 LowVRAM:0xc001d7d561 Grammar: StopWords:[<|im_end|> ] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc001d7d388 NUMA:false LoraAdapter: LoraBase: LoraAdapters:[] LoraScales:[] LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: LoadFormat: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 DisableLogStatus:false DType: LimitMMPerPrompt:{LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0} MMProj: FlashAttention:false NoKVOffloading:false CacheTypeK: CacheTypeV: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 CFGScale:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: AudioPath:} CUDA:false DownloadFiles:[{Filename:localai-functioncall-phi-4-v0.3-q4_k_m.gguf SHA256:23fee048ded2a6e2e1a7b6bbefa6cbf83068f194caa9552aecbaa00fec8a16d5 URI:huggingface://mudler/LocalAI-functioncall-phi-4-v0.3-Q4_K_M-GGUF/localai-functioncall-phi-4-v0.3-q4_k_m.gguf}] Description: Usage: Options:[]}
6:54PM DBG templated message for chat: <|im_start|>user
testing
<|im_end|>

6:54PM DBG Prompt (before templating): <|im_start|>user
testing
<|im_end|>

6:54PM DBG Template found, input modified to: <|im_start|>user
testing
<|im_end|>
<|im_start|>assistant

6:54PM DBG Prompt (after templating): <|im_start|>user
testing
<|im_end|>
<|im_start|>assistant

6:54PM DBG Stream request received
6:54PM DBG Sending chunk: {"created":1745283241,"object":"chat.completion.chunk","id":"24a747e2-e6f4-4fd4-afb7-e9adf1b21907","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"role":"assistant","content":""}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

6:54PM DBG Loading from the following backends (in order): [llama-cpp llama-cpp-fallback piper silero-vad stablediffusion-ggml whisper bark-cpp huggingface /build/backend/python/exllama2/run.sh /build/backend/python/diffusers/run.sh /build/backend/python/kokoro/run.sh /build/backend/python/rerankers/run.sh /build/backend/python/coqui/run.sh /build/backend/python/autogptq/run.sh /build/backend/python/bark/run.sh /build/backend/python/faster-whisper/run.sh /build/backend/python/vllm/run.sh /build/backend/python/transformers/run.sh]
6:54PM INF Trying to load the model 'gpt-4' with the backend '[llama-cpp llama-cpp-fallback piper silero-vad stablediffusion-ggml whisper bark-cpp huggingface /build/backend/python/exllama2/run.sh /build/backend/python/diffusers/run.sh /build/backend/python/kokoro/run.sh /build/backend/python/rerankers/run.sh /build/backend/python/coqui/run.sh /build/backend/python/autogptq/run.sh /build/backend/python/bark/run.sh /build/backend/python/faster-whisper/run.sh /build/backend/python/vllm/run.sh /build/backend/python/transformers/run.sh]'
6:54PM INF [llama-cpp] Attempting to load
6:54PM INF BackendLoader starting backend=llama-cpp modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: llama-cpp): {backendString:llama-cpp model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000c66008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:54PM INF Success ip=192.168.1.104 latency=2.058584ms method=POST status=200 url=/v1/chat/completions
6:54PM DBG Nvidia GPU device found, no embedded CUDA variant found. You can ignore this message if you are using container with CUDA support
6:54PM DBG [llama-cpp-fallback] llama-cpp variant available
6:54PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-fallback
6:54PM DBG GRPC Service for gpt-4 will be running at: '127.0.0.1:43453'
6:54PM DBG GRPC Service state dir: /tmp/go-processmanager429258888
6:54PM DBG GRPC Service Started
6:54PM DBG Wait for the service to start up
6:54PM DBG Options: ContextSize:4096 Seed:1764292306 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:4
6:54PM DBG GRPC(gpt-4-127.0.0.1:43453): stderr WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
6:54PM DBG GRPC(gpt-4-127.0.0.1:43453): stderr I0000 00:00:1745283241.443659 264 config.cc:230] gRPC experiments enabled: call_status_override_on_cancellation, event_engine_dns, event_engine_listener, http2_stats_fix, monitoring_experiment, pick_first_new, trace_record_callops, work_serializer_clears_time_cache, work_serializer_dispatch
6:54PM DBG GRPC(gpt-4-127.0.0.1:43453): stderr I0000 00:00:1745283241.444116 264 ev_epoll1_linux.cc:125] grpc epoll fd: 3
6:54PM DBG GRPC(gpt-4-127.0.0.1:43453): stderr I0000 00:00:1745283241.444930 264 server_builder.cc:392] Synchronous server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000
6:54PM DBG GRPC(gpt-4-127.0.0.1:43453): stderr I0000 00:00:1745283241.446893 264 ev_epoll1_linux.cc:359] grpc epoll fd: 5
6:54PM DBG GRPC(gpt-4-127.0.0.1:43453): stderr I0000 00:00:1745283241.447720 264 tcp_socket_utils.cc:634] TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter
6:54PM DBG GRPC(gpt-4-127.0.0.1:43453): stdout Server listening on 127.0.0.1:43453
6:54PM DBG GRPC Service Ready
6:54PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc0007b9958} sizeCache:0 unknownFields:[] Model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf ContextSize:4096 Seed:1764292306 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
6:54PM DBG GRPC(gpt-4-127.0.0.1:43453): stderr ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
6:54PM DBG GRPC(gpt-4-127.0.0.1:43453): stderr ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
6:54PM DBG GRPC(gpt-4-127.0.0.1:43453): stderr ggml_cuda_init: found 1 CUDA devices:
6:54PM DBG GRPC(gpt-4-127.0.0.1:43453): stderr Device 0: NVIDIA RTX A2000 12GB, compute capability 8.6, VMM: yes
6:54PM DBG GRPC(gpt-4-127.0.0.1:43453): stderr llama_model_load_from_file_impl: using device CUDA0 (NVIDIA RTX A2000 12GB) - 11811 MiB free
6:54PM DBG GRPC(gpt-4-127.0.0.1:43453): stderr gguf_init_from_file: failed to open GGUF file '/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf'
6:54PM DBG GRPC(gpt-4-127.0.0.1:43453): stderr llama_model_load: error loading model: llama_model_loader: failed to load model from /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG GRPC(gpt-4-127.0.0.1:43453): stderr
6:54PM DBG GRPC(gpt-4-127.0.0.1:43453): stderr llama_model_load_from_file_impl: failed to load model
6:54PM DBG GRPC(gpt-4-127.0.0.1:43453): stderr common_init_from_params: failed to load model '/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf'
6:54PM ERR [llama-cpp] Failed loading model, trying with fallback 'llama-cpp-fallback', error: failed to load model with internal loader: could not load model: rpc error: code = Canceled desc =
6:54PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: llama-cpp-fallback): {backendString:llama-cpp model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000c66008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:54PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-fallback
6:54PM DBG GRPC Service for gpt-4 will be running at: '127.0.0.1:35971'
6:54PM DBG GRPC Service state dir: /tmp/go-processmanager1195233851
6:54PM DBG GRPC Service Started
6:54PM DBG Wait for the service to start up
6:54PM DBG Options: ContextSize:4096 Seed:1764292306 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:4
6:54PM DBG GRPC(gpt-4-127.0.0.1:35971): stderr WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
6:54PM DBG GRPC(gpt-4-127.0.0.1:35971): stderr I0000 00:00:1745283245.132552 280 config.cc:230] gRPC experiments enabled: call_status_override_on_cancellation, event_engine_dns, event_engine_listener, http2_stats_fix, monitoring_experiment, pick_first_new, trace_record_callops, work_serializer_clears_time_cache, work_serializer_dispatch
6:54PM DBG GRPC(gpt-4-127.0.0.1:35971): stderr I0000 00:00:1745283245.132883 280 ev_epoll1_linux.cc:125] grpc epoll fd: 3
6:54PM DBG GRPC(gpt-4-127.0.0.1:35971): stderr I0000 00:00:1745283245.133699 280 server_builder.cc:392] Synchronous server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000
6:54PM DBG GRPC(gpt-4-127.0.0.1:35971): stderr I0000 00:00:1745283245.135564 280 ev_epoll1_linux.cc:359] grpc epoll fd: 5
6:54PM DBG GRPC(gpt-4-127.0.0.1:35971): stderr I0000 00:00:1745283245.136400 280 tcp_socket_utils.cc:634] TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter
6:54PM DBG GRPC(gpt-4-127.0.0.1:35971): stdout Server listening on 127.0.0.1:35971
6:54PM DBG GRPC Service Ready
6:54PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc0007b9958} sizeCache:0 unknownFields:[] Model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf ContextSize:4096 Seed:1764292306 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
6:54PM DBG GRPC(gpt-4-127.0.0.1:35971): stderr ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
6:54PM DBG GRPC(gpt-4-127.0.0.1:35971): stderr ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
6:54PM DBG GRPC(gpt-4-127.0.0.1:35971): stderr ggml_cuda_init: found 1 CUDA devices:
6:54PM DBG GRPC(gpt-4-127.0.0.1:35971): stderr Device 0: NVIDIA RTX A2000 12GB, compute capability 8.6, VMM: yes
6:54PM DBG GRPC(gpt-4-127.0.0.1:35971): stderr llama_model_load_from_file_impl: using device CUDA0 (NVIDIA RTX A2000 12GB) - 11811 MiB free
6:54PM DBG GRPC(gpt-4-127.0.0.1:35971): stderr gguf_init_from_file: failed to open GGUF file '/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf'
6:54PM DBG GRPC(gpt-4-127.0.0.1:35971): stderr llama_model_load: error loading model: llama_model_loader: failed to load model from /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG GRPC(gpt-4-127.0.0.1:35971): stderr
6:54PM DBG GRPC(gpt-4-127.0.0.1:35971): stderr llama_model_load_from_file_impl: failed to load model
6:54PM DBG GRPC(gpt-4-127.0.0.1:35971): stderr common_init_from_params: failed to load model '/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf'
6:54PM INF [llama-cpp] Fails: failed to load model with internal loader: could not load model: rpc error: code = Canceled desc =
6:54PM INF [llama-cpp-fallback] Attempting to load
6:54PM INF BackendLoader starting backend=llama-cpp-fallback modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: llama-cpp-fallback): {backendString:llama-cpp-fallback model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000c66008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:54PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-fallback
6:54PM DBG GRPC Service for gpt-4 will be running at: '127.0.0.1:33861'
6:54PM DBG GRPC Service state dir: /tmp/go-processmanager764900467
6:54PM DBG GRPC Service Started
6:54PM DBG Wait for the service to start up
6:54PM DBG Options: ContextSize:4096 Seed:1764292306 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:4
6:54PM DBG GRPC(gpt-4-127.0.0.1:33861): stderr WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
6:54PM DBG GRPC(gpt-4-127.0.0.1:33861): stderr I0000 00:00:1745283248.805860 296 config.cc:230] gRPC experiments enabled: call_status_override_on_cancellation, event_engine_dns, event_engine_listener, http2_stats_fix, monitoring_experiment, pick_first_new, trace_record_callops, work_serializer_clears_time_cache, work_serializer_dispatch
6:54PM DBG GRPC(gpt-4-127.0.0.1:33861): stderr I0000 00:00:1745283248.807242 296 ev_epoll1_linux.cc:125] grpc epoll fd: 3
6:54PM DBG GRPC(gpt-4-127.0.0.1:33861): stderr I0000 00:00:1745283248.807950 296 server_builder.cc:392] Synchronous server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000
6:54PM DBG GRPC(gpt-4-127.0.0.1:33861): stderr I0000 00:00:1745283248.810207 296 ev_epoll1_linux.cc:359] grpc epoll fd: 6
6:54PM DBG GRPC(gpt-4-127.0.0.1:33861): stdout Server listening on 127.0.0.1:33861
6:54PM DBG GRPC(gpt-4-127.0.0.1:33861): stderr I0000 00:00:1745283248.811109 296 tcp_socket_utils.cc:634] TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter
6:54PM DBG GRPC Service Ready
6:54PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc0007b9958} sizeCache:0 unknownFields:[] Model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf ContextSize:4096 Seed:1764292306 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
6:54PM INF Success ip=127.0.0.1 latency="25.934µs" method=GET status=200 url=/readyz
6:54PM DBG GRPC(gpt-4-127.0.0.1:33861): stderr ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
6:54PM DBG GRPC(gpt-4-127.0.0.1:33861): stderr ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
6:54PM DBG GRPC(gpt-4-127.0.0.1:33861): stderr ggml_cuda_init: found 1 CUDA devices:
6:54PM DBG GRPC(gpt-4-127.0.0.1:33861): stderr Device 0: NVIDIA RTX A2000 12GB, compute capability 8.6, VMM: yes
6:54PM DBG GRPC(gpt-4-127.0.0.1:33861): stderr llama_model_load_from_file_impl: using device CUDA0 (NVIDIA RTX A2000 12GB) - 11811 MiB free
6:54PM DBG GRPC(gpt-4-127.0.0.1:33861): stderr gguf_init_from_file: failed to open GGUF file '/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf'
6:54PM DBG GRPC(gpt-4-127.0.0.1:33861): stderr llama_model_load: error loading model: llama_model_loader: failed to load model from /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG GRPC(gpt-4-127.0.0.1:33861): stderr
6:54PM DBG GRPC(gpt-4-127.0.0.1:33861): stderr llama_model_load_from_file_impl: failed to load model
6:54PM DBG GRPC(gpt-4-127.0.0.1:33861): stderr common_init_from_params: failed to load model '/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf'
6:54PM INF [llama-cpp-fallback] Fails: failed to load model with internal loader: could not load model: rpc error: code = Canceled desc =
6:54PM INF [piper] Attempting to load
6:54PM INF BackendLoader starting backend=piper modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: piper): {backendString:piper model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000c66008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:54PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/piper
6:54PM DBG GRPC Service for gpt-4 will be running at: '127.0.0.1:43807'
6:54PM DBG GRPC Service state dir: /tmp/go-processmanager1539370462
6:54PM DBG GRPC Service Started
6:54PM DBG Wait for the service to start up
6:54PM DBG Options: ContextSize:4096 Seed:1764292306 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:4 LibrarySearchPath:"/tmp/localai/backend_data/backend-assets/espeak-ng-data"
6:54PM DBG GRPC(gpt-4-127.0.0.1:43807): stderr 2025/04/21 18:54:12 gRPC Server listening at 127.0.0.1:43807
6:54PM DBG GRPC Service Ready
6:54PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc0007b9958} sizeCache:0 unknownFields:[] Model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf ContextSize:4096 Seed:1764292306 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/espeak-ng-data RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
6:54PM INF [piper] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = unsupported model type /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf (should end with .onnx)
6:54PM INF [silero-vad] Attempting to load
6:54PM INF BackendLoader starting backend=silero-vad modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: silero-vad): {backendString:silero-vad model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000c66008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:54PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/silero-vad
6:54PM DBG GRPC Service for gpt-4 will be running at: '127.0.0.1:33525'
6:54PM DBG GRPC Service state dir: /tmp/go-processmanager1286463254
6:54PM DBG GRPC Service Started
6:54PM DBG Wait for the service to start up
6:54PM DBG Options: ContextSize:4096 Seed:1764292306 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:4 LibrarySearchPath:"/tmp/localai/backend_data/backend-assets/espeak-ng-data"
6:54PM DBG GRPC(gpt-4-127.0.0.1:33525): stderr 2025/04/21 18:54:14 gRPC Server listening at 127.0.0.1:33525
6:54PM DBG GRPC Service Ready
6:54PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc0007b9958} sizeCache:0 unknownFields:[] Model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf ContextSize:4096 Seed:1764292306 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/espeak-ng-data RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
6:54PM INF [silero-vad] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = create silero detector: failed to create session: Load model from /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf failed:Load model /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf failed. File doesn't exist
6:54PM INF [stablediffusion-ggml] Attempting to load
6:54PM INF BackendLoader starting backend=stablediffusion-ggml modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: stablediffusion-ggml): {backendString:stablediffusion-ggml model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000c66008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:54PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/stablediffusion-ggml
6:54PM DBG GRPC Service for gpt-4 will be running at: '127.0.0.1:35901'
6:54PM DBG GRPC Service state dir: /tmp/go-processmanager3865121830
6:54PM DBG GRPC Service Started
6:54PM DBG Wait for the service to start up
6:54PM DBG Options: ContextSize:4096 Seed:1764292306 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:4 LibrarySearchPath:"/tmp/localai/backend_data/backend-assets/espeak-ng-data"
6:54PM DBG GRPC(gpt-4-127.0.0.1:35901): stderr 2025/04/21 18:54:16 gRPC Server listening at 127.0.0.1:35901
6:54PM DBG GRPC Service Ready
6:54PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc0007b9958} sizeCache:0 unknownFields:[] Model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf ContextSize:4096 Seed:1764292306 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/espeak-ng-data RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
6:54PM DBG GRPC(gpt-4-127.0.0.1:35901): stderr Options: []
6:54PM DBG GRPC(gpt-4-127.0.0.1:35901): stderr Loading model!
6:54PM DBG GRPC(gpt-4-127.0.0.1:35901): stderr Invalid sample method, default to EULER_A!
6:54PM DBG GRPC(gpt-4-127.0.0.1:35901): stderr Invalid scheduler! using DEFAULT
6:54PM DBG GRPC(gpt-4-127.0.0.1:35901): stderr Creating context
6:54PM DBG GRPC(gpt-4-127.0.0.1:35901): stderr failed loading model (generic error)
6:54PM INF [stablediffusion-ggml] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = could not load model
6:54PM INF [whisper] Attempting to load
6:54PM INF BackendLoader starting backend=whisper modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: whisper): {backendString:whisper model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000c66008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:54PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/whisper
6:54PM DBG GRPC Service for gpt-4 will be running at: '127.0.0.1:33907'
6:54PM DBG GRPC Service state dir: /tmp/go-processmanager712217269
6:54PM DBG GRPC Service Started
6:54PM DBG Wait for the service to start up
6:54PM DBG Options: ContextSize:4096 Seed:1764292306 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:4 LibrarySearchPath:"/tmp/localai/backend_data/backend-assets/espeak-ng-data"
6:54PM DBG GRPC(gpt-4-127.0.0.1:33907): stderr 2025/04/21 18:54:18 gRPC Server listening at 127.0.0.1:33907
6:54PM DBG GRPC Service Ready
6:54PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc0007b9958} sizeCache:0 unknownFields:[] Model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf ContextSize:4096 Seed:1764292306 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/espeak-ng-data RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
6:54PM INF [whisper] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = stat /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf: no such file or directory
6:54PM INF [bark-cpp] Attempting to load
6:54PM INF BackendLoader starting backend=bark-cpp modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: bark-cpp): {backendString:bark-cpp model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000c66008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:54PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/bark-cpp
6:54PM DBG GRPC Service for gpt-4 will be running at: '127.0.0.1:46849'
6:54PM DBG GRPC Service state dir: /tmp/go-processmanager25318322
6:54PM DBG GRPC Service Started
6:54PM DBG Wait for the service to start up
6:54PM DBG Options: ContextSize:4096 Seed:1764292306 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:4 LibrarySearchPath:"/tmp/localai/backend_data/backend-assets/espeak-ng-data"
6:54PM DBG GRPC(gpt-4-127.0.0.1:46849): stderr 2025/04/21 18:54:20 gRPC Server listening at 127.0.0.1:46849
6:54PM DBG GRPC Service Ready
6:54PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc0007b9958} sizeCache:0 unknownFields:[] Model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf ContextSize:4096 Seed:1764292306 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/espeak-ng-data RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
6:54PM DBG GRPC(gpt-4-127.0.0.1:46849): stderr bark_load_model_from_file: failed to open '/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf'
6:54PM DBG GRPC(gpt-4-127.0.0.1:46849): stderr bark_load_model: failed to load model weights from '/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf'
6:54PM DBG GRPC(gpt-4-127.0.0.1:46849): stderr load_model: Could not load model
6:54PM INF [bark-cpp] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = inference failed
6:54PM INF [huggingface] Attempting to load
6:54PM INF BackendLoader starting backend=huggingface modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: huggingface): {backendString:huggingface model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000c66008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:54PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/huggingface
6:54PM DBG GRPC Service for gpt-4 will be running at: '127.0.0.1:33539'
6:54PM DBG GRPC Service state dir: /tmp/go-processmanager3401355797
6:54PM DBG GRPC Service Started
6:54PM DBG Wait for the service to start up
6:54PM DBG Options: ContextSize:4096 Seed:1764292306 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:4 LibrarySearchPath:"/tmp/localai/backend_data/backend-assets/espeak-ng-data"
6:54PM DBG GRPC(gpt-4-127.0.0.1:33539): stderr 2025/04/21 18:54:22 gRPC Server listening at 127.0.0.1:33539
6:54PM DBG GRPC Service Ready
6:54PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc0007b9958} sizeCache:0 unknownFields:[] Model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf ContextSize:4096 Seed:1764292306 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/espeak-ng-data RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
6:54PM INF [huggingface] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = no huggingface token provided
6:54PM INF [/build/backend/python/exllama2/run.sh] Attempting to load
6:54PM INF BackendLoader starting backend=/build/backend/python/exllama2/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/exllama2/run.sh): {backendString:/build/backend/python/exllama2/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000c66008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:54PM INF [/build/backend/python/exllama2/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/exllama2/run.sh
6:54PM INF [/build/backend/python/diffusers/run.sh] Attempting to load
6:54PM INF BackendLoader starting backend=/build/backend/python/diffusers/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/diffusers/run.sh): {backendString:/build/backend/python/diffusers/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000c66008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:54PM INF [/build/backend/python/diffusers/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/diffusers/run.sh
6:54PM INF [/build/backend/python/kokoro/run.sh] Attempting to load
6:54PM INF BackendLoader starting backend=/build/backend/python/kokoro/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/kokoro/run.sh): {backendString:/build/backend/python/kokoro/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000c66008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:54PM INF [/build/backend/python/kokoro/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/kokoro/run.sh
6:54PM INF [/build/backend/python/rerankers/run.sh] Attempting to load
6:54PM INF BackendLoader starting backend=/build/backend/python/rerankers/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/rerankers/run.sh): {backendString:/build/backend/python/rerankers/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000c66008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:54PM INF [/build/backend/python/rerankers/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/rerankers/run.sh
6:54PM INF [/build/backend/python/coqui/run.sh] Attempting to load
6:54PM INF BackendLoader starting backend=/build/backend/python/coqui/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/coqui/run.sh): {backendString:/build/backend/python/coqui/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000c66008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:54PM INF [/build/backend/python/coqui/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/coqui/run.sh
6:54PM INF [/build/backend/python/autogptq/run.sh] Attempting to load
6:54PM INF BackendLoader starting backend=/build/backend/python/autogptq/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/autogptq/run.sh): {backendString:/build/backend/python/autogptq/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000c66008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:54PM INF [/build/backend/python/autogptq/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/autogptq/run.sh
6:54PM INF [/build/backend/python/bark/run.sh] Attempting to load
6:54PM INF BackendLoader starting backend=/build/backend/python/bark/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/bark/run.sh): {backendString:/build/backend/python/bark/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000c66008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:54PM INF [/build/backend/python/bark/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/bark/run.sh
6:54PM INF [/build/backend/python/faster-whisper/run.sh] Attempting to load
6:54PM INF BackendLoader starting backend=/build/backend/python/faster-whisper/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/faster-whisper/run.sh): {backendString:/build/backend/python/faster-whisper/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000c66008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:54PM INF [/build/backend/python/faster-whisper/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/faster-whisper/run.sh
6:54PM INF [/build/backend/python/vllm/run.sh] Attempting to load
6:54PM INF BackendLoader starting backend=/build/backend/python/vllm/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/vllm/run.sh): {backendString:/build/backend/python/vllm/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000c66008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:54PM INF [/build/backend/python/vllm/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/vllm/run.sh
6:54PM INF [/build/backend/python/transformers/run.sh] Attempting to load
6:54PM INF BackendLoader starting backend=/build/backend/python/transformers/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
6:54PM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/transformers/run.sh): {backendString:/build/backend/python/transformers/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000c66008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
6:54PM INF [/build/backend/python/transformers/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/transformers/run.sh

@richiejp
Collaborator

Does the previous version work for you? Did you install from Docker or somewhere else? If you are using NVIDIA, you could try v2.27.0-cublas-cuda12-ffmpeg to see if this is a new bug.

I'm not sure that in either case the log contains the root cause. Could you upload the full log as an attachment?
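
If it helps, something along these lines should pull the older image and capture a complete debug log to a file (a rough sketch assuming a Docker setup with GPU support; adjust the port and the models volume to match yours):

```bash
# Pull the previous release to check whether this is a regression
docker pull localai/localai:v2.27.0-cublas-cuda12-ffmpeg

# Run it with debug logging enabled and keep the full output on disk
docker run --rm --gpus all -p 8080:8080 \
  -v $PWD/models:/build/models \
  -e DEBUG=true \
  localai/localai:v2.27.0-cublas-cuda12-ffmpeg 2>&1 | tee local-ai-full.log
```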

@bstone108

> Does the previous version work for you? Did you install from Docker or somewhere else? If you are using NVIDIA, you could try v2.27.0-cublas-cuda12-ffmpeg to see if this is a new bug.
>
> I'm not sure that in either case the log contains the root cause. Could you upload the full log as an attachment?

I'm not sure who you were replying to, but in my original message I stated how I was running it, which image, and that it was under NVIDIA, so I'll assume it was to the other guy. I gave as much of the log as I could recover from the virtual machine; it ran beyond the log buffer, so I'll need to obtain more another way if you were asking me for more.
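
Something like this should recover the complete container log instead of just the terminal scrollback, assuming the instance runs under Docker (`localai` here is a placeholder for the actual container name):

```bash
# Dump everything the container has logged so far into a file
docker logs localai > local-ai-full.log 2>&1

# Or follow the log live while reproducing the failure
docker logs -f localai 2>&1 | tee local-ai-full.log
```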

@bstone108

I found part of the problem on my end: it's attempting to load /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf, which doesn't exist, even though it's one of the default bundled models. Is the auto-download failing? Perhaps the URL it's looking for is broken?
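
Something like this should confirm whether the download ever completed, assuming the default Docker layout with models under /build/models (`localai` is a placeholder container name):

```bash
# List the models directory inside the container
docker exec localai ls -lh /build/models/

# A missing or zero-byte file here points at a failed auto-download
# rather than a loader bug
docker exec localai ls -lh /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
```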

@richiejp
Collaborator

@bstone108 Yes, more logs are needed, and I think you most likely have a different issue. You can also try using the previous version to see if it is a regression.

@tescophil
Author

tescophil commented Apr 22, 2025

As the OP: the second poster doesn't have the same problem as me. I have a simple install on an x86 PC with no acceleration hardware, and a single downloaded model which fails to load. Any suggestions?

Attached is the debug log from startup, then opening the local web interface, selecting chat, and asking a question.

local-ai.log

@XueSheng-GIT

I'm facing the same issue after updating from v2.27.0-aio-gpu-nvidia-cuda-12 to v2.28.0-aio-gpu-nvidia-cuda-12. I tried a clean install of both versions without success, but switching to a clean install of v2.26.0-aio-gpu-nvidia-cuda-12 makes it work again.

@simonmaeldev

I'm on localai/localai:latest-aio-gpu-nvidia-cuda-12 (so v2.28.0 at the time of writing) and can't make it work either. At first I tried without the AIO image, but it didn't work, so I switched to AIO.

Switching to a clean install of v2.26.0-aio-gpu-nvidia-cuda-12 makes it work again, as @XueSheng-GIT said (thanks). I did not try v2.27.0.
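
For anyone else pinning the workaround, this is roughly the docker run equivalent of my compose service (a sketch; the volume and port are from my setup, adjust to yours):

```bash
# Pin the last tag that works here instead of :latest
docker pull localai/localai:v2.26.0-aio-gpu-nvidia-cuda-12

docker run -d --name localai --gpus all -p 8080:8080 \
  -v $PWD/models:/build/models \
  localai/localai:v2.26.0-aio-gpu-nvidia-cuda-12
```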

The log trace for v2.28.0 is quite long. I have:

  • a bunch of stderr
  • [silero-vad] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = create silero detector: failed to create session: Load model from /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf failed:Load model /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf failed. File doesn't exist
  • localai-1 | 10:41AM INF [huggingface] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = no huggingface token provided
  • localai-1 | 10:41AM INF [/build/backend/python/bark/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/bark/run.sh

Here is the full stack trace, starting from when I made the call to the "gpt-4" model via the web UI:

localai-1  |  UseTokenizerTemplate:false JoinChatMessagesByCharacter:<nil> Multimodal: JinjaTemplate:false ReplyPrefix:} KnownUsecaseStrings:[FLAG_ANY FLAG_COMPLETION FLAG_CHAT] KnownUsecases:<nil> PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder:name,arguments SchemaType: GrammarTriggers:[]} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[(?s)<Output>(.*?)</Output>] ArgumentRegex:[] ArgumentRegexKey: ArgumentRegexValue: ReplaceFunctionResults:[] ReplaceLLMResult:[{Key:(?s)<Thought>(.*?)</Thought> Value:}] CaptureLLMResult:[(?s)<Thought>(.*?)</Thought>] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000951e88 MirostatTAU:0xc000951e80 Mirostat:0xc000951e78 NGPULayers:0xc000951ea8 MMap:0xc000951de2 MMlock:0xc000951eb1 LowVRAM:0xc000951eb1 Grammar: StopWords:[<|im_end|> <dummy32000> </s>] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc000951cd8 NUMA:false LoraAdapter: LoraBase: LoraAdapters:[] LoraScales:[] LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: LoadFormat: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 DisableLogStatus:false DType: LimitMMPerPrompt:{LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0} MMProj: FlashAttention:false NoKVOffloading:false CacheTypeK: CacheTypeV: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 CFGScale:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: AudioPath:} CUDA:false DownloadFiles:[{Filename:localai-functioncall-phi-4-v0.3-q4_k_m.gguf SHA256:23fee048ded2a6e2e1a7b6bbefa6cbf83068f194caa9552aecbaa00fec8a16d5 URI:huggingface://mudler/LocalAI-functioncall-phi-4-v0.3-Q4_K_M-GGUF/localai-functioncall-phi-4-v0.3-q4_k_m.gguf}] Description: Usage: Options:[]}
localai-1  | 10:41AM DBG templated message for chat: <|im_start|>user
localai-1  | hello, what is your name?
localai-1  | <|im_end|>
localai-1  |
localai-1  | 10:41AM DBG Prompt (before templating): <|im_start|>user
localai-1  | hello, what is your name?
localai-1  | <|im_end|>
localai-1  |
localai-1  | 10:41AM DBG Template found, input modified to: <|im_start|>user
localai-1  | hello, what is your name?
localai-1  | <|im_end|>
localai-1  | <|im_start|>assistant
localai-1  |
localai-1  | 10:41AM DBG Prompt (after templating): <|im_start|>user
localai-1  | hello, what is your name?
localai-1  | <|im_end|>
localai-1  | <|im_start|>assistant
localai-1  |
localai-1  | 10:41AM DBG Stream request received
localai-1  | 10:41AM INF Success ip=172.18.0.1 latency=2.966943ms method=POST status=200 url=/v1/chat/completions
localai-1  | 10:41AM DBG Sending chunk: {"created":1745923262,"object":"chat.completion.chunk","id":"c0eadbdd-3627-416a-8f74-05d8ac7dd49f","model":"gpt-4","choices":[{"index":0,"finish_reason":"","delta":{"role":"assistant","content":""}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
localai-1  |
localai-1  | 10:41AM DBG Loading from the following backends (in order): [llama-cpp llama-cpp-fallback piper silero-vad stablediffusion-ggml whisper bark-cpp huggingface /build/backend/python/bark/run.sh /build/backend/python/faster-whisper/run.sh /build/backend/python/kokoro/run.sh /build/backend/python/vllm/run.sh /build/backend/python/coqui/run.sh /build/backend/python/diffusers/run.sh /build/backend/python/exllama2/run.sh /build/backend/python/rerankers/run.sh /build/backend/python/autogptq/run.sh /build/backend/python/transformers/run.sh]
localai-1  | 10:41AM INF Trying to load the model 'gpt-4' with the backend '[llama-cpp llama-cpp-fallback piper silero-vad stablediffusion-ggml whisper bark-cpp huggingface /build/backend/python/bark/run.sh /build/backend/python/faster-whisper/run.sh /build/backend/python/kokoro/run.sh /build/backend/python/vllm/run.sh /build/backend/python/coqui/run.sh /build/backend/python/diffusers/run.sh /build/backend/python/exllama2/run.sh /build/backend/python/rerankers/run.sh /build/backend/python/autogptq/run.sh /build/backend/python/transformers/run.sh]'
localai-1  | 10:41AM INF [llama-cpp] Attempting to load
localai-1  | 10:41AM INF BackendLoader starting backend=llama-cpp modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: llama-cpp): {backendString:llama-cpp model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000d34008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}   
localai-1  | WARNING: failed to determine nodes: open /sys/devices/system/node: no such file or directory
localai-1  | WARNING: failed to read int from file: open /sys/class/drm/card0/device/numa_node: no such file or directory
localai-1  | WARNING: failed to determine nodes: open /sys/devices/system/node: no such file or directory
localai-1  | WARNING: error parsing the pci address "vgem"
localai-1  | 10:41AM DBG [llama-cpp-fallback] llama-cpp variant available
localai-1  | 10:41AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-fallback
localai-1  | 10:41AM DBG GRPC Service for gpt-4 will be running at: '127.0.0.1:45989'
localai-1  | 10:41AM DBG GRPC Service state dir: /tmp/go-processmanager1817219009
localai-1  | 10:41AM DBG GRPC Service Started
localai-1  | 10:41AM DBG Wait for the service to start up
localai-1  | 10:41AM DBG Options: ContextSize:4096 Seed:1146640031 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:16
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:45989): stderr WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:45989): stderr I0000 00:00:1745923262.433796      99 config.cc:230] gRPC experiments enabled: call_status_override_on_cancellation, event_engine_dns, event_engine_listener, http2_stats_fix, monitoring_experiment, pick_first_new, trace_record_callops, work_serializer_clears_time_cache, work_serializer_dispatch
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:45989): stderr I0000 00:00:1745923262.434583      99 ev_epoll1_linux.cc:125] grpc epoll fd: 3       
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:45989): stderr I0000 00:00:1745923262.434721      99 server_builder.cc:392] Synchronous server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:45989): stderr I0000 00:00:1745923262.437096      99 ev_epoll1_linux.cc:359] grpc epoll fd: 5       
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:45989): stderr I0000 00:00:1745923262.437340      99 tcp_socket_utils.cc:634] TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:45989): stdout Server listening on 127.0.0.1:45989
localai-1  | 10:41AM DBG GRPC Service Ready
localai-1  | 10:41AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc00083d958} sizeCache:0 unknownFields:[] Model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf ContextSize:4096 Seed:1146640031 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:16 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:45989): stderr ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:45989): stderr ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:45989): stderr ggml_cuda_init: found 1 CUDA devices:
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:45989): stderr   Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:45989): stderr llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4090) - 22994 MiB free
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:45989): stderr gguf_init_from_file: failed to open GGUF file '/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf'
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:45989): stderr llama_model_load: error loading model: llama_model_loader: failed to load model from /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:45989): stderr
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:45989): stderr llama_model_load_from_file_impl: failed to load model
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:45989): stderr common_init_from_params: failed to load model '/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf'
localai-1  | 10:41AM ERR [llama-cpp] Failed loading model, trying with fallback 'llama-cpp-fallback', error: failed to load model with internal loader: could not load model: rpc error: code = Canceled desc = 
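The gguf_init_from_file line above looks like the real failure: llama.cpp cannot even open the file, before any model parsing happens, which points at the filesystem rather than the backend. Worth checking from the host that the file actually exists and is non-empty inside the container (the service name "localai" is an assumption based on the localai-1 prefix):

# does the model file exist inside the container, and how big is it?
docker compose exec localai ls -lh /build/models/
docker compose exec localai stat /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf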
localai-1  | 10:41AM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: llama-cpp-fallback): {backendString:llama-cpp model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000d34008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
localai-1  | 10:41AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-fallback
localai-1  | 10:41AM DBG GRPC Service for gpt-4 will be running at: '127.0.0.1:36865'
localai-1  | 10:41AM DBG GRPC Service state dir: /tmp/go-processmanager2385696971
localai-1  | 10:41AM DBG GRPC Service Started
localai-1  | 10:41AM DBG Wait for the service to start up
localai-1  | 10:41AM DBG Options: ContextSize:4096 Seed:1146640031 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:16
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:36865): stderr WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:36865): stderr I0000 00:00:1745923264.697749     127 config.cc:230] gRPC experiments enabled: call_status_override_on_cancellation, event_engine_dns, event_engine_listener, http2_stats_fix, monitoring_experiment, pick_first_new, trace_record_callops, work_serializer_clears_time_cache, work_serializer_dispatch
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:36865): stderr I0000 00:00:1745923264.697987     127 ev_epoll1_linux.cc:125] grpc epoll fd: 3       
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:36865): stderr I0000 00:00:1745923264.698466     127 server_builder.cc:392] Synchronous server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:36865): stderr I0000 00:00:1745923264.700325     127 ev_epoll1_linux.cc:359] grpc epoll fd: 5       
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:36865): stderr I0000 00:00:1745923264.700586     127 tcp_socket_utils.cc:634] TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:36865): stdout Server listening on 127.0.0.1:36865
localai-1  | 10:41AM DBG GRPC Service Ready
localai-1  | 10:41AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc00083d958} sizeCache:0 unknownFields:[] Model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf ContextSize:4096 Seed:1146640031 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:16 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:36865): stderr ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:36865): stderr ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:36865): stderr ggml_cuda_init: found 1 CUDA devices:
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:36865): stderr   Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:36865): stderr llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4090) - 22994 MiB free
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:36865): stderr gguf_init_from_file: failed to open GGUF file '/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf'
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:36865): stderr llama_model_load: error loading model: llama_model_loader: failed to load model from /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:36865): stderr
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:36865): stderr llama_model_load_from_file_impl: failed to load model
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:36865): stderr common_init_from_params: failed to load model '/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf'
localai-1  | 10:41AM INF [llama-cpp] Fails: failed to load model with internal loader: could not load model: rpc error: code = Canceled desc =    
localai-1  | 10:41AM INF [llama-cpp-fallback] Attempting to load
localai-1  | 10:41AM INF BackendLoader starting backend=llama-cpp-fallback modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf 
localai-1  | 10:41AM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: llama-cpp-fallback): {backendString:llama-cpp-fallback model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000d34008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
localai-1  | 10:41AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-fallback
localai-1  | 10:41AM DBG GRPC Service for gpt-4 will be running at: '127.0.0.1:35511'
localai-1  | 10:41AM DBG GRPC Service state dir: /tmp/go-processmanager1618541458
localai-1  | 10:41AM DBG GRPC Service Started
localai-1  | 10:41AM DBG Wait for the service to start up
localai-1  | 10:41AM DBG Options: ContextSize:4096 Seed:1146640031 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:16
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:35511): stderr WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:35511): stderr I0000 00:00:1745923266.817606     155 config.cc:230] gRPC experiments enabled: call_status_override_on_cancellation, event_engine_dns, event_engine_listener, http2_stats_fix, monitoring_experiment, pick_first_new, trace_record_callops, work_serializer_clears_time_cache, work_serializer_dispatch
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:35511): stderr I0000 00:00:1745923266.817814     155 ev_epoll1_linux.cc:125] grpc epoll fd: 3       
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:35511): stderr I0000 00:00:1745923266.817935     155 server_builder.cc:392] Synchronous server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:35511): stderr I0000 00:00:1745923266.819789     155 ev_epoll1_linux.cc:359] grpc epoll fd: 5       
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:35511): stderr I0000 00:00:1745923266.820053     155 tcp_socket_utils.cc:634] TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:35511): stdout Server listening on 127.0.0.1:35511
localai-1  | 10:41AM DBG GRPC Service Ready
localai-1  | 10:41AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc00083d958} sizeCache:0 unknownFields:[] Model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf ContextSize:4096 Seed:1146640031 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:16 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:35511): stderr ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:35511): stderr ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:35511): stderr ggml_cuda_init: found 1 CUDA devices:
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:35511): stderr   Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:35511): stderr llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4090) - 22994 MiB free
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:35511): stderr gguf_init_from_file: failed to open GGUF file '/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf'
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:35511): stderr llama_model_load: error loading model: llama_model_loader: failed to load model from /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:35511): stderr
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:35511): stderr llama_model_load_from_file_impl: failed to load model
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:35511): stderr common_init_from_params: failed to load model '/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf'
localai-1  | 10:41AM INF [llama-cpp-fallback] Fails: failed to load model with internal loader: could not load model: rpc error: code = Canceled desc = 
localai-1  | 10:41AM INF [piper] Attempting to load
localai-1  | 10:41AM INF BackendLoader starting backend=piper modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: piper): {backendString:piper model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000d34008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
localai-1  | 10:41AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/piper
localai-1  | 10:41AM DBG GRPC Service for gpt-4 will be running at: '127.0.0.1:37067'
localai-1  | 10:41AM DBG GRPC Service state dir: /tmp/go-processmanager4046227675
localai-1  | 10:41AM DBG GRPC Service Started
localai-1  | 10:41AM DBG Wait for the service to start up
localai-1  | 10:41AM DBG Options: ContextSize:4096 Seed:1146640031 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:16 LibrarySearchPath:"/tmp/localai/backend_data/backend-assets/espeak-ng-data"
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:37067): stderr 2025/04/29 10:41:08 gRPC Server listening at 127.0.0.1:37067
localai-1  | 10:41AM DBG GRPC Service Ready
localai-1  | 10:41AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc00083d958} sizeCache:0 unknownFields:[] Model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf ContextSize:4096 Seed:1146640031 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:16 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/espeak-ng-data RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
localai-1  | 10:41AM INF [piper] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = unsupported model type /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf (should end with .onnx)
localai-1  | 10:41AM INF [silero-vad] Attempting to load
localai-1  | 10:41AM INF BackendLoader starting backend=silero-vad modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: silero-vad): {backendString:silero-vad model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000d34008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false} 
localai-1  | 10:41AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/silero-vad
localai-1  | 10:41AM DBG GRPC Service for gpt-4 will be running at: '127.0.0.1:37023'
localai-1  | 10:41AM DBG GRPC Service state dir: /tmp/go-processmanager1873321533
localai-1  | 10:41AM DBG GRPC Service Started
localai-1  | 10:41AM DBG Wait for the service to start up
localai-1  | 10:41AM DBG Options: ContextSize:4096 Seed:1146640031 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:16 LibrarySearchPath:"/tmp/localai/backend_data/backend-assets/espeak-ng-data"
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:37023): stderr 2025/04/29 10:41:10 gRPC Server listening at 127.0.0.1:37023
localai-1  | 10:41AM DBG GRPC Service Ready
localai-1  | 10:41AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc00083d958} sizeCache:0 unknownFields:[] Model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf ContextSize:4096 Seed:1146640031 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:16 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/espeak-ng-data RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
localai-1  | 10:41AM INF [silero-vad] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = create silero detector: failed to create session: Load model from /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf failed:Load model /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf failed. File doesn't exist
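The silero-vad failure is the most explicit of the lot: it does a plain file check and reports "File doesn't exist", so every backend in the cascade (including the stablediffusion-ggml segfault that follows) appears to be failing on a missing file, not on the model format. If the download was interrupted, re-applying the model from the gallery should restore it; the endpoint is LocalAI's documented gallery API, but the model id below is a guess from the file name:

# re-install the model via the gallery API (model id assumed)
curl http://localhost:8080/models/apply \
  -H "Content-Type: application/json" \
  -d '{"id": "localai@localai-functioncall-qwen2.5-7b-v0.5"}'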
localai-1  | 10:41AM INF [stablediffusion-ggml] Attempting to load
localai-1  | 10:41AM INF BackendLoader starting backend=stablediffusion-ggml modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: stablediffusion-ggml): {backendString:stablediffusion-ggml model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000d34008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
localai-1  | 10:41AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/stablediffusion-ggml
localai-1  | 10:41AM DBG GRPC Service for gpt-4 will be running at: '127.0.0.1:42251'
localai-1  | 10:41AM DBG GRPC Service state dir: /tmp/go-processmanager2852411407
localai-1  | 10:41AM DBG GRPC Service Started
localai-1  | 10:41AM DBG Wait for the service to start up
localai-1  | 10:41AM DBG Options: ContextSize:4096 Seed:1146640031 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:16 LibrarySearchPath:"/tmp/localai/backend_data/backend-assets/espeak-ng-data"
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr 2025/04/29 10:41:12 gRPC Server listening at 127.0.0.1:42251
localai-1  | 10:41AM DBG GRPC Service Ready
localai-1  | 10:41AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc00083d958} sizeCache:0 unknownFields:[] Model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf ContextSize:4096 Seed:1146640031 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:16 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/espeak-ng-data RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr Options: []
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr Loading model!
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr SIGSEGV: segmentation violation
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr PC=0x7f8f2b04036c m=0 sigcode=1 addr=0x2eb86458
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr signal arrived during cgo execution
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr goroutine 19 gp=0xc0001fe1c0 m=0 mp=0x12f61a0 [syscall]:
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.cgocall(0x878480, 0xc0000ed6a8)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/cgocall.go:167 +0x4b fp=0xc0000ed680 sp=0xc0000ed648 pc=0x47f4ab
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr main._Cfunc_load_model(0x2eba2090, 0x2eba8fd0, 0x10, 0x0)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    _cgo_gotypes.go:127 +0x4b fp=0xc0000ed6a8 sp=0xc0000ed680 pc=0x876ccb
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr main.(*SDGGML).Load.func2(0x2eba2090, 0x2eba8fd0, 0xb780e7?, 0x0)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /build/backend/go/image/stablediffusion-ggml/gosd.go:72 +0x5d fp=0xc0000ed6e0 sp=0xc0000ed6a8 pc=0x8774fd
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr main.(*SDGGML).Load(0xc0001f4c30, 0xc00017c308)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /build/backend/go/image/stablediffusion-ggml/gosd.go:72 +0x585 fp=0xc0000ed818 sp=0xc0000ed6e0 pc=0x8773a5
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr github.com/mudler/LocalAI/pkg/grpc.(*server).LoadModel(0xc000036f40, {0xb53e60?, 0xc0000b3900?}, 0xc00017c308)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /build/pkg/grpc/server.go:50 +0xd5 fp=0xc0000ed8c0 sp=0xc0000ed818 pc=0x83c8b5    
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr github.com/mudler/LocalAI/pkg/grpc/proto._Backend_LoadModel_Handler({0xb53e60, 0xc000036f40}, {0xc39dd8, 0xc000117350}, 0xc000148580, 0x0)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /build/pkg/grpc/proto/backend_grpc.pb.go:415 +0x1a6 fp=0xc0000ed910 sp=0xc0000ed8c0 pc=0x837de6
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr google.golang.org/grpc.(*Server).processUnaryRPC(0xc0001c2600, {0xc39dd8, 0xc0001172c0}, {0xc3d6e0, 0xc000334000}, 0xc000142a20, 0xc0001f4ea0, 0xfda8d0, 0x0)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/google.golang.org/[email protected]/server.go:1394 +0xe2b fp=0xc0000edda8 sp=0xc0000ed910 pc=0x81b86b
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr google.golang.org/grpc.(*Server).handleStream(0xc0001c2600, {0xc3d6e0, 0xc000334000}, 0xc000142a20)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/google.golang.org/[email protected]/server.go:1805 +0xe8b fp=0xc0000edf78 sp=0xc0000edda8 pc=0x82082b
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr google.golang.org/grpc.(*Server).serveStreams.func2.1()
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/google.golang.org/[email protected]/server.go:1029 +0x7f fp=0xc0000edfe0 sp=0xc0000edf78 pc=0x81957f
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.goexit({})
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0000edfe8 sp=0xc0000edfe0 pc=0x48d1c1
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 37       
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/google.golang.org/[email protected]/server.go:1040 +0x125
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:424 +0xce fp=0xc0001dfad8 sp=0xc0001dfab8 pc=0x4857ae
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.netpollblock(0xc0001dfb28?, 0x41dce6?, 0x0?)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/netpoll.go:575 +0xf7 fp=0xc0001dfb10 sp=0xc0001dfad8 pc=0x44a3b7
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr internal/poll.runtime_pollWait(0x7f8ee3cc7f08, 0x72)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/netpoll.go:351 +0x85 fp=0xc0001dfb30 sp=0xc0001dfb10 pc=0x484aa5
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr internal/poll.(*pollDesc).wait(0xc0001c0500?, 0x900000036?, 0x0)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0001dfb58 sp=0xc0001dfb30 pc=0x4fa5a7
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr internal/poll.(*pollDesc).waitRead(...)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/internal/poll/fd_poll_runtime.go:89
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr internal/poll.(*FD).Accept(0xc0001c0500)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/internal/poll/fd_unix.go:620 +0x295 fp=0xc0001dfc00 sp=0xc0001dfb58 pc=0x4ff975
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr net.(*netFD).accept(0xc0001c0500)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/net/fd_unix.go:172 +0x29 fp=0xc0001dfcb8 sp=0xc0001dfc00 pc=0x63d909
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr net.(*TCPListener).accept(0xc0000ccb00)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/net/tcpsock_posix.go:159 +0x1e fp=0xc0001dfd08 sp=0xc0001dfcb8 pc=0x65461e
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr net.(*TCPListener).Accept(0xc0000ccb00)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/net/tcpsock.go:372 +0x30 fp=0xc0001dfd38 sp=0xc0001dfd08 pc=0x6537f0
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr google.golang.org/grpc.(*Server).Serve(0xc0001c2600, {0xc395a0, 0xc0000ccb00})       
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/google.golang.org/[email protected]/server.go:884 +0x46c fp=0xc0001dfe98 sp=0xc0001dfd38 pc=0x81804c
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr github.com/mudler/LocalAI/pkg/grpc.StartServer({0x7ffff2bbfa92?, 0xc0000240d0?}, {0xc40250, 0xc0001f4c30})
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /build/pkg/grpc/server.go:250 +0x170 fp=0xc0001dff20 sp=0xc0001dfe98 pc=0x83f550  
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr main.main()
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /build/backend/go/image/stablediffusion-ggml/main.go:17 +0x85 fp=0xc0001dff50 sp=0xc0001dff20 pc=0x8768c5
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.main()
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:272 +0x28b fp=0xc0001dffe0 sp=0xc0001dff50 pc=0x45196b
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.goexit({})
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0001dffe8 sp=0xc0001dffe0 pc=0x48d1c1
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:424 +0xce fp=0xc00009cfa8 sp=0xc00009cf88 pc=0x4857ae
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.goparkunlock(...)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:430
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.forcegchelper()
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:337 +0xb3 fp=0xc00009cfe0 sp=0xc00009cfa8 pc=0x451cb3
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.goexit({})
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00009cfe8 sp=0xc00009cfe0 pc=0x48d1c1
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr created by runtime.init.7 in goroutine 1
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:325 +0x1a
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:424 +0xce fp=0xc00009d780 sp=0xc00009d760 pc=0x4857ae
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.goparkunlock(...)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:430
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.bgsweep(0xc0000ca000)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgcsweep.go:277 +0x94 fp=0xc00009d7c8 sp=0xc00009d780 pc=0x43c554
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.gcenable.gowrap1()
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:203 +0x25 fp=0xc00009d7e0 sp=0xc00009d7c8 pc=0x430c85
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.goexit({})
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00009d7e8 sp=0xc00009d7e0 pc=0x48d1c1
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr created by runtime.gcenable in goroutine 1
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:203 +0x66
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.gopark(0xc0000ca000?, 0xc30bc0?, 0x1?, 0x0?, 0xc000007340?)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:424 +0xce fp=0xc00009df78 sp=0xc00009df58 pc=0x4857ae
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.goparkunlock(...)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:430
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.(*scavengerState).park(0x12f5240)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc00009dfa8 sp=0xc00009df78 pc=0x439f89
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.bgscavenge(0xc0000ca000)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgcscavenge.go:653 +0x3c fp=0xc00009dfc8 sp=0xc00009dfa8 pc=0x43a4fc
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.gcenable.gowrap2()
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:204 +0x25 fp=0xc00009dfe0 sp=0xc00009dfc8 pc=0x430c25
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.goexit({})
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00009dfe8 sp=0xc00009dfe0 pc=0x48d1c1
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr created by runtime.gcenable in goroutine 1
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:204 +0xa5
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr goroutine 5 gp=0xc000007c00 m=nil [finalizer wait]:
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.gopark(0xc00009c648?, 0x4271c5?, 0xb0?, 0x1?, 0xc0000061c0?)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:424 +0xce fp=0xc00009c620 sp=0xc00009c600 pc=0x4857ae
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.runfinq()
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/mfinal.go:193 +0x107 fp=0xc00009c7e0 sp=0xc00009c620 pc=0x42fd07
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.goexit({})
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00009c7e8 sp=0xc00009c7e0 pc=0x48d1c1
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr created by runtime.createfing in goroutine 1
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/mfinal.go:163 +0x3d
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr goroutine 6 gp=0xc000007dc0 m=nil [chan receive]:
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:424 +0xce fp=0xc00009e718 sp=0xc00009e6f8 pc=0x4857ae
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.chanrecv(0xc0000da0e0, 0x0, 0x1)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/chan.go:639 +0x41c fp=0xc00009e790 sp=0xc00009e718 pc=0x4208dc
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.chanrecv1(0x0?, 0x0?)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/chan.go:489 +0x12 fp=0xc00009e7b8 sp=0xc00009e790 pc=0x420492
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.unique_runtime_registerUniqueMapCleanup.func1(...)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1732
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1735 +0x2f fp=0xc00009e7e0 sp=0xc00009e7b8 pc=0x433c8f
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.goexit({})
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00009e7e8 sp=0xc00009e7e0 pc=0x48d1c1
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr created by unique.runtime_registerUniqueMapCleanup in goroutine 1
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1730 +0x96
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr goroutine 35 gp=0xc00033e000 m=nil [select]:
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.gopark(0xc000312690?, 0x2?, 0x6?, 0x0?, 0xc000312664?)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:424 +0xce fp=0xc000312508 sp=0xc0003124e8 pc=0x4857ae
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.selectgo(0xc000312690, 0xc000312660, 0x7b2836?, 0x0, 0xc000324000?, 0x1)     
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/select.go:335 +0x7a5 fp=0xc000312630 sp=0xc000312508 pc=0x463765
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr google.golang.org/grpc/internal/transport.(*controlBuffer).get(0xc000300040, 0x1)    
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:412 +0x108 fp=0xc0003126c0 sp=0xc000312630 pc=0x791788
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr google.golang.org/grpc/internal/transport.(*loopyWriter).run(0xc000308100)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:575 +0x7b fp=0xc000312720 sp=0xc0003126c0 pc=0x791f5b
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr google.golang.org/grpc/internal/transport.NewServerTransport.func2()
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_server.go:335 +0xde fp=0xc0003127e0 sp=0xc000312720 pc=0x7a8c7e
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.goexit({})
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0003127e8 sp=0xc0003127e0 pc=0x48d1c1
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr created by google.golang.org/grpc/internal/transport.NewServerTransport in goroutine 34
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_server.go:333 +0x18be
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr goroutine 36 gp=0xc00033e1c0 m=nil [select]:
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.gopark(0xc000312f40?, 0x4?, 0x60?, 0x0?, 0xc000312ec0?)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:424 +0xce fp=0xc0000aed50 sp=0xc0000aed30 pc=0x4857ae
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.selectgo(0xc0000aef40, 0xc000312eb8, 0x0?, 0x0, 0x0?, 0x1)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/select.go:335 +0x7a5 fp=0xc0000aee78 sp=0xc0000aed50 pc=0x463765
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr google.golang.org/grpc/internal/transport.(*http2Server).keepalive(0xc000334000)     
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_server.go:1183 +0x1f3 fp=0xc0000aefc8 sp=0xc0000aee78 pc=0x7afed3
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr google.golang.org/grpc/internal/transport.NewServerTransport.gowrap1()
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_server.go:356 +0x25 fp=0xc0000aefe0 sp=0xc0000aefc8 pc=0x7a8b65
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.goexit({})
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0000aefe8 sp=0xc0000aefe0 pc=0x48d1c1
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr created by google.golang.org/grpc/internal/transport.NewServerTransport in goroutine 34
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_server.go:356 +0x1905
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr goroutine 37 gp=0xc00033e380 m=nil [IO wait]:
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.gopark(0x45b41f?, 0xc0000b4a28?, 0x60?, 0xb4?, 0xb?)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:424 +0xce fp=0xc0000b4a18 sp=0xc0000b49f8 pc=0x4857ae
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.netpollblock(0x4dd3d8?, 0x41dce6?, 0x0?)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/netpoll.go:575 +0xf7 fp=0xc0000b4a50 sp=0xc0000b4a18 pc=0x44a3b7
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr internal/poll.runtime_pollWait(0x7f8ee3cc7e00, 0x72)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/netpoll.go:351 +0x85 fp=0xc0000b4a70 sp=0xc0000b4a50 pc=0x484aa5
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr internal/poll.(*pollDesc).wait(0xc000308000?, 0xc00031a000?, 0x0)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0000b4a98 sp=0xc0000b4a70 pc=0x4fa5a7
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr internal/poll.(*pollDesc).waitRead(...)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/internal/poll/fd_poll_runtime.go:89
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr internal/poll.(*FD).Read(0xc000308000, {0xc00031a000, 0x8000, 0x8000})
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/internal/poll/fd_unix.go:165 +0x27a fp=0xc0000b4b30 sp=0xc0000b4a98 pc=0x4fb89a
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr net.(*netFD).Read(0xc000308000, {0xc00031a000?, 0x1060100000000?, 0x8?})
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/net/fd_posix.go:55 +0x25 fp=0xc0000b4b78 sp=0xc0000b4b30 pc=0x63b945
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr net.(*conn).Read(0xc00030c000, {0xc00031a000?, 0x800010601?, 0x0?})
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/net/net.go:189 +0x45 fp=0xc0000b4bc0 sp=0xc0000b4b78 pc=0x64b605
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr net.(*TCPConn).Read(0x0?, {0xc00031a000?, 0xc0000b4c18?, 0x48932d?})
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    <autogenerated>:1 +0x25 fp=0xc0000b4bf0 sp=0xc0000b4bc0 pc=0x65f6c5
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr bufio.(*Reader).Read(0xc000318000, {0xc00032c040, 0x9, 0xc0000a2e08?})
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/bufio/bufio.go:241 +0x197 fp=0xc0000b4c28 sp=0xc0000b4bf0 pc=0x5a2497
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr io.ReadAtLeast({0xc36620, 0xc000318000}, {0xc00032c040, 0x9, 0x9}, 0x9)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/io/io.go:335 +0x90 fp=0xc0000b4c70 sp=0xc0000b4c28 pc=0x4d3970
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr io.ReadFull(...)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/io/io.go:354
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr golang.org/x/net/http2.readFrameHeader({0xc00032c040, 0x9, 0xc00033c018?}, {0xc36620?, 0xc000318000?})
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/x/[email protected]/http2/frame.go:237 +0x65 fp=0xc0000b4cc0 sp=0xc0000b4c70 pc=0x77f465
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr golang.org/x/net/http2.(*Framer).ReadFrame(0xc00032c000)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/x/[email protected]/http2/frame.go:501 +0x85 fp=0xc0000b4d68 sp=0xc0000b4cc0 pc=0x77fba5
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr google.golang.org/grpc/internal/transport.(*http2Server).HandleStreams(0xc000334000, {0xc39dd8, 0xc00030a210}, 0xc00030a240)
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_server.go:644 +0x107 fp=0xc0000b4e88 sp=0xc0000b4d68 pc=0x7abae7
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr google.golang.org/grpc.(*Server).serveStreams(0xc0001c2600, {0xc39d30?, 0x139b440?}, {0xc3d6e0, 0xc000334000}, {0xc3cfb8?, 0xc00030c000?})
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/google.golang.org/[email protected]/server.go:1023 +0x396 fp=0xc0000b4f70 sp=0xc0000b4e88 pc=0x819316
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr google.golang.org/grpc.(*Server).handleRawConn.func1()
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/google.golang.org/[email protected]/server.go:958 +0x56 fp=0xc0000b4fe0 sp=0xc0000b4f70 pc=0x818ab6
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr runtime.goexit({})
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0000b4fe8 sp=0xc0000b4fe0 pc=0x48d1c1
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr created by google.golang.org/grpc.(*Server).handleRawConn in goroutine 34
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr    /root/go/pkg/mod/google.golang.org/[email protected]/server.go:957 +0x1c6
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr rax    0xc5a15e
localai-1  | 10:41AM INF [stablediffusion-ggml] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
localai-1  | 10:41AM INF [whisper] Attempting to load
localai-1  | 10:41AM INF BackendLoader starting backend=whisper modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: whisper): {backendString:whisper model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000d34008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}       
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr rbx    0x2eb86458
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr rcx    0x2eba8fd8
localai-1  | 10:41AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/whisper
localai-1  | 10:41AM DBG GRPC Service for gpt-4 will be running at: '127.0.0.1:38257'
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr rdx    0x7f8f2b1b8528
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr rdi    0x2eb86458
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr rsi    0xc57eff
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr rbp    0xc57eff
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr rsp    0x7ffff2bbe070
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr r8     0xf
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr r9     0x0
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr r10    0x7f8f2afa2b80
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr r11    0x7f8f2b040340
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr r12    0x2eba8fd8
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr r13    0x7f8f2b1b8528
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr r14    0xc5a15e
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr r15    0xc5a15e
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr rip    0x7f8f2b04036c
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr rflags 0x10202
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr cs     0x33
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr fs     0x0
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:42251): stderr gs     0x0
localai-1  | 10:41AM DBG GRPC Service state dir: /tmp/go-processmanager3731501794
localai-1  | 10:41AM DBG GRPC Service Started
localai-1  | 10:41AM DBG Wait for the service to start up
localai-1  | 10:41AM DBG Options: ContextSize:4096 Seed:1146640031 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:16 LibrarySearchPath:"/tmp/localai/backend_data/backend-assets/espeak-ng-data"
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:38257): stderr 2025/04/29 10:41:15 gRPC Server listening at 127.0.0.1:38257
localai-1  | 10:41AM DBG GRPC Service Ready
localai-1  | 10:41AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc00083d958} sizeCache:0 unknownFields:[] Model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf ContextSize:4096 Seed:1146640031 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:16 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/espeak-ng-data RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
localai-1  | 10:41AM INF [whisper] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = stat /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf: no such file or directory
localai-1  | 10:41AM INF [bark-cpp] Attempting to load
localai-1  | 10:41AM INF BackendLoader starting backend=bark-cpp modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: bark-cpp): {backendString:bark-cpp model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000d34008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}     
localai-1  | 10:41AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/bark-cpp
localai-1  | 10:41AM DBG GRPC Service for gpt-4 will be running at: '127.0.0.1:35133'
localai-1  | 10:41AM DBG GRPC Service state dir: /tmp/go-processmanager4022029646
localai-1  | 10:41AM DBG GRPC Service Started
localai-1  | 10:41AM DBG Wait for the service to start up
localai-1  | 10:41AM DBG Options: ContextSize:4096 Seed:1146640031 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:16 LibrarySearchPath:"/tmp/localai/backend_data/backend-assets/espeak-ng-data"
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:35133): stderr 2025/04/29 10:41:16 gRPC Server listening at 127.0.0.1:35133
localai-1  | 10:41AM DBG GRPC Service Ready
localai-1  | 10:41AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc00083d958} sizeCache:0 unknownFields:[] Model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf ContextSize:4096 Seed:1146640031 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:16 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/espeak-ng-data RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:35133): stderr bark_load_model_from_file: failed to open '/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf'
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:35133): stderr bark_load_model: failed to load model weights from '/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf'
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:35133): stderr load_model: Could not load model
localai-1  | 10:41AM INF [bark-cpp] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = inference failed
localai-1  | 10:41AM INF [huggingface] Attempting to load
localai-1  | 10:41AM INF BackendLoader starting backend=huggingface modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf        
localai-1  | 10:41AM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: huggingface): {backendString:huggingface model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000d34008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
localai-1  | 10:41AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/huggingface
localai-1  | 10:41AM DBG GRPC Service for gpt-4 will be running at: '127.0.0.1:41651'
localai-1  | 10:41AM DBG GRPC Service state dir: /tmp/go-processmanager858785259
localai-1  | 10:41AM DBG GRPC Service Started
localai-1  | 10:41AM DBG Wait for the service to start up
localai-1  | 10:41AM DBG Options: ContextSize:4096 Seed:1146640031 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:16 LibrarySearchPath:"/tmp/localai/backend_data/backend-assets/espeak-ng-data"
localai-1  | 10:41AM DBG GRPC(gpt-4-127.0.0.1:41651): stderr 2025/04/29 10:41:18 gRPC Server listening at 127.0.0.1:41651
localai-1  | 10:41AM DBG GRPC Service Ready
localai-1  | 10:41AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc00083d958} sizeCache:0 unknownFields:[] Model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf ContextSize:4096 Seed:1146640031 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:16 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/espeak-ng-data RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
localai-1  | 10:41AM INF [huggingface] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = no huggingface token provided
localai-1  | 10:41AM INF [/build/backend/python/bark/run.sh] Attempting to load
localai-1  | 10:41AM INF BackendLoader starting backend=/build/backend/python/bark/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/bark/run.sh): {backendString:/build/backend/python/bark/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000d34008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
localai-1  | 10:41AM INF [/build/backend/python/bark/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/bark/run.sh
localai-1  | 10:41AM INF [/build/backend/python/faster-whisper/run.sh] Attempting to load
localai-1  | 10:41AM INF BackendLoader starting backend=/build/backend/python/faster-whisper/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/faster-whisper/run.sh): {backendString:/build/backend/python/faster-whisper/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000d34008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
localai-1  | 10:41AM INF [/build/backend/python/faster-whisper/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/faster-whisper/run.sh
localai-1  | 10:41AM INF [/build/backend/python/kokoro/run.sh] Attempting to load
localai-1  | 10:41AM INF BackendLoader starting backend=/build/backend/python/kokoro/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/kokoro/run.sh): {backendString:/build/backend/python/kokoro/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000d34008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
localai-1  | 10:41AM INF [/build/backend/python/kokoro/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/kokoro/run.sh
localai-1  | 10:41AM INF [/build/backend/python/vllm/run.sh] Attempting to load
localai-1  | 10:41AM INF BackendLoader starting backend=/build/backend/python/vllm/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/vllm/run.sh): {backendString:/build/backend/python/vllm/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000d34008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
localai-1  | 10:41AM INF [/build/backend/python/vllm/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/vllm/run.sh
localai-1  | 10:41AM INF [/build/backend/python/coqui/run.sh] Attempting to load
localai-1  | 10:41AM INF BackendLoader starting backend=/build/backend/python/coqui/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/coqui/run.sh): {backendString:/build/backend/python/coqui/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000d34008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
localai-1  | 10:41AM INF [/build/backend/python/coqui/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/coqui/run.sh
localai-1  | 10:41AM INF [/build/backend/python/diffusers/run.sh] Attempting to load
localai-1  | 10:41AM INF BackendLoader starting backend=/build/backend/python/diffusers/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/diffusers/run.sh): {backendString:/build/backend/python/diffusers/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000d34008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
localai-1  | 10:41AM INF [/build/backend/python/diffusers/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/diffusers/run.sh
localai-1  | 10:41AM INF [/build/backend/python/exllama2/run.sh] Attempting to load
localai-1  | 10:41AM INF BackendLoader starting backend=/build/backend/python/exllama2/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/exllama2/run.sh): {backendString:/build/backend/python/exllama2/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000d34008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
localai-1  | 10:41AM INF [/build/backend/python/exllama2/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/exllama2/run.sh
localai-1  | 10:41AM INF [/build/backend/python/rerankers/run.sh] Attempting to load
localai-1  | 10:41AM INF BackendLoader starting backend=/build/backend/python/rerankers/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/rerankers/run.sh): {backendString:/build/backend/python/rerankers/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000d34008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
localai-1  | 10:41AM INF [/build/backend/python/rerankers/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/rerankers/run.sh
localai-1  | 10:41AM INF [/build/backend/python/autogptq/run.sh] Attempting to load
localai-1  | 10:41AM INF BackendLoader starting backend=/build/backend/python/autogptq/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/autogptq/run.sh): {backendString:/build/backend/python/autogptq/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000d34008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
localai-1  | 10:41AM INF [/build/backend/python/autogptq/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/autogptq/run.sh
localai-1  | 10:41AM INF [/build/backend/python/transformers/run.sh] Attempting to load
localai-1  | 10:41AM INF BackendLoader starting backend=/build/backend/python/transformers/run.sh modelID=gpt-4 o.model=localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading model in memory from file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
localai-1  | 10:41AM DBG Loading Model gpt-4 with gRPC (file: /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf) (backend: /build/backend/python/transformers/run.sh): {backendString:/build/backend/python/transformers/run.sh model:localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf modelID:gpt-4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000d34008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
localai-1  | 10:41AM INF [/build/backend/python/transformers/run.sh] Fails: failed to load model with internal loader: backend not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/transformers/run.sh
localai-1  | 10:41AM INF Success ip=127.0.0.1 latency="12.615µs" method=GET status=200 url=/readyz

@richiejp (Collaborator)

The exact point where it goes wrong appears to be `gguf_init_from_file: failed to open GGUF file '/build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf'`, at least for the first backend that fails (llama.cpp).

gguf_init_from_file calls ggml_fopen, which calls fopen to open the file read-only.
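
For illustration, here is a minimal C sketch (hypothetical, not the actual ggml code) of why the failure reason gets lost: fopen reports the cause through errno, and if the caller only checks for NULL without printing strerror(errno), that detail never reaches the log.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open a model file read-only and report
 * strerror(errno) on failure, so the log would distinguish
 * "No such file or directory" from e.g. "Permission denied". */
static FILE *open_model_file(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f)
        fprintf(stderr, "failed to open '%s': %s\n", path, strerror(errno));
    return f;
}

int main(void) {
    FILE *f = open_model_file("/build/models/example.gguf");
    if (f)
        fclose(f);
    return 0;
}
```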

Unfortunately llama.cpp doesn't log which error it failed with, but it is probably the same one @bstone108 identified: the file path doesn't exist.
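
A quick way to confirm that theory is to check the path from inside the container (assuming the compose service is named localai, as the localai-1 container name in the logs suggests):

```sh
# Compare what is actually on disk with the o.model value from the logs
docker compose exec localai ls -l /build/models/
docker compose exec localai stat /build/models/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
```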

I have created PR #5276, which may fix this: there is a typo in the "gpt-4" model definition for the AIO CUDA images. If that is the cause, it should also be possible to work around it by selecting a different model from the gallery.
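
For context, a model definition is a YAML file in which parameters.model must match the file on disk exactly; the snippet below is a hypothetical sketch of where such a typo would bite, not the actual aio definition:

```yaml
name: gpt-4
parameters:
  # If this does not exactly match the file under /build/models,
  # every backend fails with "no such file or directory".
  model: localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
```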
