
LocalAI returns Server error error="could not load model: rpc error: code = Unavailable desc = error reading from server: EOF" #2692

@CyberGWJ

Description


LocalAI version:
Fresh install of the latest version

Environment, CPU architecture, OS, and Version:
Proxmox VM: Linux localAI 6.1.0-22-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.94-1 (2024-06-21) x86_64 GNU/Linux

Describe the bug
When trying to chat using the request below, LocalAI returns the error shown in the title.

curl http://localhost:8989/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "Luna-AI-Llama2-Uncensored-GGUF",
"messages": [{"role": "user", "content": "How are you?"}],
"temperature": 0.9
}'

To Reproduce
Do a fresh install on a VM (Proxmox) and do the following steps.

1.  apt install curl git (git may not be needed, but I normally install it)
2.  curl https://localai.io/install.sh | PORT=8989 USE_AIO=false sh (I have tried USE_AIO=true and get the same results)
3. nano /etc/localai.env and add GALLERIES=[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"url": "github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]
4. nano /usr/share/local-ai/models/Luna-AI-Llama2-Uncensored-GGUF.yaml

name: Luna-AI-Llama2-Uncensored-GGUF
context_size: 2048
trimsuffix:
- "\n"
mmap: false
parameters:
  model: huggingface://TheBloke/Luna-AI-Llama2-Uncensored-GGUF/luna-ai-llama2-uncensored.Q5_K_M.gguf
  top_k: 80
  temperature: 0.2
  top_p: 0.7
backend: llama
roles:
  assistant: 'ASSISTANT:'
  system: 'SYSTEM:'
  user: 'USER:'
template:
  chat: lunademo-chat
  completion: lunademo-completion

5. nano /usr/share/local-ai/models/lunademo-chat.tmpl

USER: {{.Input}}

ASSISTANT:

6. nano /usr/share/local-ai/models/lunademo-completion.tmpl

Complete the following sentence: {{.Input}}

7. systemctl stop local-ai
8. local-ai --debug
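
For reference, a quick sanity check on the downloaded model file (the md5-named file in the models directory matches the one in the debug log below; the expected size is approximate and just what I'd assume for a Q5_K_M quant of a 7B model):

ls -lh /usr/share/local-ai/models/
# f83553a34a79b75aca661acbf73b8d62 should be roughly 4.8 GB;
# a much smaller file would point to a truncated download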

Expected behavior
LocalAI should return a chat completion response.
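
Something along the lines of the usual OpenAI-compatible completion shape (illustrative only, not a response I actually received):

{
  "object": "chat.completion",
  "model": "Luna-AI-Llama2-Uncensored-GGUF",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "I'm doing well, thank you!" },
      "finish_reason": "stop"
    }
  ]
}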

Logs
root@localAI:~# local-ai --debug
8:30AM INF env file found, loading environment variables from file envFile=/etc/localai.env
8:30AM DBG Setting logging to debug
8:30AM INF Starting LocalAI using 8 threads, with models path: /usr/share/local-ai/models
8:30AM INF LocalAI version: ()
8:30AM DBG CPU capabilities: [aes apic clflush cmov constant_tsc cpuid cpuid_fault cx16 cx8 de fpu fxsr ht hypervisor lahf_lm lm mca mce mmx msr mtrr nopl nx pae pat pge pni popcnt pse pse36 pti sep sse sse2 sse4_1 sse4_2 ssse3 syscall tsc tsc_known_freq x2apic xtopology]
8:30AM DBG GPU count: 1
8:30AM DBG GPU: card #0 @0000:00:02.0 -> driver: 'bochs-drm' class: 'Display controller' vendor: 'unknown' product: 'unknown'
8:30AM DBG guessDefaultsFromFile: template already set name=Luna-AI-Llama2-Uncensored-GGUF
8:30AM INF Preloading models from /usr/share/local-ai/models

Model name: Luna-AI-Llama2-Uncensored-GGUF

8:30AM DBG Model: Luna-AI-Llama2-Uncensored-GGUF (config: {PredictionOptions:{Model:f83553a34a79b75aca661acbf73b8d62 Language: Translate:false N:0 TopP:0xc000cf8de0 TopK:0xc000cf8db8 Temperature:0xc000cf8dc0 Maxtokens:0xc000cf8e98 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc000cf8e90 TypicalP:0xc000cf8e88 Seed:0xc000cf8eb0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:Luna-AI-Llama2-Uncensored-GGUF F16:0xc000cf8e50 Threads:0xc000cf8e48 Debug:0xc000cf8ea8 Roles:map[assistant:ASSISTANT: system:SYSTEM: user:USER:] Embeddings:false Backend:llama TemplateConfig:{Chat:lunademo-chat ChatMessage: Completion:lunademo-completion Edit: Functions: UseTokenizerTemplate:false JoinChatMessagesByCharacter:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder:} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionName:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000cf8e80 MirostatTAU:0xc000cf8e78 Mirostat:0xc000cf8e70 NGPULayers:0xc000cf8ea0 MMap:0xc000cf8cb8 MMlock:0xc000cf8ea9 LowVRAM:0xc000cf8ea9 Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[
] ContextSize:0xc000cf8ca8 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: FlashAttention:false NoKVOffloading:false RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: VallE:{AudioPath:}} CUDA:false DownloadFiles:[] Description: Usage:})
8:30AM DBG Extracting backend assets files to /tmp/localai/backend_data
8:30AM DBG processing api keys runtime update
8:30AM DBG processing external_backends.json
8:30AM DBG external backends loaded from external_backends.json
8:30AM INF core/startup process completed!
8:30AM DBG No configuration file found at /tmp/localai/upload/uploadedFiles.json
8:30AM DBG No configuration file found at /tmp/localai/config/assistants.json
8:30AM DBG No configuration file found at /tmp/localai/config/assistantsFile.json
8:30AM INF LocalAI API is listening! Please connect to the endpoint for API documentation. endpoint=http://0.0.0.0:8989
8:30AM DBG Request received: {"model":"Luna-AI-Llama2-Uncensored-GGUF","language":"","translate":false,"n":0,"top_p":null,"top_k":null,"temperature":0.9,"max_tokens":null,"echo":false,"batch":0,"ignore_eos":false,"repeat_penalty":0,"repeat_last_n":0,"n_keep":0,"frequency_penalty":0,"presence_penalty":0,"tfz":null,"typical_p":null,"seed":null,"negative_prompt":"","rope_freq_base":0,"rope_freq_scale":0,"negative_prompt_scale":0,"use_fast_tokenizer":false,"clip_skip":0,"tokenizer":"","file":"","size":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":[{"role":"user","content":"How are you?"}],"functions":null,"function_call":null,"stream":false,"mode":0,"step":0,"grammar":"","grammar_json_functions":null,"grammar_json_name":null,"backend":"","model_base_name":""}
8:30AM DBG guessDefaultsFromFile: template already set name=Luna-AI-Llama2-Uncensored-GGUF
8:30AM DBG Configuration read: &{PredictionOptions:{Model:f83553a34a79b75aca661acbf73b8d62 Language: Translate:false N:0 TopP:0xc000cf8de0 TopK:0xc000cf8db8 Temperature:0xc000515930 Maxtokens:0xc000cf8e98 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc000cf8e90 TypicalP:0xc000cf8e88 Seed:0xc000cf8eb0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:Luna-AI-Llama2-Uncensored-GGUF F16:0xc000cf8e50 Threads:0xc000cf8e48 Debug:0xc000515a20 Roles:map[assistant:ASSISTANT: system:SYSTEM: user:USER:] Embeddings:false Backend:llama TemplateConfig:{Chat:lunademo-chat ChatMessage: Completion:lunademo-completion Edit: Functions: UseTokenizerTemplate:false JoinChatMessagesByCharacter:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder:} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionName:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000cf8e80 MirostatTAU:0xc000cf8e78 Mirostat:0xc000cf8e70 NGPULayers:0xc000cf8ea0 MMap:0xc000cf8cb8 MMlock:0xc000cf8ea9 LowVRAM:0xc000cf8ea9 Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[
] ContextSize:0xc000cf8ca8 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: FlashAttention:false NoKVOffloading:false RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: VallE:{AudioPath:}} CUDA:false DownloadFiles:[] Description: Usage:}
8:30AM DBG Parameters: &{PredictionOptions:{Model:f83553a34a79b75aca661acbf73b8d62 Language: Translate:false N:0 TopP:0xc000cf8de0 TopK:0xc000cf8db8 Temperature:0xc000515930 Maxtokens:0xc000cf8e98 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc000cf8e90 TypicalP:0xc000cf8e88 Seed:0xc000cf8eb0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:Luna-AI-Llama2-Uncensored-GGUF F16:0xc000cf8e50 Threads:0xc000cf8e48 Debug:0xc000515a20 Roles:map[assistant:ASSISTANT: system:SYSTEM: user:USER:] Embeddings:false Backend:llama TemplateConfig:{Chat:lunademo-chat ChatMessage: Completion:lunademo-completion Edit: Functions: UseTokenizerTemplate:false JoinChatMessagesByCharacter:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder:} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionName:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000cf8e80 MirostatTAU:0xc000cf8e78 Mirostat:0xc000cf8e70 NGPULayers:0xc000cf8ea0 MMap:0xc000cf8cb8 MMlock:0xc000cf8ea9 LowVRAM:0xc000cf8ea9 Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[
] ContextSize:0xc000cf8ca8 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: FlashAttention:false NoKVOffloading:false RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: VallE:{AudioPath:}} CUDA:false DownloadFiles:[] Description: Usage:}
8:30AM DBG Prompt (before templating): USER:How are you?
8:30AM DBG Template found, input modified to: USER: USER:How are you?

ASSISTANT:

8:30AM DBG Prompt (after templating): USER: USER:How are you?

ASSISTANT:

8:30AM INF Loading model 'f83553a34a79b75aca661acbf73b8d62' with backend llama
8:30AM DBG llama-cpp is an alias of llama-cpp
8:30AM DBG Loading model in memory from file: /usr/share/local-ai/models/f83553a34a79b75aca661acbf73b8d62
8:30AM DBG Loading Model f83553a34a79b75aca661acbf73b8d62 with gRPC (file: /usr/share/local-ai/models/f83553a34a79b75aca661acbf73b8d62) (backend: llama-cpp): {backendString:llama model:f83553a34a79b75aca661acbf73b8d62 threads:8 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0009b3688 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
8:30AM INF [llama-cpp] attempting to load with fallback variant
8:30AM DBG ld.so found
8:30AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/lib/ld.so
8:30AM DBG GRPC Service for f83553a34a79b75aca661acbf73b8d62 will be running at: '127.0.0.1:44937'
8:30AM DBG GRPC Service state dir: /tmp/go-processmanager644594595
8:30AM DBG GRPC Service Started
8:30AM DBG GRPC(f83553a34a79b75aca661acbf73b8d62-127.0.0.1:44937): stdout Server listening on 127.0.0.1:44937
8:30AM DBG GRPC Service Ready
8:30AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:f83553a34a79b75aca661acbf73b8d62 ContextSize:2048 Seed:1541718028 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:8 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/usr/share/local-ai/models/f83553a34a79b75aca661acbf73b8d62 Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false}
8:30AM ERR Server error error="could not load model: rpc error: code = Unavailable desc = error reading from server: EOF" ip=127.0.0.1 latency=2.045800508s method=POST status=500 url=/v1/chat/completions
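
Since the EOF suggests the llama-cpp gRPC child process exited while loading the model, one thing I can still check is whether it was OOM-killed (standard Linux diagnostics, nothing LocalAI-specific; a Q5_K_M 7B model needs several GB of free RAM):

free -h
dmesg | grep -iE 'oom|killed process'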

FYI, this is my first time using LocalAI, so if I missed something, please let me know. Thanks.
