Is your feature request related to a problem? Please describe.
LocalAI should ship as a single binary instead of separate builds for avx, avx2, cuda, etc.
Describe the solution you'd like
Support a single binary that detects host capabilities and falls back as needed: it should try the GPU first by checking for the required libraries, reduce the number of offloaded layers if there is not enough VRAM, and finally fall back to the CPU, selecting the instruction set the host supports (a sketch of this chain follows below).
This will also simplify the AIO images, since the selection logic will be handled automatically inside the binary.
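A minimal sketch of what that chain could look like in Go (LocalAI's language), assuming Linux library paths. `hasCUDA` and `selectVariant` are hypothetical names, not existing LocalAI functions; the CPU checks use golang.org/x/sys/cpu:

```go
// Hypothetical sketch of the proposed runtime selection chain:
// GPU first (by library presence), then the best supported CPU variant.
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/cpu"
)

// hasCUDA reports whether the CUDA driver library is present, which is a
// stronger signal than the mere existence of /dev/nvidia* device files.
// Paths are assumptions for common Linux layouts.
func hasCUDA() bool {
	candidates := []string{
		"/usr/lib/x86_64-linux-gnu/libcuda.so.1",
		"/usr/lib64/libcuda.so.1",
		"/usr/local/cuda/lib64/libcudart.so",
	}
	for _, p := range candidates {
		if _, err := os.Stat(p); err == nil {
			return true
		}
	}
	return false
}

// selectVariant walks the fallback chain: cuda, then avx2, then avx,
// then a baseline build (also the result on non-x86 hosts).
func selectVariant() string {
	if hasCUDA() {
		return "cuda"
	}
	switch {
	case cpu.X86.HasAVX2:
		return "avx2"
	case cpu.X86.HasAVX:
		return "avx"
	default:
		return "fallback"
	}
}

func main() {
	fmt.Println("selected runtime:", selectVariant())
}
```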
Subtasks:
- embed the avx, avx2, and fallback variants into LocalAI
- embed the cuda variant into LocalAI
- auto-select the CPU runtime (feat: auto select llama-cpp cpu variant #2305)
- auto-select the CUDA runtime (feat: auto select llama-cpp cuda runtime #2306)
- better GPU detection by checking CUDA libraries in addition to devices (#3637)
- automatically adjust the default gpu_layers by available GPU memory (#3541; see the VRAM sketch after this list)
- compress before embedding and decompress when extracting to save space (#3638; see the extraction sketch after this list)
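For the gpu_layers subtask, one possible approach is to query free VRAM through nvidia-smi and cap the offloaded layer count accordingly. A sketch, where `freeVRAMMiB`, `adjustGPULayers`, and the per-layer memory estimate are all hypothetical (the real estimate is model-dependent):

```go
// Hypothetical sketch: cap gpu_layers to what fits in free VRAM,
// falling back to CPU-only (0 layers) when nvidia-smi is unavailable.
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// freeVRAMMiB asks nvidia-smi for the free memory of the first GPU, in MiB.
func freeVRAMMiB() (int, error) {
	out, err := exec.Command("nvidia-smi",
		"--query-gpu=memory.free",
		"--format=csv,noheader,nounits").Output()
	if err != nil {
		return 0, err
	}
	first := strings.SplitN(strings.TrimSpace(string(out)), "\n", 2)[0]
	return strconv.Atoi(strings.TrimSpace(first))
}

// adjustGPULayers lowers the requested layer count so the estimated
// per-layer memory (mibPerLayer, an assumed figure) fits into free VRAM.
func adjustGPULayers(requested, mibPerLayer int) int {
	free, err := freeVRAMMiB()
	if err != nil {
		return 0 // no usable GPU: offload nothing, run on CPU
	}
	if fit := free / mibPerLayer; fit < requested {
		return fit
	}
	return requested
}

func main() {
	fmt.Println("gpu_layers:", adjustGPULayers(35, 500))
}
```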
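For the compress-before-embed subtask, Go's go:embed can carry a gzip-compressed backend binary that is decompressed on first start. A sketch, assuming a hypothetical backends/llama-cpp-avx2.gz artifact produced at build time:

```go
// Hypothetical sketch: embed a gzip-compressed backend and extract it
// at startup so it can be spawned as a child process.
package main

import (
	"compress/gzip"
	"embed"
	"io"
	"os"
)

//go:embed backends/llama-cpp-avx2.gz
var backends embed.FS

// extractBackend decompresses the embedded runtime to dst and marks it
// executable.
func extractBackend(dst string) error {
	f, err := backends.Open("backends/llama-cpp-avx2.gz")
	if err != nil {
		return err
	}
	defer f.Close()

	zr, err := gzip.NewReader(f)
	if err != nil {
		return err
	}
	defer zr.Close()

	out, err := os.OpenFile(dst, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o755)
	if err != nil {
		return err
	}
	defer out.Close()

	_, err = io.Copy(out, zr)
	return err
}

func main() {
	if err := extractBackend("/tmp/llama-cpp-avx2"); err != nil {
		panic(err)
	}
}
```

Gzip is only one option here; any format with a stdlib or well-known decoder (zstd, xz) would trade compression ratio against startup extraction time.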
Describe alternatives you've considered
Additional context