Is your feature request related to a problem? Please describe.
LocalAI should ship as a single binary instead of separate builds for avx, avx2, cuda, etc.
Describe the solution you'd like
Support a single binary that detects host capabilities and falls back as needed: it should try the GPU first by checking for the required libraries, reduce the number of offloaded layers if there is not enough VRAM, and finally fall back to the CPU, selecting the instruction set the host supports (a sketch of this chain follows below).
This will also simplify the AIO images, since the selection logic will be handled automatically inside the binary.
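A minimal sketch of what that chain could look like in Go (LocalAI's language), assuming Linux library paths. `hasCUDA` and `selectVariant` are hypothetical names, not existing LocalAI functions; the CPU checks use golang.org/x/sys/cpu:

```go
// Hypothetical sketch of the proposed runtime selection chain:
// GPU first (by library presence), then the best supported CPU variant.
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/cpu"
)

// hasCUDA reports whether the CUDA driver library is present, which is a
// stronger signal than the mere existence of /dev/nvidia* device files.
// Paths are assumptions for common Linux layouts.
func hasCUDA() bool {
	candidates := []string{
		"/usr/lib/x86_64-linux-gnu/libcuda.so.1",
		"/usr/lib64/libcuda.so.1",
		"/usr/local/cuda/lib64/libcudart.so",
	}
	for _, p := range candidates {
		if _, err := os.Stat(p); err == nil {
			return true
		}
	}
	return false
}

// selectVariant walks the fallback chain: cuda, then avx2, then avx,
// then a baseline build (also the result on non-x86 hosts).
func selectVariant() string {
	if hasCUDA() {
		return "cuda"
	}
	switch {
	case cpu.X86.HasAVX2:
		return "avx2"
	case cpu.X86.HasAVX:
		return "avx"
	default:
		return "fallback"
	}
}

func main() {
	fmt.Println("selected runtime:", selectVariant())
}
```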
Subtasks:
- embed the avx, avx2, and fallback variants into LocalAI
- embed the cuda variant into LocalAI
- auto-select the CPU runtime (feat: auto select llama-cpp cpu variant #2305)
- auto-select the CUDA runtime (feat: auto select llama-cpp cuda runtime #2306)
- better GPU detection by checking CUDA libraries in addition to devices (#3637)
- automatically adjust the default gpu_layers by available GPU memory (#3541; see the VRAM sketch after this list)
- compress before embedding and decompress when extracting to save space (#3638; see the extraction sketch after this list)
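For the gpu_layers subtask, one possible approach is to query free VRAM through nvidia-smi and cap the offloaded layer count accordingly. A sketch, where `freeVRAMMiB`, `adjustGPULayers`, and the per-layer memory estimate are all hypothetical (the real estimate is model-dependent):

```go
// Hypothetical sketch: cap gpu_layers to what fits in free VRAM,
// falling back to CPU-only (0 layers) when nvidia-smi is unavailable.
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// freeVRAMMiB asks nvidia-smi for the free memory of the first GPU, in MiB.
func freeVRAMMiB() (int, error) {
	out, err := exec.Command("nvidia-smi",
		"--query-gpu=memory.free",
		"--format=csv,noheader,nounits").Output()
	if err != nil {
		return 0, err
	}
	first := strings.SplitN(strings.TrimSpace(string(out)), "\n", 2)[0]
	return strconv.Atoi(strings.TrimSpace(first))
}

// adjustGPULayers lowers the requested layer count so the estimated
// per-layer memory (mibPerLayer, an assumed figure) fits into free VRAM.
func adjustGPULayers(requested, mibPerLayer int) int {
	free, err := freeVRAMMiB()
	if err != nil {
		return 0 // no usable GPU: offload nothing, run on CPU
	}
	if fit := free / mibPerLayer; fit < requested {
		return fit
	}
	return requested
}

func main() {
	fmt.Println("gpu_layers:", adjustGPULayers(35, 500))
}
```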
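For the compress-before-embed subtask, Go's go:embed can carry a gzip-compressed backend binary that is decompressed on first start. A sketch, assuming a hypothetical backends/llama-cpp-avx2.gz artifact produced at build time:

```go
// Hypothetical sketch: embed a gzip-compressed backend and extract it
// at startup so it can be spawned as a child process.
package main

import (
	"compress/gzip"
	"embed"
	"io"
	"os"
)

//go:embed backends/llama-cpp-avx2.gz
var backends embed.FS

// extractBackend decompresses the embedded runtime to dst and marks it
// executable.
func extractBackend(dst string) error {
	f, err := backends.Open("backends/llama-cpp-avx2.gz")
	if err != nil {
		return err
	}
	defer f.Close()

	zr, err := gzip.NewReader(f)
	if err != nil {
		return err
	}
	defer zr.Close()

	out, err := os.OpenFile(dst, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o755)
	if err != nil {
		return err
	}
	defer out.Close()

	_, err = io.Copy(out, zr)
	return err
}

func main() {
	if err := extractBackend("/tmp/llama-cpp-avx2"); err != nil {
		panic(err)
	}
}
```

Gzip is only one option here; any format with a stdlib or well-known decoder (zstd, xz) would trade compression ratio against startup extraction time.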
Describe alternatives you've considered
Additional context