# Model compatibility table
Besides llama-based models, LocalAI is also compatible with other architectures. The tables below list all the backends, the compatible model families, and the associated capabilities.
LocalAI will attempt to automatically load models that are not explicitly configured for a specific backend. You can pin a model to a particular backend by configuring it with a YAML file; see the advanced section for more details.
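As a minimal sketch, a YAML model configuration selecting a backend could look like the following (the model name and file below are illustrative; the backend value should match one of the backend names listed in the tables):

```yaml
# Illustrative model configuration (names are placeholders).
name: my-model            # name exposed through the API
backend: llama-cpp        # one of the backends listed below
parameters:
  model: my-model-file.gguf   # model file to load
```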
## Text Generation & Language Models

| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| llama.cpp | LLaMA, Mamba, RWKV, Falcon, Starcoder, GPT-2, and many others | yes | GPT and Functions | yes | yes | CUDA 11/12, ROCm, Intel SYCL, Vulkan, Metal, CPU |
| vLLM | Various GPTs and quantization formats | yes | GPT | no | no | CUDA 12, ROCm, Intel |
| transformers | Various GPTs and quantization formats | yes | GPT, embeddings, Audio generation | yes | yes* | CUDA 11/12, ROCm, Intel, CPU |
| exllama2 | GPTQ | yes | GPT only | no | no | CUDA 12 |
| MLX | Various LLMs | yes | GPT | no | no | Metal (Apple Silicon) |
| MLX-VLM | Vision-Language Models | yes | Multimodal GPT | no | no | Metal (Apple Silicon) |
| langchain-huggingface | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |
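Backends with a Completion/Chat endpoint are served through LocalAI's OpenAI-compatible API. As a sketch, assuming a LocalAI server listening on `localhost:8080` and a placeholder model name, a chat completion request can be built with the standard library alone:

```python
import json
import urllib.request

def chat_request(model: str, prompt: str,
                 base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a request for LocalAI's OpenAI-compatible chat endpoint.

    The base URL and model name are assumptions for illustration;
    adjust them to your deployment. Send with urllib.request.urlopen().
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
```

For example, `urllib.request.urlopen(chat_request("my-model", "Hello"))` would return the JSON completion response once a server is running.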
## Audio & Speech Processing

| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| whisper.cpp | whisper | no | Audio transcription | no | no | CUDA 12, ROCm, Intel SYCL, Vulkan, CPU |
| faster-whisper | whisper | no | Audio transcription | no | no | CUDA 12, ROCm, Intel, CPU |
| piper (binding) | Any piper onnx model | no | Text to voice | no | no | CPU |
| bark | bark | no | Audio generation | no | no | CUDA 12, ROCm, Intel |
| bark-cpp | bark | no | Audio-Only | no | no | CUDA, Metal, CPU |
| coqui | Coqui TTS | no | Audio generation and Voice cloning | no | no | CUDA 12, ROCm, Intel, CPU |
| kokoro | Kokoro TTS | no | Text-to-speech | no | no | CUDA 12, ROCm, Intel, CPU |
| chatterbox | Chatterbox TTS | no | Text-to-speech | no | no | CUDA 11/12, CPU |
| kitten-tts | Kitten TTS | no | Text-to-speech | no | no | CPU |
| silero-vad with Golang bindings | Silero VAD | no | Voice Activity Detection | no | no | CPU |
## Image & Video Generation

| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| stablediffusion.cpp | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image | no | no | CUDA 12, Intel SYCL, Vulkan, CPU |
| diffusers | SD, various diffusion models, … | no | Image/Video generation | no | no | CUDA 11/12, ROCm, Intel, Metal, CPU |
| transformers-musicgen | MusicGen | no | Audio generation | no | no | CUDA, CPU |
## Specialized AI Tasks

| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| rfdetr | RF-DETR | no | Object Detection | no | no | CUDA 12, Intel, CPU |
| rerankers | Reranking API | no | Reranking | no | no | CUDA 11/12, ROCm, Intel, CPU |
| local-store | Vector database | no | Vector storage | yes | no | CPU |
| huggingface | HuggingFace API models | yes | Various AI tasks | yes | yes | API-based |
## Acceleration Support Summary

### GPU Acceleration
- NVIDIA CUDA: CUDA 11.7, CUDA 12.0 support across most backends
- AMD ROCm: HIP-based acceleration for AMD GPUs
- Intel oneAPI: SYCL-based acceleration for Intel GPUs (F16/F32 precision)
- Vulkan: Cross-platform GPU acceleration
- Metal: Apple Silicon GPU acceleration (M1/M2/M3+)
### Specialized Hardware
- NVIDIA Jetson (L4T): ARM64 support for embedded AI
- Apple Silicon: Native Metal acceleration for Mac M1/M2/M3+
- Darwin x86: Intel Mac support
### CPU Optimization
- AVX/AVX2/AVX512: Advanced vector extensions for x86
- Quantization: 4-bit, 5-bit, 8-bit integer quantization support
- Mixed Precision: F16/F32 mixed precision support
Note: any backend name listed above can be used in the `backend` field of the model configuration file (see the advanced section).

\* Token stream support in the transformers backend is available only with CUDA and OpenVINO CPU/XPU acceleration.
Last updated 24 Aug 2025, 20:09 +0200.