Besides llama-based models, LocalAI is also compatible with other architectures. The tables below list the backends, the model families each one supports, and the associated repositories.

Text Generation & Language Models

| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| llama.cpp | LLama, Mamba, RWKV, Falcon, Starcoder, GPT-2, and many others | yes | GPT and Functions | yes | yes | CUDA 11/12, ROCm, Intel SYCL, Vulkan, Metal, CPU |
| vLLM | Various GPTs and quantization formats | yes | GPT | no | no | CUDA 12, ROCm, Intel |
| transformers | Various GPTs and quantization formats | yes | GPT, embeddings, Audio generation | yes | yes* | CUDA 11/12, ROCm, Intel, CPU |
| exllama2 | GPTQ | yes | GPT only | no | no | CUDA 12 |
| MLX | Various LLMs | yes | GPT | no | no | Metal (Apple Silicon) |
| MLX-VLM | Vision-Language Models | yes | Multimodal GPT | no | no | Metal (Apple Silicon) |
| langchain-huggingface | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |

Audio & Speech Processing

| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| whisper.cpp | whisper | no | Audio transcription | no | no | CUDA 12, ROCm, Intel SYCL, Vulkan, CPU |
| faster-whisper | whisper | no | Audio transcription | no | no | CUDA 12, ROCm, Intel, CPU |
| piper (binding) | Any piper onnx model | no | Text to voice | no | no | CPU |
| bark | bark | no | Audio generation | no | no | CUDA 12, ROCm, Intel |
| bark-cpp | bark | no | Audio-Only | no | no | CUDA, Metal, CPU |
| coqui | Coqui TTS | no | Audio generation and Voice cloning | no | no | CUDA 12, ROCm, Intel, CPU |
| kokoro | Kokoro TTS | no | Text-to-speech | no | no | CUDA 12, ROCm, Intel, CPU |
| chatterbox | Chatterbox TTS | no | Text-to-speech | no | no | CUDA 11/12, CPU |
| kitten-tts | Kitten TTS | no | Text-to-speech | no | no | CPU |
| silero-vad with Golang bindings | Silero VAD | no | Voice Activity Detection | no | no | CPU |

Image & Video Generation

| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| stablediffusion.cpp | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image | no | no | CUDA 12, Intel SYCL, Vulkan, CPU |
| diffusers | SD, various diffusion models,… | no | Image/Video generation | no | no | CUDA 11/12, ROCm, Intel, Metal, CPU |
| transformers-musicgen | MusicGen | no | Audio generation | no | no | CUDA, CPU |

Specialized AI Tasks

| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| rfdetr | RF-DETR | no | Object Detection | no | no | CUDA 12, Intel, CPU |
| rerankers | Reranking API | no | Reranking | no | no | CUDA 11/12, ROCm, Intel, CPU |
| local-store | Vector database | no | Vector storage | yes | no | CPU |
| huggingface | HuggingFace API models | yes | Various AI tasks | yes | yes | API-based |
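
When choosing a backend for a given machine, it can help to treat the Acceleration column as data. The following is a minimal Python sketch, not part of LocalAI: the `BACKEND_ACCELERATION` mapping and the `backends_with` helper are hypothetical names, and the mapping copies only a handful of rows from the tables above for illustration.

```python
# Hypothetical lookup table: a small subset of rows copied from the
# Acceleration column of the tables above. Not an official LocalAI API.
BACKEND_ACCELERATION = {
    "llama.cpp": ["CUDA 11/12", "ROCm", "Intel SYCL", "Vulkan", "Metal", "CPU"],
    "vLLM": ["CUDA 12", "ROCm", "Intel"],
    "MLX": ["Metal (Apple Silicon)"],
    "whisper.cpp": ["CUDA 12", "ROCm", "Intel SYCL", "Vulkan", "CPU"],
    "piper": ["CPU"],
    "diffusers": ["CUDA 11/12", "ROCm", "Intel", "Metal", "CPU"],
}


def backends_with(keyword: str) -> list[str]:
    """Return the backends whose acceleration list mentions `keyword`.

    Matching is a case-insensitive substring test, so "cuda" matches
    both "CUDA 12" and "CUDA 11/12".
    """
    needle = keyword.lower()
    return sorted(
        name
        for name, accels in BACKEND_ACCELERATION.items()
        if any(needle in accel.lower() for accel in accels)
    )


if __name__ == "__main__":
    print(backends_with("Metal"))
    print(backends_with("cpu"))
```

A substring match keeps the sketch short, but it also means that "Intel" matches "Intel SYCL"; a stricter comparison would be needed if that distinction matters.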

Acceleration Support Summary

GPU Acceleration

  • NVIDIA CUDA: CUDA 11.7, CUDA 12.0 support across most backends
  • AMD ROCm: HIP-based acceleration for AMD GPUs
  • Intel oneAPI: SYCL-based acceleration for Intel GPUs (F16/F32 precision)
  • Vulkan: Cross-platform GPU acceleration
  • Metal: Apple Silicon GPU acceleration (M1/M2/M3+)

Specialized Hardware

  • NVIDIA Jetson (L4T): ARM64 support for embedded AI
  • Apple Silicon: Native Metal acceleration for Mac M1/M2/M3+
  • Darwin x86: Intel Mac support

CPU Optimization

  • AVX/AVX2/AVX512: Advanced vector extensions for x86
  • Quantization: 4-bit, 5-bit, 8-bit integer quantization support
  • Mixed Precision: F16/F32 mixed precision support

Note: any backend name listed above can be used in the backend field of the model configuration file (see the advanced section).
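
To illustrate the note above, here is a minimal Python sketch that renders a model configuration with the backend field set. It rests on assumptions: the helper `render_model_config`, the `KNOWN_BACKENDS` subset, and the model identifier in the usage example are hypothetical, and the `name`, `backend`, and `parameters.model` keys should be checked against the advanced section before use.

```python
# Subset of backend names taken from the tables above (illustrative only).
KNOWN_BACKENDS = {"transformers", "diffusers", "rerankers", "kokoro", "coqui", "bark"}


def render_model_config(name: str, backend: str, model: str) -> str:
    """Render a minimal YAML model configuration as a string.

    Assumed layout: top-level `name` and `backend` keys, plus a
    `parameters` mapping with a `model` entry. Verify the exact schema
    in the advanced section of the documentation.
    """
    if backend not in KNOWN_BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    return (
        f"name: {name}\n"
        f"backend: {backend}\n"
        "parameters:\n"
        f"  model: {model}\n"
    )


if __name__ == "__main__":
    # "example-org/example-model" is a placeholder, not a real model.
    print(render_model_config("my-embedder", "transformers", "example-org/example-model"))
```

The rendered text could be saved as a YAML file alongside the other model configuration files; validating the backend name up front catches typos before the server tries to load the model.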

\* Token stream support in the transformers backend is available only with CUDA and OpenVINO CPU/XPU acceleration.

Last updated 24 Aug 2025, 20:09 +0200