Tags: providers, llama-cpp, local-ai, privacy
llama.cpp: Raw Local Performance
Ishi Labs•January 17, 2026•1 min read

llama.cpp is one of the fastest engines for running LLM inference locally. Ishi connects to its built-in HTTP server directly, with no wrapper layer adding overhead.
Why llama.cpp?
- Maximum Speed — No wrapper overhead
- Full Control — All quantization options
- Minimal Resources — Efficient memory usage
- GGUF Support — Industry-standard format
Setup
Start the llama.cpp server (recent builds name the binary `llama-server`; older builds used `server`):

```bash
./llama-server -m model.gguf --port 8080
```
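Once the server is up, you can exercise it directly before pointing Ishi at it. A minimal sketch using only the Python standard library; the request fields (`prompt`, `n_predict`, `temperature`) are llama.cpp server parameters for its `/completion` endpoint, while the helper names here are illustrative and not part of Ishi:

```python
import json
import urllib.request

# Sketch: query a local llama.cpp server via its /completion endpoint.
# Helper names are illustrative; field names follow the llama.cpp server API.

def build_completion_request(prompt, n_predict=64, temperature=0.7):
    """Assemble the JSON body the /completion endpoint expects."""
    return {
        "prompt": prompt,            # text to complete
        "n_predict": n_predict,      # maximum number of tokens to generate
        "temperature": temperature,  # sampling temperature
    }

def complete(prompt, base_url="http://localhost:8080"):
    """POST a completion request and return the generated text."""
    body = json.dumps(build_completion_request(prompt)).encode()
    req = urllib.request.Request(
        base_url + "/completion",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

With the server from the previous step running, `complete("Explain GGUF in one sentence.")` returns the generated text from the loaded model.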
Configure Ishi:
```json
{
  "provider": "llamacpp",
  "baseUrl": "http://localhost:8080"
}
```
Get started: Download Ishi | llama.cpp Docs