Tags: providers, groq, fast-inference, llama
Instant Inference with Groq
Ishi Labs • January 17, 2026 • 1 min read

Groq's LPU (Language Processing Unit) delivers inference speeds that feel like magic. Here's how to get 500+ tok/s with Ishi.
Why Groq?
- Blazing fast — 500+ tokens/second
- Free tier — Generous rate limits
- Open models — Llama, Mixtral, Gemma
- Real-time feel — Responses appear instantly
Quick Setup
```json
{
  "provider": "groq",
  "model": "llama-3.3-70b-versatile"
}
```
Get your API key at console.groq.com.
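If you edit the config by hand, a quick sanity check catches typos before loading it. A minimal sketch, assuming the config is plain JSON as shown above (the `config_text` string here just mirrors that snippet; where Ishi actually stores its config is not specified in this post):

```python
import json

# The provider config from the Quick Setup section above.
config_text = """
{
  "provider": "groq",
  "model": "llama-3.3-70b-versatile"
}
"""

# json.loads raises json.JSONDecodeError on a typo (missing comma, stray quote).
config = json.loads(config_text)
print(config["provider"], config["model"])
```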
Speed Comparison
| Provider | Speed | Model |
|----------|-------|-------|
| Groq | 500+ tok/s | Llama 3.3 70B |
| OpenAI | 50-80 tok/s | GPT-4o |
| Anthropic | 40-60 tok/s | Claude Sonnet |
| Ollama (local) | 15-40 tok/s | Llama 3.2 |
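To see what those numbers mean in wall-clock terms, here is a rough back-of-the-envelope sketch: at 500 tok/s a 1,000-token response streams in about 2 seconds, versus 15-40 seconds at the other rates. The mid-range figures below are taken from the table above; real throughput varies with load and prompt size.

```python
# Approximate wall-clock time to stream a 1,000-token response,
# using the speeds from the comparison table (midpoints for ranges).
speeds = {
    "Groq (Llama 3.3 70B)": 500,
    "OpenAI (GPT-4o)": 65,            # midpoint of 50-80 tok/s
    "Anthropic (Claude Sonnet)": 50,  # midpoint of 40-60 tok/s
    "Ollama local (Llama 3.2)": 27,   # midpoint of 15-40 tok/s
}
tokens = 1000
for name, tok_per_s in speeds.items():
    print(f"{name}: {tokens / tok_per_s:.1f}s")
```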
Perfect For
- Iterative workflows — Rapid back-and-forth
- Real-time processing — Live document streaming
- High-volume tasks — Batch file processing
- Development — Quick testing and iteration
Get started: Download Ishi | Groq Docs