Tags: providers, groq, fast-inference, llama
Instant Inference with Groq
Ishi Labs • January 17, 2026 • 1 min read

Groq's LPU (Language Processing Unit) delivers inference speeds that feel like magic. Here's how to get 500+ tok/s with Ishi.
Why Groq?
- Blazing fast — 500+ tokens/second
- Free tier — Generous rate limits
- Open models — Llama, Mixtral, Gemma
- Real-time feel — Responses appear instantly
Quick Setup
```json
{
  "provider": "groq",
  "model": "llama-3.3-70b-versatile"
}
```
Get your API key at console.groq.com.
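If you edit the config by hand, a quick sanity check catches typos before loading it. A minimal sketch, assuming the config is plain JSON as shown above (the `config_text` string here just mirrors that snippet; where Ishi actually stores its config is not specified in this post):

```python
import json

# The provider config from the Quick Setup section above.
config_text = """
{
  "provider": "groq",
  "model": "llama-3.3-70b-versatile"
}
"""

# json.loads raises json.JSONDecodeError on a typo (missing comma, stray quote).
config = json.loads(config_text)
print(config["provider"], config["model"])
```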
Speed Comparison
| Provider | Speed | Model |
|----------|-------|-------|
| Groq | 500+ tok/s | Llama 3.3 70B |
| OpenAI | 50-80 tok/s | GPT-4o |
| Anthropic | 40-60 tok/s | Claude Sonnet |
| Ollama (local) | 15-40 tok/s | Llama 3.2 |
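To see what those numbers mean in wall-clock terms, here is a rough back-of-the-envelope sketch: at 500 tok/s a 1,000-token response streams in about 2 seconds, versus 15-40 seconds at the other rates. The mid-range figures below are taken from the table above; real throughput varies with load and prompt size.

```python
# Approximate wall-clock time to stream a 1,000-token response,
# using the speeds from the comparison table (midpoints for ranges).
speeds = {
    "Groq (Llama 3.3 70B)": 500,
    "OpenAI (GPT-4o)": 65,            # midpoint of 50-80 tok/s
    "Anthropic (Claude Sonnet)": 50,  # midpoint of 40-60 tok/s
    "Ollama local (Llama 3.2)": 27,   # midpoint of 15-40 tok/s
}
tokens = 1000
for name, tok_per_s in speeds.items():
    print(f"{name}: {tokens / tok_per_s:.1f}s")
```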
Perfect For
- Iterative workflows — Rapid back-and-forth
- Real-time processing — Live document streaming
- High-volume tasks — Batch file processing
- Development — Quick testing and iteration
Get started: Download Ishi | Groq Docs