Tags: providers · groq · fast-inference · llama

Instant Inference with Groq

Ishi Labs · January 17, 2026 · 1 min read

Groq's LPU (Language Processing Unit) delivers inference speeds that feel like magic. Here's how to get 500+ tok/s with Ishi.

Why Groq?

  • Blazing fast — 500+ tokens/second
  • Free tier — Generous rate limits
  • Open models — Llama, Mixtral, Gemma
  • Real-time feel — Responses appear instantly

Quick Setup

```json
{
  "provider": "groq",
  "model": "llama-3.3-70b-versatile"
}
```

Get your API key at console.groq.com.
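Groq exposes an OpenAI-compatible HTTP API, so the config above maps directly onto a standard chat-completion request. Here's a minimal sketch of the request body Ishi would send; the endpoint URL and field names assume the OpenAI chat format (check the Groq docs for the current schema):

```python
import json

# Groq's OpenAI-compatible chat completions endpoint.
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_request(prompt: str, model: str = "llama-3.3-70b-versatile") -> dict:
    """Build the JSON body for a streaming chat completion."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # stream tokens so the response feels instant
    }

body = build_request("Summarize this file in one sentence.")
print(json.dumps(body, indent=2))
```

Send it with your `GROQ_API_KEY` as a Bearer token in the `Authorization` header.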

Speed Comparison

| Provider | Speed | Model |
|----------|-------|-------|
| Groq | 500+ tok/s | Llama 3.3 70B |
| OpenAI | 50-80 tok/s | GPT-4o |
| Anthropic | 40-60 tok/s | Claude Sonnet |
| Ollama (local) | 15-40 tok/s | Llama 3.2 |
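A quick back-of-the-envelope check of what those throughputs mean in practice: the pure generation time for a 1,000-token response at each provider's speed (midpoints taken for the ranges; network latency and time-to-first-token are ignored):

```python
def seconds_for(tokens: int, tok_per_s: float) -> float:
    """Pure generation time for `tokens` at `tok_per_s` tokens/second."""
    return tokens / tok_per_s

# Throughputs from the table above (midpoints for the ranges).
speeds = {
    "Groq": 500.0,
    "OpenAI": 65.0,
    "Anthropic": 50.0,
    "Ollama (local)": 27.5,
}

for provider, tps in speeds.items():
    print(f"{provider:15s} {seconds_for(1000, tps):5.1f} s for 1,000 tokens")
```

Groq finishes in about 2 seconds; a local model needs over half a minute for the same response.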

Perfect For

  • Iterative workflows — Rapid back-and-forth
  • Real-time processing — Live document streaming
  • High-volume tasks — Batch file processing
  • Development — Quick testing and iteration

Get started: Download Ishi | Groq Docs

Try Ishi Today

Download Ishi and start automating your workflow with the Glass Box philosophy.

Download Free