Give your OpenClaw agent (formerly ClawdBot) every sense it needs: text-to-speech, transcription, image generation, and embeddings. One API key, zero GPU setup. Pricing is pay-per-use, at up to 20× lower cost than centralized providers.
One API key unlocks voice, hearing, vision, and memory for your OpenClaw agent.
Kokoro TTS
Agent speaks back in 40+ natural voices via Kokoro TTS.
Whisper Large V3
Understands voice messages and video content via Whisper Large V3.
FLUX, Qwen Image
Creates visuals from prompts with FLUX and Qwen Image models.
BGE M3
Builds long-term memory and knowledge bases via BGE M3.
From zero to a voice-enabled OpenClaw agent in minutes.
1. Follow the setup guide at openclaw.ai to install OpenClaw. No GPU is required on your machine.
2. Grab your free API key and paste it into the OpenClaw config file. One key covers every AI capability.
3. Ask your agent to transcribe a voice message, speak a response, or generate an image. It just works.
A freshly installed OpenClaw agent gains a voice, hearing, and vision in three config edits, and immediately starts replying with audio, transcribing voice notes, and generating images from prompts.
$5 in credits included. No subscription, no GPU headaches.
Everything you need to know