AI Video Avatar
$5 Free Credits
Build talking AI avatars by chaining text-to-image (FLUX-2 Klein), text-to-speech (Kokoro, Chatterbox), and audio-to-video (LTX-2.3) into one pipeline. A full avatar from ~$0.04, powered by low-cost decentralized GPUs.
Why deAPI for
AI video avatars?
Build a complete talking-head pipeline without stitching together expensive SaaS tools. deAPI gives you direct API access to open-source AI models — image generation, video animation, and voice synthesis — all on decentralized GPU infrastructure at a fraction of the cost of managed avatar platforms. See the full list of models.
3-Step Pipeline
Image generation, video animation, and voice synthesis — chain three API calls to produce a complete talking avatar.
LTX-2.3 Animation
State-of-the-art image-to-video. Natural head movements, blinking, and expressions from a single portrait.
Low Cost
Full avatar from ~$0.04. Decentralized GPUs make talking-head video affordable at any scale.
Open-Source Models
No vendor lock-in. FLUX, LTX, Kokoro, Chatterbox — swap models anytime as better ones emerge.
Three Steps to a Talking Avatar
Each step is a separate API call — compose them into any workflow
Step 1: Generate a Portrait
What it does
Create a photorealistic or stylized portrait from a text description. Define gender, age, ethnicity, clothing, background — everything through a prompt. FLUX-2 Klein delivers high-quality faces in seconds.
API workflow
Single POST to /txt2img
with your prompt. Receive a download URL for the generated portrait.
Enable prompt enhancement to optimize your prompt automatically.
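As a rough sketch of this step: the request body below is illustrative. The /txt2img path comes from this page, but the base URL, auth scheme, and field names ("model", "prompt", "enhance_prompt") are assumptions — check the API reference for the exact schema.

```python
import json

API_BASE = "https://api.deapi.example"  # hypothetical base URL, not from the docs
API_KEY = "YOUR_API_KEY"

def txt2img_payload(prompt: str) -> dict:
    """Request body for POST /txt2img. Field names are illustrative assumptions."""
    return {
        "model": "flux-2-klein",  # assumed model identifier
        "prompt": prompt,
        "enhance_prompt": True,   # let the API optimize the prompt automatically
    }

payload = txt2img_payload(
    "photorealistic portrait of a friendly presenter, studio lighting, neutral background"
)
body = json.dumps(payload).encode()

# Send with any HTTP client, for example:
# req = urllib.request.Request(f"{API_BASE}/txt2img", data=body, method="POST",
#     headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"})
# portrait_url = json.loads(urllib.request.urlopen(req).read())["url"]  # field name assumed
```

The response's download URL is what you carry into Step 3.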
Available Models
Fast, high-quality photorealistic portraits
from $0.00141/img
Alternative model for stylized portraits
from $0.00248/img
Optimize prompts for better face generation
Step 2: Generate a Voice
What it does
Generate natural-sounding speech from any text. Choose from multiple voices or clone a custom voice with Chatterbox. The generated audio file will be used in the next step to drive the avatar's animation.
API workflow
POST to /txt2audio
with text content and voice parameters. Receive an audio file URL.
This audio will feed directly into LTX-2.3's audio-to-video endpoint.
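The voice step can be sketched the same way. The /txt2audio path is from this page; the field names and the example voice identifier are assumptions for illustration.

```python
def txt2audio_payload(text: str, voice: str = "default") -> dict:
    """Request body for POST /txt2audio. Field and voice names are illustrative."""
    return {
        "model": "kokoro",  # or "chatterbox" for voice cloning; identifiers assumed
        "text": text,
        "voice": voice,
    }

payload = txt2audio_payload("Welcome! Let me walk you through the new dashboard.")

# POST this body (JSON-encoded, with your API key) to /txt2audio.
# The response contains the audio file URL that drives the animation in Step 3.
```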
Available Models
Natural multilingual speech with preset voices
from $0.77/1M chars
Clone any voice from a short audio sample
from $0.77/1M chars
Advanced multilingual TTS with emotion control
Step 3: Animate with LTX-2.3
What it does
Combine the portrait and the generated audio in one step. LTX-2.3's audio-to-video mode takes an image and an audio file, then produces a video with lip-synced animation, natural head movements, and facial expressions driven by the speech.
API workflow
POST to /aud2video
with the portrait URL, the generated audio URL, and a motion prompt.
Receive a complete talking avatar video — audio and animation combined.
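The final call might look like the sketch below. Again, only the /aud2video path is from this page; the model identifier and field names ("image_url", "audio_url", "prompt") are assumptions.

```python
def aud2video_payload(image_url: str, audio_url: str) -> dict:
    """Request body for POST /aud2video. Field names are illustrative assumptions."""
    return {
        "model": "ltx-2.3",  # assumed model identifier
        "image_url": image_url,
        "audio_url": audio_url,
        "prompt": "subtle head movement, natural blinking, engaged expression",
    }

payload = aud2video_payload(
    "https://cdn.example/portrait.png",  # download URL from Step 1
    "https://cdn.example/speech.wav",    # audio URL from Step 2
)

# POST this body to /aud2video; the response contains the finished
# talking-avatar video URL, with audio and animation already combined.
```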
Available Models
Lip-synced animation driven by audio input
from $0.0396/video
Animate portraits without audio (motion only)
from $0.0396/video
Enhance prompts for better animation results
Who Uses AI Video Avatars?
Marketing & Sales
Generate personalized video messages at scale. Create product demos, explainer videos, and social media content with AI presenters — without hiring actors or booking studios.
E-Learning & Training
Build course videos with AI instructors. Translate training materials into any language with localized avatars. Update content instantly without re-recording.
Industries
Onboarding videos, feature announcements, in-app guides
Automated video responses, FAQ avatars, multilingual agents
News anchors, podcast visuals, social media at scale
See the Avatar Pipeline in Action
Watch how deAPI chains text-to-image, text-to-speech, and audio-to-video into a single pipeline. From text prompt to talking avatar in under a minute.
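The whole chain can be sketched as one function. The HTTP transport is injected so the sketch stays client-agnostic; the endpoint paths are from this page, while the response field names ("url") and payload fields are assumptions.

```python
def run_pipeline(post, script: str) -> str:
    """Chain the three deAPI endpoints into one talking-avatar pipeline.
    `post(path, payload) -> dict` is an injected HTTP helper, so any client
    (urllib, requests) or a test stub can be used. Field names are assumed."""
    portrait = post("/txt2img", {
        "model": "flux-2-klein",
        "prompt": "photorealistic portrait of a presenter",
    })
    speech = post("/txt2audio", {"model": "kokoro", "text": script})
    video = post("/aud2video", {
        "image_url": portrait["url"],
        "audio_url": speech["url"],
        "prompt": "natural head movement, subtle expressions",
    })
    return video["url"]

# Demo with a stub transport (a real `post` would call the API with your key):
def fake_post(path, payload):
    return {"url": f"https://cdn.example{path}/result"}

video_url = run_pipeline(fake_post, "Hello and welcome to the demo!")
```

Injecting the transport also makes the pipeline trivial to unit-test before spending credits.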
Frequently Asked Questions
Do I need to merge the audio and video myself? No. Pass the portrait URL and the generated audio URL to the /aud2video endpoint — the model produces a video with lip-synced animation driven by the speech. No manual merging or FFmpeg needed.
Create your first AI avatar
in under a minute
Three API calls. One talking avatar. Start with $5 free credits — no subscription, no credit card.
Claim $5 Credits