AI Video Avatar
$5 Free Credits

Build talking AI avatars by chaining text-to-image (FLUX-2 Klein), text-to-speech (Kokoro, Chatterbox), and audio-to-video (LTX-2.3) into one pipeline. Full avatar from ~$0.04, powered by low-cost decentralized GPUs.

No subscription
No credit card required

Why deAPI for
AI video avatars?

Build a complete talking-head pipeline without stitching together expensive SaaS tools. deAPI gives you direct API access to open-source AI models — image generation, video animation, and voice synthesis — all on decentralized GPU infrastructure at a fraction of the cost. See the full list of models.

3-Step Pipeline

Image generation, video animation, and voice synthesis — chain three API calls to produce a complete talking avatar.

LTX-2.3 Animation

State-of-the-art image-to-video. Natural head movements, blinking, and expressions from a single portrait.

Low Cost

Full avatar from ~$0.04. Decentralized GPUs make talking-head video affordable at any scale.

Open-Source Models

No vendor lock-in. FLUX, LTX, Kokoro, Chatterbox — swap models anytime as better ones emerge.

Three Steps to a Talking Avatar

Each step is a separate API call — compose them into any workflow

Step 1: Generate a Portrait

What it does

Create a photorealistic or stylized portrait from a text description. Define gender, age, ethnicity, clothing, background — everything through a prompt. FLUX-2 Klein delivers high-quality faces in seconds.

API workflow

Single POST to /txt2img with your prompt. Receive a download URL for the generated portrait. Enable prompt enhancement to optimize your prompt automatically.
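As a minimal sketch of that call: the /txt2img path comes from the text above, but the base URL, auth, field names, and model identifier below are illustrative assumptions, not confirmed API fields.

```python
import json

API_BASE = "https://api.example.com"  # placeholder base URL

def build_portrait_request(prompt: str, enhance: bool = True):
    """Return the (endpoint, payload) pair for the text-to-image call."""
    payload = {
        "model": "flux-2-klein",        # assumed model identifier
        "prompt": prompt,
        "prompt_enhancement": enhance,  # assumed flag for the AI Boost feature
    }
    return f"{API_BASE}/txt2img", payload

endpoint, payload = build_portrait_request(
    "photorealistic portrait of a friendly presenter, soft studio lighting"
)
# A real client would POST `payload` to `endpoint` with an auth header
# and read the portrait's download URL from the JSON response.
print(endpoint)
print(json.dumps(payload, indent=2))
```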

Available Models

FLUX-2 Klein Text → Image

Fast, high-quality photorealistic portraits

from $0.00141/img

Z-Image Text → Image

Alternative model for stylized portraits

from $0.00248/img

Prompt Enhancement AI Boost

Optimize prompts for better face generation

Step 2: Generate a Voice

What it does

Generate natural-sounding speech from any text. Choose from multiple voices or clone a custom voice with Chatterbox. The generated audio file will be used in the next step to drive the avatar's animation.

API workflow

POST to /txt2audio with text content and voice parameters. Receive an audio file URL. This audio will feed directly into LTX-2.3's audio-to-video endpoint.
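A sketch of the voice-generation call, continuing the same pattern: the /txt2audio path is from the text above, while the model identifier and the `voice` parameter name are assumptions for illustration.

```python
def build_voice_request(text: str, voice: str = "preset-voice-1"):
    """Return the (endpoint, payload) pair for the text-to-speech call."""
    payload = {
        "model": "kokoro",  # assumed identifier; swap for Chatterbox to clone
        "text": text,
        "voice": voice,     # assumed name of the preset-voice parameter
    }
    return "/txt2audio", payload  # path relative to the API base URL

endpoint, payload = build_voice_request(
    "Welcome aboard! Let me walk you through the dashboard."
)
# The audio file URL in the response feeds directly into Step 3 (/aud2video).
print(endpoint, payload["voice"])
```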

Available Models

Kokoro TTS Text → Speech

Natural multilingual speech with preset voices

from $0.77/1M chars

Chatterbox Voice Clone

Clone any voice from a short audio sample

from $0.77/1M chars

Qwen TTS Multilingual

Advanced multilingual TTS with emotion control

Step 3: Animate with LTX-2.3

What it does

Combine the portrait and the generated audio in one step. LTX-2.3's audio-to-video mode takes an image and an audio file, then produces a video with lip-synced animation, natural head movements, and facial expressions driven by the speech.

API workflow

POST to /aud2video with the portrait URL, the generated audio URL, and a motion prompt. Receive a complete talking avatar video — audio and animation combined.
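A sketch of the final call, combining the outputs of Steps 1 and 2: the /aud2video path is from the text above; the field names and placeholder URLs are assumptions.

```python
def build_avatar_request(image_url: str, audio_url: str, motion_prompt: str):
    """Return the (endpoint, payload) pair for the audio-to-video call."""
    payload = {
        "model": "ltx-2.3",       # assumed model identifier
        "image_url": image_url,   # portrait URL from Step 1
        "audio_url": audio_url,   # speech audio URL from Step 2
        "prompt": motion_prompt,  # assumed field name for the motion prompt
    }
    return "/aud2video", payload  # path relative to the API base URL

endpoint, payload = build_avatar_request(
    "https://cdn.example.com/portrait.png",  # placeholder URLs
    "https://cdn.example.com/speech.wav",
    "subtle head movement, natural blinking, friendly expression",
)
print(endpoint)
```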

Available Models

LTX-2.3 Audio → Video

Lip-synced animation driven by audio input

from $0.0396/video

LTX-2.3 Image → Video

Animate portraits without audio (motion only)

from $0.0396/video

Prompt Enhancement AI Boost

Enhance prompts for better animation results

Who Uses AI Video Avatars?

Marketing & Sales

Generate personalized video messages at scale. Create product demos, explainer videos, and social media content with AI presenters — without hiring actors or booking studios.

E-Learning & Training

Build course videos with AI instructors. Translate training materials into any language with localized avatars. Update content instantly without re-recording.

Industries

SaaS & Product B2B

Onboarding videos, feature announcements, in-app guides

Customer Support Automation

Automated video responses, FAQ avatars, multilingual agents

Media & Content Creator

News anchors, podcast visuals, social media at scale

See the Avatar Pipeline in Action

Watch how deAPI chains text-to-image, text-to-speech, and audio-to-video into a single pipeline. From text prompt to talking avatar in under a minute.

Full pipeline from ~$0.04 per avatar
Three API calls, fully automatable
Webhook delivery — no polling needed
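The three calls can be chained into one function. In this sketch the HTTP transport is injected as a callable so the sequencing stands on its own; the endpoint paths come from the steps above, while the response shape and field names are illustrative assumptions.

```python
def run_avatar_pipeline(portrait_prompt: str, script: str, post) -> str:
    """Chain txt2img -> txt2audio -> aud2video into one talking avatar.

    `post` is any callable(endpoint, payload) -> response dict. In production
    it would wrap an HTTP client with auth, submit the async job, and return
    once the webhook fires. Field names below are illustrative assumptions.
    """
    portrait = post("/txt2img", {"model": "flux-2-klein",
                                 "prompt": portrait_prompt})
    speech = post("/txt2audio", {"model": "kokoro", "text": script})
    video = post("/aud2video", {
        "model": "ltx-2.3",
        "image_url": portrait["url"],  # portrait from Step 1
        "audio_url": speech["url"],    # speech from Step 2
        "prompt": "natural head movement, lip sync, subtle expressions",
    })
    return video["url"]
```

Because each step only needs the previous step's output URL, the same function works whether `post` blocks on a webhook callback or polls a job-status endpoint.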

Frequently Asked Questions

Which AI models does the avatar pipeline use?

The pipeline uses three models: FLUX-2 Klein for portrait generation (text-to-image), Kokoro or Chatterbox for voice synthesis (text-to-speech), and LTX-2.3 for audio-driven animation (audio-to-video). You can also use Z-Image for image generation or Qwen TTS for multilingual voice.

How much does a talking avatar cost?

The full pipeline costs approximately $0.04 per avatar: ~$0.0014 for image generation (FLUX-2 Klein), a fraction of a cent for TTS voice generation (from $0.77 per 1M characters), and ~$0.0396 for audio-to-video animation (LTX-2.3). With $5 in free credits you can generate around 120 avatars to get started.
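As a back-of-the-envelope check of that figure, using the per-unit prices listed above and assuming a ~500-character script:

```python
# Per-avatar cost estimate (script length of 500 characters is an assumption).
image_cost = 0.00141                 # FLUX-2 Klein, per image
tts_cost = 500 * (0.77 / 1_000_000)  # Kokoro, $0.77 per 1M characters
video_cost = 0.0396                  # LTX-2.3, per video

total = image_cost + tts_cost + video_cost
print(f"~${total:.4f} per avatar")          # roughly $0.041
print(f"~{int(5 / total)} avatars on $5 free credits")
```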

Can I use my own portrait image instead of generating one?

Yes. Skip Step 1 and pass any portrait image URL directly to the image-to-video endpoint (LTX-2.3). This works with photos, AI-generated images, or illustrations.

What is LTX-2.3?

LTX-2.3 is a state-of-the-art open-source video generation model by Lightricks. It supports image-to-video, text-to-video, and audio-to-video modes. For avatars, the audio-to-video mode is key — it takes a portrait and an audio file, then generates a video with lip-synced animation, head turns, and natural expressions. Available on deAPI from $0.0396 per video.

How do I sync the generated voice with the video?

LTX-2.3's audio-to-video mode handles this automatically. You pass the portrait image and the generated audio file to the /aud2video endpoint — the model produces a video with lip-synced animation driven by the speech. No manual merging or FFmpeg needed.

Can I clone a custom voice for my avatar?

Yes. Chatterbox supports voice cloning from a short audio sample. Upload a reference clip, and the model generates speech matching that voice — perfect for maintaining a consistent brand voice across all avatar content.

Can I generate avatars in batch at scale?

Yes. deAPI's async job pattern with webhook delivery is built for production workloads. Submit hundreds of avatar jobs simultaneously and receive results via webhook when ready. The decentralized GPU infrastructure scales automatically with demand — no capacity planning needed.

How does deAPI compare to HeyGen or Synthesia?

Unlike SaaS platforms such as HeyGen or Synthesia, which require monthly subscriptions, deAPI is pay-per-use with no subscription. You get full API access to open-source models, complete customization of the pipeline, and pay only for what you use — starting from ~$0.04 per avatar.

Create your first AI avatar
in under a minute

Three API calls. One talking avatar. Start with $5 free credits — no subscription, no credit card.

Claim $5 Credits
No subscription
No credit card required