AI Video Avatar
$5 Free Credits

Build talking AI avatars by chaining text-to-image (FLUX-2 Klein), text-to-speech (Kokoro, Chatterbox), and audio-to-video (LTX-2.3) into one pipeline. Full avatar from ~$0.04, powered by low-cost decentralized GPUs.

No subscription
No credit card required

Why deAPI for
AI video avatars?

Build a complete talking-head pipeline without stitching together expensive SaaS tools. deAPI gives you direct API access to open-source AI models — image generation, video animation, and voice synthesis — all on decentralized GPU infrastructure at a fraction of the cost. See the full list of models.

3-Step Pipeline

Image generation, video animation, and voice synthesis — chain three API calls to produce a complete talking avatar.

LTX-2.3 Animation

State-of-the-art image-to-video. Natural head movements, blinking, and expressions from a single portrait.

Low Cost

Full avatar from ~$0.04. Decentralized GPUs make talking-head video affordable at any scale.

Open-Source Models

No vendor lock-in. FLUX, LTX, Kokoro, Chatterbox — swap models anytime as better ones emerge.

Three Steps to a Talking Avatar

Each step is a separate API call — compose them into any workflow

Step 1: Generate a Portrait

What it does

Create a photorealistic or stylized portrait from a text description. Define gender, age, ethnicity, clothing, background — everything through a prompt. FLUX-2 Klein delivers high-quality faces in seconds.

API workflow

Single POST to /txt2img with your prompt. Receive a download URL for the generated portrait. Enable prompt enhancement to optimize your prompt automatically.
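As a minimal sketch of that call: the /txt2img path comes from the text above, but the base URL, auth, field names, and model identifier below are illustrative assumptions, not confirmed API fields.

```python
import json

API_BASE = "https://api.example.com"  # placeholder base URL

def build_portrait_request(prompt: str, enhance: bool = True):
    """Return the (endpoint, payload) pair for the text-to-image call."""
    payload = {
        "model": "flux-2-klein",        # assumed model identifier
        "prompt": prompt,
        "prompt_enhancement": enhance,  # assumed flag for the AI Boost feature
    }
    return f"{API_BASE}/txt2img", payload

endpoint, payload = build_portrait_request(
    "photorealistic portrait of a friendly presenter, soft studio lighting"
)
# A real client would POST `payload` to `endpoint` with an auth header
# and read the portrait's download URL from the JSON response.
print(endpoint)
print(json.dumps(payload, indent=2))
```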

Available Models

FLUX-2 Klein Text → Image

Fast, high-quality photorealistic portraits

from $0.00141/img

Z-Image Text → Image

Alternative model for stylized portraits

from $0.00248/img

Prompt Enhancement AI Boost

Optimize prompts for better face generation

Step 2: Generate a Voice

What it does

Generate natural-sounding speech from any text. Choose from multiple voices or clone a custom voice with Chatterbox. The generated audio file will be used in the next step to drive the avatar's animation.

API workflow

POST to /txt2audio with text content and voice parameters. Receive an audio file URL. This audio will feed directly into LTX-2.3's audio-to-video endpoint.
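A sketch of the voice-generation call, continuing the same pattern: the /txt2audio path is from the text above, while the model identifier and the `voice` parameter name are assumptions for illustration.

```python
def build_voice_request(text: str, voice: str = "preset-voice-1"):
    """Return the (endpoint, payload) pair for the text-to-speech call."""
    payload = {
        "model": "kokoro",  # assumed identifier; swap for Chatterbox to clone
        "text": text,
        "voice": voice,     # assumed name of the preset-voice parameter
    }
    return "/txt2audio", payload  # path relative to the API base URL

endpoint, payload = build_voice_request(
    "Welcome aboard! Let me walk you through the dashboard."
)
# The audio file URL in the response feeds directly into Step 3 (/aud2video).
print(endpoint, payload["voice"])
```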

Available Models

Kokoro TTS Text → Speech

Natural multilingual speech with preset voices

from $0.77/1M chars

Chatterbox Voice Clone

Clone any voice from a short audio sample

from $0.77/1M chars

Qwen TTS Multilingual

Advanced multilingual TTS with emotion control

Step 3: Animate with LTX-2.3

What it does

Combine the portrait and the generated audio in one step. LTX-2.3's audio-to-video mode takes an image and an audio file, then produces a video with lip-synced animation, natural head movements, and facial expressions driven by the speech.

API workflow

POST to /aud2video with the portrait URL, the generated audio URL, and a motion prompt. Receive a complete talking avatar video — audio and animation combined.
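A sketch of the final call, combining the outputs of Steps 1 and 2: the /aud2video path is from the text above; the field names and placeholder URLs are assumptions.

```python
def build_avatar_request(image_url: str, audio_url: str, motion_prompt: str):
    """Return the (endpoint, payload) pair for the audio-to-video call."""
    payload = {
        "model": "ltx-2.3",       # assumed model identifier
        "image_url": image_url,   # portrait URL from Step 1
        "audio_url": audio_url,   # speech audio URL from Step 2
        "prompt": motion_prompt,  # assumed field name for the motion prompt
    }
    return "/aud2video", payload  # path relative to the API base URL

endpoint, payload = build_avatar_request(
    "https://cdn.example.com/portrait.png",  # placeholder URLs
    "https://cdn.example.com/speech.wav",
    "subtle head movement, natural blinking, friendly expression",
)
print(endpoint)
```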

Available Models

LTX-2.3 Audio → Video

Lip-synced animation driven by audio input

from $0.0396/video

LTX-2.3 Image → Video

Animate portraits without audio (motion only)

from $0.0396/video

Prompt Enhancement AI Boost

Enhance prompts for better animation results

Who Uses AI Video Avatars?

Marketing & Sales

Generate personalized video messages at scale. Create product demos, explainer videos, and social media content with AI presenters — without hiring actors or booking studios.

E-Learning & Training

Build course videos with AI instructors. Translate training materials into any language with localized avatars. Update content instantly without re-recording.

Industries

SaaS & Product B2B

Onboarding videos, feature announcements, in-app guides

Customer Support Automation

Automated video responses, FAQ avatars, multilingual agents

Media & Content Creator

News anchors, podcast visuals, social media at scale

See the Avatar Pipeline in Action

Watch how deAPI chains text-to-image, text-to-speech, and audio-to-video into a single pipeline. From text prompt to talking avatar in under a minute.

Full pipeline from ~$0.04 per avatar
Three API calls, fully automatable
Webhook delivery — no polling needed
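The three calls can be chained into one function. In this sketch the HTTP transport is injected as a callable so the sequencing stands on its own; the endpoint paths come from the steps above, while the response shape and field names are illustrative assumptions.

```python
def run_avatar_pipeline(portrait_prompt: str, script: str, post) -> str:
    """Chain txt2img -> txt2audio -> aud2video into one talking avatar.

    `post` is any callable(endpoint, payload) -> response dict. In production
    it would wrap an HTTP client with auth, submit the async job, and return
    once the webhook fires. Field names below are illustrative assumptions.
    """
    portrait = post("/txt2img", {"model": "flux-2-klein",
                                 "prompt": portrait_prompt})
    speech = post("/txt2audio", {"model": "kokoro", "text": script})
    video = post("/aud2video", {
        "model": "ltx-2.3",
        "image_url": portrait["url"],  # portrait from Step 1
        "audio_url": speech["url"],    # speech from Step 2
        "prompt": "natural head movement, lip sync, subtle expressions",
    })
    return video["url"]
```

Because each step only needs the previous step's output URL, the same function works whether `post` blocks on a webhook callback or polls a job-status endpoint.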

Frequently Asked Questions

Which AI models does the avatar pipeline use?

The pipeline uses three models: FLUX-2 Klein for portrait generation (text-to-image), Kokoro or Chatterbox for voice synthesis (text-to-speech), and LTX-2.3 for audio-driven animation (audio-to-video). You can also use Z-Image for image generation or Qwen TTS for multilingual voice.

How much does a talking avatar cost?

The full pipeline costs approximately $0.04 per avatar: ~$0.0014 for image generation (FLUX-2 Klein), a fraction of a cent for TTS voice generation (from $0.77 per 1M characters), and ~$0.0396 for audio-to-video animation (LTX-2.3). With $5 in free credits you can generate around 120 avatars to get started.
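As a back-of-the-envelope check of that figure, using the per-unit prices listed above and assuming a ~500-character script:

```python
# Per-avatar cost estimate (script length of 500 characters is an assumption).
image_cost = 0.00141                 # FLUX-2 Klein, per image
tts_cost = 500 * (0.77 / 1_000_000)  # Kokoro, $0.77 per 1M characters
video_cost = 0.0396                  # LTX-2.3, per video

total = image_cost + tts_cost + video_cost
print(f"~${total:.4f} per avatar")          # roughly $0.041
print(f"~{int(5 / total)} avatars on $5 free credits")
```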

Can I use my own portrait image instead of generating one?

Yes. Skip Step 1 and pass any portrait image URL directly to the image-to-video endpoint (LTX-2.3). This works with photos, AI-generated images, or illustrations.

What is LTX-2.3?

LTX-2.3 is a state-of-the-art open-source video generation model by Lightricks. It supports image-to-video, text-to-video, and audio-to-video modes. For avatars, the audio-to-video mode is key — it takes a portrait and an audio file, then generates a video with lip-synced animation, head turns, and natural expressions. Available on deAPI from $0.0396 per video.

How do I sync the generated voice with the video?

LTX-2.3's audio-to-video mode handles this automatically. You pass the portrait image and the generated audio file to the /aud2video endpoint — the model produces a video with lip-synced animation driven by the speech. No manual merging or FFmpeg needed.

Can I clone a custom voice for my avatar?

Yes. Chatterbox supports voice cloning from a short audio sample. Upload a reference clip, and the model generates speech matching that voice — perfect for maintaining a consistent brand voice across all avatar content.

Can I generate avatars in batch at scale?

Yes. deAPI's async job pattern with webhook delivery is built for production workloads. Submit hundreds of avatar jobs simultaneously and receive results via webhook when ready. The decentralized GPU infrastructure scales automatically with demand — no capacity planning needed.

How does deAPI compare to HeyGen or Synthesia?

Unlike SaaS platforms such as HeyGen or Synthesia, which require monthly subscriptions, deAPI is pay-per-use with no subscription. You get full API access to open-source models, complete customization of the pipeline, and pay only for what you use — starting from ~$0.04 per avatar.

Create your first AI avatar
in under a minute

Three API calls. One talking avatar. Start with $5 free credits — no subscription, no credit card.

Claim $5 Credits
No subscription
No credit card required