Comparison · Updated April 2026

The focused media API next to Together's full inference platform.

Together AI is a platform: 85+ serverless models across chat LLMs, vision, image, video, audio and moderation, plus fine-tuning, dedicated GPU endpoints, on-demand and reserved GPU clusters, BYOM custom deployment and a code sandbox. deAPI is a narrower product: open-source image, video, speech, music and transcription on a distributed GPU pool, under one response contract. Both authenticate with Bearer tokens, and both host many of the same open-source checkpoints. The trade-off is platform breadth on one side, focused media economics and distributed supply on the other.

Why teams add deAPI next to Together

Four structural differences between a focused media API and a broad inference platform — each useful in a different shape of team and workload.

A focused product surface

Together ships inference plus fine-tuning plus dedicated endpoints plus GPU clusters plus BYOM plus a code sandbox plus batch — a platform for teams that want to run the full AI stack under one vendor. deAPI intentionally stops at media inference. Fewer concepts to learn, fewer billing meters to reconcile, and a single response contract that covers txt2img, txt2video, txt2speech, txt2music and transcribe.

Distributed supply, not a centralized cloud

Together's tiers run on its own centralized cloud — shared capacity for serverless, per-minute H100-class hardware for dedicated endpoints, and on-demand or reserved NVIDIA HGX hardware (H100 / B200) for cluster rental. deAPI sources GPU-seconds from a global pool of independent providers competing for inference work. Different supply layer, different cost floor for the same open-source image / video / audio checkpoints.

Per-output billing, no infra meter layered on top

Running media on Together can involve several billing meters at once: per-token for vision, per-image for generation, per-character for TTS, per-minute for transcription, plus per-minute dedicated endpoints, per-hour GPU clusters and per-vCPU-hour sandbox if you use them. deAPI collapses media to one per-output meter per modality, with no separate infrastructure layer and no per-seat tier gating the rate.

Source-agnostic video transcription

Both platforms run Whisper Large V3 for speech-to-text. deAPI's transcribe endpoint additionally accepts a URL on the social stack — YouTube, Kick, Twitch, X (Twitter) posts and Spaces — as well as direct file upload. One request, no out-of-band ffmpeg / yt-dlp step, same request_id polling as every other modality.

When to use deAPI for media

  • You don't need an LLM hosted by the same vendor — chat, vision, rerank and moderation happen elsewhere, or they're not part of your product at all.

  • You're running a high-volume media loop — image batches, long-form video generation, audiobook-scale TTS, transcription pipelines — and the economics of a distributed GPU pool beat rented H100-class capacity.

  • Video transcription is part of the pipeline and the source is a URL on YouTube / Kick / Twitch / X, not just a file you've already pulled down.

  • You want a single response contract across txt2img, img2video, txt2speech, txt2music and transcribe — one retry handler, one webhook consumer, one SDK surface.

  • You don't need fine-tuning, BYOM deployment, dedicated endpoints, GPU cluster rental or a code sandbox for this workload — you need predictable media output against mainstream open-source checkpoints.
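The "one retry handler, one webhook consumer" bullet can be sketched in client code: a single submit-and-poll helper shared by every modality. The transport functions here are injected stubs so the flow runs without a live key; the request_id and /request-status path follow the polling flow this page describes, but exact field names should be verified against the live docs.

```python
# Sketch of the single response contract: one polling loop that txt2img,
# txt2video, txt2speech, txt2music and transcribe all share.
import time
from typing import Callable

def submit_and_poll(post: Callable[[str, dict], dict],
                    get: Callable[[str], dict],
                    endpoint: str, payload: dict,
                    interval: float = 0.0, max_polls: int = 10) -> dict:
    """Submit one media job, then poll /request-status/{request_id} until done."""
    request_id = post(endpoint, payload)["request_id"]
    for _ in range(max_polls):
        status = get(f"/request-status/{request_id}")
        if status["status"] in ("done", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"request {request_id} still pending after {max_polls} polls")

# Stub transport standing in for api.deapi.ai, just to show the shared flow.
replies = iter([{"status": "pending"}, {"status": "done", "result_url": "..."}])
result = submit_and_poll(lambda ep, p: {"request_id": "r-1"},
                         lambda path: next(replies),
                         "/v1/client/txt2img",
                         {"model": "Flux1schnell", "prompt": "sunset"})
print(result["status"])  # the same loop works unchanged for /transcribe etc.
```

Because the loop is modality-agnostic, swapping the endpoint string is the only change between an image batch and a transcription pipeline.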

When Together AI is the right tool

  • LLM inference is core to your product — DeepSeek V3.1, Llama 3.3, Qwen3.5, MiniMax, Kimi, GPT OSS — and you want one Bearer token for chat, vision, moderation, rerank and embeddings.

  • You fine-tune models (LoRA or full) as part of your workflow and want the training pipeline on the same account as the inference.

  • You need dedicated single-tenant endpoints on H100-class hardware with predictable capacity — or on-demand and reserved GPU cluster rental for training and experimentation.

  • You ship a custom model (BYOM) and want to run it on managed infrastructure, or you need the code sandbox / code interpreter for agent-side execution.

  • Your stack is already on OpenAI-compatible /chat/completions conventions and the drop-in compatibility shortens the integration path.

deAPI vs Together AI at a glance

Both authenticate with Bearer tokens and host many of the same open-source checkpoints. The real differences are scope, GPU supply and billing shape. Every claim verified against public product docs as of April 2026.

Dimension | deAPI | Together AI
Core positioning | Focused open-source media inference | Broad inference platform (LLM + media + infra)
Catalog size | Curated media-model shortlist | 85+ serverless models across 8 categories
LLM hosting | Not currently — media only | DeepSeek V3.1, Llama 3.3, Qwen3.5, MiniMax, Kimi, GPT OSS…
Overlap on open-source media | FLUX family, Whisper Large V3, Kokoro and others — same underlying checkpoints | Same checkpoints, on centralized GPU supply
Fine-tuning | Not currently supported | LoRA + full fine-tuning, per-token training metering
Dedicated single-tenant endpoints | Not offered | H100-class hardware, per-minute billing
GPU cluster rental | Not offered | On-demand + reserved on NVIDIA HGX hardware (H100 / B200)
Custom model deployment (BYOM) | Curated catalog only | Supported
Source-agnostic video transcription | URL or upload — YouTube / Kick / Twitch / X Spaces natively | File upload or direct audio URL, no native social-source ingest — per-audio-minute metering
GPU supply | Distributed global pool of independent operators | Centralized cloud — H100-class / B200 hardware
Auth header | Authorization: Bearer | Authorization: Bearer (parity — no header change on migration)
API schema | Unified per-modality contract (/v1/client/txt2img, txt2video, txt2speech…) | OpenAI-compatible (/chat/completions, /images/generations…)
Billing shape | Per output, one invoice | Per-token / per-image / per-character / per-video / per-audio-minute serverless + per-minute dedicated + per-hour clusters + per-token fine-tuning + per-vCPU-hour sandbox
Free credits on signup | $5, no credit card | Trial credits available

Both platforms iterate frequently — pricing numbers intentionally omitted. Always verify current capabilities on each vendor's live docs.

Same Bearer, different request contract

Together's serverless inference mostly follows OpenAI-compatible routes (/chat/completions, /images/generations). deAPI uses one unified per-modality contract across every endpoint. That's the shape of the diff.

  1. Keep the auth header — both sides use Authorization: Bearer <key>. Nothing to rewrite in HTTP client config.
  2. Flip the base URL from api.together.xyz/v1 to api.deapi.ai/api/v1/client and remap the endpoint: /images/generations → /txt2img; Whisper transcribe → /transcribe.
  3. Map Together model slugs to deAPI slugs via GET /api/v1/client/models (for example black-forest-labs/FLUX.1-schnell → Flux1schnell). Responses are async by default — poll /request-status/{request_id} or pass a webhook_url on submit.
Before · Together images/generations
curl -s -X POST \
  https://api.together.xyz/v1/images/generations \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model":  "black-forest-labs/FLUX.1-schnell",
    "prompt": "Futuristic city at sunset",
    "width":  1536,
    "height": 896,
    "steps":  4
  }'
After · deAPI txt2img
curl -s -X POST \
  https://api.deapi.ai/api/v1/client/txt2img \
  -H "Authorization: Bearer $DEAPI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model":  "Flux1schnell",
    "prompt": "Futuristic city at sunset",
    "width":  1536,
    "height": 896,
    "steps":  4,
    "seed":   42
  }'
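The before/after diff above is mechanical enough to script. This sketch translates a Together images/generations request into a deAPI txt2img request: same Bearer scheme, new base URL, remapped endpoint and model slug. The slug table here is hard-coded for illustration; in practice it would be built from GET /api/v1/client/models.

```python
# Sketch of migration steps 2 and 3: base-URL flip, endpoint remap, slug remap.
# The two map entries below come from the examples in this document.
DEAPI_BASE = "https://api.deapi.ai/api/v1/client"

ENDPOINT_MAP = {"/images/generations": "/txt2img"}
SLUG_MAP = {"black-forest-labs/FLUX.1-schnell": "Flux1schnell"}

def remap_request(endpoint: str, body: dict) -> tuple[str, dict]:
    """Translate a Together images/generations call into a deAPI txt2img call."""
    new_body = dict(body)  # prompt, width, height, steps carry over unchanged
    new_body["model"] = SLUG_MAP[body["model"]]
    return DEAPI_BASE + ENDPOINT_MAP[endpoint], new_body

url, body = remap_request("/images/generations",
                          {"model": "black-forest-labs/FLUX.1-schnell",
                           "prompt": "Futuristic city at sunset",
                           "width": 1536, "height": 896, "steps": 4})
print(url)  # https://api.deapi.ai/api/v1/client/txt2img
```

The one behavioral difference the remap cannot hide is response shape: the deAPI side returns a request_id to poll rather than an inline result.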

Frequently asked questions

How does deAPI fit next to Together AI?

As a focused media-inference layer that pairs cleanly with Together's platform surface. deAPI exposes image, video, speech, music and transcription on a distributed GPU pool under one response contract, with social-source video ingest (YouTube / Kick / Twitch / X) built in. Together covers the same open-source media checkpoints inside a wider platform that also includes LLM serverless, fine-tuning, dedicated endpoints, GPU cluster rental and BYOM. Because both sides use Authorization: Bearer, most teams end up running them side-by-side: Together where the platform features are load-bearing, deAPI where the distributed-GPU supply and per-output billing change the unit economics on high-volume media workloads.

Why doesn't deAPI host LLMs or support fine-tuning?

Scope is a feature here. deAPI intentionally stops at media inference — image, video, speech, music, transcription — so the product can optimize for one thing: running open-source media models on a distributed GPU pool with one Bearer token, one response shape and no infrastructure meter layered on top. Teams that need LLMs, fine-tuning, moderation or rerank keep Together's /chat/completions and dedicated endpoints for that side; deAPI takes the media loop. One /chat/completions call on Together, one /v1/client/txt2img or /transcribe on deAPI — same auth strategy in your HTTP client, each vendor doing what it is built for.

Do deAPI and Together AI host the same open-source models?

Yes — meaningfully, and that's what makes pairing them straightforward. Both host Flux-family image models (including FLUX.1 schnell and FLUX.2), Whisper Large V3 for speech-to-text, and Kokoro for text-to-speech, among others. Same underlying open-source checkpoints, two different supply layers — Together on centralized H100-class cloud, deAPI on a distributed pool of independent operators competing for inference work. Because the weights are shared, your prompts and outputs travel well between the two, and you can route the same workload to whichever side makes better economic sense at a given batch size.

How hard is it to migrate a media workload from Together to deAPI?

Simpler than most migrations on this shape of stack. Both sides use Authorization: Bearer, so the auth header doesn't change. Two pieces do: (1) the base URL flips from api.together.xyz/v1 to api.deapi.ai/api/v1/client, and (2) the schema switches from Together's OpenAI-compatible shape (/chat/completions, /images/generations) to deAPI's unified per-modality contract — POST /v1/client/txt2img, /v1/client/txt2video, /v1/client/txt2speech, /v1/client/transcribe, /v1/client/txt2music — with a single request_id polling / webhook flow that covers every modality the same way. Map Together model slugs (for example black-forest-labs/FLUX.1-schnell) to deAPI slugs returned by GET /api/v1/client/models. You can also keep both live in production and route per workload; the two don't conflict.

Can I keep Together for infrastructure and use deAPI for media inference?

Yes — that's exactly the pairing many teams land on. deAPI's scope is pure shared-capacity inference on a distributed GPU pool, which is what keeps the media per-output billing flat. The infrastructure-grade features stay on Together: LoRA and full fine-tuning, dedicated single-tenant endpoints on H100-class GPUs (per-minute), on-demand and reserved NVIDIA HGX cluster rental (H100 / B200), BYOM deployment and the code sandbox. One Bearer token on each side, clean separation of concerns, no duplication of effort — you use each product for what it's purpose-built for.

How do the billing shapes compare?

deAPI collapses media billing onto one surface. The per-modality metric varies (pixels × steps for image, duration for video / music, characters for TTS, minutes for transcription) but there is one account and one invoice, with no separate meters for infrastructure products. Together meters serverless inference in several units (per million tokens for chat / vision / embeddings / moderation, per image for image generation, per million characters for TTS, per video for video generation, per audio minute for transcription) and then layers per-minute dedicated-endpoint billing, per-hour GPU clusters, per-token fine-tuning training and per-vCPU-hour + per-GiB-RAM-hour sandbox on top. For teams whose workload is primarily media, the one-surface shape makes forecasting simpler — for teams deep on Together's platform features, the existing meters stay where they are. See the live pricing page for current values.

What does running both side by side look like?

A pattern that works well for multi-provider teams: Together keeps the LLM stack (chat, vision, embeddings, moderation, rerank) and the infrastructure layer (fine-tuning, dedicated endpoints, GPU cluster rental, BYOM, code sandbox) — everything that demands platform-grade controls. deAPI handles the media loop end-to-end: txt2img, img2video, txt2video, txt2speech, txt2music and transcribe, on one Bearer token, one response contract and a distributed GPU pool. Two deAPI-side wins on top of the split: source-agnostic video transcription (YouTube / Kick / Twitch / X URLs or direct upload) that Together doesn't surface natively, and a per-output billing shape across every modality. Same Bearer on both sides keeps the client config identical.
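The split described above reduces to a small routing rule in client code. This sketch is purely illustrative: the task labels are names chosen here, not API values on either platform.

```python
# Sketch of the side-by-side split: route each task type to the vendor that
# owns it. Task labels below are illustrative, not real API identifiers.
MEDIA_TASKS = {"txt2img", "img2video", "txt2video",
               "txt2speech", "txt2music", "transcribe"}
PLATFORM_TASKS = {"chat", "vision", "embeddings", "moderation",
                  "rerank", "fine-tune", "byom", "sandbox"}

def route(task: str) -> str:
    """Pick a vendor per workload; both use the same Bearer header scheme."""
    if task in MEDIA_TASKS:
        return "deapi"
    if task in PLATFORM_TASKS:
        return "together"
    raise ValueError(f"unrouted task: {task}")

print(route("transcribe"), route("chat"))  # deapi together
```

Because the auth scheme matches on both sides, the router only has to swap base URLs and keys, not HTTP client configuration.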

Pair Together's platform with deAPI's focused media path

$5 of free credits. No credit card. Keep LLMs, fine-tuning and dedicated endpoints on Together — move the media loop onto a distributed GPU pool with one response contract across every modality.

Migration assistance available — talk to an engineer.