Comparison · Updated April 2026

The focused media API next to Together's full inference platform.

Together AI is a platform: 85+ serverless models across chat LLMs, vision, image, video, audio and moderation, plus fine-tuning, dedicated GPU endpoints, on-demand and reserved GPU clusters, BYOM custom deployment and a code sandbox. deAPI is a narrower product: open-source image, video, speech, music and transcription on a distributed GPU pool, under one response contract. Both authenticate with Bearer tokens, and both host many of the same open-source checkpoints. The trade-off is platform breadth on one side, focused media economics and distributed supply on the other.

Why teams add deAPI next to Together

Four structural differences between a focused media API and a broad inference platform — each useful in a different shape of team and workload.

A focused product surface

Together ships inference plus fine-tuning plus dedicated endpoints plus GPU clusters plus BYOM plus a code sandbox plus batch — a platform for teams that want to run the full AI stack under one vendor. deAPI intentionally stops at media inference. Fewer concepts to learn, fewer billing meters to reconcile, and a single response contract that covers txt2img, txt2video, txt2speech, txt2music and transcribe.

Distributed supply, not a centralized cloud

Together's tiers run on its own centralized cloud — shared capacity for serverless, per-minute H100-class hardware for dedicated endpoints, and on-demand or reserved NVIDIA HGX hardware (H100 / B200) for cluster rental. deAPI sources GPU-seconds from a global pool of independent providers competing for inference work. Different supply layer, different cost floor for the same open-source image / video / audio checkpoints.

Per-output billing, no infra meter layered on top

Running media on Together can involve several billing meters at once: per-token for vision, per-image for generation, per-character for TTS, per-minute for transcription, plus per-minute dedicated endpoints, per-hour GPU clusters and per-vCPU-hour sandbox if you use them. deAPI collapses media to one per-output meter per modality, with no separate infrastructure layer and no per-seat tier gating the rate.

Source-agnostic video transcription

Both platforms run Whisper Large V3 for speech-to-text. deAPI's transcribe endpoint additionally accepts a URL on the social stack — YouTube, Kick, Twitch, X (Twitter) posts and Spaces — as well as direct file upload. One request, no out-of-band ffmpeg / yt-dlp step, same request_id polling as every other modality.

When to use deAPI for media

  • You don't need an LLM hosted by the same vendor — chat, vision, rerank and moderation happen elsewhere, or they're not part of your product at all.

  • You're running a high-volume media loop — image batches, long-form video generation, audiobook-scale TTS, transcription pipelines — and the economics of a distributed GPU pool beat rented H100-class capacity.

  • Video transcription is part of the pipeline and the source is a URL on YouTube / Kick / Twitch / X, not just a file you've already pulled down.

  • You want a single response contract across txt2img, img2video, txt2speech, txt2music and transcribe — one retry handler, one webhook consumer, one SDK surface.

  • You don't need fine-tuning, BYOM deployment, dedicated endpoints, GPU cluster rental or a code sandbox for this workload — you need predictable media output against mainstream open-source checkpoints.
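The "one retry handler, one webhook consumer" bullet can be sketched in client code: a single submit-and-poll helper shared by every modality. The transport functions here are injected stubs so the flow runs without a live key; the request_id and /request-status path follow the polling flow this page describes, but exact field names should be verified against the live docs.

```python
# Sketch of the single response contract: one polling loop that txt2img,
# txt2video, txt2speech, txt2music and transcribe all share.
import time
from typing import Callable

def submit_and_poll(post: Callable[[str, dict], dict],
                    get: Callable[[str], dict],
                    endpoint: str, payload: dict,
                    interval: float = 0.0, max_polls: int = 10) -> dict:
    """Submit one media job, then poll /request-status/{request_id} until done."""
    request_id = post(endpoint, payload)["request_id"]
    for _ in range(max_polls):
        status = get(f"/request-status/{request_id}")
        if status["status"] in ("done", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"request {request_id} still pending after {max_polls} polls")

# Stub transport standing in for api.deapi.ai, just to show the shared flow.
replies = iter([{"status": "pending"}, {"status": "done", "result_url": "..."}])
result = submit_and_poll(lambda ep, p: {"request_id": "r-1"},
                         lambda path: next(replies),
                         "/v1/client/txt2img",
                         {"model": "Flux1schnell", "prompt": "sunset"})
print(result["status"])  # the same loop works unchanged for /transcribe etc.
```

Because the loop is modality-agnostic, swapping the endpoint string is the only change between an image batch and a transcription pipeline.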

When Together AI is the right tool

  • LLM inference is core to your product — DeepSeek V3.1, Llama 3.3, Qwen3.5, MiniMax, Kimi, GPT OSS — and you want one Bearer token for chat, vision, moderation, rerank and embeddings.

  • You fine-tune models (LoRA or full) as part of your workflow and want the training pipeline on the same account as the inference.

  • You need dedicated single-tenant endpoints on H100-class hardware with predictable capacity — or on-demand and reserved GPU cluster rental for training and experimentation.

  • You ship a custom model (BYOM) and want to run it on managed infrastructure, or you need the code sandbox / code interpreter for agent-side execution.

  • Your stack is already on OpenAI-compatible /chat/completions conventions and the drop-in compatibility shortens the integration path.

deAPI vs Together AI at a glance

Both authenticate with Bearer tokens and host many of the same open-source checkpoints. The real differences are scope, GPU supply and billing shape. Every claim verified against public product docs as of April 2026.

Dimension | deAPI | Together AI
Core positioning | Focused open-source media inference | Broad inference platform (LLM + media + infra)
Catalog size | Curated media-model shortlist | 85+ serverless models across 8 categories
LLM hosting | Not currently — media only | DeepSeek V3.1, Llama 3.3, Qwen3.5, MiniMax, Kimi, GPT OSS…
Overlap on open-source media | FLUX family, Whisper Large V3, Kokoro and others — same underlying checkpoints | Same checkpoints, on centralized GPU supply
Fine-tuning | Not currently supported | LoRA + full fine-tuning, per-token training metering
Dedicated single-tenant endpoints | Not offered | H100-class hardware, per-minute billing
GPU cluster rental | Not offered | On-demand + reserved on NVIDIA HGX hardware (H100 / B200)
Custom model deployment (BYOM) | Curated catalog only | Supported
Source-agnostic video transcription | URL or upload — YouTube / Kick / Twitch / X Spaces natively | File upload or direct audio URL, no native social-source ingest — per-audio-minute metering
GPU supply | Distributed global pool of independent operators | Centralized cloud — H100-class / B200 hardware
Auth header | Authorization: Bearer | Authorization: Bearer (parity — no header change on migration)
API schema | Unified per-modality contract (/v1/client/txt2img, txt2video, txt2speech…) | OpenAI-compatible (/chat/completions, /images/generations…)
Billing shape | Per output, one invoice | Per-token / per-image / per-character / per-video / per-audio-minute serverless + per-minute dedicated + per-hour clusters + per-token fine-tuning + per-vCPU-hour sandbox
Free credits on signup | $5, no credit card | Trial credits available

Both platforms iterate frequently — pricing numbers intentionally omitted. Always verify current capabilities on each vendor's live docs.

Same Bearer, different request contract

Together's serverless inference mostly follows OpenAI-compatible routes (/chat/completions, /images/generations). deAPI uses one unified per-modality contract across every endpoint. That's the shape of the diff.

  1. Keep the auth header — both sides use Authorization: Bearer <key>. Nothing to rewrite in HTTP client config.
  2. Flip the base URL from api.together.xyz/v1 to api.deapi.ai/api/v1/client and remap the endpoint: /images/generations → /txt2img; Whisper transcribe → /transcribe.
  3. Map Together model slugs to deAPI slugs via GET /api/v1/client/models (for example black-forest-labs/FLUX.1-schnell → Flux1schnell). Responses are async by default — poll /request-status/{request_id} or pass a webhook_url on submit.
Before · Together images/generations
curl -s -X POST \
  https://api.together.xyz/v1/images/generations \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model":  "black-forest-labs/FLUX.1-schnell",
    "prompt": "Futuristic city at sunset",
    "width":  1536,
    "height": 896,
    "steps":  4
  }'
After · deAPI txt2img
curl -s -X POST \
  https://api.deapi.ai/api/v1/client/txt2img \
  -H "Authorization: Bearer $DEAPI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model":  "Flux1schnell",
    "prompt": "Futuristic city at sunset",
    "width":  1536,
    "height": 896,
    "steps":  4,
    "seed":   42
  }'
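The before/after diff above is mechanical enough to script. This sketch translates a Together images/generations request into a deAPI txt2img request: same Bearer scheme, new base URL, remapped endpoint and model slug. The slug table here is hard-coded for illustration; in practice it would be built from GET /api/v1/client/models.

```python
# Sketch of migration steps 2 and 3: base-URL flip, endpoint remap, slug remap.
# The two map entries below come from the examples in this document.
DEAPI_BASE = "https://api.deapi.ai/api/v1/client"

ENDPOINT_MAP = {"/images/generations": "/txt2img"}
SLUG_MAP = {"black-forest-labs/FLUX.1-schnell": "Flux1schnell"}

def remap_request(endpoint: str, body: dict) -> tuple[str, dict]:
    """Translate a Together images/generations call into a deAPI txt2img call."""
    new_body = dict(body)  # prompt, width, height, steps carry over unchanged
    new_body["model"] = SLUG_MAP[body["model"]]
    return DEAPI_BASE + ENDPOINT_MAP[endpoint], new_body

url, body = remap_request("/images/generations",
                          {"model": "black-forest-labs/FLUX.1-schnell",
                           "prompt": "Futuristic city at sunset",
                           "width": 1536, "height": 896, "steps": 4})
print(url)  # https://api.deapi.ai/api/v1/client/txt2img
```

The one behavioral difference the remap cannot hide is response shape: the deAPI side returns a request_id to poll rather than an inline result.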

Frequently asked questions

How does deAPI fit next to Together AI?

As a focused media-inference layer that pairs cleanly with Together's platform surface. deAPI exposes image, video, speech, music and transcription on a distributed GPU pool under one response contract, with social-source video ingest (YouTube / Kick / Twitch / X) built in. Together covers the same open-source media checkpoints inside a wider platform that also includes LLM serverless, fine-tuning, dedicated endpoints, GPU cluster rental and BYOM. Because both sides use Authorization: Bearer, most teams end up running them side-by-side: Together where the platform features are load-bearing, deAPI where the distributed-GPU supply and per-output billing change the unit economics on high-volume media workloads.

Why doesn't deAPI host LLMs or support fine-tuning?

Scope is a feature here. deAPI intentionally stops at media inference — image, video, speech, music, transcription — so the product can optimize for one thing: running open-source media models on a distributed GPU pool with one Bearer token, one response shape and no infrastructure meter layered on top. Teams that need LLMs, fine-tuning, moderation or rerank keep Together's /chat/completions and dedicated endpoints for that side; deAPI takes the media loop. One /chat/completions call on Together, one /v1/client/txt2img or /transcribe on deAPI — same auth strategy in your HTTP client, each vendor doing what it is built for.

Do deAPI and Together AI host the same open-source models?

Yes — meaningfully, and that's what makes pairing them straightforward. Both host Flux-family image models (including FLUX.1 schnell and FLUX.2), Whisper Large V3 for speech-to-text, and Kokoro for text-to-speech, among others. Same underlying open-source checkpoints, two different supply layers — Together on centralized H100-class cloud, deAPI on a distributed pool of independent operators competing for inference work. Because the weights are shared, your prompts and outputs travel well between the two, and you can route the same workload to whichever side makes better economic sense at a given batch size.

How hard is it to migrate a media workload from Together to deAPI?

Simpler than most migrations on this shape of stack. Both sides use Authorization: Bearer, so the auth header doesn't change. Two pieces do: (1) the base URL flips from api.together.xyz/v1 to api.deapi.ai/api/v1/client, and (2) the schema switches from Together's OpenAI-compatible shape (/chat/completions, /images/generations) to deAPI's unified per-modality contract — POST /v1/client/txt2img, /v1/client/txt2video, /v1/client/txt2speech, /v1/client/transcribe, /v1/client/txt2music — with a single request_id polling / webhook flow that covers every modality the same way. Map Together model slugs (for example black-forest-labs/FLUX.1-schnell) to deAPI slugs returned by GET /api/v1/client/models. You can also keep both live in production and route per workload; the two don't conflict.

Can I keep Together for infrastructure and use deAPI for media inference?

Yes — that's exactly the pairing many teams land on. deAPI's scope is pure shared-capacity inference on a distributed GPU pool, which is what keeps the media per-output billing flat. The infrastructure-grade features stay on Together: LoRA and full fine-tuning, dedicated single-tenant endpoints on H100-class GPUs (per-minute), on-demand and reserved NVIDIA HGX cluster rental (H100 / B200), BYOM deployment and the code sandbox. One Bearer token on each side, clean separation of concerns, no duplication of effort — you use each product for what it's purpose-built for.

How do the billing shapes compare?

deAPI collapses media billing onto one surface. The per-modality metric varies (pixels × steps for image, duration for video / music, characters for TTS, minutes for transcription) but there is one account and one invoice, with no separate meters for infrastructure products. Together meters serverless inference in several units (per million tokens for chat / vision / embeddings / moderation, per image for image generation, per million characters for TTS, per video for video generation, per audio minute for transcription) and then layers per-minute dedicated-endpoint billing, per-hour GPU clusters, per-token fine-tuning training and per-vCPU-hour + per-GiB-RAM-hour sandbox on top. For teams whose workload is primarily media, the one-surface shape makes forecasting simpler — for teams deep on Together's platform features, the existing meters stay where they are. See the live pricing page for current values.

What does running both side by side look like?

A pattern that works well for multi-provider teams: Together keeps the LLM stack (chat, vision, embeddings, moderation, rerank) and the infrastructure layer (fine-tuning, dedicated endpoints, GPU cluster rental, BYOM, code sandbox) — everything that demands platform-grade controls. deAPI handles the media loop end-to-end: txt2img, img2video, txt2video, txt2speech, txt2music and transcribe, on one Bearer token, one response contract and a distributed GPU pool. Two deAPI-side wins on top of the split: source-agnostic video transcription (YouTube / Kick / Twitch / X URLs or direct upload) that Together doesn't surface natively, and a per-output billing shape across every modality. Same Bearer on both sides keeps the client config identical.
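The split described above reduces to a small routing rule in client code. This sketch is purely illustrative: the task labels are names chosen here, not API values on either platform.

```python
# Sketch of the side-by-side split: route each task type to the vendor that
# owns it. Task labels below are illustrative, not real API identifiers.
MEDIA_TASKS = {"txt2img", "img2video", "txt2video",
               "txt2speech", "txt2music", "transcribe"}
PLATFORM_TASKS = {"chat", "vision", "embeddings", "moderation",
                  "rerank", "fine-tune", "byom", "sandbox"}

def route(task: str) -> str:
    """Pick a vendor per workload; both use the same Bearer header scheme."""
    if task in MEDIA_TASKS:
        return "deapi"
    if task in PLATFORM_TASKS:
        return "together"
    raise ValueError(f"unrouted task: {task}")

print(route("transcribe"), route("chat"))  # deapi together
```

Because the auth scheme matches on both sides, the router only has to swap base URLs and keys, not HTTP client configuration.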

Pair Together's platform with deAPI's focused media path

$5 of free credits. No credit card. Keep LLMs, fine-tuning and dedicated endpoints on Together — move the media loop onto a distributed GPU pool with one response contract across every modality.

Migration assistance available — talk to an engineer.