admin Jun 18, 2026 7 min read

AI Video Upscaling Guide: RealESRGAN & FlashVSR

Old video looks old for one reason: not enough pixels. A 480p clip from 2007 carries the same content as a 4K master – it just doesn’t have the resolution to show it. AI super-resolution synthesizes the missing detail, frame by frame, without re-shooting a single second.

deAPI offers three video upscaling models through a single API endpoint. Upload a clip, pick a model, download a higher-resolution version. This guide covers when to use each model, how to prepare your source, and what the output actually looks like on real footage.

We tested on clips ranging from a 320×240 internet meme to a 1947 U.S. War Department film. Here’s what we learned.

Three Models, Three Trade-offs

Picking the right model depends on your source material and how far you need to push the resolution.

	RealESRGAN Video x4	RealESRGAN Video x2	FlashVSR Tiny
Scale factor	4× (fixed)	2× (fixed)	2×–4× (configurable)
Max input	1024 × 1024	1024 × 1024	1024 × 1024
Max duration	15 seconds	30 seconds	10 seconds
Architecture	GAN (frame-by-frame)	GAN (frame-by-frame)	Diffusion-based
Best for	Low-res, big jumps	Mid-res, subtle cleanup	Anime, line art, AI-gen video
Model slug	RealESRGAN_Vid_x4	RealESRGAN_Vid_x2	FlashVSR_Tiny
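The limits in the table can be encoded as a small pre-flight check before you spend credits. This is a sketch under stated assumptions: the numbers come from the table, but reading the 1024 × 1024 cap as a per-dimension bound and FlashVSR's "2×–4×" as the integer steps 2, 3 and 4 are our guesses. `check_job` is a hypothetical helper, not part of any deAPI SDK.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    max_side: int             # max input width/height in px (assumed per-dimension)
    max_seconds: float        # max clip duration
    scales: tuple[int, ...]   # allowed scale factors


# Limits copied from the comparison table.
MODELS = {
    "RealESRGAN_Vid_x4": ModelSpec(1024, 15, (4,)),
    "RealESRGAN_Vid_x2": ModelSpec(1024, 30, (2,)),
    "FlashVSR_Tiny": ModelSpec(1024, 10, (2, 3, 4)),  # integer steps are an assumption
}


def check_job(model: str, width: int, height: int, seconds: float,
              scale: int | None = None) -> tuple[int, int]:
    """Validate a clip against a model's limits and return the expected output size."""
    spec = MODELS[model]
    if scale is None:
        scale = spec.scales[0]
    if scale not in spec.scales:
        raise ValueError(f"{model} does not support {scale}x")
    if width > spec.max_side or height > spec.max_side:
        raise ValueError(f"{width}x{height} exceeds the {spec.max_side}x{spec.max_side} input limit")
    if seconds > spec.max_seconds:
        raise ValueError(f"{seconds:.1f}s exceeds the {spec.max_seconds:.0f}s limit for {model}")
    return width * scale, height * scale


print(check_job("RealESRGAN_Vid_x4", 320, 240, 12.0))  # Keyboard Cat: (1280, 960)
```

Raising early keeps the failure local and readable instead of surfacing as a rejected upload.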

RealESRGAN Video x4 produces the most dramatic before/after difference. Feed it 480p, get back near-4K. The trade-off: it invents more detail per pixel, which can look artificial on skin textures and fine fabric.

RealESRGAN Video x2 doubles resolution while keeping hallucination to a minimum. Faces retain natural pore detail instead of flattening into a waxy sheen. The 30-second duration limit also makes it practical for longer clips.

FlashVSR Tiny takes a fundamentally different approach – diffusion instead of GAN. Hard edges, flat colors, and stylized content survive the upscale better here than through either RealESRGAN variant.

How Video Upscaling Works (The Short Version)

There’s no prompt. The model processes each frame using a trained understanding of how real-world textures look at higher resolutions, synthesizing plausible detail where the original has none.

Your only control is the source clip itself. Clean input produces clean output. Noisy, compressed, motion-blurred input produces upscaled versions of that same noise, those same compression artifacts, that same blur – just bigger. The model enhances everything it finds in the frame, including the parts you wish it wouldn’t.

Pre-processing matters more here than in any other deAPI workflow.

Preparing Your Source Material

Upscaling works like a magnifying glass. Whatever’s in the frame gets larger – detail and defects alike.

Pre-processing checklist

Source problem	Fix before upscaling	Why
Noise / grain	Denoise (temporal NR)	The model can’t distinguish grain from texture, so it enlarges the grain along with the detail.
Interlacing (1080i, DV)	Deinterlace (QTGMC, yadif)	Combing artifacts get baked into every upscaled frame as permanent horizontal lines.
Heavy compression	Deblock filter	Block boundaries upscale into visible grid patterns.
Camera shake	Stabilize on the original	Stabilizing at 4× resolution costs more compute and produces worse results.
Color grading	Grade after upscale	More pixels mean more headroom for precise grading adjustments.
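If your tool of choice is ffmpeg rather than a dedicated restoration suite, the first three rows of the checklist map roughly onto its built-in filters. The sketch below swaps in `yadif` for deinterlacing, `hqdn3d` (a spatial plus temporal denoiser) for temporal NR, and `deblock` for block removal; QTGMC is an AviSynth/VapourSynth script and isn't available as an ffmpeg filter. Filter strengths are left at their defaults, the CRF value is our choice, and `preprocess_cmd` is a hypothetical helper that only builds the command.

```python
from __future__ import annotations

import shlex


def preprocess_cmd(src: str, dst: str, *, deinterlace: bool = False,
                   deblock: bool = False, denoise: bool = True) -> list[str]:
    """Build an ffmpeg command that cleans a clip before upscaling."""
    filters = []
    if deinterlace:
        filters.append("yadif")    # remove combing from interlaced sources first
    if deblock:
        filters.append("deblock")  # soften compression block edges
    if denoise:
        filters.append("hqdn3d")   # spatial + temporal noise reduction
    cmd = ["ffmpeg", "-y", "-i", src]
    if filters:
        cmd += ["-vf", ",".join(filters)]
    # High-quality H.264 intermediate so the cleanup pass doesn't add new artifacts.
    cmd += ["-c:v", "libx264", "-crf", "16", "-preset", "slow", "-c:a", "copy", dst]
    return cmd


cmd = preprocess_cmd("tape_capture.mp4", "clean.mp4", deinterlace=True)
print(shlex.join(cmd))
```

Pass the list to `subprocess.run(cmd, check=True)` to execute it. Stabilization is left out because ffmpeg's options there need a two-pass setup that depends on your build.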

Container and codec

MP4 or MOV with H.264/H.265. Always start from the highest bitrate master you can find – a 240p YouTube re-upload has its compression artifacts permanently embedded in every frame.

Duration limits

RealESRGAN x4 caps at 15 seconds per clip, RealESRGAN x2 at 30, and FlashVSR Tiny at 10. For longer footage, split along shot boundaries and process each segment independently. Temporal consistency holds up better within a continuous take than across cuts, so splitting at a cut loses little, while splitting mid-shot can leave a visible seam.
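Splitting can be planned programmatically once you have shot-change timestamps, for example from your editor's scene detection or a tool such as PySceneDetect. This sketch assumes those cut times are already known; `plan_segments` is a hypothetical helper. It keeps each shot whole when it fits the model's limit and only divides a shot evenly when the shot itself is too long.

```python
from __future__ import annotations

import math


def plan_segments(duration: float, cuts: list[float],
                  max_len: float) -> list[tuple[float, float]]:
    """Split [0, duration] at shot cuts, then evenly subdivide any shot longer than max_len."""
    inner = sorted({c for c in cuts if 0 < c < duration})  # drop duplicate/out-of-range cuts
    bounds = [0.0, *inner, float(duration)]
    segments = []
    for start, end in zip(bounds, bounds[1:]):
        pieces = math.ceil((end - start) / max_len)  # jobs needed for this shot
        step = (end - start) / pieces
        edges = [start + i * step for i in range(pieces)] + [end]
        segments.extend(zip(edges, edges[1:]))
    return segments


# A 40 s clip with cuts at 12.5 s and 31 s, planned for RealESRGAN x4 (15 s limit).
for start, end in plan_segments(40.0, [12.5, 31.0], max_len=15):
    print(f"{start:6.2f} -> {end:6.2f}")
```

Only the middle shot (18.5 s) gets split here; the other two go through as single jobs.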

What We Tested

We ran three clips through RealESRGAN (Video x4 for the first two, Video x2 for the third) to see how it handles genuinely different source material.

Keyboard Cat (320×240 → 1280×960)

The internet’s most famous feline pianist, uploaded in 2007 at a resolution where individual pixels were practically visible. After x4 upscaling, the cat’s whiskers separate into individual strands. Keyboard labels become readable. YouTube compression artifacts that dominated the original give way to plausible texture.

Big Buck Bunny (854×480 → 3416×1920)

Blender Foundation’s open-source animated short, deliberately downscaled to 480p for this test. Animation challenges upscaling models differently than live action: flat color regions, hard character outlines, stylized fur. RealESRGAN handled organic elements – grass, sky gradients, fur detail – convincingly. Character edges stayed crisp instead of softening.

“I Like Turtles” (1024×576 → 2048×1152)

The zombie kid who broke the internet in 2007. A local news reporter asks a boy in full zombie face paint about the Halloween event – and he stares into the camera and says “I like turtles.” Fifty million views later, the clip still circulates in 576p potato quality. We ran it through RealESRGAN Video x2. Face paint detail that was a green-brown smear in the original separates into visible brush strokes and latex edges. The reporter’s microphone flag, previously a solid-colored blob, shows readable call letters. Even the crowd behind them gains individual faces instead of a flesh-toned blur.

Choosing the Right Model

Four questions get you to the right answer.

Is your source below 540p? Use x4. The resolution gap is too large for x2 to bridge meaningfully.

Is your source 720p or above? Use x2. At mid-resolution, x4 tends to over-sharpen skin and flatten pore detail into a processed look.

Is your content animated, cartoon, or AI-generated? Start with FlashVSR Tiny. Diffusion preserves line art and flat color regions better than GAN-based upscaling.

Is your clip longer than 15 seconds? x2 gives you double the duration budget at 30 seconds. Beyond that, split at shot boundaries.
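The four questions translate into a short decision function. The guide doesn't say what to do between 540p and 720p; this sketch assumes x2 there as the gentler default. Height means the short side of the frame, the duration limits come from the comparison table, and `pick_model` is a hypothetical helper.

```python
from __future__ import annotations

# Duration limits from the comparison table, in seconds.
MAX_SECONDS = {"RealESRGAN_Vid_x4": 15, "RealESRGAN_Vid_x2": 30, "FlashVSR_Tiny": 10}


def pick_model(height: int, seconds: float, animated: bool = False) -> tuple[str, bool]:
    """Apply the four questions; return (model slug, whether the clip must be split)."""
    if animated:
        model = "FlashVSR_Tiny"       # line art and flat color survive diffusion better
    elif height < 540:
        model = "RealESRGAN_Vid_x4"   # gap too large for x2 to bridge
    else:
        model = "RealESRGAN_Vid_x2"   # 720p and up; 540-719p lands here by assumption
    return model, seconds > MAX_SECONDS[model]


print(pick_model(240, 12))                  # Keyboard Cat-sized source
print(pick_model(576, 25))                  # "I Like Turtles"-sized source
print(pick_model(480, 12, animated=True))   # animation
```

When x4 is the pick but the clip runs 15–30 seconds, switching to x2 to avoid a split is the trade-off the fourth question describes; the sketch only flags the split and leaves that call to you.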

Chaining with Other deAPI Models

Video upscaling fits at the end of a generation pipeline.

LTX-2.3 → RealESRGAN x4: Generate video with LTX at lower resolution for faster iteration, then upscale the approved take. A 512×512 LTX output becomes 2048×2048 after the x4 pass.

Wan 2.2 Animate → RealESRGAN x2: Character animation followed by a gentle x2 sharpens the result without overprocessing generated textures.

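FlashVSR Tiny → RealESRGAN x2: Two-pass workflow for maximum quality. Diffusion naturalizes textures first, then the GAN adds a final sharpening pass on top. The FlashVSR output still has to fit RealESRGAN’s 1024 × 1024 input limit, so this chain only works on small sources.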
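A quick way to check any chain is to trace the frame size through each pass and stop when an intermediate result no longer fits the next model's input. This sketch assumes the 1024 × 1024 limit from the table applies per dimension and that FlashVSR Tiny runs at 2×; `trace_chain` is a hypothetical helper.

```python
from __future__ import annotations

MAX_SIDE = 1024  # input limit from the comparison table, assumed per-dimension


def trace_chain(width: int, height: int, scales: list[int]) -> list[tuple[int, int]]:
    """Return the frame size after each pass; raise if a pass would get an oversized input."""
    sizes = [(width, height)]
    for step, scale in enumerate(scales, start=1):
        w, h = sizes[-1]
        if max(w, h) > MAX_SIDE:
            raise ValueError(f"pass {step} input {w}x{h} exceeds the {MAX_SIDE}px limit")
        sizes.append((w * scale, h * scale))
    return sizes


print(trace_chain(512, 512, [4]))      # LTX output -> RealESRGAN x4
print(trace_chain(480, 270, [2, 2]))   # FlashVSR Tiny at 2x -> RealESRGAN x2
```

Under these assumptions, an 854×480 source fails the second pass: FlashVSR returns 1708×960, which is too wide for RealESRGAN.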

Common Mistakes

Skipping denoise on a noisy source. Temporal noise amplifies into visible flicker across every frame. A denoising pass on the source takes seconds and prevents unusable output.

Expecting deblur. RealESRGAN upscales pixels – it doesn’t reconstruct motion blur. A blurry 480p frame becomes a blurry 1920p frame, four times larger. Fix blur in pre-processing.

Running x2 twice to simulate x4. Two x2 passes stack hallucination artifacts without matching x4 quality. If you need 4×, use the x4 model once.

Forgetting audio. Video upscaling processes the visual track only. Demux your audio before processing, then re-mux after.
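With ffmpeg, the demux/re-mux round trip is two stream-copy commands. This sketch assumes the source audio is AAC (so it fits an .m4a container without re-encoding) and that the upscaled clip has the same duration as the original. File names and the `audio_roundtrip_cmds` helper are hypothetical.

```python
from __future__ import annotations


def audio_roundtrip_cmds(original: str, upscaled: str, output: str,
                         audio_tmp: str = "audio.m4a") -> tuple[list[str], list[str]]:
    """Return (extract, remux) ffmpeg commands that carry audio across an upscale."""
    # 1. Copy the audio stream out of the original without re-encoding (-vn drops video).
    extract = ["ffmpeg", "-y", "-i", original, "-vn", "-c:a", "copy", audio_tmp]
    # 2. Video from the upscaled file, audio from the extracted track; copy both streams.
    remux = ["ffmpeg", "-y", "-i", upscaled, "-i", audio_tmp,
             "-map", "0:v:0", "-map", "1:a:0", "-c", "copy", "-shortest", output]
    return extract, remux


extract, remux = audio_roundtrip_cmds("keyboard_cat.mp4", "keyboard_cat_x4.mp4",
                                      "keyboard_cat_final.mp4")
```

Run `extract` before uploading and `remux` after downloading, each via `subprocess.run(cmd, check=True)`. `-shortest` trims whichever stream runs longer if the durations drift slightly.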

Exceeding duration limits. x4 rejects clips over 15 seconds. x2 rejects over 30. Split longer footage at shot boundaries and rejoin in your editor.
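Cutting and rejoining can also be scripted. This sketch assumes you already have (start, end) times in seconds for each segment. It re-encodes each cut so it starts on the exact requested frame (a stream copy can only cut on keyframes) and writes a list file for ffmpeg's concat demuxer. Output names are hypothetical, and the rejoin step assumes every upscaled segment shares the same codec and resolution.

```python
from __future__ import annotations

from pathlib import Path


def cut_cmds(src: str, segments: list[tuple[float, float]]) -> list[list[str]]:
    """One ffmpeg command per segment; input seeking is frame-accurate when re-encoding."""
    cmds = []
    for i, (start, end) in enumerate(segments):
        cmds.append(["ffmpeg", "-y", "-ss", f"{start:.3f}", "-i", src,
                     "-t", f"{end - start:.3f}",
                     "-an",  # audio travels separately; the upscaler ignores it
                     "-c:v", "libx264", "-crf", "16", f"seg_{i:03d}.mp4"])
    return cmds


def write_concat_list(upscaled: list[str], list_path: str = "segments.txt") -> list[str]:
    """Write a concat-demuxer list and return the command that joins the files."""
    Path(list_path).write_text("".join(f"file '{name}'\n" for name in upscaled))
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", list_path,
            "-c", "copy", "joined.mp4"]
```

Joining with `-c copy` avoids a second generation of compression on the upscaled footage.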

Code Example (Python, API v2)

Video upscale follows the same async pattern as every other deAPI endpoint: submit a job, get a request_id, poll until done, download the result.

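import time

import requests

API_KEY = "your_api_key_here"
BASE = "https://api.deapi.ai"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Accept": "application/json",
}

# Submit a video upscale job (multipart upload: the file plus the model slug)
with open("keyboard_cat_320p.mp4", "rb") as video_file:
    response = requests.post(
        f"{BASE}/api/v2/videos/upscales",
        headers=HEADERS,
        files={"video": ("keyboard_cat.mp4", video_file, "video/mp4")},
        data={"model": "RealESRGAN_Vid_x4"},
        timeout=120,
    )
response.raise_for_status()  # surface auth, validation, or size errors immediately

request_id = response.json()["data"]["request_id"]
print(f"Job submitted: {request_id}")

# Poll for the result; the 10-minute ceiling is our choice, not an API limit
deadline = time.monotonic() + 600
while time.monotonic() < deadline:
    poll = requests.get(f"{BASE}/api/v2/jobs/{request_id}", headers=HEADERS, timeout=30)
    poll.raise_for_status()
    data = poll.json()["data"]

    if data["status"] == "done":
        result_url = data["result_url"]
        print(f"Video ready: {result_url}")
        # Stream the file to disk (assumes result_url needs no auth header)
        with requests.get(result_url, stream=True, timeout=120) as download:
            download.raise_for_status()
            with open("keyboard_cat_x4.mp4", "wb") as out:
                for chunk in download.iter_content(chunk_size=1 << 20):
                    out.write(chunk)
        break
    if data["status"] == "error":
        raise RuntimeError(f"Upscale failed: {data}")

    print(f"Status: {data['status']}")
    time.sleep(3)
else:
    raise TimeoutError(f"Job {request_id} did not finish in time")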

Switching to x2 is one line:

data={"model": "RealESRGAN_Vid_x2"}

The endpoint accepts MP4, MPEG, QuickTime, AVI, WMV, and OGG files up to 50 MB.
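A quick local check saves a rejected request. The accepted formats and the 50 MB cap come from the sentence above; mapping those formats to file extensions is our assumption, and the sketch uses the stricter decimal reading of 50 MB (50,000,000 bytes) in case the server counts that way. `validate_upload` is a hypothetical helper, and an extension check can't confirm the codec inside the container.

```python
from __future__ import annotations

from pathlib import Path

# Extensions assumed to correspond to MP4, MPEG, QuickTime, AVI, WMV and OGG.
ALLOWED_EXTENSIONS = {".mp4", ".mpeg", ".mpg", ".mov", ".avi", ".wmv", ".ogg", ".ogv"}
MAX_BYTES = 50 * 1000 * 1000  # stricter decimal reading of "50 MB"


def validate_upload(path: str) -> None:
    """Raise ValueError if the file is unlikely to be accepted by the upscale endpoint."""
    p = Path(path)
    if p.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported container: {p.suffix or '(none)'}")
    size = p.stat().st_size
    if size > MAX_BYTES:
        raise ValueError(f"{size / 1e6:.1f} MB exceeds the 50 MB upload limit")
```

Call it just before opening the file for upload; a clip that fails the size check is usually a good candidate for splitting anyway.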

When Video Upscaling Makes Sense

Archive footage headed for a modern production. Phone clips from 2015 that need to match current platform specs. AI-generated video from LTX or Wan that needs a resolution bump before final delivery.

One API call per clip. Output in minutes.

Video upscaling is available now. Sign up at deapi.ai for $5 in free credits – enough to upscale dozens of clips. Full API docs: docs.deapi.ai/api/v2/videos/upscales
