arxiv:2602.04939

SynthForensics: A Multi-Generator Benchmark for Detecting Synthetic Video Deepfakes

Published on Feb 4

Authors:

Abstract

Text-to-video models have reached a level of realism that renders existing forensic benchmarks ineffective, prompting the creation of SynthForensics, a new human-centric benchmark for detecting synthetic videos generated by open-source models.

AI-generated summary

The landscape of synthetic media has been irrevocably altered by text-to-video (T2V) models, whose outputs are rapidly approaching indistinguishability from reality. Critically, this technology is no longer confined to large-scale labs; the proliferation of efficient, open-source generators is democratizing the ability to create high-fidelity synthetic content on consumer-grade hardware. This makes existing face-centric and manipulation-based benchmarks obsolete. To address this urgent threat, we introduce SynthForensics, to the best of our knowledge the first human-centric benchmark for detecting purely synthetic video deepfakes. The benchmark comprises 6,815 unique videos from five architecturally distinct, state-of-the-art open-source T2V models. Its construction was underpinned by a meticulous two-stage, human-in-the-loop validation to ensure high semantic and visual quality. Each video is provided in four versions (raw, lossless, light, and heavy compression) to enable real-world robustness testing. Experiments demonstrate that state-of-the-art detectors are both fragile and exhibit limited generalization when evaluated on this new domain: we observe a mean performance drop of 29.19% AUC, with some methods performing worse than random chance, and top models losing over 30 points under heavy compression. The paper further investigates the efficacy of training on SynthForensics as a means to mitigate these observed performance gaps, achieving robust generalization to unseen generators (93.81% AUC), though at the cost of reduced backward compatibility with traditional manipulation-based deepfakes. The complete dataset and all generation metadata, including the specific prompts and inference parameters for every video, will be made publicly available at [link anonymized for review].

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2602.04939

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2602.04939 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2602.04939 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2602.04939 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.