Nanboy committed on
Commit 48e7a81 · verified · 1 parent: 90a35d5

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,12 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ samples/1089/mgm_omni_clean.wav filter=lfs diff=lfs merge=lfs -text
+ samples/1089/moss_ttsd_clean.wav filter=lfs diff=lfs merge=lfs -text
+ samples/1089/moss_ttsd_safespeech.wav filter=lfs diff=lfs merge=lfs -text
+ samples/1089/protected_grnoise.wav filter=lfs diff=lfs merge=lfs -text
+ samples/1089/protected_safespeech.wav filter=lfs diff=lfs merge=lfs -text
+ samples/1089/reference.wav filter=lfs diff=lfs merge=lfs -text
+ samples/1089/styletts2_clean.wav filter=lfs diff=lfs merge=lfs -text
+ samples/1089/styletts2_safespeech.wav filter=lfs diff=lfs merge=lfs -text
+ samples/1089/target.wav filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,13 +1,31 @@
  ---
- title: RVCBench
- emoji: 🌖
- colorFrom: indigo
- colorTo: red
+ title: RVCBench — Voice Cloning & Protection Demo
+ emoji: 🎙️
+ colorFrom: blue
+ colorTo: indigo
  sdk: gradio
- sdk_version: 6.14.0
- python_version: '3.13'
+ sdk_version: "4.44.0"
  app_file: app.py
- pinned: false
+ pinned: true
+ license: cc0-1.0
+ short_description: Voice cloning attacks vs. audio protection methods
+ tags:
+ - audio
+ - voice-cloning
+ - text-to-speech
+ - speaker-privacy
+ - audio-deepfake
+ - adversarial-audio
+ - benchmark
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # RVCBench Demo
+
+ Interactive demo for the [RVCBench](https://github.com/Nanboy-Ronan/RVCBench) benchmark.
+
+ Explore how modern voice cloning models can replicate a speaker's voice — and how audio
+ protection methods disrupt that cloning.
+
+ **Paper:** [arXiv:2602.00443](https://arxiv.org/abs/2602.00443)
+ **Dataset:** [Nanboy/RVCBench](https://huggingface.co/datasets/Nanboy/RVCBench)
+ **Code:** [Nanboy-Ronan/RVCBench](https://github.com/Nanboy-Ronan/RVCBench)
__pycache__/app.cpython-311.pyc ADDED
Binary file (22.2 kB)
 
app.py ADDED
@@ -0,0 +1,399 @@
+ """RVCBench — Interactive HuggingFace Space demo.
+
+ Tabs
+ ────
+ 1. Voice Cloning Gallery – hear pre-computed clean vs. protected clones
+ 2. Protect Your Voice – upload audio, apply a protection method live, compare
+ 3. Leaderboard – sortable benchmark results table
+ 4. About – paper, resources and citation
+ """
+
+ from __future__ import annotations
+
+ import io
+ import os
+ import time
+
+ import gradio as gr
+ import numpy as np
+ import soundfile as sf
+
+ # ── paths ────────────────────────────────────────────────────────────────────
+
+ SAMPLES = os.path.join(os.path.dirname(__file__), "samples", "1089")
+
+ REF_WAV = os.path.join(SAMPLES, "reference.wav")
+ TARGET_WAV = os.path.join(SAMPLES, "target.wav")
+ REF_TEXT = "But her long fair hair was girlish: and girlish, and touched with the wonder of mortal beauty, her face."
+ TARGET_TEXT = "A great fisher of souls!"
+
+ # Dropdown label -> (clone from the clean reference, clone from the SafeSpeech-protected reference)
+ MODELS = {
+     "ZipVoice (SIM 0.579)": ("zipvoice_clean.wav", "zipvoice_safespeech.wav"),
+     "MOSS-TTSD (SIM 0.492)": ("moss_ttsd_clean.wav", "moss_ttsd_safespeech.wav"),
+     "MGM-Omni (SIM 0.539)": ("mgm_omni_clean.wav", "mgm_omni_safespeech.wav"),
+     "OZSpeech (SIM 0.388)": ("ozspeech_clean.wav", "ozspeech_safespeech.wav"),
+     "StyleTTS 2 (SIM 0.228)": ("styletts2_clean.wav", "styletts2_safespeech.wav"),
+ }
+
+ # Protection method -> protected copy of the reference recording
+ PROTECTION_SAMPLES = {
+     "SafeSpeech": "protected_safespeech.wav",
+     "GR-Noise": "protected_grnoise.wav",
+ }
+
+ # ── leaderboard data ──────────────────────────────────────────────────────────
+
+ LEADERBOARD = [
+     ["1", "Qwen3-TTS", "0.614", "0.052", "4.39", "5.79", "2.02", "0.974", "0.731"],
+     ["2", "IndexTTS", "0.606", "0.052", "4.06", "6.61", "2.23", "0.972", "0.693"],
+     ["3", "CosyVoice 2", "0.602", "0.175", "4.39", "6.17", "4.58", "0.974", "0.729"],
+     ["4", "ZipVoice", "0.579", "0.053", "4.13", "7.09", "1.46", "0.952", "0.675"],
+     ["5", "MaskGCT", "0.570", "0.088", "3.93", "6.91", "1.36", "0.939", "0.682"],
+     ["6", "GLM-TTS", "0.570", "0.087", "4.08", "6.41", "1.74", "0.951", "0.678"],
+     ["7", "F5-TTS", "0.559", "0.116", "3.99", "6.96", "0.61", "0.937", "0.676"],
+     ["8", "Higgs Audio", "0.559", "0.250", "4.30", "6.06", "1.42", "0.941", "0.717"],
+     ["9", "MGM-Omni", "0.539", "0.095", "4.28", "5.82", "0.84", "0.933", "0.676"],
+     ["10", "PlayDiffusion", "0.506", "0.055", "4.15", "8.06", "0.73", "0.936", "0.681"],
+     ["11", "MOSS-TTSD", "0.492", "0.383", "4.10", "7.09", "—", "0.876", "0.667"],
+     ["12", "VibeVoice", "0.480", "0.228", "3.83", "6.76", "1.86", "0.852", "0.624"],
+     ["13", "FishSpeech", "0.472", "0.166", "4.37", "6.47", "3.61", "0.907", "0.682"],
+     ["14", "XTTS-v2", "0.454", "0.073", "3.81", "8.62", "0.62", "0.908", "0.639"],
+     ["15", "SparkTTS", "0.408", "0.326", "4.06", "5.83", "1.56", "0.764", "0.672"],
+     ["16", "OZSpeech", "0.388", "0.060", "3.21", "6.87", "8.75", "0.840", "0.636"],
+     ["17", "OpenVoice V2", "0.244", "0.075", "4.30", "7.06", "0.08", "0.474", "0.601"],
+     ["18", "StyleTTS 2", "0.228", "0.049", "4.30", "6.81", "0.11", "0.388", "0.589"],
+ ]
+
+ HEADERS = ["#", "Model", "SIM ↑", "WER ↓", "MOS ↑", "MCD ↓", "RTF ↓", "SVA ↑", "Emo ↑"]
+
+ # ── protection helpers ────────────────────────────────────────────────────────
+
+ def _load(path: str) -> tuple[np.ndarray, int]:
+     """Read an audio file as float32 and downmix multi-channel audio to mono."""
+     audio, sr = sf.read(path, dtype="float32")
+     if audio.ndim > 1:
+         audio = audio.mean(axis=1)
+     return audio, sr
+
+
+ def _to_bytes(audio: np.ndarray, sr: int) -> bytes:
+     """Encode audio as an in-memory 16-bit PCM WAV file."""
+     buf = io.BytesIO()
+     sf.write(buf, audio, sr, format="WAV", subtype="PCM_16")
+     buf.seek(0)
+     return buf.read()
+
+
+ def _snr(original: np.ndarray, protected: np.ndarray) -> float:
+     """SNR in dB of `protected`, treating its difference from `original` as noise."""
+     noise = protected - original
+     signal_power = np.mean(original ** 2)
+     noise_power = np.mean(noise ** 2)
+     if noise_power < 1e-12:
+         return float("inf")
+     return float(10 * np.log10(signal_power / noise_power))
+
+
+ def apply_grnoise(audio: np.ndarray, sr: int, snr_db: float = 25.0) -> np.ndarray:
+     """Add white Gaussian noise scaled so the mixture has roughly `snr_db` dB SNR
+     (clipping to [-1, 1] can shift the final value slightly)."""
+     signal_power = np.mean(audio ** 2)
+     noise_power = signal_power / (10 ** (snr_db / 10))
+     noise = np.random.randn(*audio.shape).astype(np.float32) * np.sqrt(noise_power)
+     return np.clip(audio + noise, -1.0, 1.0)
+
+
+ def apply_spectral(audio: np.ndarray, sr: int, strength: float = 0.05) -> np.ndarray:
+     """Frequency-domain perturbation: add random-phase noise to every STFT bin,
+     scaled by that bin's magnitude, then resynthesise by overlap-add."""
+     from numpy.fft import rfft, irfft
+     n_fft = 1024
+     hop = n_fft // 4
+     window = np.hanning(n_fft).astype(np.float32)
+     n = len(audio)
+     # Pad both ends so every original sample is covered by the full set of
+     # overlapping frames; otherwise the tail is dropped and the edges fade out.
+     padded = np.pad(audio, (n_fft, n_fft + (-n) % hop))
+     out = np.zeros_like(padded)
+     win_sum = np.zeros_like(padded)
+     for start in range(0, len(padded) - n_fft + 1, hop):
+         spec = rfft(padded[start:start + n_fft] * window)
+         mag = np.abs(spec)
+         perturb = np.random.randn(*mag.shape).astype(np.float32) * strength * mag
+         spec_p = spec + perturb * np.exp(1j * np.random.uniform(0, 2 * np.pi, mag.shape))
+         out[start:start + n_fft] += irfft(spec_p, n=n_fft).astype(np.float32)
+         win_sum[start:start + n_fft] += window
+     # Divide by the summed Hann window, not the frame count: at 75 % overlap the
+     # windows sum to about 2, so dividing by the count (4) would halve the amplitude.
+     out /= np.maximum(win_sum, 1e-3)
+     return np.clip(out[n_fft:n_fft + n], -1.0, 1.0)
+
+
+ PROTECT_FN = {
+     "GR-Noise": apply_grnoise,
+     "Spectral": apply_spectral,
+ }
+
+ # ── tab 1: gallery ────────────────────────────────────────────────────────────
+
+ def load_gallery(model_label: str, protection: str):
+     clean_file, safe_file = MODELS[model_label]
+     prot_audio_file = PROTECTION_SAMPLES.get(protection)
+
+     ref_audio = REF_WAV
+     target_audio = TARGET_WAV
+     clean_clone = os.path.join(SAMPLES, clean_file)
+     prot_ref = os.path.join(SAMPLES, prot_audio_file) if prot_audio_file else None
+     # Only clones made from the SafeSpeech-protected reference are bundled, so
+     # this file is the same whichever protection method is selected.
+     prot_clone = os.path.join(SAMPLES, safe_file)
+
+     # Compute SIM drop note
+     clean_sim = float(model_label.split("SIM ")[-1].rstrip(")"))
+     sim_lookup = {
+         "ZipVoice (SIM 0.579)": {"SafeSpeech": 0.287, "GR-Noise": 0.258},
+         "MOSS-TTSD (SIM 0.492)": {"SafeSpeech": 0.242, "GR-Noise": 0.247},
+         "MGM-Omni (SIM 0.539)": {"SafeSpeech": 0.184, "GR-Noise": 0.229},
+         "OZSpeech (SIM 0.388)": {"SafeSpeech": 0.156, "GR-Noise": 0.148},
+         "StyleTTS 2 (SIM 0.228)": {"SafeSpeech": 0.089, "GR-Noise": 0.030},
+     }
+     prot_sim = sim_lookup.get(model_label, {}).get(protection, None)
+     drop = clean_sim - prot_sim if prot_sim is not None else None
+
+     note_md = (
+         f"**Clean SIM:** {clean_sim:.3f} &nbsp;→&nbsp; "
+         f"**Protected SIM ({protection}):** {prot_sim:.3f} &nbsp;"
+         f"*(drop: {drop:.3f})*"
+         if drop is not None else ""
+     )
+
+     return (
+         ref_audio,
+         target_audio,
+         clean_clone,
+         prot_ref or gr.update(visible=False),
+         prot_clone,
+         note_md,
+     )
+
+ # ── tab 2: live protection ────────────────────────────────────────────────────
+
+ def run_protection(audio_input, method: str, strength: float):
+     if audio_input is None:
+         return None, None, "Upload an audio file first."
+
+     sr_in, data = audio_input
+     # Integer PCM (gr.Audio(type="numpy") commonly returns int16) is scaled by the
+     # dtype's full range instead of guessing from the peak value; float input is kept.
+     if np.issubdtype(data.dtype, np.integer):
+         audio = data.astype(np.float32) / (np.iinfo(data.dtype).max + 1)
+     else:
+         audio = data.astype(np.float32)
+     if audio.ndim > 1:
+         audio = audio.mean(axis=1)
+
+     t0 = time.time()
+     fn = PROTECT_FN[method]
+     if method == "GR-Noise":
+         protected = fn(audio, sr_in, snr_db=strength)
+     else:
+         protected = fn(audio, sr_in, strength=strength / 100.0)
+     elapsed = time.time() - t0
+
+     snr = _snr(audio, protected)
+     protected_int = (protected * 32767).astype(np.int16)
+
+     metrics_md = (
+         f"| Metric | Value |\n|--------|-------|\n"
+         f"| SNR (dB) | {snr:.1f} |\n"
+         f"| Processing time | {elapsed*1000:.0f} ms |\n"
+         f"| Method | {method} |\n"
+     )
+
+     return (sr_in, audio.copy()), (sr_in, protected_int), metrics_md
+
+
+ # ── build UI ──────────────────────────────────────────────────────────────────
+
+ CSS = """
+ #title { text-align: center; }
+ .metric-box { font-size: 1.1em; }
+ .tab-header { font-weight: bold; }
+ footer { display: none !important; }
+ """
+
+ INTRO_MD = """
+ <div id="title">
+
+ # RVCBench — Voice Cloning & Protection Demo
+
+ **Can audio protection prevent your voice from being cloned?**
+ This demo lets you hear the answer.
+
+ [![Paper](https://img.shields.io/badge/arXiv-2602.00443-b31b1b.svg)](https://arxiv.org/abs/2602.00443)
+ [![Dataset](https://img.shields.io/badge/HuggingFace-Dataset-ffcc00.svg)](https://huggingface.co/datasets/Nanboy/RVCBench)
+ [![GitHub](https://img.shields.io/badge/GitHub-RVCBench-181717.svg)](https://github.com/Nanboy-Ronan/RVCBench)
+
+ </div>
+ """
+
+ GALLERY_MD = """
+ **How it works:** A voice cloning model takes the *Reference Voice* as a speaker prompt
+ and synthesizes the *Target Speech* sentence in that speaker's voice.
+ When protection is applied to the reference first, the clone degrades: the voice no
+ longer matches the speaker, or the speech becomes unintelligible.
+ """
+
+ PROTECTION_MD = """
+ Upload your own audio clip and apply a protection method live.
+ At mild strengths the protected audio should sound close to the original to human
+ listeners, while the added perturbation aims to disrupt voice cloning models.
+
+ - **GR-Noise** — Gaussian random noise at a target SNR level. No surrogate model needed.
+ - **Spectral** — Random-phase noise added to each STFT bin, scaled by that bin's magnitude.
+ """
+
+
+ def build_demo():
+     with gr.Blocks(css=CSS, title="RVCBench Demo") as demo:
+         gr.Markdown(INTRO_MD)
+
+         with gr.Tabs():
+
+             # ── Tab 1: Gallery ──────────────────────────────────────────────
+             with gr.Tab("🎧 Voice Cloning Gallery"):
+                 gr.Markdown(GALLERY_MD)
+
+                 with gr.Row():
+                     model_dd = gr.Dropdown(
+                         choices=list(MODELS.keys()),
+                         value=list(MODELS.keys())[0],
+                         label="Voice Cloning Model",
+                         scale=2,
+                     )
+                     prot_dd = gr.Dropdown(
+                         choices=["SafeSpeech", "GR-Noise"],
+                         value="SafeSpeech",
+                         label="Protection Method",
+                         scale=1,
+                     )
+
+                 sim_note = gr.Markdown("", elem_classes="metric-box")
+
+                 with gr.Row():
+                     with gr.Column():
+                         gr.Markdown("### 1 · Reference Voice")
+                         gr.Markdown(f"*\"{REF_TEXT}\"*")
+                         ref_out = gr.Audio(label="Reference (original)", interactive=False)
+                     with gr.Column():
+                         gr.Markdown("### 2 · Target Speech")
+                         gr.Markdown(f"*\"{TARGET_TEXT}\"*")
+                         target_out = gr.Audio(label="Target utterance", interactive=False)
+
+                 gr.Markdown("---")
+                 gr.Markdown("### Cloning Results")
+
+                 with gr.Row():
+                     with gr.Column():
+                         gr.Markdown("#### Without Protection")
+                         clean_out = gr.Audio(label="Clean clone (threat)", interactive=False)
+                     with gr.Column():
+                         gr.Markdown("#### With Protection")
+                         prot_ref_out = gr.Audio(label="Protected reference", interactive=False)
+                         prot_clone_out = gr.Audio(
+                             label="Clone from SafeSpeech-protected reference (degraded)",
+                             interactive=False,
+                         )
+
+                 load_btn = gr.Button("Load Example", variant="primary")
+
+                 load_btn.click(
+                     fn=load_gallery,
+                     inputs=[model_dd, prot_dd],
+                     outputs=[ref_out, target_out, clean_out, prot_ref_out, prot_clone_out, sim_note],
+                 )
+                 demo.load(
+                     fn=load_gallery,
+                     inputs=[model_dd, prot_dd],
+                     outputs=[ref_out, target_out, clean_out, prot_ref_out, prot_clone_out, sim_note],
+                 )
+
+             # ── Tab 2: Live Protection ──────────────────────────────────────
+             with gr.Tab("🔒 Protect Your Voice"):
+                 gr.Markdown(PROTECTION_MD)
+
+                 with gr.Row():
+                     audio_in = gr.Audio(
+                         label="Upload your audio (wav / mp3, ≤ 30 s)",
+                         type="numpy",
+                         scale=3,
+                     )
+                     with gr.Column(scale=1):
+                         method_dd = gr.Dropdown(
+                             choices=list(PROTECT_FN.keys()),
+                             value="GR-Noise",
+                             label="Protection Method",
+                         )
+                         strength_sl = gr.Slider(
+                             minimum=10, maximum=40, value=25, step=1,
+                             label="Strength (SNR dB for GR-Noise; intensity × 100 for Spectral)",
+                             info="GR-Noise: lower = stronger. Spectral: higher = stronger. Stronger settings add more audible artifacts.",
+                         )
+                         protect_btn = gr.Button("Apply Protection", variant="primary")
+
+                 with gr.Row():
+                     orig_out = gr.Audio(label="Original", interactive=False)
+                     prot_live = gr.Audio(label="Protected", interactive=False)
+
+                 metrics_out = gr.Markdown("", elem_classes="metric-box")
+
+                 protect_btn.click(
+                     fn=run_protection,
+                     inputs=[audio_in, method_dd, strength_sl],
+                     outputs=[orig_out, prot_live, metrics_out],
+                 )
+
+                 gr.Markdown(
+                     "> **Note:** Live voice cloning inference is not included in this Space due to "
+                     "model size constraints. See the [GitHub repo](https://github.com/Nanboy-Ronan/RVCBench) "
+                     "for the full pipeline with 18+ VC models."
+                 )
+
+             # ── Tab 3: Leaderboard ──────────────────────────────────────────
+             with gr.Tab("📊 Leaderboard"):
+                 gr.Markdown(
+                     "### Benchmark Results — LibriTTS (clean prompts)\n"
+                     "Sorted by Speaker Similarity (SIM ↑). "
+                     "Full results including protection robustness and cross-dataset generalisation: "
+                     "[GitHub README](https://github.com/Nanboy-Ronan/RVCBench#benchmark-results).\n\n"
+                     "> **Metric guide** · SIM: speaker similarity ↑ · WER: word error rate ↓ · "
+                     "MOS: perceptual score ↑ · MCD: mel cepstral distortion ↓ · "
+                     "RTF: real-time factor ↓ · SVA: speaker verification accuracy ↑ · Emo: emotion match ↑"
+                 )
+                 gr.DataFrame(
+                     value=LEADERBOARD,
+                     headers=HEADERS,
+                     datatype=["number", "str"] + ["number"] * 7,
+                     interactive=False,
+                     wrap=False,
+                 )
+
+             # ── Tab 4: About ────────────────────────────────────────────────
+             with gr.Tab("ℹ️ About"):
+                 gr.Markdown("""
+ ## About RVCBench
+
+ **RVCBench** is an open-source benchmark for evaluating the robustness of voice cloning
+ against audio protection methods.
+
+ ### What it measures
+ - How well 18+ modern zero-shot TTS/VC models can clone a speaker's voice
+ - How effectively 5 audio protection methods (SafeSpeech, Enkidu, Spectral, GR-Noise, AntiFake)
+   prevent cloning across 10 datasets and 7 evaluation metrics
+
+ ### Resources
+
+ | Resource | Link |
+ |----------|------|
+ | Paper (arXiv) | [arXiv:2602.00443](https://arxiv.org/abs/2602.00443) |
+ | Code & full pipeline | [GitHub: Nanboy-Ronan/RVCBench](https://github.com/Nanboy-Ronan/RVCBench) |
+ | Dataset | [HuggingFace: Nanboy/RVCBench](https://huggingface.co/datasets/Nanboy/RVCBench) |
+ | Contact | ruinanjin@alumni.ubc.ca |
+
+ ### Citation
+
+ ```bibtex
+ @article{liao2026rvcbench,
+   title = {RVCBench: Benchmarking the Robustness of Voice Cloning Across Modern Audio Generation Models},
+   author = {Liao, Xinting and Jin, Ruinan and Yu, Hanlin and Pandya, Deval and Li, Xiaoxiao},
+   journal = {arXiv preprint arXiv:2602.00443},
+   year = {2026}
+ }
+ ```
+ """)
+
+     return demo
+
+
+ if __name__ == "__main__":
+     build_demo().launch()
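`apply_grnoise` sets the noise level from the target SNR: because SNR_dB = 10 · log10(P_signal / P_noise), the required noise power is P_signal / 10^(SNR_dB / 10), and unit-variance Gaussian samples are scaled by its square root. The standalone sketch below re-derives that scaling and measures the result with the same definition `_snr` uses. The helper names `add_noise_at_snr` and `measured_snr`, the seeded generator, and the synthetic 220 Hz test tone are illustrative assumptions, not part of the Space.

```python
import numpy as np


def add_noise_at_snr(audio: np.ndarray, snr_db: float, seed: int = 0) -> np.ndarray:
    """Hypothetical helper mirroring apply_grnoise: add white Gaussian noise at snr_db."""
    rng = np.random.default_rng(seed)
    signal_power = np.mean(audio ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10**(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.standard_normal(audio.shape).astype(np.float32) * np.sqrt(noise_power)
    return np.clip(audio + noise, -1.0, 1.0)


def measured_snr(original: np.ndarray, protected: np.ndarray) -> float:
    """Same definition as _snr in app.py: treat the difference as noise."""
    noise = protected - original
    return float(10 * np.log10(np.mean(original ** 2) / np.mean(noise ** 2)))


if __name__ == "__main__":
    sr = 16_000
    t = np.arange(2 * sr) / sr
    tone = (0.3 * np.sin(2 * np.pi * 220 * t)).astype(np.float32)  # synthetic test signal
    for target in (10.0, 25.0, 40.0):
        out = add_noise_at_snr(tone, target)
        print(f"target {target:.0f} dB -> measured {measured_snr(tone, out):.2f} dB")
```

A seeded `np.random.default_rng` makes the output reproducible, unlike the global `np.random` state used in `app.py`. The measured SNR stays close to the target only while the mixture does not clip, since `np.clip` changes the residual.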
requirements.txt ADDED
@@ -0,0 +1,3 @@
+ gradio>=4.44.0
+ numpy>=1.24
+ soundfile>=0.12
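`apply_spectral` in `app.py` rebuilds the signal by overlap-adding Hann-windowed frames at a hop of `n_fft // 4`, so the output level depends on what the overlap-added sum is divided by. Hann windows at 75 % overlap sum to about 2, which means dividing by the number of frames covering each sample (4) roughly halves the amplitude, while dividing by the summed window returns the input unchanged. The sketch below checks this with the perturbation switched off; `overlap_add` and its `normalise` argument are hypothetical names, and only NumPy (listed in `requirements.txt`) is needed.

```python
import numpy as np


def overlap_add(x: np.ndarray, n_fft: int = 1024, normalise: str = "window") -> np.ndarray:
    """Hypothetical sketch: STFT -> (no perturbation) -> inverse STFT by overlap-add.

    normalise="window" divides by the summed Hann window;
    normalise="count" divides by how many frames cover each sample.
    """
    hop = n_fft // 4
    window = np.hanning(n_fft)
    n = len(x)
    # Pad both ends so every original sample is covered by the full set of frames.
    xp = np.pad(x, (n_fft, n_fft + (-n) % hop))
    out = np.zeros_like(xp)
    win_sum = np.zeros_like(xp)
    count = np.zeros_like(xp)
    for start in range(0, len(xp) - n_fft + 1, hop):
        frame = np.fft.irfft(np.fft.rfft(xp[start:start + n_fft] * window), n=n_fft)
        out[start:start + n_fft] += frame
        win_sum[start:start + n_fft] += window
        count[start:start + n_fft] += 1
    denom = win_sum if normalise == "window" else count
    return (out / np.maximum(denom, 1e-3))[n_fft:n_fft + n]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-0.5, 0.5, 16_000)
    for mode in ("count", "window"):
        y = overlap_add(x, normalise=mode)
        gain = np.sqrt(np.mean(y ** 2) / np.mean(x ** 2))
        print(f"{mode:>6}: RMS gain {gain:.3f}")
```

Frame-count normalisation should report an RMS gain near 0.5 and window-sum normalisation a gain of 1.0, which is why the resynthesis divides by the accumulated window.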
samples/1089/mgm_omni_clean.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8c527cdc387b4071be18f5121d87e3aea45b72a8475f817996b5df1e7e3c59cc
+ size 115244
samples/1089/mgm_omni_safespeech.wav ADDED
Binary file (61.5 kB)
 
samples/1089/moss_ttsd_clean.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e7470c164c0dd6947b6ddee1612cc4159b02c465c2f263b4bb5ff7b94c1356ed
+ size 317484
samples/1089/moss_ttsd_safespeech.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b4a6a04a66d54774348f975604afbf3e91725efd38e29463e952575286452f61
+ size 102444
samples/1089/ozspeech_clean.wav ADDED
Binary file (48 kB)
 
samples/1089/ozspeech_safespeech.wav ADDED
Binary file (48 kB)
 
samples/1089/protected_grnoise.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:70a00841a37a67a00175d00365283f7615bfff69b31a5cb9db20530ff5c4531e
+ size 373966
samples/1089/protected_safespeech.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7fd67101b9bf8cecfe9b70d194c7f92fb0f3d83ad4aaba7d71941e898e4d69aa
+ size 373966
samples/1089/reference.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ac099e671e248169ad2ad974e4d63f9eed3d784151b9ab997fa0d283b9586657
+ size 373966
samples/1089/styletts2_clean.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:39b7e139196929bb2824e64a75091909ab9f6747ea21f1e42deafc101af83d00
+ size 129544
samples/1089/styletts2_safespeech.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ca52bdbaeb337d62535020938c9c7256dcfaca610134b5e7c2feff7fff858dcf
+ size 157144
samples/1089/target.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:68c1fc12528f8b45ce676c5f0277117e195b8a5a4c505c03991086ae852c44ea
+ size 103724
samples/1089/zipvoice_clean.wav ADDED
Binary file (83.1 kB)
 
samples/1089/zipvoice_safespeech.wav ADDED
Binary file (86.1 kB)
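The `LEADERBOARD` rows in `app.py` keep every metric as a string and use `"—"` for the one missing RTF value. Plain string comparison would order the rank column "1", "10", "11" before "2" and ignores whether a metric is marked ↑ or ↓, so re-sorting the rows in Python (for example by WER instead of SIM) needs a numeric key that also handles the missing value. A minimal sketch, assuming a hypothetical `sort_rows` helper and an excerpt of five rows copied from the table:

```python
import math

HEADERS = ["#", "Model", "SIM ↑", "WER ↓", "MOS ↑", "MCD ↓", "RTF ↓", "SVA ↑", "Emo ↑"]

# Excerpt of rows copied from LEADERBOARD in app.py.
ROWS = [
    ["4", "ZipVoice", "0.579", "0.053", "4.13", "7.09", "1.46", "0.952", "0.675"],
    ["9", "MGM-Omni", "0.539", "0.095", "4.28", "5.82", "0.84", "0.933", "0.676"],
    ["11", "MOSS-TTSD", "0.492", "0.383", "4.10", "7.09", "—", "0.876", "0.667"],
    ["16", "OZSpeech", "0.388", "0.060", "3.21", "6.87", "8.75", "0.840", "0.636"],
    ["18", "StyleTTS 2", "0.228", "0.049", "4.30", "6.81", "0.11", "0.388", "0.589"],
]


def to_float(cell: str) -> float:
    """Parse a metric cell; the dash used for missing values becomes NaN."""
    try:
        return float(cell)
    except ValueError:
        return math.nan


def sort_rows(rows: list[list[str]], header: str) -> list[list[str]]:
    """Hypothetical helper: sort by a metric column, honouring its arrow.

    "↑" means higher is better (descending), "↓" lower is better (ascending).
    Missing values always go last.
    """
    col = HEADERS.index(header)
    descending = header.endswith("↑")

    def key(row: list[str]) -> tuple[bool, float]:
        value = to_float(row[col])
        missing = math.isnan(value)
        return (missing, 0.0 if missing else (-value if descending else value))

    return sorted(rows, key=key)


if __name__ == "__main__":
    for header in ("WER ↓", "RTF ↓", "SIM ↑"):
        print(header, [row[1] for row in sort_rows(ROWS, header)])
```

Missing values are pushed to the end regardless of direction, so MOSS-TTSD stays last when sorting by RTF.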