Apply for a GPU community grant: Academic project

#1
by coldhyuk - opened

Talking face generation is impressive, but making the face express the desired emotion is still an open problem. Label-based methods are too coarse, audio-based methods tangle emotion with speech content, and image-based methods need hard-to-get reference photos.

C-MET addresses this by learning cross-modal emotion semantic vectors that bridge the speech and visual feature spaces, so the model can transfer emotions from speech to the face without needing any reference image, even for extended emotions like sarcasm that do not appear in the training data.
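For intuition, here is a minimal PyTorch sketch of the cross-modal bridging idea. All module and dimension names (`EmotionBridge`, `speech_dim`, `emo_dim`, etc.) are hypothetical placeholders, not the actual C-MET architecture: two projections map speech and visual features into a shared emotion space where matched pairs can be aligned with a contrastive-style loss.

```python
# Hypothetical sketch of a shared emotion semantic space -- not the
# actual C-MET implementation. A speech-derived emotion vector can then
# condition the face generator directly, with no reference image.
import torch
import torch.nn as nn

class EmotionBridge(nn.Module):
    def __init__(self, speech_dim=512, visual_dim=512, emo_dim=128):
        super().__init__()
        # Project each modality into a shared emotion semantic space.
        self.speech_proj = nn.Linear(speech_dim, emo_dim)
        self.visual_proj = nn.Linear(visual_dim, emo_dim)

    def forward(self, speech_feat, visual_feat):
        # L2-normalise so an alignment loss can pull matching
        # speech-face pairs together across modalities.
        s = nn.functional.normalize(self.speech_proj(speech_feat), dim=-1)
        v = nn.functional.normalize(self.visual_proj(visual_feat), dim=-1)
        return s, v

bridge = EmotionBridge()
speech_feat = torch.randn(4, 512)   # batch of speech encoder outputs
visual_feat = torch.randn(4, 512)   # batch of visual encoder outputs
s, v = bridge(speech_feat, visual_feat)
# Alignment loss: matched pairs should have high cosine similarity.
loss = 1 - (s * v).sum(dim=-1).mean()
```

At inference time, only the speech-side projection would be needed to produce the emotion vector that conditions the generator, which is what removes the need for a reference image.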

On MEAD and CREMA-D benchmarks, C-MET improves emotion accuracy by 14% over state-of-the-art methods.

Hi @coldhyuk, we've assigned ZeroGPU to this Space. Please check the compatibility and usage sections of this page so your Space can run on ZeroGPU.
If you can, we'd also encourage you to upgrade to Pro ($9/month) for a higher ZeroGPU quota and other features like Dev Mode, Private Storage, and more: hf.co/pro
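For anyone setting this up, here is a minimal ZeroGPU-compatible sketch, assuming a Gradio app (the `generate` function and its inputs are placeholders for the demo's own inference code). The `spaces.GPU` decorator requests a GPU only for the duration of the decorated call.

```python
# Minimal ZeroGPU usage sketch: decorate the GPU-dependent function
# with spaces.GPU. Function body and I/O types are placeholders.
import spaces
import gradio as gr
import torch

@spaces.GPU  # a GPU is allocated only while this function runs
def generate(audio, image):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    ...  # run the talking-face model on `device` (placeholder)
    return image

demo = gr.Interface(fn=generate, inputs=["audio", "image"], outputs="image")
demo.launch()
```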
