FaceShot: Bring any Character into Life

Anonymous Submission


Animation results of FaceShot applied to characters from various domains—including 3D anime, emojis, 2D anime, toys, animals, and more. Each character smoothly follows the facial movements of the driving video while preserving its original identity, resulting in remarkable animation effects.

*Please wait for the video to load, and refresh the webpage if needed.

Abstract

Portrait animation generates dynamic, realistic videos by mimicking the facial expressions of a driving video. However, existing landmark-based methods are constrained by the limitations of facial landmark detection and motion transfer, resulting in suboptimal performance. In this paper, we present FaceShot, a novel training-free framework that animates any character, human or non-human, from any driving video with unprecedented robustness and stability. FaceShot achieves this by producing precise and robust landmarks through an appearance-guided landmark matching module and a relative motion transfer module. Together, these components harness the semantic correspondences of latent diffusion models to deliver landmarks across a wide range of character types, without any fine-tuning or retraining. With this strong generalization capability, FaceShot significantly extends the scope of portrait animation by removing the landmark-detection bottleneck for any character and driving video. Furthermore, FaceShot is compatible with any landmark-driven animation model, enhancing the realism and consistency of animations while significantly improving overall performance. Extensive experiments on our newly constructed character benchmark, CABench, confirm that FaceShot consistently surpasses state-of-the-art approaches across all character domains, setting a new standard for open-domain portrait animation.
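The appearance-guided landmark matching idea can be illustrated with a small sketch: landmarks on a reference face are transferred to a target character by nearest-neighbour matching of per-pixel diffusion features. The function below is an assumption-laden illustration, not the paper's implementation; the feature maps `ref_feats` and `tgt_feats` stand in for features extracted from a latent diffusion model, and the matching criterion (cosine similarity) is a common choice for such correspondences.

```python
import numpy as np

def match_landmarks(ref_feats, tgt_feats, ref_landmarks):
    """Illustrative sketch: transfer landmarks from a reference face to a
    target character by nearest-neighbour matching of per-pixel features.

    ref_feats, tgt_feats : (H, W, C) feature maps (hypothetically extracted
                           from a latent diffusion model).
    ref_landmarks        : (N, 2) integer (row, col) landmark positions on
                           the reference image.
    Returns an (N, 2) array of matched positions on the target image.
    """
    H, W, C = tgt_feats.shape
    # Flatten the target features and L2-normalise for cosine similarity.
    tgt_flat = tgt_feats.reshape(-1, C)
    tgt_flat = tgt_flat / (np.linalg.norm(tgt_flat, axis=1, keepdims=True) + 1e-8)

    matched = []
    for r, c in ref_landmarks:
        q = ref_feats[r, c]
        q = q / (np.linalg.norm(q) + 1e-8)
        sims = tgt_flat @ q          # cosine similarity to every target pixel
        idx = int(np.argmax(sims))   # best-matching pixel
        matched.append((idx // W, idx % W))
    return np.array(matched)
```

With semantically aligned features, a landmark on the reference eye corner lands on the corresponding eye corner of the target character, even when the two faces come from very different visual domains.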

Method




FaceShot first generates precise facial landmarks for the target character using appearance guidance. Next, a relative landmark motion transfer module generates the landmark sequence. Finally, this landmark sequence is fed into an animation model to animate any character from any driving video.
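The relative motion transfer step can be sketched as follows: instead of copying the driving landmarks directly, each frame's displacement relative to the first driving frame is rescaled to the character's face proportions and added to the character's own landmarks. This is a minimal illustration under assumed conventions (landmarks as (x, y) arrays, per-axis span used for scale normalisation), not the paper's exact formulation.

```python
import numpy as np

def transfer_motion(char_landmarks, drive_seq):
    """Illustrative relative motion transfer.

    char_landmarks : (N, 2) landmarks detected on the character.
    drive_seq      : (T, N, 2) landmark sequence from the driving video.
    Returns a (T, N, 2) animated landmark sequence for the character.
    """
    first = drive_seq[0]
    # Scale displacements by the ratio of face spans so that motion
    # magnitude matches the character's proportions (a common choice).
    char_scale = np.ptp(char_landmarks, axis=0)     # per-axis span
    drive_scale = np.ptp(first, axis=0) + 1e-8
    ratio = char_scale / drive_scale
    return char_landmarks[None] + (drive_seq - first[None]) * ratio
```

Because only relative displacements are transferred, the first output frame reproduces the character's own landmarks exactly, and the character's identity-specific layout is preserved throughout the sequence.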

Single Character + Multiple Driven Videos

Multiple Characters + Single Driven Video


Videos Longer than 5 Seconds


Comparison

A black video indicates that the method failed on this character.
From left to right: driving video, MegActor, LivePortrait, Follow Your Emoji, FADM, FaceVid2Vid, X-Portrait, MOFA-Video, AniPortrait, and Ours (FaceShot).
