FaceShot: Bring Any Character into Life

¹Tongji University, ²Shanghai AI Lab, ³Nanjing University of Science and Technology
ICLR 2025

*Corresponding Authors, Project Leader


Animation results of FaceShot applied to characters from various domains, including 3D anime, emojis, 2D anime, toys, and animals. Each character smoothly follows the facial movements of the driving video while preserving its original identity.

Abstract

In this paper, we present FaceShot, a novel training-free portrait animation framework designed to bring any character to life from any driving video without fine-tuning or retraining. We achieve this by providing precise and robust reposed landmark sequences from an appearance-guided landmark matching module and a coordinate-based landmark retargeting module. Together, these components harness the robust semantic correspondences of latent diffusion models to produce facial motion sequences across a wide range of character types. We then feed the landmark sequences into a pre-trained landmark-driven animation model to generate the animated video. With this generalization capability, FaceShot significantly extends the application of portrait animation by overcoming the limitations of realistic-portrait landmark detection for any stylized character and driving video. FaceShot is also compatible with any landmark-driven animation model and significantly improves its overall performance. Extensive experiments on our newly constructed character benchmark, CharacBench, confirm that FaceShot consistently surpasses state-of-the-art (SOTA) approaches across all character domains.
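To make the idea of landmark matching through semantic correspondences concrete, the following is a minimal sketch, not the FaceShot implementation. It assumes that feature maps for a reference face and a target character have already been extracted (for example, from a latent diffusion model) and transfers each reference landmark to the target location with the highest cosine similarity. The function name `match_landmarks` and the array layout are hypothetical choices made for illustration.

```python
import numpy as np


def match_landmarks(ref_feats, tgt_feats, ref_landmarks):
    """Transfer landmarks by nearest-neighbour search in feature space.

    ref_feats, tgt_feats: (H, W, C) feature maps (assumed to be precomputed).
    ref_landmarks: (N, 2) integer (row, col) positions on the reference map.
    Returns an (N, 2) integer array of (row, col) positions on the target map.
    """
    h, w, c = tgt_feats.shape

    # L2-normalise so that a dot product equals cosine similarity.
    tgt = tgt_feats.reshape(-1, c)
    tgt = tgt / (np.linalg.norm(tgt, axis=1, keepdims=True) + 1e-8)

    # Gather one feature vector per reference landmark: shape (N, C).
    queries = ref_feats[ref_landmarks[:, 0], ref_landmarks[:, 1]]
    queries = queries / (np.linalg.norm(queries, axis=1, keepdims=True) + 1e-8)

    # Similarity of every landmark against every target position: (N, H*W).
    sim = queries @ tgt.T
    best = sim.argmax(axis=1)

    # Convert flat indices back to (row, col) coordinates.
    return np.stack(np.unravel_index(best, (h, w)), axis=1)
```

This sketch uses a plain argmax per landmark. The appearance guidance described in the abstract is not modelled here.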

Method




FaceShot first generates precise facial landmarks of the target character with appearance guidance. Next, a coordinate-based landmark retargeting module generates the landmark sequence. Finally, this landmark sequence is fed into an animation model to animate any character from any driving video.
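The retargeting step can be illustrated with a deliberately simplified sketch. It is an assumption-laden stand-in, not the coordinate-based module from the paper: it takes the per-frame displacement of each driving landmark relative to the first driving frame, rescales it by the ratio of face sizes, and adds it to the target character's landmarks. The function name `retarget_landmarks` and the spread-based scale estimate are hypothetical.

```python
import numpy as np


def retarget_landmarks(target_lmk, driving_seq):
    """Apply driving-video motion to a target character's landmarks.

    target_lmk: (N, 2) landmarks of the target character.
    driving_seq: (T, N, 2) landmarks of the driving video, one set per frame.
    Returns a (T, N, 2) landmark sequence for the target character.
    """
    target_lmk = np.asarray(target_lmk, dtype=float)
    driving_seq = np.asarray(driving_seq, dtype=float)
    first = driving_seq[0]

    def spread(points):
        # Mean distance to the centroid, used as a rough face-size estimate.
        return np.linalg.norm(points - points.mean(axis=0), axis=1).mean()

    # Assumption: motion magnitude scales linearly with face size.
    scale = spread(target_lmk) / (spread(first) + 1e-8)

    # Per-frame displacement relative to the first driving frame: (T, N, 2).
    offsets = driving_seq - first

    # Broadcasting adds the (N, 2) target landmarks to every frame.
    return target_lmk + scale * offsets
```

The resulting `(T, N, 2)` sequence plays the role of the landmark sequence that is passed to the landmark-driven animation model. Because the first frame has zero offset, the output for frame 0 equals the target character's own landmarks.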

Single Character + Multiple Driving Videos

Multiple Characters + Single Driving Video

Videos Longer than 5 Seconds

Comparison

A black video indicates that the method failed on this character.