‘MegaPortrait’ Deepfakes Are High-Res Animations of Still Images

Researchers at Samsung’s AI Center in Russia have created high-resolution (megapixel) deepfake animations of still face images using head movements of unrelated individuals, and they even work on faces in paintings.

In a paper recently accepted for this year’s Association for Computing Machinery (ACM) International Conference on Multimedia in Lisbon, Portugal, a team of researchers from Samsung’s AI center in Moscow, Russia outlines how it has developed a new approach for synthesizing high-resolution neural avatars, i.e., computer representations that explicitly model the surface geometry and appearance of an animatable human face. The researchers’ novel approach results in megapixel resolution (1024 × 1024) videos of faces from still images “animated” by video of unrelated moving heads.
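For a sense of scale, the short Python snippet below works out how many pixels a 1024 × 1024 frame contains. It is only an arithmetic illustration, and the variable names are ours rather than anything from the paper.

```python
# Pixel count of a single 1024 x 1024 frame (the resolution quoted above).
width, height = 1024, 1024
pixels = width * height              # total pixels per frame
megapixels = pixels / 1_000_000      # one megapixel = one million pixels

print(pixels, round(megapixels, 2))  # prints: 1048576 1.05
```

In other words, each frame holds just over one million pixels, which is where the “megapixel” label comes from.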

“In this work, we advance the neural head avatar technology to the megapixel resolution while focusing on… when the appearance of the driving image is substantially different from the animated source image,” the authors write in their paper. In other words, the researchers show how they can take a still image of a face and, using video of an unrelated head moving around, apply the movements of the latter to the former, all at megapixel resolution.

To animate the still images using video of unrelated moving heads, the researchers, including “young research scientist” Nikita Drobyshev, “developed new neural architectures and training methods that can leverage both medium-resolution video data and high-resolution image[s].” That is, the researchers created a new type of neural network that can take medium-resolution video of a moving head (the “driver”) and apply those movements to a still, high-resolution image (the “source”). A neural network is a network of algorithms, inspired by the structure of the human brain, that takes in “big data” and “trains” itself to recognize relevant patterns in that data. (Neural networks are further explained in the unrelated video below.)

Video: Drobyshev, et al. / MegaPortraits: One-shot Megapixel Neural Head Avatars

Drobyshev et al. note their system produces these “high-resolution avatars” (the high-res moving heads) in two stages. First, it captures the appearance of a source image and reenacts it using the motions from the driver. (This step is done by the “base model,” which is not high resolution.) The system then visually enhances the base model’s animated output, frame by frame, to produce the output of the “high-resolution model.” Both the base model and the high-resolution model, the researchers note, are trained using large datasets of both moving heads (for the driver video) and high-resolution images (for the source image).
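The two-stage flow described above can be pictured with a toy Python sketch. This is not the researchers’ code: `base_reenact`, `enhance`, and `animate` are hypothetical names, the “motion” is just a horizontal shift, and the “enhancement” is plain nearest-neighbor upsampling. Both stand in for trained neural networks, so only the order of the steps is meant to match the description.

```python
from typing import List

Image = List[List[int]]  # toy grayscale image: rows of pixel values


def base_reenact(source: Image, shift: int) -> Image:
    """Stage 1 stand-in: apply one frame of driver 'motion' to the source.

    Assumption: the motion is reduced to a horizontal shift, implemented by
    rolling each row to the right. A real base model is a trained network.
    """
    width = len(source[0])
    k = shift % width
    return [(row[-k:] + row[:-k]) if k else list(row) for row in source]


def enhance(frame: Image, factor: int = 2) -> Image:
    """Stage 2 stand-in: upscale one low-res frame by repeating pixels.

    Assumption: nearest-neighbor upsampling replaces the learned
    high-resolution model, which would add real detail.
    """
    out: Image = []
    for row in frame:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def animate(source: Image, driver_shifts: List[int], factor: int = 2) -> List[Image]:
    """Run stage 1, then stage 2, once per driver frame."""
    return [enhance(base_reenact(source, s), factor) for s in driver_shifts]


source = [[1, 2], [3, 4]]           # a 2 x 2 "still image"
frames = animate(source, [0, 1])    # two driver frames: no shift, shift by 1
print(len(frames), len(frames[0]), len(frames[0][0]))  # prints: 2 4 4
```

Each driver frame yields one output frame, and every output frame is enhanced independently of its neighbors.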

The results, as the embedded videos show, are still images of faces animated into moving heads capable of looking around, smiling, and demonstrating various facial expressions. In the video immediately above, for example, we see still pictures of both Brad Pitt and the Mona Lisa transformed into moving heads. Note the woman’s face on the left is the “driver” in this instance, as it provides the motions that are applied to the still images.

In the video immediately below, the researchers show how they were able to animate several other faces from famous paintings, including a Frida Kahlo self-portrait and a portrait of Russian novelist Nikolai Gogol, amongst others.

“Our method compared to the competitors[’] achieved a significantly better cross-reenactment quality; this results in better preservation in the appearance of the source image; as well as the expression of the driver,” the researchers write in their paper. The team also notes their method “can generalize… paintings while still providing a reasonable preservation of the appearance at high resolution” with “high degrees of expression preservation,” especially in the “eyes region.”

Video: Drobyshev, et al. / MegaPortraits: One-shot Megapixel Neural Head Avatars

Looking forward, the researchers say their system still “underperforms in the region of shoulders and clothing,” which will be the focus of future work. Drobyshev et al. also note that because only static images are available in high resolution right now, a certain amount of temporal flicker is present in the neural avatars, which also needs to be worked out.

Feature image: Никита Дробышев
