Artificial imagination: CMU lab automates video “fakes”

Carnegie Mellon PhD student Aayush Bansal is teaching computers how to imagine. Working with professors Deva Ramanan and Yaser Sheikh of the Robotics Institute, Bansal recently developed an algorithm that has made a splash in both academic and mainstream media. His method, titled Recycle-GAN, autonomously transfers the contents of one video to another while making sure that the style of the target is retained. For example, imagine watching a video of Donald Trump speaking, in which his mannerisms, voice, and facial movements are those of Barack Obama. Or imagine a time-lapse of a rose blooming in the style of an azalea. This is all possible with Recycle-GAN.

The approach combines the functions of two recent AI projects, OpenPose and Cycle-GAN, to track the movement of key points in one video and transfer that movement to the subject of a similar video. The project’s website shows the final product on a variety of videos, with original footage next to the retargeted footage of people speaking, flowers blooming, sunrises, origami birds flying, and robotic arms picking up objects.

Prior work in this field has been aimed specifically at facial retargeting, but Bansal’s research team has broadened the applications of the technology. “We aren’t using any specific facial information,” he explained from his desk in Smith Hall, which is littered with philosophy books and Disney-Pixar posters. “We just wanted to see if we could automatically learn the video retargeting or not.”

Video retargeting is not new technology — Hollywood has been doing it for more than 20 years. Animators in Furious 7 created a digital version of Paul Walker, who died before they finished shooting the movie. But until now, tasks like that required a team of artists. With Recycle-GAN and programs like it, virtually anybody can generate retargeted videos — the code is freely available on GitHub.

For some, this is cause for worry. In April 2018, BuzzFeed released Jordan Peele’s now-famous “deepfake” video, which shows a convincingly rendered Barack Obama saying a number of things the actual Obama probably never would. Peele’s video warned the public that deepfakes are a threat to reliable information, prompting a recently released CNN investigation titled “the Pentagon’s Race Against Deepfakes,” which summarized the history of video augmentation and the current research on both generating and detecting fabricated audiovisual content.

Bansal’s research was mentioned in the report, and though he resents the term “deepfake,” he made his opinions clear on the topic: “Whenever a user is looking at a video, she or he should know if it’s a real or generated content. We should be able to tell people if it’s a real video or fake.” His Recycle-GAN algorithm can help researchers working on deepfake detection methods by generating data for them: instead of paying a team of highly skilled digital artists to make a retargeted video, researchers could just pass two clips into his algorithm and generate their fakes in a matter of minutes.

The key to this kind of technology is an old idea but a fairly new research topic: artificial imagination. “If you can’t collect data, create it,” joked Bansal. One of his earlier projects involved taking a black-and-white, unshaded outline of a stiletto heel and uploading its visual data to an AI program that generates several different colorized and shaded sketches of what that shoe could look like. This task might be easy for a human; every time we read a description or see a black-and-white photo, our imagination creates a more complete image for us. For a computer, however, this process is complicated: the program must accept an incomplete signal and generate multiple plausible outputs from it.

This technology has implications far beyond deepfake videos. Bansal imagines a future where programs like Recycle-GAN will be able to create data that can’t be captured. Self-driving cars, for instance, mostly have accidents when it is dark, rainy, or misty, because their sensors have trouble interpreting visual data in these settings. “We could use a lot of data from good weather conditions to try and simulate data for bad weather conditions,” he offered. Another potential use is documentation: if several people film the same event from different locations, retargeting algorithms could potentially stitch the separate 2-D footage together to create a 3-D virtual reality version of the event.

Bansal’s personal hopes for the technology’s future lie in entertainment. “I have a fascination with animation,” he said, adding that he wants this technology to make the animation process quicker and easier for companies like Pixar and DreamWorks, so that more stories can be created. “Storytellers shouldn’t have to worry about how their stories will be told,” he said. “I’ve always thought that people’s ideas are just sitting on the shelf because they don’t have sufficient resources to make them.”

Whether those ideas are safer autonomous cars, more accurate video evidence, false videos of someone saying things they never said, or the next Disney-Pixar blockbuster, one thing is certain: Recycle-GAN will make the road to creation much easier.