Deepfakes are everywhere these days, from Tom Cruise TikTok impersonations to fake video of former president Donald Trump resisting arrest. Because the AI engine behind deepfakes is relatively complex and technical, there is a fair amount of misinformation about how these videos are created.
It’s time to set the record straight by addressing a common misconception: that deepfakes use Generative Adversarial Networks (GANs). The scoop on this is: They don’t. No matter what Wikipedia says.
So why do people think that GANs are involved? Let’s break it down.
A GAN is a type of generative AI. (Hey Wikipedia, fix your Deepfake definition and we’ll link to you.) GANs use two neural networks: a generator, which is typically a deconvolutional (upsampling) neural network, and a discriminator, which is typically a convolutional neural network. They work together in an adversarial fashion: The generator creates false data and tries to pass it by the discriminator, while the discriminator, which also sees real data, tries to determine which is which. That’s how the discriminator learns what’s real and what’s fake, and its verdict lets the generator know when it didn’t succeed.
In the case of face swapping, the generator creates new data attempting to match an image of a face, and the discriminator tries to identify whether it’s a match or not. They send data back and forth until the generator gets good enough to pass the discriminator.
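To make the back-and-forth concrete, here is a minimal sketch of that adversarial loop. It is a toy, not how any real deepfake or GAN library works: the "real data" is just numbers clustered around 4.0, each player is a one-line function instead of a deep neural network, and every name in it (`Generator`, `Discriminator`, `train`) is made up for illustration.

```python
import math
import random


def sigmoid(x):
    # Numerically safe logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class Discriminator:
    """Scores a number: near 1 means 'looks real', near 0 means 'looks fake'."""

    def __init__(self):
        self.w = 0.0
        self.c = 0.0

    def score(self, x):
        return sigmoid(self.w * x + self.c)

    def train_step(self, real, fake, lr):
        # Gradient ascent on log D(real) + log(1 - D(fake)).
        d_real = self.score(real)
        d_fake = self.score(fake)
        self.w += lr * ((1.0 - d_real) * real - d_fake * fake)
        self.c += lr * ((1.0 - d_real) - d_fake)


class Generator:
    """Turns random noise into a number; starts far from the real data."""

    def __init__(self):
        self.scale = 1.0
        self.shift = 0.0

    def generate(self, noise):
        return self.scale * noise + self.shift

    def train_step(self, noise, disc, lr):
        # Gradient ascent on log D(G(noise)): move toward what the
        # discriminator currently accepts as real.
        fake = self.generate(noise)
        push = (1.0 - disc.score(fake)) * disc.w
        self.scale += lr * push * noise
        self.shift += lr * push


def train(steps=200, lr=0.05, seed=0):
    rng = random.Random(seed)
    gen, disc = Generator(), Discriminator()
    for _ in range(steps):
        real = rng.gauss(4.0, 0.5)    # assumed stand-in for real data
        noise = rng.gauss(0.0, 1.0)
        disc.train_step(real, gen.generate(noise), lr)   # discriminator learns
        gen.train_step(noise, disc, lr)                  # generator tries again
    return gen, disc


gen, disc = train()
print("generator output for zero noise:", gen.generate(0.0))
```

Each pass through the loop is one round of the exchange described above: the discriminator gets a little better at telling real from fake, then the generator adjusts to what the discriminator currently accepts.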
GANs learn through interaction to fill data gaps. Their imaginative nature makes them ideal in situations where data may be missing and must be reconstructed.
While this functionality is a plus for creating certain images, it can cause problems when generating deepfakes. GANs will try to fill in any gaps in data, which can lead to blending images instead of swapping them. In the case of swapping faces, the final image might look less like the source and more like a mixture of the source and destination. This blending effect is known as identity bleed. GANs can also experience “over imagination,” where one frame differs drastically from the next because of the training data used. An example of this would be a pair of sunglasses that appears on a face for a few frames, then disappears.
These limitations mean that, for the most part, GANs can’t effectively be used for deepfakes. While developers can overcome the algorithm’s issues, doing so takes careful data collection and processing. Simply put, there are better techniques for creating deepfakes.
For this reason, FaceSwap uses generative autoencoders instead of GANs in its deepfake software.
A generative autoencoder is a deep learning algorithm that uses an artificial neural network to perform its task: in this case, swapping faces. The autoencoder begins by encoding the input image in a series of progressively smaller layers until it hits the smallest layer, known as the bottleneck. At this point, it matches the data to a target number of variables, then decodes it. Decoding recreates the data through a number of progressively bigger layers until it outputs the final image.
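The shape of that pipeline is easier to see in code. The sketch below is an illustration only: the layer sizes (16, 8, 4, 2) are arbitrary, the weights are random and untrained, and plain fully connected layers stand in for whatever layers a real face-swapping model uses. The helper names are invented for this example.

```python
import random


def dense(inputs, weights):
    """One fully connected layer: each output is a weighted sum of all inputs."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def random_layer(n_in, n_out, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def build(sizes, rng):
    """Create one weight matrix per step between consecutive layer sizes."""
    return [random_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def run(layers, data):
    """Pass data through the layers, recording the size after each one."""
    trace = [len(data)]
    for weights in layers:
        data = dense(data, weights)
        trace.append(len(data))
    return data, trace


rng = random.Random(0)
encoder = build([16, 8, 4, 2], rng)   # progressively smaller, ending at the bottleneck
decoder = build([2, 4, 8, 16], rng)   # progressively bigger, back to the input size

pixels = [rng.random() for _ in range(16)]   # stand-in for a tiny 4x4 image
code, down = run(encoder, pixels)
output, up = run(decoder, code)

print("encoder sizes:", down)   # [16, 8, 4, 2]
print("decoder sizes:", up)     # [2, 4, 8, 16]
```

The two values in `code` are the bottleneck: everything the decoder knows about the input has to fit through them.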
This process isn’t perfect on the first attempt, so the autoencoder trains by repeating the process with different data. Training a neural network consists of:
- Encoding the data into smaller values
- Decoding the data back to the original size
- Calculating the loss, or difference between the original input (the face) and the output
- Modifying the model to reduce that loss, moving the output toward the final desired image
Repeat this sequence until you get your desired results.
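Here is a minimal sketch of that training sequence, with the four steps marked in the comments. It rests on some loud assumptions: the "faces" are four made-up lists of four numbers, the encoder and decoder are each a single layer of weights, and the model is updated with plain gradient descent on squared error. Real deepfake software is far larger, and none of these function names come from FaceSwap.

```python
import random

rng = random.Random(0)
INPUT_SIZE, BOTTLENECK = 4, 2


def init(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def layer(weights, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def reconstruction_loss(original, output):
    return sum((o - x) ** 2 for o, x in zip(output, original))


def train_step(encoder, decoder, face, lr):
    code = layer(encoder, face)                 # 1. encode into smaller values
    output = layer(decoder, code)               # 2. decode back to the original size
    loss = reconstruction_loss(face, output)    # 3. loss between input and output
    # 4. modify the model: gradient descent on the squared error
    grad_out = [2.0 * (o - x) for o, x in zip(output, face)]
    grad_code = [sum(grad_out[i] * decoder[i][j] for i in range(INPUT_SIZE))
                 for j in range(BOTTLENECK)]
    for i in range(INPUT_SIZE):
        for j in range(BOTTLENECK):
            decoder[i][j] -= lr * grad_out[i] * code[j]
    for j in range(BOTTLENECK):
        for k in range(INPUT_SIZE):
            encoder[j][k] -= lr * grad_code[j] * face[k]
    return loss


def average_loss(encoder, decoder, faces):
    total = sum(reconstruction_loss(f, layer(decoder, layer(encoder, f)))
                for f in faces)
    return total / len(faces)


# Made-up training data standing in for face images.
faces = [
    [0.9, 0.1, 0.8, 0.2],
    [0.2, 0.7, 0.1, 0.9],
    [0.5, 0.5, 0.4, 0.6],
    [0.8, 0.3, 0.7, 0.3],
]

encoder = init(BOTTLENECK, INPUT_SIZE)
decoder = init(INPUT_SIZE, BOTTLENECK)

loss_before = average_loss(encoder, decoder, faces)
for _ in range(500):                  # repeat the sequence
    for face in faces:
        train_step(encoder, decoder, face, lr=0.05)
loss_after = average_loss(encoder, decoder, faces)

print("average loss before training:", loss_before)
print("average loss after training:", loss_after)
```

The loss after training should come out well below the loss before it, which is all "getting your desired results" means at this scale: the output has moved closer to the input it was asked to recreate.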
Why is this better for creating deepfakes? While a GAN will use its “imagination” to go beyond the information given and try to fill in any gaps in the data, an autoencoder will stop at recreating the information it’s been given. The autoencoder will simply swap data without interpreting and inserting any missing data. By only executing on what is there and nothing more, it will deliver a more consistent, accurate face swap. Going back to the example above with the sunglasses, the GAN might add sunglasses simply because the sunglasses can look realistic in that image, while an autoencoder won’t because the sunglasses weren’t in the original image.
You can augment the results of autoencoded deepfakes by using video to train the AI rather than photos. Video content provides varied poses and more data to work with. Autoencoders read the video data frame by frame, interpreting it as a long sequence of images. This additional input produces a higher-quality deepfake in the end.
GANs are a fine tool in specific scenarios, but when it comes to deepfakes, their advantages are actually their biggest weakness, despite the opposing claim from online sources (that’s YOU, Wikipedia). For this reason alone, it shouldn’t surprise anyone that FaceSwap doesn’t use GANs.
Learn more about deepfakes and how they’re made in our recently published book, “Exploring Deepfakes: Deploy powerful AI techniques for face replacement and more with this comprehensive guide,” available online today.