It was April of 2018, and it seemed like any other day until I got my first text: “Did you see this yet?!” with a link to a YouTube video. Seconds later, former President Barack Obama appeared on screen, delivering a speech in which he called President Donald Trump a “complete and total (expletive).”
However, it wasn’t actually former President Obama. It was a deepfake video created by Jordan Peele and BuzzFeed. The video’s purpose was to raise awareness of the then-current AI capabilities for creating narrated talking heads. Peele used Obama’s likeness to deliver his message: be cautious, stay vigilant, and insist on reliable news sources.
That was six-and-a-half years ago, and in the age of AI, that’s an eternity.
How can you trust your sources now?
Since then, AI technology has advanced by leaps and bounds, far beyond what Peele used to create his Obama deepfake. GPT-2 was made publicly available in 2019, generating text from simple prompts.
2021 saw the release of DALL-E, an AI image generation tool whose photorealistic output can fool even the keenest eyes. In 2022, DALL-E 2 got even better, and Midjourney was released the same year. Both take a text prompt describing subject, situation, action, and style, and output unique artwork, including photorealistic images.
By 2024, generative AI had completely exploded. Meta’s Make-A-Video lets you generate five-second videos from just a text description, and Meta’s newer Movie Gen takes AI video generation to new heights. OpenAI’s Sora, Google’s Veo, Runway ML, HeyGen… You can now generate anything you can imagine with a text prompt. Perhaps even more than you imagine, as AI-generated videos sometimes run wild in response to our input, producing some very captivating and psychedelic visuals.
Synthesia and DeepBrain are two AI video platforms specifically designed to deliver human-like content through AI-generated avatars, much like the newscasters delivering the latest headlines on your favorite local channel. Speaking of which, your entire local channel may soon be generated by AI, like the notable Channel One. And there are many others.
What is real and what is fake? Who can tell the difference? Certainly not your aunt who keeps sharing ridiculous images on Facebook. The concepts of truth, reality, and authenticity are under attack, and the effects are reverberating beyond the screen. So, to give humanity a chance against the looming tsunami of lies, Google DeepMind has developed a technology to watermark and identify AI-generated media, called SynthID.
SynthID can separate authentic content from AI-generated content by digitally watermarking the latter in a way that is imperceptible to humans but easily recognized by software designed to look for the watermark.
This applies not only to videos, but also to images, audio, and even text. DeepMind says it can do so without compromising the integrity of the original content.
AI watermarking for text
Large language models (LLMs) like ChatGPT use “tokens” to read input and produce output. A token is basically part or all of a word or phrase. If you’ve used an LLM before, you’ve probably noticed that it tends to repeat certain words or phrases within its responses. This pattern is common across LLMs.
How SynthID watermarks AI-generated text is quite complex, but simply put, it subtly manipulates the probabilities of various tokens throughout the text. It can tweak as many as ten probabilities within a sentence, and hundreds across a page, leaving what DeepMind calls a “statistical signature” on the generated text.
The result is still completely legible to humans, and there’s no way to spot the watermark unless you have near-paranormal pattern-recognition skills.
SynthID’s watermark detector can spot it, however, and the longer the text, the higher its accuracy. The watermark should also be fairly robust to some degree of text editing, since no specific character patterns are involved.
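DeepMind hasn’t spelled out its full algorithm in this kind of detail, so the following is only an illustrative toy, not SynthID itself. It sketches the general idea described above, using a “red/green list” scheme in the style of published academic text-watermarking work: a keyed pseudo-random function nudges the probabilities of some tokens at each step, and a detector later tests whether the favored tokens appear more often than chance would allow. All names, parameters, and the tiny vocabulary are invented for the example.

```python
import hashlib
import random

VOCAB = [f"tok{i}" for i in range(1000)]  # toy vocabulary
GREEN_FRACTION = 0.5   # half the vocab is "favored" at each step
BIAS = 4.0             # how strongly favored tokens are boosted

def green_list(prev_token: str) -> set:
    """Pseudo-randomly partition the vocab, seeded by the previous token."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return set(rng.sample(VOCAB, int(len(VOCAB) * GREEN_FRACTION)))

def sample_watermarked(prev_token: str, rng: random.Random) -> str:
    """Sample the next token, with boosted weight for green-list tokens."""
    green = green_list(prev_token)
    weights = [BIAS if t in green else 1.0 for t in VOCAB]
    return rng.choices(VOCAB, weights=weights, k=1)[0]

def detect(tokens: list) -> float:
    """Z-score of the green-token count; large values imply a watermark."""
    hits = sum(1 for prev, cur in zip(tokens, tokens[1:])
               if cur in green_list(prev))
    n = len(tokens) - 1
    expected = n * GREEN_FRACTION
    var = n * GREEN_FRACTION * (1 - GREEN_FRACTION)
    return (hits - expected) / var ** 0.5

rng = random.Random(0)
marked = ["tok0"]
for _ in range(200):
    marked.append(sample_watermarked(marked[-1], rng))
unmarked = [rng.choice(VOCAB) for _ in range(200)]

print(f"watermarked z-score:   {detect(marked):.1f}")    # strongly positive
print(f"unwatermarked z-score: {detect(unmarked):.1f}")  # small by comparison
```

Note how this matches the properties the article describes: the bias is invisible in any single sentence, the text stays fully legible, and the detector’s confidence grows with text length, since the z-score scales with the number of tokens examined.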
AI watermarking for audio and video
Watermarking multimedia content is arguably easier, since all kinds of information can be encoded into invisible and inaudible artifacts within the files themselves. For audio, SynthID creates a spectrogram of the file, inserts a watermark that is imperceptible to the human ear, and converts it back to a waveform.
For photos and videos, the watermark is embedded directly in the pixels of the image in a non-destructive manner. Even if the image or video is altered with filters or crops, the watermark remains detectable.
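SynthID’s actual pixel-domain embedding is learned by neural networks trained alongside the detector, which is how it achieves that robustness. As a crude, hand-rolled stand-in for the same principle, here is a toy spread-spectrum watermark: a keyed pseudo-random pattern is added to the pixels at an imperceptible strength, and detection is a simple correlation test that survives mild edits. Everything here, the key, the functions, the fake image, is invented for illustration.

```python
import random

SIZE = 256 * 256   # toy "image": a flat list of grayscale pixel values
STRENGTH = 4.0     # per-pixel nudge, imperceptible on a 0-to-255 scale

def pattern(key: int) -> list:
    """Pseudo-random +/-1 pattern derived from a secret key."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(SIZE)]

def embed(pixels: list, key: int) -> list:
    """Add the keyed pattern, clamped to the valid pixel range."""
    return [min(255.0, max(0.0, px + STRENGTH * s))
            for px, s in zip(pixels, pattern(key))]

def detect(pixels: list, key: int) -> float:
    """Correlation with the keyed pattern; near STRENGTH if watermarked."""
    return sum(px * s for px, s in zip(pixels, pattern(key))) / SIZE

rng = random.Random(1)
image = [rng.uniform(0, 255) for _ in range(SIZE)]
marked = embed(image, key=42)

# Simulate a mild edit (noise, as a stand-in for filters or compression).
edited = [px + rng.uniform(-1, 1) for px in marked]

print(f"original: {detect(image, 42):.2f}")   # low correlation
print(f"edited:   {detect(edited, 42):.2f}")  # watermark survives the edit
```

The design choice mirrors the article’s point: because the signal is spread faintly across every pixel rather than hidden in any one spot, small alterations like noise or light filtering cannot scrub it out, only degrade the correlation slightly.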
Google has open-sourced its SynthID technology and is encouraging companies building generative AI tools to adopt it. What’s at stake isn’t just people being fooled by AI fakes. Large companies themselves need AI-generated content to be distinguishable from human-generated content for another reason: it ensures that tomorrow’s AI models are trained on real human-generated content, not AI-generated BS.
If an AI model were forced to eat large amounts of its own excrement, all the “hallucinations” prevalent in today’s models would become part of the new model’s understanding of ground truth. Google definitely has a vested interest in making sure the next Gemini model is trained on the best possible data.
But at the end of the day, schemes like SynthID are strictly opt-in, and GenAI text, images, video, and audio from companies that opt out will be much harder to detect. That’s exactly the pitch for anyone who really wants to twist the truth or deceive people, from election-interference operations to kids who can’t be bothered to write their own assignments.
Conceivably, countries could enact laws mandating these watermarking technologies, but some countries will certainly choose not to, and shady actors will build their own models to circumvent such restrictions.
But it’s a start. You and I might still be fooled at first by a video of Taylor Swift handing out pots and pans on TikTok, but with SynthID technology we can check its authenticity before paying the $9.99 shipping fee.
Source: Google DeepMind