NSF Grant Supports Effort to Identify Deep Fake Technology

When scrolling through the news in your favorite app, you probably have a method of determining whether the content you’re seeing is legitimate or fake. What’s the source of the story? Who benefits from the story being shared? Are there obvious factual errors? Sometimes, it’s not even necessary to read past the headline because you know, for example, that there’s enough evidence to prove the Earth is, indeed, round.

But how can you tell if the video you’re watching or the audio you’re hearing is illegitimate when it looks and sounds like the real thing?

Engineering professor TJ Tsai has been awarded a National Science Foundation (NSF) grant for his project, “A Cross-Verification Approach for Identifying Tampered Audio,” which seeks ways to identify fake or tampered audio, such as synthetic recordings generated by deep fake technology.

Though it’s relatively new, deep fake technology is prevalent, and “there’s definitely concern that bad actors could use this technology in a very destructive way,” says Tsai.

Indeed, one system can modify videos in a photorealistic manner to lip-sync to unrelated audio recordings. Another lets a source actor control the facial expressions and head movements of a person in a target video. Recent advances in speech synthesis have enabled systems to learn and imitate the characteristics of a person's voice with very limited training data.

Tsai says, “Rather than approaching this problem from a computer science-centric perspective: ‘Is this video a deep fake?’, this proposal approaches the problem from a history-centric perspective: ‘Is this video a historically verifiable event?’ Historians have a very robust methodology for answering this question, and we can apply this methodology to audio and video data. In the same way that a historian tests a historical claim by cross-checking against all other primary sources of information, we can test the authenticity of an audiovisual recording by cross-checking against all other primary sources of audiovisual information.”

The idea for the project came to Tsai as he was listening to a sermon at church. “The speaker was talking about how we can know whether a historical event actually happened,” Tsai says. “Historians have well-established tools for determining historical truth, and one of the primary mechanisms is cross-verifying a historical claim against other primary sources. If something is true, it will be internally consistent. If it is false, it will contradict other eyewitness accounts. At the time, I had been thinking about the problem of detecting fake or tampered videos, and it occurred to me that all of the work I had seen was focused on scrutinizing the video itself to determine if it was genuine or not. I thought it would be interesting to approach the problem from the perspective of verifying a historical claim by cross-checking against other recordings of the same event.”

With the help of six student researchers over two summers, Tsai will develop methods to cross-verify audio data in two different scenarios: cross-verifying with trusted data and cross-verifying with untrusted data.
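The trusted-data scenario can be illustrated with a toy sketch. This is not the project's actual method, just a minimal example of the cross-verification idea: compare a suspect clip's coarse spectral fingerprint, frame by frame, against a trusted recording of the same event, and flag recordings whose consistency score drops. All function names and parameters here are hypothetical.

```python
import numpy as np

def spectral_fingerprint(signal, frame=256, hop=128):
    """Coarse fingerprint: log-magnitude spectrum of each windowed frame."""
    frames = [signal[i:i + frame] for i in range(0, len(signal) - frame + 1, hop)]
    spectra = np.abs(np.fft.rfft(np.array(frames) * np.hanning(frame), axis=1))
    return np.log1p(spectra)

def consistency_score(suspect, reference):
    """Mean cosine similarity between time-aligned frames of two fingerprints."""
    a, b = spectral_fingerprint(suspect), spectral_fingerprint(reference)
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-9
    return float((num / den).mean())

# Toy demo: a "trusted" recording, a faithful copy, and a spliced (tampered) copy.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 8000)
trusted = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(t.size)
copy = trusted + 0.01 * rng.standard_normal(t.size)
tampered = trusted.copy()
tampered[3000:5000] = np.sin(2 * np.pi * 880 * t[3000:5000])  # spliced-in segment

print(consistency_score(copy, trusted))      # high: agrees with the trusted source
print(consistency_score(tampered, trusted))  # lower: the splice contradicts it
```

In this simplified setting, a clip that matches the trusted recording scores near 1, while a spliced segment pulls the score down; real recordings would require alignment, noise robustness, and far more discriminative features, which is the substance of the research.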

“The proposal develops tools to counteract the spread of false audiovisual information,” Tsai says. “In particular, it focuses on protecting world leaders from fake videos that might cause instability or unrest at a national level. These same tools establish the reliability of true audiovisual information. Beyond simply detecting fake videos, it provides a quantitative way to measure the reliability of audiovisual data concerning public matters.”

NSF grants are the largest share of external support for faculty research at Harvey Mudd College.