This Eerie AI Extracts Sound from Silent Videos and Pictures

Researchers have used artificial intelligence (AI) to turn the idea of extracting audio from static images into a reality. Led by Kevin Fu, a professor of electrical and computer engineering and computer science at Northeastern University, the team has created a machine learning tool called Side Eye that recovers information about sound from images.

Applied to a still image, Side Eye can identify the gender of a person speaking in the room where the photo was taken, transcribe spoken words, and even pinpoint the location. Remarkably, the tool also works on muted videos.

Side Eye exploits the image stabilization hardware commonly found in smartphone cameras. These cameras use springs submerged in liquid to stabilize photos when the photographer’s hand is unsteady. When someone speaks near the camera lens while a photo is taken, the sound causes slight vibrations in the springs, which alter the path of the light. By extracting audio frequencies from these vibrations, Side Eye can recover what is being said off-camera.
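
The general idea of pulling an audio frequency out of a vibration signal can be illustrated with a minimal Python sketch. This is not Side Eye's actual method, which relies on a trained machine learning model. The function names, the sample rate, and the simulated displacement trace below are all assumptions made for illustration. The sketch treats the lens movement as a list of displacement samples, ignores slow motion such as hand shake, and reports the strongest remaining frequency.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate, min_hz=50.0):
    """Return the strongest frequency (Hz) at or above min_hz in a displacement trace."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove any constant offset
    best_freq, best_mag = 0.0, 0.0
    # Naive discrete Fourier transform over the bins below the Nyquist limit.
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        if freq < min_hz:
            continue  # skip slow motion such as hand shake
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(centered)
        )
        if abs(coeff) > best_mag:
            best_freq, best_mag = freq, abs(coeff)
    return best_freq


def simulated_displacement(tone_hz, sample_rate, n):
    """Hypothetical lens displacement: a 4 Hz hand-shake drift plus a weaker acoustic tone."""
    return [
        1.0 * math.sin(2 * math.pi * 4.0 * i / sample_rate)
        + 0.2 * math.sin(2 * math.pi * tone_hz * i / sample_rate)
        for i in range(n)
    ]


SAMPLE_RATE = 2000  # assumed number of displacement samples per second
trace = simulated_displacement(220.0, SAMPLE_RATE, 1000)
estimate = dominant_frequency(trace, SAMPLE_RATE)
print(estimate)  # the 220 Hz tone, not the stronger 4 Hz drift
```

The `min_hz` cutoff is a design choice for the sketch: hand shake is much slower than speech, so discarding the low bins leaves the acoustic component as the strongest peak even though its amplitude is smaller.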

Side Eye is still in its early stages and needs extensive training data to improve, but it could pose cybersecurity risks if it falls into the wrong hands. On the flip side, an advanced version of Side Eye could become a valuable tool for law enforcement agencies, supplying crucial digital evidence in criminal investigations.