AI systems already exist that generate sound effects to match silent images of streets (and other places), but an experimental new technology does the opposite: it generates images that match audio recordings of streets, with surprising accuracy.
Asst. Professor Yuhao Kang and colleagues at the University of Texas at Austin trained a “soundscape-to-image diffusion model” on a dataset of 10-second audiovisual clips.
These clips consisted of still images and ambient sounds taken from YouTube videos of urban and rural streets in North America, Asia, and Europe. Using deep learning algorithms, the system learned not only which sounds correspond to which items in an image, but also which sound qualities correspond to which visual environments.
Once trained, the system generated images based solely on the recorded ambient sounds of 100 other street-view videos, producing one image per video.
Each of these images was then shown to a panel of human judges, alongside two other generated images, while the judges listened to the video soundtrack that the image was based on. When asked to identify which of the three images corresponded to the soundtrack, they did so with an average accuracy of 80%.
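To put the 80% figure in context, a judge guessing at random among three images would be right about one time in three. The short Python sketch below shows how such a matching accuracy could be computed and compared with that chance level. The function name and the trial data are made up for illustration and are not taken from the study.

```python
def matching_accuracy(trials):
    """Return the fraction of trials in which the judge picked the correct image.

    Each trial is a (chosen_index, correct_index) pair.
    """
    if not trials:
        raise ValueError("need at least one trial")
    hits = sum(1 for chosen, correct in trials if chosen == correct)
    return hits / len(trials)


# With three candidate images per trial, random guessing succeeds 1/3 of the time.
CHANCE_LEVEL = 1 / 3

# Made-up example trials (not data from the study).
toy_trials = [(0, 0), (2, 2), (1, 0), (1, 1)]

accuracy = matching_accuracy(toy_trials)
print(f"accuracy = {accuracy:.2f}, chance = {CHANCE_LEVEL:.2f}")
```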
Additionally, computer analysis of the generated images found that their relative proportions of sky, greenery, and buildings were “strongly correlated” with those in the original videos.
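The article does not say which correlation measure the researchers used. As a rough illustration only, the sketch below assumes a Pearson correlation between the fraction of each image occupied by one category (here, sky) in the source frames and in the generated images. The function and the numbers are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


# Made-up sky fractions for four source frames and their generated images.
source_sky = [0.10, 0.25, 0.40, 0.55]
generated_sky = [0.12, 0.22, 0.43, 0.50]

r_sky = pearson(source_sky, generated_sky)
print(f"sky correlation r = {r_sky:.2f}")
```

A value of r close to 1 would mean that streets with more sky in the source video also tend to get more sky in the generated image. The same calculation could be repeated for greenery and buildings.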
In fact, the generated images often also reflected the lighting conditions of the source videos, such as sunny, cloudy, or nighttime skies. This may have been made possible by cues such as reduced traffic noise at night and the sounds of nocturnal insects.
The technology could have forensic applications, such as getting a rough idea of where an audio recording was made, but the main purpose of the study is to explore how sound contributes to our sense of place.
“The results of this study have the potential to advance our knowledge of the impact of vision and hearing on human mental health, guide urban design practices for place-making, and improve the overall quality of life in communities,” the scientists said in the paper, which was recently published in Nature.
Source: University of Texas at Austin