So this is incredible. Researchers led by MIT computer scientist Abe Davis have developed an algorithm that can reconstruct intelligible music and speech from video footage of things like plants and potato chip bags by analyzing the patterns of their otherwise imperceptible vibrations.

Via Wired:

The secret to the science is that the researchers were able to analyse the tiny vibrations that occur when sound hits objects. "The motion of this vibration creates a very subtle visual signal that's usually invisible to the naked eye. People didn't realise that this information was there," explains Abe Davis from MIT, who is first author on the study detailing the development. It was from these minuscule vibrations that the research team learned to reconstruct the sound.
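To get a feel for how a vibration far smaller than a pixel can still be read out of video, here is a toy sketch in Python. It is not the researchers' algorithm. It uses a simple gradient-based shift estimate on a synthetic one-dimensional "frame", and every name and constant in it (the texture period, frame width, test tone and vibration amplitude) is an assumption made up for illustration.

```python
import math

PERIOD = 16        # pixels per cycle of the synthetic texture (assumed)
WIDTH = 64         # pixels in each 1D "frame" (assumed)
FPS = 2000         # frame rate, taken from the range quoted in the article
TONE_HZ = 100      # test tone driving the vibration (assumed)
AMPLITUDE = 0.01   # peak displacement in pixels (assumed, far below one pixel)


def render_frame(shift):
    """Return a 1D intensity profile displaced by `shift` pixels."""
    return [math.sin(2 * math.pi * (x - shift) / PERIOD) for x in range(WIDTH)]


def estimate_shift(reference, frame):
    """Least-squares estimate of the sub-pixel shift between two frames."""
    num = 0.0
    den = 0.0
    for x in range(1, WIDTH - 1):
        grad = (reference[x + 1] - reference[x - 1]) / 2.0  # spatial gradient
        diff = frame[x] - reference[x]                      # change since reference
        num += -diff * grad
        den += grad * grad
    return num / den


# Simulate 40 frames of an object vibrating with a pure tone.
reference = render_frame(0.0)
true_shifts = [AMPLITUDE * math.sin(2 * math.pi * TONE_HZ * n / FPS)
               for n in range(40)]

# One shift estimate per frame: this sequence is the recovered "audio".
recovered = [estimate_shift(reference, render_frame(s)) for s in true_shifts]

worst_error = max(abs(r - t) for r, t in zip(recovered, true_shifts))
print(f"worst error: {worst_error:.5f} px")
```

The list `recovered` holds one displacement value per frame, so played back at the frame rate it traces out the same tone that shook the object. Real footage is noisy and two-dimensional, which is where the hard part of the actual research lies.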

The team, which consists of researchers from MIT, Microsoft and Adobe, has managed to extract useful audio signals from materials such as water, aluminium foil and the leaves of a potted plant.

In order for the algorithm to work, the frame rate of the video had to be higher than the frequency of the audio signal. To achieve this, the researchers sometimes captured video at 2,000 to 6,000 frames per second, which is far higher than the 60 frames per second of a typical consumer camera.
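The frame-rate requirement is a sampling problem: each frame is one sample of the vibration. By the standard sampling theorem, a camera running at N frames per second can faithfully capture tones only up to N/2 Hz, and anything higher "folds" down to a lower, wrong frequency. The short Python sketch below illustrates that arithmetic; the function names are hypothetical and the 440 Hz test tone is an assumption chosen for illustration.

```python
def max_recoverable_hz(fps):
    """Highest tone a camera at `fps` frames per second can capture (Nyquist limit)."""
    return fps / 2.0


def apparent_hz(tone_hz, fps):
    """Frequency a pure tone appears to have when sampled once per frame."""
    folded = tone_hz % fps
    return min(folded, fps - folded)


# Compare an ordinary camera with the high-speed rates quoted in the article.
for fps in (60, 2000, 6000):
    print(fps, max_recoverable_hz(fps), apparent_hz(440, fps))
```

At 2,000 fps or more a 440 Hz tone comes through at its true frequency, while at 60 fps it would show up as a 20 Hz rumble, which is why plain frame-by-frame sampling is not enough for ordinary video.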

With a bit of ingenuity, though, the researchers were also able to recover sound from video shot at just 60 fps. The video above explains how.

A paper describing the algorithm and its abilities can be found here.