CAVA Computational Audio-Visual Analysis

Setup

To acquire such a data collection, the following setup has been developed (note that this setup is designed to be easily plugged into the AV head under development within POP). The audio-visual perceiver is either a person or the dummy head/torso wearing earbud microphones. The perceiver is also fitted with a helmet on which a stereo pair of cameras is mounted. A 4-point tracking device is attached on top of the head. It has to be visible from the tracking camera, which is placed above the perceiver, for example suspended from the ceiling. The three cameras (stereo pair and tracking) are controlled with a software package, and the raw image sequences are recorded onto a PC. The audio is recorded onto a laptop or PC. The three camera data streams are synchronized with the audio signal using NTP over the network.

Because NTP behaves differently on Windows and on Linux, the “clap” method was chosen as the safest way to synchronize the audio and video signals.
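As an illustration only, the clap method can be sketched as follows: the clap is located in the audio track as a sudden burst of energy, the frame in which the hands meet is read off the video, and the difference between the two times gives the offset between the streams. The function names, the window size and the energy threshold below are assumptions made for this sketch and do not describe the software actually used for the recordings.

```python
def clap_onset_seconds(samples, sample_rate, window=256, ratio=0.5):
    """Estimate the clap time (in seconds) in a mono audio signal.

    The signal is cut into non-overlapping windows; the clap is taken to be
    the first window whose energy reaches `ratio` times the largest window
    energy. `window` and `ratio` are illustrative defaults, not tuned values.
    """
    energies = [
        sum(s * s for s in samples[start:start + window])
        for start in range(0, len(samples) - window + 1, window)
    ]
    if not energies or max(energies) == 0:
        raise ValueError("no audible clap found in the signal")
    threshold = ratio * max(energies)
    first = next(i for i, e in enumerate(energies) if e >= threshold)
    return first * window / sample_rate


def audio_video_offset(audio_clap_s, video_clap_frame, fps):
    """Return the offset (in seconds) to add to a video timestamp to obtain
    the matching audio timestamp, given the frame in which the clap is seen."""
    return audio_clap_s - video_clap_frame / fps
```

The time resolution of this estimate is limited by the window length on the audio side and by the frame period on the video side (40 ms at 25 frames per second), so the clap frame is usually the limiting factor.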