CAVA: Computational Audio-Visual Analysis

Synchronized and calibrated binocular/binaural data sets with head movements

The CAVA database is a unique set of audiovisual recordings made with a binocular camera pair and a binaural microphone pair, both mounted on a person's head. The database was gathered in order to develop computational methods and cognitive models for audiovisual scene analysis, as part of the European project POP (Perception on Purpose, FP6-IST-027268). The CAVA database was recorded in May 2007 by two POP partners: The University of Sheffield and INRIA Grenoble Rhône-Alpes. We recorded a large variety of scenarios representative of typical audiovisual tasks, such as tracking a speaker in a complex and dynamic environment: multiple speakers participating in an informal meeting, both static and moving speakers, acoustic noise, occluded speakers, speakers' faces turning away from the cameras, etc.

Central to all scenarios is the recording of the state of the audiovisual perceiver (static, panning, tilting, etc.): the perceiver's head motion is available with six degrees of freedom.
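A six-degrees-of-freedom head pose combines a 3-D translation with a 3-D rotation per time step. The following is a minimal sketch of how such timestamped pose records could be represented and loaded in Python; the file name, column order, and units are illustrative assumptions, not the actual CAVA file format, which is described in the database documentation.

```python
# Hypothetical loader for timestamped 6-DOF head-pose records.
# Assumed format: one CSV row per sample, columns
# timestamp, x, y, z, roll, pitch, yaw (units are an assumption).
import csv
from dataclasses import dataclass
from typing import List


@dataclass
class HeadPose:
    timestamp: float  # time of the sample, in seconds
    x: float          # translation components
    y: float
    z: float
    roll: float       # rotation components
    pitch: float
    yaw: float


def load_head_poses(path: str) -> List[HeadPose]:
    """Read 6-DOF head-pose samples from a CSV-like file."""
    poses: List[HeadPose] = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            t, x, y, z, roll, pitch, yaw = map(float, row)
            poses.append(HeadPose(t, x, y, z, roll, pitch, yaw))
    return poses
```

Such per-frame pose data lets the cameras' and microphones' positions be expressed in a world frame, which is what makes the recordings usable for studying active audiovisual perception.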

The CAVA database is freely accessible for scientific research purposes and for non-commercial applications.