PERCEPTION

Multiple-camera images and videos

The PERCEPTION group’s database contains images and videos gathered with several synchronized and calibrated cameras. All the acquisitions were performed with the GrImage platform.

The public database is available at:

http://4drepository.inrialpes.fr/

CAVA: A dataset for computational audio-visual analysis

The CAVA database is freely accessible for scientific research purposes and for non-commercial applications.

CAVA is a unique set of audiovisual recordings made with binocular and binaural camera/microphone pairs, both mounted on a person’s head. The database was gathered in order to develop computational methods and cognitive models for audiovisual scene analysis, as part of the European project POP (Perception on Purpose, FP6-IST-027268). The CAVA database was recorded in May 2007 by two POP partners: the University of Sheffield’s Speech and Hearing group and INRIA Grenoble Rhône-Alpes’ PERCEPTION group. We recorded a large variety of scenarios representative of typical audiovisual tasks, such as tracking a speaker in a complex and dynamic environment: multiple speakers participating in an informal meeting, both static and dynamic speakers, presence of acoustic noise, occluded speakers, speakers’ faces turning away from the cameras, etc.

For methodological details, please refer to this paper.

CAMIL: A Dataset for Computational Audio-Motor Integration through Learning

The CAMIL dataset is freely accessible for scientific research purposes and for non-commercial applications.

The CAMIL dataset is a unique set of audio recordings made with a realistic dummy head equipped with a binaural pair of microphones and mounted on a pan-tilt robot setup. The dataset was gathered in order to investigate the audio-motor contingency from a computational point of view and to experiment with new models for sound localization based on machine learning. The recordings were made in November 2010 at INRIA Grenoble Rhône-Alpes and were led by Antoine Deleforge. A fully automated protocol for the University of Coimbra’s audiovisual robot head POPEYE was designed to gather nearly 100,000 binaural sounds from all the robot’s motor states, with or without head movements. Recordings were made in the presence of a loudspeaker emitting random-spectrum sounds. Each recording was annotated with the corresponding ground-truth motor coordinates of the robot. The overall experiment was entirely unsupervised and lasted 70 hours.