CAVA Computational Audio-Visual Analysis

Data

This page presents the data available in the database.
For each sequence you will find a description, a preview video (reachable by clicking on the sequence's image), and the data to download.

Important: To download the data, you must have JavaScript enabled and accept cookies.

For more information about the content of the different data, see the documentation page.

List of recorded sequences

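| perceiver | sequence name | duration (min:sec) | type of head | number of speaker(s) | speaker(s)/noise behaviour | visual occlusion | auditory overlap |
|---|---|---|---|---|---|---|---|
| fixed perceiver | TTOS1 | 00:20 | dummy | 1 | moving | yes | no |
| fixed perceiver | CT1OS1 | 00:18 | dummy | 1 | moving | no | no |
| fixed perceiver | CT2OS3 | 00:21 | dummy | 1 (changing appearance) | moving | no | no |
| fixed perceiver | CT3OS1 | 00:19 | dummy | 2 (one at a time) | moving | no | no |
| fixed perceiver | NTOS2 | 00:33 | dummy | 1 | moving | yes - L | no - M/N/C |
| fixed perceiver | TTMS3 | 00:23 | dummy | 3 to 4 | moving | yes | yes |
| fixed perceiver | CTMS3 | 00:25 | dummy | 1 to 3 | moving | yes | yes |
| fixed perceiver | DCMS3 | 00:48 | dummy | 2 to 4 | moving | yes | yes |
| fixed perceiver | NTMS2 | 00:26 | dummy | 2 | moving | yes - L | no - M/N/C |
| fixed perceiver | CPP1 | 02:40 | dummy | several | seated | yes | yes |
| fixed perceiver | M1 | 03:47 | dummy | 5 | seated | yes (2 not seen) | yes |
| panning perceiver | VHS1 | 00:34 | dummy | 1 | fixed | yes | no |
| panning perceiver | VHN1 | 00:32 | dummy | 1 | fixed | no | no |
| panning perceiver | ELMS3 | 00:43 | dummy | 1 | moving | yes | no - N |
| panning perceiver | ELSS1 | 00:41 | dummy | 2 | fixed | yes | yes |
| panning perceiver | ELSN1 | 00:53 | dummy | 2 | fixed | no | yes |
| moving perceiver | AH2 | 00:38 | human | 1 | moving | yes | no |
| moving perceiver | Ming2 | 01:44 | human | several | moving | yes | yes |
| moving perceiver | M3 | 03:54 | human | 5 | seated | yes | yes |
| moving perceiver | Circ1 | 01:00 | human | 5 | seated | yes | no |
| moving perceiver | AHN1 | 00:44 | human | 1 | moving | yes | no - N |
| moving perceiver | AHN4 | 00:56 | human | 1 | moving | no | no |
| moving perceiver | P1 | 01:17 | human | 5 | seated | yes | no |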

Visual occlusion means either (i) a speaker being occluded by another speaker or by a wall, or (ii) a speaker being outside the field of view while speaking. In the columns “auditory overlap” and “visual occlusion”, the tags mean [M]usic, [C]licks, white [N]oise and [L]ight changes.
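For readers who process the sequence list programmatically, the following is a minimal sketch of how a “visual occlusion” or “auditory overlap” entry such as "yes - L" or "no - M/N/C" could be split into a yes/no flag and its tags. The names `TAGS` and `parse_cell` are hypothetical and not part of the database; the tag meanings are copied from the legend above.

```python
import re

# Tag legend as given in the text above (hypothetical helper, not part of the database).
TAGS = {"M": "music", "C": "clicks", "N": "white noise", "L": "light changes"}


def parse_cell(cell: str) -> tuple[bool, list[str]]:
    """Split an entry such as 'yes - L' or 'no - M/N/C' into (flag, tag names)."""
    # Accept either a hyphen or an en dash between the flag and the tags.
    parts = re.split(r"\s+[-–]\s+", cell.strip(), maxsplit=1)
    # Drop a parenthesised remark such as '(2 not seen)' before reading the flag.
    flag_word = parts[0].split("(")[0].strip().lower()
    if flag_word not in ("yes", "no"):
        raise ValueError(f"unexpected entry: {cell!r}")
    tags = []
    if len(parts) == 2:
        # Tags are single letters separated by '/'.
        tags = [TAGS[t.strip()] for t in parts[1].split("/")]
    return flag_word == "yes", tags


if __name__ == "__main__":
    for entry in ["yes", "yes - L", "no - M/N/C", "yes (2 not seen)"]:
        print(entry, "->", parse_cell(entry))
```

An unknown tag letter raises a `KeyError`, which is deliberate in this sketch: it makes entries that do not match the legend visible rather than silently ignoring them.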

Fixed perceiver scenarios use the dummy head wearing the helmet, panning perceiver scenarios use the dummy head on a swivel chair, and moving perceiver scenarios are recorded with a human wearing the helmet and in-ear microphones.

Scenario schematics

Each scenario is illustrated by a schematic.

Part 0: Video calibration data

Several calibration sequences were acquired in order to obtain accurate calibration results. We provide the best two.

First calibration downloads:

Second calibration downloads:

Part 1: fixed perceiver

The aim of the fixed perceiver scenarios is to enable the evaluation of audio, video and AV tracking and clustering under various challenges, such as speakers walking in and out of the field of view, speakers walking behind a wall, speakers changing appearance, and multiple simultaneous sound sources. These are covered by the following scenarios.

1. Scenarios with one speaker

1.1. Tracking test - One speaker (TTOS)

TTOS Schematic

TTOS1

1.2. Clustering test - One speaker (CT1OS)

CT1OS Schematic

CT1OS1

1.3. Clustering test 2 - One speaker (CT2OS)

CT2OS Schematic

CT2OS3

1.4. Clustering test 3 - One speaker (CT3OS)

CT3OS Schematic

CT3OS1

1.5. Noise test - One speaker (NTOS)

NTOS Schematic

NTOS2

2. Scenarios with multiple speakers

2.1. Tracking test - Multiple speakers (TTMS)

TTMS Schematic

TTMS3

2.2. Clustering test - Multiple speakers (CTMS)

CTMS Schematic

CTMS3

2.3. Dynamic changes - Multiple speakers (DCMS)

DCMS Schematic

DCMS3

2.4. Noise test - Multiple speakers (NTMS)

NTMS Schematic

NTMS2

Part 2: varying head movement of the perceiver

The panning perceiver scenarios were recorded to obtain controlled cues from an actively moving head. They are all recorded using the dummy head and torso strapped onto a swivel chair. During the recordings, the chair is panned from side to side while the scenario is “acted” out.

3. Varying head - noise (VHN) and speech (VHS)

VHN-VHS Schematic

VHN1

VHS1

4. Ego location - stationary noise (ELSN) and stationary speech (ELSS)

ELSS-ELSN Schematic

ELSN1

ELSS1

5. Ego location - moving speech (ELMS)

ELMS Schematic

ELMS3

6. Active hearing (AH) - only speech, speech + noise, only noise

AH-AHN Schematic

AH2

AHN1

AHN4

Part 3: moving perceiver

The aim of the scenarios with a moving perceiver is to provide very challenging audio-visual situations of the kind that can appear in real-life environments.

7. Cocktail party - passive (CPP)

CPP Schematic

CPP1

8. Mingling (Ming)

Ming Schematic

Ming2

9. Meeting (M)

M3 Schematic
M3

M1 Schematic
M1

10. Panel (P) and Circle (Circ)

Circ Schematic
Circ1

P Schematic
P1
