Learning to parse pictures of people
Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, Volume 4, pages 700-714 - June 2002
Detecting people in images is a key problem for video indexing, browsing and
retrieval. The main difficulties are the large appearance variations caused
by action, clothing, illumination, viewpoint and scale. Our goal is to find
people in static video frames using learned models of both the appearance of
body parts (head, limbs, hands), and of the geometry of their assemblies. We
build on Forsyth & Fleck's general 'body plan' methodology and Felzenszwalb
& Huttenlocher's dynamic programming approach for efficiently assembling candidate
parts into 'pictorial structures'. However, we replace the rather simple part
detectors used in these works with dedicated detectors learned for each body
part using Support Vector Machines (SVMs) or Relevance Vector Machines (RVMs).
We are not aware of any previous work using SVMs to learn articulated body plans;
however, they have been used to detect both whole pedestrians and combinations
of rigidly positioned subimages (typically, upper body, arms, and legs) in street
scenes, under a wide range of illumination, pose and clothing variations. RVMs
are SVM-like classifiers that offer a well-founded probabilistic interpretation
and improved sparsity for reduced computation. We demonstrate their benefits
experimentally in a series of results showing great promise for learning detectors
in more general situations.
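
The assembly step mentioned above can be illustrated with a minimal sketch of min-cost dynamic programming over a tree of parts. This is not the authors' implementation: the function `best_assembly`, the part names, the candidate positions, the ideal offsets and the quadratic pairwise cost are all hypothetical choices made for illustration, and the unary costs merely stand in for negated detector scores such as SVM or RVM outputs.

```python
def best_assembly(candidates, unary, children, root, pair_cost):
    """Pick one candidate per part on a tree of parts, minimizing
    the sum of unary (detector) costs and parent-child geometry costs."""
    best = {}    # best[part][i]: min cost of the subtree at `part` using candidate i
    choice = {}  # choice[(part, i, child)]: index of the best child candidate

    def solve(part):
        # Solve all subtrees first (leaves to root).
        for child in children.get(part, []):
            solve(child)
        best[part] = []
        for i, cand in enumerate(candidates[part]):
            total = unary[part][i]
            for child in children.get(part, []):
                costs = [best[child][j] + pair_cost(part, cand, child, c)
                         for j, c in enumerate(candidates[child])]
                j_best = min(range(len(costs)), key=costs.__getitem__)
                choice[(part, i, child)] = j_best
                total += costs[j_best]
            best[part].append(total)

    solve(root)

    # Backtrack from the best root candidate to recover the full assignment.
    i_root = min(range(len(best[root])), key=best[root].__getitem__)
    assignment = {}
    stack = [(root, i_root)]
    while stack:
        part, i = stack.pop()
        assignment[part] = i
        for child in children.get(part, []):
            stack.append((child, choice[(part, i, child)]))
    return best[root][i_root], assignment


# Hypothetical toy data: candidate (x, y) positions per part.
candidates = {
    "torso": [(0.0, 0.0), (10.0, 0.0)],
    "head": [(0.0, -5.0), (10.0, -9.0)],
    "arm": [(3.0, 0.0), (20.0, 0.0)],
}
# Hypothetical unary costs (lower is better), e.g. negated detector scores.
unary = {"torso": [1.0, 0.5], "head": [0.2, 0.1], "arm": [0.3, 0.1]}
children = {"torso": ["head", "arm"]}
# Assumed ideal child position relative to its parent.
offsets = {("torso", "head"): (0.0, -5.0), ("torso", "arm"): (3.0, 0.0)}


def pair_cost(parent, p_pos, child, c_pos):
    # Squared deviation from the ideal parent-to-child offset.
    ox, oy = offsets[(parent, child)]
    dx = c_pos[0] - p_pos[0] - ox
    dy = c_pos[1] - p_pos[1] - oy
    return dx * dx + dy * dy


cost, assignment = best_assembly(candidates, unary, children, "torso", pair_cost)
print(cost, assignment)
```

Because the parts form a tree, each parent candidate only needs the best compatible candidate of each child, so the search costs on the order of the number of parent-child candidate pairs rather than growing exponentially with the number of parts.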
BibTeX reference
@InProceedings{RST02,
author = "Ronfard, Remi and Schmid, Cordelia and Triggs, Bill",
title = "Learning to parse pictures of people",
booktitle = "Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark",
volume = "4",
pages = "700--714",
month = "June",
year = "2002",
publisher = "Springer",
note = "Copenhagen",
url = "http://perception.inrialpes.fr/Publications/2002/RST02"
}