PERCEPTION

The scientific and technological challenges of the PERCEPTION project

From videos to 3D meshed and articulated objects

The long-term goal of our research is to understand, model, implement, and test the process by which images and videos are transformed into geometric and semantic representations. We refer to this process as vision, and we are interested in developing computational models of it. Computer vision systems are harder to build than traditional artificial systems: the latter generally receive input that has already been formatted as a set of abstract data, whereas the former must interact with the physical world, and formalizing these interactions turns out to be a very difficult task.

Recent advances in the study of vision from biological, cognitive, and computational points of view have clearly shown that the processing, interpretation, and representation of sensory inputs occur at various levels of abstraction. Bottom-up and top-down processes interact in a complex way. These findings contradict the assumption that sensory processes can be studied independently and then treated as black boxes to be plugged into a cognitive system. Vision systems therefore belong to the more general class of complex systems that should be able to continuously acquire sensory data, transform the data into abstract formats and descriptors, learn from these data, use prior knowledge to purposively select meaningful data, build and update long-term knowledge, take decisions, supervise sensory-motor loops, etc.

Based on the visionary paradigm developed by David Marr in the late 1970s, a large number of theories, models, prototypes, and industrial/commercial systems have been developed in the recent past. One of the most successful research efforts has been the understanding of single- and multiple-camera geometry. Theories based on projective geometry and their associated computational tools allow, among other things, the 3-D reconstruction of a rigid scene from several 2-D projections (images). The strength of this approach is that algebraic projective geometry is well founded and well understood from a mathematical point of view. Its weakness is twofold. First, it relies heavily on sparse image points. Not only must points be matched across images, which is an ill-defined mathematical problem, but once their 3-D coordinates are recovered, one still faces the problem of inferring parameterized descriptions such as surfaces with their local properties (orientation, curvature, discontinuities, etc.), articulated and deformable objects (people, animals, and their body parts), and so forth. Second, it is a feedforward (or bottom-up) approach that attempts to map light stimuli (image data) onto 3-D geometric entities with as little prior knowledge as possible.
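As an illustration of the kind of computational tool that multiple-camera geometry provides, the sketch below recovers one 3-D point from its projections in two views by linear (DLT) triangulation. It is a minimal example under stated assumptions, not the project's own method: the intrinsic matrix K, the two camera matrices P1 and P2, and the helper names triangulate_point and project are all hypothetical, and the points are assumed to be already matched across images and free of noise.

```python
import numpy as np


def triangulate_point(projections, pixels):
    """Linear (DLT) triangulation of one 3-D point from two or more views.

    projections: list of 3x4 camera projection matrices.
    pixels: list of (u, v) image coordinates of the same point, one per view.
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        # Each view contributes two linear constraints on the homogeneous point X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, X):
    """Project a 3-D point X (length 3) with the 3x4 camera matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Hypothetical setup: two identical cameras separated by a 0.5-unit baseline.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

X_true = np.array([0.2, -0.1, 4.0])
pixels = [project(P1, X_true), project(P2, X_true)]
X_est = triangulate_point([P1, P2], pixels)
```

The example only works because the two pixels are known to correspond to the same scene point; establishing that correspondence is precisely the matching problem identified above as a weakness of the approach.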

Research objectives

The objective of PERCEPTION is to put forward a model of visual perception as a complex attentional mechanism that embodies a decision-making process. The task of this process is to find a trade-off between the reliability of the sensory stimuli and the plausibility of prior knowledge.
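One common way to formalize such a trade-off, offered here only as an illustration and not as the project's stated model, is maximum a posteriori estimation. The symbols are assumptions introduced for this sketch: y denotes the observed image data and x the scene description being sought.

```latex
\hat{x}
  = \arg\max_{x} \, p(x \mid y)
  = \arg\max_{x} \,
    \underbrace{p(y \mid x)}_{\text{reliability of the stimuli}}
    \;
    \underbrace{p(x)}_{\text{plausibility of prior knowledge}}
```

When the data are reliable the likelihood term dominates the estimate; when they are noisy or ambiguous the prior term carries more weight.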

For example, such a vision system should be able to identify, localize, and track over time complex objects, i.e., objects with several articulated parts, deformable objects, objects with complex shapes and aspects, etc. Moreover, we want to model, learn, and recognize their motions, actions, and gestures. Eventually we will build computer descriptions of these physical objects and their behaviors in the form of high-level abstract and symbolic representations.

We believe that in the near future it will be possible to address outstanding applications in which the frontier between the physical and the virtual, between the real and the simulated, or between natural and artificial reasoning becomes fuzzier: automated understanding of behaviors (of humans, natural phenomena, etc.), realistic simulations and animations of these objects, augmented reality (integration of computation within physical objects), interactions between real and virtual objects and virtual worlds, computer-monitored human activities, and so forth.

Another major challenge is to build computational models that are in agreement with biological ones. The fact that the latter are made of neurons has often been ignored by computer practitioners. Until recently there was no answer to the question of biological plausibility. Equally important is the issue of computational neuroscience: how can one implement, in hardware and software, the way the brain functions? Although a great deal of knowledge about the brain’s anatomy, physiology, and neural mechanisms has been accumulated, this knowledge is not nearly enough to determine analytic equations that describe large systems of neurons. Even if precise knowledge of neural dynamics were available, it would yield only a partial understanding of how the brain functions. Therefore, direct approaches based on studying the brain must be augmented with information processing approaches that attempt to model perception (and intelligence in general) at a more abstract level. Indeed, the structure of today’s computers and their programming languages are very far removed from the brain’s architecture.

The PERCEPTION project will contribute to establishing a unifying theory of visual perception that is both quantifiable and verifiable with today’s technologies (mathematical formulations, computer algorithms and software, thorough experimental validation), that can provide a theory for human perception, and that can be used to address new and challenging applications.

Approach and methodology

The rationale of our approach will be to (i) build on our current know-how, (ii) acquire new expertise, and (iii) collaborate with researchers and engineers recognized for their expertise in computer vision as well as expertise in other disciplines:

-  Our team is internationally recognized for its expertise in computer vision: algebraic projective geometry, computational geometry, differential geometry, 2-D and 3-D shape modelling from multiple images and videos, image understanding, optimisation theory, robust statistics for vision, algorithms and software prototypes, etc. We will build on this existing know-how and on our recent research results in order to launch new developments;

-  We will develop and acquire new expertise necessary for achieving our objectives: machine learning, statistical decision theory, variational and PDE methods, computer animation, image-based rendering, distributed computing, cognitive modelling, computational neuroscience, etc.;

-  We will build prototypes and experimental platforms through a tight coupling between our scientific findings and technological developments, and

-  We will collaborate with other researchers, engineers, and teams: from our own discipline, from other disciplines, and from companies interested in our developments. As has always been the case, the collaborations will be formalized through contracts and research grants at French, European, and international levels. In particular we will continue existing collaborations and establish new ones with the following disciplines:

  • Machine learning and statistical decision theory;
  • PDE and variational methods;
  • Computational auditory scene analysis, and
  • Computational cognitive neuroscience and neurophysiology.

Summary of scientific and technological objectives

Snapshot vision has had its day. We believe that the interactions between sensory-level (bottom-up) processes and cognitive-level (top-down) processes lie at the core of future vision systems and distinguish them from the classical paradigm.

In particular we will develop methods relying on either one or multiple image sequences linked to a powerful PC platform. Such a configuration should be able to gather calibrated, uncompressed, and synchronized video sequences, to keep pace with technological progress toward higher image resolutions and frame rates, and to run computer vision algorithms at high speeds, possibly in real time. We will investigate theories and develop methods for recovering the shape of 3-D objects with articulated and/or deformable parts. We will use and combine several visual cues such as 2-D silhouettes, depth data, color, shading, and texture. We will develop methods and software for recovering the motion parameters of complex objects, for fitting prior models (bio-mechanical motion models, physical models for shape deformations, etc.) to spatio-temporal visual data, and for simulating, animating, and rendering objects in motion.

In the longer term we plan to bridge the gap between computational, cognitive, and neurobiological approaches to visual perception. We will investigate the link between visual representations of the real world and symbolic representations of meaning. We will investigate the interactions between bottom-up and top-down processes. We will address the problems of learning abstract concepts from visual data and of recognizing human actions and gestures.