Graphics, Vision & Video

Lightweight Binocular Facial Performance Capture under Uncontrolled Lighting

SIGGRAPH Asia 2012

Levi Valgaerts¹   Chenglei Wu¹,²   Andrés Bruhn³   Hans-Peter Seidel¹   Christian Theobalt¹
¹ MPI for Informatics   ² Intel Visual Computing Institute   ³ University of Stuttgart


Recent progress in passive facial performance capture has shown impressively detailed results on highly articulated motion. However, most methods rely on complex multi-camera set-ups, controlled lighting or fiducial markers. This prevents them from being used in general environments, in outdoor scenes, during live action on a film set, or by freelance animators and everyday users who want to capture their digital selves. In this paper, we therefore propose a lightweight passive facial performance capture approach that is able to reconstruct high-quality dynamic facial geometry from only a single pair of stereo cameras. Our method succeeds under uncontrolled and time-varying lighting, and also in outdoor scenes. Our approach builds upon and extends recent image-based scene flow computation, lighting estimation and shading-based refinement algorithms. It integrates them into a pipeline that is specifically tailored towards facial performance reconstruction from challenging binocular footage under uncontrolled lighting. Our experimental evaluation demonstrates the strong capabilities of our method: we achieve detailed and spatio-temporally coherent results for expressive facial motion in both indoor and outdoor scenes, even from low-quality input images recorded with a hand-held consumer stereo camera. We believe that our approach is the first to capture facial performances of such high quality from a single stereo rig, and we demonstrate that it brings facial performance capture out of the studio, into the wild, and within the reach of everybody.
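The abstract names lighting estimation as one stage of the pipeline but does not spell out the lighting model. Purely as an illustration, the following minimal sketch assumes a common choice for this kind of step: Lambertian reflectance with known surface normals and albedo, and distant lighting represented by nine second-order real spherical-harmonics (SH) coefficients, fitted by linear least squares. The function names (`sh_basis`, `solve`, `estimate_lighting`, `random_unit_normal`) and the synthetic data are hypothetical; this is not taken from the paper, whose actual formulation may differ.

```python
import math
import random

# Hypothetical sketch (not the paper's code): Lambertian shading under distant
# lighting modelled by nine real spherical-harmonics coefficients (bands 0-2).


def sh_basis(n):
    """Return the 9 real SH basis values (bands 0-2) for a unit normal n = (x, y, z)."""
    x, y, z = n
    return [
        0.282095,                        # Y_0,0
        0.488603 * y,                    # Y_1,-1
        0.488603 * z,                    # Y_1,0
        0.488603 * x,                    # Y_1,1
        1.092548 * x * y,                # Y_2,-2
        1.092548 * y * z,                # Y_2,-1
        0.315392 * (3.0 * z * z - 1.0),  # Y_2,0
        1.092548 * x * z,                # Y_2,1
        0.546274 * (x * x - y * y),      # Y_2,2
    ]


def solve(a, b):
    """Solve the square linear system a x = b (Gauss-Jordan, partial pivoting)."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        # Move the row with the largest pivot magnitude into position.
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        # Eliminate this column from every other row.
        for r in range(m):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, m + 1):
                    aug[r][c] -= f * aug[col][c]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def estimate_lighting(normals, intensities, albedo=1.0):
    """Least-squares fit of SH coefficients l with I(n) ~= albedo * sum_k l_k * Y_k(n)."""
    k = 9
    ata = [[0.0] * k for _ in range(k)]
    atb = [0.0] * k
    for n, intensity in zip(normals, intensities):
        row = [albedo * v for v in sh_basis(n)]
        # Accumulate the normal equations A^T A l = A^T I one sample at a time.
        for r in range(k):
            atb[r] += row[r] * intensity
            for c in range(k):
                ata[r][c] += row[r] * row[c]
    return solve(ata, atb)


def random_unit_normal(rng):
    """Uniformly distributed unit vector via rejection sampling in the unit ball."""
    while True:
        v = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        s = math.sqrt(sum(c * c for c in v))
        if 1e-6 < s <= 1.0:
            return [c / s for c in v]


# Usage on synthetic data: render shading from known coefficients, then recover them.
rng = random.Random(0)
true_l = [0.8, 0.1, 0.3, -0.2, 0.05, 0.0, 0.1, -0.05, 0.02]
normals = [random_unit_normal(rng) for _ in range(500)]
shading = [sum(l * y for l, y in zip(true_l, sh_basis(n))) for n in normals]
est = estimate_lighting(normals, shading)
print([round(v, 3) for v in est])
```

Because this shading model is linear in the nine coefficients, the fit reduces to a 9×9 normal-equation solve; with noise-free synthetic shading and normals spread over the sphere it recovers the generating coefficients up to rounding error. Real footage adds noise, cast shadows and non-Lambertian effects that this sketch ignores.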

pdf (4.7M) / (65.2M)
Supplementary Material
pdf (38.3M)
avi (126.9M)
pptx (123.8M)


Supplementary video to the paper
Additional video showing a result for 560 frames (around 22 seconds)

Bibtex

author = {Levi Valgaerts and Chenglei Wu and Andrés Bruhn and Hans-Peter Seidel and Christian Theobalt},
title = {Lightweight Binocular Facial Performance Capture under Uncontrolled Lighting},
booktitle = {ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia 2012)},
volume = {31},
number = {6},
pages = {187:1--187:11},
month = {November},
year = {2012},
url = {},
doi = {10.1145/2366145.2366206}

Data Sets

We make the data shown in the following video available on request.

It consists of 200 frames of a stereo sequence captured at 25 fps (8 seconds) and the corresponding textured, spatio-temporally coherent 3D reconstructions. As mentioned in the paper, the left and right video sequences are synchronised using a shared event, which makes the synchronisation sub-frame accurate at best. The face meshes have a resolution of 500K vertices, five times higher than the results shown in the paper and the supplementary video. They were obtained with the algorithm described in the paper; the higher resolution only adds captured detail.
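As a quick consistency check of the figures above: clip duration is frame count divided by frame rate, and the resolution of the meshes in the paper follows from the stated factor of five. The 560-frame line assumes the additional video was also recorded at 25 fps, which this page does not state.

```latex
t = \frac{N_{\text{frames}}}{f}: \qquad
\frac{200}{25\,\text{fps}} = 8\,\text{s}, \qquad
\frac{560}{25\,\text{fps}} = 22.4\,\text{s}; \qquad
V_{\text{paper}} \approx \frac{500\text{K}}{5} = 100\text{K vertices}
```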

Terms of use: The provided data is intended for research purposes only, and any use of it for non-scientific purposes is not allowed. This includes publishing any scientific results obtained with our data in non-scientific literature, such as the tabloid press. We ask researchers to respect our actors and not to use the data for any distasteful manipulations (such as hideous deformations, exploding heads, or manipulations that might be culturally sensitive). We also ask researchers not to disseminate this data outside of their institute; distribution within the affiliated institution is allowed.

Requesting the data: Please understand that we can only make the data available to senior project managers or senior researchers. To keep track of the researchers and institutions requesting the data, and to ascertain that you abide by the above terms of use, we release the data only after you have sent an email to stating the following:
  1. Your name, title and institution.
  2. Your intended use of the data.
  3. A statement saying that you accept the following terms:

    The rights to use, copy and distribute the 3D reconstructions and image sequences provided on this website are under the supervision of Prof. Christian Theobalt of the Graphics, Vision & Video group at the Max-Planck-Institute for Informatics, Saarbrücken. You are given permission to copy this data in electronic form and to distribute it within your institute for scientific purposes only. Inclusion of rendered results obtained from this data in a scholarly publication (printed or electronic) is permitted. In this case, the following sentence must be added to the acknowledgements section of your paper: "The captured performance data were provided courtesy of the research group Graphics, Vision & Video of the Max-Planck-Institute for Informatics." In addition, the following paper must be cited: Lightweight Binocular Facial Performance Capture under Uncontrolled Lighting. For any usage other than your intended scientific research, written permission is required from Christian Theobalt. Any commercial use is hereby excluded.
More data sets will be made available in the near future.