We present the first method for real-time full-body capture that estimates the shape and motion of body and hands together with a dynamic 3D face model from a single color image. Our approach uses a new neural network architecture that exploits correlations between body and hands at high computational efficiency. Unlike previous works, our approach is jointly trained on multiple datasets that focus on hand, body, or face separately. It does not require data in which all parts are annotated at the same time, which is much more difficult to create with sufficient variety. This multi-dataset training enables superior generalization. In contrast to earlier monocular full-body methods, our approach captures more expressive 3D face geometry and color by estimating the shape, expression, albedo, and illumination parameters of a statistical face model. Our method achieves competitive accuracy on public benchmarks while being significantly faster and providing more complete face reconstructions.
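
One common way to train jointly on datasets that each annotate only some parts is to mask the loss per sample, so that a part contributes to the gradient only where ground truth exists. The snippet below is a minimal sketch of that idea, not the authors' implementation: the function name `masked_part_loss`, the dictionary layout, and the use of a mean squared error are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical part names; the abstract mentions body, hands, and face.
PARTS = ("body", "hand", "face")


def masked_part_loss(pred, target, available):
    """Sum of per-part mean squared errors over annotated samples only.

    pred, target: dict mapping part name -> float array of shape (N, D_part)
    available:    dict mapping part name -> bool array of shape (N,),
                  True where the sample's source dataset annotates that part
    """
    total = 0.0
    for part in PARTS:
        mask = available[part].astype(bool)
        n_labeled = mask.sum()
        if n_labeled == 0:
            continue  # no sample in this batch annotates this part
        # Per-sample squared error, averaged over the parameter dimension.
        per_sample = ((pred[part] - target[part]) ** 2).mean(axis=1)
        # Zero out unannotated samples (their targets may be placeholders or NaN).
        per_sample = np.where(mask, per_sample, 0.0)
        total += per_sample.sum() / n_labeled
    return total
```

In a mixed batch, a sample drawn from a hand-only dataset would set `available["hand"]` to `True` and the other two entries to `False`, so only the hand term is supervised for that sample.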


  • Main Paper

  • Supplementary Document


BibTeX, 1 KB

author = {Zhou, Yuxiao and Habermann, Marc and Habibie, Ikhsanul and Tewari, Ayush and Theobalt, Christian and Xu, Feng},
title = {Monocular Real-time Full Body Capture with Inter-part Correlations},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2021}


This work was supported by the National Key R&D Program of China 2018YFA0704000, the NSFC (No. 61822111, 61727808), Beijing Natural Science Foundation (JQ19015), and the ERC Consolidator Grant 4DRepLy (770784).


For questions or clarifications, please get in touch with:
Yuxiao Zhou
