


  • Joint Estimation of 3D Hand Position and Gestures from Monocular Video for Mobile Interaction   Project Page
    J. Song, F. Pece, G. Sörös, M. Koelle, O. Hilliges
    ACM Human Factors in Computing Systems (CHI 2015)
    Seoul, South Korea, Apr. 2015.
    Abstract We present a machine learning technique to recognize gestures and estimate metric depth of hands for 3D interaction, relying only on monocular RGB video input. We aim to enable spatial interaction with small, body-worn devices where rich 3D input is desired but the usage of conventional depth sensors is prohibitive due to their power consumption and size. We propose a hybrid classification-regression approach to learn and predict a mapping of RGB colors to absolute, metric depth in real time. We also classify distinct hand gestures, allowing for a variety of 3D interactions. We demonstrate our technique with three mobile interaction scenarios and evaluate the method quantitatively and qualitatively.

  • 2014

  • In-air Gestures Around Unmodified Mobile Devices   Project Page
    J. Song, G. Sörös, F. Pece, S. Fanello, S. Izadi, C. Keskin, O. Hilliges
    Symposium on User Interface Software and Technology (UIST 2014)
    Honolulu, Hawaii, Oct. 6-8, 2014.
    Abstract We present a novel machine learning based algorithm extending the interaction space around mobile devices. The technique uses only the RGB camera now commonplace on off-the-shelf mobile devices. Our algorithm robustly recognizes a wide range of in-air gestures, supporting user variation, and varying lighting conditions. We demonstrate that our algorithm runs in real-time on unmodified mobile devices, including resource-constrained smartphones and smartwatches. Our goal is not to replace the touchscreen as primary input device, but rather to augment and enrich the existing interaction vocabulary using gestures. While touch input works well for many scenarios, we demonstrate numerous interaction tasks such as mode switches, application and task management, menu selection and certain types of navigation, where such input can be either complemented or better served by in-air gestures. This removes screen real-estate issues on small touchscreens, and allows input to be expanded to the 3D space around the device. We present results for recognition accuracy (93% test and 98% train), impact of memory footprint and other model parameters. Finally, we report results from preliminary user evaluations, discuss advantages and limitations and conclude with directions for future work.

  • Device Effect on Panoramic Video+Context Tasks
    F. Pece, J. Tompkin, H.P. Pfister, J. Kautz, C. Theobalt
    Conference on Visual Media Production (CVMP 2014)
    London, UK, Nov. 13-14, 2014.
    Abstract Panoramic imagery is viewed daily by thousands of people, and panoramic video imagery is becoming more common. This imagery is viewed on many different devices with different properties, and the effect of these differences on spatio-temporal task performance has yet to be tested on such imagery. We adapt a novel panoramic video interface and conduct a user study to discover whether display type affects spatio-temporal reasoning task performance across desktop monitor, tablet, and head-mounted displays. We discover that, in our complex reasoning task, HMDs are as effective as desktop displays even if participants felt less capable, but tablets were less effective than desktop displays even though participants felt just as capable. Our results impact virtual tourism, telepresence, and surveillance applications, and so we state the design implications of our results for panoramic imagery systems.

  • 2013






    Patents and Others