I'm a senior research engineer at Snap Research NYC. Before that, I was a Research Staff Member at the IBM T.J. Watson Research Center in Yorktown Heights, NY. My research interests include mobile sensing and computing, human-computer interaction, visual recognition, augmented reality, and location-based services. I obtained my PhD from the ECE Department at Stony Brook University in 2019. I received my Bachelor's degree in Applied Physics from the University of Science and Technology of China (USTC) in 2011 and my Master's degree from the Chinese Academy of Sciences in 2014.
Fine-grained visual recognition for augmented reality enables dynamic presentation of the right set of visual instructions in the right context by analyzing the hardware state as the repair procedure evolves. (This work was published at IEEE ISMAR'20 and accepted to the IEEE TVCG special issue, 18 out of 302 submissions.)
While existing visual recognition approaches, which rely on 2D images to train their underlying models, work well for object classification, recognizing the changing state of a 3D object requires addressing several additional challenges. This paper proposes an active visual recognition approach to this problem, leveraging camera pose data available on mobile devices. With this approach, the state of a 3D object, which captures its appearance changes, can be recognized in real time. Our novel approach selects informative video frames filtered by 6-DOF camera poses to train a deep learning model to recognize object state. We validate our approach through a prototype for Augmented Reality-assisted hardware maintenance.
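The pose-filtering idea can be sketched as a greedy selection: keep a video frame for training only if its 6-DOF camera pose is sufficiently different from the poses of frames already kept, so the training set covers diverse viewpoints. The sketch below is illustrative only, assuming a simple position/view-direction pose representation and hypothetical thresholds; it is not the paper's actual criterion.

```python
import math

def select_informative_frames(frames, min_angle_deg=15.0, min_dist_m=0.10):
    """Greedily keep frames whose camera pose differs enough from every
    frame already kept. Each frame is a dict with a 'position' (x, y, z)
    in meters and a unit 'view_dir' vector. Thresholds are assumptions."""
    kept = []
    for f in frames:
        novel = True
        for k in kept:
            dist = math.dist(f["position"], k["position"])
            cos_a = sum(a * b for a, b in zip(f["view_dir"], k["view_dir"]))
            angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
            # discard a frame only if it is close in BOTH position and viewing angle
            if dist < min_dist_m and angle < min_angle_deg:
                novel = False
                break
        if novel:
            kept.append(f)
    return kept
```

The kept frames would then be labeled with the current object state and used to train the recognition model; near-duplicate viewpoints are filtered out before any expensive image processing runs.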
Acknowledgement: This work was done during my internship at IBM Research.
We propose a novel user authentication system EchoPrint, which leverages acoustics and vision for secure and convenient user authentication, without requiring any special hardware. EchoPrint actively emits almost inaudible acoustic signals from the earpiece speaker to “illuminate” the user's face and authenticates the user by the unique features extracted from the echoes bouncing off the 3D facial contour. Because the echo features depend on 3D facial geometries, EchoPrint is not easily spoofed by images or videos like 2D visual face recognition systems. It needs only commodity hardware, thus avoiding the extra costs of special sensors in solutions like FaceID.
EZ-Find provides a comprehensive solution for fast object finding and indoor navigation. The enabling techniques are computer vision, augmented reality, and mobile computing. The fast object-finding feature enables instant object identification in clutter (e.g., a book or medicine bottle on a shelf). Indoor navigation is essential for indoor location-based services and provides great convenience to people, especially in large-scale public places such as airports and train stations.
We propose BatTracker, which incorporates inertial and acoustic data for robust, high-precision, and infrastructure-free tracking in indoor environments. BatTracker leverages echoes from nearby objects and uses distance measurements from them to correct error accumulation in inertial-based device position prediction. It incorporates Doppler shifts and echo amplitudes to reliably identify the association between echoes and objects despite noisy signals from multi-path reflection and cluttered environments. A probabilistic algorithm creates, prunes, and evolves multiple hypotheses based on measurement evidence to accommodate uncertainty in device position. Experiments in real environments show that BatTracker can track a mobile device's movements in 3D space at sub-centimeter accuracy, comparable to state-of-the-art infrastructure-based approaches, while eliminating the need for any additional hardware.
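One update step of such a multi-hypothesis scheme can be sketched as: reweight each position hypothesis by how well it explains the new measurement, drop improbable hypotheses, and bound the set size. This is a simplified sketch under assumed data structures; the function name, thresholds, and weighting scheme are illustrative, not BatTracker's actual algorithm.

```python
def evolve_hypotheses(hyps, likelihood, prune_frac=0.1, max_hyps=50):
    """One measurement update for a list of hypotheses, each a dict with a
    'state' (candidate device position) and a 'weight'. `likelihood(state)`
    scores how well a state explains the latest echo measurement."""
    # reweight each hypothesis by the measurement likelihood
    for h in hyps:
        h["weight"] *= likelihood(h["state"])
    total = sum(h["weight"] for h in hyps) or 1.0
    for h in hyps:
        h["weight"] /= total
    # prune hypotheses far less probable than the current best
    best = max(h["weight"] for h in hyps)
    hyps = [h for h in hyps if h["weight"] >= prune_frac * best]
    # keep only the top-K hypotheses to bound computation
    hyps.sort(key=lambda h: h["weight"], reverse=True)
    return hyps[:max_hyps]
```

New hypotheses would be spawned between updates from inertial motion prediction, so the set continually branches on ambiguous echo-object associations and collapses as evidence accumulates.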
In this project, we propose BatMapper, which explores a previously untapped sensing modality - acoustics - for fast, fine-grained, and low-cost floor plan construction. We design sound signals suitable for heterogeneous microphones on commodity smartphones, and acoustic signal processing techniques that produce accurate distance measurements to nearby objects. We further develop robust probabilistic echo-object association, recursive outlier removal, and probabilistic resampling algorithms to identify the correspondence between distances and objects, and thus the geometry of corridors and rooms. We compensate for minute hand-sway movements to identify small surface recessions, thus detecting doors automatically.
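The underlying distance measurement is round-trip acoustic time of flight: the delay between the emitted signal and a detected echo, halved and scaled by the speed of sound. A minimal sketch, assuming the delay is already measured in audio samples (the echo-detection front end itself is far more involved) and a fixed room-temperature speed of sound:

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value at ~20 °C

def echo_distance(emit_idx, echo_idx, sample_rate=48000):
    """Convert the sample delay between the emitted chirp and a detected
    echo into a one-way distance in meters (round-trip time halved)."""
    delay_s = (echo_idx - emit_idx) / sample_rate
    return SPEED_OF_SOUND * delay_s / 2.0

# e.g., an echo arriving 280 samples after emission at 48 kHz
# corresponds to a reflector roughly 1 m away
```

At 48 kHz sampling, one sample of delay corresponds to about 3.6 mm of one-way distance, which is why commodity smartphone audio can support fine-grained geometry measurements at all.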
The lack of floor plans is a fundamental obstacle to ubiquitous indoor location-based services. Recent work has made significant progress on accuracy, but it largely relies on slow crowdsensing that may take weeks or even months to collect enough data. In this paper, we propose Knitter, which can generate accurate floor maps from one hour of data collection by a single random user, and we demonstrate how such maps can be used for indoor navigation. Knitter extracts high-quality floor layout information from single images, calibrates user trajectories, and filters outliers. It uses a multi-hypothesis map fusion framework that updates landmark positions/orientations and accessible areas incrementally according to evidence from each measurement.
[ Best Paper Award ][PDF]
[ Best Student Paper ][PDF]
[ Accepted to IEEE TVCG special issue, 18 out of 302, Acceptance rate 6%. ][PDF]