Context-aware 3-D Mean-shift with Occlusion Handling for Robust Object Tracking in RGB-D Videos

Depth cameras have recently become popular, and many vision problems can be solved better with depth information. However, how to integrate depth information into a visual tracker to overcome challenges such as occlusion and background distraction remains under-investigated in the visual tracking literature. In this paper, we investigate a 3D extension of the classical mean-shift tracker, whose greedy gradient-ascent strategy is generally considered unreliable in conventional 2D tracking. Through a careful study of the physical properties of 3D point clouds, however, we show that objects that may appear adjacent in the 2D image form distinct modes in the 3D probability distribution approximated by kernel density estimation, so finding the nearest mode with 3D mean-shift works reliably for tracking. Building on this understanding of 3D mean-shift, we propose two mechanisms to further boost the tracker's robustness: the first makes the tracker aware of potential distractions so that it can adjust its appearance model accordingly, and the second enables the tracker to detect and recover from tracking failures caused by total occlusion. The proposed method is both effective and computationally efficient: on a conventional PC, it runs at more than 60 FPS without GPU acceleration.
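The central claim above is that objects adjacent in the image separate into distinct modes of a 3D kernel density estimate, so greedy mean-shift started from the previous position climbs to the target's own mode. The Python sketch below illustrates that idea on synthetic data; it is not the authors' implementation. It assumes a Gaussian kernel with a single fixed bandwidth, coordinates in metres, and uniform per-point weights as a stand-in for the paper's appearance model. The helper names (`mean_shift_3d`, `grid_cluster`) and all numbers are hypothetical, and the context-aware model adjustment and occlusion recovery are not reproduced.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def mean_shift_3d(points: Sequence[Point], weights: Sequence[float], start: Point,
                  bandwidth: float, max_iter: int = 50, tol: float = 1e-4) -> Point:
    """Climb the weighted Gaussian KDE over `points` from `start` to its nearest mode."""
    center = start
    for _ in range(max_iter):
        num = [0.0, 0.0, 0.0]
        den = 0.0
        for p, w in zip(points, weights):
            d2 = sum((a - b) ** 2 for a, b in zip(p, center))
            k = w * math.exp(-d2 / (2.0 * bandwidth ** 2))  # Gaussian kernel weight
            den += k
            for i in range(3):
                num[i] += k * p[i]
        if den == 0.0:
            break  # no support under the kernel; a real tracker needs failure handling here
        # Mean-shift update: move to the kernel-weighted mean of the points.
        new_center = (num[0] / den, num[1] / den, num[2] / den)
        shift = math.dist(new_center, center)
        center = new_center
        if shift < tol:
            break
    return center


def grid_cluster(cx: float, cy: float, cz: float, n: int = 5, step: float = 0.02) -> List[Point]:
    """Hypothetical synthetic object: an n x n patch of points at constant depth cz."""
    half = n // 2
    return [(cx + (i - half) * step, cy + (j - half) * step, cz)
            for i in range(n) for j in range(n)]


# Hypothetical scene: target at 1.0 m, distractor 6 cm to the side but 0.8 m farther away.
target = grid_cluster(0.0, 0.0, 1.0)
distractor = grid_cluster(0.06, 0.0, 1.8)
cloud = target + distractor
weights = [1.0] * len(cloud)  # placeholder for appearance-model likelihoods
start = (0.03, 0.01, 1.1)     # previous-frame estimate

track_3d = mean_shift_3d(cloud, weights, start, bandwidth=0.1)

# Same data with depth discarded (z = 0): the two objects merge into one 2D mode.
flat = [(x, y, 0.0) for x, y, _ in cloud]
track_2d = mean_shift_3d(flat, weights, (start[0], start[1], 0.0), bandwidth=0.1)

print("3D:", tuple(round(c, 3) for c in track_3d))
print("2D:", tuple(round(c, 3) for c in track_2d))
```

With depth, the estimate settles near the target centre (0, 0, 1.0) because the distractor's points lie far outside the kernel along z. With depth discarded, the two overlapping patches form a single mode and the estimate drifts to roughly x = 0.03, halfway between the objects, which is the kind of failure the abstract attributes to greedy 2D mean-shift.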

