Current Research

We are in the midst of a data  revolution,  where  visual  content  has  a  protagonist  role. For instance, YouTube reports that over 100 hours of video are uploaded every minute to their servers. Uploaded content ranges from a view of astronauts walking in space to the first steps of a baby at home. Our aim is to develop novel algorithms to automatically understand and recognize human activities from this huge visual space. We hope that our research brings the possibility to develop novel applications such as: video surveillance systems capable of detecting suspicious activities, automated household assistants, monitoring performance and understanding strategy in sports, and indexing content in web services.​
Object tracking is to associate target objects in consecutive video frames. Its real-world applications range from video surveillance, autonomous vehicles, intelligent traffic control, human computer interaction, etc. However, visual tracking is challenging due to significant object appearance variations caused by illumination change, occlusion, sensory noise, fast/abrupt object motion, and also cluttered background. A "good" tracker should be designed with these issues in mind, so it can be applicable in real-world scenarios. ​​​
3D Understanding extends the ability of computer vision systems to understand an image from the image plane to the 3D world. It has become more prominent with off-the-shelf 3D sensors, which have been recently used to build large-scale RGB-D datasets. 3D Understanding addresses problems related to 3D reconstruction, scene understanding (ex: layout prediction), object detection, pose estimation, and others. Applications include but are not limited to automatic inspection, robot navigation, human-machine interaction, and object modeling.​
Many applications in computer vision and machine learning can be directly formulated as a nonlinear optimization problem, which may be inherently non-convex or/and NP-hard discrete (e.g. l0 norm sparse optimization, multiple label MRF learning, representation learning, image segmentation, clustering and classification). This becomes computationally very challenging in algorithm design. In our research, we consider and design efficient and effective convex relaxations (l1  norm relaxation, linear box relaxation, SOCP relaxation, SDP relaxation) or multiple-stage convex optimization algorithms (Dinkelbach algorithm, MPEC optimization, DC programming, greedy basis pursuit) to tackle these problems.
An artificial vision system should be able to understand the content of an image. From a lower level, image content includes different types of textures, shapes/forms, and illumination in the scene, while at a higher level objects, scene type (outdoor, natural, etc.) and their relations need to be considered. Beyond human level understanding of visual data, artificial vision systems should be able to process and synthesize visual information in order to improve the productivity of artists, architects, engineers, and humans in general. This includes many important tasks including deblurring/denoising pictures, assisting in the completion or enhancement of photos, and keeping track of building construction progress to name  a few.​
​The rapid growth of image and video data has recently converged computer graphics research towards computer vision. More realistic graphical models of static and dynamic scenes are reconstructed from real image and video data using computer vision tools and algorithms. ​
​Exploring novel means of involving human judgment in computer vision and image processing methodology. This includes studying perceptual elements of image quality, saliency, and memorability and linking them to low-level image features, which can be used to develop more effective and perceptually-relevant recognition and compression techniques​.​​​
​A dynamic texture (DT) is the temporal extension of 2D texture. Even though the overall global motion of a DT may be perceived by humans as simple and coherent, the underlying local motion is complex and stochastic. DT models are developed and applied to DT synthesis, recognition, and compression.​
​Video Processing​​