Query Localization in Long-form Videos Mengmeng Xu (Frost), Ph.D. Student, Electrical and Computer Engineering May 4, 07:30 - 09:00 KAUST The growth of digital cameras and data communication has led to an exponential increase in video production and dissemination. As a result, automatic video analysis and understanding has become a crucial research topic in the computer vision community. However, the localization problem, which involves identifying a specific event in a large volume of data, particularly in long-form videos, remains a significant challenge.
Towards Designing Robust Deep Learning Models for 3D Understanding Abdullah Hamdi, Ph.D. Student, Electrical and Computer Engineering Apr 10, 17:00 - 19:00 B3 L5 R5220 deep neural networks Deep Neural Networks (DNNs) have shown huge success over the years in solving many 2D computer vision tasks, driven by massive labeled 2D datasets and advancements in 2D vision models, but they have seen less success on 3D vision tasks. This dissertation proposes innovative approaches to enhance the robustness of DNNs for 3D understanding and in 3D settings. The research focuses on two main areas: adversarial robustness on 3D data and setups, and the robustness of DNNs to realistic 3D scenarios. Two paradigms for 3D understanding are discussed: representing 3D as a set of 3D points and performing 2D processing of multiple images of the 3D data.
Towards Richer Video Representation for Action Understanding Humam Alwassel, Ph.D. Student, Computer Science Jan 23, 18:30 - 20:30 B2 L5 R5209 Computer Vision machine learning Human Activity Recognition With video data dominating internet traffic, it is crucial to develop automated models that can analyze and understand what humans do in videos. Such models must solve tasks such as action classification, temporal activity localization, spatiotemporal action detection, and video captioning. This dissertation aims to identify the challenges hindering progress in human action understanding and to propose novel solutions to overcome them.
Research at the Image and Video Understanding Lab (IVUL) - Graduate Seminar - CS Bernard Ghanem, Professor, Electrical and Computer Engineering Nov 30, 12:00 - 13:00 KAUST In this talk, I will give an overview of research done in the Image and Video Understanding Lab (IVUL) at KAUST. At IVUL, we work on topics that are important to the computer vision (CV) and machine learning (ML) communities, with emphasis on three research themes: Theme 1 (Video Understanding), Theme 2 (Visual Computing for Automated Navigation), Theme 3 (Fundamentals/Foundations).
Indoor 3D Scene Understanding Using Depth Sensors Jean Lahoud, Ph.D., Electrical and Computer Engineering May 28, 16:00 - 18:00 KAUST Computer Vision 3D object detection Deep learning One of the main goals in computer vision is to achieve a human-like understanding of images. This understanding has recently been represented in various forms, including image classification, object detection, and semantic segmentation, among many others. Nevertheless, image understanding has mainly been studied in the 2D image frame, so more information is needed to relate it to the 3D world. With the emergence of 3D sensors (e.g. the Microsoft Kinect), which provide depth along with color information, the task of propagating 2D knowledge into 3D becomes more attainable and enables interaction between a machine (e.g. a robot) and its environment. This dissertation focuses on three aspects of indoor 3D scene understanding: (1) 2D-driven 3D object detection for single-frame scenes with inherent 2D information, (2) 3D object instance segmentation for 3D reconstructed scenes, and (3) using room and floor orientation for automatic labeling of indoor scenes that could be used for self-supervised object segmentation. These methods capture the physical extents of 3D objects, such as their sizes and actual locations within a scene.
Understanding a Block of Layers in Deep Neural Networks: Optimization, Probabilistic and Tropical Geometric Perspectives Adel Bibi, Ph.D., Electrical and Computer Engineering Mar 30, 18:00 - 20:00 KAUST Computer Vision machine learning optimization In this dissertation, we aim to study and analyze deep learning models theoretically. Since deep models vary substantially in their shapes and sizes, we restrict our work to a single fundamental block of layers that is common to almost all architectures. The block of layers of interest is the composition of an affine layer, followed by a nonlinear activation function, followed by another affine layer. We study this block of layers from three different perspectives. (i) An Optimization Perspective. We address the following question: Is it possible that the output of the forward pass through the block of layers highlighted above is an optimal solution to a certain convex optimization problem? As a result, we show an equivalence between the forward pass through this block of layers and a single iteration of certain types of deterministic and stochastic algorithms solving a particular class of tensor-formulated convex optimization problems.
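For readers unfamiliar with the block described in the abstract above, the following is a minimal sketch of an affine layer, a nonlinear activation, and a second affine layer composed in that order. It is an illustration only, not the dissertation's code: the function names, the choice of ReLU as the activation, and the example weights are all assumptions made here.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def affine(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Compute W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v: Vector) -> Vector:
    """Element-wise max(0, v). ReLU is an assumed activation choice."""
    return [max(0.0, vi) for vi in v]


def block(x: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector,
          activation: Callable[[Vector], Vector] = relu) -> Vector:
    """Forward pass: affine layer -> activation -> affine layer."""
    hidden = activation(affine(w1, b1, x))
    return affine(w2, b2, hidden)


# Hypothetical weights, chosen only to make the arithmetic easy to follow.
W1 = [[1.0, -1.0], [0.5, 2.0]]
B1 = [0.0, -1.0]
W2 = [[1.0, 1.0]]
B2 = [0.5]

output = block([2.0, 1.0], W1, B1, W2, B2)
```

Here the first affine layer maps the input [2.0, 1.0] to [1.0, 2.0], the activation leaves both positive entries unchanged, and the second affine layer sums them and adds the bias.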
Efficient Localization of Human Actions and Moments in Videos Victor Escorcia, Ph.D., Electrical and Computer Engineering Jun 11, 15:00 - 16:00 B3 L5 R5220 Computer Vision machine learning artificial intelligence We are stumbling across a video tsunami flooding our communication channels. The ubiquity of digital cameras and social networks has increased the amount of visual media content generated and shared by people, in particular videos. Cisco reports that 82% of internet traffic would be in the form of video by 2022. The computer vision community has embraced this challenge by offering the first building blocks to translate the visual data in segmented video clips into semantic tags. However, users usually need to go beyond tagging at the video level. For example, someone may want
Sim-to-Real Transfer for Autonomous Navigation Matthias Mueller, Ph.D., Electrical and Computer Engineering May 14, 16:00 - 17:00 B2 L5 R5220 Computer Vision UAV robotics machine learning This work investigates the problem of transfer from simulation to the real world in the context of autonomous navigation. To this end, we first present Sim4CV, a photo-realistic training and evaluation simulator that enables several applications across various fields of computer vision. Built on top of the Unreal Engine, the simulator features cars and unmanned aerial vehicles (UAVs) with a realistic physics simulation and diverse urban and suburban 3D environments. We demonstrate the versatility of the simulator with two case studies: autonomous UAV-based tracking of moving objects and autonomous driving using supervised learning.
ML Hub Seminar Series | The Machine Learning (ML) Hub Bernard Ghanem, Professor, Electrical and Computer Engineering Feb 13, 12:00 - 13:00 B9 H2 R2325 machine learning The Machine Learning Hub @ KAUST is designed to be the one-stop shop for machine learning (ML) and artificial intelligence (AI) at KAUST. It is an informal forum for exchanging ideas in these areas, including (but not limited to) theoretical foundations, systems, tools, and applications. It will provide several offerings to the KAUST community interested in ML and AI, including a regular seminar series where new research in the field is presented, an online social forum dedicated to AI and ML discussions, announcements, brainstorming, collaborations, and hands-on activities (e.g