Efficient and Embedded Vision, Deep Learning, Visual AI
Advance the ability of machines to visually understand the world through improved representation learning and context awareness for visual object detection and segmentation, and robust visual learning models.
Investigate trade-offs associated with model complexity, predictive power, and computational demands to improve the efficiency of deep learning vision algorithms, facilitating their adoption within mobile and embedded platforms.
Adapt and improve state-of-the-art machine learning and computer vision algorithms for use in real-world applications such as vehicle traffic monitoring, infrastructure inspection, robotics, and emergency response. Explore data-driven acceleration techniques for video and image processing.
tinyDL for real-time CV
Computer vision (CV) technologies have made impressive progress in recent years due to the advent of deep learning (DL) and Convolutional Neural Networks (ConvNets), but often at the expense of increasingly complex models that need ever more computational and storage resources. ConvNets typically place severe demands on local device resources, which conventionally limits their adoption within mobile and embedded platforms. Thus, there is a need to develop small ConvNets that provide higher performance, a smaller memory footprint, and faster development times. My research involves identifying and exploiting the trade-offs between computation, image resolution, and parameter count in order to develop the most efficient neural network models. Specifically, it aims at the development of lightweight, general-purpose convolutional neural networks for various vision tasks. One particular research direction is to explore efficient operators (group point-wise and depth-wise dilated separable convolutions) that learn representations with fewer FLOPs and parameters.
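To make the savings from such operators concrete, the following is a minimal sketch that counts parameters and multiply-accumulate operations (MACs) for a standard convolution versus a depth-wise separable one with an optional grouped point-wise stage. The function names, the layer sizes in the usage note, and the simplifications (no bias terms, stride 1, output of the same spatial size as the input) are assumptions made for illustration and do not describe any specific network from this research. Dilation is not modelled because it changes the receptive field, not the parameter count.

```python
def standard_conv_cost(c_in, c_out, k, h, w):
    """Parameters and MACs of a k x k standard convolution (no bias, stride 1)."""
    params = k * k * c_in * c_out
    macs = params * h * w  # every weight is used once per output position
    return params, macs


def depthwise_separable_cost(c_in, c_out, k, h, w, groups=1):
    """Parameters and MACs of a k x k depth-wise conv followed by a 1 x 1
    point-wise conv split into `groups` groups (no bias, stride 1)."""
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must be divisible by groups")
    depthwise = k * k * c_in              # one k x k filter per input channel
    pointwise = (c_in * c_out) // groups  # grouped 1 x 1 channel mixing
    params = depthwise + pointwise
    macs = params * h * w
    return params, macs


if __name__ == "__main__":
    # Hypothetical layer: 64 -> 128 channels, 3 x 3 kernel, 56 x 56 feature map.
    for name, cost in [
        ("standard", standard_conv_cost(64, 128, 3, 56, 56)),
        ("separable", depthwise_separable_cost(64, 128, 3, 56, 56)),
        ("separable, 4 groups", depthwise_separable_cost(64, 128, 3, 56, 56, groups=4)),
    ]:
        print(f"{name}: {cost[0]} params, {cost[1]} MACs")
```

For this hypothetical layer the standard convolution needs 73,728 parameters, the separable version 8,768, and the grouped variant 2,624. The price of the grouped point-wise stage is that channels in different groups no longer mix within the layer.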
Visual Object Detection
Object detection is one of the most fundamental computer vision tasks, where the goal is to localize objects within an image. This task is especially challenging when the objects are small and a large image needs to be processed. My research efforts in this area have been directed towards the development of efficient image search algorithms that guide the detection process. Applications in this area include unmanned aerial vehicles for traffic monitoring and analytics.
Smart camera systems are used in a wide spectrum of machine vision applications, including video surveillance, autonomous driving, robots and drones, smart factories, and health monitoring. By leveraging recent advances in deep learning through convolutional neural networks, we can not only enable advanced perception through efficient, optimized ConvNets, but also enable the direct control of cameras for active vision, without having to rely on hand-crafted pipelines that combine separate detection, tracking, and control stages.
Robustifying Automated Vision Systems Against Attacks
State-of-the-art deep learning models used in computer vision are susceptible to adversarial attacks, which seek small perturbations of the input that cause large errors in the estimates produced by the perception modality. This research direction investigates the use of AI/ML-based techniques for the detection, and possibly mitigation, of dynamic cyber-attacks on the camera system and its data in the context of automated vision systems.
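As an illustration of how a small input perturbation can change a model's output, the sketch below applies the fast gradient sign method (FGSM), a well-known attack from the literature, to a toy logistic classifier. The source does not name a specific attack, so FGSM is used here only as a representative example. The weights, input, and perturbation budget in the usage note are hypothetical, and the linear model stands in for a deep network purely to keep the gradient computable by hand.

```python
import math


def predict(w, b, x):
    """Probability of class 1 under a logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """Move each input feature by eps in the direction that increases the
    cross-entropy loss for the true label y (0 or 1)."""
    p = predict(w, b, x)
    # Gradient of the cross-entropy loss w.r.t. the input of a logistic model.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


if __name__ == "__main__":
    # Hypothetical model and input, chosen only for illustration.
    w, b = [2.0, -3.0, 1.0], 0.0
    x, y = [0.3, -0.1, 0.2], 1
    x_adv = fgsm(w, b, x, y, eps=0.25)
    print("clean prediction:", predict(w, b, x))
    print("adversarial prediction:", predict(w, b, x_adv))
```

In this toy case no feature moves by more than 0.25, yet the predicted probability drops from above 0.5 to below it, so the classifier's decision flips.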
Visual Understanding for Emergency Monitoring
Early detection of hazards and calamities (e.g., wildfires, collapsed buildings, floods) is of utmost importance for ensuring the safety of citizens and a fast response in case of a disaster. To this end, the problem of visual scene understanding for disaster classification is tackled through deep learning. A novel dataset called AIDER (Aerial Image Database for Emergency Response) is introduced, and a small deep neural network, EmergencyNet, is developed to classify aerial images and recognize disaster events. The small DNN can run on the processing platform of a UAV to improve autonomy and privacy.
Intelligent Multi-camera Video Surveillance
Cameras mounted on aerial and ground vehicles are becoming increasingly accessible in terms of cost and availability, leading to new forms of visual sensing. These mobile devices are significantly expanding the scope of video analytics beyond traditional static cameras by providing quicker and more effective means for tasks such as wide-area monitoring for civil security and crowd analytics for large gatherings and events. Combining stationary cameras with moving cameras enables new capabilities in video analytics, at the intersection of the Internet of Things, Smart Cities, and sensing.
My initial postdoctoral research focused on networked smart cameras with on-board detection capabilities, and on developing optimization and collaboration algorithms that improve the overall performance of the system. A generic and flexible probabilistic camera detection model was formulated, capable of capturing the detection behavior of the object detection modules running on smart cameras. On top of that model, mixed integer linear programming techniques were formulated that assign a control action to each camera in order to maximize the overall detection probability. A real experimental setup was developed in which the algorithms were validated.
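The idea of assigning one control action per camera to maximize overall detection can be sketched on a tiny instance. The example below is not the mixed integer linear programming formulation described above: it swaps in exhaustive search, which is only practical for a handful of cameras. The detection probabilities, the assumption that cameras detect independently, and the objective (the expected number of detected targets) are all hypothetical choices made for illustration.

```python
from itertools import product


def overall_detection(p, assignment):
    """Expected number of detected targets, assuming independent cameras.

    p[c][a][t] is the probability that camera c detects target t when it
    takes control action a; assignment[c] is the action chosen for camera c.
    """
    n_targets = len(p[0][0])
    total = 0.0
    for t in range(n_targets):
        miss = 1.0
        for cam, action in enumerate(assignment):
            miss *= 1.0 - p[cam][action][t]  # every camera must miss target t
        total += 1.0 - miss
    return total


def best_assignment(p):
    """Enumerate every combination of actions and return the best one."""
    action_ranges = [range(len(cam_actions)) for cam_actions in p]
    return max(product(*action_ranges), key=lambda a: overall_detection(p, a))


if __name__ == "__main__":
    # Hypothetical model: 2 cameras, 2 actions each (e.g. pan left or right), 2 targets.
    p = [
        [[0.9, 0.1], [0.2, 0.8]],  # camera 0
        [[0.8, 0.1], [0.1, 0.7]],  # camera 1
    ]
    best = best_assignment(p)
    print("best actions per camera:", best)
    print("expected detections:", overall_detection(p, best))
```

In this instance the best choice points the two cameras at different targets rather than both at the target each sees best individually, which is the kind of collaboration an optimization-based assignment is meant to capture.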
Hardware-Accelerated Embedded Vision
My PhD thesis focused on the development of hardware accelerators on Field Programmable Gate Arrays (FPGAs) for machine learning algorithms used in computer vision, such as Haar cascades and monolithic/cascade Support Vector Machines. Beyond hardware acceleration, my doctoral thesis focused on utilizing multiple cues, such as edge and depth information, to further accelerate the overall object detection process through data reduction. Specific applications of interest handled by the accelerator include face detection, pedestrian detection, and vehicle detection. Today, modern processors utilize such architectures and core ideas to provide Real-Time on-device Perception (RToP).