Efficient and Embedded Vision, Deep Learning, Visual AI

"My research focuses on investigating, developing, and deploying novel methods for Deep Learning and Computer Vision to enable efficient, real-time on-device perception while ensuring robust operation."

Research Goals


  • Advance the ability of machines to visually understand the world through improved representation learning and context awareness for visual object detection and segmentation, and through robust visual learning models.

  • Investigate trade-offs associated with model complexity, predictive power, and computational demands to improve the efficiency of deep learning vision algorithms, facilitating their adoption within mobile and embedded platforms.

  • Adapt and improve state-of-the-art machine learning and computer vision algorithms for use in real-world applications such as vehicle traffic monitoring, infrastructure inspection, robotics, and emergency response. Explore data-driven acceleration techniques for video and image processing.


tinyDL for real-time CV

Computer vision (CV) technologies have made impressive progress in recent years due to the advent of deep learning (DL) and Convolutional Neural Networks (ConvNets), but often at the expense of increasingly complex models that demand ever more computation and storage. ConvNets typically place severe demands on local device resources, which conventionally limits their adoption within mobile and embedded platforms. There is thus a need to develop small ConvNets that provide high performance, a small memory footprint, and fast development times. My research involves identifying and exploiting the trade-offs between computation, image resolution, and parameter count in order to develop the most efficient neural network models. Specifically, it aims at developing light-weight, general-purpose convolutional neural networks for various vision tasks. One particular research direction is to explore efficient operators (group point-wise and depth-wise dilated separable convolutions) to learn representations with fewer FLOPs and parameters.
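As a back-of-the-envelope illustration of these parameter trade-offs, the sketch below compares the weight count of a standard convolution against a depth-wise separable one (depth-wise followed by point-wise); the 128-channel, 3x3 layer is purely illustrative.

```python
def conv_params(c_in, c_out, k):
    # standard convolution: one k x k kernel per (input, output) channel pair
    return c_in * c_out * k * k

def dw_separable_params(c_in, c_out, k):
    # depth-wise k x k conv (one kernel per input channel)
    # followed by a 1x1 point-wise conv that mixes channels
    return c_in * k * k + c_in * c_out

# hypothetical layer: 128 -> 128 channels, 3x3 kernel
std = conv_params(128, 128, 3)          # 147,456 weights
sep = dw_separable_params(128, 128, 3)  # 17,536 weights
print(std, sep, round(std / sep, 1))    # roughly 8.4x fewer parameters
```

The same factoring reduces FLOPs by a similar ratio, which is what makes such operators attractive for embedded deployment.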

A small ConvNet trained to detect hazards in aerial images from UAVs. (e.g., Traffic Incident)

Visual Object Detection

Object detection is one of the most fundamental computer vision tasks: the goal is to localize objects within an image. The task is especially demanding when the objects are small and a large image needs to be processed. My research efforts in this area have been directed towards efficient image-search algorithms that guide the detection process. Applications in this area include unmanned aerial vehicles for traffic monitoring and analytics.
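A toy sketch of how search can direct detection: scan the image at a coarse stride, then refine only around the best coarse window. The `score` function here is a hypothetical stand-in for a real detector response, not the actual algorithm from this work.

```python
def coarse_to_fine_search(width, height, win, score, coarse_stride, fine_stride):
    """Find the window position maximizing score(), coarsely then finely."""
    # pass 1: coarse scan over the whole image
    bx, by = max(
        ((x, y)
         for x in range(0, width - win + 1, coarse_stride)
         for y in range(0, height - win + 1, coarse_stride)),
        key=lambda p: score(*p),
    )
    # pass 2: dense scan only in a neighbourhood of the coarse maximum
    cands = [(x, y)
             for x in range(max(0, bx - coarse_stride),
                            min(width - win, bx + coarse_stride) + 1, fine_stride)
             for y in range(max(0, by - coarse_stride),
                            min(height - win, by + coarse_stride) + 1, fine_stride)]
    return max(cands, key=lambda p: score(*p))

# toy detector response peaked at (37, 21)
peak = (37, 21)
score = lambda x, y: -abs(x - peak[0]) - abs(y - peak[1])
found = coarse_to_fine_search(200, 100, 16, score, coarse_stride=16, fine_stride=1)
print(found)  # recovers (37, 21) while scoring far fewer windows
```

The point is the cost model: the dense pass touches only a small neighbourhood instead of every pixel position.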

Single Shot Visual Object Detection for Aerial Imagery.

DeepCameras

Smart camera systems are used in a wide spectrum of machine vision applications, including video surveillance, autonomous driving, robots and drones, smart factories, and health monitoring. By leveraging recent advances in deep learning through convolutional neural networks, we can not only enable advanced perception through efficient, optimized ConvNets, but also directly control cameras for active vision, without having to rely on hand-crafted pipelines that stitch together separate detection, tracking, and control stages.
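As a minimal illustration of perception-driven camera control, the sketch below uses a simple proportional controller to pan toward a detected object until it is centered in the frame; the gain and pixels-per-degree factor are hypothetical.

```python
def pan_step(obj_x, frame_width, gain=0.1):
    """Proportional control: pan command grows with the object's offset."""
    error = obj_x - frame_width / 2  # horizontal offset from frame center (pixels)
    return gain * error              # pan command (degrees, hypothetical scale)

# simulate: object detected at pixel 500 in a 640-wide frame
x = 500.0
for _ in range(30):
    # each pan of d degrees shifts the object by ~4 pixels (hypothetical optics)
    x -= pan_step(x, 640) * 4.0
print(round(x))  # object driven to 320, i.e. the frame center
```

In an end-to-end learned system, a network would replace both the detector producing `obj_x` and the hand-tuned control law.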

DeepCamera: (top) An end-to-end solution for active vision systems that leverages deep learning for camera control. (bottom) Integrating deep learning into smart cameras for active vision control as well as enhanced perception (segmentation, detection, etc.).

Robustifying automated vision systems against Attacks

State-of-the-art deep learning models used in computer vision are susceptible to adversarial attacks: small perturbations of the input that cause large errors in the estimates produced by the perception module. My work in this area explores AI/ML-based techniques for detecting, and possibly mitigating, dynamic cyber-attacks on the camera system and its data in the context of automated vision systems.
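The following toy sketch illustrates the underlying vulnerability on a hypothetical linear scorer: an FGSM-style step of bounded size moves each input coordinate against the model's gradient and flips the decision, even though each coordinate changes by at most 0.3.

```python
def sign(v):
    return (v > 0) - (v < 0)

# hypothetical linear scorer: score(x) > 0 means class "normal"
w = [0.5, -1.0, 2.0, 0.25]
def score(x):
    return sum(wi * xi for wi, xi in zip(w, x))

x = [1.0, 0.2, 0.1, 2.0]   # clean input, score(x) = 1.0 -> "normal"
eps = 0.3                  # per-coordinate perturbation budget
# FGSM-style step: move every coordinate against the gradient of the score;
# for a linear model the gradient is simply the weight vector w
x_adv = [xi - eps * sign(wi) for xi, wi in zip(x, w)]

print(score(x) > 0, score(x_adv) > 0)  # the small perturbation flips the decision
```

A defense such as the encoder-decoder restoration network described below aims to project such perturbed inputs back toward the clean data manifold before they reach the classifier.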

An encoder-decoder deep learning architecture for learning to reconstruct and restore image content in the presence of noise or an attack.

Visual Understanding for Emergency Monitoring

Early detection of hazards and calamities (e.g., wildfire, collapsed building, flood) is of utmost importance to ensure the safety of citizens and a fast response time in case of a disaster. To this end, the problem of visual scene understanding for disaster classification is tackled through deep learning. A novel dataset called AIDER (Aerial Image Database for Emergency Response) is introduced, and a small deep neural network, EmergencyNet, is developed to classify aerial images and recognize disaster events. The small DNN can run on the processing platform of a UAV to improve autonomy and privacy.

Visualization of the saliency map for a neural network trained to detect fire.

Intelligent Multi-camera Video Surveillance

Cameras mounted on aerial and ground vehicles are becoming increasingly accessible in terms of cost and availability, leading to new forms of visual sensing. These mobile devices are significantly expanding the scope of video analytics beyond traditional static cameras by providing quicker and more effective means such as wide-area monitoring for civil security and crowd analytics for large gatherings and events. Combining stationary cameras with moving cameras enables new capabilities in video analytics, at the intersection of the Internet of Things, Smart Cities, and sensing.

My initial postdoctoral research focused on networked smart cameras with on-board detection capabilities, and on optimization and collaboration algorithms that improve the overall performance of the system. A generic and flexible probabilistic camera detection model was formulated, capable of capturing the detection behavior of the object detection modules running on the smart cameras. On top of that, mixed-integer linear programming techniques were used to assign a control action to each camera so as to maximize the overall detection probability. The algorithms were validated on a real experimental setup.
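To illustrate the objective, the sketch below enumerates camera/action assignments and picks the one maximizing the joint detection probability 1 - prod(1 - p_i). The actual work used mixed-integer linear programming rather than exhaustive enumeration, and the per-camera detection probabilities here are made up.

```python
from itertools import product

# hypothetical detection probabilities: det_prob[camera][action] for one target
det_prob = [
    {"left": 0.2, "right": 0.7},   # camera 0
    {"left": 0.6, "right": 0.3},   # camera 1
]

def joint_detection(assignment):
    """P(at least one camera detects the target) = 1 - prod(1 - p_i)."""
    miss = 1.0
    for cam, act in enumerate(assignment):
        miss *= 1.0 - det_prob[cam][act]
    return 1.0 - miss

actions = [list(d) for d in det_prob]          # available actions per camera
best = max(product(*actions), key=joint_detection)
print(best, round(joint_detection(best), 2))   # ('right', 'left') 0.88
```

Enumeration scales exponentially in the number of cameras, which is precisely why a MILP formulation is needed for realistic network sizes.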

People detected from multiple cameras which are dynamically reconfigured to maximize the overall detection capabilities.

Hardware-Accelerated Embedded Vision

My PhD thesis focused on the development of hardware accelerators on Field Programmable Gate Arrays (FPGAs) for machine learning algorithms used in computer vision, such as Haar cascades and monolithic/cascade Support Vector Machines. Beyond hardware acceleration, my doctoral thesis focused on utilizing multiple cues, such as edge and depth information, to further accelerate the overall object detection process through data reduction. Specific applications handled by the accelerator include face detection, pedestrian detection, and vehicle detection. Today, modern processors utilize such architectures and core ideas to provide Real-Time on-device Perception (RToP).
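As an illustration of the arithmetic that makes Haar cascades hardware-friendly, the sketch below computes an integral image, from which any rectangular sum (and hence any Haar-like feature) costs only four memory lookups regardless of rectangle size; the 3x3 image is illustrative.

```python
def integral_image(img):
    """ii[y][x] = sum of img over all rows < y and cols < x (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, x, y, w, h):
    """Sum over img[y:y+h][x:x+w] using four integral-image lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
# two-rectangle Haar-like feature: left column minus adjacent column (2x1 boxes)
left  = box_sum(ii, 0, 0, 1, 2)   # 1 + 4 = 5
right = box_sum(ii, 1, 0, 1, 2)   # 2 + 5 = 7
print(left, right, left - right)  # feature value: -2
```

The constant-time box sum is what maps so naturally onto a fixed FPGA datapath: four reads, three adds, no multipliers.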

Accelerator Architecture for Support Vector Machine Image Classification.
My research is developed through projects funded by the European Commission and the Cyprus Government through its Research and Innovation Foundation. In addition, my research has been supported by NVIDIA through its GPU Grant Program with a donation of two Titan Xp GPUs.