Computer Vision & Deep Learning


Computer Vision is the set of algorithms, methodologies and models by which computers can be programmed to derive higher-level descriptions from raw visual data, such as pictures or videos. This visual understanding of the world, coupled with decision systems, enables the automation of tasks that would otherwise require the human visual system.


Example tasks include:
  • Human 3D pose estimation
  • Optical character recognition
  • Face detection
  • Crowd counting and tracking



These and many other applications work by extracting refined information about the scene from the raw images, such as edges, corners, shape, color, texture and depth. This higher-level description is used to train a model of the scene, and on that basis the system takes the actions required by the task at hand.
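Edges are among the simplest of these cues. As an illustration only (not the pipeline of any specific product), the sketch below applies the classical Sobel operator, a hand-designed pair of 3x3 filters that approximate horizontal and vertical intensity gradients, to a tiny grayscale image stored as a list of rows. The function names `correlate3x3` and `edge_magnitude` are hypothetical.

```python
# Sobel kernels approximating horizontal (x) and vertical (y) intensity gradients.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def correlate3x3(image, kernel):
    """Slide a 3x3 kernel over a grayscale image (list of rows), 'valid' mode.

    Like most CNN libraries, this computes cross-correlation (no kernel flip).
    """
    h, w = len(image), len(image[0])
    out = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            acc = 0
            for i in range(3):
                for j in range(3):
                    acc += kernel[i][j] * image[r - 1 + i][c - 1 + j]
            row.append(acc)
        out.append(row)
    return out


def edge_magnitude(image):
    """Gradient magnitude sqrt(Gx^2 + Gy^2) at each interior pixel."""
    gx = correlate3x3(image, SOBEL_X)
    gy = correlate3x3(image, SOBEL_Y)
    return [[(a * a + b * b) ** 0.5 for a, b in zip(rx, ry)]
            for rx, ry in zip(gx, gy)]


# A 5x5 image: dark left part, bright right part (a vertical edge).
img = [[0, 0, 10, 10, 10] for _ in range(5)]
for row in edge_magnitude(img):
    print([round(v) for v in row])  # prints [40, 40, 0] on each of the 3 rows
```

The response is large where intensity changes (the edge) and zero in flat regions. In a convolutional network, filters of this kind are not designed by hand but learned from data.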



Over the last decade, the field of Computer Vision has been revolutionized by Deep Neural Networks (DNNs). DNNs are complex layered systems made of simple interconnected computational units called artificial neurons.
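A minimal sketch of what "simple interconnected units" means, assuming the common fully connected formulation: each neuron computes a weighted sum of its inputs plus a bias and passes it through a nonlinearity (ReLU here), a layer is a group of neurons reading the same inputs, and a network stacks layers. The weights in the usage lines are arbitrary illustrative values, and all function names are hypothetical.

```python
def relu(x):
    """Rectified linear unit, a common neuron activation function."""
    return max(0.0, x)


def neuron(inputs, weights, bias, activation=relu):
    """One artificial neuron: weighted sum of inputs plus bias, then a nonlinearity."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def dense_layer(inputs, weight_rows, biases, activation=relu):
    """A layer: several neurons reading the same inputs, each with its own weights."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]


def forward(inputs, layers):
    """Stack layers: the output of one layer becomes the input of the next."""
    x = inputs
    for weight_rows, biases in layers:
        x = dense_layer(x, weight_rows, biases)
    return x


# Two inputs -> hidden layer of 2 neurons -> output layer of 1 neuron.
layers = [
    ([[0.5, -1.0], [1.0, 1.0]], [0.0, -1.0]),
    ([[1.0, 0.5]], [0.25]),
]
print(forward([1.0, 2.0], layers))  # prints [1.25]
```

Each unit is trivial on its own; the expressive power of a DNN comes from composing many such units over many layers, with weights set by training rather than by hand.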


Thanks to theoretical, software and hardware advances, DNNs are now a mature technology powering many products, several of which match or exceed human-level performance, in areas such as:

  • natural language processing and translation
  • speech-to-text and text-to-speech
  • high-level game play
  • object detection in videos
  • diagnosis support tools for medical research

One of the main advantages of DNNs over other approaches is their ability to learn increasingly abstract representations of the information directly from the data, so as to maximize performance on the target task.
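The "learning directly from the data" part can be illustrated at the smallest possible scale. The sketch below is an assumption-laden toy, not a deep network: a single sigmoid neuron whose weights start at zero and are fitted by full-batch gradient descent on the binary cross-entropy loss, using four labeled examples of logical OR. The learning rate, epoch count and function names are hypothetical choices; real DNNs apply the same principle (gradients of a task loss) to millions of weights across many layers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, labels, lr=1.0, epochs=2000):
    """Fit the weights and bias of one sigmoid neuron by full-batch
    gradient descent on the binary cross-entropy loss."""
    n_features = len(samples[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of cross-entropy w.r.t. the pre-activation
            for i in range(n_features):
                grad_w[i] += err * x[i]
            grad_b += err
        # Step against the average gradient to reduce the loss.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5 else 0


X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [0, 1, 1, 1]  # logical OR
w, b = train_neuron(X, Y)
print([predict(w, b, x) for x in X])  # prints [0, 1, 1, 1]
```

No rule for OR is written anywhere in the code; the behavior emerges from minimizing the loss on the examples.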

In visual tasks, Convolutional DNNs (CDNNs) have reached performance levels higher than those of the best human competitors on some benchmarks. Among other factors, the success of CDNNs in visual tasks is due to the increasingly abstract descriptors learned by successive convolutional layers. However, their high cost in terms of power consumption and training and inference time hinders the widespread deployment of vision CDNN models on low-cost, low-power devices.
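To make the cost argument concrete, the sketch below counts the parameters and multiply-accumulate operations (MACs) of a single standard 2D convolutional layer, assuming square k x k kernels, one bias per output channel and the usual output-size formula. The input size and channel counts in the usage lines are illustrative assumptions, not figures from any of our models, and the count ignores activations, memory traffic and hardware-specific optimizations.

```python
def conv2d_cost(h_in, w_in, c_in, c_out, k, stride=1, padding=0):
    """Return (h_out, w_out, params, macs) for one standard 2D conv layer.

    Output size per spatial dimension: floor((n + 2*padding - k) / stride) + 1.
    """
    h_out = (h_in + 2 * padding - k) // stride + 1
    w_out = (w_in + 2 * padding - k) // stride + 1
    params = c_out * (c_in * k * k + 1)          # weights + one bias per filter
    macs = h_out * w_out * c_out * c_in * k * k  # one MAC per weight per output pixel
    return h_out, w_out, params, macs


# A 224x224 RGB image through 64 filters of size 3x3 ("same" padding):
print(conv2d_cost(224, 224, 3, 64, 3, padding=1))
# prints (224, 224, 1792, 86704128): about 87 million MACs for one layer

# A second 3x3 layer with 64 input and 64 output channels at the same resolution:
print(conv2d_cost(224, 224, 64, 64, 3, padding=1))
# prints (224, 224, 36928, 1849688064): about 1.85 billion MACs
```

The second layer has only about 37 thousand parameters yet needs nearly two billion MACs for one image, because every weight is reused at every output position. Compute, and therefore energy and latency, grows with resolution and channel count, which is what makes running such models on low-power devices demanding.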

We actively work on porting complex vision models to low-power embedded platforms, aiming to reduce the power consumption of remote smart sensors while maintaining state-of-the-art performance.

We pursue these goals by combining our optimized computing platforms and software with dedicated low-power hardware accelerators, enabling the development of energy-efficient smart cameras and sensors.