Deep Learning for Image and Video Analysis

A pile of film camera tubes. — Photo: Andrey Konstantinov

The central challenge in content-based image and video analysis is often referred to in the literature as the semantic gap: the discrepancy between the semantic content a human sees in an image and the array of numbers a computer or algorithm sees. Artificial neural networks are currently experiencing a renaissance in research, driven mainly by massive increases in the computing power of modern graphics cards, the availability of datasets with millions of training examples and, last but not least, new techniques that make training deep network architectures feasible in the first place.
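To make the semantic gap concrete, the following minimal sketch treats an image as a plain grid of numbers. The 4x4 grayscale image and the helper names `shift_right` and `changed_pixels` are hypothetical and chosen only for illustration. Shifting the image by a single pixel leaves its meaning to a human unchanged ("a bright vertical bar"), yet half of the numbers the algorithm sees are different.

```python
# Hypothetical 4x4 grayscale image: a bright vertical bar on a dark background.
# Each number is one pixel intensity (0 = black, 255 = white).
image = [
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0, 255, 255, 0],
]


def shift_right(img):
    """Move every pixel one column to the right, wrapping around at the edge."""
    return [row[-1:] + row[:-1] for row in img]


def changed_pixels(a, b):
    """Count the positions at which two equally sized pixel grids differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


shifted = shift_right(image)
total = len(image) * len(image[0])

# A human still sees "a vertical bar"; the raw numbers say otherwise.
print(changed_pixels(image, shifted), "of", total, "pixels changed")
```

A naive pixel-by-pixel comparison would therefore judge the two images to be quite different, which is one reason why learned, more abstract representations are of interest.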

Deep neural networks (i.e., networks with a potentially large number of layers) and deep learning, especially in the form of deep convolutional neural networks, are increasingly being used for complex problems in image, video, and audio analysis. In recent years, deep learning has brought us much closer to closing the semantic gap.

This seminar presents and discusses current deep learning methods for image and video analysis. Beyond building a basic understanding of deep learning, the aim is for participants to become more familiar with individual deep learning methods in specific areas of image and video processing.

Possible topics include:
- Image classification, video classification, audio classification
- Object recognition, face recognition, text recognition, speech recognition
- Similarity search
- Data compression, model compression
- Network architectures, optimization methods

Participants are given the opportunity to build knowledge and experience in content-based image and video analysis according to their own interests, with a focus on deep learning methods.