Video Content Analysis

Group Description

Interest in automated video analysis solutions has increased steadily over the past years. Video analysis methods applied under mainly controlled conditions, such as industrial environments, are an established technology today. Despite great progress in this field, the application of video analysis techniques under uncontrolled conditions remains a widely unsolved problem. The major challenges stem from the complexity and variability of unstructured outdoor environments; additional challenges arise, e.g., from objects showing strong shape variation. Taking these challenges into account, we mainly focus on the following research questions:

  • Robustness. The focus lies on the analysis of video data originating from multi-spectral sensor platforms, which can move freely through unstructured environments. This includes the detection and tracking of objects, as well as the development of efficient preprocessing mechanisms capable of supporting automated scene understanding.
      
  • Adaptivity. Video analysis systems applied in unknown environments must be able to adapt automatically, to a certain degree, to the specific conditions they encounter. Adaptation can be done "offline" by applying statistical approaches, as well as through methods that automatically learn from the environment.
       
  • Scalability. Current video analytics are mainly restricted to a limited number of object categories or events that can be detected robustly. In general, there is strong interest in methods with advanced discriminative capabilities. Toward this end, the question of how to scale up video analysis systems is of significant importance.


Team:

Dr. Wolfgang Hübner
Dr. Stefan Becker
Ronny Hug
Dr. David Münch
Jens Bayer
Ann-Kristin Grosselfinger
Vanessa Burmester


Projects and Research Topics


Path Prediction and Risk Assessment
The prediction of the paths that, e.g., pedestrians or vehicles will follow in a given situation is a central building block of automated risk assessment. In contrast to backward-looking inference methods, prediction has the advantage that it can be applied without time delay.

Objective. Based on deep recurrent networks, a statistical model of spatial behavior is built. The behavioral model integrates different information sources, like visually detectable obstacles, the behavior of other agents, and movement profiles along trajectories.
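As a non-authoritative illustration of the prediction task itself (not the group's recurrent models), a constant-velocity baseline simply extrapolates the last observed displacement over n future steps; learned predictors are commonly benchmarked against such baselines:

```python
import numpy as np

def constant_velocity_predict(track, n_steps):
    """Extrapolate the last observed velocity for n_steps future positions.

    track: array of shape (T, 2) with observed 2D positions, T >= 2.
    Returns an array of shape (n_steps, 2) with predicted positions.
    """
    track = np.asarray(track, dtype=float)
    velocity = track[-1] - track[-2]            # last observed displacement
    steps = np.arange(1, n_steps + 1)[:, None]  # column of step indices 1..n
    return track[-1] + steps * velocity

# A pedestrian walking along x at one unit per frame:
obs = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
pred = constant_velocity_predict(obs, 3)
# pred -> [[3, 0], [4, 0], [5, 0]]
```

A recurrent model replaces the fixed linear extrapolation with a learned, context-dependent distribution over future positions.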


Automatic generation of complex filter designs
State estimators, such as linear Kalman filters or particle filters, are used to model dynamic systems and have major applications in tracking tasks. In order to design efficient filter configurations, a variety of model assumptions have to be made, including the object dynamics and the structure of the system states. However, manual configuration of complex filter designs is only feasible to a limited extent.

The objective is to develop a machine learning based approach that can automatically generate complex filter designs. In this context, applications are envisioned which require complex filter systems, like the observation of objects without depth information ("monocular tracking").
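To make the manual design effort concrete, the following is a minimal sketch of a hand-configured linear Kalman filter for a 1D constant-velocity model. Every matrix below (transition, measurement, and both noise covariances) is a hand-made modeling choice; a learning-based approach would have to produce such configurations, and more complex ones, automatically:

```python
import numpy as np

# State x = [position, velocity]; only the position is measured.
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # state transition for a unit time step
H = np.array([[1.0, 0.0]])   # measurement matrix (position only)
Q = 1e-4 * np.eye(2)         # process noise covariance (hand-tuned)
R = np.array([[0.25]])       # measurement noise covariance (hand-tuned)

def kalman_step(x, P, z):
    """One predict/update cycle of the linear Kalman filter."""
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    y = z - H @ x                    # innovation
    S = H @ P @ H.T + R              # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P

x, P = np.zeros(2), np.eye(2)
for z in [1.0, 2.0, 3.0, 4.0]:       # noiseless target moving at unit speed
    x, P = kalman_step(x, P, np.array([z]))
# After four measurements, x approaches position ~4 and velocity ~1.
```

A monocular tracking scenario would require a considerably more complex state structure and switching dynamics, which is precisely where manual design breaks down.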


Integrated track management
Tracking methods are usually used in conjunction with an object detector in order to maintain the identity of individual objects across multiple images. In addition to direct applications, like the generation of motion profiles, tracking approaches further help to improve the overall system performance. Although strongly interdependent, detection and tracking are often treated as isolated building blocks.

The objective is the development of a deep learning architecture that integrates object detection and track management. This eliminates ad hoc processing steps and further leads to a robust data association mechanism.
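One typical ad hoc step that such an integrated architecture replaces is hand-crafted data association. A common illustrative baseline (not the group's method) is greedy matching of tracks to detections by bounding-box overlap:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def associate(tracks, detections, min_iou=0.3):
    """Greedily match tracks to detections by descending IoU.

    Returns a list of (track_index, detection_index) pairs; tracks and
    detections left unmatched would trigger track deletion or creation.
    """
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True)
    matches, used_t, used_d = [], set(), set()
    for score, ti, di in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little
        if ti not in used_t and di not in used_d:
            matches.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
    return matches
```

The hand-chosen IoU threshold and the greedy heuristic are exactly the kind of brittle, isolated design decisions that a jointly learned detection-and-tracking architecture aims to absorb into the network.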


Semi-automatic data acquisition and processing
The availability of large amounts of data combined with high-quality ground truth annotations is the foundation of most current machine learning techniques. Irrespective of the rapid development of learning algorithms, the lack of large data sets is still a significant hurdle for the applicability of deep learning methods.

The objective is the development of an interactive modeling tool which, in addition to manual ground truth generation, provides automated components for data preparation. These include privacy protection functions realized by anonymizing video data, e.g. by removing license plates.


Automatic monitoring of indoor activities
Automated logging of indoor activities using video technology poses several key challenges, including legal and social issues, scalability issues, and compatibility issues with other cyber-physical security systems.

The objective of the project is the development of video analysis components that support the automated generation of log files from video data. The log files can be used to summarize or index video streams, and further provide a data link to other cyber-physical systems at a semantically abstract level.

Multi-spectral video analysis
In addition to images captured in the visible spectrum, IR images provide sufficient information even in dim ambient lighting. Multi-spectral approaches are mainly used to deal with highly variable lighting and weather conditions. The application of machine learning approaches to IR images is mainly limited by the availability of training data.

The objective is the development of suitable learning methods, which can transfer information from the visible spectrum into the IR spectrum in order to make approaches such as object detection and action classification applicable to multi-spectral sensor systems.
 

Auto-calibration of "master-slave" systems
Observing large areas from a single viewpoint poses the problem of finding a trade-off between the field of view of a camera system and the minimum resolution required by many video analysis methods. Multi-focal systems offer a potential solution to this problem: they can be based on a master-slave design, consisting of multiple cameras that need to be referenced to each other.

The objective is to provide a simple auto-calibration method that allows master-slave components to be set up easily from affordable consumer cameras, and that places minimal quality requirements on the mounting.
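The geometric core of referencing one camera view to another can be illustrated, under the simplifying assumption of a (near-)planar scene or narrow depth range, by estimating a homography from a handful of point correspondences via the direct linear transform (a minimal sketch, not the actual calibration procedure):

```python
import numpy as np

def fit_homography(src, dst):
    """Estimate a 3x3 homography mapping src points to dst points (DLT).

    src, dst: arrays of shape (N, 2) with N >= 4 point correspondences.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography (up to scale) is the right singular vector of A
    # belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def apply_homography(H, pt):
    """Map a 2D point through H using homogeneous coordinates."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]
```

In a master-slave setup, such a mapping (or a pan/tilt lookup derived from it) lets a detection in the wide-angle master view steer the high-resolution slave camera; a practical system would add robust estimation to cope with mismatched correspondences.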

"Online" visualization of pedestrian tracks from single view points
Observing and assessing situations over an extended period of time poses a major challenge for human observers, especially in complex, dynamic scenes where many events occur in parallel.

The objective is to develop methods that generate a static visualization of dynamic scenes and thus relieve human observers. Representative shots are automatically selected from pedestrian tracks ("best shot analysis") and a static summary of the individual motion sequences is generated.



Selected Publications

  • Hug R., Becker S., Hübner W., Arens M.: „Introducing Probabilistic Bezier Curves for N-Step Sequence Prediction", The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020
  • Kieritz H., Hübner W., Arens M.: „Joint detection and online multi-object tracking“, Conf. on Computer Vision and Pattern Recognition (CVPR) Workshops, 2018
  • Becker S., Hübner W., Arens M.: „State estimation for tracking in image space with a de- and re-coupled IMM filter“, Multimedia Tools and Applications, Springer, 2017
  • Becker S., Kieritz H., Hübner W., Arens M.: „On the Benefit of State Separation for Tracking in Image Space with an Interacting Multiple Model Filter“, Proc. 7th Int. Conf. on Image and Signal Processing (ICISP), 2016 [Best Paper Award]
  • Münch D., Grosselfinger A., Hübner W., Arens M.: „Automatic unconstrained online configuration of a master-slave camera system", In Proc. of the Inter. Conf. on Computer Vision Systems (ICVS), LNCS, Springer, 2013 [Best Paper Award]

Recent Publications

  • Hug R., Becker S., Hübner W., Arens M.: „Introducing Probabilistic Bezier Curves for N-Step Sequence Prediction", The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020
  • Hug R., Becker S., Hübner W., Arens M., A Short Note on Analyzing Sequence Complexity in Trajectory Prediction Benchmarks, Workshop on Long-term Human Motion Prediction (LHMP), 2020
  • Hug R., Becker S., Hübner W., Arens M., A complementary trajectory prediction benchmark, Workshop on Benchmarking Trajectory Forecasting Models (BTFM), 2020
  • Hug R., Hübner W., Arens M.: Modeling continuous-time stochastic processes using N-Curve mixtures, arXiv:1908.04030 [stat.ML], 2019
  • Becker S., Hug R., Hübner W., Arens M.: An RNN-based IMM Filter Surrogate, Scandinavian Conference on Image Analysis (SCIA), 2019
  • Becker S.: „RNN-based Prediction of Pedestrian Turning Maneuvers“, Proc. of the Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory, KIT Scientific Publishing, 2019
  • Becker S., Hug R., Hübner W., Arens M.: „RED: A simple but effective Baseline Predictor for the TrajNet Benchmark“, European Conference on Computer Vision (ECCV) Workshops, 2018
  • Hug R., Becker S., Hübner W., Arens M.: „Particle-based pedestrian path prediction using LSTM-MDL models", IEEE Int. Conf. on Intelligent Transportation Systems (ITSC), 2018
  • Kieritz H., Hübner W., Arens M.: „Joint detection and online multi-object tracking“, Conf. on Computer Vision and Pattern Recognition (CVPR) Workshops, 2018
  • Münch D., Arens M.: „Dynamic belief fusion of different person detection methods", SPIE, 2018
  • Becker S., Hübner W., Arens M.: „State estimation for tracking in image space with a de- and re-coupled IMM filter“, Multimedia Tools and Applications, Springer, 2018
  • Hug R., Hübner W., Arens M.: „Interactive concepts for shaping generative models of spatial behavior“, In Proc. of IEEE 4th Int. Conf. on Soft Computing & Machine Intelligence (ISCMI), 2017
  • Hug R., Becker S., Hübner W., Arens M.: „On the reliability of LSTM-MDL models for pedestrian trajectory prediction“, Int. Workshop on Representation, analysis and recognition of shape and motion, 2017
  • Hug R., Hübner W., Arens M.: „Supporting generative models of spatial behaviour by user interaction", In Proc. of European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, 2017
  • Becker S., Kieritz H., Hübner W., Arens M.: „On the Benefit of State Separation for Tracking in Image Space with an Interacting Multiple Model Filter“, Proc. 7th Int. Conf. on Image and Signal Processing (ICISP), 2016
  • Kieritz H., Becker S., Hübner W., Arens M.: „Online multi-person tracking using integral channel features“, In Proc. of the 13th IEEE Int. Conf. on Advanced Video and Signal Based Surveillance (AVSS), 2016
  • Hilsenbeck B., Münch D., Kieritz H., Hübner W., Arens M.: „Hierarchical Hough forests for view-independent action recognition“, In Proc. 23rd Int. Conf. on Pattern Recognition (ICPR), 2016
  • [ More ]

Recent Theses

  • T. Kostov, „Integration of Observation Uncertainty into an RNN-based Prediction Network", Master's thesis, KIT, 2019

Dissertations

  • S. Becker, „Dynamic Switching State Systems for Visual Tracking", (DOI 10.5445/IR/1000119143), Doctoral dissertation, Karlsruhe Institute of Technology (KIT), 2020
  • D. Münch, „Begriffliche Situationsanalyse aus Videodaten bei unvollständiger und fehlerhafter Information", Doctoral dissertation, Karlsruhe Institute of Technology (KIT), 2017
  • J. Brauer, „Human Pose Estimation with Implicit Shape Models", Doctoral dissertation, Karlsruhe Institute of Technology (KIT), 2014

Datasets

  • Single Trajectory Sanity Check Benchmark: The dataset provides a benchmark for machine learning based path prediction tasks. The benchmark and further details are available on GitHub.
  • Multispectral Action Dataset: The IOSB Multispectral Action Dataset contains video sequences showing violent and non-violent behaviour, recorded in the visible and the infrared spectrum. The dataset is freely available on request. [ Download ]