Fraunhofer Institute of Optronics, System Technologies and Image Exploitation IOSB

Gaining knowledge from large data sets

In addition to explicit data, large data sets usually contain hidden information in the form of patterns, which various machine learning methods can uncover. The process of gaining additional knowledge from large data sets is called Knowledge Discovery in Databases (KDD). At the core of the KDD process is data mining, a collection of methods for pattern recognition.

Fraunhofer IOSB is working on new and existing methods that can be used to improve data quality, for example by identifying potential errors in a data set. Patterns extracted from the data are used to train machine learning models. The learned prediction models can then point out irregularities to the user as new data is entered. An essential factor that prediction models must take into account is interpretability. Especially in sensitive areas such as medicine, it is important that users can understand how a prediction result came about.
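As a rough illustration of this idea (not the method used at Fraunhofer IOSB), the following Python sketch learns a plausible value range per field from historical records and flags new entries that fall outside it, with a message the user can read. It stands in for a trained prediction model with a simple robust statistic (median plus or minus a multiple of the median absolute deviation). The field names, the sample values, and the threshold `k` are hypothetical.

```python
from statistics import median


def learn_ranges(records, k=5.0):
    """Learn a plausible [low, high] interval per numeric field.

    Uses median +/- k * MAD (median absolute deviation), which is robust
    to a few bad values in the historical data. k is an assumed threshold.
    Note: if all historical values of a field are identical, the MAD is 0
    and the interval collapses to a single point.
    """
    ranges = {}
    for field in records[0]:
        values = [r[field] for r in records]
        med = median(values)
        mad = median([abs(v - med) for v in values])
        ranges[field] = (med - k * mad, med + k * mad)
    return ranges


def check_entry(entry, ranges):
    """Return readable warnings for fields outside the learned interval."""
    warnings = []
    for field, value in entry.items():
        low, high = ranges[field]
        if not low <= value <= high:
            warnings.append(
                f"{field}={value} is outside the usual range "
                f"[{low:.1f}, {high:.1f}]"
            )
    return warnings


# Hypothetical historical blood values used as training data.
history = [
    {"hemoglobin_g_dl": 13.5, "glucose_mg_dl": 90},
    {"hemoglobin_g_dl": 14.1, "glucose_mg_dl": 95},
    {"hemoglobin_g_dl": 12.9, "glucose_mg_dl": 88},
    {"hemoglobin_g_dl": 15.0, "glucose_mg_dl": 102},
    {"hemoglobin_g_dl": 13.8, "glucose_mg_dl": 97},
]
ranges = learn_ranges(history)

# A new entry with a likely decimal-point slip in the hemoglobin value.
suspicious = check_entry({"hemoglobin_g_dl": 135, "glucose_mg_dl": 93}, ranges)
normal = check_entry({"hemoglobin_g_dl": 14.0, "glucose_mg_dl": 93}, ranges)

for message in suspicious:
    print(message)
```

Because every warning names the field and the learned range, the user can see why an entry was flagged, which is the kind of interpretability the text calls for. A real system would likely use richer patterns than per-field ranges.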

The extraction of knowledge from large amounts of data can support decision-makers in different areas such as the medical field or the banking and insurance industries. In the medical domain, quality assurance methods can be used for test results such as blood values. Another field of application is the detection of incorrectly entered data in large data sets.

Publications

 

2017
El Bekri, Nadia; Peinsipp-Byma, Elisabeth:
Data quality assistance - the use of data mining algorithms to enhance data quality. In: Journal of Telecommunication, Electronic and Computer Engineering: JTEC 9 (2017), No. 2-3, pp. 155-159.
2017
El Bekri, Nadia; Peinsipp-Byma, Elisabeth:
Adaptive knowledge discovery in expert systems. In: Hu, G.; International Society for Computers and Their Applications: 30th International Conference on Computer Applications in Industry and Engineering, CAINE 2017: San Diego, California, USA, 2-4 October 2017. Red Hook, NY: Curran, 2017, pp. 91-96.
2016
Anneken, Matthias; Fischer, Yvonne; Beyerer, Jürgen:
Detection of conspicuous behavior in street traffic by using B-splines as feature vector. In: Ambacher, Oliver (Ed.); Wagner, Joachim (Ed.); Quay, Rüdiger (Ed.); Fraunhofer-Institut für Angewandte Festkörperphysik, Freiburg/Brsg.: Security Research Conference. 11th Future Security: Berlin, September 13-14, 2016. Proceedings. Stuttgart: Fraunhofer Verlag, 2016, pp. 331-337.
2016
El Bekri, Nadia; Peinsipp-Byma, Elisabeth:
Assuring data quality by placing the user in the loop. In: Arabnia, H.R.; Institute of Electrical and Electronics Engineers: International Conference on Computational Science and Computational Intelligence, CSCI 2016. Proceedings: 15-17 December 2016, Las Vegas, Nevada, USA. Piscataway, NJ: IEEE, 2016, pp. 468-471.
2016
El Bekri, Nadia; Peinsipp-Byma, Elisabeth:
Generic error identification in data sets. In: Harris, F.C.; International Society for Computers and Their Applications: 25th International Conference on Software Engineering and Data Engineering, SEDE 2016: Denver, Colorado, USA, 26-28 September 2016; Co-located with the 29th International Conference on Computer Applications in Industry and Engineering (CAINE 2016). Red Hook, NY: Curran, 2016, pp. 177-182.