News from the Artificial Intelligence Group

The chair of artificial intelligence deals with the wide field of machine learning. In particular, the chair concentrates on the development and implementation of learning algorithms that solve challenging problems.

Merry Christmas and a Happy New Year

Christmas 2014

The secretary's office will be closed from December 19th, 2016 to January 6th, 2017. We wish you a merry Christmas and a happy New Year!

BMVI Data-Run

On December 2nd and 3rd, the Federal Ministry of Transport and Digital Infrastructure (BMVI) hosted the second BMVI Data-Run, this time with the theme "Realtime Data in Traffic". Over the course of two days, attending teams worked on creating innovative mobility solutions based on the provided data.

Sebastian Peter and Philipp Honysz from LS8 participated with the idea of creating an app that would help commuters compensate for traffic problems. They implemented an Android app which analyses the user's commute and notifies them of impending problems, such as overloaded bicycle stations. Additionally, it uses a Google API to compute routes for common means of transportation.

Research Assistants between News, Space and Science

This summer, three of our graduate students were between news, space and science. They were at Google, NASA, Stanford and the Wirtschaftswoche. While it was certainly not a walk in the park, it was definitely an experience and a great success. Congratulations!

Elena Erdmann received a Google News Lab Fellowship and worked for two months at the Wirtschaftswoche. She has developed both journalistic know-how and technical skills to drive innovation in digital and data journalism. Nico Piatkowski visited Stefano Ermon at Stanford University. Together they worked on techniques for scalable and exact inference in graphical models. He also made a detour to NASA. Last but not least, Martin Mladenov got an internship at Google. Some people say this is more difficult than getting admitted to Stanford or Harvard. Who knows? But this year they accepted about 2% of applicants (1,600 people). What did he work on? We do not know, but he visited Craig Boutilier, so very likely something related to making decisions under uncertainty.

Health: Smart Data & Data Analytics


The kick-off of the "Smart Data & Data Analytics" department of CPS.HUB took place on November 23rd, 2016 at the Leibniz Institute for Analytical Sciences (ISAS). This session focused on a variety of aspects of data and data analysis in the context of health and the health economy.

After the introduction by Monika Gatzke, the topic of "health" with regard to Smart Data was further discussed:

  • Prof. Dr. Katharina Morik (Head of the Department) gave an overview and presented a detailed application in intensive care medicine.
  • Sven Löffler from T-Systems spoke about Smart Data potentials in the health care sector using the example of self-tracking data.
  • Dr. Wolfgang Thronicke of Atos C-LAB presented Big Dependable Systems. These are systems that consist of different interdependent subsystems and are the object of the project Medolution.
  • The founder of the Quantified Self Movement in Germany, Florian Schumacher, spoke about the potential for Big Data Analytics.
  • Philip Potratz from the Cluster InnovativeMedizin.NRW presented the project Smart.Health.Data.

relNet Opening Workshop

The project partners of LS8, Dortmund and CERES, Bochum hosted an opening workshop for their new joint project relNet on "Modelling Topics and Structures in Religious Online Communication" in Bochum on May 23-24. The goal of this project is to apply methods of data analytics, network analysis and text mining to analyse how digital communication has changed religious communities and the social roles within these communities.

Over two days we presented the project, listened to talks by our invited guests and discussed the potential of joint research in computer science and the social sciences, in this case religious studies.



Springer Edited Volume 'Computational Sustainability' published

Katharina Morik and Kristian Kersting, together with Jörg Lässig from the University of Applied Sciences Zittau/Görlitz, have published an edited volume on Computational Sustainability. Computational Sustainability is a broad field that attempts to optimize societal, economic, and environmental resources using methods from computer science, mathematics and related fields:

Jörg Lässig, Kristian Kersting, Katharina Morik, Computational Sustainability. Studies in Computational Intelligence, Volume 645, Springer, ISBN: 978-3-319-31856-1, 2016.

Best Paper Award of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE) 2015

The joint work "Predicting Purchase Decisions in Free To Play Mobile Games" of Kristian Kersting with colleagues from Wooga, goedle.io, Aalborg University, and the Fraunhofer IAIS received the Best Paper Award of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE) 2015.

LS8 is an international collaborator of CompSustNet

LS8 is an international collaborator of CompSustNet. CompSustNet is a research network sponsored by the National Science Foundation through an Expeditions in Computing award. Twelve U.S. academic institutions led by Cornell University, along with many national and international collaborators, are exploring new research directions in computational sustainability.

Morgan and Claypool Book on Statistical Relational AI

Together with colleagues from UBC, KU Leuven, and U. Indiana, Kristian Kersting published a book on Statistical Relational AI. This is the study and design of intelligent agents that act in worlds composed of individuals (objects, things), where there can be complex relations among the individuals, where the agents can be uncertain about what properties individuals have, what relations are true, what individuals exist, whether different terms denote the same individual, and the dynamics of the world:

Luc De Raedt, Kristian Kersting, Sriraam Natarajan, David Poole, Statistical Relational Artificial Intelligence: Logic, Probability, and Computation. Morgan and Claypool Publishers, Synthesis Lectures on Artificial Intelligence and Machine Learning, ISBN: 9781627058414, 2016.

VaVeL Project

Urban environments are flooded with data gathered by fixed or mobile sensors. If these data were used successfully, European citizens could benefit in various areas like public transport or crime prevention. However, urban data is heterogeneous, noisy and unlabeled, so the usability of the data is low. The VaVeL project aims to use these data in applications that improve living conditions in urban areas. The goal of the project is to develop a general framework for managing and mining heterogeneous urban data streams.

As part of the project, the functionality of current stream frameworks will be adapted to data streams from urban sensors. Accessing urban data streams is not an easy task. In this project, a set of black boxes will be implemented to give easier access to the data and analysis procedures. Big data companies will get access to the gathered knowledge so that actual problems of an urban environment can be tackled.


Call for Papers - Data Mining for Smart Cities

There is a Call for Papers for the journal Data Mining for Smart Cities. They are looking, for example, for the following topics:

  • Real-time nowcasting and prediction of events
  • Interactive exploration of city data
  • Feature extraction and deep learning from urban data

Submission is due July 4, 2016, 23:59.


Katharina Morik admitted to acatech - Deutsche Akademie der Technikwissenschaften

Katharina Morik

The National Academy of Science and Engineering advises society and governments in all questions regarding the future of technology. Acatech is one of the most important academies for novel technology research. Additionally, acatech provides a platform for the transfer of concepts to applications and enables the dialogue between science and industry. The members work together with external researchers in interdisciplinary projects to ensure the practicability of recent trends. Internationally oriented, acatech wants to provide solutions for global problems and new perspectives for technological value added in Germany.

By appointing Katharina Morik as a member of acatech, the academy recognizes her research profile, her achievements as speaker of the collaborative research center SFB 876, her international reputation and her innovative research in machine learning.

Christian Bockermann Defends his Dissertation at LS8

Christian Bockermann's defense

Christian Bockermann has successfully defended his dissertation with the title “Mining Big Data Streams for Multiple Concepts”. His thesis was supervised by Katharina Morik. Summary of the thesis:

Modelling streaming data applications in near real-time is motivated by today’s growing demand for in-time data analysis. The thesis reviews the Lambda architecture and state-of-the-art frameworks for data streams and introduces a middle layer that eases the definition of streaming applications in a platform-independent way. This enabling technique is demonstrated in two Big Data applications, namely the inline processing and analysis of data in Cherenkov astronomy and the near real-time extraction of viewership statistics in the context of an IP-TV platform.


Fabian Hadiji Defends his Dissertation at LS8

Fabian Hadiji successfully defended his dissertation under the title "Graphical Models Beyond Standard Settings: Lifted Decimation, Labeling, and Counting". His thesis was supervised by Professor Kristian Kersting.

He summarises his thesis in the following abstract:

With increasing complexity and growing problem sizes in AI and Machine Learning, inference and learning are still major issues in Probabilistic Graphical Models (PGMs). On the other hand, many problems are specified in such a way that symmetries arise from the underlying model structure. Exploiting these symmetries during inference, which is referred to as "lifted inference", has led to significant efficiency gains. This thesis provides several enhanced versions of known algorithms that are shown to be liftable too and thereby applies lifting in "non-standard" settings. By doing so, the understanding of the applicability of lifted inference and lifting in general is extended. Among various other experiments, it is shown how lifted inference in combination with an innovative Web-based data harvesting pipeline is used to label author-paper-pairs with geographic information in online bibliographies. This results in a large-scale transnational bibliography containing affiliation information over time for roughly one million authors. Analyzing this dataset reveals the importance of understanding count data. Although counting is done literally everywhere, mainstream PGMs have widely been neglecting count data. In the case where the ranges of the random variables are defined over the natural numbers, crude approximations to the true distribution are often made by discretization or a Gaussian assumption. To handle count data, Poisson Dependency Networks (PDNs) are introduced, which present a new class of non-standard PGMs naturally handling count data.

If you are interested in Fabian's past and future work, also see his personal homepage http://hadiji.com/.

Second On the Record at Signal Iduna Park

Being at the famous stadium of the Dortmund football team BVB 09, we could not resist pretending to give a press conference. Actually, the conference that we visited was on economic journalism in the digital age. http://www.wipojo.de/ontherecord/

Best Paper Presentation Award of the "New Challenges in Neural Computation" Workshop 2015

The joint work "Archetypal Analysis as an Autoencoder" of Kristian Kersting with colleagues from the University of Bonn and the Twenty Billion Neurons GmbH received the Best Presentation Award of the "New Challenges in Neural Computation" (NC^2) Workshop of the GI-Fachgruppe Neuronale Netze and the German Neural Networks Society, held in connection with GCPR 2015, Aachen.

Successful Final Review of the EU Project INSIGHT in Luxembourg

The goal of the INSIGHT project was to radically advance our ability to cope with emergency situations in smart cities. INSIGHT stands for Intelligent Synthesis and Real-time Response using massive Streaming of Heterogeneous Data. The technologies developed for data stream mining put new capabilities in the hands of disaster planners and city personnel to improve emergency planning and response.


LS8 in Banff, Canada

From July 24th to July 26th, LS8 gave two talks at the workshop "Advances in interactive Knowledge Discovery and Data Mining in Complex and Big Data Sets" in Banff, Canada.

Professor Katharina Morik spoke about "Big Data and Small Devices": Analyzing data on small devices confronts us with new challenges with regard to runtime, memory consumption and energy consumption. Her talk investigated the use of graphical models for data mining in resource-restricted environments and presented results from the research project SFB 876.

Furthermore, Sibylle Hess presented results of her diploma thesis: "Investigation of Code Tables to Compress and Describe Underlying Characteristics of Binary Databases". She connects traditional methods of frequent pattern mining with the Minimum-Description-Length principle and matrix factorization, combining these techniques into new algorithms for frequent pattern mining based on numerical optimization.

2nd Workshop on Mining Urban Data held in conjunction with ICML

We co-organize this year's 2nd Workshop on Mining Urban Data. The workshop takes place on July 11th at ICML in Lille. Please see the proceedings at http://ceur-ws.org/Vol-1392/

This year we welcome three invited speakers:

  • Dr. Eleni Pratsini - "Using Big Mobile Data to Analyze Social Events in Cities"
  • Prof. Kristian Kersting - "Poisson Dependency Networks: Gradient Boosted Models for Multivariate Count Data"
  • Prof. Sharad Mehrotra - "Towards 'on the fly' data cleaning"

Student assistant positions available immediately

At TU Dortmund, Faculty of Computer Science, Chair VIII, student assistant positions are available with immediate effect.


Summer school

Summer School 2015

The next summer school will be hosted at the Faculty of Sciences of the University of Porto from September 2nd to 5th and is co-located with ECML PKDD 2015. It will be organized by LIAAD-INESC TEC and TU Dortmund.

At the summer school, world-leading researchers in machine learning and data mining will give lectures on recent techniques, for example for dealing with huge amounts of data or spatio-temporal streaming data.


SHE - Sie hat's erfunden (She Invented It)

Women in professional life were one of the focal points of the Unternehmenstage (enterprise days) event series, which took place from January 26 to February 9, 2015. To foster innovation in small and medium-sized enterprises, the recognition of women as inventors must be promoted. Professor Katharina Morik took part in a panel discussion that addressed, among other things, the compatibility of family and working life and the insufficient recognition of women as inventors.


Data analysts are in demand in industry

Data analysis is the sought-after competence in industry, and the courses of LS 8 will again provide students with the relevant knowledge in the summer semester 2015!

RapidMiner Academics - Free Access to RapidMiner Studio for Students

RapidMiner CEO and founder Ingo Mierswa presents RapidMiner Academia: A program that grants students free access to commercial versions of RapidMiner Studio.

The RapidMiner project began in 2001, at that point still called YALE, at the LS8 here at TU Dortmund. Today it is one of the most popular software environments for predictive Data Analysis and Data Mining.


Celebratory Colloquium at the Faculty for Computer Science

Katharina Morik

A special range of talks took place at the end of the year on the occasion of Katharina Morik's 60th birthday. What the three main speakers had in common is that they are internationally highly renowned in machine learning and data mining and received their doctorates under Katharina Morik at the (Technical) University of Dortmund. Their fields of activity are entirely different.

The common ground between the speakers and the honoree became clear in the short introduction, in which Katharina Morik summarized the research goals she pursues at TU Dortmund: situated systems that connect sensing, communication and action through the ability to learn. In the early 1990s, work on robotics emerged: patterns were discovered in real time in distributed, heterogeneous data streams and used for action planning. The SFB 876 (computer science), whose second phase has just been approved, can be seen, with its combination of data analysis and cyber-physical systems, as following the guiding principle of learning, situated systems.

The work on very large data sets that Katharina Morik carried out over 12 years in the SFB 475 (statistics) together with Claus Weihs was honored in a short address by the speaker of that collaborative research center, Ursula Gather.

The keynote talks by the three outstanding scientists showed quite different approaches to researching and applying machine learning successfully.

Stefan Wrobel Thorsten Joachims Ingo Mierswa

Our students may be pleased to see some examples of what studying at TU Dortmund enables: research director, professor, CEO of a company. Outstanding research in machine learning, data mining and big data analytics can take you a long way!



Merry Christmas and a Happy New Year

Christmas 2014

The secretary's office will be closed from December 19th, 2014 to January 9th, 2015.
We wish you a merry Christmas and a happy New Year!


LS8 publishes SpringerBrief on boosting statistical relational learners

This SpringerBrief addresses the challenges of analyzing multi-relational and noisy data by proposing several Statistical Relational Learning (SRL) methods. These methods combine the expressiveness of first-order logic and the ability of probability theory to handle uncertainty. It provides an overview of the methods and the key assumptions that allow for adaptation to different models and real world applications. The models are highly attractive due to their compactness and comprehensibility but learning their structure is computationally intensive. To combat this problem, the authors review the use of functional gradients for boosting the structure and the parameters of statistical relational models. The algorithms have been applied successfully in several SRL settings and have been adapted to several real problems from Information extraction in text to medical problems. Including both context and well-tested applications, Boosting Statistical Relational Learning from Benchmarks to Data-Driven Medicine is designed for researchers and professionals in machine learning and data mining. Computer engineers or students interested in statistics, data management, or health informatics will also find this brief a valuable resource.


Student assistant positions available from January 2015

At TU Dortmund, Faculty of Computer Science, Chair VIII, student assistant positions are available from January 2015.


Lecture: Natural Language Systems (Katharina Morik)

IBM Watson

Google, Facebook and Netflix need natural language processing for many of their services. Google, for instance, has a large Natural Language Processing department: http://research.google.com/pubs/NaturalLanguageProcessing.html

In February 2011, the IBM program Watson answered natural-language questions in the quiz show Jeopardy better than two human quiz champions.

Ray Kurzweil (Google Director of Engineering) wants to go beyond that: "So IBM’s Watson is a pretty weak reader on each page, but it read the 200m pages of Wikipedia. And basically what I'm doing at Google is to try to go beyond what Watson could do." http://searchengineland.com/ray-kurzweils-job-google-beat-ibms-watson-natural-language-search-185149

There is a wealth of methods for analyzing very large text collections, with just as many applications: sentiment analysis, personalized advertising, recommendations, email routing, automatic text generation for short news items and reporting, automatic question answering, and information extraction from the WWW. In the lecture with exercises you will get to know the corresponding methods and tools. The new teaching concept includes inverted classroom sessions and independent work, so that you are well prepared for practice. http://www-ai.cs.uni-dortmund.de/LEHRE/VORLESUNGEN/NLS/WS1415/index.html

Lecture: Probabilistic Graphical Models (Kristian Kersting)

How does one act under uncertainty, with missing or erroneous data? Probabilistic graphical models have proven themselves in recent years for dealing with such uncertainties. They are part of modern information technology's efforts to enable reasoning under uncertainty.

Tag cloud: probabilistic graphical models

Prominent application areas are robotics, bioinformatics, artificial intelligence and machine learning. For example, the models are used in the evaluation of medical data, the analysis of gene expression data and the tracking of movements. The LS8 lecture "Probabilistic Graphical Models" covers fundamental questions and techniques of graphical models. http://www-ai.cs.uni-dortmund.de/LEHRE/VORLESUNGEN/PGM/WS1415/index.html

Lecture: Large-Scale Optimization (Sangkyun Lee)


In general, data are often cheaper to obtain than extracting and then modelling the knowledge of experts. But how can computers automatically estimate large models from data, such as those that arise in natural language processing, in the estimation of graphical models and in statistical machine learning?

At the core of most learning methods lies an optimization task: the error is to be minimized or the probability of the correct result maximized. The lecture "Large-Scale Optimization", held in English, covers the theoretical foundations and methods.
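
As a minimal sketch of learning as optimization (an illustration, not material from the lecture), the following Python snippet fits a line by minimizing the mean squared error with plain gradient descent. The function name `fit_line`, the learning rate, the number of steps and the toy data are all assumptions chosen for the example.

```python
def fit_line(xs, ys, lr=0.01, steps=5000):
    """Fit y ~ w*x + b by gradient descent on the mean squared error.

    Hypothetical helper for illustration; lr and steps are assumed values.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1, so the minimizer is w = 2, b = 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
```

Large-scale methods differ mainly in how they approximate or distribute the gradient computation, but the error-minimizing loop has the same shape.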

Project group Infoscreen (Kristian Kersting, Hendrik Blom)


The approaches from all lectures can then be applied in the project group "Infoscreen". Infoscreens are digital displays intended to attract particular attention in public spaces with few stimuli.

The aim is to provide information about current events at the Faculty of Computer Science of TU Dortmund.

KDD 2014 sold out

KDD 2014 is sold out. They had to close registrations. 2200 attendees will enjoy the conference next week in Times Square. Katharina Morik gives a keynote talk at the workshop BigMine’14.


The virtual steel works

Following the press conference, the LS8 project (Katharina Morik, Hendrik Blom, Tobias Beckers), carried out in collaboration with SMS Siemag and Dillinger Hütte, is outlined in two interviews, one with Dominik Schöne of Dillinger Hütte and one with Katharina Morik.


ViSTA-TV in a Nutshell

The European project ViSTA-TV had its successful final review meeting in Amsterdam on July 1st. LS 8 contributed live stream analysis that separates ads from shows in internet television. Online recommendations of shows based on user behavior were produced using Termset Clustering.



Mediaday of the SMS Group in Hilchenbach

Mediaday of the SMS Group in Hilchenbach on July 3, 2014

Data Mining/ Industrie 4.0

Summary talk by Katharina Morik about "Data Mining, Big Data and Prediction Models"


Talk at the TU Dortmund: What happens to our data? Between permanent harassment paranoia and post-privacy

Wednesday, July 2, 2014, 16:00 (s.t.) - 18:30, P1-05-309
  • Kristian Kersting (Chair for artificial intelligence)
  • Sarah Küsgen (Chair for service and technology management)
  • Kai-Uwe Loser (Data security engineer of the RUB)
  • Johannes Weyer & Robin D. Fink (Sociology of Technology group)

The most recent revelations by whistleblower Edward Snowden showed once again how attractive the collection of massive data is in the age of social media.

This event investigates the question "what happens to our data?" from technical, economic and sociological perspectives. The technical possibilities of modern data mining are diverse and allow conclusions down to the individual level. Data collected from social networks are especially attractive for marketing and product design. Against this background, the protection of privacy faces new challenges.

The contributors will each give a 10-15 minute talk and will afterwards take part in a discussion with the audience. The event will be moderated by Johannes Weyer.


Move to Otto-Hahn-Str. 12

The chair for Artificial Intelligence is moving to the new building in Otto-Hahn-Str. 12. Thus, between 06/30/14 and 07/04/14 we may not be available at all times.

Talk at VigLink: Resource-aware graphical models

Prof. Morik talks at VigLink

Machine learning can help to enhance small devices. For instance, keeping the energy consumption of smart phones low is one of the major concerns of the users, as is well illustrated by various “charge your mobile” stations at public places. Where the operating systems of smart phones already offer heuristics and battery apps show consumption profiles, machine learning can do more. Predictions allow better optimizations of the operating system, prepare for particular app usages at certain points in time, or manage services such as GPS or WLAN in a context-aware and adaptive manner. This challenges learning algorithms to real-time application of their models. Moreover, it demands the models to run on the resource-restricted device without consuming more energy themselves than they save!


Talk at NASA: Data Analytics for Sustainability

Title: Data Analytics for Sustainability

  • Speaker: Katharina Morik, Technische Universität, Dortmund
  • Date & Time: Wednesday, May 28, 2:00 pm - 3:00 pm
  • Location: Building N245 Auditorium


Sustainability has many facets and researchers from many disciplines are working on them. Particularly knowledge discovery always considered sustainability an important topic (e.g., special issue on data mining for sustainability in Data Mining and Knowledge Discovery Journal, March 2012).

Host: Dr. Kamalika Das
NASA Ames Research Center
MS 269-1, PO Box 1, Moffett Field, CA 94035

Prof. Morik continues her series of talks at Google


On Tuesday, 05/27/2014, Prof. Katharina Morik gives a talk about "Resource-aware graphical models and spatio-temporal predictions" at the Google Headquarters in Palo Alto, California, USA.

Machine learning can help to enhance small devices. For instance, keeping the energy consumption of smart phones low is one of the major concerns of the users, as is well illustrated by various “charge your mobile” stations at public places. Where the operating systems of smart phones already offer heuristics and battery apps show consumption profiles, machine learning can do more. Predictions allow better optimizations of the operating system, prepare for particular app usages at certain points in time, or manage services such as GPS or WLAN in a context-aware and adaptive manner. This challenges learning algorithms to real-time application of their models. Moreover, it demands the models to run on the resource-restricted device without consuming more energy themselves than they save!
In the talk, graphical models are presented that face these challenges. Using Conditional Random Fields (CRF) for the prediction of files that the user will fetch next on her smart phone can be used by the operating system for organizing the memory. Analyzing groups of apps running on the smart phone may estimate the energy consumption over time.
A novel spatio-temporal random field (STRF) has been implemented, smoothing the temporal changes and distributing the optimization. This graphical model has been used to predict app usage over time. In another application, it has been combined with a trip planner, resulting in smart routing for smart cities. In order to run graphical models on very restricted devices, even those without floating point calculation, a variant that computes with integer values only has been developed. The integer approximation of graphical models shows good accuracy and speed-up and opens up novel applications on resource-restricted devices.





Prof. Morik gives a talk about 'Data Analytics for Sustainability' at the University of Maryland, Baltimore County on Thursday 22 May 2014.


Sustainability has many facets and researchers from many disciplines are working on them. Particularly knowledge discovery always considered sustainability an important topic (e.g., special issue on data mining for sustainability in Data Mining and Knowledge Discovery Journal, March 2012).

  • Environmental tasks include risk analysis concerning floods, earthquakes, fires, and other disasters as well as the ability to react to them in order to guarantee resilience. The climate is certainly of influence and the debate on climate change received quite some attention.
  • Energy efficiency demands energy-aware algorithms, operating systems, green computing. System operations are to be adapted to a predicted user behavior such that the required processing is optimized with respect to minimal energy consumption.
  • Engineering tasks in manufacturing, assembly, material processing, and waste removal or recycling offer opportunities to save resources to a large degree. Adding the prediction precision of learning algorithms to the general knowledge of the engineers allows for surprisingly large savings.

Global reports on the millennium goals and open government data regarding sustainability are publicly available. For the investigation of influence factors, however, data analytics is necessary. Big data challenges the analysis to create data summaries. Moreover, the prediction of states is necessary in order to plan accordingly. In this talk, two case studies will be presented. Disaster management in case of a flood combines diverse sensor data streams for a better traffic administration. A novel spatiotemporal random field approach is used for smart routing based on traffic predictions. The other case study is in engineering and saves energy in the steel production based on the multivariate prediction of the processing end-point by the regression support vector machine.

11:00am-12:30pm, Thursday 22 May 2014, ITE 456, UMBC


Call for Papers - MLDM 2015

MLDM 2015

11th International Conference on Machine Learning and Data Mining

July 11 - 24, 2015, Freie Hansestadt Hamburg, Germany

This congress will feature three events: the 11th International Conference on Machine Learning and Data Mining MLDM, the 15th Industrial Conference on Data Mining ICDM (www.data-mining-forum.de), and the 10th International Conference on Mass Data Analysis of Signals and Images MDA (www.mda-signals.de). Workshops and tutorials will also be offered.

  • Submission of papers: January 15th, 2015
  • Notification of acceptance: February 28, 2015
  • Submission of camera-ready copy: April 5th, 2015

Katharina Morik in Vienna

Prof. Dr. Dr. h. c. Monika Henzinger and Prof. Dr. Katharina Morik with some participants of the college, where Katharina Morik gives a course on "Data Analytics".

More than a year after the Faculty of Computer Science at TU Dortmund conferred an honorary doctorate on Monika Henzinger, professor at the University of Vienna, Katharina Morik gives a course on "Data Analytics" as part of the interdisciplinary college at the computer science faculty of the University of Vienna. She also presented results of the SFB 876 in a well-attended colloquium lecture, "Big Data Analytics and Astrophysics".

Workshop: Needles In a Stream of Hay (NISH2014)

Workshop collocated with INFORMATIK 2014, September 22-26, Stuttgart, Germany.

This workshop focuses on the area where two branches of data analysis research meet: data stream mining, and local exceptionality detection.

Local exceptionality detection is an umbrella term describing data analysis methods that strive to find the needle in a haystack: outliers, frequent patterns, subgroups, etcetera. The common ground is that a subset of the data is sought where something exceptional is going on.

Data stream mining can be seen as a facet of Big Data analysis. Streaming data is not necessarily big in terms of volume; it can instead be big in terms of its high throughput rate. Storing all the data for later analysis is infeasible, so the relevant information in a data point has to be extracted when it arrives.
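
As a rough illustration of where the two branches meet, the following sketch flags exceptional values in a stream while keeping only constant-size summary statistics (Welford's one-pass mean and variance). It is an assumption-laden toy example, not a method from the workshop: the names `RunningStats` and `flag_exceptional`, the threshold of three standard deviations, and the warm-up length are all hypothetical choices.

```python
import math


class RunningStats:
    """Welford's one-pass mean/variance; no past data points are stored."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Sample standard deviation; 0.0 until at least two points were seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def flag_exceptional(stream, threshold=3.0, warmup=10):
    """Yield (index, value) for points far from the running mean.

    Each point is tested against the statistics of the points seen so far
    and is then folded into the summary; the raw stream is never recorded.
    threshold and warmup are illustrative defaults (assumptions).
    """
    stats = RunningStats()
    for i, x in enumerate(stream):
        std = stats.std()
        if stats.n >= warmup and std > 0 and abs(x - stats.mean) > threshold * std:
            yield i, x
        stats.update(x)
```

For example, `list(flag_exceptional(readings))` on a sequence of sensor readings returns the positions and values of the readings that deviate strongly from everything seen before them.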


Submissions are possible as either a full paper or extended abstract. Full papers should present original studies that combine aspects of both the following branches of data analysis:

  • stream mining: extracting the relevant information from data that arrives at such a high throughput rate that analysis or even recording of the individual records is prohibitive;
  • local exceptionality mining: finding subsets of the data where something exceptional is going on.

In addition, extended abstracts may present position statements or results of original studies concerning only one of the aforementioned branches.

Full papers can consist of a maximum of 12 pages; extended abstracts of up to 4 pages, following the LNI formatting guidelines. The only accepted format for submitted papers is PDF. Each paper submission will be reviewed by at least two members of the program committee.

NEM Position Paper on Big and Open Data

"NEM position papers are documents giving the NEM Initiative view on any subject related to the networked electronic media area. The NEM position papers typically include: letters of advice to the Commission, formal opinions submitted to the Commissioner, submissions to regulatory bodies, or any other formal statement of this nature, as well as further views of the NEM community on various technological, societal, and policy issues related to NEM." Source: www.nem-initiative.org

Many companies hope for big data

Our students at LS 8 learn exactly what is in demand at many companies.

Dortmunder postdoc Wouter Duivesteijn wins C.J. Kok Jury Award 2013.

Annually, the Faculty of Science at Leiden University, the Netherlands, grants the C.J. Kok Jury Award for the best PhD thesis of the past year. All institutes within the faculty (astronomy, physics, mathematics, computer science, chemistry, pharmacy, biology, and environmental sciences) are given the opportunity to nominate candidates for the award.

 Out of a pool of over 120 dissertations, the C.J. Kok Jury Award 2013 was won by Wouter Duivesteijn, with his thesis "Exceptional Model Mining". Notably, this is the first time ever that the award (existing since 1971) has been bestowed upon a computer scientist.

Book Announcement: RapidMiner: Data Mining Use Cases and Business Analytics Applications

The book "RapidMiner: Data Mining Use Cases and Business Analytics Applications" was published on 6 November 2013 by Chapman and Hall/CRC.

"In this book, case studies communicate how to analyze databases, text collections, and image data. … How the given data are transformed to meet the requirements of the method is illustrated by screenshots of RapidMiner. The RapidMiner processes and datasets described in the case studies are published on the companion web page of this book. The inspiring applications may be used as a blueprint and a justification of future applications."
—From the Foreword by Professor Dr. Katharina Morik, Technical University of Dortmund

SFB paper from LS 8 wins award at ECML PKDD 2013

The paper Spatio-Temporal Random Fields: Compressible Representation and Distributed Estimation by Nico Piatkowski, Sangkyun Lee and Katharina Morik is the winner of this year's ECMLPKDD 2013 machine learning best student paper award. The ceremony took place on Monday, September 23rd, in Prague (www.ecmlpkdd2013.org).

The article was selected for journal publication out of 182 papers; with an acceptance rate of 7%, there were 14 accepted journal publications. For the proceedings, 124 papers were selected out of 460 submissions (acceptance rate 26%). Of the 138 accepted submissions altogether, 4 won a best paper award. The above article by Nico Piatkowski, Sangkyun Lee and Katharina Morik is one of these.

EDBT/ICDT 2014 Call for Workshops

Workshops will take place on the last day of EDBT/ICDT 2014, 28 March 2014. More information about formatting guidelines and registration can be found here.

Deadline: 7. December

EDBT/ICDT 2014 Joint Conference: Call for papers

The International Conference on Extending Database Technology is a leading international forum for database researchers, practitioners, developers, and users to discuss cutting-edge ideas, and to exchange techniques, tools, and experiences related to data management. Data management is an essential enabling technology for scientific, engineering, business, and social communities. Data management technology is driven by the requirements of applications across many scientific and business communities, and runs on diverse technical platforms associated with the web, enterprises, clouds and mobile devices. The database community has a continuing tradition of contributing with models, algorithms and architectures, to the set of tools and applications enabling day-to-day functioning of our societies. Faced with the broad challenges of today's applications, data management technology constantly broadens its reach, exploiting new hardware and software to achieve innovative results.

EDBT 2014 invites submissions of original research contributions, as well as descriptions of industrial and application achievements, and proposals for tutorials and software demonstrations. We encourage submissions relating to all aspects of data management defined broadly, and particularly encourage work on topics of emerging interest in the research and development communities.

Deadline: 15. October 2013

LS8 at the International Broadcasting Convention (IBC) with the EU project Vista-TV

The highly respected conference with an exhibition, IBC, takes place in Amsterdam and Vista-TV is one of the exhibitors. In the Future Zone, Vista-TV presents realtime analytics of Internet-TV use. (more)

"With more than 50,000+ attendees from more than 160 countries, IBC combines a highly respected and peer-reviewed conference with an exhibition that exhibits more than 1,400 leading suppliers of state of the art electronic media technology...
Run by the industry, for the industry, IBC is owned by six industry partners that represent both exhibitors and visitors." (http://www.ibc.org/page.cfm/link=628)
Vista-TV provides users with real-time recommendations of shows and an excellent overview of the current TV program that eases the selection of a channel. In addition, for the producers of shows and for marketing companies, Vista-TV offers real-time statistics on viewing behavior. How many use the smartphone, the computer, or the large TV screen for watching Internet-TV right now? In which region are the viewers located? Between which channels do users switch frequently? All these real-time analyses respect the privacy of the users and do not allow a specific user to be traced. The statistics, however, are a source of valuable information.

Football analysis with the streams framework - TechniBall wins Audience Award!

In close cooperation with the Technion (Israel Institute of Technology), a system for the real-time analysis of football data was built on the *streams* framework for the competition of this year's DEBS conference. The task of the challenge was to compute statistics on the running and playing behavior of the players, who were equipped with motion and positioning sensors of the RedFIR system (Fraunhofer).
For the competition, Chair 8 developed the "TechniBall" system together with the Technion, based on the *streams* framework by Christian Bockermann. TechniBall is able to compute the required statistics considerably faster than real time (more than 250,000 events per second) and was voted the winner of the DEBS Challenge 2013 by the conference audience.

"Machine Learning and Knowledge Discovery in Databases" as one of the top 50% most downloaded eBooks at Springer

Since its online publication on Sep 04, 2008 there has been a total of 11732 chapter downloads of "Machine Learning and Knowledge Discovery in Databases". In 2012 it is still one of the top 50% most downloaded eBooks in the relevant Springer eBook Collection with 1055 downloads.

BBC about the project Vista TV

The BBC blogs about the Vista-TV project; in the post, Libby Miller shows visualizations of user behavior.

UBICOMM 2013: Call for papers

The goal of the International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies, UBICOMM 2013, is to bring together researchers from the academia and practitioners from the industry in order to address fundamentals of ubiquitous systems and the new applications related to them. The conference will provide a forum where researchers shall be able to present recent research results and new research problems and directions related to them. The conference seeks contributions presenting novel research in all aspects of ubiquitous techniques and technologies applied to advanced mobile applications.

Deadline: 17. May 2013

Positions for student assistants

At TU Dortmund, Faculty of Computer Science, Chair VIII, positions for student assistants of up to 10 hours per week are available immediately.

TechniBall - Solution for the DEBS Challenge 2013

LS8 analyzes football games in real time! Each player is equipped with a sensor, and so is the ball. The streams framework from LS8 is coupled with the Esper event recognition of Technion.

Towards a better TV experience with data stream algorithms - ViSTA TV Coding Camp at Chair 8

Television over the Internet (IP-TV) plays an ever greater role in today's media landscape. Greater program variety, television on mobile devices, and media libraries are just a few advantages of the new world of television. Optimizing the TV experience for every viewer requires a great deal of high tech in the background. The EU project ViSTA-TV studies the TV behavior of users, searches for similar shows, and thus tries to recommend the best possible program to the viewer. From the favorite show to interesting documentaries or the latest trends: in the abundance of offerings, the right thing is found for every viewer.

The ViSTA-TV project is a joint project of the universities of Zurich and Amsterdam and Chair 8 of the computer science faculty at TU Dortmund, together with the companies BBC, Zattoo, and the Dortmund-based company Rapid-I. The goal of the project is to analyze the viewing behavior of IPTV users in order, for example, to tailor show recommendations as precisely as possible to the needs and preferences of the viewers. To this end, the users' switch-on and channel-switching behavior as well as properties of the video signal (e.g. advertisement detection) are analyzed.

One challenge is the high data rate of the video data, which must be analyzed in real time. For this purpose, the data stream environment "streams", developed by Christian Bockermann at Chair 8, was extended with video analysis capabilities. This allows video data to be analyzed simultaneously with the corresponding channel-switching behavior from log data. The results are then processed further within a recommender system in order to offer users a tailored view of the TV program.

The researchers from Dortmund naturally also have the integration of further data sources in view, such as DBpedia, electronic TV guides, or the popular Internet Movie Database (imdb). In the spirit of "Big Data", all this information is analyzed promptly, so that information about actors, news, or current trends on Twitter and facebook also flows into the recommendations.

Coding Camp at the TU

This week, the second coding camp for the ViSTA-TV project takes place at TU Dortmund. The focus is in particular on integrating the modules of the project partners. The goal of the coding camp is a first running prototype of the project that offers program recommendations to viewers via mobile apps.

Jugend forscht: regional competition in Dortmund

On 19 February, the Jugend forscht regional competition takes place in Dortmund. In the rooms of the DASA Arbeitswelt Ausstellung, the young researchers present their ideas and work in various research fields to the jury. For the field of mathematics/computer science, Chair 8 of the Faculty of Computer Science is represented on the jury by Christian Bockermann, who is also a staff member in project C1 of the SFB.

Book on Managing and Mining Sensor Data published

The book Managing and Mining Sensor Data has been published as an ebook and will be available as hardcover from 28th of February 2013. The collaborative research center supported the book through the authors Marco Stolpe (project B3, Artificial Intelligence) and guest researcher Kanishka Bhaduri, who contributed the chapter on Distributed Data Mining in Sensor Networks.

Sensor networks in particular provide data at different, distributed locations. For efficient analysis, new technologies need to compute results even when communication resources are constrained.

IEEE International Conference on Data Mining

Katharina Morik organizes a panel on the value of data at the IEEE International Conference on Data Mining.

Two research associates wanted

The Chair of Artificial Intelligence is looking for two research staff members to start as soon as possible.

  • For the project DDMD (Data Driven Materials Development), a doctoral or postdoctoral researcher is sought. The project runs in cooperation with Univ. Duisburg-Essen and RUB. Further details can be found in the job advertisement.
  • For the project KobRA (Korpus-basierte linguistische Recherche und Analyse mit Hilfe von Data-Mining; corpus-based linguistic research and analysis with the help of data mining), a research staff member is likewise sought. The aim is to support corpus-based linguistics with data mining methods. Further details can be found in the job advertisement.

New newspaper article about Katharina Morik published

The German newspaper "Westdeutsche Allgemeine Zeitung" has published an article about Katharina Morik. The full article can be found on their website.

Job advertisement: Development of process-data-based real-time parameter adaptation in automated production processes

In energy- and resource-intensive industries, the challenge is to achieve rising product quality while simultaneously reducing costs and production times. Principles and methods of quality management and production systems modeled on the Japanese automotive industry are becoming the primary guiding model across industries. As an essential element of the TPS (Toyota Production System), the principle of process-inherent quality control, also known as Jidoka or autonomous automation, makes a decisive contribution. However, for automated, linked production processes, such as those found in the steel industry, the Jidoka principle cannot readily be realized by conventional means.

The goal of this doctoral project is to develop and validate a systematic approach to minimizing scrap and optimizing product quality in the context of rigidly linked, automated production processes. One possible approach is the concept of Advanced Process Control. Its central idea is the real-time, process-data-based monitoring and evaluation of production processes with the aim of compensating for short-term process fluctuations and thus ensuring product quality. For the production system outlined above, the doctoral project is to develop an approach that decides, based on the automated evaluation of process parameters, whether the quality of the product currently being processed meets the specifications, or whether and in what form an adjustment of the process parameters is necessary and possible in real time in order to meet the quality specifications. Alternatively, a further possible decision is to stop processing the product if the quality deviation cannot be corrected by adjusting the production process flow.

Besides developing the theoretical concept, the project comprises a simulation-based validation and, in close cooperation with Deutsche Edelstahlwerke GmbH at its Witten site, the integration of the concept into the operational production processes. State-of-the-art data mining techniques are to be used to solve the task.

Supervisor: Prof. Deuse

Applications are accepted immediately and should be sent to:

Dipl.-Wirt.-Ing. Uta Spörer
Tel.: +49 (231) 755 – 5787
Fax: +49 (231) 755 – 5772
E-Mail: spoerer@gsoflog.de
Mon - Thu: 8:30 - 12:30


LWA2012 from 12.09. to 14.09. at the Computer Science Department

LWA stands for "Lernen, Wissen, Adaption" (Learning, Knowledge, Adaptation). It is the joint forum of four special interest groups of the German Computer Science Society (GI). Following the tradition of the last years, LWA provides a joint forum for experienced and young researchers to gain insights into recent trends, technologies, and applications, and to promote interaction among the SIGs.

HIGHLIGHTS from the 5th Annual Rexer Analytics Data Miner Survey (2011)

  • SURVEY & PARTICIPANTS: 52-item survey of data miners, conducted on-line in 2011. Participants: 1,319 data miners from over 60 countries.
  • FIELDS & GOALS: Data miners work in a diverse set of fields. CRM/Marketing has been the #1 field for the past five years. Fittingly, “improving the understanding of customers”, “retaining customers” and other CRM goals continue to be the goals identified by the most data miners.
  • ALGORITHMS: Decision trees, regression, and cluster analysis continue to form a triad of core algorithms for most data miners. However, a wide variety of algorithms are being used. A third of data miners currently use text mining and another third plan to do so in the future.
  • TOOLS: R continued its rise this year and is now being used by close to half of all data miners (47%). R users report preferring it for being free, open source, and having a wide variety of algorithms. Many people also cited R's flexibility and the strength of the user community. STATISTICA is selected as the primary data mining tool by the most respondents (17%). Data miners report using an average of 4 software tools. STATISTICA, KNIME, Rapid Miner and Salford Systems received the strongest satisfaction ratings in 2011.
  • ANALYTIC CAPABILITY & SUCCESS MEASUREMENT: Only 12% of corporate respondents rate their company as having very high analytic sophistication. However, companies with better analytic capabilities are outperforming their peers. Respondents report analyzing analytic success via Return on Investment (ROI) and analyzing the predictive validity or accuracy of their models. Challenges to measuring success include client or user cooperation and data availability/quality.
  • SHARED INSIGHTS: In the 2010 Survey data miners shared best practices in overcoming the key challenges data miners face ( verbatims ). In the 2011 Survey data miners shared their best practices for measuring analytic success ( verbatims ) and examples of the positive impact that data mining can have to benefit society, health, and the world ( verbatims ). Additionally, 225 R users shared information about how and why they are using R ( verbatims ).
After the 2011 survey, the Rexer Analytics Data Miner Survey moved to a biennial schedule; the next Data Miner Survey will be launched in early 2013. Information about Rexer Analytics is available at www.RexerAnalytics.com

Grant application in line with the 4th call for proposals of the Mercator Research Center Ruhr (MERCUR) granted

The grant application Data Driven Materials Design (DDMD) was approved under the 4th call for proposals of the Mercator Research Center Ruhr (MERCUR). The project is a cooperation between Prof. Dr. Ralf Drautz and Prof. Dr. Alfred Ludwig (both Ruhr-University Bochum), Prof. Dr. Katharina Morik (Chair 8), and Prof. Dr. Sven Rahmann (University Duisburg-Essen). The combined use of experimental high-throughput methods and analytic modeling in materials research, especially in the fields of thin-layer material libraries, "Attribute-Screenings", and "Advance Materials Simulation", is one of Ruhr-University's unique features, which this application is intended to strengthen. These fields have in common that they generate extremely large amounts of multidimensional data that cannot be analyzed efficiently without the help of computers. Analyzing huge amounts of data is one of TU Dortmund's focus areas, of which Data Mining in particular is addressed here. At University Duisburg-Essen, the focus is on high-throughput analysis. The intention of this collaboration is to initiate a more rational development of new materials. The application aims to establish the foundation for the field of Data Driven Material Development. In this field, new discoveries as well as new insights (e.g. unknown phases or special physical properties) are expected to be gained. In addition, the development of new materials is to be sped up.

Top rating: EU proposal INSIGHT

The proposal "INtelligent Synthesis and Real-tIme Response using Massive StreaminG of HeTerogeneous Data" is the best-rated one in the FP7 field "Intelligent Information Management", reaching 14.5 of 15 possible points. The Department of Computer Science at TU Dortmund is involved through Chair 8. The coordinator is Dimitrios Gunopulos (National University Athens). The project is about analyzing the huge amount of heterogeneous data streams from sensors, mobile phones, and control systems in order to improve emergency management. Examples are taken from the city of Dublin and the German Federal Office of Civil Protection and Disaster Assistance. The innovation in data mining makes it possible to analyze social networks (e.g. Twitter), sensor networks, and traffic systems in combination, and to integrate citizens into this process.

ViSTA-TV started on June 1st

Live video content is increasingly consumed over IP networks in addition to traditional broadcasting. The move to IP provides a huge opportunity to discover what people are watching in much greater breadth and depth than currently possible through interviews or set-top box based data gathering by rating organizations. The ViSTA-TV project proposes to gather consumers’ anonymized viewing behavior and the actual video streams combined with enhanced electronic program guide information. ViSTA-TV will be in the position to provide highly accurate market research information about viewing behavior that can be used for a variety of analyses of high interest to all participants in the TV industry. Furthermore, ViSTA-TV will employ the information gathered to build a recommendation service. ViSTA-TV is a European Union-funded research project, beginning on 1 June 2012 and lasting for two years. The Artificial Intelligence Group participates alongside 5 other partners.

RapidMiner tested

Rapid-I is based in Dortmund, Germany, and has been working on RapidMiner, a data mining software, since 2001. With its wide range of other tools such as RapidAnalytics, RapidLab, RapidNet and RapidSentilyzer, it has won over clients such as Siemens, Allianz and Pepsico. The website JTonEDM.com introduces Rapid-I and its software RapidMiner in a short overview.

NEW MASTER'S/DIPLOMA THESIS AVAILABLE: Efficient detection of concept drifts under cyclic changes in steel mill processes

In today's industrial plants, sensors record large amounts of data during the production process. From these data, the quality of the end product is inferred while the process is still running. Due to production conditions, plant components and measurement equipment change during the running process and can only be maintained cyclically. These changes are also reflected in the prediction models: concept drift occurs. SMS Siemag and a leading manufacturer of heavy plates provide current production data for this thesis. In the bachelor's, diploma, or master's thesis, strategies for identifying concept drifts and for stabilizing the prediction quality are to be developed. A particular challenge is that concept drift detection and correction should happen in real time. The focus of the thesis is therefore on selecting, implementing, and comparing particularly efficient methods for detecting concept drifts.
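
To give an impression of what an efficient, real-time drift detector can look like, here is a minimal sketch of the Page-Hinkley test applied to a stream of prediction errors. It is only one possible candidate and is not taken from the thesis description; the class name `PageHinkley` and the parameter values `delta` and `threshold` are assumptions chosen for illustration.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream.

    Needs constant memory and constant time per observation.
    delta (tolerated deviation) and threshold (alarm level) are
    illustrative values, not tuned for any real process.
    """

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta
        self.threshold = threshold
        self.n = 0
        self.mean = 0.0     # running mean of the observations
        self.cum = 0.0      # cumulative deviation from the running mean
        self.min_cum = 0.0  # smallest cumulative deviation seen so far

    def update(self, x):
        """Feed one observation; return True if a drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold
```

In use, one would call `update` with the error of the prediction model for every finished product and, once it returns True, retrain or recalibrate the model and reset the detector.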

NEW MASTER'S/DIPLOMA THESIS AVAILABLE: Controlling processes in steel production with the help of multi-criteria optimization

In today's industrial plants, sensors record large amounts of data during the production process. From these data, the quality of the end product is inferred while the process is still running. So far, optimization usually handles only a single target variable. The quality of the end product, however, often depends on several target variables, which may moreover conflict with one another. This can be formalized as a multi-criteria optimization problem. In particular, a suitable fitness function must be determined. The users can then derive recommendations for action from the Pareto-optimal solutions. At LS8, sensor data on the production process of a leading manufacturer of heavy plates are available. Using this example, the formalization of conflicting target variables as multi-criteria optimization can be investigated. Implementations in Rapid-Miner can be used. The exact task will be adapted depending on whether it becomes a bachelor's, diploma, or master's thesis.
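
The notion of Pareto-optimal solutions can be made concrete with a small sketch. It assumes that every candidate setting has already been scored on each objective and that all objectives are to be minimized; the function names and the example objectives are hypothetical and unrelated to the actual production data.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical scores: (energy use, deviation from target quality).
candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
front = pareto_front(candidates)
```

Here `front` contains the three trade-offs `(1, 5)`, `(2, 3)`, and `(4, 1)`; the remaining two candidates are each beaten in every respect by one of them. The quadratic comparison is fine for small candidate sets, while larger sets would call for a sort-based method.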

Special Issue of the international journal Data Mining and Knowledge Discovery published!

Together with Kanishka Bhaduri and Hillol Kargupta, Katharina Morik has edited a special issue of the international journal Data Mining and Knowledge Discovery. The special issue on Data Mining for Sustainability including a comprehensive introduction is now online at http://www.springerlink.com/.

Project group presentation "Cooperative data mining with networked robots"

The project group "Kooperatives Datamining mit vernetzen Robotern" (cooperative data mining with networked robots) will be presented on 22 December 2011 at 14:00 (s.t.) in the new premises of Chair 8 (Joseph-von-Fraunhofer-Straße 23, room 1.48).

New diploma/master's thesis available: Personalizing hotel recommendations based on click paths

Searching for and booking hotels over the Internet is nowadays usually handled through specialized portals. Pure filtering by search criteria often still returns an unmanageable number of hotels. For retaining customers on a portal in the long term, however, it is crucial to be able to offer, as quickly as possible, hotels that are actually suitable for the respective person (or group of persons). Using methods of data mining and machine learning, user preferences are to be learned that enable personalized and thus more suitable hotel recommendations. For this purpose, the world's leading portal operator "Hotel Reservation Service" (HRS) provides data on hotels, portal visitors, customers, bookings, and hotel ratings.

KDD 2011 Workshop on Data Mining Applications in Sustainability in San Diego, CA

The annual ACM SIGKDD conference is the premier international forum for data mining researchers and practitioners from academia, industry, and government to share their ideas, research results and experiences. KDD-2011 will feature keynote presentations, oral paper presentations, poster sessions, workshops, tutorials, panels, exhibits, demonstrations, and the KDD Cup competition. KDD-2011 will run from August 21-24 in San Diego, CA and will feature hundreds of practitioners and academic data miners converging on one location.

Overview of the impact of leading database and data mining journals in 2010 published

Being in the editorial boards of Knowledge and Information Systems (KAIS) and of Data Mining and Knowledge Discovery (DMKD), Katharina Morik happily presents the impact factors (2010) of some leading database and data mining journals:
  • ACM Transactions on Information Systems (TOIS): 1.085
  • ACM Transactions on Database Systems (TODS): 1.216
  • Data Mining and Knowledge Discovery (DMKD): 1.238
  • Information Systems (IS): 1.595
  • Data and Knowledge Engineering (DKE): 1.717
  • IEEE Transactions on Knowledge and Data Engineering (TKDE): 1.847
  • Machine Learning (ML): 1.956
  • Knowledge and Information Systems (KAIS): 2
Download the complete list

New Topic for a Master-/DA- Thesis: Feature Extraction from video-data

Besides YouTube and the like, the Internet is, with increasing bandwidth, becoming ever more interesting for classic television broadcasts as well. While IP-TV has so far mostly been in focus for major sporting events, companies such as zattoo.com already offer the possibility to use a multitude of different channels and to record and archive shows online. But how does one find interesting shows? Which information reveals the programs that I like? Can special-interest channels be distinguished solely on the basis of information from the video data? This master's thesis is about extracting features that are important for classifying or grouping shows, channels, or TV viewers.

Feature Selection Extension for RapidMiner - NEW RELEASE 1.1.3

The Feature Selection Extension for RapidMiner 5 contains some operators for feature selection, feature weighting, and classification. All operators are also highly suitable for high-dimensional data, e.g. microarray data. New in this version are:
  • RCCW - Recursive Conditional Correlation Weighting, a very fast feature subset selection method
  • FCBF - Fast Correlation Based Feature Selection
  • PAM - Classification by Shrunken Centroids
  • BAHSIC - Backward Feature Selection via Hilbert-Schmidt information criterion
  • t-Test - Computes a p-Value for the difference of the mean values between two classes
  • Test Significance - Assumes normal distribution, then checks for equal class variances via F-test and afterwards computes p-Value via t-Test or Welch-test
  • Benjamini-Hochberg-Correction - Performs the correction for FDR on significance values in an AttributeWeights object
Already available in older versions are, amongst others, Recursive Feature Elimination (RFE), minimum Redundancy Maximum Relevance Feature Selection (MRMR) / Correlation based Feature Selection (CFS) and a meta-operator for ensemble feature selection. The most recent version is available for free from SourceForge: https://sourceforge.net/projects/rm-featselext/ .
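To give a rough idea of what a Benjamini-Hochberg correction computes, here is a minimal Python sketch of the standard step-up procedure for adjusting p-values. It is an illustration only and not the code of the RapidMiner operator; the function name `benjamini_hochberg` and the sample p-values are assumptions made for this example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Illustrative sketch only (hypothetical helper, not the extension's code).
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest, scaling each by
    # m / rank and enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values, e.g. one per feature from a t-test.
    p = [0.01, 0.04, 0.03, 0.005]
    for raw, adj in zip(p, benjamini_hochberg(p)):
        print(f"raw p = {raw:.3f}, adjusted p = {adj:.3f}")
```

A feature would then be kept if its adjusted p-value lies below the chosen false discovery rate, for example 0.05.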

RapidMiner is the most popular data mining tool according to KDnuggets poll

RapidMiner is again the most popular data mining tool in the KDnuggets poll.

Colloquium of the Collaborative Research Center SFB 876 on June 30th, 2011: Prof. Preeti Ranjan Panda (Indian Institute of Technology Delhi)

Graphics processor (GPU) architectures have evolved rapidly in recent years with increasing performance demanded by 3D graphics applications such as games. However, challenges exist in integrating complex GPUs into mobile devices because of power and energy constraints, motivating the need for energy efficiency in GPUs. While a significant amount of power optimisation research effort has concentrated on the CPU system, GPU power efficiency is a relatively new and important area because the power consumed by GPUs is similar in magnitude to CPU power. Power and energy efficiency can be introduced into GPUs at many different levels: (i) Hardware component level - queue structures, caches, filter arithmetic units, interconnection networks, processor cores, etc., can be optimised for power. (ii) Algorithm level - the deep and complex graphics processing computation pipeline can be modified to be energy aware. Shader programs written by the user can be transformed to be energy aware. (iii) System level - co-ordination at the level of task allocation, voltage and frequency scaling, etc., requires knowledge and control of several different GPU system components. (Weiter...  )

Colloquium of the Collaborative Research Center SFB 876 on June 9th, 2011: Prof. Piero Bonatti (University of Naples)

An increasing amount of information is being encoded via ontologies and knowledge representation languages of some sort. Some of these knowledge bases are encoded manually, while others are generated automatically by information extraction techniques. In order to protect the confidentiality of this information, a natural choice consists in encoding policies with the same language as the ontology language. This approach led to so-called "semantic web policies". The semantic web is founded on two knowledge representation languages: description logics and logic programs. In this talk we compare their expressive power as *policy* representation languages, and show that logic programming approaches are currently more mature than description logics, although this picture may change in the near future. (Weiter...  )

Colloquium of the Collaborative Research Center SFB 876 on May 5th, 2011: Henrik Blunck (University of Aarhus)

Emerging and envisioned applications within domains such as indoor navigation, fire-fighting, and precision agriculture still pose challenges for existing positioning solutions to operate accurately, reliably, and robustly in a variety of environments and conditions and under various application-specific constraints. This talk will first give a brief overview of efforts made in a Danish project to address the challenges mentioned above, and will subsequently focus on addressing the energy constraints imposed by Location-based Services (LBS) running on mobile user devices such as smartphones. A variety of LBS, including services for navigation, location-based search, social networking, games, and health and sports trackers, demand the positioning and trajectory tracking of smartphones. To be useful, such tracking has to be energy-efficient to avoid having a major impact on the battery life of the mobile device, since the battery capacity in modern smartphones is a scarce resource and is not increasing at the same pace as new power-demanding features, including various positioning sensors, are added to such devices. We present novel on-device sensor management and trajectory updating strategies which intelligently determine when to sample different on-device positioning sensors (accelerometer, compass and GPS), when data should be sent to a remote server, and to what extent it should be simplified beforehand in order to save communication costs. The resulting system is provided as a uniform framework for both position and trajectory tracking and is configurable with regard to accuracy requirements. The effectiveness of our approach and the energy savings achievable are demonstrated both by emulation experiments using real-world data and by real-world deployments.

MonetDB: Open-source Columnar Database Technology Beyond Textbooks - Talk by Stefan Manegold

Stefan Manegold from CWI Amsterdam will be giving a talk on the column-store DBMS MonetDB on 2011/02/11 at 16:00 in room E23, Otto-Hahn-Straße 14.

Column-store database management systems have recently experienced a considerable boost in popularity. The underlying ideas, however, date back to (at least) the mid-1980s, and the technology has been pioneered since the early 1990s in the MonetDB system, a column-store research prototype that has been developed into a complete SQL- and XML/XQuery-compliant column-store DBMS freely available as open source. Next to its column-store backbone, MonetDB focuses on high-performance hardware-conscious algorithms, novel workload-adaptive query processing techniques such as "cracking", "recycling" and run-time query optimization, and extensibility at all layers of its software stack.

In this talk, we will provide detailed insight into MonetDB's column-store architecture and query-processing technology as available in open-source, discussing its benefits for data mining, OLAP, BI, as well as science workloads.

Kick-off Colloquium of the SFB 876 - Slides Now Online!

The new Collaborative Research Center SFB 876 "Providing Information by Resource-Constrained Data Analysis" starts the new year with a kick-off colloquium. The colloquium takes place on January 20th, 2011, starting at 4 pm in auditorium E23, Otto-Hahn-Straße 14, TU Dortmund University campus. For further information about the program and talks, please have a look at the attachment.

SFB 876 - The Application Deadline Has Passed

At this time, no further applications for open positions at the SFB 876 are being accepted.

SFB 876 granted!

The DFG granted the SFB 876. (Weiter...  )

Presentations online!

First presentations and pictures available on the MODAP workshop website. (Weiter...  )

First International Workshop on Social and Privacy aspects of the Mobility

Analyzing huge amounts of mobility data has posed new challenges not only in the discovery and interpretation of interesting patterns, but also in preserving the privacy of the individuals under observation. However, the social and privacy aspects of mobility have not been studied in a systematic and combined way, and awareness and understanding of their effects on our lives are still in their infancy. The convergence of these complementary aspects, and more specifically the way that mobility affects (or is affected by) the social behavior of individuals and their privacy, gives rise to the exciting new area of "socio-mobility". Socio-mobility raises a number of challenging questions. Are people who move together socially related? Are there social relations between people moving to semantically similar places? How could we combine mobility data and patterns with social networking information? Can social interactions be mined from mobility data by using external media? To what extent do social interactions affect privacy? What are the risks of disclosing social interactions between people, and how can we design privacy-preserving techniques to minimize the risks? What kinds of social interactions are considered sensitive, and how can we model / distort / suppress such interactions?

Interdisciplinary College in Günne at Lake Möhne, March 25 - April 1, 2011

The Interdisciplinary College (IK) is an annual, intense one-week spring school which offers a dense state-of-the-art course program in neurobiology, neural computation, cognitive science/psychology, artificial intelligence, robotics and philosophy. It is aimed at students, postgraduates and researchers from academia and industry. (Weiter...  )

PG 542 Final presentation

The student project group 542 "Stream Mining for Intrusion Detection in Distributed Systems" has successfully finished its work on a generic framework for online and distributed data mining. All results, including the system architecture, an evaluation of learning algorithms and a live demo covering the use case of intrusion detection, will be presented on Thursday, 28th October, at 10.15 in GB 4, room 136.

RapidMiner Hierarchical Heavy Hitters Plugin

After presenting our paper "Implementing Hierarchical Heavy Hitters in RapidMiner: Solutions and Open Questions" at RCOMM 2010, we have released all accompanying Java code as a RapidMiner 5 plugin. The plugin can be used to calculate Hierarchical Heavy Hitters on system call data. It also contains domain-independent Java implementations of the related algorithms.

RapidMiner Microarray Feature Selection Plugin released

The Microarray Feature Selection Plugin for RapidMiner 5 contains some feature selection and -weighting operators useful for working on high-dimensional (microarray-) data. These are - amongst others - Recursive Feature Elimination (RFE) and minimum Redundancy Maximum Relevance Feature Selection (MRMR) / Correlation based Feature Selection (CFS) and a meta-operator for ensemble feature selection. (Weiter...  )

RapidMiner Information Extraction Extension

The RapidMiner Information Extraction extension supports Information Extraction techniques in RapidMiner. Visualizers, annotators and preprocessing operators have been implemented for textual data. Structured models, namely Conditional Random Fields, are available for the extraction of named entities. Operators to extract relations will be available soon.

Summer School on Mobility, Data Mining, and Privacy

The 1st Summer School on Mobility, Data Mining, and Privacy is co-organized by the FP7/ICT project MODAP "Mobility, Data Mining, and Privacy" (www.modap.org) and the COST Action IC0903 MOVE "Knowledge Discovery from Moving Objects" (http://move-cost.info/). This is the first doctoral school ever on the 'hot' intersection of three domains: modeling and management of moving object databases (Mobility), data analysis and knowledge discovery from mobility data (Data Mining), and privacy aspects that arise when processing human mobility (Privacy).

NEW Book: Ubiquitous Knowledge Discovery

Knowledge discovery in ubiquitous environments is an emerging area of research at the intersection of the two major challenges of highly distributed and mobile systems and advanced knowledge discovery systems. The new book, edited by Michael May and Lorenza Saitta, provides a state-of-the-art survey. It is the outcome of a large number of workshops, summer schools, tutorials and dissemination events of the European project KDubiq. (Weiter...  )

Initiative for Data Analysis under Resource Constraints - Meeting in Bommerholz on August 24-25, 2009

Bringing together embedded systems and data mining enables new solutions in computer science, biomedicine, physics and mechanical engineering. Embedded systems can be further improved using machine learning, while data mining algorithms can be realized in hardware, e.g. FPGAs. The restrictions in computing power, memory and energy demand new algorithms for known learning tasks. At Bommerholz, 26 scientists and researchers from TU Dortmund and University Duisburg-Essen came together to gain a deeper understanding of the topic and to exchange progress on ongoing projects.

RapidMiner -- most used open source data mining tool

RapidMiner is the most successful open source data mining tool for the third year in a row -- only the commercial product Clementine (SPSS PASW Modeler) is more popular.


Asian Conference on Machine Learning
November 8-10, 2010, Tokyo, Japan (Weiter...  )

Special Issue on Sustainability of the Data Mining and Knowledge Discovery Journal

Special Issue on Sustainability of the Data Mining and Knowledge Discovery Journal

Recording of Talk

Katharina Morik's talk "Handling Texts -- A Challenge for Data Mining" (in English), introduced by Jean-Gabriel Ganascia at the 9th francophone expert conference on Machine Learning and Data Mining, Strasbourg 2009

Videolink for other players

Machine learning and biology

Lecture by Yoav Freund at ECML PKDD 2008 on machine learning and biology

BioDatabases (bioDatabases.m4v, 170.6 MB)

Informatik kompakt

Drawing on experience from the 1999 lecture DAP1, a new textbook has finally been published. The book introduces the fundamentals common to different areas of computer science by means of the programming language Java.

Equal Opportunities for Women at Universities

Prof. Dr. Katharina Morik was asked for a statement about equal opportunity for women.

The resulting TV report was shown on 07/11/2007 during the "tagesschau" news broadcast.

Source: Tagesschau-archive


Several programs have been developed at the AI unit within its research activities, such as myKLR, SVMlight, mySVM, RapidMiner (formerly YALE), the Information Layer or the USCHIFICATOR. Check our software page for a complete list. (Weiter...  )

The Chair Is Moving!