News from the Artificial Intelligence Group

The chair of artificial intelligence works across the wide field of machine learning. In particular, the chair concentrates on developing and implementing learning algorithms that solve challenging problems.

LS8 publishes SpringerBrief on boosting statistical relational learners

This SpringerBrief addresses the challenges of analyzing multi-relational and noisy data by proposing several Statistical Relational Learning (SRL) methods. These methods combine the expressiveness of first-order logic and the ability of probability theory to handle uncertainty. It provides an overview of the methods and the key assumptions that allow for adaptation to different models and real-world applications. The models are highly attractive due to their compactness and comprehensibility, but learning their structure is computationally intensive. To combat this problem, the authors review the use of functional gradients for boosting the structure and the parameters of statistical relational models. The algorithms have been applied successfully in several SRL settings and have been adapted to several real problems, from information extraction in text to medical problems. Including both context and well-tested applications, Boosting Statistical Relational Learning from Benchmarks to Data-Driven Medicine is designed for researchers and professionals in machine learning and data mining. Computer engineers or students interested in statistics, data management, or health informatics will also find this brief a valuable resource.

Student assistant positions available from January 2015

Lehrstuhl VIII of the Faculty of Computer Science at TU Dortmund has student assistant positions to fill from January 2015.

Lecture Natural Language Systems (Katharina Morik)

IBM Watson

Google, Facebook, and Netflix need natural language processing for many of their services. Google, for example, has a large Natural Language Processing department: http://research.google.com/pubs/NaturalLanguageProcessing.html

In February 2011, the IBM program Watson answered natural-language questions in the quiz show Jeopardy better than two human quiz champions.

Ray Kurzweil (Google Director of Engineering) wants to go beyond that: “So IBM’s Watson is a pretty weak reader on each page, but it read the 200m pages of Wikipedia. And basically what I'm doing at Google is to try to go beyond what Watson could do.” http://searchengineland.com/ray-kurzweils-job-google-beat-ibms-watson-natural-language-search-185149

There is a wealth of methods for analyzing very large amounts of text, serving an equally large number of applications: sentiment analysis, personalized advertising, recommendations, email routing, automatic text generation for short news items and reporting, automatic question answering, and information extraction from the WWW. In the lecture with exercises, you will get to know the methods and tools for these tasks. The new teaching concept includes inverted classroom sessions and independent work, so that you are prepared for practice. http://www-ai.cs.uni-dortmund.de/LEHRE/VORLESUNGEN/NLS/WS1415/index.html

Lecture Probabilistic Graphical Models (Kristian Kersting)

How does one act under uncertainty, with missing or erroneous data? Probabilistic graphical models have proven their worth in recent years for dealing with such uncertainties. They are part of modern information technology's efforts to enable reasoning under uncertainty.

Tag cloud: probabilistic graphical models

Prominent application areas are robotics, bioinformatics, artificial intelligence, and machine learning. For example, these models are used in the evaluation of medical data, the analysis of gene expression data, and the tracking of movements. The LS8 lecture "Probabilistische Graphische Modelle" (Probabilistic Graphical Models) covers fundamental questions and techniques of graphical models. http://www-ai.cs.uni-dortmund.de/LEHRE/VORLESUNGEN/PGM/WS1415/index.html

Lecture Large-Scale Optimization (Sangkyun Lee)

Optimization

In general, data are often cheaper to obtain than the knowledge of experts, which has to be extracted and then modeled. But how can computers automatically estimate large models from data, such as those that arise in natural language processing, in the estimation of graphical models, and in statistical machine learning?

At the core of most learning methods is an optimization task: the error is to be minimized, or the probability of the correct result maximized. The lecture "Large-Scale Optimization", held in English, covers the theoretical foundations and methods.
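
As a minimal illustration of learning as an optimization task, the following sketch fits a line to a few points by minimizing the mean squared error with plain gradient descent. The function name `fit_line`, the toy data, the learning rate, and the number of steps are assumptions chosen for this example; they are not taken from the lecture, which deals with far larger models.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w * x + b by gradient descent on the mean squared error.

    Hypothetical helper for illustration only.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current model on every training point.
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradient of (1/n) * sum(residual^2) with respect to w and b.
        grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for the example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
```

On this noise-free toy data the estimates approach the slope 2 and intercept 1 that generated the points. Large-scale settings typically replace the full sums with stochastic or distributed updates.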

Project group (PG) Infoscreen (Kristian Kersting, Hendrik Blom)

Infoscreen

The approaches from all of these lectures can then be applied in the project group "Infoscreen". Infoscreens are digital display surfaces intended to attract particular attention in "low-stimulus" public spaces.

The infoscreen is to provide information about current events at the Faculty of Computer Science of TU Dortmund.

KDD 2014 sold out

KDD 2014 is sold out. They had to close registrations. 2200 attendees will enjoy the conference next week in Times Square. Katharina Morik gives a keynote talk at the workshop BigMine’14.

The virtual steel works

Following the press conference, the LS8 project (Katharina Morik, Hendrik Blom, Tobias Beckers), carried out in collaboration with SMS Siemag and Dillinger Hütte, is outlined in two interviews: one with Dominik Schöne of Dillinger Hütte and one with Katharina Morik.

ViSTA-TV in a Nutshell

The European project ViSTA-TV held its successful final review meeting in Amsterdam on 1 July. LS 8 contributed live stream analysis that separates ads from shows in internet television. Online recommendations of shows based on user behavior were produced using Termset Clustering.

 

Mediaday of the SMS Group in Hilchenbach

Media day of the SMS Group in Hilchenbach on 3 July 2014

Data Mining/ Industrie 4.0

Summary talk by Katharina Morik about "Data Mining, Big Data and Prediction Models"

Talk at the TU Dortmund: What happens to our data? Between permanent harassment paranoia and post-privacy

Wednesday, 2 July 2014, 16:00 (s.t.) - 18:30, P1-05-309
  • Kristian Kersting (Chair for artificial intelligence)
  • Sarah Küsgen (Chair for service and technology management)
  • Kai-Uwe Loser (Data security engineer of the RUB)
  • Johannes Weyer & Robin D. Fink (Sociology of Technology research area)


The most recent revelations by whistleblower Edward Snowden have shown once again how attractive it is to collect massive amounts of data in the age of social media.

The question 'what happens to our data?' will be investigated at this event from technical, economic, and sociological perspectives. The technical possibilities of modern data mining are diverse and allow conclusions down to the individual level. Data collected from social networks are especially attractive for marketing and product design. Against this background, the protection of privacy faces new challenges.

Each contributor will give a 10-15 minute talk and will afterwards take part in a discussion with the audience. The event will be moderated by Johannes Weyer.

Move to Otto-Hahn-Str. 12

The chair for Artificial Intelligence is moving to the new building in Otto-Hahn-Str. 12. Thus, between 06/30/14 and 07/04/14 we may not be available at all times.

Talk at VigLink: Resource-aware graphical models

Prof. Morik talks at VigLink

Abstract:
Machine learning can help to enhance small devices. For instance, keeping the energy consumption of smart phones low is one of the major concerns of the users, as is well illustrated by various “charge your mobile” stations at public places. Where the operating systems of smart phones already offer heuristics and battery apps show consumption profiles, machine learning can do more. Predictions allow better optimizations of the operating system, prepare for particular app usages at certain points in time, or manage services such as GPS or WLAN in a context-aware and adaptive manner. This challenges learning algorithms to real-time application of their models. Moreover, it demands the models to run on the resource-restricted device without consuming more energy themselves than they save!

Talk at NASA: Data Analytics for Sustainability

Title: Data Analytics for Sustainability

  • Speaker: Katharina Morik, Technische Universität Dortmund
  • Date & Time: Wednesday, May 28, 2:00 pm - 3:00 pm
  • Location: Building N245 Auditorium

Abstract:

Sustainability has many facets and researchers from many disciplines are working on them. Particularly knowledge discovery always considered sustainability an important topic (e.g., special issue on data mining for sustainability in Data Mining and Knowledge Discovery Journal, March 2012).

Host: Dr. Kamalika Das
NASA Ames Research Center
MS 269-1, PO Box 1, Moffett Field, CA 94035

Prof. Morik continues her lecture series at Google

 

On Tue 05/27/2014 Prof. Katharina Morik gives a talk about "Resource-aware graphical models and spatio-temporal predictions" at the Google Headquarters in Palo Alto, California, USA.

Abstract:
Machine learning can help to enhance small devices. For instance, keeping the energy consumption of smart phones low is one of the major concerns of the users, as is well illustrated by various “charge your mobile” stations at public places. Where the operating systems of smart phones already offer heuristics and battery apps show consumption profiles, machine learning can do more. Predictions allow better optimizations of the operating system, prepare for particular app usages at certain points in time, or manage services such as GPS or WLAN in a context-aware and adaptive manner. This challenges learning algorithms to real-time application of their models. Moreover, it demands the models to run on the resource-restricted device without consuming more energy themselves than they save!
In the talk, graphical models are presented that face these challenges. Conditional Random Fields (CRF) predict the files that the user will fetch next on her smart phone, and the operating system can use these predictions for organizing the memory. Analyzing groups of apps running on the smart phone makes it possible to estimate the energy consumption over time.
A novel spatio-temporal random field (STRF) has been implemented, smoothing the temporal changes and distributing the optimization. This graphical model has been used to predict app usage over time. In another application, it has been combined with a trip planner, resulting in smart routing for smart cities. In order to run graphical models on very restricted devices, even those without floating-point calculation, a variant that computes with integer values only has been developed. The integer approximation of graphical models shows good accuracy and speed-up and opens up novel applications on resource-restricted devices.

Prof. Morik gave a talk about 'Data Analytics for Sustainability' at Cornell University in New York, USA

 

Sustainability has many facets and researchers from many disciplines are working on them. Particularly knowledge discovery always considered sustainability an important topic (e.g., special issue on data mining for sustainability in Data Mining and Knowledge Discovery Journal, March 2012).

Prof. Morik gives a talk about 'Data Analytics for Sustainability' at the University of Maryland, Baltimore County on Thursday 22 May 2014.

 

Sustainability has many facets and researchers from many disciplines are working on them. Particularly knowledge discovery always considered sustainability an important topic (e.g., special issue on data mining for sustainability in Data Mining and Knowledge Discovery Journal, March 2012).

  • Environmental tasks include risk analysis concerning floods, earthquakes, fires, and other disasters as well as the ability to react to them in order to guarantee resilience. The climate is certainly of influence and the debate on climate change received quite some attention.
  • Energy efficiency demands energy-aware algorithms, operating systems, green computing. System operations are to be adapted to a predicted user behavior such that the required processing is optimized with respect to minimal energy consumption.
  • Engineering tasks in manufacturing, assembly, material processing, and waste removal or recycling offer opportunities to save resources to a large degree. Adding the prediction precision of learning algorithms to the general knowledge of the engineers allows for surprisingly large savings.

Global reports on the millennium goals and open government data regarding sustainability are publicly available. For the investigation of influence factors, however, data analytics is necessary. Big data challenges the analysis to create data summaries. Moreover, the prediction of states is necessary in order to plan accordingly. In this talk, two case studies will be presented. Disaster management in case of a flood combines diverse sensor data streams for a better traffic administration. A novel spatiotemporal random field approach is used for smart routing based on traffic predictions. The other case study is in engineering and saves energy in the steel production based on the multivariate prediction of the processing end-point by the regression support vector machine.

11:00am-12:30pm, Thursday 22 May 2014, ITE 456, UMBC

Call for Papers - MLDM 2015

MLDM 2015

11th International Conference on Machine Learning and Data Mining

July 11 - 24, 2015, Freie Hansestadt Hamburg, Germany

This congress will feature three events: the 11th International Conference on Machine Learning and Data Mining MLDM, the 15th Industrial Conference on Data Mining ICDM (www.data-mining-forum.de), and the 10th International Conference on Mass Data Analysis of Signals and Images MDA (www.mda-signals.de). Workshops and tutorials will also be offered.

  • Submission of papers: January 15th, 2015
  • Notification of acceptance: February 28, 2015
  • Submission of camera-ready copy: April 5th, 2015

Katharina Morik in Vienna

Dortmunder postdoc Wouter Duivesteijn wins C.J. Kok Jury Award 2013.
Prof. Dr. Dr. h. c. Monika Henzinger and Prof. Dr. Katharina Morik with some participants of the college, where Katharina Morik gives a course on “Data Analytics”.

More than a year after the Faculty of Computer Science at TU Dortmund conferred an honorary doctorate on Monika Henzinger, Professor at the University of Vienna, Katharina Morik is giving a course on "Data Analytics" as part of the interdisciplinary college at the University of Vienna's computer science faculty. She also presented results of the SFB876, "Big Data Analytics and Astrophysics", in a well-attended colloquium lecture.

Workshop: Needles In a Stream of Hay (NISH2014)

Workshop collocated with INFORMATIK 2014, September 22-26, Stuttgart, Germany.

This workshop focuses on the area where two branches of data analysis research meet: data stream mining, and local exceptionality detection.

Local exceptionality detection is an umbrella term describing data analysis methods that strive to find the needle in a haystack: outliers, frequent patterns, subgroups, et cetera. The common ground is that a subset of the data is sought where something exceptional is going on.

Data stream mining can be seen as a facet of Big Data analysis. Streaming data is not necessarily big in terms of volume; it can instead be big in terms of its high throughput rate. Storing all the data for later analysis is infeasible, so the relevant information in a data point has to be extracted when it arrives.
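
To make the combination of the two branches concrete, here is a minimal sketch that processes a stream one data point at a time without storing it. It maintains a running mean and variance with Welford's online algorithm and flags points that deviate strongly from what has been seen so far. The class name `StreamingOutlierDetector`, the threshold of three standard deviations, the warm-up length, and the sample values are assumptions for illustration only; they are not part of the workshop material.

```python
import math


class StreamingOutlierDetector:
    """Flags exceptional values in a stream using O(1) memory.

    Hypothetical example class; uses Welford's online algorithm
    to maintain the running mean and variance.
    """

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # allowed deviation in standard deviations
        self.warmup = warmup        # points to absorb before flagging anything
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # sum of squared deviations from the mean

    def update(self, value):
        """Return True if value is exceptional, then absorb it into the statistics."""
        is_outlier = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(value - self.mean) > self.threshold * std:
                is_outlier = True
        # Welford update: no past data points need to be kept.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return is_outlier


# Made-up sensor readings with one obvious "needle".
stream = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 25.0, 10.1]
detector = StreamingOutlierDetector()
needles = [x for x in stream if detector.update(x)]
```

In this sketch every point, including a flagged one, is absorbed into the running statistics, which inflates the variance after an outlier. Whether that is desirable depends on the application.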

Submission

Submissions are possible as either a full paper or extended abstract. Full papers should present original studies that combine aspects of both the following branches of data analysis:

  • stream mining: extracting the relevant information from data that arrives at such a high throughput rate that analysis, or even recording, of all records in the data is infeasible;
  • local exceptionality mining: finding subsets of the data where something exceptional is going on.

In addition, extended abstracts may present position statements or results of original studies concerning only one of the aforementioned branches.

Full papers can consist of a maximum of 12 pages; extended abstracts of up to 4 pages, following the LNI formatting guidelines. The only accepted format for submitted papers is PDF. Each paper submission will be reviewed by at least two members of the program committee.

NEM Position Paper on Big and Open Data

"NEM position papers are documents giving the NEM Initiative view on any subject related to the networked electronic media area. The NEM position papers typically include: letters of advice to the Commission, formal opinions submitted to the Commissioner, submissions to regulatory bodies, or any other formal statement of this nature, as well as further views of the NEM community on various technological, societal, and policy issues related to NEM." Source: www.nem-initiative.org

Many companies hope for big data

Our students at LS 8 learn exactly what is in demand at many companies.

Dortmunder postdoc Wouter Duivesteijn wins C.J. Kok Jury Award 2013.

Annually, the Faculty of Science at Leiden University, the Netherlands, grants the C.J. Kok Jury Award for the best PhD thesis of the past year. All institutes within the faculty (astronomy, physics, mathematics, computer science, chemistry, pharmacy, biology, and environmental sciences) are given the opportunity to nominate candidates for the award.

 Out of a pool of over 120 dissertations, the C.J. Kok Jury Award 2013 was won by Wouter Duivesteijn, with his thesis "Exceptional Model Mining". Notably, this is the first time ever that the award (existing since 1971) has been bestowed upon a computer scientist.

Book Announcement: RapidMiner: Data Mining Use Cases and Business Analytics Applications

The book "RapidMiner: Data Mining Use Cases and Business Analytics Applications" was published on 6 November 2013 by Chapman and Hall/CRC.

"In this book, case studies communicate how to analyze databases, text collections, and image data. … How the given data are transformed to meet the requirements of the method is illustrated by screenshots of RapidMiner. The RapidMiner processes and datasets described in the case studies are published on the companion web page of this book. The inspiring applications may be used as a blueprint and a justification of future applications."
—From the Foreword by Professor Dr. Katharina Morik, Technical University of Dortmund

SFB paper by LS 8 wins award at ECML PKDD 2013

The paper "Spatio-Temporal Random Fields: Compressible Representation and Distributed Estimation" by Nico Piatkowski, Sangkyun Lee and Katharina Morik is the winner of this year's ECML PKDD 2013 machine learning best student paper award. The ceremony took place on Monday, September 23rd, in Prague (www.ecmlpkdd2013.org).

The article was selected for journal publication from 182 papers; 14 were accepted for the journal, an acceptance rate of 7%. For the proceedings, 124 papers were selected out of 460 submissions (acceptance rate 26%). Of the 138 accepted submissions altogether, 4 won a best paper award. The above article by Nico Piatkowski, Sangkyun Lee and Katharina Morik is one of these.

EDBT/ICDT 2014 Call for Workshops

Workshops will be held on the last day of EDBT/ICDT 2014, 28 March 2014. More information about formatting guidelines and registration can be found here.

Deadline: 7 December

EDBT/ICDT 2014 Joint Conference: Call for papers

The International Conference on Extending Database Technology is a leading international forum for database researchers, practitioners, developers, and users to discuss cutting-edge ideas, and to exchange techniques, tools, and experiences related to data management. Data management is an essential enabling technology for scientific, engineering, business, and social communities. Data management technology is driven by the requirements of applications across many scientific and business communities, and runs on diverse technical platforms associated with the web, enterprises, clouds and mobile devices. The database community has a continuing tradition of contributing with models, algorithms and architectures, to the set of tools and applications enabling day-to-day functioning of our societies. Faced with the broad challenges of today's applications, data management technology constantly broadens its reach, exploiting new hardware and software to achieve innovative results.

EDBT 2014 invites submissions of original research contributions, as well as descriptions of industrial and application achievements, and proposals for tutorials and software demonstrations. We encourage submissions relating to all aspects of data management defined broadly, and particularly encourage work on topics of emerging interest in the research and development communities.

Deadline: 15 October 2013

LS8 at the International Broadcasting Convention (IBC) with the EU project Vista-TV

IBC, a highly respected conference with an exhibition, takes place in Amsterdam, and Vista-TV is one of the exhibitors. In the Future Zone, Vista-TV presents real-time analytics of Internet-TV use.

"With more than 50,000+ attendees from more than 160 countries, IBC combines a highly respected and peer-reviewed conference with an exhibition that exhibits more than 1,400 leading suppliers of state of the art electronic media technology...
Run by the industry, for the industry, IBC is owned by six industry partners that represent both exhibitors and visitors." (http://www.ibc.org/page.cfm/link=628)
Vista-TV provides users with real-time recommendations of shows and an excellent overview of the current TV program that eases the selection of the channel. In addition, for the producers of shows and for marketing companies, Vista-TV offers real-time statistics of watching behavior. How many use the smartphone, the computer or the large TV screen for watching Internet-TV right now? In which region are the watching users located? From which channel to which other channel do users switch frequently? All these real-time analyses respect the privacy of the users and do not allow tracing a specific user. The statistics, however, are a source of valuable information.
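
As a rough illustration of one such statistic, the sketch below counts how often viewers switch from one channel to another in a stream of events and reports only the aggregated counts. It is not the Vista-TV implementation: the function name `count_channel_switches`, the `(viewer, channel)` event format, and the sample events are hypothetical and chosen only for this example.

```python
from collections import Counter


def count_channel_switches(events):
    """Count (from_channel, to_channel) switches in a stream of events.

    Hypothetical helper: each event is assumed to be a (viewer, channel)
    pair. Viewer identifiers are used only to pair consecutive events and
    never appear in the returned statistics.
    """
    last_channel = {}     # most recent channel seen per viewer
    switches = Counter()  # aggregated switch counts
    for viewer, channel in events:
        previous = last_channel.get(viewer)
        if previous is not None and previous != channel:
            switches[(previous, channel)] += 1
        last_channel[viewer] = channel
    return switches


# Made-up events for illustration.
events = [
    ("v1", "news"), ("v2", "sports"), ("v1", "sports"),
    ("v3", "news"), ("v3", "sports"), ("v2", "movies"),
    ("v1", "news"),
]
switches = count_channel_switches(events)
most_frequent = switches.most_common(1)
```

Only the aggregated counter leaves the function, which mirrors the idea that the published statistics should not identify individual viewers. A production system would additionally need to bound the per-viewer state.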

Football analysis with the streams framework - TechniBall wins the Audience Award!

In close collaboration with the Technion (Israel Institute of Technology), a system for the real-time analysis of football data was built on the *streams* framework for the competition at this year's DEBS conference. The task of the challenge was to compute statistics on the running and playing behavior of the players, who were equipped with motion and location sensors of the RedFIR system (Fraunhofer).
As part of the competition, Lehrstuhl 8, together with the Technion, developed the "TechniBall" system on the basis of Christian Bockermann's *streams* framework. TechniBall can process the required statistics considerably faster than real time (more than 250,000 events per second) and was voted winner of the DEBS Challenge 2013 by the conference audience.

"Machine Learning and Knowledge Discovery in Databases" as one of the top 50% most downloaded eBooks at Springer

Since its online publication on Sep 04, 2008, there have been a total of 11,732 chapter downloads of "Machine Learning and Knowledge Discovery in Databases". In 2012 it was still one of the top 50% most downloaded eBooks in the relevant Springer eBook Collection, with 1,055 downloads.

BBC about the project Vista TV

The BBC blog reports on the project Vista-TV; in the post, Libby Miller shows visualizations of user behavior.

UBICOMM 2013: Call for papers

The goal of the International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies, UBICOMM 2013, is to bring together researchers from academia and practitioners from industry in order to address fundamentals of ubiquitous systems and the new applications related to them. The conference will provide a forum where researchers can present recent research results as well as new research problems and directions related to them. The conference seeks contributions presenting novel research in all aspects of ubiquitous techniques and technologies applied to advanced mobile applications.

Deadline: 17 May 2013

Student assistant positions available

Lehrstuhl VIII of the Faculty of Computer Science at TU Dortmund has student assistant positions of up to 10 hours per week to fill immediately.

TechniBall - Solution for the DEBS Challenge 2013

LS8 analyzes football games in real time! Each player is equipped with a sensor, and so is the ball. The streams framework from LS8 is coupled with the Esper event recognition of Technion.

Towards a better TV experience with data stream algorithms - ViSTA TV Coding Camp at Lehrstuhl 8

Television over the Internet (IP-TV) plays an increasingly important role in today's media landscape. A greater variety of programs, television on mobile devices, and media libraries are just a few advantages of the new world of television. Optimizing the TV experience for every viewer requires a great deal of high tech in the background. The EU project ViSTA-TV studies the TV behavior of users, searches for similar shows, and in this way tries to recommend the best possible program to the viewer. From a favorite show to interesting documentaries or the latest trends: within the abundance of offerings, the right thing is found for every viewer.

ViSTA-TV is a joint project of the universities of Zurich and Amsterdam and Lehrstuhl 8 of the computer science faculty at TU Dortmund, together with the companies BBC, Zattoo, and the Dortmund-based Rapid-I. The aim of the project is to analyze the viewing behavior of IPTV users in order, for example, to tailor recommendations of shows as precisely as possible to the needs and preferences of the viewers. To this end, the users' tune-in and channel-switching behavior is analyzed, as are properties of the video signal (e.g., advertisement detection).

One challenge is the high data rate of the video data, which must be analyzed in real time. For this purpose, the data stream environment "streams", developed by Christian Bockermann at Lehrstuhl 8, was extended with video analysis capabilities. This makes it possible to analyze video data simultaneously with the associated channel-switching behavior from log data. The results are then processed further within a recommender system to offer users a tailor-made view of the TV offering.

The researchers from Dortmund also have their eye on integrating further data sources, such as DBpedia, electronic TV guides, or the popular Internet Movie Database (imdb). In the spirit of "Big Data", all of this information is analyzed promptly, so that information about actors, news, or current trends on Twitter and facebook also flows into the recommendations.


Coding camp at the TU

This week, the second coding camp for the ViSTA-TV project takes place at TU Dortmund. The focus is in particular on integrating the modules of the project partners. The goal of the coding camp is a first running prototype of the project that offers program recommendations to viewers via mobile phone apps.

Jugend forscht: regional competition in Dortmund

On 19 February, the regional competition of Jugend forscht takes place in Dortmund. In the rooms of the DASA Arbeitswelt Ausstellung, the young researchers present their ideas and work in various fields of research to the jury. In the field of mathematics/computer science, Christian Bockermann, a member of Lehrstuhl 8 of the Faculty of Computer Science and a researcher in project C1 of the SFB, sits on the jury.

Book on Managing and Mining Sensor Data published

The book Managing and Mining Sensor Data has been published as an ebook and will be available as hardcover from 28th of February 2013. The collaborative research center supported the book through Marco Stolpe (project B3, Artificial Intelligence) and the guest researcher Kanishka Bhaduri, who contributed the chapter on Distributed Data Mining in Sensor Networks.

Sensor networks in particular provide data at different, distributed locations. For an efficient analysis, new technologies need to calculate results even if communication resources are constrained.

IEEE International Conference on Data Mining

Katharina Morik organizes a panel on the value of data at the IEEE International Conference on Data Mining.

Two research staff members wanted

The Chair of Artificial Intelligence is looking for two research staff members, starting as soon as possible.

  • A doctoral or postdoctoral researcher is sought for the project DDMD (Data Driven Materials Development). The project is run in cooperation with the University of Duisburg-Essen and RUB. Further details can be found in the job advertisement.
  • A research staff member is also sought for the project KobRA (Korpus-basierte linguistische Recherche und Analyse mit Hilfe von Data-Mining; corpus-based linguistic research and analysis using data mining). The aim is to support corpus-based linguistics with data mining methods. Further details can be found in the job advertisement.

New newspaper article about Katharina Morik published

The German newspaper "Westdeutsche Allgemeine Zeitung" has published an article about Katharina Morik. The full article can be found on their website.

Job advertisement: Development of process-data-based real-time parameter adaptation in automated production processes

In energy- and resource-intensive industries, the challenge is to achieve rising product quality while simultaneously reducing costs and production times. Principles and methods of quality management and production systems modeled on the Japanese automotive industry are becoming the primary guiding model across industries. As an essential element of the TPS (Toyota Production System), the principle of quality control built into the process, also known as Jidoka or autonomous automation, makes a decisive contribution. However, in automated, linked production processes, such as those found in the steel industry, the Jidoka principle cannot readily be implemented by conventional means.

The aim of this doctoral project is to develop and validate a systematic approach to minimizing scrap and optimizing product quality in the context of rigidly linked, automated production processes. One possible approach is the concept of Advanced Process Control. Its central idea is the real-time, process-data-based monitoring and evaluation of production processes, with the goal of compensating for short-term process fluctuations and thus ensuring product quality. For the production system outlined above, the doctoral project is to develop an approach that decides, based on the automated evaluation of process parameters, whether the quality of the product currently being processed meets the specifications, or whether and in what form an adjustment of the process parameters is necessary and possible in real time in order to meet the quality specifications. A further possible decision is to stop processing the product if the quality deviation cannot be corrected by adapting the production process.

Besides developing the theoretical concept, the project comprises a simulation-based validation and, in close cooperation with Deutsche Edelstahlwerke GmbH at its Witten site, the integration of the concept into operational production workflows. State-of-the-art data mining techniques are to be used to solve the task.

Supervisor: Prof. Deuse

Applications are accepted immediately and should be sent to:

Dipl.-Wirt.-Ing. Uta Spörer
Tel.: +49 (231) 755 – 5787
Fax: +49 (231) 755 – 5772
E-Mail: spoerer@gsoflog.de
Mon-Thu: 8:30 - 12:30

(Weiter...  )

LWA2012 from 12.09. to 14.09. at the Computer Science Department

LWA stands for "Lernen, Wissen, Adaption" (Learning, Knowledge, Adaptation). It is the joint forum of four special interest groups of the German Computer Science Society (GI). Following the tradition of recent years, LWA provides a joint forum for experienced and young researchers to gain insights into recent trends, technologies and applications, and to promote interaction among the SIGs. (Weiter...  )

HIGHLIGHTS from the 5th Annual Rexer Analytics Data Miner Survey (2011)

  • SURVEY & PARTICIPANTS: 52-item survey of data miners, conducted on-line in 2011. Participants: 1,319 data miners from over 60 countries.
  • FIELDS & GOALS: Data miners work in a diverse set of fields. CRM/Marketing has been the #1 field for the past five years. Fittingly, “improving the understanding of customers”, “retaining customers” and other CRM goals continue to be the goals identified by the most data miners.
  • ALGORITHMS: Decision trees, regression, and cluster analysis continue to form a triad of core algorithms for most data miners. However, a wide variety of algorithms are being used. A third of data miners currently use text mining and another third plan to do so in the future.
  • TOOLS: R continued its rise this year and is now being used by close to half of all data miners (47%). R users report preferring it for being free, open source, and having a wide variety of algorithms. Many people also cited R's flexibility and the strength of the user community. STATISTICA is selected as the primary data mining tool by the most respondents (17%). Data miners report using an average of 4 software tools. STATISTICA, KNIME, Rapid Miner and Salford Systems received the strongest satisfaction ratings in 2011.
  • ANALYTIC CAPABILITY & SUCCESS MEASUREMENT: Only 12% of corporate respondents rate their company as having very high analytic sophistication. However, companies with better analytic capabilities are outperforming their peers. Respondents report analyzing analytic success via Return on Investment (ROI) and analyzing the predictive validity or accuracy of their models. Challenges to measuring success include client or user cooperation and data availability/quality.
  • SHARED INSIGHTS: In the 2010 Survey data miners shared best practices in overcoming the key challenges data miners face ( verbatims ). In the 2011 Survey data miners shared their best practices for measuring analytic success ( verbatims ) and examples of the positive impact that data mining can have to benefit society, health, and the world ( verbatims ). Additionally, 225 R users shared information about how and why they are using R ( verbatims ).
After the 2011 survey, the Rexer Analytics Data Miner Survey moved to a biennial schedule; the next Data Miner Survey will be launched in early 2013. Information about Rexer Analytics is available at www.RexerAnalytics.com (Weiter...  )

Grant application approved in the 4th call for proposals of the Mercator Research Center Ruhr (MERCUR)

The grant application Data Driven Materials Design (DDMD) was approved in the 4th call for proposals of the Mercator Research Center Ruhr (MERCUR). The project is a cooperation between Prof. Dr. Ralf Drautz and Prof. Dr. Alfred Ludwig (both Ruhr-University Bochum), Prof. Dr. Katharina Morik (Chair 8) and Prof. Dr. Sven Rahmann (University of Duisburg-Essen). The combined use of experimental high-throughput methods and analytic modeling in materials research, especially in the fields of thin-layer material libraries, "Attribute-Screenings" and "Advance Materials Simulation", is one of Ruhr-University's unique features, which this project is intended to strengthen. These fields have in common that they generate extremely large amounts of multidimensional data that cannot be analyzed efficiently without the help of computers. Analyzing huge amounts of data is one of TU Dortmund's focus areas, of which data mining in particular is addressed here. The University of Duisburg-Essen is at the forefront of high-throughput analysis. The aim of this collaboration is to initiate a more rational development of new materials. The project aims to lay the foundation for the field of Data Driven Material Development. In this field, new discoveries as well as new insights (e.g. unknown phases or special physical properties) are expected. In addition, the development of new materials is to be accelerated. (Weiter...  )

Top rating: EU proposal INSIGHT

The application "INtelligent Synthesis and Real-tIme Response using Massive StreaminG of HeTerogeneous Data" is the best-rated proposal in the FP7 field "Intelligent Information Management", reaching 14.5 of 15 possible points. The Department of Computer Science at TU Dortmund is involved through Chair 8. The coordinator is Dimitrios Gunopulos (National University of Athens). The project analyses the huge volume of heterogeneous data streams from sensors, mobile phones and control systems to improve emergency management. Examples are taken from the city of Dublin and the German Federal Office of Civil Protection and Disaster Assistance. Innovations in data mining make it possible to analyse social networks (e.g. Twitter), sensor networks and traffic systems jointly and to involve citizens in this process.

ViSTA-TV started on June 1st

Live video content is increasingly consumed over IP networks in addition to traditional broadcasting. The move to IP provides a huge opportunity to discover what people are watching in much greater breadth and depth than is currently possible through interviews or set-top-box-based data gathering by rating organizations. The ViSTA-TV project proposes to gather consumers' anonymized viewing behavior and the actual video streams, combined with enhanced electronic program guide information. ViSTA-TV will be in a position to provide highly accurate market research information about viewing behavior that can be used for a variety of analyses of high interest to all participants in the TV industry. Furthermore, ViSTA-TV will use the information gathered to build a recommendation service. ViSTA-TV is a European Union-funded research project, beginning on 1 June 2012 and lasting for two years. The Artificial Intelligence Group participates alongside 5 other partners. (Weiter...  )

RapidMiner tested

Rapid-I is based in Dortmund, Germany, and has been developing RapidMiner, a data mining software package, since 2001. With its range of further tools such as RapidAnalytics, RapidLab, RapidNet and RapidSentilyzer, it has won clients such as Siemens, Allianz and Pepsico. The website JTonEDM.com introduces Rapid-I and its software RapidMiner in a short overview. (Weiter...  )

NEW MASTER'S/DIPLOMA THESIS AVAILABLE: Efficient detection of concept drifts under cyclic changes in steel mill processes

In today's industrial plants, sensors record large amounts of data during the production process. From these data, the quality of the end product is inferred while the process is still running. During production, plant components and measuring equipment change, and they can only be maintained cyclically. These changes are also reflected in the prediction models: concept drift occurs. SMS Siemag and a leading manufacturer of heavy plates provide current production data for this thesis. In the Bachelor's, Diploma or Master's thesis, strategies for identifying concept drifts and for stabilizing prediction quality are to be developed. A particular challenge is that concept drift detection and correction must happen in real time. The thesis therefore focuses on selecting, implementing and comparing particularly efficient methods for detecting concept drifts. (Weiter...  )
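
As a minimal sketch of what constant-memory online drift detection can look like, the following Python snippet implements the Page-Hinkley test, one well-known change detector. It is not the method prescribed by the thesis topic; the class name, the defaults for `delta` and `threshold`, and the helper `first_drift` are assumptions chosen for illustration. The input is assumed to be a stream of prediction errors of a quality model.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream."""
    delta: float = 0.005      # tolerated drift magnitude (assumed default)
    threshold: float = 5.0    # alarm threshold, often called lambda (assumed default)
    n: int = 0                # number of observations seen so far
    mean: float = 0.0         # running mean of the observations
    cum: float = 0.0          # cumulative deviation from the running mean
    cum_min: float = 0.0      # smallest cumulative deviation seen so far

    def update(self, x: float) -> bool:
        """Consume one observation; return True if a drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


def first_drift(errors: Iterable[float], **kwargs) -> Optional[int]:
    """Return the index of the first observation that triggers an alarm."""
    detector = PageHinkley(**kwargs)
    for i, e in enumerate(errors):
        if detector.update(e):
            return i
    return None
```

Each update needs only a handful of arithmetic operations and no stored history, which is why detectors of this kind are candidates when detection has to keep pace with the running process. A real comparison would also have to consider downward shifts and the resetting of the detector after maintenance cycles.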

NEW MASTER'S/DIPLOMA THESIS AVAILABLE: Controlling processes in steel production with the help of multi-objective optimization

In today's industrial plants, sensors record large amounts of data during the production process. From these data, the quality of the end product is inferred while the process is still running. So far, optimization has mostly addressed only a single target variable. However, the quality of the end product often depends on several target variables, which may moreover conflict with one another. This can be formalized as a multi-objective optimization problem. In particular, a suitable fitness function must be determined. Users can then derive recommendations for action from the Pareto-optimal solutions. At LS8, sensor data on the production process of a leading manufacturer of heavy plates are available. Using this example, the formalization of conflicting target variables as multi-objective optimization can be investigated. Implementations in RapidMiner can be used for this. The exact task will be adapted depending on whether it becomes a Bachelor's, Diploma or Master's thesis. (Weiter...  )
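
To make the notion of Pareto-optimal solutions concrete, here is a minimal Python sketch that filters a set of candidate settings down to its Pareto front. It assumes that every objective is to be minimized; the function names and the two example objectives (a hypothetical scrap rate and a hypothetical energy use) are illustrative assumptions, not taken from the thesis data or from RapidMiner.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points: List[Point]) -> List[Point]:
    """Return all points that are not dominated by any other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]


# Hypothetical candidates: (scrap rate in %, energy use in arbitrary units)
candidates = [(1.0, 9.0), (2.0, 7.0), (3.0, 8.0), (4.0, 3.0), (5.0, 4.0)]
front = pareto_front(candidates)
```

In this example `(3.0, 8.0)` is dominated by `(2.0, 7.0)` and `(5.0, 4.0)` by `(4.0, 3.0)`, so three candidates remain on the front. The quadratic pairwise comparison is adequate for small candidate sets; evolutionary multi-objective optimizers use more refined sorting schemes.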

Special Issue of the international journal Data Mining and Knowledge Discovery published!

Together with Kanishka Bhaduri and Hillol Kargupta, Katharina Morik has edited a special issue of the international journal Data Mining and Knowledge Discovery. The special issue on Data Mining for Sustainability including a comprehensive introduction is now online at http://www.springerlink.com/. (Weiter...  )

Project group presentation "Kooperatives Datamining mit vernetzen Robotern" (Cooperative Data Mining with Networked Robots)

The project group "Kooperatives Datamining mit vernetzen Robotern" will be presented on 22.12.2011 at 14:00 (s.t.) in the new premises of Chair 8 (Joseph-von-Fraunhofer-Straße 23, room 1.48).

New Diploma/Master's thesis available: Personalizing hotel recommendations based on click paths

Searching for and booking hotels on the Internet is nowadays usually handled via specialized portals. Filtering by search criteria alone often still returns an unmanageable number of hotels. For retaining customers on a portal in the long term, however, it is crucial to offer, as quickly as possible, hotels that are actually suitable for the respective person (or group of persons). Using methods of data mining and machine learning, user preferences are to be learned that enable personalized and thus more suitable hotel recommendations. For this purpose, the world's leading portal operator "Hotel Reservation Service" (HRS) provides data on hotels, portal visitors, customers, bookings and hotel reviews. (Weiter...  )

KDD 2011 Workshop on Data Mining Applications in Sustainability in San Diego, CA

The annual ACM SIGKDD conference is the premier international forum for data mining researchers and practitioners from academia, industry, and government to share their ideas, research results and experiences. KDD-2011 will feature keynote presentations, oral paper presentations, poster sessions, workshops, tutorials, panels, exhibits, demonstrations, and the KDD Cup competition. KDD-2011 will run from August 21-24 in San Diego, CA and will bring together hundreds of practitioners and academic data miners in one location. (Weiter...  )

Overview of the impact of leading database and data mining journals in 2010 published

As a member of the editorial boards of Knowledge and Information Systems (KAIS) and of Data Mining and Knowledge Discovery (DMKD), Katharina Morik happily presents the 2010 impact factors of some leading database and data mining journals:
  • ACM Transactions on Information Systems (TOIS): 1.085
  • ACM Transactions on Database Systems (TODS): 1.216
  • Data Mining and Knowledge Discovery (DMKD): 1.238
  • Information Systems (IS): 1.595
  • Data and Knowledge Engineering (DKE): 1.717
  • IEEE Transactions on Knowledge and Data Engineering (TKDE): 1.847
  • Machine Learning (ML): 1.956
  • Knowledge and Information Systems (KAIS): 2
Download the complete list

New Topic for a Master's/Diploma Thesis: Feature Extraction from Video Data

Besides YouTube and the like, increasing bandwidth is making the Internet ever more interesting for classic television broadcasts as well. While IP-TV has so far mostly been in focus for major sporting events, companies such as zattoo.com already offer the possibility of watching a wide variety of channels and of recording and archiving programs online. But how does one find interesting programs? Which information indicates programs that I would like? Can special-interest channels be distinguished solely on the basis of information from the video data? This Master's thesis is about extracting features that are important for classifying or grouping programs, channels or viewers. (Weiter...  )

Feature Selection Extension for RapidMiner - NEW RELEASE 1.1.3

The Feature Selection Extension for RapidMiner 5 contains operators for feature selection, feature weighting and classification. All operators are also well suited to high-dimensional data, e.g. microarray data. New in this version are:
  • RCCW - Recursive Conditional Correlation Weighting, a very fast feature subset selection method
  • FCBF - Fast Correlation Based Feature Selection
  • PAM - Classification by Shrunken Centroids
  • BAHSIC - Backward Feature Selection via Hilbert-Schmidt information criterion
  • t-Test - Computes a p-Value for the difference of the mean values between two classes
  • Test Significance - Assumes normal distribution, then checks for equal class variances via F-test and afterwards computes p-Value via t-Test or Welch-test
  • Benjamini-Hochberg-Correction - Performs the correction for FDR on significance values in an AttributeWeights object
Already available in earlier versions are, among others, Recursive Feature Elimination (RFE), minimum Redundancy Maximum Relevance Feature Selection (MRMR) / Correlation based Feature Selection (CFS) and a meta-operator for ensemble feature selection. The most recent version is available for free from SourceForge: https://sourceforge.net/projects/rm-featselext/ . (Weiter...  )
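
For readers unfamiliar with the FDR correction mentioned in the list above, the following Python sketch shows the standard Benjamini-Hochberg step-up adjustment of a list of p-values. It is an independent illustration, not the extension's Java code; the function name and the example values are assumptions.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    With m tests, the p-value of rank r (1 = smallest) is scaled by m / r;
    a running minimum taken from the largest rank downwards keeps the
    adjusted values monotone, and they are capped at 1.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, e.g. from per-feature t-tests
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Features whose adjusted value falls below the chosen FDR level (for instance 0.05) would then be kept. On high-dimensional data such as microarrays, where thousands of features are tested at once, this kind of correction limits the expected share of false discoveries among the selected features.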

RapidMiner is most popular data mining tool according to KDnuggets poll

RapidMiner is again the most popular data mining tool in the KDnuggets poll. (Weiter...  )

Colloquium of the Collaborative Research Center SFB 876 on June 30th, 2011: Prof. Preeti Ranjan Panda (Indian Institute of Technology Delhi)

Graphics processor (GPU) architectures have evolved rapidly in recent years with increasing performance demanded by 3D graphics applications such as games. However, challenges exist in integrating complex GPUs into mobile devices because of power and energy constraints, motivating the need for energy efficiency in GPUs. While a significant amount of power optimisation research effort has concentrated on the CPU system, GPU power efficiency is a relatively new and important area because the power consumed by GPUs is similar in magnitude to CPU power. Power and energy efficiency can be introduced into GPUs at many different levels: (i) Hardware component level - queue structures, caches, filter arithmetic units, interconnection networks, processor cores, etc., can be optimised for power. (ii) Algorithm level - the deep and complex graphics processing computation pipeline can be modified to be energy aware. Shader programs written by the user can be transformed to be energy aware. (iii) System level - co-ordination at the level of task allocation, voltage and frequency scaling, etc., requires knowledge and control of several different GPU system components. (Weiter...  )

Colloquium of the Collaborative Research Center SFB 876 on June 9th, 2011: Prof. Piero Bonatti (University of Naples)

An increasing amount of information is being encoded via ontologies and knowledge representation languages of some sort. Some of these knowledge bases are encoded manually, while others are generated automatically by information extraction techniques. In order to protect the confidentiality of this information, a natural choice consists in encoding policies with the same language as the ontology language. This approach led to so-called "semantic web policies". The semantic web is founded on two knowledge representation languages: description logics and logic programs. In this talk we compare their expressive power as *policy* representation languages, and show that logic programming approaches are currently more mature than description logics, although this picture may change in the near future. (Weiter...  )

Colloquium of the Collaborative Research Center SFB 876 on May 5th, 2011: Henrik Blunck (University of Aarhus)

Emerging and envisioned applications within domains such as indoor navigation, fire-fighting, and precision agriculture still pose challenges for existing positioning solutions to operate accurately, reliably, and robustly in a variety of environments and conditions and under various application-specific constraints. This talk will first give a brief overview of efforts made in a Danish project to address the challenges mentioned above, and will subsequently focus on addressing the energy constraints imposed by Location-based Services (LBS) running on mobile user devices such as smartphones. A variety of LBS, including services for navigation, location-based search, social networking, games, and health and sports trackers, demand the positioning and trajectory tracking of smartphones. To be useful, such tracking has to be energy-efficient to avoid having a major impact on the battery life of the mobile device, since the battery capacity in modern smartphones is a scarce resource and is not increasing at the same pace as new power-demanding features, including various positioning sensors, are added to such devices. We present novel on-device sensor management and trajectory updating strategies which intelligently determine when to sample different on-device positioning sensors (accelerometer, compass and GPS), when data should be sent to a remote server, and to what extent to simplify it beforehand in order to save communication costs. The resulting system is provided as a uniform framework for both position and trajectory tracking and is configurable with regard to accuracy requirements. The effectiveness of our approach and the energy savings achievable are demonstrated both by emulation experiments using real-world data and by real-world deployments. (Weiter...  )

MonetDB: Open-source Columnar Database Technology Beyond Textbooks - Talk by Stefan Manegold

Stefan Manegold from CWI Amsterdam will give a talk on the column-store DBMS MonetDB on 2011/02/11 at 16:00 in Room E23, Otto-Hahn-Straße 14.

Abstract:
Column-store database management systems have recently experienced a considerable popularity-boost. The underlying ideas, however, date back to (at least) the mid 1980's and the technology has been pioneered since the early 1990's in the MonetDB system, a column-store research prototype that has been developed into a complete SQL- and XML/XQuery-compliant column-store DBMS freely available in open source. Next to its column-store back-bone, MonetDB focuses on high-performance hardware-conscious algorithms, novel workload-adaptive query processing techniques such as "cracking", "recycling" and run-time query optimization, and extensibility at all layers of its software stack.

In this talk, we will provide detailed insight into MonetDB's column-store architecture and query-processing technology as available in open-source, discussing its benefits for data mining, OLAP, BI, as well as science workloads.


Kick-off colloquium of the SFB 876 - slides now online!

The new Collaborative Research Center SFB 876 "Providing Information by Resource-Constrained Data Analysis" starts the new year with a kick-off colloquium. The colloquium takes place on January 20th 2011 starting at 4 pm at auditorium E23, Otto-Hahn-Straße 14, TU Dortmund University campus. For further information about the program and speeches please have a look at the attachment.

SFB 876 - The application deadline has passed

At this time, no further applications for open positions at the SFB 876 are being accepted.

SFB 876 granted!

The DFG granted the SFB 876. (Weiter...  )

Presentations online!

First presentations and pictures available on the MODAP workshop website. (Weiter...  )

First International Workshop on Social and Privacy aspects of the Mobility

Analyzing huge amounts of mobility data has posed new challenges not only in the discovery and interpretation of interesting patterns, but also in preserving the privacy of the individuals under observation. However, the social and privacy aspects of mobility have not been studied in a systematic and combined way, and the understanding of their effects on our lives is still in its infancy. The convergence of these complementary aspects, and more specifically the way that mobility affects (or is affected by) the social behavior of individuals and their privacy, gives rise to the exciting new area of "socio-mobility". Socio-mobility raises a number of challenging questions. Are people moving together socially related? Are there social relations between people moving to semantically similar places? How could we combine mobility data and patterns with social networking information? Can social interactions be mined from mobility data by using external media? To what extent do social interactions affect privacy? What are the risks of disclosing social interactions between people and how can we design privacy-preserving techniques to minimize the risks? What kinds of social interactions are considered sensitive and how can we model / distort / suppress such interactions? (Weiter...  )

Interdisciplinary College in Günne at Lake Möhne, 25 March - 1 April 2011

The Interdisciplinary College (IK) is an annual, intense one-week spring school which offers a dense state-of-the-art course program in neurobiology, neural computation, cognitive science/psychology, artificial intelligence, robotics and philosophy. It is aimed at students, postgraduates and researchers from academia and industry. (Weiter...  )

PG 542 Final presentation

The student project group 542 "Stream Mining for Intrusion Detection in Distributed Systems" has successfully finished its work on a generic framework for online and distributed data mining. All results, including the system architecture, the evaluation of learning algorithms and a live demo covering the use case of intrusion detection, will be presented on Thursday, 28th October, at 10.15 in GB 4, room 136. (Weiter...  )

RapidMiner Hierarchical Heavy Hitters Plugin

After presenting our paper "Implementing Hierarchical Heavy Hitters in RapidMiner: Solutions and Open Questions" at RCOMM 2010, we have released all accompanying Java code as a RapidMiner 5 plugin. The plugin can be used to calculate Hierarchical Heavy Hitters on system call data. It furthermore contains domain-independent implementations of the related algorithms in Java. (Weiter...  )

RapidMiner Microarray Feature Selection Plugin released

The Microarray Feature Selection Plugin for RapidMiner 5 contains some feature selection and -weighting operators useful for working on high-dimensional (microarray-) data. These are - amongst others - Recursive Feature Elimination (RFE) and minimum Redundancy Maximum Relevance Feature Selection (MRMR) / Correlation based Feature Selection (CFS) and a meta-operator for ensemble feature selection. (Weiter...  )

RapidMiner Information Extraction Extension

The RapidMiner Information Extraction extension supports Information Extraction techniques in RapidMiner. Visualizers, annotators and preprocessing operators have been implemented for textual data. Structured models, namely Conditional Random Fields, are available for the extraction of named entities. Operators to extract relations will be available soon. (Weiter...  )

Summer School on Mobility, Data Mining, and Privacy

The 1st Summer School on Mobility, Data Mining, and Privacy is co-organized by the FP7/ICT project MODAP "Mobility, Data Mining, and Privacy" (www.modap.org) and the COST Action IC0903 MOVE "Knowledge Discovery from Moving Objects" (http://move-cost.info/). This is the first doctoral school ever on the 'hot' intersection of three domains: modeling and management of moving object databases (Mobility), data analysis and knowledge discovery from mobility data (Data Mining), and privacy aspects that arise when processing human mobility (Privacy). (Weiter...  )

NEW Book: Ubiquitous Knowledge Discovery

Knowledge discovery in ubiquitous environments is an emerging area of research at the intersection of the two major challenges of highly distributed and mobile systems and advanced knowledge discovery systems. The new book, edited by Michael May and Lorenza Saitta, provides a state-of-the-art survey. It is the outcome of a large number of workshops, summer schools, tutorials and dissemination events of the European project KDubiq. (Weiter...  )

Initiative on data analysis under resource constraints - meeting in Bommerholz on 24/25 August 2009

Bringing together embedded systems and data mining enables new solutions in computer science, biomedicine, physics and mechanical engineering. Embedded systems can be further improved using machine learning, while data mining algorithms can be realized in hardware, e.g. FPGAs. The restrictions in computing power, memory and energy demand new algorithms for known learning tasks. At Bommerholz, 26 scientists and researchers from TU Dortmund and University Duisburg-Essen came together to gain a deeper understanding of the topic and exchange progress on ongoing projects.

RapidMiner -- most used open source data mining tool

RapidMiner is the most successful open source data mining tool for the third year in a row -- only the commercial product Clementine (SPSS PASW Modeler) is more popular. (Weiter...  )

ACML'10

Asian Conference on Machine Learning
November 8-10, 2010, Tokyo, Japan (Weiter...  )

Special Issue on Sustainability of the Data Mining and Knowledge Discovery Journal

Special Issue on Sustainability of the Data Mining and Knowledge Discovery Journal (Weiter...  )

Recording of Talk

Katharina Morik "Handling Texts -- A Challenge for Data Mining" talk (in English), introduced by Jean-Gabriel Ganascia at the 9th Francophone expert conference on Machine Learning and Data Mining, Strasbourg 2009

(Weiter...  )

Machine learning and biology

Lecture by Yoav Freund at ECML PKDD 2008 on machine learning and biology

BioDatabases (bioDatabases.m4v, 170.6 MB)

(Weiter...  )

Informatik kompakt

Based on the experience gained in the 1999 lecture DAP1, a new textbook has finally emerged. The book introduces the fundamentals common to different areas of computer science by means of the programming language Java. (Weiter...  )

Equal opportunities for women at universities

Prof. Dr. Katharina Morik was asked for a statement about equal opportunity for women.

The resulting TV report was shown on 07/11/2007 during the "tagesschau" news broadcast.

Source: Tagesschau-archive

Software

Several programs have been developed at the AI unit within its research activities, such as myKLR, SVMlight, mySVM, RapidMiner (formerly YALE), the Information Layer or the USCHIFICATOR. Check our software page for a complete list. (Weiter...  )

The chair is moving!