SFB 475 Subproject A4: Statistical Methods and Machine Learning

Statistics and machine learning share the goal of analysing data to find regularities and predict future events. This project explores three forms of synergy between the two fields and analyses their effectiveness on selected problems:

  • Support users of statistical methods using a knowledge-based approach. Machine learning techniques are used to build the knowledge base.
  • Support users of machine learning methods using tools from statistics. This involves drawing appropriate samples from large datasets, clustering numerical data as a preprocessing step, and reducing data on the basis of sufficiency, invariances, etc.
  • Knowledge discovery in databases using methods from both statistics and machine learning. In particular, this will involve new techniques from inductive logic programming, which can represent relational concepts. Their high expressive power comes with high computational demands, which makes a reduction of the raw data mandatory. Potential solutions include incorporating statistical tests for dimensionality reduction and sampling into the induction algorithm.
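
As an illustration of drawing a sample from a large dataset, the following is a minimal sketch of reservoir sampling, which keeps a uniform random sample of fixed size while reading the data once. The project description does not say which sampling schemes were used, so the technique and the names `reservoir_sample`, `stream`, `k`, and `seed` are assumptions chosen for illustration only.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Draw k items uniformly at random from a stream of unknown length.

    Illustrative helper (not taken from the project): the stream is read
    once and only k items are held in memory at any time.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Item i is kept with probability k / (i + 1); if kept, it
            # overwrites a uniformly chosen slot of the reservoir.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample five records from a dataset too large to shuffle in memory.
    print(reservoir_sample(range(1_000_000), k=5, seed=0))
```

If the stream holds fewer than `k` items, the function returns all of them in their original order. A sample drawn this way could then be passed to a more expensive learning algorithm in place of the full dataset.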

This project aims to combine methods and experience from statistics and machine learning to build systems for analysing the large datasets common in today's applications. To integrate and combine different methods, both their theoretical characterization and practical experience with such multi-strategy systems are crucial. The goal is to cross the border between the two disciplines and develop new methods that are more powerful than those of either field alone. This document is also available in German.

Partners

Contacts

Open Topics for Master Theses

Interested students can view our list of open topics for master's theses dealing with Support Vector Machines and learning to learn. The list is in German (sorry).

Software and Online Resources

Publications

Klinkenberg/Rueping/2003a Klinkenberg, Ralf and Rüping, Stefan (2003). Concept Drift and the Importance of Examples. In Text Mining -- Theoretical Aspects and Applications, pages 55-77. Physica-Verlag.
Morik/etal/2002a Morik, Katharina and Joachims, Thorsten and Imhoff, Michael and Brockhausen, Peter and Rüping, Stefan (2002). Integrating Kernel Methods into a Knowledge-based Approach to Evidence-based Medicine, pages 71-99. Physica-Verlag. [.ps] [.pdf]
Rueping/2001b Rüping, Stefan (2001). Incremental Learning with Support Vector Machines. In Cercone, Nick and Lin, T.Y. and Wu, Xindong, editor(s), Proceedings of the 2001 IEEE International Conference on Data Mining, pages 641--642.
Sondhauss/Weihs/2001a Sondhauss, Ursula and Weihs, Claus (2001). Incorporating background knowledge for better prediction of cycle phases. Technical report, Universität Dortmund.
Joachims/00a Joachims, Thorsten (2000). Estimating the Generalization Performance of a SVM Efficiently. In Langley, Pat, editor(s), Proceedings of the International Conference on Machine Learning, pages 431--438. Morgan Kaufmann. [.ps.gz] [.pdf]
Klinkenberg/Joachims/2000a Klinkenberg, Ralf and Joachims, Thorsten (2000). Detecting Concept Drift with Support Vector Machines. In Langley, Pat, editor(s), Proceedings of the Seventeenth International Conference on Machine Learning (ICML), pages 487--494. Morgan Kaufmann. [.ps.gz] [.pdf]
Arminger/Goetz/99a Arminger, Gerhard and Götz, Norman (1999). Asymmetric Loss Functions for Evaluating the Quality of Forecasts in Time Series for Goods Management Systems. Technical report, Universität Dortmund.
Arminger/Schneider/99a Arminger, Gerhard and Schneider, Carsten (1999). Frequent Problems of Model Specification and Forecasting of Time Series in Goods Management Systems. Technical report, Universität Dortmund.
Brockhausen/99a Peter Brockhausen (1999). Learning First Order Rules in Intensive Care Monitoring. In ILP--99 Late-Breaking Papers, pages 22--27. Session held at the Ninth International Workshop On Inductive Logic Programming (ILP--99). [.ps.gz] [.pdf]
Brockhausen/99b Peter Brockhausen (1999). Learning First--Order Rules in Intensive Care Monitoring. In Petra Perner, editor(s), Maschinelles Lernen, FGML 99 in series IBal Report, pages 1--7. Institut für Bildverarbeitung und angewandte Informatik.
Joachims/99a Joachims, Thorsten (1999). Making large-Scale SVM Learning Practical. In Advances in Kernel Methods - Support Vector Learning, chapter 11. MIT Press. [.ps.gz] [.pdf]
Joachims/99c Thorsten Joachims (1999). Transductive Inference for Text Classification using Support Vector Machines. In International Conference on Machine Learning (ICML). [.ps.gz] [.pdf]
Joachims/99e T. Joachims (1999). Estimating the Generalization Performance of a SVM Efficiently. Technical report, Universität Dortmund, LS VIII. [.ps.gz]
Joachims/etal/99a T. Joachims and A. McCallum and M. Sahami and M. Craven, editor(s) (1999). Machine Learning for Information Filtering in series IJCAI Workshop. AAAI Press.
Morik/etal/99a Katharina Morik and Peter Brockhausen and Thorsten Joachims (1999). Combining statistical learning with a knowledge-based approach -- A case study in intensive care monitoring. In Proc. 16th Int'l Conf. on Machine Learning (ICML-99). [.ps.gz] [.pdf]
Scheffer/Joachims/99a Tobias Scheffer and Thorsten Joachims (1999). Expected Error Analysis for Model Selection. In International Conference on Machine Learning (ICML).
Brockhausen/Morik/98a Peter Brockhausen and Katharina Morik (1998). Wissensentdeckung in relationalen Datenbanken: Eine Herausforderung für das maschinelle Lernen. In Data Mining, theoretische Aspekte und Anwendungen, pages 193--211. Physica Verlag. [.ps.gz] [.pdf]
Joachims/98a Joachims, Thorsten (1998). Text Categorization with Support Vector Machines: Learning with Many Relevant Features. In Claire Nédellec and Céline Rouveirol, editor(s), Proceedings of the European Conference on Machine Learning, pages 137--142. Springer. [.ps.gz] [.pdf]
Joachims/98c Thorsten Joachims (1998). Making large-Scale SVM Learning Practical. Technical report, Universität Dortmund, LS VIII-Report. [.ps.gz] [.pdf]
Sahami/etal/98a M. Sahami and M. Craven and T. Joachims and A. McCallum, editor(s) (1998). Learning for Text Categorization WS-98-05 in series ICML/AAAI Workshop. AAAI Press.
Scheffer/Joachims/98a Tobias Scheffer and Thorsten Joachims (1998). Estimating the expected error of empirical minimizers for model selection. Technical report, TU-Berlin. [.ps]
Imhoff/etal/97a Michael Imhoff and Markus Bauer and Ursula Gather and D. Löhlein (1997). Time Series Analysis in Intensive Care Medicine. Applied Cardiopulmonary Pathophysiology, 6, pages 203--281.
Joachims/97b T. Joachims (1997). Text Categorization with Support Vector Machines: Learning with Many Relevant Features. Technical report, Universität Dortmund, LS VIII-Report. [.ps.gz] [.pdf]
Joachims/97d T. Joachims (1997). Text Categorization with Support Vector Machines: Learning with Many Relevant Features. Technical report, Universität Dortmund, Fachbereich Informatik. [.ps.gz]
Morik/97c Katharina Morik (1997). Knowledge Discovery in Databases -- An Inductive Logic Programming Approach. In Foundations of Computer Science -- Theory, Cognition, Applications, pages 429--436. Springer. [.ps.gz] [.pdf]
Morik/Brockhausen/97a Morik, Katharina and Brockhausen, Peter (1997). A Multistrategy Approach to Relational Knowledge Discovery in Databases. Machine Learning Journal, 27 (3):287--312.
Morik/etal/97a Katharina Morik, Iris Pigeot, Ursula Robers (1997). The Use of Inductive Logic Programming for the Development of the Statistical Software Tool CORA. In Workshop Logische Programmierung.
Wiechers/97a F. Wiechers (1997). Verwaltung grosser Datenmengen für die effiziente Anwendung des Apriori-Algorithmus zur Wissensentdeckung in Datenbanken. Master's thesis, Universität Dortmund, Lehrstuhl 8. [.ps.gz] [.pdf]
Morik/Brockhausen/96a Morik, Katharina and Brockhausen, Peter (1996). A Multistrategy Approach to Relational Knowledge Discovery in Databases. In Michalski, Ryszard S. and Wnek, Janusz, editor(s), Proceedings of the Third International Workshop on Multistrategy Learning (MSL-96), pages 17--27. AAAI Press. [.ps.gz]