SFB 475 subproject A4 - Statistical Methods and Machine Learning

Link:

http://www.sfb475.uni-dortmund.de/dienst/de/content/struk-d/bereicha-d/tpa4-d.html

Description:

The aim of project A4 is to combine statistical methods with machine learning methods in order to improve Knowledge Discovery in Databases (KDD). After examining the knowledge discovery process as a whole in the previous period, we focus in the current period on two important problems that often occur in the practice of knowledge discovery: the analysis of temporal phenomena in the form of events, and the application of experimental design. Research on these problems promises a particular synergy from combining statistical and machine learning methods. Additionally, the project places emphasis on the applied analysis of real databases.

Partners:

 

Staff Members:

Mierswa, Ingo
Morik, Katharina
Rüping, Stefan
Scholz, Martin
Weihs, Claus
Wurst, Michael

Software:

Picana
RapidMiner (YALE)
RapidMiner Data Stream Plugin (formerly: YALE Concept Drift Plugin)
RapidMiner Value Series Plugin
SVM-light
Uschificator
myKLR
mySVM
mySVM/db

Publications:

Mierswa/Morik/2008a Mierswa, Ingo and Morik, Katharina. About the Non-Convex Optimization Problem Induced by Non-positive Semidefinite Kernel Learning. In Advances in Data Analysis and Classification, Vol. 2, No. 3, pages 241--258, 2008.
Mierswa/2007b Mierswa, Ingo. Finding all Local Models in Parallel: Multi-Objective SVM. 2007.
Mierswa/2007c Mierswa, Ingo. Regularization through Multi-Objective Optimization. In Klinkenberg, Ralf and Mierswa, Ingo and Hinneburg, Alexander and Posch, Stefan and Neumann, Steffen (editors), Proc. of LWA 2007 - Lernen - Wissensentdeckung - Adaptivität, 2007.
Scholz/Klinkenberg/2006b Scholz, Martin and Klinkenberg, Ralf. Boosting Classifiers for Drifting Concepts. In Intelligent Data Analysis (IDA), Special Issue on Knowledge Discovery from Data Streams, Vol. 11, No. 1, pages 3--28, 2007.
Mierswa/2006a Mierswa, Ingo. Evolutionary Learning with Kernels: A Generic Solution for Large Margin Problems. In Proc. of the Genetic and Evolutionary Computation Conference (GECCO 2006), 2006.
Mierswa/Wurst/2006a Mierswa, Ingo and Wurst, Michael. Information Preserving Multi-Objective Feature Selection for Unsupervised Learning. In Maarten Keijzer and Mike Cattolico and Dirk Arnold and Vladan Babovic and Christian Blum and Peter Bosman and Martin V. Butz and Carlos Coello Coello and Dipankar Dasgupta and Sevan G. Ficici and James Foster and Arturo Hernandez-Aguirre and Greg Hornby and Hod Lipson and Phil McMinn and Jason Moore and Guenther Raidl and Franz Rothlauf and Conor Ryan and Dirk Thierens (editors), GECCO '06: Proceedings of the 8th annual conference on Genetic and evolutionary computation, pages 1545--1552, New York, NY, USA, ACM Press, 2006.
Mierswa/Wurst/2006b Mierswa, Ingo and Wurst, Michael. Sound Multi-Objective Feature Space Transformation for Clustering. In Proceedings of the Knowledge Discovery, Data Mining, and Machine Learning (KDML), pages 330--337, 2006.
Homburg/etal/2005a Homburg, Helge and Mierswa, Ingo and Möller, Bülent and Morik, Katharina and Wurst, Michael. A Benchmark Dataset for Audio Classification and Clustering. In Joshua D. Reiss and Geraint A. Wiggins (editors), Proc. of the International Symposium on Music Information Retrieval 2005, pages 528--531, London, UK, Queen Mary University, 2005.
Mierswa/Morik/2005a Mierswa, Ingo and Morik, Katharina. Automatic Feature Extraction for Classifying Audio Data. In Machine Learning Journal, Vol. 58, pages 127--149, 2005.
Mierswa/Morik/2005b Mierswa, Ingo and Morik, Katharina. Method trees: building blocks for self-organizable representations of value series: how to evolve representations for classifying audio data. In Proceedings of the Genetic and Evolutionary Computation Conference GECCO 2005, Workshop on Self-Organization In Representations For Evolutionary Algorithms: Building complexity from simplicity, pages 293--300, New York, NY, USA, ACM, 2005.
Mierswa/Wurst/2005b Mierswa, Ingo and Wurst, Michael. Efficient Feature Construction by Meta Learning -- Guiding the Search in Meta Hypothesis Space. In Proc. of the International Conference on Machine Learning, Workshop on Meta Learning, 2005.
Mierswa/Wurst/2005c Mierswa, Ingo and Wurst, Michael. Efficient Case Based Feature Construction for Heterogeneous Learning Tasks. In Alipio Jorge and Luis Torgo and Pavel Brazdil and Rui Camacho and Joao Gama (editors), Proceedings of the European Conference on Machine Learning (ECML 2005), pages 641--648, Berlin, Springer, 2005.
Morik/2005a Morik, Katharina. Informatik Spektrum, Themenheft Musik. 2005.
Morik/etal/2005c Morik, Katharina and Boulicaut, Jean-François and Siebes, Arno. Local Pattern Detection. Vol. 3539, Springer, 2005.
Morik/Koepcke/2005a Morik, Katharina and Köpcke, Hanna. Features for Learning Local Patterns in Time-Stamped Data. In Katharina Morik and Jean-Francois Boulicaut and Arno Siebes (editors), Local Pattern Detection: International Seminar, Dagstuhl Castle, Germany, April 12-16, 2004, Revised Selected Papers, Vol. LNCS 3539, pages 98--114, Springer, 2005.
Rueping/2005c Rüping, Stefan. Learning with Local Models. In Local Pattern Detection, pages 153-170, Springer, 2005.
Rueping/Scheffer/2005a Rüping, Stefan and Scheffer, Tobias (editors). Proceedings of the ICML 2005 Workshop on Learning with Multiple Views. 2005.
Scholz/2005b Scholz, Martin. Sampling-Based Sequential Subgroup Mining. In Grossman, R. L. and Bayardo, R. and Bennett, K. and Vaidya, J. (editors), Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '05), pages 265--274, Chicago, Illinois, USA, ACM Press, 2005.
Scholz/2005c Scholz, Martin. Comparing Knowledge-Based Sampling to Boosting. No. 26, Collaborative Research Center on the Reduction of Complexity for Multivariate Data Structures (SFB 475), University of Dortmund, Dortmund, Germany, 2005.
Scholz/2005d Scholz, Martin. On the Tractability of Rule Discovery from Distributed Data. In Han, J. and Wah, B.W. and Raghavan, V. and Wu, X. and Rastogi, R. (editors), Proceedings of the 5th IEEE International Conference on Data Mining (ICDM '05), pages 761--764, Houston, Texas, USA, IEEE Computer Society, 2005.
Scholz/2005e Scholz, Martin. On the Complexity of Rule Discovery from Distributed Data. No. 31, SFB475, Universität Dortmund, Dortmund, Germany, 2005.
Scholz/Klinkenberg/2005a Scholz, Martin and Klinkenberg, Ralf. An Ensemble Classifier for Drifting Concepts. In Gama, J. and Aguilar-Ruiz, J. S. (editors), Proceedings of the Second International Workshop on Knowledge Discovery in Data Streams, pages 53--64, Porto, Portugal, 2005.
Wurst/etal/2005a Wurst, Michael and Mierswa, Ingo and Morik, Katharina. Structuring Music Collections by Exploiting Peers' Processing. No. 43/05, Collaborative Research Center 475, University of Dortmund, 2005.
Mierswa/2004b Mierswa, Ingo. Automatic Feature Extraction from Large Time Series. In Weihs, C. and Gaul, W. (editors), Classification -- the Ubiquitous Challenge, Proc. of the 28th Annual Conference of the GfKl 2004, pages 600--607, Springer, 2004.
Mierswa/Morik/2004a Mierswa, Ingo and Morik, Katharina. Learning Feature Extraction for Learning from Audio Data. No. 55/04, Collaborative Research Center 475, University of Dortmund, 2004.
Morik/Koepcke/2004a Morik, Katharina and Köpcke, Hanna. Analysing Customer Churn in Insurance Data - A Case Study. In Jean-Francois Boulicaut and Floriana Esposito and Fosca Giannotti and Dino Pedreschi (editors), PKDD '04: Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, Vol. 3202, pages 325--336, New York, NY, USA, Springer, 2004.
Klinkenberg/Rueping/2003a Klinkenberg, Ralf and Rüping, Stefan. Concept Drift and the Importance of Examples. In Franke, Jürgen and Nakhaeizadeh, Gholamreza and Renz, Ingrid (editors), Text Mining -- Theoretical Aspects and Applications, pages 55--77, Berlin, Germany, Physica-Verlag, 2003.
Rueping/Morik/2003a Rüping, Stefan and Morik, Katharina. Support Vector Machines and Learning about Time. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'03), 2003.
Rueping/Morik/2003b Rüping, Stefan and Morik, Katharina. Support Vector Machines and Learning about Time. No. 4, SFB475, Universität Dortmund, Dortmund, Germany, 2003.
Joachims/2002b Joachims, Thorsten. Learning to Classify Text using Support Vector Machines. Vol. 668, Kluwer, 2002.
Morik/2002a Morik, Katharina. Detecting Interesting Instances. In Hand, David J. and Adams, Niall M. and Bolton, Richard J. (editors), Proceedings of the ESF Exploratory Workshop on Pattern Detection and Discovery, Vol. 2447, pages 13-23, Berlin, Springer, 2002.
Morik/etal/2002a Morik, Katharina and Joachims, T. and Imhoff, M. and Brockhausen, P. and Rüping, S. Integrating Kernel Methods into a Knowledge-based Approach to Evidence-based Medicine. In Schmitt, Manfred and Teodorescu, Horia-Nicolai and Jain, Ashlesha and Jain, Ajita and Jain, Sandhya and Jain, Lakhmi C. (editors), Studies in Fuzziness and Soft Computing, Vol. 96, pages 71--99, Physica-Verlag, 2002.
Morik/Rueping/2002a Morik, Katharina and Rüping, Stefan. A Multistrategy Approach to the Classification of Phases in Business Cycles. In Elomaa, Tapio and Mannila, Heikki and Toivonen, Hannu (editors), Machine Learning: ECML 2002, Vol. 2430, pages 307--318, Berlin, Springer, 2002.
Rueping/2002a Stefan Rüping. Support Vector Machines in Relational Databases. In Seong-Whan Lee and Alessandro Verri (editors), Pattern Recognition with Support Vector Machines --- First International Workshop, SVM 2002, pages 310--320, Springer, 2002.
Rueping/2002c Rüping, Stefan. Incremental Learning with Support Vector Machines. No. 18, SFB475, Universität Dortmund, Dortmund, Germany, 2002.
Klinkenberg/etal/2001a Klinkenberg, Ralf and Rüping, Stefan and Fick, Andreas and Henze, Nicola and Herzog, Christian and Molitor, Ralf and Schröder, Olaf (editors). LLWA 01 -- Tagungsband der GI-Workshop-Woche Lernen -- Lehren -- Wissen -- Adaptivität. No. 763, Dortmund, Germany, 2001.
Rueping/2001a Rüping, Stefan. SVM Kernels for Time Series Analysis. In Klinkenberg, Ralf and Rüping, Stefan and Fick, Andreas and Henze, Nicola and Herzog, Christian and Molitor, Ralf and Schröder, Olaf (editors), LLWA 01 - Tagungsband der GI-Workshop-Woche Lernen - Lehren - Wissen - Adaptivität, pages 43-50, Dortmund, Germany, 2001.
Rueping/2001b Rüping, Stefan. Incremental Learning with Support Vector Machines. In Cercone, Nick and Lin, T.Y. and Wu, Xindong (editors), Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM '01), pages 641--642, IEEE, 2001.
Sondhauss/Weihs/2001a Sondhauss, Ursula and Weihs, Claus. Incorporating background knowledge for better prediction of cycle phases. No. 24, Universität Dortmund, 2001.
Joachims/00a Joachims, Thorsten. Estimating the Generalization Performance of a SVM Efficiently. In Langley, Pat (editor), Proceedings of the International Conference on Machine Learning, pages 431--438, San Francisco, CA, USA, Morgan Kaufmann, 2000.
Klinkenberg/Joachims/2000a Klinkenberg, Ralf and Joachims, Thorsten. Detecting Concept Drift with Support Vector Machines. In Langley, Pat (editor), Proceedings of the Seventeenth International Conference on Machine Learning (ICML), pages 487--494, San Francisco, CA, USA, Morgan Kaufmann, 2000.
Morik/etal/2000a Morik, Katharina and Imhoff, Michael and Brockhausen, Peter and Joachims, Thorsten and Gather, Ursula. Knowledge Discovery and Knowledge Validation in Intensive Care. In Artificial Intelligence in Medicine, Vol. 19, No. 3, pages 225--249, 2000.
Arminger/Goetz/99a Arminger, Gerhard and Götz, Norman. Asymmetric Loss Functions for Evaluating the Quality of Forecasts in Time Series for Goods Management Systems. No. 22, Universität Dortmund, 1999.
Arminger/Schneider/99a Arminger, Gerhard and Schneider, Carsten. Frequent Problems of Model Specification and Forecasting of Time Series in Goods Management Systems. No. 21, Universität Dortmund, 1999.
Brockhausen/99a Peter Brockhausen. Learning First Order Rules in Intensive Care Monitoring. In Sašo Džeroski and Peter Flach (editors), ILP--99 Late-Breaking Papers, pages 22--27, Bled, Slovenia, 1999.
Brockhausen/99b Peter Brockhausen. Learning First--Order Rules in Intensive Care Monitoring. In Petra Perner (editor), Maschinelles Lernen, FGML 99, pages 1--7, Leipzig, Institut für Bildverarbeitung und angewandte Informatik, 1999.
Joachims/99a Joachims, Thorsten. Making large-Scale SVM Learning Practical. In B. Schölkopf and C. Burges and A. Smola (editors), Advances in Kernel Methods - Support Vector Learning, Cambridge, MA, MIT Press, 1999.
Joachims/99e T. Joachims. Estimating the Generalization Performance of a SVM Efficiently. No. 25, Universität Dortmund, LS VIII, 1999.
Joachims/etal/99a T. Joachims and A. McCallum and M. Sahami and M. Craven (editors). Machine Learning for Information Filtering. AAAI Press, 1999.
Morik/etal/99a Morik, Katharina and Brockhausen, Peter and Joachims, Thorsten. Combining statistical learning with a knowledge-based approach -- A case study in intensive care monitoring. In ICML '99: Proceedings of the Sixteenth International Conference on Machine Learning, pages 268--277, San Francisco, CA, USA, Morgan Kaufmann Publishers Inc., 1999.
Scheffer/Joachims/99a Tobias Scheffer and Thorsten Joachims. Expected Error Analysis for Model Selection. In International Conference on Machine Learning (ICML), Bled, Slovenia, 1999.
Brockhausen/Morik/98a Brockhausen, Peter and Morik, Katharina. Wissensentdeckung in relationalen Datenbanken: Eine Herausforderung für das maschinelle Lernen. In Gholamreza Nakhaeizadeh (editor), Data Mining, theoretische Aspekte und Anwendungen, pages 193--211, Physica Verlag, 1998.
Joachims/98a Joachims, Thorsten. Text Categorization with Support Vector Machines: Learning with Many Relevant Features. In Claire Nédellec and Céline Rouveirol (editors), Proceedings of the European Conference on Machine Learning, pages 137--142, Berlin, Springer, 1998.
Joachims/98c Thorsten Joachims. Making large-Scale SVM Learning Practical. No. 24, Universität Dortmund, LS VIII-Report, 1998.
Sahami/etal/98a M. Sahami and M. Craven and T. Joachims and A. McCallum (editors). Learning for Text Categorization. No. WS-98-05, AAAI Press, 1998.
Scheffer/Joachims/98a Tobias Scheffer and Thorsten Joachims. Estimating the expected error of empirical minimizers for model selection. No. TR-98-9, TU-Berlin, 1998.
Imhoff/etal/97a Michael Imhoff and Markus Bauer and Ursula Gather and D. Löhlein. Time Series Analysis in Intensive Care Medicine. In Applied Cardiopulmonary Pathophysiology, Vol. 6, pages 203--281, 1997.
Joachims/97b T. Joachims. Text Categorization with Support Vector Machines: Learning with Many Relevant Features. No. 23, Universität Dortmund, LS VIII-Report, 1997.
Morik/97c Morik, Katharina. Knowledge Discovery in Databases -- An Inductive Logic Programming Approach. In Freksa, Jantzen and Valk (editors), Foundations of Computer Science -- Theory, Cognition, Applications, pages 429--436, Springer, 1997.
Morik/Brockhausen/97a Morik, Katharina and Brockhausen, Peter. A Multistrategy Approach to Relational Knowledge Discovery in Databases. In Machine Learning Journal, Vol. 27, No. 3, pages 287--312, Kluwer, 1997.
Morik/etal/97a Morik, Katharina and Pigeot, Iris and Robers, Ursula. The Use of Inductive Logic Programming for the Development of the Statistical Software Tool CORA. In Workshop Logische Programmierung, München, 1997.
Wiechers/97a F. Wiechers. Verwaltung grosser Datenmengen für die effiziente Anwendung des Apriori-Algorithmus zur Wissensentdeckung in Datenbanken. Universität Dortmund, Lehrstuhl 8, 1997.
Morik/Brockhausen/96a Morik, Katharina and Brockhausen, Peter. A Multistrategy Approach to Relational Knowledge Discovery in Databases. In Michalski, Ryszard S. and Wnek, Janusz (editors), Proceedings of the Third International Workshop on Multistrategy Learning (MSL-96), pages 17--27, Palo Alto, AAAI Press, 1996.