
MiningMart

Link:

http://mmart.cs.uni-dortmund.de/

Description:

Within the data mining process, considerable time is spent on preprocessing the data. Practical experience has shown that preprocessing can take from 50% up to 80% of the entire data mining process when traditional attribute-value learners are used. That is why preprocessing is the key issue in data analysis. The time is spent on:

  • Choosing the learning task
  • Sampling
  • Feature generation, extraction, and selection
  • Data cleaning
  • Model selection or tuning the hypothesis space
  • Defining appropriate evaluation criteria

Experienced users can apply any learning system successfully to any application, since they prepare the data well. The representation of examples and the choice of a sample determine the applicability of learning methods. A chain of data transformations (learning steps or manual preprocessing) delivers the desired result. Experienced users remember prototypical successful transformation/learning chains.
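The idea of a transformation chain can be illustrated with a minimal Python sketch. It is not MiningMart's operator set or its M4 meta model. The step names (`drop_missing`, `sample_every`, `add_ratio`), the column names, and the toy data are all hypothetical, chosen only to show cleaning, sampling, and feature generation applied one after another.

```python
from functools import reduce
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]
Step = Callable[[List[Row]], List[Row]]

def drop_missing(column: str) -> Step:
    """Data cleaning: discard rows where `column` is missing."""
    return lambda rows: [r for r in rows if r.get(column) is not None]

def sample_every(n: int) -> Step:
    """Sampling: keep every n-th row (deterministic, for illustration only)."""
    return lambda rows: rows[::n]

def add_ratio(new: str, num: str, den: str) -> Step:
    """Feature generation: derive a new attribute from two existing ones."""
    return lambda rows: [{**r, new: r[num] / r[den]} for r in rows]

def run_chain(steps: List[Step], rows: List[Row]) -> List[Row]:
    """Apply each step in order, feeding each output into the next step."""
    return reduce(lambda data, step: step(data), steps, rows)

# Hypothetical toy data: one row has a missing value.
raw = [
    {"calls": 10.0, "minutes": 50.0},
    {"calls": None, "minutes": 30.0},
    {"calls": 4.0, "minutes": 8.0},
    {"calls": 5.0, "minutes": 20.0},
]

# A chain is just an ordered list of steps, so it can be stored and reused.
chain = [
    drop_missing("calls"),
    sample_every(2),
    add_ratio("min_per_call", "minutes", "calls"),
]
prepared = run_chain(chain, raw)
```

Because the chain is an ordinary list of steps, a successful sequence can be kept and applied to a new data set, which loosely mirrors how experienced users reuse prototypical chains.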

Partners:

 

Staff Members:

Euler, Timm
Hakenjos, Daniel
Liedtke, Harald
Morik, Katharina
Scholz, Martin
Schwedt, Stefan

Software:

MiningMart system

Publications:

Euler/2006a Timm Euler. Data Mining mit MiningMart. In Programmieren unter Linux, No. 1, pages 56--60, 2006.
Euler/2006b Timm Euler. Modeling Preparation for Data Mining Processes. In Journal of Telecommunications and Information Technology, No. 4, pages 81--87, 2006.
Euler/2005a Timm Euler. Publishing Operational Models of Data Mining Case Studies. In Proceedings of the Workshop on Data Mining Case Studies at the 5th IEEE International Conference on Data Mining (ICDM), pages 99--106, Houston, Texas, USA, 2005.
Euler/2005b Timm Euler. An Adaptable Software Product Evaluation Metric. In Proceedings of the 9th IASTED International Conference on Software Engineering and Applications (SEA), Phoenix, Arizona, USA, 2005.
Euler/2005c Timm Euler. Churn Prediction in Telecommunications Using MiningMart. In Proceedings of the Workshop on Data Mining and Business (DMBiz) at the 9th European Conference on Principles and Practice in Knowledge Discovery in Databases (PKDD), Porto, Portugal, 2005.
Euler/2005d Timm Euler. Modelling Data Mining Processes on a Conceptual Level. In Proceedings of the 5th International Conference on Decision Support for Telecommunications and Information Society, Warsaw, Poland, 2005.
Euler/Scholz/2004a Euler, Timm and Scholz, Martin. Using Ontologies in a KDD Workbench. In Buitelaar, P. and Franke, J. and Grobelnik, M. and Paaß, G. and Svatek, V. (editors), Workshop on Knowledge Discovery and Ontologies at ECML/PKDD '04, pages 103--108, Pisa, Italy, 2004.
Morik/Koepcke/2004a Morik, Katharina and Köpcke, Hanna. Analysing Customer Churn in Insurance Data - A Case Study. In Jean-Francois Boulicaut and Floriana Esposito and Fosca Giannotti and Dino Pedreschi (editors), PKDD '04: Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, Vol. 3202, pages 325--336, New York, NY, USA, Springer, 2004.
Morik/Scholz/2004a Morik, Katharina and Scholz, Martin. The MiningMart Approach to Knowledge Discovery in Databases. In Ning Zhong and Jiming Liu (editors), Intelligent Technologies for Information Analysis, pages 47--65, Springer, 2004.
Chudzian/etal/2003a Cezary Chudzian and Janusz Granat and Wieslaw Traczyk. Call Center Case. No. D17.2b, IST Project MiningMart, IST-11993, 2003.
Euler/etal/2003a Euler, Timm and Morik, Katharina and Scholz, Martin. MiningMart: Sharing Successful KDD Processes. In Hotho, Andreas and Stumme, Gerd (editors), LLWA 2003 -- Tagungsband der GI-Workshop-Woche Lehren -- Lernen -- Wissen -- Adaptivität, pages 121--122, 2003.
Granat/etal/2003a Janusz Granat and Wieslaw Traczyk and Cezary Chudzian. Evaluation report by NIT. No. D17.3b, IST Project MiningMart, IST-11993, 2003.
Morik/etal/2003a Morik, Katharina and Scholz, Martin and Euler, Timm. MiningMart Final Report. No. D20.4, IST Project MiningMart, IST-11993, 2003.
Morik/etal/2003b Morik, Katharina and Scholz, Martin and Euler, Timm. Ext-MM Final Report. No. D20.5, IST Project MiningMart, IST-11993, 2003.
Richeldi/Perucci/2003a Marco Richeldi and Alessandro Perucci. Mining data with the MiningMart system -- Evaluation Report. No. D17.0, IST Project MiningMart, IST-11993, 2003.
Scholz/2003a Martin Scholz. One Day Seminar -- Data Mining In Practice. No. D11.1, IST Project MiningMart, IST-11993, 2003.
Berka/2002a Petr Berka. Discretization and Grouping operators. No. D16.1, IST Project MiningMart, IST-11993, 2002.
Berka/etal/2002a Berka, Petr and Jirousek, Radim and Pudil, Pavel. Feature Selection Operators based on Information Theoretical Measures. No. D14.4, IST Project MiningMart, IST-11993, 2002.
Bredeche/etal/2002a Bredeche, N. and Saitta, L. and Zucker, J.D.. A Wrapper Approach for Robot Visual Perception. In ICML Workshop on Machine Learning in Computer Vision, pages 22--30, Sydney, Australia, 2002.
Euler/2002b Euler, Timm. Feature Selection with Support Vector Machines. No. D14.3, IST Project MiningMart, IST-11993, 2002.
Euler/2002c Timm Euler. Operator Specifications. No. TR12-02, IST Project MiningMart, IST-11993, 2002.
Euler/2002d Timm Euler. How to implement M4 operators. No. TR12-04, IST Project MiningMart, IST-11993, 2002.
Haustein/2002a Stefan Haustein. Internet Presentation of MiningMart Cases. No. D9, IST Project MiningMart, IST-11993, 2002.
Kietz/2002a Kietz, Jorg-Uwe. On the Learnability of Description Logic. In Proc of the 12th Int. Conf. on Inductive Logic Programming, 2002.
Kietz/2002b Jorg-Uwe Kietz. On the Learnability of Description Logic. kdlabs AG, 2002.
Kietz/2002c Jorg-Uwe Kietz. On the Learnability of Description Logic Programs. No. D13, IST Project MiningMart, IST-11993, 2002.
Laverman/Rem/2002a Bert Laverman and Olaf Rem. Description of the M4 Interface used by the HCI of WP12. No. D12.2, IST Project MiningMart, IST-11993, 2002.
May/Geppert/2002a Michael May and Detlef Geppert. Description of the HCI for Pre-Processing Chains. No. D12.3, IST Project MiningMart, IST-11993, 2002.
Portinale/Saitta/2002a Portinale, Luigi and Saitta, Lorenza. Feature Selection. No. D14.1, IST Project MiningMart, IST-11993, 2002.
Rem/2002a Olaf Rem. Case Base of Preprocessing. No. D10, IST Project MiningMart, IST-11993, 2002.
Rem/Darwinkel/2002a Olaf Rem and Erik Darwinkel. The Concept Editor. No. D12.4, IST Project MiningMart, IST-11993, 2002.
Rem/Trautwein/2002a Olaf Rem and Marten Trautwein. Best practices report. No. D11.3, IST Project MiningMart, IST-11993, 2002.
Richeldi/Perrucci/2002a Marco Richeldi and Alessandro Perrucci. Mining Mart Evaluation Report. No. D17.3, IST Project MiningMart, IST-11993, 2002.
Richeldi/Perrucci/2002b Marco Richeldi and Alessandro Perrucci. Churn Analysis Case Study. No. D17.2, IST Project MiningMart, IST-11993, 2002.
Scholz/2002b Martin Scholz. Representing Constraints, Conditions and Assertions in M4. No. TR18-01, IST Project MiningMart, IST-11993, 2002.
Scholz/2002c Martin Scholz. Using Constraints, Conditions and Assertions. No. TR18-02, IST Project MiningMart, IST-11993, 2002.
Scholz/etal/2002a Martin Scholz and Timm Euler and Lorenza Saitta. Applicability Constraints on Learning Operators. No. D18, IST Project MiningMart, IST-11993, 2002.
Scholz/Euler/2002a Martin Scholz and Timm Euler. Documentation of the MiningMart Meta Model (M4). No. TR12-05, IST Project MiningMart, IST-11993, 2002.
Bathoorn/etal/2001a Ronnie Bathoorn, Nico Brandt, Marc de Haas and Olaf Rem. Problem Modeling. No. D19, IST Project MiningMart, IST-11993, 2001.
Brockhausen/etal/2001a Peter Brockhausen and Marc de Haas and Jorg-Uwe Kietz and Arno Knobbe and Olaf Rem and Regina Zucker and Nico Brandt. Mining Multi-Relational Data. No. D15, IST Project MiningMart, IST-11993, 2001.
Kietz/etal/2001a Kietz, Jorg-Uwe and Vaduva, Anca and Zucker, Regina. MiningMart: Metadata-Driven Preprocessing. In Proceedings of the ECML/PKDD Workshop on Database Support for KDD, 2001.
Knobbe/etal/2001a Arno J. Knobbe and Marc de Haas and Arno Siebes. Propositionalisation and Aggregates. In Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), pages 277--288, London, UK, Springer, 2001.
Morik/etal/2001a Anca Vaduva and Jorg-Uwe Kietz and Regina Zucker and Klaus R. Dittrich. M4 -- The MiningMart Meta Model. No. ifi-2001.02, Institute for Computer Science, Univ. Zurich, 2001.
Morik/etal/2001b Morik, Katharina and Botta, Marco and Dittrich, Klaus R. and Kietz, Jorg-Uwe and Portinale, Luigi and Vaduva, Anca and Zucker, Regina. M4 -- The MiningMart Meta Model. No. D8/9, IST Project MiningMart, IST-11993, 2001.
Vaduva/etal/2001a Anca Vaduva and Jorg-Uwe Kietz and Regina Zucker and Klaus R. Dittrich. M4 -- The MiningMart Meta Model. No. ifi-2001.02, Institute for Computer Science, Univ. Zurich, 2001.
Vaduva/etal/2001b Anca Vaduva and Jorg-Uwe Kietz and Regina Zucker and Klaus R. Dittrich. M4 a Metamodel for Data Preprocessing. In Proc. of the ACM Fourth International Workshop on Data Warehousing and OLAP (DOLAP 2001), 2001.
Zuecker/2001a Regina Zucker. Description of the Metadata-Compiler using the M4-Relational Metadata-Schema. No. D7b, IST Project MiningMart, IST-11993, 2001.
Zuecker/etal/2001c Regina Zucker. Description of the M4-Relational Metadata-Schema within the Database. No. D7a, IST Project MiningMart, IST-11993, 2001.
Kietz/etal/2000a Kietz, Jorg-Uwe and Vaduva, Anca and Zucker, Regina. Mining Mart: Combining Case-Based-Reasoning and Multi-Strategy Learning into a Framework to reuse KDD-Application. In R.S. Michalski and P. Brazdil (editors), Proceedings of the fifth International Workshop on Multistrategy Learning (MSL2000), Guimarães, Portugal, 2000.
Kietz/etal/2000b Kietz, Jorg-Uwe and Fiammengo, Anna and Beccari, Giuseppe and Zucker, Regina. Data Sets, Meta-data and Preprocessing Operators at Swiss Life and CSELT. No. D6.2, IST Project MiningMart, IST-11993, 2000.
Knobbe/etal/2000b Arno Knobbe and Adriaan Schipper and Peter Brockhausen. Domain Knowledge and Data Mining Process Decisions. No. D5, IST Project MiningMart, IST-11993, 2000.
Morik/2000a Morik, Katharina. The Representation Race - Preprocessing for Handling Time Phenomena. In Ramon López de Mántaras and Enric Plaza (editors), ECML '00: Proceedings of the 11th European Conference on Machine Learning, Vol. 1810, pages 4--19, Berlin, Heidelberg, New York, Springer, 2000.
Morik/Liedtke/2000a Morik, Katharina and Liedtke, Harald. Learning about Time. No. D3, IST Project MiningMart, IST-11993, 2000.
Saitta/etal/2000a Saitta, Lorenza and Kietz, Joerg-Uwe and Beccari, Giuseppe. Specification of Pre-Processing Operators Requirements. No. D1, IST Project MiningMart, IST-11993, 2000.
Saitta/etal/2000b Saitta, Lorenza and Botta, Marco and Beccari, Giuseppe and Klinkenberg, Ralf. Studies in Parameter Setting. No. D4.2, IST Project MiningMart, IST-11993, 2000.
Saitta/etal/2000c Lorenza Saitta, Giuseppe Beccari and Alessandro Serra. Informed Parameter Setting. No. D4.1, IST Project MiningMart, IST-11993, 2000.
Vetterli/etal/2000a Thomas Vetterli and Anca Vaduva and Martin Staudt. Metadata Standards for Data Warehousing: Open Information Model vs. Common Warehouse Metamodel. In ACM SigMod Record, Vol. 29, No. 3, 2000.
Wettschereck/Mueller/2000a Wettschereck, Dietrich and Mueller, Stefan. MiningMart Deliverable D2.1. No. D2.1, IST Project MiningMart, IST-11993, 2000.
Zuecker/Kietz/2000a Zucker, Regina and Kietz, Jorg-Uwe. How to preprocess large databases. In Data Mining, Decision Support, Meta-learning and ILP: Forum for Practical Problem Presentation and Prospective Solutions, Lyon, France, 2000.