Sampling Techniques

The data volumes processed in knowledge discovery in databases are often huge. Since most data mining algorithms scale super-linearly with the size of the data, sampling constitutes an important pre-processing step. Intelligent sampling techniques analyze the sample complexity that a subsequently applied learning algorithm theoretically requires, which makes it possible to obtain results with probabilistic guarantees from subsamples. Moreover, more complex sampling strategies form the basis of most ensemble methods. They also make it possible to incorporate asymmetric misclassification costs and prior knowledge without any modifications to the applied base learners.
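Two of the ideas above can be illustrated with a short sketch. It uses two standard textbook techniques, not the specific methods of the publications listed on this page, and all function names are hypothetical. The first function derives a sample size from Hoeffding's inequality, assuming the goal is to estimate a [0, 1]-bounded quantity such as the accuracy of a single fixed rule: with n >= ln(2/delta) / (2 * epsilon^2) i.i.d. examples, the estimate lies within +/- epsilon of the true value with probability at least 1 - delta. The second function is cost-proportionate rejection sampling, which keeps each example with probability cost / max_cost so that an unmodified, cost-insensitive learner trained on the subsample effectively minimizes cost-weighted error.

```python
import math
import random


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Hypothetical helper: smallest n such that, by Hoeffding's inequality,
    the empirical mean of a [0, 1]-bounded quantity on n i.i.d. examples lies
    within +/- epsilon of its true mean with probability >= 1 - delta.
    Assumes the quantity concerns one fixed hypothesis (no union bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def subsample(data, epsilon, delta, seed=0):
    """Hypothetical helper: draw a uniform random subsample (without
    replacement) whose size follows the Hoeffding bound above."""
    n = min(hoeffding_sample_size(epsilon, delta), len(data))
    return random.Random(seed).sample(data, n)


def cost_proportionate_sample(examples, costs, rng=None):
    """Hypothetical helper: cost-proportionate rejection sampling.
    Each example is kept independently with probability cost / max_cost,
    so expensive-to-misclassify examples become relatively more frequent.
    The base learner itself needs no modification."""
    if rng is None:
        rng = random.Random(0)
    max_cost = max(costs)
    return [x for x, c in zip(examples, costs) if rng.random() < c / max_cost]
```

For example, `hoeffding_sample_size(0.05, 0.05)` yields 738 examples, independent of how large the full database is. This is where the savings over processing all of the data come from. The bound is conservative. Guarantees for selecting the best of many candidate patterns would need an additional union-bound term, which this sketch omits.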


SFB 475 subproject A4


RapidMiner (YALE)
RapidMiner Data Stream Plugin (formerly: YALE Concept Drift Plugin)


Scholz, Martin

Past Master's Thesis


Scholz/Klinkenberg/2006b Scholz, Martin and Klinkenberg, Ralf. Boosting Classifiers for Drifting Concepts. In Intelligent Data Analysis (IDA), Special Issue on Knowledge Discovery from Data Streams, Vol. 11, No. 1, pages 3--28, 2007.
Scholz/2005a Scholz, Martin. Knowledge-Based Sampling for Subgroup Discovery. In Morik, Katharina and Boulicaut, Jean-François and Siebes, Arno (editors), Local Pattern Detection, Vol. LNAI 3539, pages 171--189, Springer, 2005.
Scholz/2005b Scholz, Martin. Sampling-Based Sequential Subgroup Mining. In Grossman, R. L. and Bayardo, R. and Bennett, K. and Vaidya, J. (editors), Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '05), pages 265--274, Chicago, Illinois, USA, ACM Press, 2005.
Scholz/2005c Scholz, Martin. Comparing Knowledge-Based Sampling to Boosting. No. 26, Collaborative Research Center on the Reduction of Complexity for Multivariate Data Structures (SFB 475), University of Dortmund, Dortmund, Germany, 2005.
Scholz/Klinkenberg/2005a Scholz, Martin and Klinkenberg, Ralf. An Ensemble Classifier for Drifting Concepts. In Gama, J. and Aguilar-Ruiz, J. S. (editors), Proceedings of the Second International Workshop on Knowledge Discovery in Data Streams, pages 53--64, Porto, Portugal, 2005.
Wrobel/etal/2000a Wrobel, Stephan and Morik, Katharina and Joachims, Thorsten. Maschinelles Lernen und Data Mining. In Görz, G. and Rollinger, C.-R. and Schneeberger, J. (editors), Einführung in die Künstliche Intelligenz, pages 517--597, Oldenbourg, 2000.