The data volumes processed in knowledge discovery in databases are often huge. Since most data mining algorithms scale super-linearly in the size of the data, sampling is an important pre-processing step. Intelligent sampling techniques analyze the theoretically required sample complexity of the learning algorithm applied afterwards, which makes it possible to obtain results with probabilistic guarantees from subsamples. Moreover, more complex sampling strategies form the basis of most ensemble methods. They also make it possible to incorporate asymmetric misclassification costs and prior knowledge without modifying the base learners.
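The two ideas above can be illustrated with a minimal sketch. It is not taken from the publications listed here. It assumes the two-sided Hoeffding bound as one standard way to derive a sample size with a probabilistic guarantee, and cost-proportionate rejection sampling as one common way to fold misclassification costs into the data instead of the learner. The function names `hoeffding_sample_size` and `cost_proportionate_sample` are hypothetical.

```python
import math
import random


def hoeffding_sample_size(epsilon, delta):
    """Sample size from the two-sided Hoeffding bound (an assumed choice).

    With n >= ln(2 / delta) / (2 * epsilon^2) i.i.d. examples, the empirical
    mean of a quantity bounded in [0, 1] (e.g. an error rate) deviates from
    its true value by more than epsilon with probability at most delta.
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def cost_proportionate_sample(examples, costs, rng):
    """Cost-proportionate rejection sampling (illustrative, hypothetical helper).

    Each example is kept with probability cost / max_cost, so expensive
    examples are over-represented in the subsample and an unmodified,
    cost-insensitive base learner can be trained on the result.
    """
    max_cost = max(costs)
    return [x for x, c in zip(examples, costs) if rng.random() < c / max_cost]


if __name__ == "__main__":
    # Error estimate within 0.05 of the truth with 95% confidence.
    print(hoeffding_sample_size(epsilon=0.05, delta=0.05))

    # Misclassifying a positive is assumed to cost five times as much.
    rng = random.Random(0)
    labels = ["pos"] * 100 + ["neg"] * 100
    costs = [5.0] * 100 + [1.0] * 100
    subsample = cost_proportionate_sample(labels, costs, rng)
    print(subsample.count("pos"), subsample.count("neg"))
```

Note that the Hoeffding sample size does not depend on the size of the database, which is what makes subsampling attractive for very large data sets; tighter, algorithm-specific bounds would generally give smaller samples.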
Scholz/Klinkenberg/2006b | Scholz, Martin and Klinkenberg, Ralf. Boosting Classifiers for Drifting Concepts. In Intelligent Data Analysis (IDA), Special Issue on Knowledge Discovery from Data Streams, Vol. 11, No. 1, pages 3–28, 2007.
Scholz/2005a | Scholz, Martin. Knowledge-Based Sampling for Subgroup Discovery. In Morik, Katharina and Boulicaut, Jean-Francois and Siebes, Arno (editors), Local Pattern Detection, Vol. LNAI 3539, pages 171–189, Springer, 2005.
Scholz/2005b | Scholz, Martin. Sampling-Based Sequential Subgroup Mining. In Grossman, R. L. and Bayardo, R. and Bennett, K. and Vaidya, J. (editors), Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '05), pages 265–274, Chicago, Illinois, USA, ACM Press, 2005.
Scholz/2005c | Scholz, Martin. Comparing Knowledge-Based Sampling to Boosting. No. 26, Collaborative Research Center on the Reduction of Complexity for Multivariate Data Structures (SFB 475), University of Dortmund, Dortmund, Germany, 2005.
Scholz/Klinkenberg/2005a | Scholz, Martin and Klinkenberg, Ralf. An Ensemble Classifier for Drifting Concepts. In Gama, J. and Aguilar-Ruiz, J. S. (editors), Proceedings of the Second International Workshop on Knowledge Discovery in Data Streams, pages 53–64, Porto, Portugal, 2005.
Wrobel/etal/2000a | Wrobel, Stephan and Morik, Katharina and Joachims, Thorsten. Maschinelles Lernen und Data Mining. In Görz, G. and Rollinger, C.-R. and Schneeberger, J. (editors), Einführung in die Künstliche Intelligenz, pages 517–597, Oldenburg, 2000.