Text Mining

Machine learning from text and text mining cover a wide range of topics both in terms of methods and applications, some of which are listed here as examples:
  • Automated data pre-processing: Textual data is often unstructured (e.g. free text in natural language) or semi-structured (e.g. HTML documents, i.e. pages on the World Wide Web, or XML documents) and therefore needs to be pre-processed for many applications (e.g. by transforming the free text of documents into document vectors).
  • Automated text classification: Machine learning of classifiers from examples has many applications. They include e.g. information filtering of news texts for the automated generation of personalized news from multiple sources, the automated sorting of documents or web pages into pre-defined categories, or e-mail routing, i.e. the automatic forwarding of e-mail messages to the most appropriate person or department in a company or organization.
  • Clustering: Automated grouping of documents into clusters of documents with similar content, or the automated detection of new groups or topics.
  • Text summarization: e.g. summarizing e-mail messages to the length of short messages (SMS) for display on mobile phones.
  • Information extraction: Finding and marking or extracting information in texts, e.g. names of companies, products, cities, places, persons, etc., often as pre-processing for further tasks.
  • Automated question answering: Providing answers or text fragments for answers from collections of known question-answer-pairs, i.e. finding question-answer-pairs with questions similar to the current question or request, e.g. for supporting customer service and customer support (e.g. in call centers).
  • Search in the World Wide Web (WWW) or in document collections (information retrieval), personalization of search.
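
Two of the steps listed above, transforming free text into document vectors and finding stored questions similar to a new request, can be illustrated with a small sketch. It assumes a simple TF-IDF weighting with a smoothed inverse document frequency and cosine similarity. The function names, the tokenization rule, and the sample questions are hypothetical and are not taken from any of the tools listed on this page.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document,
    together with the IDF table used to build them."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency (one of several common variants).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: freq * idf[t] for t, freq in tf.items()})
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query, documents):
    """Return (index, score) of the document most similar to the query."""
    vectors, idf = tfidf_vectors(documents)
    q_tf = Counter(tokenize(query))
    # Terms unseen in the collection carry no weight and are dropped.
    q_vec = {t: f * idf[t] for t, f in q_tf.items() if t in idf}
    scores = [cosine(q_vec, vec) for vec in vectors]
    best = max(range(len(documents)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical collection of known questions (answers omitted).
known_questions = [
    "How do I reset my password?",
    "Where can I download the invoice?",
    "How can I cancel my subscription?",
]
index, score = most_similar("I forgot my password, how can I reset it?",
                            known_questions)
print(known_questions[index], round(score, 3))
```

The same document vectors could serve as input to a classifier or a clustering method. A production system would typically add stop-word removal, stemming, and a sparse matrix representation, none of which is shown here.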

Related Topics

Text Classification
Data Mining

Projects

SFB 531 Computational Intelligence

Software

ADT
RapidMiner (YALE)
RapidMiner Conditional Random Fields Plugin
RapidMiner Data Stream Plugin (formerly: YALE Concept Drift Plugin)
Word Vector Tool and RapidMiner Word Vector Tool Plugin

Staff

Euler, Timm
Jungermann, Felix
Klinkenberg, Ralf
Pölitz, Christian

Past Master's Theses

Publications

Scholz/Klinkenberg/2006b Scholz, Martin and Klinkenberg, Ralf. Boosting Classifiers for Drifting Concepts. In Intelligent Data Analysis (IDA), Special Issue on Knowledge Discovery from Data Streams, Vol. 11, No. 1, pages 3--28, 2007.
Deutsch/2006a Deutsch, Stephan. Outlier Detection in USENET Newsgruppen. University of Dortmund, 2006.
Hennig/Wurst/2006a Hennig, Sascha and Wurst, Michael. Incremental Clustering of Newsgroup Articles. In Moonis Ali and Richard Dapoigny (editors), Proceedings of the International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems (IEA/AIE 06), pages 332--341, Berlin, Heidelberg, Springer, 2006.
Mierswa/etal/2006a Mierswa, Ingo and Wurst, Michael and Klinkenberg, Ralf and Scholz, Martin and Euler, Timm. YALE: Rapid Prototyping for Complex Data Mining Tasks. In Tina Eliassi-Rad and Lyle H. Ungar and Mark Craven and Dimitrios Gunopulos (editors), Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006), pages 935--940, ACM, New York, USA, ACM Press, 2006.
Scholz/Klinkenberg/2006a Scholz, Martin and Klinkenberg, Ralf. Boosting Classifiers for Drifting Concepts. No. 6/06, Collaborative Research Center on the Reduction of Complexity for Multivariate Data Structures (SFB 475), University of Dortmund, Dortmund, Germany, 2006.
Klinkenberg/2005a Klinkenberg, Ralf. Meta-Learning, Model Selection, and Example Selection in Machine Learning Domains with Concept Drift. In Fürnkranz, Johannes and Grieser, Gunter (editors), Annual workshop of the special interest group on machine learning, knowledge discovery, and data mining (FGML-2005) of the German Computer Science Society (GI) within the workshop week Learning -- Knowledge Discovery -- Adaptivity (LWA-2005), Saarbrücken, Germany, 2005.
Roessler/Morik/2005a Roessler, Marc and Morik, Katharina. Using Unlabeled Texts for Named-Entity Recognition. In Tobias Scheffer and Stefan Rüping (editors), ICML Workshop on Multiple View Learning, 2005.
Scholz/Klinkenberg/2005a Scholz, Martin and Klinkenberg, Ralf. An Ensemble Classifier for Drifting Concepts. In Gama, J. and Aguilar-Ruiz, J. S. (editors), Proceedings of the Second International Workshop on Knowledge Discovery in Data Streams, pages 53--64, Porto, Portugal, 2005.
Klinkenberg/2004a Klinkenberg, Ralf. Learning Drifting Concepts: Example Selection vs. Example Weighting. In Intelligent Data Analysis (IDA), Special Issue on Incremental Learning Systems Capable of Dealing with Concept Drift, Vol. 8, No. 3, pages 281--300, 2004.
Klinkenberg/Rueping/2003a Klinkenberg, Ralf and Rüping, Stefan. Concept Drift and the Importance of Examples. In Franke, Jürgen and Nakhaeizadeh, Gholamreza and Renz, Ingrid (editors), Text Mining -- Theoretical Aspects and Applications, pages 55--77, Berlin, Germany, Physica-Verlag, 2003.
Daniel/etal/2002a Daniel, Guido and Dienstuhl, J. and Engell, S. and Felske, S. and Goser, K. and Klinkenberg, R. and Morik, K. and Ritthoff, O. and Schmidt-Traub, H. Novel Learning Tasks, Optimization, and Their Application. In Schwefel, H.-P. and Wegener, I. and Weinert, K. (editors), Advances in Computational Intelligence -- Theory and Practice, pages 245--318, Berlin, Germany, Springer, 2002.
Euler/2002a Timm Euler. Tailoring Text Using Topic Words: Selection and Compression. In Proceedings of the 13th International Workshop on Database and Expert Systems Applications (DEXA), pages 215--219, Los Alamitos, CA, IEEE Computer Society Press, 2002.
Joachims/2002b Joachims, Thorsten. Learning to Classify Text using Support Vector Machines. Vol. 668, Kluwer, 2002.
Klinkenberg/2002a Klinkenberg, Ralf. Transductive Learning of Drifting Concepts. No. CI-125/02, Collaborative Research Center 531, University of Dortmund, Dortmund, Germany, 2002.
Klinkenberg/etal/2002a Klinkenberg, Ralf and Ritthoff, Oliver and Morik, Katharina. Novel Learning Tasks From Practical Applications. In Henze, Nicola and Kókai, Gabriella and Zeidler, Jens (editors), LLA'02: Lehren -- Lernen -- Adaptivität, Proceedings of the workshop of the special interest groups Machine Learning (FGML), Intelligent Tutoring Systems (ILLS), and Adaptivity and User Modeling in Interactive Systems (ABIS) of the German Computer Science Society (GI), pages 46--59, Hannover, Germany, University of Hannover, 2002.
Joachims/2001a Thorsten Joachims. The Maximum-Margin Approach to Learning Text Classifiers: Methods, Theory, and Algorithms. Fachbereich Informatik, Universität Dortmund, 2001.
Klinkenberg/2001a Klinkenberg, Ralf. Using Labeled and Unlabeled Data to Learn Drifting Concepts. In Kubat, Miroslav and Morik, Katharina (editors), Workshop notes of the IJCAI-01 Workshop on Learning from Temporal and Spatial Data, pages 16--24, IJCAI, Menlo Park, CA, USA, AAAI Press, 2001.
Klinkenberg/Joachims/2000a Klinkenberg, Ralf and Joachims, Thorsten. Detecting Concept Drift with Support Vector Machines. In Langley, Pat (editor), Proceedings of the Seventeenth International Conference on Machine Learning (ICML), pages 487--494, San Francisco, CA, USA, Morgan Kaufmann, 2000.
Joachims/99c Thorsten Joachims. Transductive Inference for Text Classification using Support Vector Machines. In Proceedings of the 16th Int. Conf. on Machine Learning (ICML), pages 200--209, San Francisco, CA, Morgan Kaufmann Publishers Inc., 1999.
Klinkenberg/99a Klinkenberg, Ralf. Learning Drifting Concepts with Partial User Feedback. In Perner, Petra and Fink, Volkmar (editors), Beiträge zum Treffen der GI-Fachgruppe 1.1.3 Maschinelles Lernen (FGML-99), Magdeburg, Germany, 1999.
Armstrong/etal/98a Armstrong, Robert and Freitag, Dayne and Joachims, Thorsten and Mitchell, Tom. WebWatcher: A Learning Apprentice for the World Wide Web. In R. Michalski and I. Bratko and M. Kubat (editors), Machine Learning and Data Mining, pages 297--312, Wiley, 1998.
Hoelscher/98a Hölscher, Markus. Informationsextraktion aus Freitext-Einträgen einer Datenbank. Fachbereich Informatik, Universität Dortmund, 1998.
Joachims/98a Joachims, Thorsten. Text Categorization with Support Vector Machines: Learning with Many Relevant Features. In Claire Nédellec and Céline Rouveirol (editors), Proceedings of the European Conference on Machine Learning, pages 137--142, Berlin, Springer, 1998.
Joachims/Mladenic/98a T. Joachims and D. Mladenić. Browsing-Assistenten, Tour Guides und adaptive WWW-Server. In Künstliche Intelligenz, Vol. 3, No. 28, pages 23--29, 1998.
Klinkenberg/98a Klinkenberg, Ralf. Maschinelle Lernverfahren zum adaptiven Informationsfiltern bei sich verändernden Konzepten. Fachbereich Informatik, Universität Dortmund, Germany, 1998.
Klinkenberg/Renz/98a Klinkenberg, Ralf and Renz, Ingrid. Adaptive Information Filtering: Learning in the Presence of Concept Drifts. In Sahami, Mehran and Craven, Mark and Joachims, Thorsten and McCallum, Andrew (editors), Workshop Notes of the ICML/AAAI-98 Workshop Learning for Text Categorization, pages 33--40, Menlo Park, CA, USA, AAAI Press, 1998.
Klinkenberg/Renz/98b Klinkenberg, Ralf and Renz, Ingrid. Adaptive Information Filtering: Learning Drifting Concepts. In Wysotzki, F. and Geibel, P. and Schädler, K. (editors), Beiträge zum Treffen der GI-Fachgruppe 1.1.3 Maschinelles Lernen (FGML-98), No. 98/11, pages 98--105, Germany, Fachbereich Informatik, TU Berlin, 1998.
Joachims/97a Joachims, Thorsten. A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization. In Proceedings of International Conference on Machine Learning (ICML), 1997.
Joachims/97b T. Joachims. Text Categorization with Support Vector Machines: Learning with Many Relevant Features. No. 23, Universität Dortmund, LS VIII-Report, 1997.
Joachims/97c Joachims, Thorsten. A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization. In Proceedings of the 14th International Conference on Machine Learning ICML97, pages 143--151, 1997.
Schewe/97b Schewe, Sandra. Automatische Kategorisierung von Volltexten unter Anwendung von NLP-Techniken. Fachbereich Informatik, Universität Dortmund, 1997.
Boyan/etal/96a Boyan, J. and Freitag, D. and Joachims, T. A Machine Learning Architecture for Optimizing Web Search Engines. In AAAI Workshop on Internet Based Information Systems, 1996.
Joachims/96a Joachims, Thorsten. Einsatz eines intelligenten, lernenden Agenten für das World Wide Web. Fachbereich Informatik, Universität Dortmund, 1996.
Armstrong/etal/95a Armstrong, Robert and Freitag, Dayne and Joachims, Thorsten and Mitchell, Tom. WebWatcher: A Learning Apprentice for the World Wide Web. In Proceedings of the 1995 AAAI Spring Symposium on Information Gathering from Heterogeneous, Distributed Environments, Stanford, 1995.
Joachims/etal/95a Joachims, Thorsten and Mitchell, Tom and Freitag, Dayne and Armstrong, Robert. WebWatcher: Machine Learning and Hypertext. In Beiträge zum 7. Fachgruppentreffen MASCHINELLES LERNEN der GI-Fachgruppe 1.1.3, pages 145--149, 1995.