Start: 15 October 2002
Content: Up to 80% of the information in an institution is stored as unstructured text: free text in a database, texts in RTF format, HTML documents. In such cases one can find the document, but has not yet found the information within it that one actually needs. Information extraction is the area of AI concerned with searching for content in large document collections [Eikvil/99a]. It draws on at least three sources:
In the seminar, the application areas (e.g., eLearning, knowledge management, intelligent search in the WWW and construction of a knowledge base from it), the approaches, and the methods will be worked out jointly.
Winkler/Spiliopoulou/2002a | Karsten Winkler and Myra Spiliopoulou (2002). Structuring Domain-Specific Text Archives by Deriving a Probabilistic XML DTD. In Principles of Data Mining and Knowledge Discovery. |
Gruendel/etal/2001a | Hans Gruendel and Tino Naphtali and Christian Wiech and Jan-Marian Gluba and Maiken Rohdenberg and Tobias Scheffer (2001). Clipping and analyzing news using machine learning techniques. In Proceedings of the Int. Conference on Discovery Science. |
Scheffer/etal/2001a | Tobias Scheffer and Christian Decomain and Stefan Wrobel (2001). Mining the Web using Hidden Markov Models for Information Extraction. In Frank Hoffmann and David Hand and Niall Adams and Douglas Fisher and Gabriela Guimaraes, editors, Advances in Intelligent Data Analysis, pages 309-318. Springer. |
Craven/etal/2000a | Mark Craven and Dan DiPasquo and Dayne Freitag and Andrew K. McCallum and Tom M. Mitchell and Kamal Nigam and Seán Slattery (2000). Learning to Construct Knowledge Bases from the World Wide Web. Artificial Intelligence, 118 (1/2):69-113. |
Grieser/etal/2000a | Gunter Grieser and Klaus P. Jantke and Steffen Lange and Bernd Thomas (2000). A Unifying Approach to HTML Wrapper Representation and Learning. In Discovery Science, Third International Conference, DS 2000, Kyoto, Japan, December 2000, Proceedings, pages 50-64. Springer, Berlin. |
Klein/etal/2000a | M. Klein and D. Fensel and F. van Harmelen and I. Horrocks (2000). The Relation between Ontologies and Schema-languages: Translating OIL-specifications in XML-Schema. In Proceedings of the Workshop on Applications of Ontologies and Problem-Solving Methods. |
Yangarber/etal/2000a | R. Yangarber and R. Grishman and P. Tapanainen and S. Huttunen (2000). Unsupervised discovery of scenario-level patterns for information extraction. In Proceedings of the Sixth Conference on Applied Natural Language Processing (ANLP-NAACL 2000), pages 282-289. |
Eikvil/99a | Line Eikvil (1999). Information Extraction from World Wide Web - A Survey. Technical report, Norwegian Computing Center. |
Muslea/etal/99a | Ion Muslea and Steve Minton and Craig Knoblock (1999). A hierarchical approach to wrapper induction. In Proceedings of the Third International Conference on Autonomous Agents (Agents'99), pages 190-197. ACM Press. |
Miller/etal/98a | Scott Miller and Michael Crystal and Heidi Fox and Lance Ramshaw and Richard Schwartz and Rebecca Stone and Ralph Weischedel and the Annotation Group (1998). Algorithms that learn to extract information - BBN: Description of the SIFT system as used for MUC. In Proceedings of the Seventh Message Understanding Conference (MUC-7). |
Bikel/etal/97a | Daniel M. Bikel and Scott Miller and Richard Schwartz and Ralph Weischedel (1997). Nymble: a high-performance learning name-finder. In Proceedings of ANLP-97, pages 194-201. |
Connolly/97a | Dan Connolly (1997). XML Principles, Tools, and Techniques. O'Reilly. |
Grishman/97a | Ralph Grishman (1997). Information Extraction: Techniques and Challenges. In SCIE, pages 10-27. |
Mintert/97a | Stefan Mintert (1997). Auszeichnungssprachen im WWW. Technical report, Univ. Dortmund. |
Goldfarb/90a | Charles Goldfarb (1990). The SGML Handbook. Oxford University Press. |
Yangarber/Grishman/2000a | Roman Yangarber and Ralph Grishman (2000). Machine Learning of Extraction Patterns from Unannotated Corpora: Position Statement. |