Bartz/etal/2015a: Using Data Mining and the CLARIN Infrastructure to Extend Corpus-based Linguistic Research

Bibtype	Incollection
Bibkey	Bartz/etal/2015a
Author	Bartz, Thomas; Pölitz, Christian; Morik, Katharina and Storrer, Angelika
Ls8autor	Bartz, Thomas; Morik, Katharina; Pölitz, Christian
Editor	Odijk, Jan
Title	Using Data Mining and the CLARIN Infrastructure to Extend Corpus-based Linguistic Research
Booktitle	Selected Papers from the CLARIN 2014 Conference, October 24-25, 2014, Soesterberg, The Netherlands
Pages	1-13
Address	Linköping
Publisher	Linköping University Electronic Press
Abstract	Large digital corpora of written language, such as those held by the CLARIN-D centers, provide excellent possibilities for linguistic research on authentic language data. Nonetheless, the large number of hits that can be retrieved from corpora often leads to challenges in concrete linguistic research settings. This is particularly the case if the queried word-forms or constructions are (semantically) ambiguous. The joint project called ‘Corpus-based Linguistic Research and Analysis Using Data Mining’ (“Korpus-basierte linguistische Recherche und Analyse mit Hilfe von Data-Mining” – ‘KobRA’) is therefore underway to investigate the benefits and issues of using machine learning technologies to perform post-retrieval cleaning and disambiguation tasks automatically. The following article is an overview of the questions, methodologies and current results of the project, specifically in the scope of corpus-based lexicography/historical semantics. In this area, topic models were used to partition KWIC lists of search results, retrieved by querying various corpora for polysemous or homonymous words, according to the individual meanings of these words.
Year	2015
Project	Kobra

Url	http://www.ep.liu.se/ecp_article/index.en.aspx?issue=116;article=001