Tagging the Past: Experiments using the Saga Corpus

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

There is increasing interest in the NLP community in developing tools for annotating historical data, for example to facilitate research in corpus linguistics. In this work, we experiment with several PoS taggers using a sub-corpus of the Icelandic Saga Corpus. The work is carried out in three main steps. First, we evaluate taggers trained on Modern Icelandic when tagging Old Icelandic. Second, we semi-automatically correct errors in the training corpus using a bootstrapping method. Finally, we evaluate the taggers on the corrected training corpus. The best-performing single tagger is Stagger, a tagger based on the averaged perceptron algorithm, which obtains an accuracy of 91.76%. Combining the output of three taggers with a simple voting scheme increases the accuracy to 92.32%.
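The abstract does not spell out the voting scheme, so the following is only a minimal Python sketch of token-level majority voting over several taggers' outputs, plus the token-level accuracy measure. It assumes that ties are broken in favour of a designated "preferred" tagger (presumably the best single tagger); the function names `vote` and `accuracy`, the `preferred` parameter, and the toy tags are hypothetical and are not taken from the paper or the Saga Corpus.

```python
from collections import Counter
from typing import List, Sequence


def vote(tag_sequences: Sequence[Sequence[str]], preferred: int = 0) -> List[str]:
    """Combine per-token PoS tags from several taggers by majority vote.

    tag_sequences[i][j] is the tag tagger i assigns to token j. When several
    tags share the highest vote count, the tag chosen by tagger `preferred`
    wins if it is among them (assumption: this is the best single tagger).
    """
    if not tag_sequences:
        raise ValueError("need output from at least one tagger")
    n_tokens = len(tag_sequences[0])
    if any(len(seq) != n_tokens for seq in tag_sequences):
        raise ValueError("all taggers must tag the same token sequence")

    combined = []
    for j in range(n_tokens):
        votes = Counter(seq[j] for seq in tag_sequences)
        top_count = max(votes.values())
        tied = [tag for tag, count in votes.items() if count == top_count]
        preferred_tag = tag_sequences[preferred][j]
        # A clear majority leaves a single candidate in `tied`; on a tie,
        # defer to the preferred tagger if it voted for one of the tied tags.
        combined.append(preferred_tag if preferred_tag in tied else tied[0])
    return combined


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Token-level tagging accuracy: fraction of tokens given the gold tag."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy example with made-up tags (hypothetical, not Saga Corpus data).
tagger_a = ["n", "v", "a", "n"]  # assumed to be the best single tagger
tagger_b = ["n", "v", "n", "v"]
tagger_c = ["n", "a", "n", "a"]
gold = ["n", "v", "n", "a"]

combined = vote([tagger_a, tagger_b, tagger_c], preferred=0)
print(combined)                   # ['n', 'v', 'n', 'n']
print(accuracy(tagger_a, gold))   # 0.5
print(accuracy(combined, gold))   # 0.75
```

With exactly three taggers, a tie can only arise when all three disagree on a token, so the tie-breaking rule matters only for those tokens; whether the paper resolves such cases this way is not stated in the abstract.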
Original language: English
Title of host publication: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)
Place of publication: Oslo, Norway
Publisher: Linköping University Electronic Press, Sweden
Pages: 89-104
Number of pages: 16
Publication status: Published - 1 May 2013
