Abstract
There is increasing interest in the NLP community in developing tools for annotating historical data, for example, to facilitate research in corpus linguistics. In this work, we experiment with several PoS taggers using a sub-corpus of the Icelandic Saga Corpus. This is carried out in three main steps. First, we evaluate taggers trained on Modern Icelandic when tagging Old Icelandic. Second, we semi-automatically correct errors in the training corpus using a bootstrapping method. Finally, we evaluate the taggers on the corrected training corpus. The best-performing single tagger is Stagger, a tagger based on the averaged perceptron algorithm, which obtains an accuracy of 91.76%. By combining the output of three taggers using a simple voting scheme, the accuracy increases to 92.32%.
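The abstract does not specify the exact voting scheme. A minimal sketch of one common variant follows: per-token majority voting over three taggers' outputs, with ties broken in favour of the best single tagger. The tie-breaking rule, the function names `vote` and `accuracy`, and the toy tag labels are illustrative assumptions and not necessarily the authors' method. Accuracy here is the share of tokens whose predicted tag matches the gold tag.

```python
from collections import Counter
from typing import Sequence


def vote(tag_sequences: Sequence[Sequence[str]], tiebreak: int = 0) -> list[str]:
    """Combine per-token tags from several taggers by simple majority vote.

    tag_sequences[i][j] is the tag that tagger i assigned to token j.
    Assumption: on a tie (with three taggers, when all three disagree),
    use the tag from the tagger at index `tiebreak`, taken to be the
    best single tagger.
    """
    if not tag_sequences:
        raise ValueError("need at least one tagger output")
    n = len(tag_sequences[0])
    if any(len(seq) != n for seq in tag_sequences):
        raise ValueError("all taggers must tag the same tokens")

    combined = []
    for j in range(n):
        votes = [seq[j] for seq in tag_sequences]
        counts = Counter(votes).most_common()
        top_tag, top_count = counts[0]
        # Collect every tag that shares the highest vote count.
        tied = {tag for tag, count in counts if count == top_count}
        if len(tied) > 1 and votes[tiebreak] in tied:
            combined.append(votes[tiebreak])  # tie: defer to preferred tagger
        else:
            combined.append(top_tag)  # clear majority (or fallback)
    return combined


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and aligned")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy example with hypothetical tags from three hypothetical taggers;
# tagger_a plays the role of the best single tagger (index 0).
tagger_a = ["n", "v", "a", "n"]
tagger_b = ["n", "v", "n", "a"]
tagger_c = ["n", "a", "n", "v"]
gold = ["n", "v", "n", "a"]

combined = vote([tagger_a, tagger_b, tagger_c])
print(combined)                  # ['n', 'v', 'n', 'n']
print(accuracy(combined, gold))  # 0.75
```

In this toy run, the fourth token receives three different tags, so the tie-break falls back to `tagger_a`. The third token shows voting overriding the preferred tagger when the other two agree.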
Original language | English |
---|---|
Title of host publication | Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013) |
Place of Publication | Oslo, Norway |
Publisher | Linköping University Electronic Press, Sweden |
Pages | 89-104 |
Number of pages | 16 |
Publication status | Published - 1 May 2013 |