Is Part-of-Speech Tagging a Solved Problem for Icelandic?

Örvar Kárason, Hrafn Loftsson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We train and evaluate four Part-of-Speech tagging models for Icelandic. Three are older models that obtained the highest accuracy for Icelandic when they were introduced. The fourth model is of a type that currently reaches state-of-the-art accuracy. We use the most recent version of the MIM-GOLD training/testing corpus, its newest tagset, and augmentation data to obtain results that are comparable between the various models. We examine the accuracy improvements with each model and analyse the errors produced by our transformer model, which is based on a previously published ConvBERT model. For the set of errors that all the models make, and for which they predict the same tag, we extract a random subset for manual inspection. Extrapolating from this subset, we obtain a lower bound estimate on annotation errors in the corpus as well as on some unsolvable tagging errors. We argue that further tagging accuracy gains for Icelandic can still be obtained by fixing the errors in MIM-GOLD and, furthermore, that it should still be possible to squeeze out some small gains from our transformer model.
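The error-sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy tags, and the `is_annotation_error` callback (standing in for manual inspection) are all hypothetical. The sketch assumes each model's output is a list of tags aligned with the gold tags. Because the extrapolated count covers only the errors shared by all models, it would at best bound the number of annotation errors in the whole corpus from below.

```python
import random


def shared_errors(gold, predictions):
    """Return token indices where every model is wrong AND all models
    predict the same (incorrect) tag.

    gold:        list of gold-standard tags
    predictions: list of per-model tag lists, each aligned with `gold`
    """
    shared = []
    for i, gold_tag in enumerate(gold):
        tags = {model_tags[i] for model_tags in predictions}
        # One distinct predicted tag across models, and it differs from gold.
        if len(tags) == 1 and gold_tag not in tags:
            shared.append(i)
    return shared


def extrapolate(shared, is_annotation_error, sample_size, seed=0):
    """Inspect a random subset of the shared errors and scale the observed
    annotation-error rate up to the full shared-error set.

    is_annotation_error: hypothetical callback standing in for a human
                         judgment on one token index
    """
    rng = random.Random(seed)
    sample = rng.sample(shared, min(sample_size, len(shared)))
    if not sample:
        return 0.0
    rate = sum(1 for i in sample if is_annotation_error(i)) / len(sample)
    return rate * len(shared)


if __name__ == "__main__":
    # Toy data, assumed for illustration only.
    gold = ["n", "v", "a", "n"]
    predictions = [
        ["n", "n", "a", "v"],  # model A
        ["n", "n", "a", "a"],  # model B
    ]
    print(shared_errors(gold, predictions))
```

In this toy input, only the second token qualifies: both models agree on a tag that differs from the gold tag. The fourth token is excluded because the models disagree with each other.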
Original language: English
Title of host publication: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Place of Publication: Tórshavn, Faroe Islands
Publisher: University of Tartu Library
Pages: 71-79
Number of pages: 9
Publication status: Published - 1 May 2023