This paper presents ongoing work that aims to improve machine parsing of Faroese using a combination of Faroese and Icelandic training data. We show that even with only a relatively small parsed corpus of one language, here 53,000 words of Faroese, better results can be obtained by adding phrase-structure information from a closely related language with similar syntax. Our experiment uses the Berkeley parser. Adding the Icelandic data, with no other modification to the experimental setup, improves the F-measure for Faroese from 75.44% to 78.05% and part-of-speech tagging accuracy from 88.86% to 90.40%.
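For readers unfamiliar with the metric, the following is a minimal sketch of how a labeled-bracket F-measure of the kind reported above is commonly computed. It assumes the standard PARSEVAL-style definition (precision and recall over labeled constituent spans); the abstract does not state which scoring tool or settings were used, so the function names, the toy English trees, and the choice to count the root node are all illustrative assumptions, not the authors' setup.

```python
from collections import Counter


def parse_tree(text):
    """Parse a bracketed tree such as '(S (NP (D the) (N dog)))' into (label, children)."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a terminal word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return read()


def labeled_spans(tree):
    """Collect (label, start, end) for every phrase-level node; POS-tag nodes are skipped."""
    spans = Counter()

    def walk(node, start):
        label, children = node
        if len(children) == 1 and isinstance(children[0], str):
            return start + 1  # preterminal (TAG word): covers one word, not counted
        end = start
        for child in children:
            end = end + 1 if isinstance(child, str) else walk(child, end)
        spans[(label, start, end)] += 1
        return end

    walk(tree, 0)
    return spans


def bracket_scores(gold_trees, test_trees):
    """Return (precision, recall, F-measure) over labeled brackets, summed across sentences."""
    matched = gold_total = test_total = 0
    for gold, test in zip(gold_trees, test_trees):
        g = labeled_spans(parse_tree(gold))
        t = labeled_spans(parse_tree(test))
        matched += sum((g & t).values())  # multiset intersection = matching brackets
        gold_total += sum(g.values())
        test_total += sum(t.values())
    precision = matched / test_total
    recall = matched / gold_total
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy example (hypothetical trees, not taken from either corpus).
gold = "(S (NP (D the) (N dog)) (VP (V saw) (NP (D a) (N cat))))"
test = "(S (NP (D the) (N dog)) (VP (V saw) (NP (D a)) (NP (N cat))))"
precision, recall, f_measure = bracket_scores([gold], [test])
```

In this toy case the parser output has 5 brackets, 3 of which match the 4 gold brackets, so precision is 3/5, recall is 3/4, and the F-measure is their harmonic mean. Dedicated scoring tools usually apply further conventions (for example, ignoring punctuation or the root label) that this sketch leaves out.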
|Title of host publication||Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014|
|Editors||Nicoletta Calzolari, Khalid Choukri, Sara Goggi, Thierry Declerck, Joseph Mariani, Bente Maegaard, Asuncion Moreno, Jan Odijk, Helene Mazo, Stelios Piperidis, Hrafn Loftsson|
|Publisher||European Language Resources Association (ELRA)|
|Number of pages||5|
|Publication status||Published - 2014|
|Event||9th International Conference on Language Resources and Evaluation, LREC 2014 - Reykjavik, Iceland|
|Duration||26 May 2014 → 31 May 2014|
Bibliographical note
Funding Information:
The construction of IcePaHC was supported by the Icelandic Research Fund (Rannsóknasjóður), grant no 090662011, Viable Language Technology beyond English — Icelandic as a test case; the U.S. National Science Foundation (NSF) International Research Fellowship Program (IRFP), grant #OISE-0853114, Evolution of Language Systems: a comparative study of grammatical change in Icelandic and English; the University of Iceland Research Fund (Rannsóknasjóður Háskóla Íslands), grant Icelandic Diachronic Treebank (Sögulegur íslenskur trjábanki); and the EU ICT Policy Support Programme as part of the Competitiveness and Innovation Framework Programme, grant agreement no 270899 (META-NORD). The Faroese Parsed Historical Corpus was funded by the University of Iceland Research Fund (Rannsóknasjóður Háskóla Íslands), grant Faroese treebank (Frumgerð færeysks trjábanka). We would like to thank members of the Treebanks Lab at the University of Pennsylvania for helpful comments and discussions. Thanks to anonymous reviewers for useful comments.