Rapid deployment of phrase structure parsing for related languages: A case study of Insular Scandinavian

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

2 Citations (Scopus)

Abstract

This paper presents ongoing work that aims to improve machine parsing of Faroese using a combination of Faroese and Icelandic training data. We show that, even with a relatively small parsed corpus of one language (53,000 words of Faroese), we can obtain better results by adding phrase structure information from a closely related language with similar syntax. Our experiment uses the Berkeley parser. We demonstrate that adding Icelandic data, without any other modification to the experimental setup, raises the F-measure on Faroese from 75.44% to 78.05% and part-of-speech tagging accuracy from 88.86% to 90.40%.
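The abstract reports two metrics: a labeled-bracket F-measure for parsing and part-of-speech tagging accuracy. As a rough illustration of what these numbers measure, the following is a minimal Python sketch, assuming PARSEVAL-style scoring in which the F-measure is the harmonic mean of labeled bracket precision and recall, aggregated over the whole test set. The tuple-based tree encoding, the function names, and the toy Faroese sentence with its labels are hypothetical. The paper's exact scoring configuration is not stated on this page, and commonly used evalb configurations also apply extra rules (for example, ignoring punctuation), which are omitted here.

```python
from collections import Counter
from typing import Iterable, List, Tuple, Union

# Hypothetical tree encoding: a preterminal is (tag, word);
# a phrasal node is (label, [children]).
Tree = Tuple[str, Union[str, List["Tree"]]]


def labeled_brackets(tree: Tree) -> Counter:
    """Collect a multiset of (label, start, end) spans for every phrasal node."""
    spans: Counter = Counter()

    def walk(node: Tree, start: int) -> int:
        label, children = node
        if isinstance(children, str):  # preterminal: one POS tag over one word
            return start + 1
        end = start
        for child in children:
            end = walk(child, end)
        spans[(label, start, end)] += 1
        return end

    walk(tree, 0)
    return spans


def pos_tags(tree: Tree) -> List[str]:
    """Return the POS tags of the tree's words, left to right."""
    label, children = tree
    if isinstance(children, str):
        return [label]
    tags: List[str] = []
    for child in children:
        tags.extend(pos_tags(child))
    return tags


def corpus_bracket_f1(pairs: Iterable[Tuple[Tree, Tree]]) -> float:
    """Micro-averaged labeled-bracket F-measure over (gold, test) tree pairs."""
    matched = gold_total = test_total = 0
    for gold, test in pairs:
        g, t = labeled_brackets(gold), labeled_brackets(test)
        matched += sum((g & t).values())  # multiset intersection of brackets
        gold_total += sum(g.values())
        test_total += sum(t.values())
    if matched == 0:
        return 0.0
    precision = matched / test_total
    recall = matched / gold_total
    return 2 * precision * recall / (precision + recall)


def corpus_tagging_accuracy(pairs: Iterable[Tuple[Tree, Tree]]) -> float:
    """Fraction of words whose predicted POS tag matches the gold tag."""
    correct = total = 0
    for gold, test in pairs:
        gold_tags, test_tags = pos_tags(gold), pos_tags(test)
        if len(gold_tags) != len(test_tags):
            raise ValueError("gold and test trees must cover the same words")
        correct += sum(g == t for g, t in zip(gold_tags, test_tags))
        total += len(gold_tags)
    return correct / total


# Toy example ("eg sá mannin", 'I saw the man') with hypothetical labels.
GOLD: Tree = ("IP", [("NP", [("PRO", "eg")]),
                     ("VP", [("VBD", "sá"), ("NP", [("N", "mannin")])])])
# The "parser" output omits the VP and mistags "sá" as a noun.
TEST: Tree = ("IP", [("NP", [("PRO", "eg")]), ("N", "sá"),
                     ("NP", [("N", "mannin")])])

if __name__ == "__main__":
    pairs = [(GOLD, TEST)]
    print(f"{corpus_bracket_f1(pairs):.4f}")        # 0.8571
    print(f"{corpus_tagging_accuracy(pairs):.4f}")  # 0.6667
```

In the toy example the test tree recovers 3 of the 4 gold brackets with no spurious ones, giving precision 1.0, recall 0.75 and F-measure 6/7; it tags 2 of 3 words correctly. Counts are summed over all sentences before computing precision and recall, so longer sentences carry proportionally more weight.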

Original language: English
Title of host publication: Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
Editors: Nicoletta Calzolari, Khalid Choukri, Sara Goggi, Thierry Declerck, Joseph Mariani, Bente Maegaard, Asuncion Moreno, Jan Odijk, Helene Mazo, Stelios Piperidis, Hrafn Loftsson
Publisher: European Language Resources Association (ELRA)
Pages: 91-95
Number of pages: 5
ISBN (Electronic): 9782951740884
Publication status: Published - 2014
Event: 9th International Conference on Language Resources and Evaluation, LREC 2014 - Reykjavik, Iceland
Duration: 26 May 2014 to 31 May 2014

Publication series

Name: Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014

Conference

Conference: 9th International Conference on Language Resources and Evaluation, LREC 2014
Country/Territory: Iceland
City: Reykjavik
Period: 26/05/14 to 31/05/14

Bibliographical note

Funding Information:
The construction of IcePaHC was supported by the Icelandic Research Fund (Rannsóknasjóður), grant no. 090662011, Viable Language Technology beyond English — Icelandic as a test case; the U.S. National Science Foundation (NSF) International Research Fellowship Program (IRFP), grant #OISE-0853114, Evolution of Language Systems: a comparative study of grammatical change in Icelandic and English; the University of Iceland Research Fund (Rannsóknasjóður Háskóla Íslands), grant Icelandic Diachronic Treebank (Sögulegur íslenskur trjábanki); and the EU ICT Policy Support Programme as part of the Competitiveness and Innovation Framework Programme, grant agreement no. 270899 (META-NORD). The Faroese Parsed Historical Corpus was funded by the University of Iceland Research Fund (Rannsóknasjóður Háskóla Íslands), grant Faroese treebank (Frumgerð færeysks trjábanka). We would like to thank members of the Treebanks Lab at the University of Pennsylvania for helpful comments and discussions. We also thank the anonymous reviewers for their useful comments.

Other keywords

  • Faroese
  • Icelandic
  • Parsing
