Conference article

Cross-lingual Parser Selection for Low-resource Languages

Željko Agić
Department of Computer Science, IT University of Copenhagen, Denmark


Published in: Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies, 22 May, Gothenburg, Sweden

Linköping Electronic Conference Proceedings 135:1, pp. 1–10

NEALT Proceedings Series 31:1, pp. 1–10


Published: 2017-05-29

ISBN: 978-91-7685-501-0

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

In multilingual dependency parsing, transferring delexicalized models provides unmatched language coverage and competitive scores with minimal resource requirements. Still, selecting the single best source parser for a given target language remains a challenge. Here, we propose a lean method for parser selection. It offers top performance without disadvantaging truly low-resource languages. In a realistic cross-lingual parsing experiment, it consistently selects appropriate source parsers for our target languages.


