Conference article

Domain Adaptation in Dependency Parsing via Transformation Based Error Driven Learning

Atreyee Mukherjee
Indiana University, Bloomington, IN, USA

Sandra Kübler
Indiana University, Bloomington, IN, USA

Published in: Proceedings of the 17th International Workshop on Treebanks and Linguistic Theories (TLT 2018), December 13–14, 2018, Oslo University, Norway

Linköping Electronic Conference Proceedings 155:16, pp. 179–192

Published: 2018-12-10

ISBN: 978-91-7685-137-1

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

Dependency parsers generally perform well when they are trained and tested on data from the same domain. However, when the data to be parsed differ from the sentences on which the parser was trained, accuracy tends to drop. Domain adaptation addresses this problem, but it remains challenging and has so far achieved only limited improvements in the target domain. One issue that has been ignored to date concerns differences in annotation schemes between corpora from different domains, even when the annotations are based on the same underlying guidelines. In the most extreme case, the target annotations may contain labels that do not occur in the source domain, which significantly affects the overall performance of the parser. This paper presents an approach that applies transformation-based error-driven learning (TBL) to domain adaptation for dependency parsing. We use TBL to learn dependency label corrections in the target domain, based on errors made by the source domain parser. We show that this method can reduce dependency label errors significantly. A major advantage of the method is that it can address all types of errors. It can also be easily applied to any domain without major changes to the rule templates.
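
The general idea of learning label corrections from parser errors can be illustrated with a minimal sketch. This is not the authors' implementation: the single rule template (POS of the dependent, POS of the head, current label), the names `Token`, `Rule`, `apply_rule` and `learn_rules`, and the toy tags and labels are all assumptions made for illustration. The loop greedily picks the rule that fixes the most labels while breaking the fewest, applies it, and repeats until no rule yields a net gain.

```python
from collections import Counter
from typing import NamedTuple


class Token(NamedTuple):
    pos: str       # part-of-speech tag of the dependent
    head_pos: str  # part-of-speech tag of its head
    label: str     # dependency label predicted by the source-domain parser


class Rule(NamedTuple):
    pos: str       # context: POS of the dependent
    head_pos: str  # context: POS of the head
    old: str       # label to rewrite
    new: str       # label to write instead


def apply_rule(rule, tokens):
    """Relabel every token whose context and current label match the rule."""
    return [
        t._replace(label=rule.new)
        if (t.pos, t.head_pos, t.label) == (rule.pos, rule.head_pos, rule.old)
        else t
        for t in tokens
    ]


def learn_rules(predicted, gold_labels, min_gain=1):
    """Greedy TBL loop over one hypothetical rule template."""
    current = list(predicted)
    rules = []
    while True:
        fixes = Counter()    # candidate rule -> labels it would correct
        correct = Counter()  # (pos, head_pos, label) -> labels already right
        for tok, gold in zip(current, gold_labels):
            if tok.label == gold:
                correct[(tok.pos, tok.head_pos, tok.label)] += 1
            else:
                fixes[Rule(tok.pos, tok.head_pos, tok.label, gold)] += 1
        if not fixes:
            break

        # Net gain = corrections made minus correct labels the rule would break.
        def net_gain(rule):
            return fixes[rule] - correct[(rule.pos, rule.head_pos, rule.old)]

        best = max(fixes, key=net_gain)
        if net_gain(best) < min_gain:
            break
        rules.append(best)
        current = apply_rule(best, current)
    return rules


# Toy target-domain data (invented): parser output and gold labels.
predicted = [
    Token("NN", "VB", "OBJ"),
    Token("UH", "VB", "DEP"),
    Token("UH", "VB", "DEP"),
    Token("NN", "VB", "SBJ"),
    Token("UH", "VB", "DEP"),
]
gold = ["OBJ", "INTJ", "INTJ", "SBJ", "DEP"]

rules = learn_rules(predicted, gold)
corrected = predicted
for rule in rules:
    corrected = apply_rule(rule, corrected)

errors_before = sum(t.label != g for t, g in zip(predicted, gold))
errors_after = sum(t.label != g for t, g in zip(corrected, gold))
print(rules, errors_before, errors_after)
```

In this toy run the learner finds a single rule that introduces a label absent from the parser output, the situation the abstract calls the most extreme case. The rule also mislabels one token that was previously correct, which is why each candidate is scored by net gain rather than by raw corrections.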

Keywords

domain adaptation; dependency parsing; transformation-based error-driven learning
