Conference article

Towards Universal Dependencies for Learner Chinese

John Lee
Department of Linguistics and Translation, City University of Hong Kong, Hong Kong

Herman Leung
Department of Linguistics and Translation, City University of Hong Kong, Hong Kong

Keying Li
Department of Linguistics and Translation, City University of Hong Kong, Hong Kong


Part of: Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies, 22 May, Gothenburg, Sweden

Linköping Electronic Conference Proceedings 135:8, pp. 67-71

NEALT Proceedings Series 31:8, pp. 67-71


Published: 2017-05-29

ISBN: 978-91-7685-501-0

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

We propose an annotation scheme for learner Chinese in the Universal Dependencies (UD) framework. The scheme was adapted from a UD scheme for Mandarin Chinese to take interlanguage characteristics into account. We applied the scheme to a set of 100 sentences written by learners of Chinese as a foreign language, and we report inter-annotator agreement on syntactic annotation.


