Conference article

Parsed Annotation with Semantic Calculation

Alastair Butler
Faculty of Humanities and Social Sciences, Hirosaki University, Japan

Stephen Wright Horn
Theory and Typology Division, National Institute for Japanese Language and Linguistics, Japan

Published in: Proceedings of the 17th International Workshop on Treebanks and Linguistic Theories (TLT 2018), December 13–14, 2018, Oslo University, Norway

Linköping Electronic Conference Proceedings 155:6, pp. 39–51

Published: 2018-12-10

ISBN: 978-91-7685-137-1

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

This paper describes a corpus building program implemented for Japanese (Contemporary and Old) and for English. First, constituent tree syntactic annotations defined to describe intuitions about sentence meaning are added to the texts. The annotations undergo tree transformations that normalise the analyses while preserving basic syntactic relations. The normalisation takes the parsed data for what are very different languages to a level where language particulars have a common interface to feed a semantic calculation. This calculation makes explicit connective, predicate, argument, and operator-binding information. Such derived information reflects universal principles of language: headedness, argumenthood, modification, co-reference, scope, etc. The semantic calculation also sets some minimal conditions for well-formedness: that predicative expressions are paired with subjects; that pro-forms have retrievable referents; that a constituent is associated with at least one grammatical function, etc. Annotators confirm and correct the source annotation with the aid of a visualisation tool that integrates the calculated output as overlaid dependency links. In this way annotators ensure that their interpretation of a text is correctly represented in its annotation. Furthermore, the integration of results from the semantic calculation makes it possible to establish multiple layers of grammatical dependencies with a minimum of invested annotation work.

Keywords

parsed corpus, sentence and discourse meaning, normalisation, grammatical dependencies, discourse referents, visualisation
