discoursegraphs: A graph-based merging tool and converter for multilayer annotated corpora

Arne Neumann
Applied Computational Linguistics, SFB 632 / EB Cognitive Science, Universität Potsdam, Germany

In: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania

Linköping Electronic Conference Proceedings 109:43, pp. 309-312

NEALT Proceedings Series 23:43, pp. 309-312

Published: 2015-05-06

ISBN: 978-91-7519-098-3

ISSN: 1650-3686 (print), 1650-3740 (online)


discoursegraphs is a Python-based converter for linguistic annotation formats that facilitates combining several heterogeneous annotation layers of a document into a unified graph representation. The library supports a range of syntax- and discourse-related formats and was successfully used to revise and merge a multilayered corpus (Stede and Neumann, 2014).
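The idea of a unified graph representation can be illustrated with a minimal sketch in plain Python. This is not the discoursegraphs API: the function name `merge_layers`, the tuple layout of the annotations, and the example data are all hypothetical, and the sketch assumes that every layer refers to the same tokenization.

```python
def merge_layers(tokens, layers):
    """Merge annotation layers over a shared tokenization into one graph.

    tokens: list of token strings.
    layers: dict mapping a layer name to a list of
            (node_id, label, token_indices) tuples (hypothetical layout).
    Returns (nodes, edges), where nodes maps a node id to its attributes
    and edges is a list of (source, target, layer) triples.
    """
    nodes = {}
    edges = []

    # Token nodes are the common anchor that all layers point to.
    for i, tok in enumerate(tokens):
        nodes[f"tok:{i}"] = {"layer": "tokens", "label": tok}

    for layer_name, annotations in layers.items():
        for node_id, label, token_indices in annotations:
            # Prefix ids with the layer name so layers cannot collide.
            full_id = f"{layer_name}:{node_id}"
            nodes[full_id] = {"layer": layer_name, "label": label}
            for i in token_indices:
                if not 0 <= i < len(tokens):
                    raise ValueError(f"{full_id} refers to unknown token {i}")
                edges.append((full_id, f"tok:{i}", layer_name))

    return nodes, edges


# Hypothetical example: one syntax layer and one discourse layer.
tokens = ["Peter", "sleeps", "."]
layers = {
    "syntax": [("np1", "NP", [0]), ("vp1", "VP", [1])],
    "rst": [("edu1", "EDU", [0, 1, 2])],
}
nodes, edges = merge_layers(tokens, layers)
```

Because both layers attach to the same token nodes, a query such as "which syntactic nodes fall inside this discourse unit" reduces to following edges through the shared tokens.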



Ulrik Brandes, Markus Eiglsperger, Jürgen Lerner, and Christian Pich. 2013. Graph markup language (GraphML). In Roberto Tamassia, editor, Handbook of Graph Drawing and Visualization. CRC Press.

Stephan Druskat, Lennart Bierkandt, Volker Gast, Christoph Rzymski, and Florian Zipser. 2014. Atomic: an open-source software platform for multi-level corpus annotation. In Proceedings of the 12th edition of the KONVENS conference Vol. 1. Universität Hildesheim.

John Ellson, Emden Gansner, Lefteris Koutsofios, Stephen C North, and Gordon Woodhull. 2002. Graphviz–open source graph drawing tools. In Graph Drawing, pages 483–484. Springer.

Richárd Farkas, Veronika Vincze, György Móra, János Csirik, and György Szarvas. 2010. The CoNLL-2010 shared task: learning to detect hedges and their scope in natural language text. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning: Shared Task, pages 1–12. Association for Computational Linguistics.

Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. 2008. Exploring network structure, dynamics, and function using NetworkX. In Gaël Varoquaux, Travis Vaught, and Jarrod Millman, editors, Proceedings of the 7th Python in Science Conference (SciPy2008), pages 11–15, Pasadena, CA USA.

Jan Hajič, Massimiliano Ciaramita, Richard Johansson, Daisuke Kawahara, Maria Antònia Martí, Lluís Màrquez, Adam Meyers, Joakim Nivre, Sebastian Padó, Jan Štěpánek, et al. 2009. The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning: Shared Task, pages 1–18. Association for Computational Linguistics.

Nancy Ide and Keith Suderman. 2007. GrAF: A graph-based format for linguistic annotations. In Proceedings of the Linguistic Annotation Workshop, pages 1–8. Association for Computational Linguistics.

ISO 24612. 2012. Language Resource Management – Linguistic Annotation Framework. International Standards Organization, Geneva, Switzerland.

Shafiq Joty and Alessandro Moschitti. 2014. Discriminative Reranking of Discourse Parses Using Tree Kernels. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2049–2060. Association for Computational Linguistics.

Thomas Krause and Amir Zeldes. 2014. ANNIS3: A new architecture for generic corpus query and visualization. Literary and Linguistic Computing.

Andreas Mengel and Wolfgang Lezius. 2000. An XML-based Representation Format for Syntactically Annotated Corpora. In Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC 2000).

Christoph Müller and Michael Strube. 2006. Multilevel annotation of linguistic data with MMAX2. In Sabine Braun, Kurt Kohn, and Joybrato Mukherjee, editors, Corpus technology and language pedagogy: New resources, new tools, new methods, pages 197–214. Peter Lang.

Arne Neumann, Nancy Ide, and Manfred Stede. 2013. Importing MASC into the ANNIS linguistic database: A case study of mapping GrAF. In Proceedings of the Seventh Linguistic Annotation Workshop (LAW), pages 98–102. Association for Computational Linguistics.

Michael O’Donnell. 2000. RSTTool 2.4: a markup tool for Rhetorical Structure Theory. In Proceedings of the 1st International Conference on Natural Language Generation (INLG 2000), pages 253–256. Association for Computational Linguistics.

Fernando Pérez and Brian E. Granger. 2007. IPython: a system for interactive scientific computing. Computing in Science and Engineering, 9(3):21–29.

Rashmi Prasad, Nikhil Dinesh, Alan Lee, Eleni Miltsakaki, Livio Robaldo, Aravind K Joshi, and Bonnie L Webber. 2008. The Penn Discourse TreeBank 2.0. In Proceedings of LREC 2008.

Thomas Schmidt. 2004. Transcribing and annotating spoken language with EXMARaLDA. In Proceedings of the LREC-Workshop on XML based richly annotated corpora, Lisbon, pages 69–74.

Manfred Stede and Silvan Heintze. 2004. Machine-assisted rhetorical structure annotation. In Proceedings of the 20th International Conference on Computational Linguistics, page 425. Association for Computational Linguistics.

Manfred Stede and Arne Neumann. 2014. Potsdam Commentary Corpus 2.0: Annotation for Discourse Research. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik, Iceland. European Language Resources Association (ELRA).

Pontus Stenetorp, Sampo Pyysalo, Goran Topić, Tomoko Ohta, Sophia Ananiadou, and Jun’ichi Tsujii. 2012. brat: a Web-based Tool for NLP-Assisted Text Annotation. In Proceedings of the Demonstrations Session at EACL 2012, Avignon, France. Association for Computational Linguistics.

Maarten van Gompel and Martin Reynaert. 2013. FoLiA: A practical XML Format for Linguistic Annotation - a descriptive and comparative study. Computational Linguistics in the Netherlands Journal, 3:63–81.

Seid Muhie Yimam, Iryna Gurevych, Richard Eckart de Castilho, and Chris Biemann. 2013. WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations. In ACL (Conference System Demonstrations), pages 1–6.

Amir Zeldes, Florian Zipser, and Arne Neumann. 2013. PAULA XML Documentation: Format Version 1.1. Research Report, hal-00783716, https://hal.inria.fr/hal-00783716.

Florian Zipser, Laurent Romary, et al. 2010. A model oriented approach to the mapping of annotation formats using standards. In Workshop on Language Resource and Language Technology Standards, LREC 2010.

Florian Zipser, Mario Frank, and Jakob Schmolling. 2014. Merging data, the essence of creation of multi-layer corpora. In Postersession der Sektion Computerlinguistik auf der 36. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft (DGfS).
