The Potsdam Commentary Corpus 2.1 in ANNIS3

Peter Bourgonje
Applied Computational Linguistics, University of Potsdam, Germany

Manfred Stede
Applied Computational Linguistics, University of Potsdam, Germany

Published in: Proceedings of the 17th International Workshop on Treebanks and Linguistic Theories (TLT 2018), December 13–14, 2018, Oslo University, Norway

Linköping Electronic Conference Proceedings 155:5, pp. 31–38

Published: 2018-12-10

ISBN: 978-91-7685-137-1

ISSN: 1650-3686 (print), 1650-3740 (online)


We present a new version of the Potsdam Commentary Corpus, a German corpus of news commentary articles annotated on several layers. The new release adds annotation layers for dependency trees and information-structural aboutness topics, along with some bug fixes. In addition to discussing the new layers, we demonstrate the added value of loading the corpus into ANNIS3, a tool that merges different annotation layers over the same corpus and supports queries combining information from those layers. Using several cross-layer example queries, we demonstrate its suitability for corpus analysis in a variety of research areas.
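
To make the idea of a cross-layer query concrete, the following is a minimal Python sketch over toy data. It is not ANNIS3's query language or API, and the class names, the dependency labels (`subj`, `obja`) and the span label `aboutness_topic` are hypothetical choices made for illustration only. It finds tokens that are grammatical subjects on one layer and lie inside an aboutness-topic span on another layer.

```python
from dataclasses import dataclass

# Toy data model: all names and labels here are illustrative assumptions,
# not the corpus's actual annotation scheme and not part of ANNIS3.


@dataclass(frozen=True)
class Token:
    index: int   # position of the token in the sentence
    text: str
    deprel: str  # dependency relation from the (assumed) dependency layer


@dataclass(frozen=True)
class Span:
    start: int   # first covered token index, inclusive
    end: int     # last covered token index, inclusive
    label: str   # annotation label from the (assumed) topic layer


def tokens_in_spans(tokens, spans, deprel, label):
    """Return tokens with the given dependency relation that fall inside
    a span carrying the given label (a condition across two layers)."""
    return [
        tok for tok in tokens
        if tok.deprel == deprel
        and any(s.label == label and s.start <= tok.index <= s.end for s in spans)
    ]


# One invented sentence with a dependency layer and a topic layer.
tokens = [
    Token(0, "Die", "det"),
    Token(1, "Stadt", "subj"),
    Token(2, "plant", "root"),
    Token(3, "neue", "attr"),
    Token(4, "Schulen", "obja"),
]
topic_spans = [Span(0, 1, "aboutness_topic")]

subject_topics = tokens_in_spans(tokens, topic_spans, "subj", "aboutness_topic")
print([tok.text for tok in subject_topics])  # prints ['Stadt']
```

The same pattern extends to other layer combinations: each layer contributes one condition, and token positions serve as the shared anchor that ties the layers together.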


treebanks, information structure, cross-layer analysis


