The automatic identification of discourse units in Dutch text

Nynke van der Vliet
University of Groningen, The Netherlands

Gosse Bouma
University of Groningen, The Netherlands

Gisela Redeker
University of Groningen, The Netherlands

Published in: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16

Linköping Electronic Conference Proceedings 85:37, pp. 411-421

NEALT Proceedings Series 16:37, pp. 411-421

Published: 2013-05-17

ISBN: 978-91-7519-589-6

ISSN: 1650-3686 (print), 1650-3740 (online)


The identification of discourse units is an essential step in discourse parsing, the automatic construction of a discourse structure from a text. We present a rule-based algorithm to identify elementary discourse units (EDUs) in Dutch written text. Contrary to approaches that focus on the determination of segment boundaries, we identify complete discourse units, which is especially helpful for the recognition of interrupted EDUs that contain embedded discourse units. We use syntactic and lexical information to decompose sentences into EDUs. Experimental results show that our algorithm for EDU identification performs well on texts of various genres.


Discourse analysis; elementary discourse units; segmentation


