The Benefit of Syntactic vs. Linear n-grams for Linguistic Description

Melanie Andresen
Universität Hamburg, Institute for German Language and Literature, Germany

Heike Zinsmeister
Universität Hamburg, Institute for German Language and Literature, Germany

Published in: Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017), September 18-20, 2017, Università di Pisa, Italy

Linköping Electronic Conference Proceedings 139:3, pp. 4-14

Published: 2017-09-13

ISBN: 978-91-7685-467-9

ISSN: 1650-3686 (print), 1650-3740 (online)


Automatic dependency annotations have been used in all kinds of language applications. However, dependency annotations have been exploited much less for the linguistic description of language varieties. This paper presents an attempt to employ dependency annotations for describing style. We argue that for this purpose, linear n-grams (which follow the text’s surface order) alone do not appropriately represent a language like German. We support this claim with both theoretical and empirical arguments. We suggest syntactic n-grams (which follow the dependency paths) as a possible solution. To demonstrate their potential, we compare German academic language in linguistics and in literary studies using both linear and syntactic n-grams. The results show that the approach using syntactic n-grams allows for the detection of linguistically meaningful patterns that do not emerge in a linear n-gram analysis, e.g. complex verbs and light verb constructions.
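The difference between the two n-gram types can be illustrated with a minimal sketch. It is not the authors' implementation. The example sentence, its hand-assigned head indices (content verb as head of the auxiliary), and the function names `linear_ngrams` and `syntactic_ngrams` are assumptions made for illustration only; the annotation scheme used in the paper may differ.

```python
from typing import List, Tuple


def linear_ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all n-grams of adjacent tokens in surface order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def syntactic_ngrams(tokens: List[str], heads: List[int], n: int) -> List[Tuple[str, ...]]:
    """Return all n-grams that follow head-to-dependent paths in the tree.

    heads[i] is the index of the head of token i, or -1 for the root.
    """
    # Build a head -> dependents lookup table.
    children = {i: [] for i in range(len(tokens))}
    for dep, head in enumerate(heads):
        if head >= 0:
            children[head].append(dep)

    def paths_from(node: int, length: int) -> List[List[int]]:
        # All downward paths with exactly `length` nodes starting at `node`.
        if length == 1:
            return [[node]]
        return [
            [node] + rest
            for child in children[node]
            for rest in paths_from(child, length - 1)
        ]

    result = []
    for start in range(len(tokens)):
        for path in paths_from(start, n):
            result.append(tuple(tokens[i] for i in path))
    return result


# Hypothetical example: "Wir haben das Korpus analysiert"
# ("We have analysed the corpus"). Heads are assigned by hand
# and are an assumption, not data from the paper.
tokens = ["Wir", "haben", "das", "Korpus", "analysiert"]
heads = [4, 4, 3, 4, -1]

linear_bigrams = linear_ngrams(tokens, 2)
syntactic_bigrams = syntactic_ngrams(tokens, heads, 2)

print(linear_bigrams)
print(syntactic_bigrams)
```

In this toy sentence, the auxiliary and the participle of the complex verb are separated by the object, so the pair ("analysiert", "haben") appears among the syntactic bigrams but not among the linear ones.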


