The IPP effect in Afrikaans: a corpus analysis

Liesbeth Augustinus
Centre for Computational Linguistics, University of Leuven, Belgium

Peter Dirix
Centre for Computational Linguistics, University of Leuven, Belgium and Nuance Communications, Inc.

Published in: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16

Linköping Electronic Conference Proceedings 85:20, pp. 213-225

NEALT Proceedings Series 16:20, pp. 213-225

Published: 2013-05-17

ISBN: 978-91-7519-589-6

ISSN: 1650-3686 (print), 1650-3740 (online)


Compared with well-resourced languages such as English and Dutch, Afrikaans still has few NLP tools for linguistic analysis. To facilitate corpus-based linguistic research on Afrikaans, we are creating a treebank based on the Taalkommissie corpus. We adapted a tokenizer and a shallow parser, and used a TnT tagger for part-of-speech annotation. The first linguistic phenomenon we investigate is the occurrence of infinitivus pro participio (IPP) in Afrikaans. IPP refers to constructions with a perfect auxiliary in which an infinitive appears instead of the expected past participle. The phenomenon has been studied extensively in Dutch and German, but studies on Afrikaans IPP triggers are sparse. The literature often states that, in contrast to Dutch and German, IPP is optional in Afrikaans. We test this claim by means of a corpus analysis.


Afrikaans; tokenizer; parser; chunker; corpus search tool; IPP


