Conference article

The IPP effect in Afrikaans: a corpus analysis

Liesbeth Augustinus
Centre for Computational Linguistics, University of Leuven, Belgium

Peter Dirix
Centre for Computational Linguistics, University of Leuven, Belgium and Nuance Communications, Inc.


Published in: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16

Linköping Electronic Conference Proceedings 85:20, p. 213-225

NEALT Proceedings Series 16:20, p. 213-225


Published: 2013-05-17

ISBN: 978-91-7519-589-6

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

Compared to well-resourced languages such as English and Dutch, NLP tools for linguistic analysis of Afrikaans are still scarce. To facilitate corpus-based linguistic research on Afrikaans, we are creating a treebank based on the Taalkommissie corpus. We adapted a tokenizer and a shallow parser, and used a TnT tagger for part-of-speech annotation. The first linguistic phenomenon we investigate is the occurrence of infinitivus pro participio (IPP) in Afrikaans. IPP refers to constructions with a perfect auxiliary in which an infinitive appears instead of the expected past participle. The phenomenon has been studied extensively in Dutch and German, but studies on Afrikaans IPP triggers are sparse. The literature often states that, in contrast to Dutch and German, IPP is optional in Afrikaans. We test this claim by means of a corpus analysis.
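The definition of IPP given above can be illustrated with a minimal sketch over POS-tagged clauses. The tag names (`AUX_PERF`, `V_INF`, `V_PART`), the function `classify_perfect_clause`, and the example sentences are assumptions made for illustration only. They are not the tagset, tools, or corpus data used in the paper.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word form, POS tag)

# Hypothetical tag names, chosen for this sketch only.
PERFECT_AUX = "AUX_PERF"   # perfect auxiliary, e.g. "het"
INFINITIVE = "V_INF"       # infinitive verb form
PARTICIPLE = "V_PART"      # past participle, e.g. "gesien"


def classify_perfect_clause(tokens: List[Token]) -> str:
    """Label a tagged clause by how its perfect construction is realised.

    Word order is ignored on purpose, so that clauses with a
    clause-final auxiliary are treated the same way.
    """
    tags = [tag for _, tag in tokens]
    if PERFECT_AUX not in tags:
        return "no_perfect"
    if PARTICIPLE in tags:
        return "participle"       # expected past participle is present
    if INFINITIVE in tags:
        return "ipp_candidate"    # infinitive where a participle is expected
    return "other"


# Constructed example clauses (not taken from the corpus).
with_ipp = [("Ek", "PRON"), ("het", PERFECT_AUX), ("hom", "PRON"),
            ("sien", INFINITIVE), ("kom", INFINITIVE)]
without_ipp = [("Ek", "PRON"), ("het", PERFECT_AUX), ("hom", "PRON"),
               ("gesien", PARTICIPLE), ("kom", INFINITIVE)]

print(classify_perfect_clause(with_ipp))
print(classify_perfect_clause(without_ipp))
```

Counting the two labels per trigger verb over a tagged corpus would give a rough indication of how often IPP is actually used where it is said to be optional. A real analysis would also need clause segmentation and a reliable distinction between infinitives and finite verb forms.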

Keywords

Afrikaans; tokenizer; parser; chunker; corpus search tool; IPP

