Language-independent exploration of repetition and variation in longitudinal child-directed speech: a tool and resources

Gintarė Grigonytė
Department of Linguistics, Stockholm University, Stockholm, Sweden

Kristina Nilsson Björkenstam
Department of Linguistics, Stockholm University, Stockholm, Sweden


Part of: Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition at SLTC, Umeå, 16th November 2016

Linköping Electronic Conference Proceedings 130:6, pp. 41–50


Published: 2016-11-15

ISBN: 978-91-7685-633-8

ISSN: 1650-3686 (print), 1650-3740 (online)


We present a language-independent tool, Varseta, for extracting variation sets in child-directed speech. We also present a corpus annotated with variation sets for Swedish, MINGLE-3-VS, and corpora derived from the CHILDES database, CHILDES-26-VS, suitable for the exploration of variation sets in 26 languages. The tool and the resources are freely available for research.
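The abstract does not specify how Varseta decides which utterances form a variation set. As a rough illustration only, the following is a minimal sketch under one assumed operationalization: consecutive utterances are grouped into the same set when their word-level similarity to the preceding utterance reaches a threshold. Similarity is computed here with the `ratio()` method of Python's standard-library `difflib.SequenceMatcher`, a Ratcliff/Obershelp-style measure. The function names `similarity` and `extract_variation_sets`, the 0.3 threshold and the minimum set size of two are hypothetical choices, not Varseta's actual interface or parameters.

```python
from difflib import SequenceMatcher
from typing import List


def similarity(u1: str, u2: str) -> float:
    """Word-level Ratcliff/Obershelp-style similarity (0.0 to 1.0).

    Assumption: utterances are lowercased and split on whitespace;
    a real tool might compare lemmas or part-of-speech tags instead.
    """
    return SequenceMatcher(None, u1.lower().split(), u2.lower().split()).ratio()


def extract_variation_sets(
    utterances: List[str], threshold: float = 0.3, min_size: int = 2
) -> List[List[str]]:
    """Group consecutive utterances into candidate variation sets.

    Hypothetical rule: an utterance joins the current set if its similarity
    to the immediately preceding utterance is at least `threshold`;
    otherwise a new set is started. Only sets with at least `min_size`
    utterances are returned.
    """
    sets: List[List[str]] = []
    current: List[str] = []
    for utt in utterances:
        if current and similarity(current[-1], utt) >= threshold:
            current.append(utt)
        else:
            # Close the previous run and keep it only if it is long enough.
            if len(current) >= min_size:
                sets.append(current)
            current = [utt]
    # Flush the final run.
    if len(current) >= min_size:
        sets.append(current)
    return sets


if __name__ == "__main__":
    demo = [
        "look at the doggy",
        "see the doggy",
        "the doggy is running",
        "do you want juice",
    ]
    for vs in extract_variation_sets(demo):
        print(vs)
    # prints: ['look at the doggy', 'see the doggy', 'the doggy is running']
```

Comparing each utterance only with its immediate predecessor is the simplest possible chaining rule; a tool could equally compare a new utterance with every earlier member of the current set, or require a minimum number of shared words rather than a similarity ratio. Because the sketch uses only surface word forms, it makes no language-specific assumptions, which fits the language-independent aim described in the abstract.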


Keywords: Variation sets, corpora, tools, CDS


