Conference article

TalkBank and CLARIN

Brian MacWhinney
Department of Psychology, Carnegie Mellon University, Pittsburgh USA


Published in: Selected papers from the CLARIN Annual Conference 2016, Aix-en-Provence, 26–28 October 2016, CLARIN Common Language Resources and Technology Infrastructure

Linköping Electronic Conference Proceedings 136:6, pp. 76–89


Published: 2017-05-23

ISBN: 978-91-7685-499-0

ISSN: 1650-3686 (print), 1650-3740 (online)


TalkBank promotes the use of corpora, web-based access, multimedia linkage, and human language technology (HLT) for the study of spoken language interactions. These interactions span a variety of discourse types across many languages and involve children, second language learners, bilinguals, people with language disorders, and classroom learners. Integrating these materials within CLARIN provides open access to a large body of research data, as well as a test bed for developing new computational methods.


Keywords: corpora, spoken language, interoperability, standards, metadata


