Towards a gold standard for Swedish CEFR-based ICALL

Elena Volodina
Språkbanken, University of Gothenburg, Göteborg, Sweden

Dijana Pijetlovic
Språkbanken, University of Gothenburg, Göteborg, Sweden

Ildiko Pilán
Språkbanken, University of Gothenburg, Göteborg, Sweden

Sofie Johansson Kokkinakis
Språkbanken, University of Gothenburg, Göteborg, Sweden


Published in: Proceedings of the second workshop on NLP for computer-assisted language learning at NODALIDA 2013, May 22-24, Oslo, Norway. NEALT Proceedings Series 17

Linköping Electronic Conference Proceedings 86:5, pp. 48-65

NEALT Proceedings Series 17:5, pp. 48-65


Published: 2013-05-17

ISBN: 978-91-7519-588-9

ISSN: 1650-3686 (print), 1650-3740 (online)


In qualitative projects on ICALL (Intelligent Computer-Assisted Language Learning), research and development always go hand in hand: development both depends upon the research results and dictates the research agenda. Likewise, in the development of the Swedish ICALL platform Lärka, the practical issues of development have dictated its research agenda. With NLP approaches, the need for reliable training data sooner or later becomes unavoidable. At the moment, Lärka's research agenda cannot be addressed without access to reliable training data, a so-called "gold standard". This paper gives an overview of the current state of the Swedish ICALL platform development and the related research agenda, and describes the first attempts to collect a reference corpus ("gold standard") from course books used in CEFR-based language teaching.


ICALL, CEFR, exercise generator, course book corpus compilation


