An analysis of a French as a Foreign Language Corpus for Readability Assessment

Thomas François
IL&C, Cental, Université catholique de Louvain, Belgium


Part of: Proceedings of the third workshop on NLP for computer-assisted language learning at SLTC 2014, Uppsala University

Linköping Electronic Conference Proceedings 107:2, p. 13–32

NEALT Proceedings Series 22:2, p. 13–32


Published: 2014-11-11

ISBN: 978-91-7519-175-1

ISSN: 1650-3686 (print), 1650-3740 (online)


Readability assessment aims to estimate the difficulty of texts based on various linguistic predictors (the lexicon used, the complexity of sentences, the coherence of the text, etc.). It is an active field with applications in many NLP domains, including machine translation, text simplification, text summarisation, and CALL (Computer-Assisted Language Learning). For CALL, readability tools could help retrieve educational materials or make CALL platforms more adaptive. However, developing a readability formula is a costly process that requires a large number of texts annotated for difficulty. The current mainstream method for gathering such a corpus is to take texts from educational resources such as textbooks or simplified readers. In this paper, we describe the collection of an annotated corpus of French as a foreign language (FFL) texts for the purpose of training a readability model. We follow the mainstream approach, taking the texts from textbooks, but we are concerned with the limitations of such an “annotation” approach, in particular as regards the homogeneity of the difficulty annotations across textbook series. The reliability of these annotations is assessed using both a qualitative and a quantitative analysis. It appears that, for some educational levels, the hypothesis of annotation homogeneity must be rejected. Various reasons for these findings are discussed, and the paper concludes with recommendations for future similar attempts.


Readability; FFL; corpus collection; reliability of difficulty annotations


