Improving Collocation Correction by Ranking Suggestions Using Linguistic Knowledge

Roberto Carlini
Natural Language Processing Group, Department of Information and Communication Technologies, Pompeu Fabra University, Barcelona, Spain

Joan Codina-Filba
Natural Language Processing Group, Department of Information and Communication Technologies, Pompeu Fabra University, Barcelona, Spain

Leo Wanner
Catalan Institute for Research and Advanced Studies (ICREA) / Natural Language Processing Group, Department of Information and Communication Technologies, Pompeu Fabra University, Barcelona, Spain


In: Proceedings of the third workshop on NLP for computer-assisted language learning at SLTC 2014, Uppsala University

Linköping Electronic Conference Proceedings 107:1, pp. 1–12

NEALT Proceedings Series 22:1, pp. 1–12


Published: 2014-11-11

ISBN: 978-91-7519-175-1

ISSN: 1650-3686 (print), 1650-3740 (online)


The importance of collocations in second language learning is generally acknowledged. Studies show that the “collocation density” in learner corpora is nearly the same as in native corpora, i.e., learners use collocations about as often as native speakers do, while the collocation error rate in learner corpora is about ten times as high as in native reference corpora. Computer-assisted language learning (CALL) could therefore be of great help to learners in mastering collocations. However, surprisingly few works specifically address CALL-oriented collocation learning assistants that either detect miscollocations in learners' writing and propose corrections, or let the learner check whether a word co-occurrence is a correct collocation and obtain correction suggestions if it is determined to be a miscollocation. This neglect is likely due, on the one hand, to the focus of CALL research so far on grammatical matters and, on the other hand, to the complexity of the problem. To provide an adequate correction of a miscollocation, the collocation learning assistant must “guess” the meaning that the learner intended to express. This makes it very different from grammar and spell checkers, which can draw on the grammatical and orthographic regularities of a language, respectively. In this paper, we focus on the provision of a ranked list of correction suggestions in a setting in which the learner submits a collocation for verification and, in the case of a miscollocation, obtains a list of correction suggestions.
We show that the retrieval of the suggestions and their ranking benefit greatly from NLP techniques that provide the syntactic dependency structure and subcategorization information of the word co-occurrences, and from a weighted Pointwise Mutual Information (PMI) that reflects the fact that, in a collocation, it is the base that is subject to the free choice of the speaker, while the occurrence of the collocate is restricted by the base, i.e., that collocations are per se asymmetric.
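
The idea of ranking correction suggestions with an asymmetric association score can be illustrated with a small sketch. The abstract does not give the weighting used in the paper, so the code below is only an assumption-laden illustration: `pmi` is the standard pointwise mutual information, while `asymmetric_score` (PMI scaled by the conditional probability of the collocate given the base) and `rank_suggestions` are hypothetical helpers, and all counts are invented toy values.

```python
import math


def pmi(pair_count, base_count, collocate_count, total):
    """Standard PMI: log2( p(base, collocate) / (p(base) * p(collocate)) )."""
    p_pair = pair_count / total
    p_base = base_count / total
    p_collocate = collocate_count / total
    return math.log2(p_pair / (p_base * p_collocate))


def asymmetric_score(pair_count, base_count, collocate_count, total):
    """Hypothetical asymmetric variant (NOT the paper's formula).

    PMI is scaled by p(collocate | base), so the score depends on how
    strongly the base selects the collocate, not the other way round.
    """
    p_collocate_given_base = pair_count / base_count
    return p_collocate_given_base * pmi(
        pair_count, base_count, collocate_count, total
    )


def rank_suggestions(base, candidates, pair_counts, word_counts, total):
    """Return (collocate, score) pairs for one base, best first.

    Candidates never observed with the base are skipped.
    """
    scored = []
    for collocate in candidates:
        n = pair_counts.get((base, collocate), 0)
        if n == 0:
            continue
        score = asymmetric_score(
            n, word_counts[base], word_counts[collocate], total
        )
        scored.append((collocate, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented toy counts, for illustration only.
total = 1000
word_counts = {"decision": 20, "make": 50, "take": 40, "do": 100}
pair_counts = {("decision", "make"): 10, ("decision", "take"): 4}

ranked = rank_suggestions(
    "decision", ["do", "take", "make"], pair_counts, word_counts, total
)
for collocate, score in ranked:
    print(f"{collocate} decision: {score:.2f}")
```

In this toy setting a learner who submits "do a decision" would receive only collocates that actually co-occur with the base "decision", ordered by how strongly the base selects them. A realistic system would additionally restrict candidates by syntactic dependency and subcategorization information, as the abstract describes.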


CALL; collocations; miscollocation correction; syntactic dependencies; sub-categorization; PMI


