Conference article

NLP Corpus Observatory – Looking for Constellations in Parallel Corpora to Improve Learners’ Collocational Skills

Gerold Schneider
English Department & Institute of Computational Linguistics, University of Zurich, Switzerland

Johannes Graën
Institute of Computational Linguistics, University of Zurich, Switzerland


Published in: Proceedings of the 7th Workshop on NLP for Computer Assisted Language Learning (NLP4CALL 2018) at SLTC, Stockholm, 7th November 2018

Linköping Electronic Conference Proceedings 152:8, pp. 69–78

NEALT Proceedings Series 36:8, pp. 69–78


Published: 2018-11-02

ISBN: 978-91-7685-173-9

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

The use of corpora in language learning, both in classroom and self-study situations, has proven useful. Investigations into technology use show a benefit for learners who are able to work with corpus data using easily accessible technology. However, relatively little work has been done on exploring the possibilities of parallel corpora for language learning applications.

This paper explores how a parallel corpus, enhanced with several layers generated by NLP techniques, can be used to extract collocations that are non-compositional and thus indispensable to learn. We identify constellations, i.e. combinations of intra- and interlingual relations, calculate an association score for each relation and, based on these, a joint score for each constellation.

This way, we are able to find relevant collocations for different types of constellations. We evaluate our approach and discuss scenarios in which language learners can playfully explore collocations. Our web tools are freely accessible, generate collocation dictionaries on the fly, and link them to example sentences to ensure context embedding.
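The scoring idea in the abstract (one association score per relation, then a joint score per constellation) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the association measure or the way scores are combined, so pointwise mutual information and an arithmetic mean are assumptions made here purely for illustration, and the function names, relation labels and counts are all invented.

```python
import math


def pmi(pair_count, left_count, right_count, total):
    """Pointwise mutual information (log base 2) of a co-occurring pair.

    ASSUMPTION: PMI is used only as one familiar association measure;
    the abstract does not state which measure the paper uses.
    """
    p_pair = pair_count / total
    p_left = left_count / total
    p_right = right_count / total
    return math.log2(p_pair / (p_left * p_right))


def joint_score(relation_scores):
    """Combine per-relation scores into one score for a constellation.

    ASSUMPTION: an arithmetic mean, chosen for simplicity; the abstract
    does not specify the combination.
    """
    return sum(relation_scores) / len(relation_scores)


# Toy counts, invented for illustration only. The constellation consists of
# one intralingual relation in each language and one interlingual
# (word-alignment) relation linking the two.
relation_scores = {
    "intralingual, language A (adjective-noun)": pmi(8, 16, 32, 1024),
    "intralingual, language B (adjective-noun)": pmi(16, 32, 64, 1024),
    "interlingual (aligned nouns)": pmi(4, 8, 16, 1024),
}

for relation, score in relation_scores.items():
    print(f"{relation}: {score:.2f}")
print(f"joint score: {joint_score(list(relation_scores.values())):.2f}")
```

Ranking constellations by such a joint score would surface candidates that are strongly associated on every relation at once. Whether a mean, a product or a weighted combination works best is a design choice this sketch leaves open.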

Keywords

data-driven learning, collocation measures, parallel corpora, alignment association, collocation dictionaries, interactive exploration, adjective-noun structures, verb-object structures, verb-preposition structures
