
Exploring Properties of Intralingual and Interlingual Association Measures Visually

Johannes Graën
Institute of Computational Linguistics, University of Zurich, Switzerland

Christof Bless
Institute of Computational Linguistics, University of Zurich, Switzerland


Published in: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden

Linköping Electronic Conference Proceedings 131:45, pp. 314-317

NEALT Proceedings Series 29:45, pp. 314-317


Published: 2017-05-08

ISBN: 978-91-7685-601-7

ISSN: 1650-3686 (print), 1650-3740 (online)


We present an interactive interface to explore the properties of intralingual and interlingual association measures. Used in conjunction, these measures can be employed for phraseme identification in word-aligned parallel corpora. The customizable component we built to visualize individual results can show part-of-speech tags, syntactic dependency relations and word alignments next to the tokens of two corresponding sentences.



