Multilingwis2 – Explore Your Parallel Corpus

Johannes Graën
Institute of Computational Linguistics, University of Zurich, Switzerland

Dominique Sandoz
Institute of Computational Linguistics, University of Zurich, Switzerland

Martin Volk
Institute of Computational Linguistics, University of Zurich, Switzerland


In: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden

Linköping Electronic Conference Proceedings 131:31, pp. 247-250

NEALT Proceedings Series 29:31, pp. 247-250


Published: 2017-05-08

ISBN: 978-91-7685-601-7

ISSN: 1650-3686 (print), 1650-3740 (online)


We present Multilingwis2, a web-based search engine for exploring word-aligned parallel and multiparallel corpora. Our application extends the search facilities of Clematide et al. (2016) and is designed to be easily employable on any parallel corpus comprising universal part-of-speech tags, lemmas and word alignments. In addition to corpus exploration, it has proven useful for assessing word alignment quality: loading the results of different alignment methods on the same corpus into Multilingwis2 as separate corpora facilitates their comparison.




Bartunov, Oleg and Teodor Sigaev (2016). “FTS is DEAD ? – Long live FTS !” https://www.slideshare.net/ArthurZakirov1/better-full-text-search-in-postgresql. Accessed March 12th, 2017.

Clematide, Simon, Johannes Graën, and Martin Volk (2016). “Multilingwis – A Multilingual Search Tool for Multi-Word Units in Multiparallel Corpora”. In: Computerised and Corpus-based Approaches to Phraseology: Monolingual and Multilingual Perspectives – Fraseologia computacional y basada en corpus: perspectivas monolingües y multilingües. Ed. by Gloria Corpas Pastor. Geneva: Tradulex, pp. 447–455.

Dyer, Chris, Victor Chahuneau, and Noah A. Smith (2013). “A Simple, Fast, and Effective Reparameterization of IBM Model 2”. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 644–649.

Göhring, Anne and Martin Volk (2011). “The Text+Berg Corpus: An Alpine French-German Parallel Resource”. In: Traitement Automatique des Langues Naturelles, p. 63.

Graën, Johannes, Dolores Batinic, and Martin Volk (2014). “Cleaning the Europarl Corpus for Linguistic Applications”. In: Proceedings of the Conference on Natural Language Processing. (Hildesheim). Stiftung Universität Hildesheim, pp. 222–227.

Graën, Johannes, Simon Clematide, and Martin Volk (2016). “Efficient Exploration of Translation Variants in Large Multiparallel Corpora Using a Relational Database”. In: 4th Workshop on Challenges in the Management of Large Corpora Workshop Programme. Ed. by Piotr Banski, Marc Kupietz, Harald Lüngen, et al., pp. 20–23.

Koehn, Philipp (2005). “Europarl: A parallel corpus for statistical machine translation”. In: Machine Translation Summit. (Phuket). Vol. 5, pp. 79–86.

Liang, Percy, Ben Taskar, and Dan Klein (2006). “Alignment by Agreement”. In: Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pp. 104–111.

Och, Franz Josef and Hermann Ney (2003). “A Systematic Comparison of Various Statistical Alignment Models”. In: Computational Linguistics 29.1, pp. 19–51.

Petrov, Slav, Dipanjan Das, and Ryan McDonald (2012). “A Universal Part-of-Speech Tagset”. In: Proceedings of the 8th International Conference on Language Resources and Evaluation. Ed. by Nicoletta Calzolari et al. Istanbul: European Language Resources Association (ELRA).

PostgreSQL Global Development Group (2017). PostgreSQL 9.6 Documentation – Chapter 12. Full Text Search. https://www.postgresql.org/docs/9.6/static/textsearch.html. Accessed March 12th, 2017.

Schmid, Helmut (1994). “Probabilistic part-of-speech tagging using decision trees”. In: Proceedings of International Conference on New Methods in Natural Language Processing. (Manchester). Vol. 12, pp. 44–49.

Tiedemann, Jörg (2011). Bitext Alignment. Vol. 4. Synthesis Lectures on Human Language Technologies 2. Morgan & Claypool.

Varga, Dániel, László Németh, Péter Halácsy, András Kornai, Viktor Trón, and Viktor Nagy (2005). “Parallel corpora for medium density languages”. In: Proceedings of the Recent Advances in Natural Language Processing. (Borovets), pp. 590–596.

Volk, Martin, Chantal Amrhein, Noëmi Aepli, Mathias Müller, and Phillip Ströbel (2016). “Building a Parallel Corpus on the World’s Oldest Banking Magazine”. In: Proceedings of the Conference on Natural Language Processing. (Bochum).
