Conference article

ParaViz: A visualization tool for crosslinguistic functional comparisons based on a parallel corpus

Ruprecht von Waldenfels
Institute of Polish, Polish Academy of Sciences, Cracow, Poland


Published in: Proceedings of the Workshop on Innovative Corpus Query and Visualization Tools at NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania

Linköping Electronic Conference Proceedings 111:5, p. 32-36

NEALT Proceedings Series 25:5, p. 32-36


Published: 2015-05-07

ISBN: 978-91-7519-035-8

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

ParaViz is a modular query and analysis tool under development for use with ParaSol, a word-aligned, linguistically annotated multilingual corpus managed with the Open Corpus Workbench (CWB). In addition to a query interface that allows complex queries both through an intuitive GUI component and directly in the CQP query language, the tool is planned to include a visualization component that gives access to cross-linguistic functional variation on the basis of user-defined parameter files, which are used to classify word-aligned tokens in the corpus. The user is offered two perspectives on the data: a qualitative perspective that gives detailed insight into the data, and a bird’s-eye perspective that offers word lists and visualizations of the aggregated data set.
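The abstract describes classifying word-aligned tokens with user-defined parameter files and then viewing the result either unit by unit or in aggregate. The Python sketch below illustrates that general idea under loudly stated assumptions: the tab-separated rule format, the tag strings `V:perf` and `V:imperf`, the function names `parse_params`, `classify` and `aggregate`, and the toy aligned units are all hypothetical and are not taken from ParaViz, ParaSol or CWB.

```python
"""Hypothetical sketch (not ParaViz code): classify word-aligned tokens with
a user-defined parameter file, then show a per-unit and an aggregated view."""
import re
from collections import Counter

# Hypothetical parameter file: "lang<TAB>regex<TAB>category" per line.
# Rules for a language are tried in order; the first regex that matches
# the aligned token's (hypothetical) tag decides its category.
PARAM_FILE = """\
# lang\tpattern\tcategory
ru\t^V:perf\tperfective
ru\t^V:imperf\timperfective
pl\t^V:perf\tperfective
pl\t^V:imperf\timperfective
"""

# Invented toy data: each alignment unit maps a language to (word, tag).
UNITS = [
    {"ru": ("прочитал", "V:perf"), "pl": ("przeczytał", "V:perf")},
    {"ru": ("читал", "V:imperf"), "pl": ("czytał", "V:imperf")},
    {"ru": ("сказал", "V:perf"), "pl": ("mówił", "V:imperf")},
]


def parse_params(text):
    """Parse rule lines into {lang: [(compiled_regex, category), ...]}."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        lang, pattern, category = line.split("\t")
        rules.setdefault(lang, []).append((re.compile(pattern), category))
    return rules


def classify(unit, rules):
    """Return {lang: category} for one unit; unmatched tags become 'other'."""
    result = {}
    for lang, (_word, tag) in unit.items():
        category = "other"
        for regex, cat in rules.get(lang, []):
            if regex.match(tag):
                category = cat
                break  # first matching rule wins
        result[lang] = category
    return result


def aggregate(units, rules):
    """Aggregated view: count categories per language over all units."""
    counts = {}
    for unit in units:
        for lang, cat in classify(unit, rules).items():
            counts.setdefault(lang, Counter())[cat] += 1
    return counts


if __name__ == "__main__":
    rules = parse_params(PARAM_FILE)
    # Qualitative view: one row per alignment unit.
    for unit in UNITS:
        cats = classify(unit, rules)
        print(" | ".join(f"{lang}:{unit[lang][0]}={cats[lang]}" for lang in sorted(unit)))
    # Aggregated view: category counts per language.
    for lang, counter in aggregate(UNITS, rules).items():
        print(lang, dict(counter))
```

Ordered first-match rules let a user place specific patterns before broader ones, and the `"other"` fallback keeps every aligned token counted, so per-language totals stay comparable in the aggregated view.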

Keywords

parallel corpora; visualization; synchronic comparison

References

František Čermák and Alexandr Rosen. 2012. The case of InterCorp, a multilingual parallel corpus. International Journal of Corpus Linguistics, 17(3):411–427.

Michael Cysouw and Bernhard Wälchli, editors. 2007. Parallel Texts: Using translational equivalents in linguistic typology. Special Issue of STUF 60/2.

Östen Dahl. 2014. The perfect map: Investigating the cross-linguistic distribution of TAME categories in a parallel corpus. In Benedikt Szmrecsanyi and Bernhard Wälchli, editors, Aggregating Dialectology and Typology: Linguistic Variation in Text and Speech, within and across Languages, pages 268–289. De Gruyter Mouton, Berlin, New York.

Stefan Evert and Andrew Hardie. 2011. Twenty-first century corpus workbench: Updating a query architecture for the new millennium. In Proceedings of the Corpus Linguistics 2011 Conference, Birmingham, UK. University of Birmingham.

Daniel H. Huson and David Bryant. 2006. Application of phylogenetic networks in evolutionary studies. Mol. Biol. Evol., 23:254–267.

Roland Meyer, Ruprecht von Waldenfels, and Andreas Zeman. 2006–2014. Paravoz – a simple web interface for querying parallel corpora. https://bitbucket.org/rvwfels/paravoz.

Pavel Rychlý. 2007. Manatee/Bonito – a modular corpus manager. In 1st Workshop on Recent Advances in Slavonic Natural Language Processing, pages 65–70, Brno. Masaryk University.

Jörg Tiedemann. 2003. Recycling Translations – Extraction of Lexical Data from Parallel Corpora and their Application in Natural Language Processing. Ph.D. thesis, Uppsala University, Uppsala, Sweden. Anna Sågvall Hein, Åke Viberg (eds): Studia Linguistica Upsaliensia.

Jörg Tiedemann. 2012. Parallel data, tools and interfaces in OPUS. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Mehmet Ugur Dogan, Bente Maegaard, Joseph Mariani, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey. European Language Resources Association (ELRA).

Martin Volk, Johannes Graën, and Elena Callegaro. 2014. Innovations in parallel corpus search tools. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik, Iceland. European Language Resources Association (ELRA).

Ruprecht von Waldenfels. 2014. Explorations into variation across Slavic: taking a bottom-up approach. In Benedikt Szmrecsanyi and Bernhard Wälchli, editors, Aggregating Dialectology and Typology: Linguistic Variation in Text and Speech, within and across Languages, pages 290–323. De Gruyter Mouton, Berlin, New York.
