Conference article

ParaViz, an online visualization tool for studying variation in meaning based on parallel texts

Ruprecht von Waldenfels
Dept. of Slavic languages and literatures, University of California, Berkeley, USA

Michal Wozniak
Institute of Polish, Polish Academy of Sciences, Cracow, Poland

Published in: Digital Humanities 2016. From Digitization to Knowledge 2016: Resources and Methods for Semantic Processing of Digital Works/Texts, Proceedings of the Workshop, July 11, 2016, Krakow, Poland

Linköping Electronic Conference Proceedings 126:8, pp. 42–48

Published: 2016-07-08

ISBN: 978-91-7685-733-5

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

ParaViz is a modular corpus query and analysis tool for use with a word-aligned, linguistically annotated multilingual corpus of parallel translated texts. As an addition to classic query-based corpus tools, ParaViz makes it easy to assess differences in the meanings of cognate or otherwise comparable items in different languages based on their distribution in parallel texts. Translations are thus essentially used as semantic annotations, allowing for a bottom-up analysis of semantics in a network of texts in many languages. The tool takes as input a user-supplied operationalization of the variables under comparison. It then provides the user with two perspectives on the distribution of these variables in the parallel corpus: on the one hand, a close-up perspective of word-aligned corpus examples, color-coded with respect to the user-provided parameters; on the other hand, a bird's-eye perspective with visualizations that provide overviews of the aggregated differences in use. Data sets with the categorized data are made available for download so that they can be analyzed further. Initially developed as an offline version with a specific research topic in mind, the tool has been adapted as an online tool and will be available for use with the ParaSol corpus (Waldenfels 2011). We feel that publishing such tools in a format that makes them accessible to the research community at large is an important part of addressing the issues of research result replication and sustainability of research efforts in digital humanities in general.
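
The workflow described in the abstract (a user-supplied operationalization applied to word-aligned examples, followed by aggregation of differences between languages) can be illustrated with a minimal Python sketch. This is not ParaViz code: the toy alignment rows, the `categorize` rule, and the disagreement measure are all assumptions made up for illustration, and the actual tool may categorize and aggregate quite differently.

```python
from itertools import combinations

# Hypothetical toy data: each row is one aligned position, mapping a
# language code to the word form found there (None if nothing is aligned).
ALIGNED_ROWS = [
    {"ru": "napisal", "pl": "napisał", "cs": "napsal"},
    {"ru": "pisal", "pl": "napisał", "cs": "psal"},
    {"ru": "pisal", "pl": "pisał", "cs": "psal"},
    {"ru": "napisal", "pl": "napisał", "cs": None},
]


def categorize(token):
    """Hypothetical operationalization: label a form by its prefix."""
    if token is None:
        return None
    return "prefixed" if token.startswith("na") else "simplex"


def categorize_rows(rows):
    """Apply the operationalization to every aligned token (close-up view)."""
    return [{lang: categorize(tok) for lang, tok in row.items()} for row in rows]


def pairwise_disagreement(categorized):
    """Share of aligned positions where two languages get different labels.

    Positions where either language has no aligned token are skipped.
    """
    langs = sorted({lang for row in categorized for lang in row})
    result = {}
    for a, b in combinations(langs, 2):
        shared = [
            (row.get(a), row.get(b))
            for row in categorized
            if row.get(a) is not None and row.get(b) is not None
        ]
        if shared:
            differing = sum(1 for x, y in shared if x != y)
            result[(a, b)] = differing / len(shared)
    return result


if __name__ == "__main__":
    table = categorize_rows(ALIGNED_ROWS)
    for pair, share in pairwise_disagreement(table).items():
        print(pair, round(share, 3))
```

The per-row labels stand in for the close-up perspective, and the pairwise shares stand in for the aggregated overview. A matrix of such shares is the kind of input that distance-based visualizations could start from, although the measure chosen here is only one of many possibilities.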

Keywords

No keywords available

References

David Bryant and Vincent Moulton. 2004. Neighbor-net: An agglomerative method for the construction of phylogenetic networks. Molecular Biology and Evolution, 21(2):255–265.

Michael Cysouw and Bernhard Wälchli, editors. 2007. Parallel Texts: Using translational equivalents in linguistic typology. Special Issue of STUF 60/2.

Östen Dahl. 2014. The perfect map: Investigating the cross-linguistic distribution of TAME categories in a parallel corpus. In Benedikt Szmrecsanyi and Bernhard Wälchli, editors, Aggregating Dialectology and Typology: Linguistic Variation in Text and Speech, within and across Languages, pages 268–289. De Gruyter Mouton, Berlin, New York.

Martin Haspelmath. 2003. The geometry of grammatical meaning: semantic maps and cross-linguistic comparison. In M. Tomasello, editor, The new psychology of language: Cognitive and functional approaches to language structure. Vol. 2, pages 211–242. Lawrence Erlbaum Associates, Mahwah, NJ.

Magnus Sahlgren. 2008. The distributional hypothesis. Rivista di Linguistica, 20(1):33–53.

Jörg Tiedemann. 2003. Recycling Translations – Extraction of Lexical Data from Parallel Corpora and their Application in Natural Language Processing. Ph.D. thesis, Uppsala University, Uppsala, Sweden. Studia Linguistica Upsaliensia, edited by Anna Sågvall Hein and Åke Viberg.

Ruprecht von Waldenfels. 2011. Recent developments in ParaSol: Breadth for depth and XSLT based web concordancing with CWB. In Daniela Majchráková and Radovan Garabík, editors, Natural Language Processing, Multilinguality. Proceedings of Slovko 2011, Modra, Slovakia, 20–21 October 2011, pages 156–162, Bratislava. Tribun EU.

Ruprecht von Waldenfels. 2014. Explorations into variation across Slavic: taking a bottom-up approach. In Benedikt Szmrecsanyi and Bernhard Wälchli, editors, Aggregating Dialectology and Typology: Linguistic Variation in Text and Speech, within and across Languages, pages 290–323. De Gruyter Mouton, Berlin, New York.

Ruprecht von Waldenfels. 2015a. Inner-Slavic contact from a corpus driven perspective. In Emmerich Kelih, Stefan Michael Newerkla, and Jürgen Fuchsbauer, editors, Lehnwörter im Slawischen: Empirische und crosslinguistische Perspektiven, pages 237–263. Peter Lang, Frankfurt.

Ruprecht von Waldenfels. 2015b. The ParaViz tool: Exploring cross-linguistic differences in functional domains based on a parallel corpus. In Gintare Grigonyte, Simon Clematide, Andrius Utka, and Martin Volk, editors, Proceedings of the Workshop on Innovative Corpus Query and Visualization Tools at NODALIDA 2015, May 11–13, 2015, Vilnius, Lithuania.
