Conference article

Interactive Visualizations of Corpus Data in Sketch Engine

Lucia Kocincová
NLP Centre, Faculty of Informatics, Masaryk University, Brno, Czech Republic

Vít Baisa
Lexical Computing, Brighton, United Kingdom / NLP Centre, Faculty of Informatics, Masaryk University, Brno, Czech Republic

Miloš Jakubíček
Lexical Computing, Brighton, United Kingdom / NLP Centre, Faculty of Informatics, Masaryk University, Brno, Czech Republic

Vojtěch Kovář
Lexical Computing, Brighton, United Kingdom / NLP Centre, Faculty of Informatics, Masaryk University, Brno, Czech Republic

Published in: Proceedings of the Workshop on Innovative Corpus Query and Visualization Tools at NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania

Linköping Electronic Conference Proceedings 111:3, p. 17-22

NEALT Proceedings Series 25:3, p. 17-22

Published: 2015-05-07

ISBN: 978-91-7519-035-8

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

Automatic analysis of large text corpora produces numerous useful figures that help in understanding the underlying corpus data. Usually, the results are presented as tables of raw data to be interpreted by domain experts. This paper describes ongoing work on new visualizations and user interface enhancements in the Sketch Engine corpus management system, which aim to make the data easier to interpret for both novice users and language professionals.

Keywords

interactive corpus visualization; corpus data; sketch engine; word sketches; thesaurus
