Conference article

A modernised version of the Glossa corpus search system

Anders Nøklestad
The Text Laboratory, ILN, University of Oslo, Oslo, Norway

Kristin Hagen
The Text Laboratory, ILN, University of Oslo, Oslo, Norway

Janne Bondi Johannessen
The Text Laboratory, ILN, University of Oslo, Oslo, Norway / MultiLing, University of Oslo, Norway

Michal Kosek
The Text Laboratory, ILN, University of Oslo, Oslo, Norway

Joel Priestley
The Text Laboratory, ILN, University of Oslo, Oslo, Norway

Published in: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden

Linköping Electronic Conference Proceedings 131:32, p. 251-254

NEALT Proceedings Series 29:32, p. 251-254

Published: 2017-05-08

ISBN: 978-91-7685-601-7

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

This paper presents a modernised version of Glossa, a corpus search and results visualisation system with a user-friendly interface. The system is open source and can easily be installed on servers, or even laptops, for use with suitably prepared corpora. It handles parallel corpora as well as monolingual written and spoken corpora. For spoken corpora, search results can be linked to audio/video, and the system can provide spectrographic analysis and visualise the geographical distribution of results. We will demonstrate the range of search options and result visualisations that Glossa provides.
