Conference article

From digital library to n-grams: NB N-gram

Magnus Breder Birkenes
The National Library of Norway, Oslo, Norway

Lars G. Johnsen
The National Library of Norway, Oslo, Norway

Arne Martinus Lindstad
The National Library of Norway, Oslo, Norway

Johanne Ostad
The National Library of Norway, Oslo, Norway


Published in: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania

Linköping Electronic Conference Proceedings 109:39, p. 293-295

NEALT Proceedings Series 23:39, p. 293-295


Published: 2015-05-06

ISBN: 978-91-7519-098-3

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

At the National Library of Norway, we are currently developing NB N-gram, a service comparable to the Google Ngram Viewer (Michel et al., 2010; Lin et al., 2012; Aiden and Michel, 2013). It is based on all books and newspapers digitized up to and including 2013 as part of the large-scale digitization project at the National Library of Norway. Uni-, bi- and trigrams have been generated from this text corpus, which contains some 34 billion words. In this paper, we sketch the background of NB N-gram and illustrate some of its applications.
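As a rough illustration of what generating uni-, bi- and trigrams from a text corpus involves, the following minimal sketch counts contiguous token sequences of length 1 to 3. It is not the pipeline used for NB N-gram: the function names `ngrams` and `count_ngrams`, the lowercasing, and the whitespace tokenization are all assumptions made for the example, and the two sample strings stand in for a real corpus.

```python
from collections import Counter
from typing import Iterable, Iterator


def ngrams(tokens: list[str], n: int) -> Iterator[tuple[str, ...]]:
    """Yield every contiguous run of n tokens (hypothetical helper)."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def count_ngrams(texts: Iterable[str], max_n: int = 3) -> dict[int, Counter]:
    """Count n-grams of order 1..max_n over an iterable of text units."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for text in texts:
        # Assumption: lowercase and split on whitespace; a real system
        # would need a proper tokenizer and would handle OCR noise.
        tokens = text.lower().split()
        for n in counts:
            counts[n].update(ngrams(tokens, n))
    return counts


# Toy input standing in for a digitized corpus.
sample = ["the national library of norway", "the national library"]
counts = count_ngrams(sample)
```

Here `counts[1]`, `counts[2]` and `counts[3]` hold the unigram, bigram and trigram frequencies respectively. N-grams are counted within each text unit only, so no sequence spans the boundary between two units.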


References

Erez Aiden and Jean-Baptiste Michel. 2013. Uncharted: Big Data as a Lens on Human Culture. Penguin, New York.

Yuri Lin, Jean-Baptiste Michel, Erez Lieberman Aiden, Jon Orwant, William Brockman, and Slav Petrov. 2012. Syntactic Annotations for the Google Books Ngram Corpus. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Volume 2: Demo Papers (ACL ’12).

Jean-Baptiste Michel, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, William Brockman, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden. 2010. Quantitative Analysis of Culture Using Millions of Digitized Books. Science (published online ahead of print: 12/16/2010).

Google Ngram Viewer Documentation: https://books.google.com/ngrams/info
