Magnus Breder Birkenes
The National Library of Norway, Oslo, Norway
Lars G. Johnsen
The National Library of Norway, Oslo, Norway
Arne Martinus Lindstad
The National Library of Norway, Oslo, Norway
Johanne Ostad
The National Library of Norway, Oslo, Norway
Published in: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania
Linköping Electronic Conference Proceedings 109:39, p. 293-295
NEALT Proceedings Series 23:39, p. 293-295
Published: 2015-05-06
ISBN: 978-91-7519-098-3
ISSN: 1650-3686 (print), 1650-3740 (online)
At the National Library of Norway, we are currently developing a service comparable to the Google Ngram Viewer (Michel et al., 2010; Lin et al., 2012; Aiden and Michel, 2013) called NB N-gram. It is based on all books and newspapers digitized up to and including 2013 as part of the large-scale digitization project at the National Library of Norway. Uni-, bi- and trigrams have been generated from this text corpus, which contains some 34 billion words. In this paper, we sketch the background of NB N-gram and illustrate some of its applications.
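As an illustration of what generating uni-, bi- and trigrams from running text involves, the following is a minimal sketch in Python. It is not the pipeline used for NB N-gram: the function names `ngrams` and `count_ngrams`, the lowercasing, and the whitespace tokenization are all assumptions made for the example, and a corpus of billions of words would call for a streaming or distributed approach.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token tuples, in order of occurrence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(text, max_n=3):
    """Count n-grams of length 1..max_n in one text.

    Assumption: tokens are lowercased and split on whitespace; a real
    system would use a proper tokenizer.
    """
    tokens = text.lower().split()
    return {n: Counter(ngrams(tokens, n)) for n in range(1, max_n + 1)}


# Hypothetical sample sentence, used only to exercise the functions.
counts = count_ngrams(
    "the library digitizes books and the library digitizes newspapers"
)
print(counts[1][("the",)])                         # unigram count
print(counts[2][("the", "library")])               # bigram count
print(counts[3][("the", "library", "digitizes")])  # trigram count
```

Keying each count by a tuple of tokens keeps uni-, bi- and trigrams in the same shape, so a viewer in the style described above could look up any of them in the same way.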
Erez Aiden and Jean-Baptiste Michel. 2013. Uncharted: Big Data as a Lens on Human Culture. Penguin, New York.
Yuri Lin, Jean-Baptiste Michel, Erez Lieberman Aiden, Jon Orwant, William Brockman, and Slav Petrov. 2012. Syntactic Annotations for the Google Books Ngram Corpus. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Volume 2: Demo Papers (ACL ’12).
Jean-Baptiste Michel, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, William Brockman, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden. 2010. Quantitative Analysis of Culture Using Millions of Digitized Books. Science (published online ahead of print: 12/16/2010).
Google Ngram Viewer Documentation: https://books.google.com/ngrams/info