Extraction of lethal events from Wikipedia and a semantic repository

Magnus Norrby
Department of Computer Science, Lund University, Sweden

Pierre Nugues
Department of Computer Science, Lund University, Sweden


In: Proceedings of the Workshop on Semantic resources and Semantic Annotation for Natural Language Processing and the Digital Humanities at NODALIDA 2015, Vilnius, 11th May, 2015

Linköping Electronic Conference Proceedings 112:5, pp. 28–35

NEALT Proceedings Series 27:5, pp. 28–35


Published: 2015-05-06

ISBN: 978-91-7519-049-5

ISSN: 1650-3686 (print), 1650-3740 (online)


This paper describes the extraction of information on lethal events from the Swedish version of Wikipedia. The information sought includes each person's cause of death, origin, and profession. We carried out the extraction using a processing pipeline of available tools for Swedish, including a part-of-speech tagger, a dependency parser, and manually written extraction rules. We also extracted structured semantic data from the Wikidata store and combined it with the information retrieved from Wikipedia. Finally, we assembled a database of facts that covers both sources, Wikipedia and Wikidata.
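As a rough illustration of the last step, combining facts from the two sources into one database, the Python sketch below merges per-person records. It is not the implementation used in the paper: the record layout, the field names (`cause_of_death`, `origin`, `profession`), the function `merge_facts`, and the policy of preferring a Wikidata value when both sources supply one are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Hypothetical field names; the abstract only lists the kinds of facts extracted.
FACT_FIELDS = ("cause_of_death", "origin", "profession")


@dataclass
class PersonFacts:
    """Merged facts about one person, with the sources that contributed."""
    name: str
    cause_of_death: Optional[str] = None
    origin: Optional[str] = None
    profession: Optional[str] = None
    sources: Set[str] = field(default_factory=set)


def merge_facts(wikipedia: Dict[str, dict],
                wikidata: Dict[str, dict]) -> Dict[str, PersonFacts]:
    """Merge two {person name: facts} mappings into one database.

    Assumed policy: Wikidata is processed first, so its value wins when
    both sources provide the same field; Wikipedia fills the gaps.
    """
    merged: Dict[str, PersonFacts] = {}
    for source_name, records in (("wikidata", wikidata), ("wikipedia", wikipedia)):
        for name, facts in records.items():
            person = merged.setdefault(name, PersonFacts(name=name))
            for key in FACT_FIELDS:
                value = facts.get(key)
                # Only fill fields that no earlier source has set.
                if value is not None and getattr(person, key) is None:
                    setattr(person, key, value)
                    person.sources.add(source_name)
    return merged


# Placeholder records, not data from the paper.
from_text = {"Person A": {"cause_of_death": "cause X", "profession": "musician"}}
from_store = {"Person A": {"cause_of_death": "cause Y"},
              "Person B": {"origin": "Sweden"}}
database = merge_facts(from_text, from_store)
```

A different policy, such as keeping both values and flagging the conflict, would be equally plausible; the abstract does not say how disagreements between the two sources were handled.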


semantic extraction; knowledge graph; entity repository


