An Overview of Knowledge Extraction Projects in the NLP group at Lund University

Pierre Nugues
Department of Computer Science, Lund University, Lund, Sweden


Published in: Digital Humanities 2016. From Digitization to Knowledge 2016: Resources and Methods for Semantic Processing of Digital Works/Texts, Proceedings of the Workshop, July 11, 2016, Krakow, Poland

Linköping Electronic Conference Proceedings 126:5, pp. 25–31


Published: 2016-07-08

ISBN: 978-91-7685-733-5

ISSN: 1650-3686 (print), 1650-3740 (online)


In this paper, I describe systems and prototypes that we created in the natural language processing group at Lund to extract structured knowledge from text. Starting from syntactic and semantic parsing components, we developed applications that can handle large corpora, typically complete Wikipedia versions consisting of millions of documents, and process text to identify entities and the relations between them. I describe the overall goals of our projects, the data structure we designed to handle the documents, and three applications that extract knowledge from text.




