Joachim Bingel
Institut für Deutsche Sprache, Mannheim, Germany
Nils Diewald
Institut für Deutsche Sprache, Mannheim, Germany
Published in: Proceedings of the Workshop on Innovative Corpus Query and Visualization Tools at NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania
Linköping Electronic Conference Proceedings 111:1, p. 1-5
NEALT Proceedings Series 25:1, p. 1-5
Published: 2015-05-07
ISBN: 978-91-7519-035-8
ISSN: 1650-3686 (print), 1650-3740 (online)
The task-oriented and format-driven development of corpus query systems has led to the creation of numerous corpus query languages (QLs) that vary strongly in expressiveness and syntax. This is a severe impediment to the interoperability of corpus analysis systems, which lack a common protocol. In this paper, we present KoralQuery, a JSON-LD-based general corpus query protocol that aims to be independent of particular QLs, tasks and corpus formats. In addition to describing the system of types and operations that KoralQuery is built on, we exemplify the representation of corpus queries in the serialized format and illustrate use cases in the KorAP project.
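To make the idea of a QL-independent serialized query more concrete, here is a minimal Python sketch of how a query for an attributive adjective followed by a common noun might be expressed as a KoralQuery-style JSON-LD object. The type and value names used (`koral:token`, `koral:term`, `koral:group`, `operation:sequence`, `match:eq`) and the fields `foundry`, `layer`, `key` and `match` reflect our reading of the KoralQuery working draft and should be checked against the specification; the context URL and the helper functions `term`, `token`, `sequence` and `koral_query` are hypothetical and exist only for illustration.

```python
import json

# ASSUMPTION: JSON-LD context URL for KoralQuery 0.3; verify against the spec.
KORAL_CONTEXT = "http://korap.ids-mannheim.de/ns/koral/0.3/context.jsonld"


def term(foundry: str, layer: str, key: str, match: str = "match:eq") -> dict:
    """Hypothetical helper: an annotation constraint (e.g. a POS tag) on a token."""
    return {
        "@type": "koral:term",
        "foundry": foundry,  # annotation source, e.g. a tagger
        "layer": layer,      # annotation layer, e.g. part of speech
        "key": key,          # annotation value to match
        "match": match,      # equality or inequality
    }


def token(wrapped: dict) -> dict:
    """Hypothetical helper: a single token satisfying the wrapped term."""
    return {"@type": "koral:token", "wrap": wrapped}


def sequence(*operands: dict) -> dict:
    """Hypothetical helper: operands that must occur adjacently, in order."""
    return {
        "@type": "koral:group",
        "operation": "operation:sequence",
        "operands": list(operands),
    }


def koral_query(query: dict) -> dict:
    """Wrap a query object in a top-level request carrying the JSON-LD context."""
    return {"@context": KORAL_CONTEXT, "query": query}


# Adjective followed by noun, e.g. the Poliqarp query [tt/p=ADJA][tt/p=NN]
# (TreeTagger foundry "tt", part-of-speech layer "p").
request = koral_query(
    sequence(token(term("tt", "p", "ADJA")), token(term("tt", "p", "NN")))
)
print(json.dumps(request, indent=2))
```

The point of such a structure is that the QL-specific surface syntax has disappeared: a query with the same meaning written in another QL could, in principle, be translated into the same object, so a search backend only needs to understand the protocol rather than every QL.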
concurrent annotation; interoperability; query language; large corpora; query rewrite