Matthijs Brouwer
Meertens Institute, The Netherlands
Hennie Brugman
Meertens Institute, The Netherlands
Marc Kemps-Snijders
Meertens Institute, The Netherlands
Published in: Selected papers from the CLARIN Annual Conference 2016, Aix-en-Provence, 26–28 October 2016, CLARIN Common Language Resources and Technology Infrastructure
Linköping Electronic Conference Proceedings 80:2, pp. 19–37
Published: 2017-05-23
ISBN: 978-91-7685-499-0
ISSN: 1650-3686 (print), 1650-3740 (online)
In recent years, multiple solutions have become available that provide search over huge amounts of plain text and metadata. Scalable search over annotated text, however, still appears to be problematic. With Mtas, an acronym for Multi-Tier Annotation Search, we add annotation layers and structure to the existing Lucene approach of creating and searching indexes, and we present an implementation as a Solr plugin that provides both searchability and scalability. We present a configurable indexing process that supports multiple document formats and provides extended search options on both metadata and annotated text, such as advanced statistics, faceting, grouping, and keyword-in-context. Mtas is currently used in production environments with up to 15 million documents and 9.5 billion words. Mtas is available from GitHub.