MTAS: A Solr/Lucene based Multi Tier Annotation Search solution

Matthijs Brouwer
Meertens Institute, The Netherlands

Hennie Brugman
Meertens Institute, The Netherlands

Marc Kemps-Snijders
Meertens Institute, The Netherlands

Published in: Selected papers from the CLARIN Annual Conference 2016, Aix-en-Provence, 26–28 October 2016, CLARIN Common Language Resources and Technology Infrastructure

Linköping Electronic Conference Proceedings 136:2, pp. 19–37

Published: 2017-05-23

ISBN: 978-91-7685-499-0

ISSN: 1650-3686 (print), 1650-3740 (online)


In recent years, multiple solutions have become available that provide search over huge amounts of plain text and metadata. Scalable search over annotated text, however, still appears to be problematic. With Mtas, an acronym for Multi-Tier Annotation Search, we add annotation layers and structure to the existing Lucene approach of creating and searching indexes, and present an implementation as a Solr plugin that provides both searchability and scalability. We present a configurable indexing process that supports multiple document formats and provides extended search options on both metadata and annotated text, such as advanced statistics, faceting, grouping and keyword-in-context. Mtas is currently used in production environments with up to 15 million documents and 9.5 billion words. Mtas is available from GitHub.


Keywords: Multi-tier annotation search, Lucene, Solr, KWIC, statistics


