Towards Large-Scale Language Analysis in the Cloud

Emanuele Lapponi
Language Technology Group, Department of Informatics, University of Oslo, Norway

Erik Velldal
Language Technology Group, Department of Informatics, University of Oslo, Norway

Nikolay A. Vazov
Research Support Services Group, University Center for Information Technology, University of Oslo, Norway

Stephan Oepen
Language Technology Group, Department of Informatics, University of Oslo, Norway

In: Proceedings of the workshop on Nordic language research infrastructure at NODALIDA 2013, May 22-24, 2013, Oslo, Norway. NEALT Proceedings Series 20

Linköping Electronic Conference Proceedings 89:1, pp. 1-10

NEALT Proceedings Series 20:1, pp. 1-10

Published: 2013-05-17

ISBN: 978-91-7519-585-8

ISSN: 1650-3686 (print), 1650-3740 (online)


This paper documents ongoing work within the Norwegian CLARINO project on building a Language Analysis Portal (LAP). The portal will provide an intuitive and easily accessible web interface to a centralized repository of a wide range of language technology tools, all installed on a high-performance computing cluster. Users will be able to compose and run workflows using an easy-to-use graphical interface, with multiple tools and resources chained together in potentially complex pipelines. Although the project aims to reach out to a diverse set of user groups, it will particularly facilitate the use of language analysis in the social sciences, humanities, and other fields without strong computational traditions. While the development of the portal is still in its early stages, this paper documents ongoing work towards an already operable pilot, in addition to providing an overview of long-term goals and visions. At the core of the current pilot implementation is Galaxy, a web-based workflow management system initially developed for data-intensive research in genomics and bioinformatics; an important part of the work on the pilot is therefore to adapt and evaluate Galaxy for the context of a language analysis portal.


Research infrastructure; High-Performance Computing; web portal; CLARINO


