Conference article

Closing a Gap in the Language Resources Landscape: Groundwork and Best Practices from Projects on Computer-mediated Communication in four European Countries

Michael Beißwenger
University of Duisburg-Essen, Germany

Thierry Chanier
Université Clermont Auvergne, France

Tomaž Erjavec
Jožef Stefan Institute, Ljubljana, Slovenia

Darja Fišer
University of Ljubljana, Ljubljana, Slovenia

Axel Herold
Berlin-Brandenburg Academy of Sciences, Berlin, Germany

Nikola Ljubešić
Jožef Stefan Institute, Ljubljana, Slovenia

Harald Lüngen
Institute for the German Language, Mannheim, Germany

Céline Poudat
Université de Nice, Sophia Antipolis, France

Egon Stemle
Eurac Research, Bolzano, Italy

Angelika Storrer
University of Mannheim, Mannheim, Germany

Ciara Wigham
Université Clermont Auvergne, France

Published in: Selected papers from the CLARIN Annual Conference 2016, Aix-en-Provence, 26–28 October 2016, CLARIN Common Language Resources and Technology Infrastructure

Linköping Electronic Conference Proceedings 136:1, pp. 1–18

Published: 2017-05-23

ISBN: 978-91-7685-499-0

ISSN: 1650-3686 (print), 1650-3740 (online)


The paper presents best practices and results from projects in four European countries dedicated to creating corpora of computer-mediated communication and social media interactions (CMC). Although many issues in building and annotating corpora of this type remain open, a range of tested solutions already exists that may serve as a starting point for a comprehensive discussion of how future standards for CMC corpora could (and should) be shaped.


Keywords: CMC corpora, computer-mediated communication, social media corpora, corpus annotation, language resources, TEI, community building


