Dirk Goldhahn
Natural Language Processing Group, University of Leipzig, Germany. Saxon Academy of Sciences and Humanities, Leipzig, Germany
Thomas Eckart
Natural Language Processing Group, University of Leipzig, Germany. Saxon Academy of Sciences and Humanities, Leipzig, Germany
Sonja Bosch
Department of African Languages, University of South Africa, South Africa
https://doi.org/10.3384/ecp2020172004
Published in: Selected Papers from the CLARIN Annual Conference 2019
Linköping Electronic Conference Proceedings 172:4, p. 23-32
Published: 2020-07-03
ISBN: 978-91-7929-807-4
ISSN: 1650-3686 (print), 1650-3740 (online)
This paper presents a use case for enriching lexicographical data for less-resourced languages
by employing the CLARIN infrastructure. Newly prepared lexicographical datasets for under-resourced
Bantu languages spoken in southern regions of the African continent form the basis of
the presented work. These datasets have been made digitally available using well-established
standards of the Linguistic Linked Open Data (LLOD) community. To overcome the insufficient
amount of freely available reference material, a crowdsourcing web portal for collecting textual
data for less-resourced languages has been created and incorporated into the CLARIN infrastructure.
Using this portal, the number of available text resources for the respective languages was
significantly increased in a community effort. The collected content is used to enrich lexicographical
data with real-world samples to increase the usability of the entire resource.
Keywords: minority languages, lesser resourced languages, use case, lexical resources, Bantu languages