Article | Selected Papers from the CLARIN Annual Conference 2019 | Topic modelling applied to a second language: A language adaptation and tool evaluation study | Linköping University Electronic Press Conference Proceedings

Title:
Topic modelling applied to a second language: A language adaptation and tool evaluation study
Author:
Maria Skeppstedt: The Language Council of Sweden, the Institute for Language and Folklore, Sweden
Magnus Ahltorp: The Language Council of Sweden, the Institute for Language and Folklore, Sweden
Kostiantyn Kucher: Department of Computer Science and Media Technology, Linnaeus University, Växjö, Sweden
Andreas Kerren: Department of Computer Science and Media Technology, Linnaeus University, Växjö, Sweden
Rafal Rzepka: Faculty of Information Science and Technology, Hokkaido University, Sapporo, Japan; RIKEN Center for Advanced Intelligence Project (AIP), Tokyo, Japan
Kenji Araki: Faculty of Information Science and Technology, Hokkaido University, Sapporo, Japan
DOI:
https://doi.org/10.3384/ecp2020172017
Year:
2020
Conference:
Selected Papers from the CLARIN Annual Conference 2019
Issue:
172
Article no.:
017
Pages:
145-156
No. of pages:
12
Publication type:
Abstract and full text
Published:
2020-07-03
ISBN:
978-91-7929-807-4
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Publisher:
Linköping University Electronic Press, Linköpings universitet


The Topics2Themes tool, which enables text analysis on the output of topic modelling, was originally developed for English. In this study, we explored and evaluated the adaptations required to apply the tool to Japanese texts, that is, to a language very different from the one for which the tool was originally developed. Because Japanese does not use white space to indicate word boundaries, the texts had to be pre-tokenised, with white space inserted to mark the token segmentation. Topics2Themes was also extended with word translations and phonetic readings to support users who are second-language speakers of Japanese. To evaluate the adaptation to a second language, as well as the reading support, we applied the tool to a corpus of short Japanese texts. Twelve topics were automatically identified, and a total of 183 texts representative of these twelve topics were extracted. A learner of Japanese manually analysed these representative texts and identified 35 recurring, fine-grained themes.

Keywords: topic modelling, computer-assisted text analysis, language adaptation
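The abstract does not say which tokeniser or dictionary the authors used. As a rough illustration of the two adaptations it describes (inserting white space at token boundaries, and attaching readings and translations for second-language readers), the following Python sketch uses greedy longest-match segmentation against a small hypothetical dictionary. TOY_DICT and the functions segment, to_whitespace_text and reading_aid are illustrative assumptions, not part of Topics2Themes. Practical Japanese pipelines typically use a morphological analyser such as MeCab instead.

```python
# Hypothetical toy dictionary: surface form -> (kana reading, English gloss).
# Illustrative entries only; not the resources used in the study.
TOY_DICT = {
    "日本語": ("にほんご", "Japanese (language)"),
    "日本": ("にほん", "Japan"),
    "を": ("を", "object particle"),
    "勉強": ("べんきょう", "study"),
    "し": ("し", "do (continuative form of する)"),
    "ます": ("ます", "polite verb ending"),
}


def segment(text, dictionary):
    """Greedy longest-match segmentation (a swapped-in technique for illustration).

    At each position, take the longest dictionary entry that matches;
    if none matches, emit a single character as its own token.
    """
    max_len = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary or length == 1:
                tokens.append(candidate)
                i += length
                break
    return tokens


def to_whitespace_text(tokens):
    """Insert white space between tokens so whitespace-based tools can split them."""
    return " ".join(tokens)


def reading_aid(tokens, dictionary):
    """Pair each token with its reading and gloss; unknown tokens get None."""
    return [(tok, *dictionary.get(tok, (None, None))) for tok in tokens]


if __name__ == "__main__":
    tokens = segment("日本語を勉強します", TOY_DICT)
    print(to_whitespace_text(tokens))  # 日本語 を 勉強 し ます
    for token, reading, gloss in reading_aid(tokens, TOY_DICT):
        print(token, reading, gloss)
```

Greedy longest match is used here only because it is short and dependency-free. It cannot resolve ambiguous segmentations that a statistical morphological analyser would handle.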

