Published: 2017-09-13
ISBN: 978-91-7685-467-9
ISSN: 1650-3686 (print), 1650-3740 (online)
Previous work on Korean language processing has proposed various basic segmentation units. This paper explores possible dependency representations for Korean at several levels of segmentation granularity, that is, under different schemes for morphologically segmenting tokens into syntactic words. We provide a new Universal Dependencies (UD)-like corpus for Korean based on these levels of segmentation granularity.
The corpus contains 67K words in 5,000 sentences, which are split into training, development, and evaluation sets.
We report parsing results using the new dependency corpus for Korean and compare them with those obtained on the previous Korean UD corpus.