Ramon Ziai
Collaborative Research Center 833, Department of Linguistics, ICALL Research Group, LEAD Graduate School & Research Network, University of Tübingen, Germany
Florian Nuxoll
Collaborative Research Center 833, Department of Linguistics, ICALL Research Group, LEAD Graduate School & Research Network, University of Tübingen, Germany
Kordula De Kuthy
Collaborative Research Center 833, Department of Linguistics, ICALL Research Group, LEAD Graduate School & Research Network, University of Tübingen, Germany
Björn Rudzewitz
Collaborative Research Center 833, Department of Linguistics, ICALL Research Group, LEAD Graduate School & Research Network, University of Tübingen, Germany
Detmar Meurers
Collaborative Research Center 833, Department of Linguistics, ICALL Research Group, LEAD Graduate School & Research Network, University of Tübingen, Germany
Published in: Proceedings of the 8th Workshop on Natural Language Processing for Computer Assisted Language Learning (NLP4CALL 2019), September 30, Turku, Finland
Linköping Electronic Conference Proceedings 164:10, p. 93-99
NEALT Proceedings Series 39:10, p. 93-99
Published: 2019-09-30
ISBN: 978-91-7929-998-9
ISSN: 1650-3686 (print), 1650-3740 (online)
This paper explores Short Answer Assessment (SAA) for the purpose of giving automatic meaning-oriented feedback in the context of a language tutoring system. In order to investigate the performance of standard SAA approaches on student responses arising in real-life foreign language teaching, we experimented with two different factors: 1) the incorporation of spelling normalization in the form of a task-dependent noisy channel model spell checker (Brill and Moore, 2000) and 2) training schemes, where we explored task- and item-based splits in addition to standard tenfold cross-validation. For evaluation purposes, we compiled a data set of 3,829 student answers across different comprehension task types collected in a German school setting with the English tutoring system FeedBook (Rudzewitz et al., 2017; Ziai et al., 2018) and had an expert score the answers with respect to appropriateness (correct vs. incorrect). Overall, results place the normalization-enhanced SAA system ahead of the standard version and a strong baseline derived from standard text similarity measures. Additionally, we analyze task-specific SAA performance and outline where further research could make progress.
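
The abstract does not spell out how the task- and item-based splits are built. One plausible reading is that all answers belonging to one task (or one item) are held out together, so the classifier is tested on prompts it has never seen. The following Python sketch illustrates that reading only; the function `grouped_folds`, the record fields (`task`, `item`, `answer`, `label`), and the toy answers are hypothetical and not taken from the paper or from FeedBook.

```python
from collections import defaultdict


def grouped_folds(records, group_key):
    """Yield (held_out_group, train, test) so that no group spans both sides.

    Assumption: a "task-based" or "item-based" split means holding out every
    answer that shares the same task or item identifier.
    """
    by_group = defaultdict(list)
    for record in records:
        by_group[record[group_key]].append(record)
    for held_out in sorted(by_group):
        test = by_group[held_out]
        train = [r for g, rs in by_group.items() if g != held_out for r in rs]
        yield held_out, train, test


# Hypothetical toy data; field names and values are illustrative only.
answers = [
    {"task": "T1", "item": "T1-a", "answer": "He went home.", "label": "correct"},
    {"task": "T1", "item": "T1-a", "answer": "He go hom.", "label": "correct"},
    {"task": "T1", "item": "T1-b", "answer": "In the park.", "label": "incorrect"},
    {"task": "T2", "item": "T2-a", "answer": "Because it rained.", "label": "correct"},
]

item_folds = list(grouped_folds(answers, "item"))  # one fold per item
task_folds = list(grouped_folds(answers, "task"))  # one fold per task
```

Under this reading, standard tenfold cross-validation may place answers to the same item in both training and test data, whereas the grouped folds never do, which is why the two schemes can yield different performance estimates.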