Conference article

Summarization Evaluation meets Short-Answer Grading

Margot Mieskes
Hochschule Darmstadt, Germany

Ulrike Padó
Hochschule für Technik Stuttgart, Germany


Published in: Proceedings of the 8th Workshop on Natural Language Processing for Computer Assisted Language Learning (NLP4CALL 2019), September 30, Turku, Finland

Linköping Electronic Conference Proceedings 164:8, p. 79-85

NEALT Proceedings Series 39:8, p. 79-85


Published: 2019-09-30

ISBN: 978-91-7929-998-9

ISSN: 1650-3686 (print), 1650-3740 (online)


Summarization Evaluation and Short-Answer Grading (SAG) share the challenge of automatically evaluating content quality. We therefore explore the use of ROUGE, a well-known Summarization Evaluation method, for Short-Answer Grading. We find a reliable ROUGE parametrization that is robust across corpora and languages and produces scores that are significantly correlated with human short-answer grades. In a by-corpus evaluation, ROUGE adds no information beyond the NLP-based machine learning features already used for Short-Answer Grading. On a question-by-question basis, however, we find that the ROUGE Recall score may outperform standard NLP features. We therefore suggest using ROUGE within a framework for per-question feature selection, or as a reliable and reproducible baseline for SAG.
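
To illustrate the kind of score the abstract refers to, the following is a minimal sketch of ROUGE-N computed between a student answer and a reference answer, with the reference answer standing in for the reference summary. It is not the parametrization evaluated in the paper: the function names, the lowercased whitespace tokenization, and the absence of stemming or stop-word removal are all assumptions made for this example.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(student_answer, reference_answer, n=1):
    """Return ROUGE-N recall, precision and F1 for one answer pair.

    Assumption: lowercased whitespace tokens, no stemming, no stop-word
    removal. Real ROUGE toolkits expose these as configurable options.
    """
    cand = ngram_counts(student_answer.lower().split(), n)
    ref = ngram_counts(reference_answer.lower().split(), n)

    # Clipped overlap: each n-gram counts at most as often as it occurs
    # in both the student answer and the reference answer.
    overlap = sum((cand & ref).values())

    ref_total = sum(ref.values())
    cand_total = sum(cand.values())
    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"recall": recall, "precision": precision, "f1": f1}


if __name__ == "__main__":
    scores = rouge_n("the cat sat", "the cat sat on the mat")
    print(scores)
```

Recall measures how much of the reference answer the student answer covers, which is why it is a natural candidate for grading: a short but fully correct fragment has high precision but low recall.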


short-answer grading, summarization evaluation, ROUGE

