Conference article

Finnish Paraphrase Corpus

Jenna Kanerva

Filip Ginter

Li-Hsin Chang

Iiro Rastas

Valtteri Skantsi

Jemina Kilpeläinen

Hanna-Mari Kupari

Jenna Saarni

Maija Sevón

Otto Tarkka

Published in: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021.

Linköping Electronic Conference Proceedings 178:29, p. 288-298

NEALT Proceedings Series 45:29, p. 288-298

Published: 2021-05-21

ISBN: 978-91-7929-614-8

ISSN: 1650-3686 (print), 1650-3740 (online)


In this paper, we introduce the first fully manually annotated paraphrase corpus for Finnish, containing 53,572 paraphrase pairs harvested from alternative subtitles and news headings. Of all paraphrase pairs in the corpus, 98% are manually classified as paraphrases at least in their given context, if not in all contexts. Additionally, we establish a manual candidate selection method and demonstrate its feasibility for high-quality paraphrase selection in terms of both cost and quality.


paraphrase, Finnish, annotation, corpus

