A Greek Corpus of Aphasic Discourse: Collection, Transcription, and Annotation Specifications

Spyridoula Varlokosta
National and Kapodistrian University of Athens, Greece

Spyridoula Stamouli
National and Kapodistrian University of Athens, Greece / Institute for Language and Speech Processing / “Athena” Research Center, Greece

Athanassios Karasimos
National and Kapodistrian University of Athens, Greece / Academy of Athens, Greece

Georgios Markopoulos
National and Kapodistrian University of Athens, Greece

Maria Kakavoulia
Panteion University of Social and Political Sciences, Greece

Michaela Nerantzini
National and Kapodistrian University of Athens, Greece / Northwestern University, USA

Aikaterini Pantoula
National and Kapodistrian University of Athens, Greece

Valantis Fyndanis
National and Kapodistrian University of Athens, Greece / University of Oslo, Norway

Alexandra Economou
National and Kapodistrian University of Athens, Greece

Athanassios Protopapas
National and Kapodistrian University of Athens, Greece

Published in: Proceedings of LREC 2016 Workshop. Resources and Processing of Linguistic and Extra-Linguistic Data from People with Various Forms of Cognitive/Psychiatric Impairments (RaPID-2016), Monday 23rd of May 2016

Linköping Electronic Conference Proceedings 128:3, pp. 14-21

Published: 2016-06-03

ISBN: 978-91-7685-730-4

ISSN: 1650-3686 (print), 1650-3740 (online)


This paper presents the design of an annotated Greek Corpus of Aphasic Discourse (GREECAD). Given that resources of this kind are quite limited, a major aim of the GREECAD was to provide a set of specifications that could serve as a methodological basis for the development of other similar corpora and thereby contribute to future research in this area. The GREECAD was developed with the following requirements: a) to include a rather homogeneous sample of Greek as spoken by individuals with aphasia; b) to document speech samples with rich metadata, including demographic information as well as detailed information on the patients’ medical records and neuropsychological evaluations; c) to provide annotated speech samples that encode information at the micro-linguistic level (words, POS, grammatical errors, clause types, etc.) and at the discourse level (narrative structure elements, main events, evaluation devices, etc.). As part of the design, the basic requirements for data collection, metadata, transcription, and annotation procedures were set. The discourse samples were transcribed and annotated with the ELAN tool. To ensure accurate and consistent annotation, a Transcription and Annotation Guide was compiled, with detailed guidelines covering all aspects of the transcription and annotation procedure.
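
As an illustration of how tier-based annotations of this kind might be consumed downstream, the following minimal sketch reads time-aligned annotations from an ELAN annotation file (.eaf, an XML format) using only the Python standard library. The miniature document, the tier names `transcription` and `clause_type`, and the function `read_tiers` are assumptions made for this example; they are not taken from the GREECAD Transcription and Annotation Guide, and real .eaf files contain further elements (header, linguistic types, reference annotations) that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature EAF document. Tier names and values are
# illustrative only and do not reproduce the GREECAD tier structure.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
  </TIME_ORDER>
  <TIER TIER_ID="transcription" LINGUISTIC_TYPE_REF="default">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>example utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="clause_type" LINGUISTIC_TYPE_REF="default">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>main clause</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_tiers(eaf_text):
    """Return {tier_id: [(start_ms, end_ms, value), ...]} for time-aligned annotations."""
    root = ET.fromstring(eaf_text)

    # Map each time-slot id to its value in milliseconds (None if unaligned).
    slots = {}
    for slot in root.iter("TIME_SLOT"):
        value = slot.get("TIME_VALUE")
        slots[slot.get("TIME_SLOT_ID")] = int(value) if value is not None else None

    tiers = {}
    for tier in root.iter("TIER"):
        rows = []
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((start, end, text))
        tiers[tier.get("TIER_ID")] = rows
    return tiers


if __name__ == "__main__":
    for tier_id, rows in read_tiers(SAMPLE_EAF).items():
        print(tier_id, rows)
```

Keeping each annotation layer on its own tier, as in this sketch, lets micro-linguistic and discourse-level information be queried independently while sharing the same time alignment.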


aphasia, aphasic discourse, annotated corpus


