Iben Nyholm Debess
Grunnurin Føroysk Teldutala, Denmark
Sandra Saxov Lamhauge
Danish Language Council, Denmark
Peter Juel Henrichsen
Danish Language Council, Denmark
Published in: Proceedings of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa), September 30 - October 2, Turku, Finland
Linköping Electronic Conference Proceedings 167:47, p. 395--399
NEALT Proceedings Series 42:47, p. 395--399
Published: 2019-10-02
ISBN: 978-91-7929-995-8
ISSN: 1650-3686 (print), 1650-3740 (online)
We present a new method for preparing a lexical-phonetic database as a resource for acoustic model training. The research is an offshoot of the ongoing Project Ravnur (Speech Recognition for Faroese), but the method is language-independent. At NoDaLiDa 2019 we demonstrate the method (called SHARP) online, showing how a traditional lexical-phonetic dictionary (with a very rich phone inventory) is transformed into an ASR-friendly database (with reduced phonetics, preventing data sparseness). The mapping procedure is informed by a corpus of speech transcripts. We conclude with a discussion of the benefits of a well-thought-out BLARK design (Basic Language Resource Kit), which makes tools like SHARP possible.
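The abstract does not specify how SHARP derives its mapping, so the following is only a minimal sketch of the general idea of corpus-informed phone reduction, not the authors' procedure. It assumes that phones seen fewer than `min_count` times in the transcripts are merged into a hand-specified fallback phone. The function name `reduce_inventory`, the `merge_candidates` table, and the threshold are all hypothetical.

```python
from collections import Counter


def reduce_inventory(lexicon, transcripts, merge_candidates, min_count):
    """Collapse rare phones in a pronunciation lexicon (illustrative only).

    lexicon:          dict mapping word -> list of phone symbols
    transcripts:      list of utterances, each a list of words
    merge_candidates: dict mapping a phone -> the phone it may merge into
                      (hypothetical, hand-specified)
    min_count:        phones seen fewer times than this are merged
    """
    # Count how often each phone occurs in the transcribed corpus,
    # using the lexicon to expand every word token into phones.
    counts = Counter()
    for utterance in transcripts:
        for word in utterance:
            counts.update(lexicon.get(word, []))

    # Build the phone mapping: rare phones with a known fallback are
    # merged; every other phone maps to itself.
    mapping = {}
    for phones in lexicon.values():
        for phone in phones:
            if counts[phone] < min_count and phone in merge_candidates:
                mapping[phone] = merge_candidates[phone]
            else:
                mapping[phone] = phone

    # Rewrite the lexicon with the reduced inventory.
    reduced = {w: [mapping[p] for p in phones] for w, phones in lexicon.items()}
    return reduced, mapping


if __name__ == "__main__":
    # Toy data, not taken from the Faroese resources.
    toy_lexicon = {"w1": ["a:", "t"], "w2": ["a", "t"]}
    toy_transcripts = [["w2", "w2", "w2"], ["w1"]]
    reduced, mapping = reduce_inventory(
        toy_lexicon, toy_transcripts, {"a:": "a"}, min_count=2
    )
    print(mapping)
    print(reduced)
```

In this toy run the long vowel occurs only once in the transcripts and is therefore folded into its short counterpart, which illustrates how a smaller inventory gives each remaining phone more training examples.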