Emmanuel Giguet
Normandie University, UNICAEN, ENSICAEN, CNRS, GREYC, Caen, France
Gaël Lejeune
STIH, EA 4509, Sorbonne University, Paris, France
In: Proceedings of the Second Financial Narrative Processing Workshop (FNP 2019), September 30, Turku, Finland
Linköping Electronic Conference Proceedings 165:9, pp. 63-68
NEALT Proceedings Series 40:9, pp. 63-68
Published: 2019-09-30
ISBN: 978-91-7929-997-2
ISSN: 1650-3686 (print), 1650-3740 (online)
We present different methods for the two tasks of the 2019 Fin-Toc challenge: Title Detection and Table of Contents Extraction. For the Title Detection task, we present different approaches using stylometric features such as punctuation and character n-grams. Our best approach achieved an F-measure of 94.88%, ranking 6th on this task. For the TOC Extraction task, we present a method combining visual characteristics of the document layout. With this method, we ranked first on this task with a score of 42.72%.
Document layout analysis, Document structure analysis, Physical structure, Logical structure, Table of contents extraction, Title Detection
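
The abstract names punctuation and character n-grams as stylometric features for Title Detection but does not describe how they are computed. The following is a minimal sketch of what such features could look like for a single text block. It is not the authors' implementation: the function names, the n-gram size of 3, and the specific punctuation cues are all assumptions chosen for illustration.

```python
from collections import Counter
import string


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a text block."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def punctuation_features(text: str) -> dict:
    """Simple punctuation and casing cues (hypothetical feature set)."""
    stripped = text.strip()
    n_punct = sum(1 for ch in stripped if ch in string.punctuation)
    return {
        # Share of characters that are ASCII punctuation marks.
        "punct_ratio": n_punct / len(stripped) if stripped else 0.0,
        # Body sentences often end with a period; titles often do not.
        "ends_with_period": stripped.endswith("."),
        # True when every cased character is upper case.
        "is_upper": stripped.isupper(),
    }


def stylometric_features(text: str, n: int = 3) -> dict:
    """Merge punctuation cues and n-gram counts into one feature dict."""
    features = dict(punctuation_features(text))
    for gram, count in char_ngrams(text, n).items():
        features[f"ng:{gram}"] = count
    return features


if __name__ == "__main__":
    for block in ("RISK FACTORS", "The fund invests mainly in bonds."):
        print(block, punctuation_features(block))
```

A feature dictionary of this kind could then be passed to any standard classifier that labels each block as title or non-title. Which classifier the authors used is not stated in the abstract.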