Conference article

Daniel@FinTOC-2019 Shared Task : TOC Extraction and Title Detection

Emmanuel Giguet
Normandie University, UNICAEN, ENSICAEN, CNRS, GREYC, Caen, France

Gaël Lejeune
STIH, EA 4509, Sorbonne University, Paris, France

In: Proceedings of the Second Financial Narrative Processing Workshop (FNP 2019), September 30, Turku, Finland

Linköping Electronic Conference Proceedings 165:9, pp. 63-68

NEALT Proceedings Series 40:9, pp. 63-68

Published: 2019-09-30

ISBN: 978-91-7929-997-2

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

We present different methods for the two tasks of the 2019 FinTOC challenge: Title Detection and Table of Contents Extraction. For the Title Detection task, we present several approaches using stylometric features such as punctuation and character n-grams. Our best approach achieved an F-measure of 94.88%, ranking sixth on this task. For the TOC Extraction task, we present a method that combines visual characteristics of the document layout. With this method, we ranked first on this task with a score of 42.72%.
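
To illustrate the kind of stylometric features the abstract mentions, the following minimal sketch counts character n-grams and a couple of punctuation cues for one text block. The function names, the choice of n = 3, and the specific punctuation features are assumptions made for illustration; they are not taken from the paper and should not be read as the authors' implementation.

```python
from collections import Counter
import string


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a text block."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def stylometric_features(text, n=3):
    """Hypothetical feature set: n-gram counts plus simple punctuation cues."""
    # Prefix n-gram keys so they cannot collide with the named features below.
    features = Counter({f"ng:{gram}": count
                        for gram, count in char_ngrams(text, n).items()})
    # Number of ASCII punctuation characters in the block.
    features["punct_count"] = sum(1 for ch in text if ch in string.punctuation)
    # Assumed cue: body sentences usually end with a period, titles often do not.
    features["ends_with_period"] = int(text.rstrip().endswith("."))
    return features


title_like = stylometric_features("1. Risk Factors")
body_like = stylometric_features("The fund invests in bonds.")
```

Feature dictionaries of this kind could then be passed to any standard classifier to separate titles from body text; which classifier the authors used is not stated in the abstract.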

Keywords

Document layout analysis, Document structure analysis, Physical structure, Logical structure, Table of contents extraction, Title detection
