Carla Abreu
Faculdade de Engenharia da Universidade do Porto, Porto, Portugal / LIACC, Porto, Portugal
Henrique Lopes Cardoso
Faculdade de Engenharia da Universidade do Porto, Porto, Portugal / LIACC, Porto, Portugal
Eugénio Oliveira
Faculdade de Engenharia da Universidade do Porto, Porto, Portugal / LIACC, Porto, Portugal
Part of: Proceedings of the Second Financial Narrative Processing Workshop (FNP 2019), September 30, Turku, Finland
Linköping Electronic Conference Proceedings 165:10, p. 69-73
NEALT Proceedings Series 40:10, p. 69-73
Published: 2019-09-30
ISBN: 978-91-7929-997-2
ISSN: 1650-3686 (print), 1650-3740 (online)
We present the approach developed at the Faculty of Engineering of the University of Porto to participate in the "Detection of titles" sub-task of FinTOC-2019 Financial Document Structure Extraction. Many financial documents are produced in machine-readable format, but their poor structure makes it arduous to retrieve the desired information from them. The aim of this sub-task is to detect titles in documents of this kind. We propose a supervised learning approach that uses linguistic, semantic and morphological features to classify a text block as title or non-title. The proposed methodology achieved an F1 score of 97.01%.
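The abstract does not list the individual features or the classifier, so the following is only a minimal sketch of the general idea, not the authors' method. The function names `block_features` and `f1_score`, the specific surface features, and the choice to compute F1 for the title class are all assumptions made for illustration.

```python
import re


def block_features(text: str) -> dict:
    """Hypothetical surface features for one text block.

    These are illustrative stand-ins; the paper's actual linguistic,
    semantic and morphological features are not given in the abstract.
    """
    words = text.split()
    n_words = len(words)
    return {
        "n_words": n_words,
        "is_upper": text.isupper(),
        # Share of words whose first character is an uppercase letter.
        "title_case_ratio": (
            sum(w[0].isupper() for w in words) / n_words if n_words else 0.0
        ),
        # Section-style numbering such as "1. " or "2.3 " at the start.
        "starts_with_number": bool(re.match(r"\s*\d+(\.\d+)*\.?\s", text)),
        "ends_with_period": text.rstrip().endswith("."),
    }


def f1_score(gold: list, pred: list, positive: str = "title") -> float:
    """F1 for the positive class (assumed here to be 'title')."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In a supervised setup, feature dictionaries like these would be computed for every labelled text block and passed to a classifier of one's choice; `f1_score` then compares the predicted labels with the gold labels on held-out blocks.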