Conference article

FinDSE@FinTOC-2019 Shared Task

Carla Abreu
Faculdade de Engenharia da Universidade do Porto, Porto, Portugal / LIACC, Porto, Portugal

Henrique Lopes Cardoso
Faculdade de Engenharia da Universidade do Porto, Porto, Portugal / LIACC, Porto, Portugal

Eugénio Oliveira
Faculdade de Engenharia da Universidade do Porto, Porto, Portugal / LIACC, Porto, Portugal

Published in: Proceedings of the Second Financial Narrative Processing Workshop (FNP 2019), September 30, Turku, Finland

Linköping Electronic Conference Proceedings 165:10, p. 69-73

NEALT Proceedings Series 40:10, p. 69-73

Published: 2019-09-30

ISBN: 978-91-7929-997-2

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

We present the approach developed at the Faculty of Engineering of the University of Porto for the title-detection sub-task of the FinTOC-2019 Financial Document Structure Extraction shared task. Many financial documents are produced in machine-readable format, but because they are poorly structured, retrieving the desired information from them is an arduous task. The aim of this sub-task is to detect titles in documents of this kind. We propose a supervised learning approach that uses linguistic, semantic and morphological features to classify a text block as title or non-title. The proposed methodology achieved an F1 score of 97.01%.
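
As a rough illustration of the task setup, the sketch below extracts a few surface features from a text block and computes the F1 score for binary title/non-title predictions. The feature names and the functions `block_features` and `f1_score` are hypothetical and chosen only for illustration; they are not the feature set or code used in the paper, and any supervised classifier could consume such features.

```python
import re


def block_features(text: str) -> dict:
    """Hypothetical surface features for one text block (not the paper's feature set)."""
    words = text.split()
    letters = [c for c in text if c.isalpha()]
    return {
        # Titles tend to be short.
        "n_words": len(words),
        # True when every alphabetic character is upper case.
        "all_caps": bool(letters) and all(c.isupper() for c in letters),
        # Matches section numbering such as "1. " or "2.3 " at the start.
        "starts_with_number": bool(re.match(r"\s*\d+(\.\d+)*[.)]?\s", text)),
        # Body sentences usually end with a full stop; titles often do not.
        "ends_with_period": text.rstrip().endswith("."),
        # Share of words whose first character is upper case.
        "title_case_ratio": (
            sum(w[0].isupper() for w in words) / len(words) if words else 0.0
        ),
    }


def f1_score(gold: list, pred: list) -> float:
    """F1 for the positive (title) class, given parallel lists of booleans."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Calling `block_features("1. INTRODUCTION")` yields a short, all-caps, numbered block, whereas a body sentence such as "The fund invests in bonds." is longer and ends with a full stop; a classifier trained on labelled blocks would learn how much weight each signal deserves.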

Keywords

Machine Learning, Natural Language Processing, Document Structure Extraction
