Conference article

UWB@FinTOC-2019 Shared Task: Financial Document Title Detection

Tomáš Hercig
NTIS – New Technologies for the Information Society, Faculty of Applied Sciences, University of West Bohemia, Plzen, Czech Republic

Pavel Král
Department of Computer Science and Engineering, Faculty of Applied Sciences, University of West Bohemia, Plzen, Czech Republic

Published in: Proceedings of the Second Financial Narrative Processing Workshop (FNP 2019), September 30, Turku, Finland

Linköping Electronic Conference Proceedings 165:11, pp. 74-78

NEALT Proceedings Series 40:11, pp. 74-78

Published: 2019-09-30

ISBN: 978-91-7929-997-2

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

This paper describes our system for Task A (Title Detection) of the Financial Document Structure Extraction Shared Task (FinTOC-2019). We rely on the XML representation of the financial prospectuses to obtain additional layout information about the text (font type, font size, etc.). Our system is constrained: it uses only the provided training data, with no external resources. It is based on a Maximum Entropy classifier with various features, including font type and font size. The system achieves an F1 score of 97.2% and ranks third among the 10 submitted systems.
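To make the approach in the abstract concrete, the following is a minimal sketch of title detection with a two-class maximum entropy model, which is equivalent to binary logistic regression. It is not the authors' implementation: the feature names, the toy text blocks, the training hyperparameters, and the helper functions `extract_features`, `predict_proba`, and `train` are all assumptions made for illustration, and the abstract confirms only that font type and font size are among the features used.

```python
import math
from collections import defaultdict


def extract_features(block):
    """Map a text block to indicator features (hypothetical feature set)."""
    text = block["text"]
    return {
        "bias": 1.0,
        f"font={block['font']}": 1.0,   # font type, as named in the abstract
        f"size={block['size']}": 1.0,   # font size, as named in the abstract
        "is_upper": 1.0 if text.isupper() else 0.0,          # assumed extra feature
        "short": 1.0 if len(text.split()) <= 8 else 0.0,     # assumed extra feature
    }


def predict_proba(weights, feats):
    """Probability that the block is a title under the current weights."""
    z = sum(weights[name] * value for name, value in feats.items())
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() numerically safe
    return 1.0 / (1.0 + math.exp(-z))


def train(blocks, labels, epochs=200, lr=0.5, l2=0.01):
    """Fit weights by stochastic gradient ascent on the L2-regularized log-likelihood."""
    weights = defaultdict(float)
    data = [(extract_features(b), y) for b, y in zip(blocks, labels)]
    for _ in range(epochs):
        for feats, y in data:
            error = y - predict_proba(weights, feats)
            for name, value in feats.items():
                weights[name] += lr * (error * value - l2 * weights[name])
    return weights


# Toy training data (invented for this sketch; 1 = title, 0 = body text).
train_blocks = [
    {"text": "INVESTMENT OBJECTIVES", "font": "Arial-Bold", "size": 14},
    {"text": "Risk Factors", "font": "Arial-Bold", "size": 14},
    {"text": "The fund invests mainly in European equities and may hold "
             "cash for liquidity purposes.", "font": "Arial", "size": 10},
    {"text": "Investors should read the prospectus carefully before "
             "subscribing to any shares.", "font": "Arial", "size": 10},
]
train_labels = [1, 1, 0, 0]

weights = train(train_blocks, train_labels)

title = {"text": "Fees and Expenses", "font": "Arial-Bold", "size": 14}
body = {"text": "Management fees are accrued daily and paid monthly out of "
                "the assets of the fund.", "font": "Arial", "size": 10}

p_title = predict_proba(weights, extract_features(title))
p_body = predict_proba(weights, extract_features(body))
```

In this sketch a block would be labeled a title when its predicted probability exceeds 0.5. In a real pipeline the `font` and `size` values would presumably be read from the XML representation of each prospectus rather than typed in by hand.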

Keywords

Financial Document, Title Detection, Machine Learning, Maximum Entropy Classifier
