Finance Document Extraction Using Data Augmentation and Attention

Ke Tian
OPT Inc, Tokyo, Japan

Zijun Peng
Harbin Institute of Technology (Weihai), China

Published in: Proceedings of the Second Financial Narrative Processing Workshop (FNP 2019), September 30, Turku, Finland

Linköping Electronic Conference Proceedings 165:1, pp. 1-4

NEALT Proceedings Series 40:1, pp. 1-4

Published: 2019-09-30

ISBN: 978-91-7929-997-2

ISSN: 1650-3686 (print), 1650-3740 (online)


This paper describes the aiai system that our team submitted to the FinToc-2019 shared task. The shared task comprises two subtasks: detecting titles, as distinct from non-title text, in finance documents, and predicting the table of contents (TOC) of finance PDF documents. We apply data augmentation together with attention-based LSTM and BiLSTM models to the title detection subtask. Our experiments show that these methods perform well at predicting titles in finance documents, and our submission ranked 1st on the title detection leaderboard.


FinToc-2019 shared task, data augmentation, attention-based LSTM, BiLSTM

