Conference article

Towards Unlocking the Narrative of the United States Income Tax Forms

Esme Manandise
Intuit Futures, Mountain View, California, USA

Published in: Proceedings of the Second Financial Narrative Processing Workshop (FNP 2019), September 30, Turku, Finland

Linköping Electronic Conference Proceedings 165:5, p. 33-41

NEALT Proceedings Series 40:5, p. 33-41

Published: 2019-09-30

ISBN: 978-91-7929-997-2

ISSN: 1650-3686 (print), 1650-3740 (online)


The present study contributes to the literature on the language of the tax-and-regulations domain in the context of highly formatted tax forms published by a federal agency. Content and form analyses rely on a methodology that looks for meaning and patterns in connection with the main purpose of income tax filing, i.e., working through calculations to determine whether taxes were overpaid or are owed to the United States Internal Revenue Service. Profiling the income-tax forms by spelling out language regularities across the set has at least two advantages. Firstly, profiling contributes to the understanding of how the 2010 Plain Writing Act mandate of ‘clear and simple’ writing is being achieved, if at all. Secondly, profiling a small, unannotated corpus can help determine the Natural Language Processing approach best suited to automatically extract, represent, and execute tax calculations expressed as arithmetic word problems.


tax-and-regulation domain, automatic annotation, raw text preprocessing, linguistic-feature-based classification, Plain Writing Act mandate, text compression, tabular content structuring, readability

