Conference article

Augmenting a De-identification System for Swedish Clinical Text Using Open Resources and Deep Learning

Hanna Berg
Department of Computer and Systems Sciences, Stockholm University, Sweden

Hercules Dalianis
Department of Computer and Systems Sciences, Stockholm University, Sweden

Published in: Proceedings of the Workshop on NLP and Pseudonymisation, September 30, 2019, Turku, Finland

Linköping Electronic Conference Proceedings 166:2, p. 8-15

NEALT Proceedings Series 41:2, p. 8-15

Published: 2019-09-30

ISBN: 978-91-7929-996-5

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

Electronic patient records are produced in abundance every day, and there is demand to use them for research or management purposes. However, the free text of these records contains information that can identify the patient, so tools are needed to detect this sensitive information. This study compares two machine learning algorithms, Long Short-Term Memory (LSTM) and Conditional Random Fields (CRF), applied to a Swedish clinical data set annotated for de-identification. The results show that CRF performs better than deep learning with LSTM: CRF gives the best result, an F1 score of 0.91, when more data from within the same domain is added. Adding general open data, on the other hand, did not improve the results.
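
For readers unfamiliar with the F1 score reported above, the following is a minimal sketch of how such a score can be computed for a de-identification tagger. It assumes token-level, micro-averaged scoring over all PHI classes, which may differ from the evaluation actually used in the paper. The function name `token_f1`, the label names, and the example sequences are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def token_f1(gold: Sequence[str], pred: Sequence[str], outside: str = "O") -> float:
    """Micro-averaged token-level F1 over all non-outside (PHI) labels.

    Assumption: one label per token, with `outside` marking non-PHI tokens.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    pairs = list(zip(gold, pred))
    # True positive: a PHI token tagged with exactly the right class.
    tp = sum(1 for g, p in pairs if g != outside and g == p)
    # False positive: a PHI label predicted where the gold label differs.
    fp = sum(1 for g, p in pairs if p != outside and g != p)
    # False negative: a gold PHI token that was missed or given the wrong class.
    fn = sum(1 for g, p in pairs if g != outside and g != p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: six tokens, three of which are PHI in the gold annotation.
gold_labels = ["O", "First_Name", "Last_Name", "O", "Location", "O"]
predicted_labels = ["O", "First_Name", "O", "O", "Location", "Location"]
score = token_f1(gold_labels, predicted_labels)
print(f"F1 = {score:.2f}")
```

In this example the tagger finds two of the three PHI tokens and raises one false alarm, so precision and recall are both 2/3. Entity-level scoring, where a multi-token name counts only if all of its tokens are matched, would be a stricter alternative.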

Keywords

De-identification, PHI, Machine learning, LSTM, CRF, Swedish
