Augmenting a De-identification System for Swedish Clinical Text Using Open Resources and Deep Learning

Hanna Berg
Department of Computer and Systems Sciences, Stockholm University, Sweden

Hercules Dalianis
Department of Computer and Systems Sciences, Stockholm University, Sweden

Published in: Proceedings of the Workshop on NLP and Pseudonymisation, September 30, 2019, Turku, Finland

Linköping Electronic Conference Proceedings 166:2, pp. 8-15

NEALT Proceedings Series 41:2, pp. 8-15

Published: 2019-09-30

ISBN: 978-91-7929-996-5

ISSN: 1650-3686 (print), 1650-3740 (online)


Electronic patient records are produced in abundance every day, and there is demand to use them for research and management purposes. Their free text, however, contains information that can identify the patient, so tools are needed to detect this sensitive information. This study compares two machine learning algorithms, Long Short-Term Memory (LSTM) and Conditional Random Fields (CRF), applied to a Swedish clinical data set annotated for de-identification. CRF performs better than deep learning with LSTM and gives the best result, an F1 score of 0.91, when more data from within the same domain is added. Adding general open data, on the other hand, did not improve the results.
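
For readers unfamiliar with the F1 score reported above, the following is a minimal sketch of how precision, recall and F1 can be computed for a token-level tagging task. It is an illustration only: the function name `f1_score`, the label names and the token-level counting scheme are assumptions made here, not the evaluation procedure described in the paper.

```python
def f1_score(gold, predicted, outside="O"):
    """Return (precision, recall, F1) over all labels except `outside`.

    Assumption: evaluation is per token, and "O" marks tokens that carry
    no sensitive information. A token tagged with the wrong sensitive
    class counts as both a false positive and a false negative.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == p and g != outside)
    fp = sum(1 for g, p in pairs if p != outside and g != p)
    fn = sum(1 for g, p in pairs if g != outside and g != p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels for a six-token sentence (not taken from the data set).
gold = ["O", "First_Name", "Last_Name", "O", "Location", "O"]
pred = ["O", "First_Name", "O", "O", "Location", "Location"]

precision, recall, f1 = f1_score(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

F1 is the harmonic mean of precision and recall, so a system scores highly only when it both finds most sensitive tokens and raises few false alarms.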


De-identification, PHI, Machine learning, LSTM, CRF, Swedish

