Hercules Dalianis
Department of Computer and Systems Sciences, Stockholm University, Kista, Sweden
Hanna Berg
Department of Computer and Systems Sciences, Stockholm University, Kista, Sweden
Published in: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021.
Linköping Electronic Conference Proceedings 178:54, p. 467-471
NEALT Proceedings Series 45:54, p. 467-471
Published: 2021-05-21
ISBN: 978-91-7929-614-8
ISSN: 1650-3686 (print), 1650-3740 (online)
This paper describes a freely available web-based demonstrator called HB Deid. HB Deid identifies protected health information (PHI) in text written in Swedish and removes it, masks it, or replaces it with surrogates (pseudonyms). PHI consists of named entities such as personal names, locations, ages, phone numbers, and dates. To find PHI, HB Deid uses a conditional random field (CRF) model trained on non-sensitive annotated Swedish text, followed by a rule-based post-processing step. The final step obscures each PHI instance in one of three ways: masking it, showing only its class name, or replacing it with a surrogate produced by a rule-based pseudonymisation system.
de-identification, pseudonymisation, clinical text, electronic patient records, CRF, Swedish
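
The final obscuring step described in the abstract (mask, show class name, or pseudonymise) can be illustrated with a minimal Python sketch. It is not the HB Deid implementation: the function name `obscure`, the span format, the class labels `First_Name` and `Location`, and the tiny surrogate lists are all hypothetical, and the PHI spans are assumed to have been found already by an upstream detector such as the CRF model.

```python
from typing import Dict, List, Tuple

# Hypothetical PHI span: (start offset, end offset, class label).
Span = Tuple[int, int, str]

# Toy surrogate lists, invented for this sketch only.
SURROGATES: Dict[str, List[str]] = {
    "First_Name": ["Maria", "Erik"],
    "Location": ["Uppsala", "Lund"],
}


def obscure(text: str, spans: List[Span], mode: str = "mask") -> str:
    """Return `text` with every PHI span obscured according to `mode`.

    mode = "mask"      -> replace each character of the span with 'X'
    mode = "class"     -> replace the span with its class label in brackets
    mode = "pseudonym" -> replace the span with a surrogate of the same class
    """
    if mode not in {"mask", "class", "pseudonym"}:
        raise ValueError(f"unknown mode: {mode}")

    pieces: List[str] = []
    pos = 0
    # Process spans left to right; they are assumed not to overlap.
    for start, end, label in sorted(spans):
        pieces.append(text[pos:start])  # untouched text before the span
        original = text[start:end]
        if mode == "mask":
            pieces.append("X" * len(original))
        elif mode == "class":
            pieces.append(f"[{label}]")
        else:
            choices = SURROGATES.get(label)
            if choices:
                # Deterministic pick so the example is reproducible.
                pieces.append(choices[len(original) % len(choices)])
            else:
                # No surrogate list for this class: fall back to the label.
                pieces.append(f"[{label}]")
        pos = end
    pieces.append(text[pos:])  # remaining text after the last span
    return "".join(pieces)


text = "Patient Anna bor i Kista."
spans = [(8, 12, "First_Name"), (19, 24, "Location")]

print(obscure(text, spans, "mask"))       # Patient XXXX bor i XXXXX.
print(obscure(text, spans, "class"))      # Patient [First_Name] bor i [Location].
print(obscure(text, spans, "pseudonym"))  # Patient Maria bor i Lund.
```

Working on character offsets keeps the surrounding text byte-for-byte intact, so the three output modes differ only in what is written in place of each span.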