Hercules Dalianis
Department of Computer and Systems Sciences, Stockholm University, Sweden
Published in: Proceedings of the Workshop on NLP and Pseudonymisation, September 30, 2019, Turku, Finland
Linköping Electronic Conference Proceedings 166:3, p. 16-23
NEALT Proceedings Series 41:3, p. 16-23
Published: 2019-09-30
ISBN: 978-91-7929-996-5
ISSN: 1650-3686 (print), 1650-3740 (online)
This study describes a rule-based
pseudonymisation system for Swedish
clinical text and its evaluation. The
pseudonymisation system replaces
already tagged Protected Health Information
(PHI) with realistic surrogates. There
are eight types of manually annotated
PHIs in the electronic patient records: personal
first and last names, phone numbers,
locations, dates, ages and healthcare units.
Two evaluators, both computer scientists,
one junior and one senior, evaluated
whether a set of 98 electronic patient
records were pseudonymised or
not. Only 3.5 percent of the records were
correctly judged as pseudonymised and
1.5 percent of the real ones were wrongly
judged as pseudonymised, meaning that on average
91 percent of the pseudonymised records
were judged as real.
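
The replacement step described above, in which already tagged PHI is swapped for surrogates, can be illustrated with a minimal sketch. It is not the system evaluated in the study: the inline tag format (for example `<First_Name>...</First_Name>`), the tag names, the `SURROGATES` lists and the `pseudonymise` function are all assumptions made for illustration only.

```python
import random
import re

# Hypothetical surrogate lists, keyed by an assumed tag name.
# A real system would draw on much larger lexicons and type-specific rules.
SURROGATES = {
    "First_Name": ["Karin", "Lars", "Eva"],
    "Last_Name": ["Lindgren", "Berg", "Nyström"],
    "Location": ["Uppsala", "Lund"],
}

# Assumed annotation format: <Tag>value</Tag>, with no nesting.
TAG_PATTERN = re.compile(r"<(?P<tag>\w+)>(?P<value>.*?)</(?P=tag)>")


def pseudonymise(tagged_text, rng):
    """Replace each tagged PHI span with a surrogate of the same type."""

    def replace(match):
        tag, value = match.group("tag"), match.group("value")
        # Never reuse the original value as its own surrogate.
        candidates = [c for c in SURROGATES.get(tag, []) if c != value]
        if not candidates:
            # No surrogate rule for this type: mask it instead of leaking it.
            return f"[{tag}]"
        return rng.choice(candidates)

    return TAG_PATTERN.sub(replace, tagged_text)


if __name__ == "__main__":
    text = (
        "Patient <First_Name>Anna</First_Name> "
        "<Last_Name>Svensson</Last_Name> from <Location>Solna</Location>."
    )
    print(pseudonymise(text, random.Random(0)))
```

Passing the random generator in explicitly keeps the sketch reproducible. PHI types without a surrogate list are masked with their tag name so that the original value is never left in the output.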