Hanna Berg
Department of Computer and Systems Sciences, Stockholm University, Sweden
Hercules Dalianis
Department of Computer and Systems Sciences, Stockholm University, Sweden
Included in: Proceedings of the Workshop on NLP and Pseudonymisation, September 30, 2019, Turku, Finland
Linköping Electronic Conference Proceedings 166:2, p. 8-15
NEALT Proceedings Series 41:2, p. 8-15
Published: 2019-09-30
ISBN: 978-91-7929-996-5
ISSN: 1650-3686 (print), 1650-3740 (online)
Electronic patient records are produced in abundance every day, and there is a demand to use them for research or management purposes. However, the free text in these records contains information that can identify the patient, so tools are needed to detect this sensitive information. The aim of this study is to compare two machine learning algorithms, Long Short-Term Memory (LSTM) and Conditional Random Fields (CRF), applied to a Swedish clinical data set annotated for de-identification. The results show that CRF performs better than deep learning with LSTM. CRF gave the best result, an F1 score of 0.91, when more data from within the same domain was added. Adding general open data, on the other hand, did not improve the results.
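The F1 score quoted above is the harmonic mean of precision and recall. The abstract does not state whether scoring was done per token or per entity. The following is a minimal sketch that assumes entity-level, exact-match scoring over BIO-tagged token sequences, a common set-up for evaluating CRF and LSTM taggers but not necessarily the authors' exact procedure. The function names `bio_to_spans` and `entity_f1` and the labels such as `First_Name` are hypothetical, illustrative choices.

```python
from typing import List, Optional, Set, Tuple

# (start token index, end token index exclusive, entity class); illustrative representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence (e.g. B-First_Name, I-First_Name, O) into entity spans."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a span open at the end
        # An I- tag continues the open span only if it carries the same class.
        if tag.startswith("I-") and start is not None and tag[2:] == label:
            continue
        if start is not None:
            spans.add((start, i, label))
            start, label = None, None
        if tag != "O":
            # B- tags (and stray I- tags) open a new span.
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1 (assumed scoring scheme)."""
    gold_spans = bio_to_spans(gold)
    pred_spans = bio_to_spans(pred)
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical example: the tagger gets the name and location right but truncates the date.
gold = ["B-First_Name", "O", "O", "B-Date_Part", "I-Date_Part", "O", "B-Location"]
pred = ["B-First_Name", "O", "O", "B-Date_Part", "O", "O", "B-Location"]
p, r, f1 = entity_f1(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Under exact-match scoring, the truncated date counts as both a false positive and a false negative. This is why entity-level scores are usually stricter than token-level ones for de-identification.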