Emnet: a System for Privacy-Preserving Statistical Computing on Distributed Health Data

Meskerem Asfaw Hailemichael
Department of Computer Science, UiT The Arctic University of Norway, Norway

Kassaye Yitbarek Yigzaw
Department of Computer Science, UiT The Arctic University of Norway, Norway

Johan Gustav Bellika
Department of Clinical Medicine, UiT The Arctic University of Norway, Norway / Norwegian Centre for Integrated Care and Telemedicine, University Hospital of North Norway, Norway


Published in: SHI 2015, Proceedings from The 13th Scandinavian Conference on Health Informatics, June 15-17, 2015, Tromsø, Norway

Linköping Electronic Conference Proceedings 115:6, pp. 33-40


Published: 2015-06-26

ISBN: 978-91-7685-985-8

ISSN: 1650-3686 (print), 1650-3740 (online)


Reuse of health data for epidemiological and health services research has enormous benefits for individuals and society. However, patients and health institutions have privacy concerns, and the commonly used de-identification and consent-based privacy-preserving methods have limitations. In this paper, we describe three generic requirements for privacy-preserving statistical computing on distributed health data. We then describe building blocks for an implementation on horizontally partitioned data. For each research project, a set of participating health institutions locally store data extracts that match the researchers’ criteria. The data across the institutions collectively make up the project data, which we refer to as a virtual dataset. We decompose count, mean, standard deviation, variance, covariance, and Pearson’s r into summation forms and describe them as an abstract computation graph, in which subcomputations are nodes. We define generic APIs that can be invoked at runtime to execute a node against a virtual dataset. We then describe a proof-of-concept implementation called Emnet. Emnet demonstrates that reuse of horizontally partitioned data is possible while preserving patients’ and institutions’ privacy. More statistical analyses can easily be included into Emnet a
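The summation-form decomposition mentioned in the abstract can be illustrated with a minimal sketch. It is not Emnet's implementation: the names `PartialSums`, `local_sums`, `combine`, and `statistics` are hypothetical, the sample (n - 1) variants of variance and covariance are an assumption, and plain addition stands in for the secure summation protocol that would hide each institution's contribution in a real deployment.

```python
import math
from dataclasses import dataclass


@dataclass
class PartialSums:
    """Per-institution sums; every field can be added across sites (hypothetical type)."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    syy: float = 0.0
    sxy: float = 0.0

    def __add__(self, other):
        return PartialSums(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.syy + other.syy,
            self.sxy + other.sxy,
        )


def local_sums(xs, ys):
    """Computed locally at one institution on its own data extract."""
    return PartialSums(
        n=len(xs),
        sx=sum(xs),
        sy=sum(ys),
        sxx=sum(x * x for x in xs),
        syy=sum(y * y for y in ys),
        sxy=sum(x * y for x, y in zip(xs, ys)),
    )


def combine(parts):
    """Plain addition; ASSUMPTION: a secure summation protocol would replace this step."""
    total = PartialSums()
    for part in parts:
        total = total + part
    return total


def statistics(t):
    """Derive the statistics from the combined sums (sample variants, n - 1)."""
    mean_x = t.sx / t.n
    mean_y = t.sy / t.n
    var_x = (t.sxx - t.n * mean_x ** 2) / (t.n - 1)
    var_y = (t.syy - t.n * mean_y ** 2) / (t.n - 1)
    cov = (t.sxy - t.n * mean_x * mean_y) / (t.n - 1)
    return {
        "count": t.n,
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": var_x,
        "var_y": var_y,
        "std_x": math.sqrt(var_x),
        "std_y": math.sqrt(var_y),
        "cov": cov,
        "pearson_r": cov / math.sqrt(var_x * var_y),
    }


# Two hypothetical institutions holding disjoint rows of the same variables.
site_a = local_sums([1.0, 2.0], [2.0, 4.0])
site_b = local_sums([3.0, 4.0, 5.0], [6.0, 8.0, 10.0])
result = statistics(combine([site_a, site_b]))
print(result)
```

Because each statistic depends only on six sums, every institution needs to contribute just six numbers per variable pair, and no individual record leaves the site.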


Computation Graph; Data Reuse; EHR; Health Information System; Health Services Research; Privacy; Secondary Use; Statistical Computing; Secure Multi-party Computation; Secure Summation; Virtual Dataset


