Combining data mining and text mining for detection of early stage dementia: the SAMS framework

Christopher Bull
School of Computing and Communications, Lancaster University, UK

Dommy Asfiandy
School of Computer Science, University of Manchester, UK

Ann Gledson
School of Computer Science, University of Manchester, UK

Joseph Mellor
School of Computer Science, University of Manchester, UK

Samuel Couth
Institute of Brain, Behaviour and Mental Health, University of Manchester, UK

Gemma Stringer
Institute of Brain, Behaviour and Mental Health, University of Manchester, UK

Paul Rayson
School of Computing and Communications, Lancaster University, UK

Alistair Sutcliffe
School of Computing and Communications, Lancaster University, UK

John Keane
School of Computer Science, University of Manchester, UK

Xiaojun Zeng
School of Computer Science, University of Manchester, UK

Alistair Burns
Institute of Brain, Behaviour and Mental Health, University of Manchester, UK

Iracema Leroi
Institute of Brain, Behaviour and Mental Health, University of Manchester, UK

Clive Ballard
Wolfson Centre for Age-Related Diseases, King’s College London, UK

Pete Sawyer
School of Computing and Communications, Lancaster University, UK


Part of: Proceedings of LREC 2016 Workshop. Resources and Processing of Linguistic and Extra-Linguistic Data from People with Various Forms of Cognitive/Psychiatric Impairments (RaPID-2016), Monday 23rd of May 2016

Linköping Electronic Conference Proceedings 128:6, pp. 35–40


Published: 2016-06-03

ISBN: 978-91-7685-730-4

ISSN: 1650-3686 (print), 1650-3740 (online)


In this paper, we describe the open-source SAMS framework, whose novelty lies in bringing together data collection (keystrokes, mouse movements, application pathways) and text collection (email, documents, diaries), together with analysis methodologies for both. The aim of SAMS is to provide a non-invasive method for large-scale collection, secure storage, retrieval and analysis of an individual’s computer usage, in order to detect cognitive decline and to infer whether this decline is consistent with the early stages of dementia. The framework will allow evaluation and study by medical professionals, in which data and textual features can be linked to deficits in cognitive domains that are characteristic of dementia. Having described requirements gathering and ethical concerns in previous papers, here we focus on the implementation of the data and text collection components.


Keywords: dementia, corpus linguistics, natural language processing, data mining


