Psychology and Machine Learning

Machine learning is quite the buzzword of 2017, but as a subfield of artificial intelligence (another overused term) it has been around for decades. I remember being introduced to it in 2008 by an enthusiastic professor who told us how it was revolutionizing image analysis and the study of visual perception.

The boom in machine learning, which started when we gained the ability to work with large volumes of data, has been fueled by its versatility. Countless fields, including financial services, fraud detection, logistics, medical diagnosis, natural language processing, marketing, and sales, are already benefiting from analyzing data with machine learning techniques. We are now starting to see the first applications of machine learning to problems in psychology.

Suicide prevention and machine learning

Earlier this year, a group of researchers from Florida State and Vanderbilt universities presented a study in which they developed a prediction model to accurately identify the risk of a suicide attempt in both general and psychiatric patient populations. [1]

The model can predict the risk of a suicide attempt with an accuracy of 80 percent two years before the attempt and 84 percent one week before. In the general patient population, the accuracy is slightly higher. The rate of false negatives is also lower than usual (from 1.2 to 3.5 percent). This matters because decades of research had produced little progress in suicide prediction.
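To make figures like the accuracy and false negatives quoted above more concrete, here is a minimal sketch of how such numbers are typically computed from a model's predictions. The labels below are made up for illustration and have nothing to do with the study's data. I also don't know which denominator the study uses for false negatives, so the sketch assumes one common definition: missed cases divided by all true cases.

```python
def confusion_counts(y_true, y_pred):
    """Count outcomes for binary labels, where 1 means 'attempt' and 0 means 'no attempt'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # controls wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases the model missed
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def false_negative_rate(tp, fn):
    """Share of true cases that the model missed (assumed definition)."""
    return fn / (tp + fn)


# Hypothetical labels, invented purely for this example.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(tp, tn, fp, fn))
print("false negative rate:", false_negative_rate(tp, fn))
```

In a setting like suicide prevention, the false negatives are the number to watch: a missed case is far more costly than an unnecessary follow-up.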

The model

Since the study has not yet been published, I will quote an article written by Paul Govern on the Vanderbilt website [2] detailing the development of the model:

“[Researchers] started with de-identified records of adult patients seen at Vanderbilt from 1998 to 2005. They found 5,167 patients with billing codes indicating self-injury. A pair of clinical experts undertook separate reviews of this set, finding 3,250 cases, that is, 3,250 patients with a history of attempted suicide, and 1,917 controls, or patients with a history of nonsuicidal self-injury.

The de-identified records were pared down to demographics, diagnoses (…), socioeconomic status (…), health care utilization and medication information. To find predictors within these data, a machine learning technique called “random decision forests” shuffled this set of records repeatedly, each time building a “decision tree” upon comparing the shuffled set to the expert-ordered set’s strict segregation of cases and controls.

After thousands of shuffles, the algorithm became expert at predicting whether a randomly selected record from the training set was a case or a control. Finally, with a method called bootstrapping, the team used their training set to synthesize new data sets with which to measure the performance of their predictive models.

The second round of testing was set in the general patient population, using an additional 13,000 de-identified records as controls.”
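The two techniques named in the quote can be sketched in a few lines. This is not the researchers' code, and every name and number in it is my own assumption. In the standard formulation, a random forest trains each tree on a bootstrap sample (rows drawn with replacement) and lets it look at a random subset of the features; the trees then vote. To keep the sketch short, each "tree" here is a one-split decision stump, and the records are synthetic. The evaluation step is one plausible reading of the bootstrapping described above: fit on a resampled data set and score on the rows that were left out of it.

```python
import random


def make_records(n, rng):
    """Synthetic records: two numeric features and a binary label (1 = case, 0 = control)."""
    records = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x0 = rng.gauss(2.0 if label else 0.0, 1.0)  # cases are shifted upward
        x1 = rng.gauss(1.5 if label else 0.0, 1.0)  # a second, weaker signal
        records.append(([x0, x1], label))
    return records


def predict_stump(stump, x):
    """A stump predicts `above` when the feature exceeds the threshold."""
    feature, threshold, above = stump
    return above if x[feature] > threshold else 1 - above


def fit_stump(records, feature):
    """Try every observed value as a threshold and keep the split with the fewest errors."""
    best_errors, best_stump = None, None
    for x, _ in records:
        for above in (0, 1):
            stump = (feature, x[feature], above)
            errors = sum(1 for xi, yi in records if predict_stump(stump, xi) != yi)
            if best_errors is None or errors < best_errors:
                best_errors, best_stump = errors, stump
    return best_stump


def fit_forest(records, n_trees, rng):
    """Each stump sees its own bootstrap sample and one randomly chosen feature."""
    n_features = len(records[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(records) for _ in records]  # rows drawn with replacement
        forest.append(fit_stump(sample, rng.randrange(n_features)))
    return forest


def predict_forest(forest, x):
    """Majority vote across all stumps."""
    votes = sum(predict_stump(stump, x) for stump in forest)
    return 1 if 2 * votes > len(forest) else 0


def forest_accuracy(forest, records):
    return sum(1 for x, y in records if predict_forest(forest, x) == y) / len(records)


def bootstrap_accuracy(records, n_rounds, n_trees, rng):
    """Fit on a resampled data set, score on the rows left out of it, and repeat."""
    n = len(records)
    scores = []
    for _ in range(n_rounds):
        chosen = [rng.randrange(n) for _ in range(n)]
        chosen_set = set(chosen)
        train = [records[i] for i in chosen]
        held_out = [records[i] for i in range(n) if i not in chosen_set]
        scores.append(forest_accuracy(fit_forest(train, n_trees, rng), held_out))
    return scores


rng = random.Random(0)  # fixed seed so the run is repeatable
records = make_records(120, rng)
scores = bootstrap_accuracy(records, n_rounds=5, n_trees=11, rng=rng)
print("held-out accuracy per round:", [round(s, 2) for s in scores])
print("mean:", round(sum(scores) / len(scores), 2))
```

A real forest grows full decision trees over many more variables, but the two ingredients are the same: resampling the records and letting many weak models vote.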

As always, these results need to be interpreted with caution. The model identifies the combination of factors in the electronic records that most accurately predicts a future suicide attempt. The researchers are now waiting to see how useful the model proves once it is put to the test. The idea is to use it much as physicians use a cardiovascular risk score.


[1] Walsh, C.G., Ribeiro, J.D., & Franklin, J.C. (in press). Predicting risk of suicide attempts over time through machine learning. Clinical Psychological Science.

[2] Govern, P. Investigators use machine learning to predict suicide risk. Vanderbilt website.



