Part X – Machine Learning
Series Introduction – Appriss Health has long held that chronic pain and addiction (substance use disorder) are separate medical conditions and that each deserves a unique clinical approach. We have spent years developing clinical decision support tools and, in doing so, have gathered feedback from countless medical professionals, including many pain providers. Through those providers we have also heard (directly and indirectly) from many chronic pain patients as well.
To continue this conversation, over a long series of blog posts, we will explore our approach to benefit/risk assessment, data presentation, and clinical support…and how they relate to our goal of creating a usable, balanced clinical viewpoint that protects access to care while highlighting areas of risk that clinicians and patients should be aware of.
How Machine Learning Works
Machine learning can mean a lot of different things to a lot of different people. Some of you may be a little anxious about the concept because of its vast potential: think, for example, of a computer driving (or flying?) your car for you… In the context of this series, machine learning is an analytical tool or method that helps us make better and more accurate predictions about certain outcomes of interest. I’d like to walk through, at a high level, how the process works.
Machine learning typically starts with a question. For instance, “How can I best predict the risk of overdose by using PDMP data?” Once the question has been formed, the next step is to obtain a large set of data for the machine to learn from. In this case, imagine that we have a set of de-identified PDMP data on 1,000,000 patients for whom we know whether or not each person has overdosed. As an example, we might find that in this group of patients 2,500 had suffered an overdose and 997,500 had not.
As an aside, here, I’d like to quickly explain what de-identified data is because this is important.
De-identification of data involves the removal of all data that could individually identify a patient, provider, pharmacy, etc. This generally involves replacing a name (and/or address, sex, age, race, etc.) with a long, unique string of characters. As an example of de-identified data, “Jane Doe” might be replaced with “123094q;kjndspoifuaw98yerh;wjdhf982r” for our machine learning purposes. De-identification is necessary, and usually required, to comply with HIPAA while still allowing healthcare data to be used for research.
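For the technically curious, here is a tiny Python sketch of how that kind of token replacement can work. This is purely an illustration (a keyed hash with a made-up secret), not a description of any real de-identification pipeline:

```python
import hmac
import hashlib

# Illustrative only: in practice the secret key would be managed securely
# and kept completely separate from the de-identified data.
SECRET_KEY = b"example-only-secret"

def de_identify(name: str) -> str:
    """Replace an identifier with an opaque, irreversible token via a keyed hash."""
    return hmac.new(SECRET_KEY, name.encode("utf-8"), hashlib.sha256).hexdigest()

token = de_identify("Jane Doe")
# The same input always maps to the same token, so a patient's records
# still link together -- but the token cannot be reversed to get the name.
```

The key property is that the mapping is consistent (so all of one patient’s prescriptions stay connected) but opaque (so no one can work backward to the person).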
Back to my example… Once we have the PDMP data we typically help the machines a bit by developing hundreds (or thousands) of variables that we think might be helpful. As an example, we might give the machine data points for each patient like the total MME in the last 60 days, or the rate of change of MME in the last 60 days (increasing or decreasing doses). We might give it other types of data involving the distances between the doctor, the pharmacy, and the patient’s home. As a final example, we might also give the machine the number of times that opioids and benzodiazepines overlapped.
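To make “developing a variable” concrete, here is a small Python sketch of computing one such feature, total MME in the trailing 60 days, from hypothetical prescription records (the field names and numbers are invented for illustration):

```python
from datetime import date, timedelta

# Hypothetical prescription records for one de-identified patient.
rx = [
    {"drug": "opioid", "start": date(2023, 5, 1),  "days": 30, "mme_per_day": 50},
    {"drug": "opioid", "start": date(2023, 6, 1),  "days": 30, "mme_per_day": 60},
    {"drug": "benzo",  "start": date(2023, 6, 10), "days": 15, "mme_per_day": 0},
]

def total_mme_last_60_days(rx, today):
    """Sum the MME supplied during the trailing 60-day window."""
    window_start = today - timedelta(days=60)
    total = 0
    for r in rx:
        end = r["start"] + timedelta(days=r["days"])
        # Count only the days of supply that fall inside the window.
        overlap_days = (min(end, today) - max(r["start"], window_start)).days
        total += max(overlap_days, 0) * r["mme_per_day"]
    return total
```

Each of the hundreds of candidate variables is, at bottom, a small calculation like this one, applied to every patient in the data set.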
After we have gathered the data and developed a large set of variables we then typically break the data into what is called a training set and a test set. For example, we may take all of the data on the 1,000,000 de-identified patients and randomly select 750,000 patients for training, and set the other 250,000 patients aside to test against later on. If we’re being really sophisticated (and this word accurately describes Appriss Health’s data science team to a T), we might do this several times to end up with many training and test data sets from the original set of data.
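A simple version of that split looks like this in Python. This is a sketch of the general idea, not our team’s actual code; real pipelines often repeat this with different random seeds (or use cross-validation) to produce the multiple training/test sets mentioned above:

```python
import random

# De-identified patient IDs standing in for the full records.
patient_ids = [f"patient_{i}" for i in range(1_000_000)]

rng = random.Random(42)          # fixed seed so the split is reproducible
shuffled = patient_ids[:]
rng.shuffle(shuffled)

train_ids = shuffled[:750_000]   # 75% for the machine to learn on
test_ids  = shuffled[750_000:]   # 25% held aside to test against later
```

The essential point is that the split is random and the test patients are never shown to the machine during learning.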
Once we have a training set of data available, we tell the machine which of the patients overdosed and we let it start looking at the data to try to find patterns that are more frequently associated with the patients who suffered an overdose as compared with those who didn’t. We’re all probably used to high-speed internet and seeing things happen in the blink of an eye – but these machine learning algorithms can take days of constant computer time to run.
I slipped the term “learning” into the sentence above, and that’s where the power of the machines comes in. The machine finds the variables most related to an overdose and then continues that process, looking at all the variables, until adding another variable does not improve the prediction. For example, the machine may learn that people who have three or more overlapping opioid prescriptions are more likely to overdose than people with only one prescription. It may further learn that a higher dosage on those prescriptions compounds the risk even further, and that three overlapping prescriptions from the same doctor are far less risky than three from three different doctors, and so on. Eventually, the machine will not gain any accuracy by adding one more variable, and the learning will stop.
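That “add variables until nothing improves” loop is a form of greedy forward selection. Here is a deliberately tiny Python sketch of the idea, where `evaluate` is a stand-in for the expensive step of training a model and scoring it (the feature names and their pretend predictive values are invented for illustration):

```python
def evaluate(feature_set):
    # Stand-in for model training + scoring. In reality this is the part
    # that can take days; here each hypothetical feature just contributes
    # a fixed amount of pretend predictive value.
    value = {"pharmacy_count": 0.15, "opioid_benzo_overlap": 0.10,
             "mme_change": 0.05, "patient_zip": 0.0}
    return 0.5 + sum(value[f] for f in feature_set)

def forward_select(candidates):
    """Greedily add the most helpful variable until no variable helps."""
    selected = []
    best_score = evaluate(selected)
    while True:
        gains = [(evaluate(selected + [f]), f)
                 for f in candidates if f not in selected]
        if not gains:
            return selected
        score, feature = max(gains)
        if score <= best_score:      # adding one more variable doesn't help: stop
            return selected
        selected.append(feature)
        best_score = score
```

In this toy, the machine would pick up pharmacy count, then the opioid/benzodiazepine overlap, then dose changes, and stop once the remaining variable (zip code) adds nothing. Real algorithms are far more sophisticated, but the stopping logic is the same in spirit.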
Now, we have to see if the machine learned a formula that can be used on other data sets. That’s where our test set comes in. We apply the formula to the test data set (which the machine never saw), do NOT tell the machine who overdosed and who didn’t, and see how well the formula it came up with worked at identifying both those who did overdose AND those who didn’t.
And we do this – over and over and over again – until we’re convinced we have the best formula. And at that point we can identify what variables and specific formula the machine created from its extensive learning.
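If you are curious what “seeing how well the formula worked” looks like in code, here is a toy Python sketch. The probabilities and outcomes are made up, and the metrics shown (sensitivity and specificity) are one common choice among several, not necessarily the ones any particular team uses:

```python
# Each test-set record: (model's predicted probability, true outcome we kept hidden).
test_results = [
    (0.92, True), (0.80, False), (0.65, True), (0.40, False),
    (0.30, False), (0.15, True), (0.10, False), (0.05, False),
]

def sensitivity_specificity(results, threshold):
    """How often the formula catches true overdoses, and true non-overdoses."""
    tp = sum(p >= threshold and y for p, y in results)        # caught overdoses
    fn = sum(p < threshold and y for p, y in results)         # missed overdoses
    tn = sum(p < threshold and not y for p, y in results)     # correct "no"
    fp = sum(p >= threshold and not y for p, y in results)    # false alarms
    return tp / (tp + fn), tn / (tn + fp)
```

Running this over many train/test splits, and comparing formulas, is the “over and over and over again” part.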
Sometimes we’re intrigued by what we find. For instance, when we did this on real data, we found that high doses of opioids were associated with overdose BUT that factor carried less weight than one might assume. We did find that the machines picked the number of pharmacies used, the combined use of sedatives and opioids, and changes in dose as being very important. This is one reason why well-managed chronic pain patients tend to have a lower overdose risk score. Those patients are often on a stable dose of opioids with a limited number of overlapping providers and just a few pharmacies, and the machines determined that those patterns were not strongly associated with the risk of overdose.
Lastly, I want to finish with another important concept.
The algorithm (or formula) that the machine comes up with is best thought of as a probability, or perhaps a percentage. Applied to our Overdose Risk Score, what that means is that our algorithm actually produces the probability (or chance) that a person will have an overdose. We further convert that probability to an easy-to-read score from 0 to 1000. If we look at 100 patients with an Overdose Risk Score above 900 and compare that group to 100 patients with a score less than 200, we should expect that a greater number of patients in the > 900 group will overdose as compared with the < 200 group — and that is the case. In fact, the rate of overdose is about 300 times greater in the > 900 group. HOWEVER, we cannot predict exactly who in the > 900 group will overdose and who will not, only that they all have a higher risk. Hidden in this statement is the fact that some people in the < 200 group will also have an overdose. The difference is the rate (or percentage) in each group who have an overdose.
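The probability-to-score idea can be sketched in a few lines of Python. The linear mapping and the little groups of patients below are illustrative assumptions, not the actual Overdose Risk Score formula:

```python
def to_score(prob: float) -> int:
    """Map a probability in [0, 1] to an integer score from 0 to 1000
    (assuming, for illustration, a simple linear mapping)."""
    return round(prob * 1000)

def overdose_rate(patients):
    """Fraction of (score, overdosed) records where an overdose occurred."""
    return sum(od for _, od in patients) / len(patients)

# Invented examples: even the low-score group contains an overdose.
high = [(950, True), (920, False), (980, True), (910, False)]
low  = [(150, False), (120, False), (180, False), (100, True)]
```

Comparing `overdose_rate(high)` with `overdose_rate(low)` shows the point of the paragraph above: the groups differ in their *rate* of overdose, not in certainty about any individual patient.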
So although a machine learning process as described above results in an optimal formula for predicting overdose, one can never look at an Overdose Risk Score and tell someone with certainty whether they will have an overdose or not. I wish it could be that simple… but it is not.
This ends the “data science-y” part of our blog series. I hope that any reader of the full series will appreciate that opioid risk in the setting of chronic pain is an exceedingly complex issue. There are no easy answers… And, there is no certainty. All we can do is the best we can.
To read more on this series:
Part I – Benefit and Risk
Part II – Risk Defined
Part III – Dependence and Addiction
Part IV – PDMPs
Part V – Multiple Provider Episodes
Part VI – PDMP Visualization
Part VII – Interpreting PDMP Visualizations
Part VIII – Scoring PDMP data
Part IX – Understanding How PDMP Data is Scored