This case study was originally published in 2019. 

The Opioid Epidemic

In the last 15 years, deaths due to opioids (both prescription and heroin) have quadrupled in the United States. Opioid misuse accounts for billions of dollars in healthcare costs and lost wages every year. Several state governors have declared a public health emergency due to the devastating impact opioid abuse is having on communities.

Findings from data can be instrumental in creating an intervention policy that better informs healthcare practitioners of potential risk factors and warning signs of opioid abuse, thereby enabling prevention efforts targeted at the individuals deemed at greatest risk. To proactively assess patient risk for opioid abuse and opioid-related death, a State Health Agency tasked WWT with analyzing protected personal information.

The Agency made available to our big data team five years of hospital discharge, birth, death, state trauma registry and emergency medical services records in a secure environment within the State's health services headquarters. Data was encrypted using unique identifiers and transferred into our analytics environment, where it was mounted on a Hadoop Distributed File System (HDFS) and loaded into a Hive SQL database to efficiently join data sources.

Data visualization

Before employing predictive modeling, we created Tableau dashboards to visualize salient aspects of the data. Dashboards allowed us to identify demographic and geographic patterns of opioid-related hospital encounters, as well as potential predictors of opioid abuse.

Multivariate graphical analysis demonstrated that age and, to a lesser degree, income are the most telling demographic considerations. Middle-aged population groups with lower-than-median household income had the highest rates of opioid-related hospital encounters as well as opioid-related deaths.

Additionally, plotting opioid-related incidents on a map of the state, segmented by zip code and county, revealed pronounced geographic clustering of opioid-related hospitalizations, with several zip codes showing high rates of opioid abuse compared to the state overall.

These visual findings informed how we would analyze specific patient information to predict the risk of opioid abuse.

Opioid risk modeling

To develop a model that predicted the risk of a patient abusing opioids or dying from future use, we created a training dataset of each patient's five-year hospital history condensed into a single record. Variables were engineered from diagnosis and procedure codes within hospital records to enhance performance of the model.
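As a rough illustration of condensing a patient's history into a single record, the sketch below collapses encounter rows into one feature record per patient. The field names (patient_id, diagnosis_codes), the feature names and the ICD-10 code prefixes are assumptions chosen for the example; the case study does not list the codes or variables that were actually engineered.

```python
from collections import defaultdict

# Illustrative ICD-10 prefixes only; the actual code lists are not given
# in the case study, and older records may use a different coding system.
ANXIETY_PREFIXES = ("F41",)
NON_OPIOID_DRUG_PREFIXES = ("F12", "F14", "F15")


def condense_history(encounters):
    """Collapse encounter rows into one feature record per patient.

    `encounters` is an iterable of dicts with the hypothetical keys
    "patient_id" and "diagnosis_codes" (a list of code strings).
    """
    records = defaultdict(lambda: {
        "discharge_count": 0,
        "has_anxiety_dx": False,
        "has_non_opioid_drug_dx": False,
    })
    for enc in encounters:
        rec = records[enc["patient_id"]]
        rec["discharge_count"] += 1
        for code in enc["diagnosis_codes"]:
            # str.startswith accepts a tuple of prefixes.
            if code.startswith(ANXIETY_PREFIXES):
                rec["has_anxiety_dx"] = True
            if code.startswith(NON_OPIOID_DRUG_PREFIXES):
                rec["has_non_opioid_drug_dx"] = True
    return dict(records)
```

Each value in the returned dictionary is one row of a training dataset keyed by the de-identified patient identifier.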

From there, we used a machine learning algorithm called gradient-boosted decision tree classification to identify aspects of patient medical history that are predictive of opioid abuse and opioid-related death.

Opioid abuse model

Model gain refers to the relative predictive value of each variable used in the model. Training the model using the selected variables gives us an indication of which variables are the most important for making accurate predictions. The higher the number, the more important the variable. For example, if we removed "age," our predictions would take a significant accuracy hit. However, if we removed "recent mother," we likely wouldn't see much of a difference in the model's performance.

Information Value (IV) is a statistical measure we use before the model is trained to get an idea of which variables might be important for predicting our target (opioid abuse). An IV greater than or equal to 0.1 indicates that a variable is a good predictor. We use this test to narrow down a large list of possible variables before training a predictive model.
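For readers unfamiliar with the measure, IV is commonly computed by binning a variable and summing, over the bins, the difference between the share of target cases and the share of non-target cases in each bin, multiplied by the natural log of their ratio. The sketch below follows that textbook definition for a categorical or pre-binned variable. The function name and the smoothing parameter are assumptions for illustration, not details taken from the project.

```python
import math
from collections import Counter


def information_value(values, targets, smoothing=0.5):
    """Information Value of a categorical (or pre-binned) variable.

    `values` holds the bin label for each patient and `targets` holds the
    binary outcome (truthy = target case). `smoothing` is added to every
    bin count so that empty cells do not produce log(0); it should be > 0.
    """
    events = Counter()
    non_events = Counter()
    for value, target in zip(values, targets):
        if target:
            events[value] += 1
        else:
            non_events[value] += 1

    bins = set(events) | set(non_events)
    total_events = sum(events.values()) + smoothing * len(bins)
    total_non_events = sum(non_events.values()) + smoothing * len(bins)

    iv = 0.0
    for b in bins:
        share_events = (events[b] + smoothing) / total_events
        share_non_events = (non_events[b] + smoothing) / total_non_events
        iv += (share_events - share_non_events) * math.log(
            share_events / share_non_events
        )
    return iv
```

A variable whose bins contain target and non-target cases in the same proportions scores close to zero, while a variable that separates the two groups scores higher and would pass a 0.1 cutoff.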

Opioid abuse predictors

As seen in the visual above, ten patient attributes were identified as predictors in the opioid abuse model. We identified the top three as patient age, history of non-opioid drug abuse and zip code, with the number of hospital discharges and diagnoses of anxiety also showing a large statistical correlation. Findings from the report included the following statistics:

  • Individuals between 44 and 60 are 71 percent more likely to be future opioid abusers.
  • Patients who have at least one non-opioid drug-related hospitalization in the past five years are 176 percent more likely to abuse opioids in the future. Among individuals identified as opioid abusers, 35 percent have been hospitalized in the past five years due to non-opioid drug abuse.
  • The risk of opioid abuse and related death increases for patients with multiple hospital discharges, for any reason.
  • Individuals who have been discharged from a hospital more than three times in the past five years are 118 percent more likely to be admitted to a hospital due to opioid-related events.
  • Patients with more than seven prior discharges are at 212 percent greater risk.
  • Patients with a history of unspecified anxiety in the past five years are 189 percent more likely to be hospitalized for opioid use. Among identified opioid abusers, 26 percent have a history of unspecified anxiety.
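A "percent more likely" figure of the kind listed above is typically a ratio of outcome rates expressed as a percentage increase. The report does not state which baseline was used, so the sketch below assumes the overall population rate as the comparison; the function name and inputs are hypothetical.

```python
def percent_more_likely(in_group, outcome):
    """Percent increase in outcome rate for a group versus the overall rate.

    `in_group` and `outcome` are parallel sequences of booleans (or 0/1),
    one entry per patient. Using the overall population as the baseline is
    an assumption; comparing against patients outside the group is another
    common choice and gives a different number.
    """
    n = len(outcome)
    group_outcomes = [o for g, o in zip(in_group, outcome) if g]
    if n == 0 or not group_outcomes:
        raise ValueError("need a non-empty population and a non-empty group")

    overall_rate = sum(outcome) / n
    if overall_rate == 0:
        raise ValueError("overall rate is zero; the ratio is undefined")

    group_rate = sum(group_outcomes) / len(group_outcomes)
    return (group_rate / overall_rate - 1.0) * 100.0
```

Under this definition, a result of 100 means the group's rate is double the baseline rate, and a result of 0 means the group is no different from the baseline.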

Recommendations for hospital screening practices

Findings from our big data model can be used by the State Health Agency to identify attributes to look for when patients present at a hospital. Opportunities for opioid prevention and intervention should be noted for patients who are between 44 and 60 years old, have a history of non-opioid drug-related hospitalizations, have had more than three hospital discharges or have been diagnosed with unspecified anxiety in the past five years.

A patient who meets all the above criteria is five times more likely to abuse opioids in the future compared to the overall adult population.
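The screening criteria can be expressed as a simple rule check. The following is a minimal sketch, not the Agency's screening tool: the PatientSummary fields and function names are hypothetical, and treating the 44-to-60 age range as inclusive at both ends is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PatientSummary:
    """Hypothetical five-year summary of a patient's hospital history."""
    age: int
    non_opioid_drug_hospitalizations_5y: int
    discharges_5y: int
    unspecified_anxiety_5y: bool


def screening_flags(patient):
    """Return which of the four screening criteria the patient meets."""
    return {
        # Inclusive bounds are an assumption; the report says "between".
        "age_44_to_60": 44 <= patient.age <= 60,
        "non_opioid_drug_history": patient.non_opioid_drug_hospitalizations_5y >= 1,
        "more_than_three_discharges": patient.discharges_5y > 3,
        "unspecified_anxiety": patient.unspecified_anxiety_5y,
    }


def meets_all_criteria(patient):
    """True only when every screening criterion is met."""
    return all(screening_flags(patient).values())
```

Returning the individual flags as well as the combined result lets a clinician see which attributes triggered the note, not just that the patient was flagged.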

By identifying patients who are at higher risk of abusing opioids in the future, hospitals can provide early assistance and treatment to prevent opioid addictions from impacting more lives.