More than 10 million Americans misused prescription opioids in 2019, and nearly 75 percent of drug overdose deaths in 2020 involved an opioid. According to the United States Centers for Disease Control and Prevention, overdose deaths involving opioids, including prescription opioids, heroin and synthetic opioids such as fentanyl, have increased eightfold since 1999.
As scientists and the health care community search for effective ways to mitigate the opioid epidemic, rapid advances in machine learning show promise. Access to data and machine learning frameworks has led to models that use health care data to address different facets of the opioid crisis. For example, health care databases can help researchers and clinicians identify patients at risk.
But are these machine learning models built on health care data reliable at predicting opioid use disorder? That’s what researchers from Florida Atlantic University’s College of Engineering and Computer Science wanted to explore. To find out, they examined peer-reviewed journal papers and conducted the first systematic review analyzing not only the technical aspects of machine learning applied to predicting opioid use, but also the published results.
Their goal was to determine if these machine learning methods are useful and, more importantly, reproducible. For the study, they reviewed 16 peer-reviewed journal papers that used machine learning models to predict opioid use disorder and investigated how the papers trained and evaluated these models.
Findings, published in the journal Computer Methods and Programs in Biomedicine, reveal that while results from the reviewed papers show machine learning models applied to opioid use disorder prediction may be useful, there are important ways to improve transparency and reproducibility of these models, which will ultimately enhance their use for research.
For the systematic review, researchers searched Google Scholar, Semantic Scholar, PubMed, IEEE Xplore and Science.gov. They extracted data that included the study's goal, dataset used, cohort selected, types of machine learning models created, model evaluation metrics, and the details of the machine learning tools and techniques used to create the models.
Findings showed that of these 16 papers, three created their own dataset, five used a publicly available dataset and the remaining eight used a private dataset. Cohort size ranged from the low hundreds to more than half a million. Six papers used one type of machine learning model, and the remaining 10 used up to five different machine learning models. Most papers did not sufficiently describe the machine learning techniques and tools used to produce their results. Only three papers published their source code.
“The reproducibility of papers using machine learning for health care applications can be improved upon,” said Oge Marques, Ph.D., co-author and a professor in FAU’s Department of Electrical Engineering and Computer Science. “For example, even though health care datasets can be hindered by privacy laws and ethical considerations, researchers should follow machine learning best practices. Ideally, the code should be publicly available.”
The researchers make three recommendations. First, use the area under the precision/recall curve (AUPRC), a metric that is more useful for imbalanced datasets in which the negative class is more prevalent and true-negative predictions carry little value. Second, favor interpretable models over non-interpretable models (also known as “black-box models”) in this critical health care area whenever possible; if a non-interpretable model must be deployed to predict opioid use disorder, define the reasons that justify its use. Finally, to ensure transparency and reproducibility of results, adopt checklists and other documentation practices before submitting machine-learning-based studies for review and publication. Better documented and publicly available studies will help the research community advance the field.
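To illustrate why AUPRC can be more informative than the more familiar area under the ROC curve when positives are rare, here is a minimal Python sketch. It is not taken from the study: the 20 patients, their labels and their risk scores are invented for illustration, the function names are hypothetical, and average precision is used as one common estimate of AUPRC (assuming no tied scores).

```python
def average_precision(labels, scores):
    """Average precision, a common estimate of AUPRC (assumes no tied scores)."""
    # Rank patients from highest to lowest predicted risk.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(labels)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision among the top `rank` patients, taken at each true positive.
            precision_sum += hits / rank
    return precision_sum / total_positives


def roc_auc(labels, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: 20 patients, only 2 of them positive (imbalanced data).
# The model ranks two negatives above both positives.
labels = [0, 0, 1, 1] + [0] * 16
scores = [(20 - i) / 20 for i in range(20)]  # strictly decreasing risk scores

print(f"ROC AUC: {roc_auc(labels, scores):.3f}")            # 0.889
print(f"AUPRC:   {average_precision(labels, scores):.3f}")  # 0.417
```

In this toy case the ROC AUC looks strong because the 16 correctly ranked negatives dominate the score, while the AUPRC shows that most of the patients flagged at the top of the list are false alarms.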
The researchers note that the lack of good machine learning reproducibility practices in the papers makes it impossible to verify their claims. For example, the evidence presented may fall short of the accepted standard, or a claim may hold only in a narrower set of circumstances than asserted.
“Journal papers would be more valuable to the research community and their suggested application if they follow good practices of machine learning reproducibility in order for their claims to be verified and used as a solid base for future work,” said Marques. “Our study recommends a minimum set of practices to be followed before accepting machine-learning-based studies for publication.”
Study co-authors are Christian Garbin, first author and a Ph.D. candidate, and Nicholas Marques, an M.S. student in data science and analytics and a National Science Foundation Research Traineeship Program scholar, both within the College of Engineering and Computer Science.
“Opioid use disorder is a public health concern of the first magnitude in the United States and elsewhere,” said Stella Batalama, Ph.D., dean, FAU College of Engineering and Computer Science. “Harnessing the power and potential of machine learning to predict and prevent one’s risk of opioid use disorder holds great promise. However, to be effective, machine learning methods must be reliable and reproducible. This systematic review by our researchers provides important recommendations on how to accomplish that.”