A new study from the University of Glasgow applied artificial intelligence (AI) to viral genomes to predict which animal viruses could infect humans. Most emerging infectious diseases affecting humans (like COVID-19) are zoonotic, meaning they are caused by viruses originating from other animal species. Identifying high-risk viruses earlier can improve research and surveillance priorities.
A study published in PLOS Biology by Nardus Mollentze, Simon Babayan, and Daniel Streicker at the University of Glasgow, United Kingdom, suggests that artificial intelligence applied to viral genomes may predict the likelihood that any animal-infecting virus will infect humans, given biologically relevant exposure.
Identifying zoonotic diseases prior to emergence is a major challenge because only a small minority of the estimated 1.67 million animal viruses are able to infect humans. To develop machine learning models using viral genome sequences, the researchers first compiled a dataset of 861 virus species from 36 families.
They then built machine learning models, which assigned a probability of human infection based on patterns in virus genomes. The authors then applied the best-performing model to analyze patterns in the predicted zoonotic potential of additional virus genomes sampled from a range of species.
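As a rough illustration of what "assigning a probability based on patterns in virus genomes" can look like, here is a minimal sketch. It is not the authors' method: the article does not say which genome features or which model type were used. The sketch assumes dinucleotide frequencies as the genome pattern and a simple logistic score, and the sequence, weights, and function names are all made up for illustration.

```python
from collections import Counter
from itertools import product
import math

NUCLEOTIDES = "ACGT"
# All 16 possible two-letter combinations: AA, AC, ..., TT.
DINUCLEOTIDES = ["".join(pair) for pair in product(NUCLEOTIDES, repeat=2)]


def dinucleotide_frequencies(sequence):
    """Return the relative frequency of each of the 16 dinucleotides.

    Overlapping pairs are counted; pairs containing anything other than
    A, C, G, or T (for example N) are ignored.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[d] for d in DINUCLEOTIDES)
    if total == 0:
        return {d: 0.0 for d in DINUCLEOTIDES}
    return {d: counts[d] / total for d in DINUCLEOTIDES}


def score_probability(features, weights, bias):
    """Map a weighted sum of features to a value between 0 and 1 (logistic)."""
    z = bias + sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical inputs: an invented sequence fragment and invented weights.
# A real model would learn its parameters from labelled virus genomes.
toy_genome = "ATGCGCGATATCGCGTATA"
hypothetical_weights = {"CG": 2.0, "TA": -1.5}

features = dinucleotide_frequencies(toy_genome)
probability = score_probability(features, hypothetical_weights, bias=0.0)
print(f"Toy probability score: {probability:.3f}")
```

The point is only the shape of the pipeline: a genome is turned into numeric features, and a trained model turns those features into a probability. The number printed here has no biological meaning.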
The researchers found that viral genomes may have generalizable features that are independent of virus taxonomic relationships and may preadapt viruses to infect humans. They were able to develop machine learning models capable of identifying candidate zoonoses using viral genomes.
These models have limitations, as computer models are only a preliminary step to identifying zoonotic viruses with potential to infect humans. Viruses flagged by the models will require confirmatory laboratory testing before pursuing major additional research investments.
Further, while these models predict whether viruses might be able to infect humans, the ability to infect is just one part of broader zoonotic risk, which is also influenced by the virus's virulence in humans, its ability to transmit between humans, and the ecological conditions at the time of human exposure.