Using machine learning, researchers from Stanford Medicine and their collaborators found specific genetic signals in people who develop severe coronavirus infection.
Since the start of the coronavirus pandemic, scientists and clinicians have struggled to understand why some people with the infection become seriously ill or die while others have few, if any, symptoms. Age, body mass index and pre-existing health problems account for some of the disparities, but genetics is also known to play a significant role.
Now, researchers from Stanford Medicine and the University of Sheffield in the U.K. have identified more than 1,000 genes linked to the development of severe COVID-19 cases that required breathing support or were fatal. The team was also able to identify specific types of cells in which those genes act up. It’s one of the few studies to link coronavirus-associated genes to specific biological functions.
The researchers used a machine learning tool named RefMap, which can find patterns in vast amounts of data, to help identify the genetic basis for complex and poorly understood diseases.
“We mapped the genetic architecture of coronavirus infections and found that these 1,000 genes account for 77% of the drivers of severe COVID-19,” explained Michael Snyder, PhD, professor and chair of genetics.
A paper describing the research was published online June 14 in Cell Systems. Snyder, the Stanford W. Ascherman, MD, FACS, Professor in Genetics, and professor of medicine Philip Tsao, PhD, are co-senior authors. Genetics instructor Sai Zhang, PhD, and neuroscientist Jonathan Cooper-Knock, PhD, a Stanford visiting scholar and lecturer at the University of Sheffield, share lead authorship.
The researchers used two large data sets to unpack the genetics behind severe COVID-19. The first data set contained genomic information from healthy human lung tissue. The data helped identify gene expression in 19 different types of lung cells, including epithelial cells that line the respiratory tract and are the first defense against infection. (Gene expression is the process by which certain genes are switched on to make RNA and proteins.)
Other data came from the COVID-19 Host Genetics Initiative, one of the largest genome-wide studies of critically ill coronavirus patients. The researchers looked for genetic clues in the data — DNA mutations, called single nucleotide polymorphisms — that might indicate if someone is at a higher risk for severe COVID-19. They tracked whether some mutations occurred more or less often in COVID-19 patients with severe disease.
Mutations that continued to appear, or were notably absent, in the patients who developed severe COVID-19 suggested those variations might be behind the infection’s severity.
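The comparison described above, asking whether a mutation shows up more or less often in severe cases than in other people, can be illustrated with a simple odds ratio. The Python sketch below is only an illustration under stated assumptions and is not the RefMap method or the study's analysis: the function name `odds_ratio` and every count in it are made up for the example.

```python
def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Odds of carrying a variant among severe cases relative to controls.

    Values above 1 mean the variant is more common in severe cases;
    values below 1 mean it is less common.
    """
    case_noncarriers = case_total - case_carriers
    control_noncarriers = control_total - control_carriers
    # Add 0.5 to every cell (Haldane-Anscombe correction) so the ratio
    # stays defined even if one of the four counts is zero.
    a = case_carriers + 0.5
    b = case_noncarriers + 0.5
    c = control_carriers + 0.5
    d = control_noncarriers + 0.5
    return (a * d) / (b * c)


# Made-up counts, for illustration only (not data from the study).
enriched = odds_ratio(30, 100, 10, 100)   # variant seen more often in severe cases
depleted = odds_ratio(5, 100, 20, 100)    # variant seen less often in severe cases

print(f"enriched variant odds ratio: {enriched:.2f}")
print(f"depleted variant odds ratio: {depleted:.2f}")
```

A real genome-wide study would also test each ratio for statistical significance and correct for the very large number of variants examined; this sketch leaves both steps out.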
To verify whether the suspect mutations might in fact increase the odds of severe COVID-19, the researchers performed a genome-wide search in lung tissue for the mutations from patients critically ill with COVID-19 and from healthy people.
“We did this for the 19 lung cell types,” Zhang said. Although it was clear which mutations were most likely to convey risk for severe disease, the researchers still didn’t know which genes were affected by the mutations. So the team worked backward, using molecular clues to decipher the region of the genome in which the mutation occurred and, finally, narrow the region down to specific genes. “Then we had our final gene list associated with COVID-19 severity.”
“When you’re studying the genetic basis of disease, you’re trying to pinpoint regions in the genome that are responsible,” Snyder explained. “If you know where to fish — all the hot spots, in this case, the active genomic regions in lungs — you have a much better chance of catching more fish than if you’re searching the whole ocean.”
The researchers also wanted to know which types of cells harbored faulty gene expression. Through their machine learning tool, they determined that severe COVID-19 is largely associated with a weakened response from two well-known immune cells — natural killer (NK) cells and T cells. “NK cells and a subtype called CD56bright are the most important,” Cooper-Knock said. “T cells rank second.”
NK cells, which are present from birth and are the body’s first line of defense against infection, are known for their ability to destroy viruses and cancer cells. NK cells also help produce a range of immune system proteins called cytokines, Cooper-Knock said. One cytokine, interferon gamma, is a key activator of immune cells. Acting in concert with interferon gamma, NK cells mount an immediate and coordinated defense against viral infections.
“CD56bright cells are like the general directing the war. They mobilize other immune cells, telling them where to go and what to do. We found that in people with severe coronavirus infection, critical genes in NK cells are expressed less, so there’s a less robust immune response. The cells aren’t doing what they’re supposed to do,” Cooper-Knock explained.
Snyder likened COVID-19 risk genes to harmful variants of the BRCA genes that predispose some people to breast and ovarian cancer.
“Our findings lay the foundation for a genetic test that can predict who is born with an increased risk for severe COVID-19,” he said. “Imagine there are 1,000 changes in DNA linked to severe COVID-19. If you have 585 of these changes, that might make you pretty susceptible, and you’d want to take all the necessary precautions.”
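Snyder's thought experiment amounts to counting how many of the severity-linked DNA changes a person carries. A minimal Python sketch of that counting idea follows. It is not a validated genetic test and not the study's method: the variant IDs, the `count_risk_variants` helper and the example genotype are all hypothetical, and a real risk score would likely weight each variant rather than count them equally.

```python
def count_risk_variants(genotype, risk_variants):
    """Count how many listed risk variants a person carries.

    genotype maps a variant ID to the number of copies carried (0, 1 or 2).
    Variants missing from the genotype are treated as not carried.
    """
    return sum(1 for variant in risk_variants if genotype.get(variant, 0) > 0)


# Hypothetical variant IDs and genotype, for illustration only.
risk_variants = ["var_1", "var_2", "var_3", "var_4", "var_5"]
person = {"var_1": 1, "var_2": 0, "var_3": 2, "var_5": 1}

carried = count_risk_variants(person, risk_variants)
print(f"carries {carried} of {len(risk_variants)} listed risk variants")
```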
Cooper-Knock noted that drugs that kickstart sluggish NK cells are already used to treat some types of cancer.
“The drugs bind to receptors on the NK cells and trigger them to have a more robust response,” he said. Trials of NK cell infusions for severe COVID-19 are underway.