New research defines characteristics of long COVID

May 17, 2022

Research from the National Institutes of Health (NIH) has identified characteristics of people with long COVID and those likely to have it, according to a news release.

Using machine learning techniques, the scientists analyzed an unprecedented collection of electronic health records (EHRs) available for COVID-19 research to better identify who has long COVID. Exploring de-identified EHR data in the National COVID Cohort Collaborative (N3C), a national, centralized public database led by NIH’s National Center for Advancing Translational Sciences (NCATS), the team found more than 100,000 likely long COVID cases as of October 2021 (as of May 2022, the count is more than 200,000). The findings appear in The Lancet Digital Health.

Long COVID is marked by wide-ranging symptoms, including shortness of breath, fatigue, fever, headaches, “brain fog,” and other neurological problems. Such symptoms can last for many months or longer after an initial COVID-19 diagnosis. One reason long COVID is difficult to identify is that many of its symptoms are similar to those of other diseases and conditions. A better characterization of long COVID could lead to improved diagnoses and new therapeutic approaches.

“It made sense to take advantage of modern data analysis tools and a unique big data resource like N3C, where many features of long COVID can be represented,” said co-author Emily Pfaff, PhD, a clinical informaticist at the University of North Carolina at Chapel Hill.

The N3C data enclave currently includes information representing more than 13 million people nationwide, including nearly 5 million COVID-19-positive cases. The resource enables rapid research on emerging questions about COVID-19 vaccines, therapies, risk factors and health outcomes.

The new research is part of a related, larger trans-NIH initiative, Researching COVID to Enhance Recovery (RECOVER), which aims to improve the understanding of the long-term effects of COVID-19, called post-acute sequelae of SARS-CoV-2 infection (PASC). RECOVER will accurately identify people with PASC and develop approaches for its prevention and treatment. The program also will answer critical research questions about the long-term effects of COVID through clinical trials, longitudinal observational studies, and more.

In the Lancet study, Pfaff, Melissa Haendel, PhD, at the University of Colorado Anschutz Medical Campus, and their colleagues examined patient demographics, healthcare use, diagnoses and medications in the health records of 97,995 adult COVID-19 patients in the N3C. They used this information, along with data on nearly 600 long COVID patients from three long COVID clinics, to create three machine learning models to identify long COVID patients.

In machine learning, scientists “train” computational methods to rapidly sift through large amounts of data to reveal new insights — in this case, about long COVID. The models looked for patterns in the data that could help researchers both understand patient characteristics and better identify individuals with the condition.

The models focused on identifying potential long COVID patients among three groups in the N3C database: all COVID-19 patients, patients hospitalized with COVID-19, and patients who had COVID-19 but were not hospitalized. The models proved to be accurate, as people identified as at risk for long COVID were similar to patients seen at long COVID clinics. The machine learning systems classified approximately 100,000 patients in the N3C database whose profiles were close matches to those with long COVID.

The models searched for common features, including new medications, doctor visits and new symptoms, in patients with a positive COVID diagnosis who were at least 90 days out from their acute infection. The models flagged patients as having long COVID if they had visited a long COVID clinic, or if they showed long COVID symptoms and likely had the condition but had not been diagnosed.
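
For readers curious how such a cohort might be assembled before any model is applied, here is a minimal Python sketch. It is not the study's code. It illustrates only the two selection rules described above: patients must be at least 90 days past their positive COVID-19 diagnosis, and they are grouped as all, hospitalized, or non-hospitalized. The record layout, field names and function names are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date

# Threshold taken from the article: at least 90 days after acute infection.
MIN_DAYS_SINCE_INFECTION = 90


@dataclass
class PatientRecord:
    """Hypothetical, simplified stand-in for a de-identified EHR record."""
    patient_id: str
    covid_positive_date: date
    hospitalized: bool


def is_eligible(record: PatientRecord, as_of: date) -> bool:
    """True if the patient is at least 90 days past the positive diagnosis."""
    days_out = (as_of - record.covid_positive_date).days
    return days_out >= MIN_DAYS_SINCE_INFECTION


def split_cohorts(records: list[PatientRecord], as_of: date) -> dict[str, list[PatientRecord]]:
    """Group eligible patients into the three populations named in the article."""
    eligible = [r for r in records if is_eligible(r, as_of)]
    return {
        "all": eligible,
        "hospitalized": [r for r in eligible if r.hospitalized],
        "non_hospitalized": [r for r in eligible if not r.hospitalized],
    }
```

In a setup like this, each of the three groups would then be passed to its own classifier. The sketch stops short of that step because the article does not describe the models' internals.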

NIH release 
