Although the integration of big data into health care has been increasing in recent years, the coronavirus disease 2019 (COVID-19) pandemic has brought this practice to the forefront, as many hospitals and care facilities struggle to keep accurate and timely records of COVID-19 cases while adhering to changing procedures implemented at the federal level.
“The use of big data has been woefully absent in our nation’s response to the COVID-19 pandemic, and in public health and health policy planning more generally, despite billions spent by the US government since the passage of the Health Information Technology for Economic and Clinical Health (HITECH) Act in 2009,” Dennis P. Scanlon, PhD, MA, and Mark B. Stephens, MD, MS, wrote in the June issue of The American Journal of Managed Care®.
Comparing health care workers’ fight against COVID-19 to a military operation, the authors argued doctors, local officials, and others are fighting the battle without the necessary central intelligence, which could be gleaned from various types of big data analyses.
“Although not a cure, these data can inform prevention and treatment strategies, patient risk segmentation, and approaches to concepts such as social distancing to better account for portions of the population who are at particularly high risk,” they wrote.
One data source, International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10) codes, is used uniformly throughout health care treatment facilities to efficiently reference symptoms and document clinical concepts.
However, some codes lack accuracy for the intended condition, creating roadblocks when it comes to integrating and comparing electronic medical record (EMR) data. In particular, previous research has shown ICD-10 codes do not accurately reflect clinical diagnoses and concepts such as atrial fibrillation, stroke, and acute kidney injury.
To determine whether ICD-10 codes accurately capture presenting symptoms of fever, cough, and dyspnea among patients being tested for COVID-19, the researchers conducted an EMR review of over 2000 patients.
Findings published in JAMA Network Open show the codes performed poorly in capturing COVID-19-related symptoms and highlight the critical need for meticulous data validation to feed multicenter registries built from EMRs.
Health care organizations need rapid access to high-quality, multicenter data to support scientific discovery during the COVID-19 pandemic, the authors wrote. “EMR data could be repurposed to populate COVID-19 registries and surveillance systems.”
In this retrospective cohort study, investigators analyzed ICD-10 codes of 2201 patients who underwent quantitative reverse transcriptase-polymerase chain reaction testing for COVID-19 at the University of Utah Health between March 10 and April 6, 2020.
The researchers compared the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of codes for fever (R50), cough (R05), and dyspnea (R06.0) against manual medical record review. Performance was also stratified by COVID-19 test result, sex, age group (<50, 50-64, and >64 years), and inpatient status.
Analyses were built off University of Utah Health’s existing operational dashboard of all patients tested for COVID-19, “linking the medical record numbers to the Enterprise Data Warehouse (EDW) to capture ICD-10 billing codes,” the researchers explained. “The EDW aggregates data across the health system, to create a central resource for operations and research.”
Mean (SD) age of participants was 42 (17) years, 1201 (55%) were female, 1569 (71%) were White, and 282 (13%) were Hispanic or Latino. On the basis of EMR review, 66% of patients presented with fever, 88% with cough, and 64% with dyspnea.
- For fever, the sensitivity of ICD-10 codes was 0.26 (95% CI, 0.24-0.29), specificity was 0.98 (95% CI, 0.96-0.99), PPV was 0.96 (95% CI, 0.93-0.97), and NPV was 0.41 (95% CI, 0.39-0.43)
- For cough, the sensitivity of ICD-10 codes was 0.44 (95% CI, 0.42-0.46), specificity was 0.88 (95% CI, 0.84-0.92), PPV was 0.96 (95% CI, 0.95-0.97), and NPV was 0.18 (95% CI, 0.16-0.20)
- For dyspnea, the sensitivity of ICD-10 codes was 0.24 (95% CI, 0.22-0.26), specificity was 0.97 (95% CI, 0.96-0.98), PPV was 0.93 (95% CI, 0.90-0.96), and NPV was 0.42 (95% CI, 0.40-0.44)
- ICD-10 code performance was better for inpatients than for outpatients for fever (χ2 = 41.30; P < .001) and dyspnea (χ2 = 14.25; P = .003), but not for cough (χ2 = 5.13; P = .16)
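The relationship among these figures can be checked with a short calculation. The sketch below is illustrative only and is not the researchers' analysis code: it assumes the rounded sensitivity, specificity, and symptom prevalence values reported above (not the study's raw counts) and applies Bayes' rule to derive PPV and NPV. The function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) derived from sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence               # symptom present, code present
    false_pos = (1 - specificity) * (1 - prevalence)  # symptom absent, code present
    true_neg = specificity * (1 - prevalence)         # symptom absent, code absent
    false_neg = (1 - sensitivity) * prevalence        # symptom present, code absent
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# symptom: (sensitivity, specificity, prevalence on EMR review),
# taken from the rounded values reported in the article
REPORTED = {
    "fever": (0.26, 0.98, 0.66),
    "cough": (0.44, 0.88, 0.88),
    "dyspnea": (0.24, 0.97, 0.64),
}

for symptom, (sens, spec, prev) in REPORTED.items():
    ppv, npv = predictive_values(sens, spec, prev)
    # The false-negative rate is the share of symptomatic patients with no code
    print(f"{symptom}: PPV={ppv:.2f}, NPV={npv:.2f}, "
          f"false-negative rate={1 - sens:.2f}")
```

Under these assumptions, the derived PPV and NPV values agree with the reported ones after rounding. The calculation also shows why a high PPV can coexist with a low NPV: because most tested patients had each symptom, a recorded code was almost always correct, but the absence of a code said little about whether the symptom was absent.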
“Symptoms are an essential part of data collection for SARS-CoV-2 and COVID-19 surveillance and research,” the researchers wrote, “but symptom-specific ICD-10 codes lack sensitivity and fail to capture many patients with relevant symptoms; the false-negative rate is unacceptably high.”
Because common data models and other aggregation tools rely on these codes to capture clinical concepts, these inaccuracies can have ramifications for downstream scientific discoveries or surveillance. For example, if ICD-10 codes were used for symptom surveillance in subsequent waves of COVID-19, a substantial number of patients would be missed, the authors wrote. “Reliable, accurate data are the foundation of scientific discovery; the right data lead to the right solutions.”
Because clinicians may not document all symptoms for all patients, especially when patient volume is high, clinician documentation should be viewed as a reference standard rather than a gold standard, the researchers argued. Capturing symptoms directly from the patient or through checklist-type data entry may better support standardized data collection.
The findings are particularly timely as they come on the heels of several high-profile journal retractions, highlighting the importance of quality control in COVID-19 data aggregation.
“Critical data elements require careful validation to ensure that discoveries translate into effective interventions that reduce morbidity and mortality,” the authors concluded. “As with many aspects of this pandemic, we must pay careful attention to socioeconomically vulnerable populations, including racial minorities, rural patients, and low-income patients, for whom the gap between ICD-10 coding and clinical reality could be greater.”
Future studies should specify a plan for data validation and focus on sampling racial and ethnic minorities to ensure generalizable results.
For More Information: https://www.ajmc.com/view/do-icd-10-codes-accurately-reflect-covid-19-symptoms