Introduction

Post-acute sequelae of SARS-CoV-2 infection (PASC) have emerged as a significant concern1,2,3, particularly among young individuals with a previous diagnosis of COVID-19 4,5,6,7. Defined by the World Health Organization (WHO) as the persistence of at least one physical symptom for 12 weeks following initial testing without an alternative diagnosis, and expanded by the National Institutes of Health (NIH) to include ongoing, relapsing, or new symptoms four or more weeks post-acute infection, PASC potentially affects a significant proportion of COVID-19 survivors8,9,10. In pediatric populations, recent studies show that PASC symptoms and conditions tend to be systemic and/or syndromic, with higher-incidence conditions such as loss of taste/smell, myocarditis, and symptoms associated with cold-like illness occurring in patients after the acute phase of COVID-19 11. The estimated prevalence of PASC symptoms and conditions in the pediatric population ranges from 23% to 45% among those previously infected by SARS-CoV-2 12,13,14, depending on study designs and PASC definitions. These findings highlight the urgent need for further research and comprehensive support to address the prevalence of PASC in pediatric populations.

Prior investigations into potential racial/ethnic differences in PASC among adults have yielded important findings. For example, the Centers for Disease Control and Prevention (CDC) has shown variations in PASC’s impact based on race/ethnicity15. A study by Khullar et al16. reported that Non-Hispanic Black (NHB) individuals exhibited a higher incidence of new PASC symptoms and conditions compared to Non-Hispanic White (NHW) patients, a difference more pronounced in hospitalized than in non-hospitalized patients. These findings suggest the existence of potential racial/ethnic differences in PASC among adults. Importantly, race and ethnicity are social constructs rather than biological ones17,18. Concurrently, research indicates that children’s likelihood of testing positive for COVID-19 correlates with their race/ethnicity19,20,21,22. NHB, Hispanic, and multi-racial children exhibited higher rates of COVID-19 positivity compared to their NHW counterparts, indicating differences in infection rates across racial/ethnic groups23. However, limited research to date has addressed potential racial/ethnic differences in PASC among children and adolescents, making it a pressing area of study. Therefore, our study aimed to quantify such racial/ethnic differences by conducting an association study involving children and adolescents, to determine whether the observed patterns are consistent with the findings from studies conducted among adults.

Examining health outcomes through the lens of racial/ethnic differences in the context of COVID-19 runs the risk of pre-existing racial/ethnic differences being either overshadowed or underestimated. To address this, we employed a difference-in-differences (DiD) approach to disentangle the shifts in racial/ethnic differences before and after COVID-19. Our investigation centered on a pediatric cohort from the RECOVER electronic health records (EHR) database across thirteen institutions. This study, which focuses on the pediatric population with six months of follow-up, aims to investigate racial and ethnic differences in PASC symptoms and conditions attributable to SARS-CoV-2 infection. The study also included the Asian American/Pacific Islander (AAPI) population, which was absent from prior research.

Our main aim is to ascertain whether COVID-19 status correlates with any racial/ethnic differences across PASC symptoms and conditions. We use the term PASC symptoms and conditions to include a broad spectrum of health issues observed following COVID-19 infection11,24, which are not exclusive to the post-COVID period. This includes systemic manifestations such as fatigue and malaise, respiratory symptoms, cardiovascular complications, and neurological disorders. Our definition, broader than the clinical diagnosis of PASC (ICD-10 code U09.9), enables us to investigate prevalence changes across different racial/ethnic groups before and after COVID-19 infection. Importantly, we quantified the racial/ethnic differences linked with COVID-19 by carefully accounting for pre-infection disparities in PASC symptoms and conditions through the application of DiD analyses.

Specifically, Fig. 1a demonstrates how to use the DiD model to find the racial/ethnic differences attributable to COVID-19. We measured the pre-infection racial/ethnic differences by comparing PASC symptoms and conditions between minority racial/ethnic groups and the NHW group during the pre-infection period. We then measured post-infection differences using the same metrics. The racial/ethnic differences attributable to COVID-19 were determined by calculating the difference between these post-infection and pre-infection differences. When COVID-19 may modify the effect of unmeasured confounding variables, we applied a negative control method (Fig. 1c) to adjust for the shifted impacts of these potential unmeasured confounders. By addressing these questions, our objective is to shed light on racial/ethnic differences in the relationship between COVID-19 and PASC symptoms and conditions.
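The arithmetic behind this comparison can be sketched in a few lines. The example below is only an illustration, assuming the DiD contrast is taken on the multiplicative (risk ratio) scale, so that the "difference of differences" in log-RR becomes a ratio of ratios. The function names and the prevalence values are hypothetical and are not study data.

```python
def risk_ratio(prev_group: float, prev_reference: float) -> float:
    """Prevalence in a minority group relative to the reference (NHW) group."""
    return prev_group / prev_reference


def did_ratio(pre_group: float, pre_ref: float,
              post_group: float, post_ref: float) -> float:
    """Ratio-of-ratios DiD: post-infection RR divided by pre-infection RR.

    A value above 1 means the racial/ethnic gap widened after infection;
    a value below 1 means it narrowed.
    """
    pre_rr = risk_ratio(pre_group, pre_ref)
    post_rr = risk_ratio(post_group, post_ref)
    return post_rr / pre_rr


# Hypothetical prevalences (illustrative only, not from the study).
did = did_ratio(pre_group=0.10, pre_ref=0.08, post_group=0.15, post_ref=0.10)
print(round(did, 3))
```

In this toy case the pre-infection RR is 1.25 and the post-infection RR is 1.5, so only the excess beyond the pre-existing gap is attributed to the infection period.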

Fig. 1: Illustration of difference-in-differences (DiD) analysis.

a The rationale for DiD without the negative control outcome (NCO) calibration. It disentangles the racial/ethnic differences related to COVID-19 infection in PASC symptoms and conditions from the pre-infection observed racial/ethnic differences. b The directed acyclic graph (DAG) for the DiD without NCO calibration. Each node in the DAG represents each variable and the arrow symbol shows the causal effect. The left panel illustrates the parallel trends assumption for DiD. The right panel demonstrates how DiD without NCO calibration effectively blocks pathways from unmeasured confounders to PASC symptoms and conditions, provided the parallel trends assumption holds. c The DAG for the DiD with NCO calibration. The left panel illustrates a scenario where the parallel trends assumption is violated. The middle panel demonstrates that DiD without NCO calibration fails to block the pathway from unmeasured confounders to PASC symptoms and conditions. The right panel highlights that the DiD method with NCO calibration successfully eliminates the bias from unmeasured confounding.

Results

Study cohort and characteristics

The study involved 225,723 children and adolescents with COVID-19 (COVID-19-positive cohort) from March 2020 to October 2022 (cohort entry month). The index date for the cohort was the earliest date of documented SARS-CoV-2 infection, determined by a positive polymerase chain reaction (PCR), serology, antigen test, or diagnosis of COVID-19 or PASC (U09.9). The COVID-19-positive cohort comprised 109,022 NHW patients (48.3%), 45,823 NHB patients (20.3%), 60,012 Hispanic patients (26.6%), and 10,866 AAPI patients (4.8%) with SARS-CoV-2 infection from March 2020 to October 2022 (Fig. 2). Of these, 50.2% were female.

Fig. 2: Flowchart for COVID-19 positive cohort.

Flowchart illustrating the selection process for COVID-19-positive patients from March 2020 to October 2022. Abbreviations: BMI, body mass index; PASC, Post-Acute Sequelae of SARS-CoV-2 infection; PCR, Polymerase Chain Reaction.

We stratified the COVID-19-positive patients at cohort entry into two groups based on the severity during the acute phase (seven days before to fourteen days after the index date) of the COVID-19 infection25: the severe group (moderate and severe cases) and the non-severe group (asymptomatic and mild cases). Table 1 and Table S4 present the baseline characteristics of the cohort, such as age, sex, and Pediatric Medical Complexity Algorithm26 (PMCA), stratified by race/ethnicity and severity. The definitions for each variable are shown in Table S2 of Supplementary Section S1.

Table 1 Baseline characteristics of COVID-19 positive patients, by race/ethnicity and severity status

Statistical method overview

The PASC symptoms and conditions were defined using ICD-10-CM codes (shown in Table S2) and were measured during the pre-infection period (28 to 179 days before the index date) and the post-infection period (28 to 179 days after the index date). We performed pairwise propensity score matching for each minority racial/ethnic group (NHB, Hispanic, AAPI) to the NHW group based on pre-specified covariates. To adjust for pre-infection racial/ethnic differences, we applied a DiD model with negative control outcome (NCO) calibration (Fig. 1c) to the matched cohort.
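As a rough illustration of the pairwise matching step, the sketch below performs greedy 1:1 nearest-neighbor matching on precomputed propensity scores. The matching ratio, the caliper of 0.05, the greedy strategy, and all identifiers and scores are assumptions made for this example; the text above does not specify these details.

```python
def greedy_match(minority_ps: dict, reference_ps: dict, caliper: float = 0.05) -> list:
    """Greedy 1:1 nearest-neighbor matching without replacement.

    minority_ps / reference_ps map a patient id to its propensity score.
    Returns (minority_id, reference_id) pairs whose scores differ by at
    most `caliper`; minority patients without such a partner are dropped.
    """
    available = dict(reference_ps)  # reference patients not yet used
    pairs = []
    for pid, score in sorted(minority_ps.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining reference patient on the propensity score.
        best = min(available, key=lambda rid: abs(available[rid] - score))
        if abs(available[best] - score) <= caliper:
            pairs.append((pid, best))
            del available[best]
    return pairs


# Hypothetical propensity scores (illustrative only).
minority = {"m1": 0.30, "m2": 0.62, "m3": 0.95}
reference = {"r1": 0.28, "r2": 0.60, "r3": 0.33, "r4": 0.10}
pairs = greedy_match(minority, reference)
print(pairs)
```

Here "m3" has no reference patient within the caliper and is left unmatched, which is the usual trade-off between sample size and covariate balance.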

Figure 1 illustrates the rationale and methodology of our difference-in-differences (DiD) approach with NCO calibration. Figure 1b shows the rationale of the DiD model. In our study, the parallel trends assumption of the DiD model states that the racial/ethnic differences would not change over time in the absence of COVID-19 infection. When the parallel trends assumption holds, unmeasured confounding variables do not affect the racial/ethnic difference in outcomes before and after COVID-19 infection. This is represented by the absence of an arrow between the unmeasured confounding variables and the differences in outcomes in Fig. 1b. However, when the parallel trends assumption is violated, specifically if COVID-19 modifies the effect of unmeasured confounding variables, the estimates from the DiD model may become biased without the NCO calibration. To address this limitation, we implemented negative control experiments for each minority racial/ethnic group compared to the NHW group, as illustrated in Fig. 1c. This approach allows us to calibrate our estimates for unmeasured confounding bias. We provide detailed explanations of Fig. 1b and Fig. 1c in the “Methods” section.
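One simplified way to picture the calibration idea is shown below. This is a sketch under strong assumptions, not the procedure used in the study: it treats the mean of the NCO log-RR estimates as the systematic error, subtracts it from the DiD log-RR, and widens the standard error by the spread of the NCO estimates. The function name, the normal approximation, and all numeric inputs are hypothetical.

```python
import math
from statistics import mean, variance


def calibrate(log_rr: float, se: float, nco_log_rrs: list) -> tuple:
    """Shift a DiD log-RR by the mean NCO bias and widen its standard error.

    Simplifying assumption: the systematic error is summarized by the mean
    and sample variance of the NCO log-RR estimates, ignoring the sampling
    error of each individual NCO estimate.
    Returns (calibrated RR, lower 95% bound, upper 95% bound).
    """
    bias = mean(nco_log_rrs)
    bias_var = variance(nco_log_rrs)
    cal_log_rr = log_rr - bias
    cal_se = math.sqrt(se ** 2 + bias_var)
    lower = math.exp(cal_log_rr - 1.96 * cal_se)
    upper = math.exp(cal_log_rr + 1.96 * cal_se)
    return math.exp(cal_log_rr), lower, upper


# Hypothetical inputs (illustrative only): an uncalibrated RR of 1.30 and
# four NCO log-RR estimates that should be 0 if there were no bias.
rr, lower, upper = calibrate(math.log(1.30), 0.08, [0.05, 0.03, 0.07, 0.05])
print(round(rr, 3), round(lower, 3), round(upper, 3))
```

Because the NCOs are assumed to have a true log-RR of zero, any consistent departure from zero among them is read as bias shared with the primary outcomes.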

Racial/ethnic differences in PASC symptoms and conditions

To investigate racial/ethnic differences attributable to COVID-19, we examined twenty-four PASC symptoms and conditions. We grouped the PASC symptoms and conditions into two categories: systemic and syndromic features (shown in Table S3). Syndromic features encompass symptoms, signs, and non-specific laboratory abnormalities, while systemic features are diagnosed health conditions. We used thirty-one NCOs in this study, defined as outcomes assumed to have no racial/ethnic differences attributable to COVID-19 (Table S1).

After balancing covariates by propensity score matching, as assessed by the standardized mean difference (SMD; Supplementary Section S11), we fitted the DiD model with NCO calibration. Fig. 3 shows the resulting racial/ethnic differences attributable to COVID-19 in PASC symptoms and conditions, stratified by the severity of acute COVID-19. Overall, we found evidence of an increase in composite outcomes, i.e., at least one condition and any of the syndromic conditions, after SARS-CoV-2 infection among the AAPI group in both the severe and non-severe COVID-19 groups, but there was no evidence of increased racial/ethnic differences among the Hispanic and NHB groups.
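For readers unfamiliar with the balance diagnostic, the sketch below computes the SMD for a binary covariate. The 0.1 cutoff used in the comments is a common convention assumed here for illustration; the text above does not state which threshold was applied. The proportions are hypothetical.

```python
import math


def smd_binary(p_group: float, p_ref: float) -> float:
    """Standardized mean difference for a binary covariate.

    p_group and p_ref are the proportions with the characteristic in the
    minority group and in the reference (NHW) group, respectively.
    """
    pooled_sd = math.sqrt((p_group * (1 - p_group) + p_ref * (1 - p_ref)) / 2)
    if pooled_sd == 0:
        return 0.0
    return (p_group - p_ref) / pooled_sd


# Hypothetical proportions of a covariate before and after matching.
smd_before = smd_binary(0.40, 0.25)
smd_after = smd_binary(0.31, 0.30)
print(round(smd_before, 3), round(smd_after, 3))
```

Under the assumed convention, an absolute SMD below 0.1 after matching would be read as adequate balance for that covariate.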

Fig. 3: Racial/Ethnic differences attributable to COVID-19 in PASC symptoms and conditions.

Racial/Ethnic differences in risk ratio (RR) that are attributable to COVID-19 for the prevalence of PASC symptoms and conditions among COVID-19-positive cohort (n = 225,723 participants), by race/ethnicity and severity status. Each minority group—Asian American/Pacific Islanders (AAPI), Non-Hispanic Black (NHB), and Hispanic—was compared to the Non-Hispanic White (NHW) group. The dots in the figure showed the estimated RRs for each minority group versus the NHW group from the difference-in-differences (DiD) analysis with NCO calibration stratified by severity status. The error bars showed the 95% confidence interval (CI) of the estimated RR. Source data are provided as a Source Data file.

For patients with severe COVID-19, AAPI patients showed a higher increase in at least one condition (risk ratio [RR] 1.24, 95% confidence interval [CI] 1.04 to 1.49, two-sided Wald test P = 0.019) and any of syndromic conditions (RR 1.22, 95% CI 1.01 to 1.47, P = 0.042) compared to NHW after SARS-CoV-2 infection. Both Hispanic patients (RR 0.99, 95% CI 0.91 to 1.08, P = 0.804) and NHB patients (RR 0.93, 95% CI 0.85 to 1.24, P = 0.147) showed a non-significant decrease in at least one condition as compared to NHW patients. For patients with non-severe COVID-19, AAPI patients showed an increase in at least one condition (RR 1.08, 95% CI 1.01 to 1.14, P = 0.015) and any of syndromic conditions (RR 1.08, 95% CI 1.01 to 1.08, P = 0.017) compared to NHW. Hispanic patients showed a non-significant increase in at least one condition (RR 1.01, 95% CI 0.98 to 1.04, P = 0.498), while NHB patients had a similar non-significant risk (RR 0.99, 95% CI 0.89 to 1.11, P = 0.915) in at least one condition.
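The relationship between a reported RR, its 95% CI, and the two-sided Wald P value can be checked with a short calculation. The sketch below assumes the interval is symmetric on the log scale and uses a normal approximation; the function name is hypothetical, and the inputs are the rounded values reported above for AAPI patients with severe COVID-19.

```python
import math


def wald_p_from_ci(rr: float, lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Two-sided Wald P value recovered from an RR and its 95% CI.

    The standard error of log(RR) is inferred from the CI width on the
    log scale, then z = log(RR) / SE is referred to a standard normal.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(rr) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


# At least one condition, AAPI vs NHW, severe group (values from the text).
p_value = wald_p_from_ci(1.24, 1.04, 1.49)
print(round(p_value, 3))
```

Because the published values are rounded to two decimals, a recomputed P value should be expected to agree with the reported one only approximately.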

However, there were significant differences among all minority groups across several PASC symptoms and conditions after SARS-CoV-2 infection. For example, for patients with severe COVID-19, the increased prevalence of hair loss among Hispanic patients was greater (RR 2.62, 95% CI 1.06 to 6.49, two-sided Wald test P = 0.038) than the increased prevalence among NHW patients. The corresponding increase in the prevalence of fever and chills among AAPI patients was greater (RR 1.41, 95% CI 1.01 to 1.97, P = 0.045) than among NHW patients. NHB patients had a smaller change in skin symptoms (RR 0.74, 95% CI 0.58 to 0.96, P = 0.021) than NHW patients. For patients with non-severe COVID-19, AAPI patients had a greater change in POTS/dysautonomia (RR 1.57, 95% CI 1.02 to 2.40, P = 0.037) and respiratory signs and symptoms (RR 1.11, 95% CI 1.00 to 1.23, P = 0.036; significant, with the lower bound rounded to two decimals) compared to NHW patients. NHB patients had a greater increase in cognitive function issues (RR 1.25, 95% CI 1.01 to 1.55, P = 0.037) than NHW patients.

Furthermore, we observed that these racial/ethnic differences varied depending upon the severity of the acute phase of COVID-19 as well as the specific PASC symptoms and conditions being analyzed. For example, among the severe group, the differential increase in abdominal pain was more pronounced for all three minority groups compared to those in the non-severe category.

Sensitivity analysis

Supplementary Section S2 shows the results of the NCO experiments and the estimated systematic errors, such as unmeasured confounder bias. Figure S14 in Supplementary Section S6 shows the racial/ethnic differences estimated using standard Poisson regression models. Among COVID-19 patients in the severe group, NHB patients showed a greater incidence of at least one condition (RR 1.16, 95% CI 1.02 to 1.32, two-sided Wald test P = 0.024) and any syndromic conditions (RR 1.14, 95% CI 1.00 to 1.30, P = 0.042; significant, with the lower bound rounded to two decimals) as compared to NHW patients. Hispanic patients also showed a greater, though non-significant, incidence of at least one condition (RR 1.12, 95% CI 0.99 to 1.27, P = 0.075) as compared to NHW patients.

In Fig. S14, among COVID-19 patients with severe illness during the acute infection, Hispanic individuals exhibited a greater incidence of respiratory signs and symptoms (RR 1.16, 95% CI 1.02 to 1.33, two-sided Wald test P = 0.024) and hair loss (RR 1.84, 95% CI 1.02 to 3.31, P = 0.043) as compared with the NHW patient group. NHB had a greater incidence of respiratory signs and symptoms (RR 1.19, 95% CI 1.03 to 1.36, P = 0.015) and heart disease (RR 1.48, 95% CI 1.06 to 2.07, P = 0.022), but a lower incidence of arrhythmias (RR 0.73, 95% CI 0.57 to 0.94, P = 0.013) and headache (RR 0.66, 95% CI 0.48 to 0.93, P = 0.016) compared with the NHW group. Among those with non-severe acute COVID-19, Hispanic patients displayed a higher incidence of myocarditis (RR 4.28, 95% CI 1.53 to 11.98, P = 0.006) and abnormal liver enzyme (RR 2.06, 95% CI 1.08 to 3.94, P = 0.029) compared with NHW patients. Meanwhile, AAPI patients demonstrated a greater incidence of hair loss (RR 3.32, 95% CI 1.43 to 7.72, P = 0.005) compared with the NHW patient group.

These findings revealed that our DiD approach identified fewer racial/ethnic differences than standard regression models. It is worth noting that the DiD approach adjusted for the baseline racial/ethnic differences before the SARS-CoV-2 infection, a step that a standard regression analysis does not take into consideration. Consequently, some of the racial/ethnic differences reported in prior work might not be attributable to COVID-19. Given its adjustment for baseline racial/ethnic differences, the DiD approach is the more robust of the two.

To determine whether the defined set of twenty-four PASC symptoms and conditions accurately reflects PASC, we conducted a crude incidence analysis comparing these symptoms and conditions in the COVID-19-positive cohort with those in the COVID-19-negative cohort (Supplementary Section S4). A total of 677,448 COVID-19-negative patients were included in the crude analysis. A random negative test date was chosen as the index date for COVID-19-negative patients, and the selection criteria for the COVID-19-negative cohort are shown in Fig. S13. The results indicated that the incidence of all PASC symptoms and conditions was higher in COVID-19-positive patients than in COVID-19-negative patients (Table S7).

To further validate our definition of PASC symptoms and conditions, as well as to investigate potential racial/ethnic differences in PASC symptoms attributable to COVID-19, we applied the DiD approach with NCO calibration to the COVID-19-negative cohort (Fig. S14). This analysis revealed no racial or ethnic differences in PASC symptoms and conditions attributable to COVID-19 within the COVID-19-negative cohort. These findings suggest that the observed racial/ethnic differences in PASC symptoms and conditions among COVID-19-positive individuals are unlikely to be due to inherent differences in symptom and condition incidence patterns but rather may reflect factors specifically associated with COVID-19 infection or its sequelae. This analysis validates that the racial and ethnic differences in PASC observed in the COVID-19-positive cohort are related to COVID-19 infection, rather than underlying population differences or unrelated health trends.

In the analysis including only those patients identified based on positive SARS-CoV-2 PCR or antigen testing (Supplementary Section S7), differences among severe patients were diminished among some PASC symptoms and conditions, while among the non-severe patients, the differences that we identified were consistent in both sets of analyses. To account for the potential bias from limited hospital capacity during the initial COVID-19 period, we performed a secondary analysis excluding COVID-19 patients from the first wave of the pandemic (March to May 2020). This exclusion did not significantly alter the results, as demonstrated in Supplementary Section S8. Supplementary Section S9 shows the results of subgroup analysis by age group. No significant differences were observed for patients aged < 5. For patients aged from 5 to 11, non-severe NHB patients showed an increase in at least one condition (RR 1.08, 95% CI 1.02 to 1.15, P = 0.009) and any of the syndromic conditions (RR 1.08, 95% CI 1.02 to 1.15, P = 0.010) compared to NHW group. For adolescents aged 12 to 20, differences among severe and non-severe patients had a similar pattern. Supplementary Section S10 shows the results of stratified analysis by virus variants. The Hispanic cohort showed less increase among severe and non-severe groups during the pre-Delta period; differences were consistent in both sets during the Delta period, and all minority groups showed increased risk in at least one condition and any syndromic conditions compared to NHW patients during the Omicron period. Supplementary Section S12 provided the results of adding the socioeconomic confounder Area Deprivation Index (ADI) into the analysis, and the differences detected were consistent in both sets of analyses.

Discussion

We examined racial/ethnic variations in long-term consequences of documented SARS-CoV-2 infection across thirteen health institutions in the RECOVER study for 225,723 children and adolescents. After accounting for pre-existing differences and potential confounding biases, we found evidence suggesting differences attributable to COVID-19 for AAPI patients compared to NHW patients. Notably, AAPI patients showed an increase in the risk of developing at least one PASC-related condition and any syndromic conditions, both in severe and non-severe COVID-19 cases. We observed specific symptom differences among minority racial/ethnic groups, despite no evidence indicating disparities in composite outcomes for NHB or Hispanic populations compared to NHW. For example, NHB patients showed a smaller change in skin symptoms compared to NHW patients, consistent with previous findings in adults by Khullar et al16.

The increased cognitive function issues in NHB patients, the greater prevalence of hair loss in Hispanic patients, and the higher risk of POTS or dysautonomia and respiratory symptoms in AAPI patients with non-severe COVID-19 all warrant particular attention in clinical settings. It is worth noting that AAPI is the smallest racial/ethnic group, and for this group, the results show evidence of differences compared to NHW. Therefore, this strengthens the findings of racial/ethnic differences across PASC symptoms and conditions for the other minority groups compared to NHW.

While the observed racial/ethnic differences in RRs are subtle, they remain significant when applied to large populations and may reflect broader systemic issues. For example, the observed differences in PASC can primarily be attributed to disparities in healthcare access and the availability of specialized pediatric care, which vary significantly across different regions of the country, according to a national report27. These variations are not due to biological reasons but are more closely linked to socioeconomic factors, healthcare infrastructure, and regional healthcare policies.

This study examines PASC racial/ethnic differences among pediatric populations, providing valuable insights into the clinical relevance of these disparities. Understanding these differences is crucial for developing targeted interventions to ensure equitable healthcare access and outcomes for all pediatric populations affected by PASC. The increased risk among minority racial/ethnic pediatric patients may necessitate targeted follow-up care and support for the minority racial/ethnic population. Healthcare providers should be aware of the potential for varied PASC presentations across racial/ethnic groups in pediatric populations. This knowledge can inform more targeted screening and follow-up protocols, potentially improving early detection and management of PASC symptoms in different racial/ethnic patient populations.

The implications of this study on a broader public health level are significant. By utilizing data from thirteen institutions, representing approximately 10% of the US pediatric population and encompassing both urban and suburban health systems, this study offers a comprehensive national sample. Public health strategies should prioritize reducing disparities in healthcare access and enhancing the availability of specialized pediatric care across all regions. This will help in better managing and treating PASC in children, ensuring that all affected patients receive the necessary care regardless of their geographic ___location or socioeconomic status. Furthermore, these findings underscore the importance of continuous surveillance, research, and tailored public health policies to address the evolving needs of this patient group.

The method utilized in our study has multiple strengths. First, we used propensity score matching methods instead of standard linear regression models in our adjustment of the confounders, which helped us reduce the non-linear effects of the confounders28. Second, we accounted for the pre-infection racial/ethnic differences in PASC symptoms and conditions. This approach enabled us to quantify racial/ethnic differences attributable to COVID-19 more accurately. The DiD model is particularly powerful in the context of studying racial/ethnic differences, as it helps to isolate the effect of COVID-19 from pre-existing health disparities. By comparing the change in health outcomes before and after COVID-19 across racial/ethnic groups, we can more confidently attribute observed differences to the impact of the virus rather than to the pre-infection racial/ethnic differences.

A key strength of our study is the implementation of NCOs, which are outcomes not expected to show racial/ethnic differences due to COVID-19. This approach enhances the reliability and interpretability of our findings. By analyzing these alongside our primary outcomes, we enable the detection of systematic bias from unmeasured confounding variables. This approach allows us to calibrate results from the DiD model. Importantly, NCOs help mitigate the impact of unmeasured confounding that may change before and after the infection. This is particularly crucial when studying racial/ethnic differences, where many dynamic social and environmental factors may not be captured in our dataset. Furthermore, we carefully addressed potential bias related to differences in follow-up times. To ensure comparability, we standardized the follow-up period for all participants, observing PASC symptoms and conditions over a consistent timeframe of five months (28 to 179 days post-cohort entry). This approach mitigates any bias that might arise from variations in follow-up duration across individuals.

This study has several limitations. First, socioeconomic differences may exacerbate racial/ethnic differences in PASC symptoms and conditions, thereby acting as a mediator in the causal pathway between race/ethnicity and clinical outcomes. Such influences have been suggested as risk factors for acute COVID-19 by Chisolm and colleagues in a RECOVER EHR study23. Future research on PASC outcomes should examine such mediation effects.

Secondly, differences in health-seeking behavior or healthcare access can introduce ascertainment bias29. Limited access to care and associated medical records among certain minority racial/ethnic groups may contribute to potential bias in the observed racial/ethnic differences. Related issues were recently described by Nasir et al. for the ascertainment of PASC symptoms and conditions in adult populations through EHR30. Nevertheless, we tried to address the ascertainment bias by including healthcare utilization factors in the propensity score model, such as the number of inpatient, outpatient, and ED visits during the baseline period.

Thirdly, confounding poses a significant bias threat in observational studies. To address this, we extensively adjusted for potential confounders using a propensity-score-based matching method and DiD analyses. We employed NCOs to reduce the residual bias, such as unmeasured confounder bias. Additionally, EHR data completeness issues may lead to misclassification and loss-to-follow-up bias. Some attempts have been made to mitigate the impacts of these biases31,32,33,34. The analysis used a combined set of patients, but potential bias in racial/ethnic differences may vary between outpatients and inpatients. Addressing these issues can help improve the reliability of evidence generated from these investigations.

Fourthly, asymptomatic cases are less likely to be captured in EHR as they are less likely to be tested, impacting the findings for non-severe COVID-19. This study focused on diagnosed or treated cases of PASC, reflecting children who sought healthcare services. The resulting bias could run in either direction, depending on whether asymptomatic cases are disproportionately missed in minority or NHW groups. This potential misclassification could distort the DiD estimates for the non-severe COVID-19 group, leading to either conservative or inflated results. Future research should aim to capture a more comprehensive spectrum of COVID-19 severity.

Fifthly, our study design, requiring at least one healthcare visit within 28-179 days after the index date, balanced selection bias against loss to follow-up bias35. While this approach may overrepresent frequent healthcare users, especially during 2020-2021, we mitigated this by incorporating diverse testing data to reduce COVID-19 status misclassification. Furthermore, in the crude incidence analysis comparing COVID-19 positive and negative patients, the resulting selection bias likely included sicker patients in the COVID-19 negative group, potentially biasing our results towards the null. Nevertheless, we still observed a higher incidence of PASC symptoms and conditions for COVID-19-positive patients compared to COVID-19-negative patients. Future studies could explore alternative designs or data sources to enhance the external validity of these findings.

In summary, we quantified the racial/ethnic differences in PASC symptoms and conditions and the impact of SARS-CoV-2 infection on these differences. The impact of COVID-19 varied across racial/ethnic groups, severity of acute COVID-19, and different PASC symptoms and conditions.

Methods

Ethics and inclusion

This study constitutes human subject research. Institutional Review Board (IRB) approval was obtained under Biomedical Research Alliance of New York (BRANY) protocol #21-08-508. As part of the BRANY IRB process, the protocol has been reviewed in accordance with the institutional guidelines. BRANY waived the need for patient informed consent and HIPAA authorization.

Data source

This retrospective cohort study is part of the NIH Researching COVID-19 to Enhance Recovery (RECOVER) Initiative (https://recovercovid.org/), which aims to learn about the long-term effects of COVID-19. The data were contributed by thirteen sites. Participating institutions in this study included: Children’s Hospital of Philadelphia, Cincinnati Children’s Hospital Medical Center, Children’s Hospital of Colorado, Ann & Robert H. Lurie Children’s Hospital of Chicago, Nationwide Children’s Hospital, Nemours Children’s Health System (in Delaware and Florida), Duke University, University of Iowa Healthcare, University of Michigan, University of Missouri, Oregon Community Health Information Network (OCHIN), University of California, San Francisco, and Vanderbilt University Medical Center. All analyses were conducted using R version 4.1.2, with statistical significance set at a two-tailed p-value threshold of < 0.05.

Cohort construction

We conducted a retrospective study from March 1, 2020, to October 3, 2022, with at least 6 months of follow-up time. We included patients under the age of 21 who had at least one visit from 18 months before to 7 days before the index date (defined as the baseline period) and at least one encounter between 28 days and 179 days after the index date (defined as the follow-up period). For COVID-19-positive patients, we included patients who had positive PCR, serology, or antigen tests, diagnoses of COVID-19, or diagnoses of PASC (U09.9), which we defined as documented SARS-CoV-2 infection. The index date for these patients was defined as the date of the first documented SARS-CoV-2 infection. We used patient race/ethnicity data included in the EHR and categorized patients into six racial/ethnic groups: NHW, NHB, Hispanic, AAPI, Multiple, and Other/Unknown. Patients in the Multiple category, who comprised less than 3% of the population in any COVID-19 and severity category (Table S5 of Supplementary Section S3), were considered a small sample and excluded from the analysis. The Other/Unknown categories were also excluded due to interpretability issues. The selection of COVID-19-positive patients from real-world data is summarized in Fig. 2.
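The visit-window part of these inclusion criteria can be expressed as a small check. The sketch below is illustrative only: it approximates 18 months as 548 days (an assumption), treats both window bounds as inclusive, and covers only the baseline and follow-up visit requirements, not the age or race/ethnicity criteria. The function and constant names are hypothetical.

```python
from datetime import date

BASELINE_START_DAYS = 548  # assumption: 18 months approximated as 548 days
BASELINE_END_DAYS = 7
FOLLOWUP_START_DAYS = 28
FOLLOWUP_END_DAYS = 179


def meets_visit_criteria(index_date: date, visit_dates: list) -> bool:
    """Check for one baseline visit and one follow-up encounter.

    Baseline period: 18 months to 7 days before the index date.
    Follow-up period: 28 to 179 days after the index date.
    """
    has_baseline = any(
        BASELINE_END_DAYS <= (index_date - v).days <= BASELINE_START_DAYS
        for v in visit_dates
    )
    has_followup = any(
        FOLLOWUP_START_DAYS <= (v - index_date).days <= FOLLOWUP_END_DAYS
        for v in visit_dates
    )
    return has_baseline and has_followup


# Hypothetical patient: one visit 137 days before and one 61 days after.
index = date(2021, 6, 1)
eligible = meets_visit_criteria(index, [date(2021, 1, 15), date(2021, 8, 1)])
print(eligible)
```

A patient whose only visit falls inside the excluded windows, for example two days before the index date, would not satisfy either requirement.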

Defining outcomes

Our definition of PASC symptoms and conditions included 24 symptoms and conditions as shown in Rao et al11., including abdominal pain, abnormal liver enzyme, acute kidney injury, acute respiratory distress syndrome, arrhythmias, cardiovascular signs and symptoms, changes in taste and smell, chest pain, cognitive functions, fatigue and malaise, fever and chills, fluid and electrolyte, generalized pain, hair loss, headache, heart disease, mental health disorders, musculoskeletal pain, myocarditis, myositis, Postural Orthostatic Tachycardia Syndrome (POTS) or dysautonomia, respiratory signs and symptoms, skin symptoms, and thrombophlebitis and thromboembolism. We used validated diagnostic codes (ICD-10-CM) confirmed by board-certified pediatricians, with details of the code sets available in the Supplementary Materials Table S2. Systematic and syndromic conditions related to PASC were grouped by the 24 PASC symptoms and conditions, which are detailed in Table S4.

Patient characteristics

The primary exposure was race/ethnicity, categorized into NHW, NHB, Hispanic, and AAPI. Various patient characteristics were considered as confounders, such as age at the cohort entry date ( < 5, 5–12, 12–21), sex (female, male), cohort entry month (from March 2020 to October 2022), site indicators, obesity (obese, non-obese), a chronic condition indicator as defined by the PMCA (no chronic condition, non-complex chronic condition, and complex chronic condition)26, healthcare visits (inpatient, outpatient, and emergency department visits), medications (0, 1, 2, ≥ 3), negative tests (0, 1, 2, ≥ 3), vaccine doses (0, 1, ≥ 2), and immunization duration during the baseline period (no vaccine, < 4 months, ≥ 4 months). The severity of COVID-19 at the cohort entry date was stratified into the following categories: asymptomatic, mild (symptomatic), moderate (involving moderately severe COVID-19-related conditions like gastroenteritis, dehydration, and pneumonia), and severe (comprising unstable COVID-19-related conditions, ICU admission, or mechanical ventilation)25. We categorized patients exhibiting either asymptomatic or mild symptoms as belonging to the non-severe group, while all other patients were classified as part of the severe group.

Propensity score matching

To quantify the racial/ethnic differences in the PASC symptoms and conditions, we used the RR as the comparative measure. The RR is known to be a collapsible measure, where collapsibility36 means that the measure of association conditional on some factors remains consistent with the marginal measure collapsed over strata37.

To reduce the impact of measured confounders, we used a propensity score matching technique with the covariates detailed in the patient characteristics section. The propensity score was estimated from a logistic regression model that regressed racial/ethnic group on the covariates. We performed this matching separately for each minority racial/ethnic group (NHB, Hispanic, and AAPI) against the NHW group, stratified by severity status. After matching, we assessed the SMD of each covariate between the racial/ethnic groups, with an SMD of 0.1 or less indicating acceptable balance38. The detailed characteristics of the matched cohort are presented in Supplementary Section S5.
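The balance check can be illustrated for a binary covariate. The sketch below assumes the commonly used definition of the standardized mean difference for proportions (difference in proportions divided by the square root of the average of the two binomial variances); the text does not state which formula was used, and the function names are hypothetical.

```python
import math


def smd_binary(p_group, p_ref):
    """Standardized mean difference for a binary covariate.

    p_group and p_ref are the covariate proportions in the two matched groups.
    Assumes the common pooled-variance definition (not confirmed by the text).
    """
    pooled_var = (p_group * (1 - p_group) + p_ref * (1 - p_ref)) / 2
    if pooled_var == 0:
        # Both proportions are 0 or 1: identical groups give 0,
        # completely separated groups give an infinite SMD.
        if p_group == p_ref:
            return 0.0
        return math.copysign(math.inf, p_group - p_ref)
    return (p_group - p_ref) / math.sqrt(pooled_var)


def is_balanced(p_group, p_ref, threshold=0.1):
    """Apply the 0.1 threshold described in the text to the absolute SMD."""
    return abs(smd_binary(p_group, p_ref)) <= threshold
```

For example, proportions of 0.52 and 0.50 give an SMD of about 0.04 and pass the threshold, whereas 0.60 and 0.50 give about 0.20 and do not.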

DiD models

Subsequently, we quantified the differential increase in the prevalence of PASC symptoms and conditions across racial/ethnic groups using the DiD method on the matched cohort. Figure 1a provides a visual representation of the DiD method used to estimate racial/ethnic differences in the increased prevalence of PASC symptoms and conditions related to COVID-19. We first introduce notation. For each minority racial/ethnic group (AAPI, NHB, or Hispanic) matched with the NHW cohort, we use a binary indicator \(R\) to denote the racial/ethnic group (R = 0 for NHW, R = 1 for NHB, Hispanic, or AAPI). We define \(T\) as the time period, where T = 1 represents the post-infection period (28 to 179 days after COVID-19 infection) and T = 0 represents the pre-infection period (28 to 179 days before COVID-19 infection). \(Y\) is a binary indicator denoting the occurrence of a PASC symptom or condition, where Y = 1 indicates that the patient has been diagnosed with the specific PASC symptom or condition, and Y = 0 indicates the absence of such a diagnosis. For each participant, we observed the outcome \(Y\) in both the pre-infection (T = 0) and post-infection (T = 1) periods.

We fitted a Poisson regression model by regressing each PASC symptom or condition on racial/ethnic groups, time period, and the interaction between racial/ethnic groups and time period:

$$\log (E[Y \mid R,T])={\beta }_{0}+{\beta }_{1}R+{\beta }_{2}T+{\beta }_{3}R\times T \tag{1}$$

where log(\(\cdot\)) is the log link function. From Eq. (1), the pre-infection racial/ethnic difference is given by the coefficient β1, shown in the green part of Fig. 1a. The post-infection racial/ethnic difference is given by the sum of the coefficients, β1 + β3. Thus, the coefficient of the interaction term, β3, represents the differential increase in racial/ethnic differences attributable to COVID-19, shown in the blue part of Fig. 1a.
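To make the interpretation of the coefficients concrete, the sketch below recovers them from the four group-by-period prevalences. It relies on the fact that Eq. (1), with binary R and T, their interaction, and no further covariates, is a saturated model, so its fitted means equal the observed cell means. The function name and the example prevalences are hypothetical, and all four prevalences must be positive.

```python
import math


def did_coefficients(prev):
    """Return (b0, b1, b2, b3) of Eq. (1) from observed cell prevalences.

    prev[(r, t)] is the prevalence of the outcome for group r in period t.
    Valid only for the saturated model with no additional covariates.
    """
    b0 = math.log(prev[(0, 0)])            # NHW, pre-infection
    b1 = math.log(prev[(1, 0)]) - b0       # pre-infection log RR
    b2 = math.log(prev[(0, 1)]) - b0       # change over time in NHW
    # Post-infection log RR minus pre-infection log RR.
    b3 = math.log(prev[(1, 1)]) - math.log(prev[(0, 1)]) - b1
    return b0, b1, b2, b3


# Illustrative (made-up) prevalences, not study results.
example = {(0, 0): 0.02, (1, 0): 0.03, (0, 1): 0.04, (1, 1): 0.09}
b0, b1, b2, b3 = did_coefficients(example)
pre_rr = math.exp(b1)        # pre-infection RR
post_rr = math.exp(b1 + b3)  # post-infection RR
did_ratio = math.exp(b3)     # ratio of post- to pre-infection RR
```

In this made-up example the pre-infection RR is 1.5 and the post-infection RR is 2.25, so exp(β3) is 1.5.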

To illustrate how the DiD model removes the bias from unmeasured confounders when the parallel trends assumption holds, Fig. 1b shows the directed acyclic graph (DAG) for the causal relationships39 underlying the DiD model. We use Y0 for the outcome during the pre-infection period, Y1 for the outcome during the post-infection period, and U for the unmeasured confounding variables. For simplicity, we present the DAG for the matched cohort of each minority race/ethnicity-NHW pair after adjusting for measured confounding variables X using the matching method.

Arrows between variables denote causal relationships. For instance, the arrow from \(R\) to \({Y}_{1}\) represents the racial/ethnic difference during the post-infection period, while the arrow from \(R\) to \({Y}_{0}\) represents the difference during the pre-infection period. The DiD model aims to identify the racial/ethnic difference in the outcome change between the post-infection and the pre-infection period, ∆Y (where ∆Y = Y1 − Y0), as shown by the arrow from R to ∆Y.

The key advantage of the DiD model is its ability to adjust for unmeasured confounding variables \(U\) under the parallel trends assumption. This assumption states that the effect of \(U\) remains constant over time, meaning the effect of \(U\) on Y0 and Y1 is equal. In the DAG, this is represented by the path U→ Y0 being equal to U→ Y1.

Under this assumption, the DiD model, by taking the difference between \({Y}_{1}\) and \({Y}_{0}\), effectively blocks the path from \(U\) to \(\varDelta Y\), allowing us to isolate the effect of \(R\) on \(\varDelta Y\) without interference from \(U\). It is worth noting that a model that instead adjusts for the pre-infection outcome \({Y}_{0}\) cannot remove the bias from the unmeasured confounder \(U\). Such a model effectively blocks the path R → Y0 → Y1. Additionally, it introduces new bias through the collider path R → Y0 ← U → Y1. Collider bias occurs when two variables, such as \(R\) and \(U\), influence a common effect, such as \({Y}_{0}\), and controlling for this effect inadvertently creates a spurious association between the variables, leading to biased results. This is why, under the parallel trends assumption, the DiD method is less prone to bias than a model adjusting for the outcome during the pre-infection period.
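A simple linear sketch may help make the cancellation of \(U\) concrete. It is an illustration only: it assumes an additive outcome model with hypothetical coefficients \(\alpha_t\), \(\delta_t\), and \(\gamma_t\), which do not appear in the original analysis (the fitted model uses the log link of Eq. (1)).

$$E[Y_t \mid R, U] = \alpha_t + \delta_t R + \gamma_t U, \quad t \in \{0, 1\}$$

$$E[\varDelta Y \mid R=1] - E[\varDelta Y \mid R=0] = (\delta_1 - \delta_0) + (\gamma_1 - \gamma_0)\,\left(E[U \mid R=1] - E[U \mid R=0]\right)$$

When the effect of \(U\) is constant over time (\(\gamma_1 = \gamma_0\)), the second term vanishes and the contrast reduces to \(\delta_1 - \delta_0\), however \(U\) is distributed across the racial/ethnic groups. When \(\gamma_1 \ne \gamma_0\), the second term remains as residual bias.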

Negative control outcome calibration

While we used propensity score matching to account for the measured confounders and DiD analyses to address pre-infection racial/ethnic differences, the results can still be affected by unmeasured confounding bias when the parallel trends assumption does not hold, that is, when COVID-19 modifies the effect of unmeasured confounding variables. To mitigate such bias, we collected thirty-one NCOs prespecified by pediatric physicians, which are not expected to show racial/ethnic differences due to COVID-19. We fitted the model from Eq. (1) with the outcome replaced by each NCO. The residual bias can thus be estimated as the average of the \({\beta }_{3}\) estimates from the NCO models. Using these NCOs, we calibrated the residual bias from unmeasured and systematic sources.

Figure 1c illustrates this with an expanded DAG. When the parallel trends assumption is violated, a backdoor path from \(U\) to \(\varDelta Y\) can emerge, potentially biasing our estimates. To address this limitation, we incorporate NCOs into our analysis. Analogously to \(Y\), we denote each NCO during the post-infection period by \({W}_{1}\) and during the pre-infection period by \({W}_{0}\). Due to the influence of \(U\), we observe a path from \(U\) to ∆W (where ∆W = W1 − W0). We then assume that the effect of \(U\) on ∆W is equivalent to its effect on ∆Y. Under this assumption, we can effectively block the path from U to ∆Y − ∆W by subtracting ∆W from ∆Y. This allows us to estimate the racial/ethnic difference in PASC symptoms and conditions with greater accuracy and robustness to unmeasured confounding. We conducted a two-sided Wald test to evaluate whether the racial/ethnic differences in PASC symptoms and conditions were equal to zero and reported the corresponding p-values. No adjustments were made for multiple comparisons.
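The calibration step can be sketched as follows: the residual bias is taken as the mean of the NCO interaction coefficients and subtracted from the interaction coefficient of the PASC outcome, followed by a two-sided Wald test. This is a simplified illustration with hypothetical names and inputs. In particular, it assumes the standard error of the calibrated estimate equals that of the uncalibrated estimate, ignoring uncertainty in the estimated bias; the text does not specify how the calibrated standard error was obtained.

```python
import math


def calibrate_with_ncos(beta3_outcome, se_outcome, beta3_ncos):
    """Subtract the mean NCO interaction coefficient from the outcome's beta3.

    Returns (calibrated_beta3, estimated_bias, two_sided_p_value).
    Assumption of this sketch: the standard error is left unchanged by
    calibration, i.e. uncertainty in the bias estimate is ignored.
    """
    bias = sum(beta3_ncos) / len(beta3_ncos)
    calibrated = beta3_outcome - bias
    z = calibrated / se_outcome
    # Two-sided p-value under a standard normal reference distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return calibrated, bias, p_value


# Illustrative (made-up) inputs, not study results.
calibrated, bias, p_value = calibrate_with_ncos(0.5, 0.2, [0.1, 0.0, 0.2])
```

Exponentiating the calibrated coefficient gives the calibrated ratio on the RR scale.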

Sensitivity analysis

We conducted a series of sensitivity analyses to examine the robustness of our findings. First, to evaluate how different statistical methods might influence the analytical results, we used an alternative approach, standard Poisson regression analyses, with RR as the comparative measure. Specifically, we considered the incidence of PASC symptoms and conditions as outcomes, while controlling for the same confounders that were used in the matching process within the DiD analysis. The p-values are reported from two-sided Wald tests. Second, we performed crude incidence analyses to compare PASC symptoms and conditions between COVID-19-positive and COVID-19-negative groups, assessing whether the prespecified outcomes accurately reflect PASC. We report p-values from two-sided two-proportion Z-tests. In addition, to validate racial/ethnic differences attributed to COVID-19, we conducted DiD analyses with NCO calibration on COVID-19-negative groups. Third, we conducted analyses for COVID-19 patients identified only by positive SARS-CoV-2 PCR or antigen tests, because the recorded date of COVID-19 diagnosis may not accurately reflect the actual infection date. Fourth, we conducted analyses excluding patients whose index dates fell within the first wave of COVID-19 (March to May 2020) due to limited SARS-CoV-2 testing availability during this period. Additionally, our sensitivity analysis featured stratification by a set of age group strata (< 5, 5–11, 12–20), differing from the ones previously specified, and by estimated time frames corresponding to dominant virus variants (pre-Delta, Delta, Omicron). Finally, we conducted a sensitivity analysis including ADI as a measured confounder, which excluded patients who did not have this information.
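The two-sided two-proportion Z-test used in the crude incidence comparison can be sketched as below. This is the textbook pooled-variance form, which is assumed here because the text does not state the exact variant; the function name and the counts are hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided two-proportion Z-test with a pooled standard error.

    x1, x2 are event counts and n1, n2 are group sizes.
    Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    variance = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    if variance == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p1 - p2) / math.sqrt(variance)
    # Two-sided p-value under a standard normal reference distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative (made-up) counts, not study results:
# 60 of 1000 in one group versus 40 of 1000 in the other.
z, p_value = two_proportion_z_test(60, 1000, 40, 1000)
```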

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.