Introduction

In real-world settings, observational studies are used to infer causal effects of how well vaccines protect against infection1,2. Accurate estimates of vaccine effectiveness (VE) are important, as they are used to evaluate the impact of vaccines at a population level and can affect vaccine confidence3. Most VE observational studies use population-level data collected through passive surveillance systems that rely on tests conducted to screen for or diagnose infections. However, testing for a given infection depends on many factors: variability in recommendations across populations and risk factors for testing and treatment; access and uptake shaped in part by healthcare engagement; underlying burden of alternate etiologies of symptoms that drive testing. When some of these factors also influence vaccination, they can create unequal rates of testing by vaccination status, introducing bias into VE estimates such as residual confounding (due to unmeasured confounders), or selection bias (when restricting study samples to those tested)4,5,6,7,8. Concerns surrounding bias in VE estimates due to unequal testing by vaccination status were brought to the forefront during the SARS-CoV-2 pandemic, based on empirical data that revealed associations between vaccination status and testing9,10. Quantifying the direction and magnitude of bias in VE estimates due to unequal testing by vaccination, and the conditions under which this bias is most influential, could inform the interpretation of VE estimates and study design to reduce bias.

Two observational study designs commonly used to estimate VE are the retrospective cohort (e.g., refs. 11,12,13,14,15) and retrospective test-negative designs (e.g., refs. 16,17,18,19,20). Cohorts use vaccination status to assign exposure and estimate the relative risks of infection (between the vaccinated and unvaccinated cohorts) that may occur over a given sampling period. Classification of the outcome (infected vs. not infected) depends on testing for infection. Therefore, unmeasured confounders that lead to unequal testing by vaccination status can cause residual confounding in VE estimates from the cohort design21,22,23.

Test-negative designs restrict the study sample to individuals who received a test for the infection in an attempt to eliminate unmeasured confounding24,25. At a given point in the epidemic (sampling period), cases (individuals who tested positive) and controls (individuals who tested negative) are compared to determine if the distribution of the exposure (vaccination) differs between cases and controls, using odds ratios2. However, as described by Westreich and Hudgens6 and Sullivan, Tchetgen Tchetgen, and Cowling7 through a directed acyclic graph (DAG), restricting the study sample to tested individuals may introduce a selection bias via selecting on a collider (also referred to as collider bias) in a situation where the exposure (vaccination) and outcome (infection) both independently affect the likelihood of being tested. Selecting for testing in this scenario distorts the association between vaccination and infection, thereby creating biased VE estimates6,7.

Studies often focus on symptomatic infection as the outcome, estimating the vaccine’s protection against symptomatic infection (hereafter referred to as symptomatic VE) (e.g., refs. 16, 26, 27). When measuring symptomatic VE, these studies restrict their sample to only those individuals who exhibit symptoms and have been tested. Selecting for symptomatic cases may appear to mitigate the collider bias caused by selecting on testing by blocking the association between infection and testing. However, we posit that selecting on both testing and being symptomatic may induce artificial associations (e.g., between healthcare engagement and infection), allowing a bias to persist.

Various studies have qualitatively described the mechanisms by which unequal testing by vaccination could lead to residual confounding and/or selection bias6,7,8. However, fewer studies have measured the direction and magnitude of bias in VE estimates due to unequal testing by vaccination in cohort vs. test-negative designs (but see Suah et al.28); and, importantly, the epidemic and immunological properties under which bias in VE estimates are amplified or dampened in the presence of unequal testing.

Epidemic and immunological properties have the potential to shape biases in VE estimates over time and across epidemics. Epidemic properties such as the force of infection (incidence per susceptible) and the proportion of the population that is susceptible due to infection-acquired immunity can vary over the course of an epidemic and thus can influence the degree of bias in VE estimates over time. Epidemic and immunological properties that determine the overall size and length of an epidemic may also vary with changing contexts and evolving pathogens, potentially shaping biases across epidemics. One epidemic condition that can vary is the underlying epidemic potential (probability of transmission), which could differ depending on contact rates and/or biological properties of the virus (e.g., increased transmissibility of SARS-CoV-2 variants29,30). Immunological properties such as the biological efficacy of vaccines against susceptibility (individual-level protection against acquiring infection31) and/or against infectiousness (individual-level reduction in transmitting virus if infected31) can also vary across epidemics, as seen with the increased immune escape of SARS-CoV-2 variants32. As such, epidemic and immunological properties that determine the overall size and length of an epidemic may moderate the magnitude of bias observed in a given epidemic.

In this study, we developed an agent-based network model, using SARS-CoV-2 as an example, to examine the magnitude of bias in symptomatic VE estimates due to unequal testing by vaccination status across retrospective cohort and test-negative designs. Here, our causal relationship of interest is the effect of vaccination on symptomatic SARS-CoV-2 infection (Fig. 1). First, using our simulated data, we compared the magnitude of this bias between the two study designs at the same sampling period and then examined how it varied across studies with different sampling periods within each study design (Objective 1). Next, we assessed the extent to which the magnitude of bias was moderated by immunological properties (vaccine efficacy against susceptibility and vaccine efficacy against infectiousness) (Objective 2) and by the underlying epidemic potential (probability of transmission) (Objective 3).

Fig. 1: Directed acyclic graphs (DAGs) for estimating vaccine effectiveness against symptomatic SARS-CoV-2 infection across two study designs.
figure 1

DAGs are shown for the retrospective cohort design using relative risk (a) and the retrospective test-negative design using an odds ratio (b). The DAGs depict causal relationships (arrows) and induced associations (dashed lines). Because the vaccine does not decrease the likelihood of developing symptoms once infected, effectiveness against symptomatic infection is equivalent to effectiveness against infection (yellow arrow). Causal relationships between healthcare engagement and vaccination (i), and between healthcare engagement and testing for SARS-CoV-2 (ii) induce an association between vaccination and testing for SARS-CoV-2 (iii) and create a biased relationship between vaccination and testing positive for SARS-CoV-2 in the cohort design (depicted in (a)). In the test-negative design (depicted in (b)), conditioning on testing induces an association between healthcare engagement and being symptomatic (iv). Conditioning additionally on being symptomatic induces an association between healthcare engagement and SARS-CoV-2 infection (v). This induced relationship (v), along with the relationship between healthcare engagement and vaccination (i), introduces confounding by healthcare engagement in the relationship between vaccination and SARS-CoV-2 infection.

Results

Comparison of the magnitude of bias from testing differences across study designs and sampling periods

Using an agent-based network model and our default parameters (Fig. 2; Table 1), we simulated three testing scenarios that varied the magnitude of the relationship between testing and vaccination (“iii” in Fig. 1): an equal testing scenario (equal probability of testing by vaccination status); a moderately unequal testing scenario (vaccinated with 1.76 times higher testing); and a highly unequal testing scenario (vaccinated with 2.36 times higher testing). We assumed testing was limited to individuals with symptoms from SARS-CoV-2 infection or with COVID-like symptoms from alternate etiologies. We then measured the magnitude of bias in VE estimates—specifically in estimates derived from the cohort design (VERR) and the test-negative design (VEOR)—and compared them across study designs and across studies with different sampling periods. We ran the three testing scenarios using both a higher and lower efficacy against susceptibility (0.55 and 0.1), which were equivalent to higher and lower true symptomatic VE (see “Methods” section for details). We also tested the sensitivity of our results to variations in test sensitivity, the time since the start of the study (i.e., calendar time in observational studies), and, for the test-negative design, to differences in the prevalence of COVID-like symptoms from alternate etiologies.

Fig. 2: Example of the agent-based network model used to generate SARS-CoV-2 transmission dynamics.
figure 2

Nodes represent individuals, either vaccinated (blue) or unvaccinated (pink); edges represent contacts through which respiratory transmission can occur. Each node has individual-level attributes, including SARS-CoV-2 infection status (Susceptible [never infected], Exposed, Infectious, Recovered [previously infected]); vaccination status (vaccinated, unvaccinated); level of healthcare engagement (low, high); testing status (untested, tested positive for SARS-CoV-2, tested negative for SARS-CoV-2); symptom status (symptomatic, asymptomatic/no symptoms); and status with respect to alternate etiologies of COVID-like symptoms (exposed or unexposed).

Table 1 Parameter values for agent-based model, including default values, reference(s) for the parameter values and the range of sensitivity values for those parameters of interest

Under equal testing, the cohort design produced unbiased VERR, which remained stable over the successive sampling periods (Fig. 3a, b). Under equal testing, the test-negative design produced unbiased VEOR for studies limited to early sampling periods (e.g., day 30), which captured only the beginning of the epidemic (similar to VERR) (Fig. 3e, f). As the epidemic progressed, cumulative symptomatic infections increased, especially among unvaccinated individuals (Fig. 3c, d). Later in the epidemic, this disproportionate rise among the unvaccinated led to a disproportionate decrease in the cumulative proportion of unvaccinated individuals exhibiting COVID-like symptoms from alternate etiologies who had no prior symptomatic SARS-CoV-2 infection (Supplementary Fig. 1). Consequently, the test-negative design overestimated true symptomatic VE for longer studies that captured later stages of the epidemic.

Fig. 3: Symptomatic vaccine effectiveness (symptomatic VE) estimates across testing scenarios, study designs, and sampling periods, and cumulative proportions of symptomatic infections by vaccination status.
figure 3

Symptomatic VE estimates derived from the retrospective cohort design (VERR, (a) and (b)) and the retrospective test-negative design (VEOR, (e) and (f)) are shown, along with the cumulative proportions of symptomatic infections by vaccination status ((c) and (d)). Panels show results under two levels of vaccine efficacy against susceptibility: higher (0.55) and lower (0.1), which represent the true symptomatic VE (grey dashed line). VERR and VEOR were calculated across three testing scenarios (green): equal testing by vaccination status, moderately unequal testing (vaccinated with 1.76 times higher testing), and highly unequal testing (vaccinated with 2.36 times higher testing). Each time point corresponds to symptomatic VE estimates from studies conducted at that time during the epidemic. The cumulative proportions of symptomatic infection represent the cumulative level of symptomatic infection for unvaccinated (yellow) and vaccinated (dark gold). Central lines depict the median, and shaded regions indicate the 2.5th–97.5th percentile range across n = 100 epidemic realisations.

Under unequal testing, the cohort design produced VERR that underestimated the true symptomatic VE (Fig. 3a, b). Over the course of the epidemic (and thus for studies that captured longer sampling periods), the magnitude of the underestimate decreased and then stabilised (Fig. 3a, b). This decrease coincided with unvaccinated individuals being tested later than vaccinated individuals (Supplementary 1: Timing of Testing; Supplementary Table 1). Stabilisation of the pattern occurred at the end of the epidemic when there were no more individuals with SARS-CoV-2 available for testing (Fig. 3c, d). In sensitivity analyses, when we accounted for the influence of time using a Cox model, the temporal pattern of the bias due to unequal testing was substantially reduced, especially with lower vaccine efficacy (0.1) (Supplementary Fig. 2; Supplementary Tables 23). The magnitude of the underestimates was larger in the highly unequal testing scenario than in the moderately unequal testing scenario (Fig. 3a, b). Varying the test sensitivity had no effect on the magnitude of the bias (Supplementary Fig. 3).

Under unequal testing, the test-negative design also produced VEOR that underestimated the true symptomatic VE, with larger biases occurring in the highly unequal testing scenario (Fig. 3e, f). However, the test-negative design produced smaller biases than the cohort design, with the difference in the magnitude of bias between study designs most pronounced when true symptomatic VE (i.e., efficacy against susceptibility) was low.

In contrast to the cohort design, bias in the test-negative design increased for studies with longer sampling periods that included later epidemic stages, especially when true symptomatic VE was low. A substantial contributor to this pattern was the greater differential outcome misclassification by vaccination status, which accumulated over the course of the epidemic (Supplementary Fig. 4). This misclassification, driven by missed detection, increased with longer sampling periods because, as time progressed, individuals had more opportunities to both (1) become infected with symptomatic SARS-CoV-2 and be undetected (i.e., missed) and (2) develop COVID-like symptoms from alternate etiologies—sometimes more than once—and be erroneously classified as “negative” when they were undetected cases (Supplementary 1: Missed Detection Misclassification Example; Supplementary Fig. 5). In sensitivity analyses, adjusting for time since the start of a study using a logistic model reduced the magnitude of bias in studies with shorter sampling periods, particularly with lower vaccine efficacy (0.1); however, the overall pattern of increasing bias with longer sampling periods remained unchanged (Supplementary Fig. 6; Supplementary Tables 45).

Varying test sensitivity had a minor effect on the magnitude of bias due to unequal testing for the test-negative design, with the size of this effect slightly increasing when the study sampling period extended to the epidemic peak (Supplementary Fig. 7). Overall, the impact of test sensitivity remained consistently small (|Δ| < 0.045). The direction of the test sensitivity’s effect on the magnitude of bias was inconsistent and depended on the level of efficacy against susceptibility. At higher efficacy (0.55), decreasing test sensitivity increased the magnitude of bias, mirroring patterns observed across VE estimates when testing was equal (Supplementary Fig. 7a, c; Supplementary Fig. 8); at lower efficacy (0.1), decreasing test sensitivity reduced the magnitude of the bias (Supplementary Fig. 7b, d). This reduction occurred because, at lower efficacy, the false negatives from decreased test sensitivity among the vaccinated made up a larger proportion of the vaccinated individuals classified as “negative” than was the case for unvaccinated individuals. This pattern reversed when efficacy was higher or when testing was equal (see Supplementary 1: Effects of Testing Sensitivity for details).

In the test-negative design, the magnitude of bias was also amplified in the moderately and highly unequal testing scenarios when there was a higher prevalence of COVID-like symptoms from alternate etiologies (Supplementary Fig. 9). This increase in the magnitude of bias occurred because, as the prevalence of COVID-like symptoms increased, unvaccinated individuals experienced a higher proportional increase in the number of individuals being tested and classified as “negative” than did vaccinated individuals (see Supplementary 1: Proportional Increase Calculations by Vaccination Status with COVID-like Symptoms due to Alternate Etiologies). The reason for this higher proportional increase is that, at higher prevalence of COVID-like symptoms from alternate etiologies, more individuals became symptomatic during the sampling period—with some experiencing repeated episodes—providing unvaccinated individuals who were unlikely to be tested more opportunities to test and be classified as “negative”. Varying the prevalence of COVID-like symptoms from alternate etiologies had no influence on the magnitude of bias in estimates derived from the cohort design.

Influence of immunological properties on the magnitude of bias from testing differences

To determine whether varying immunological properties could shape the magnitude of bias due to testing differences, we repeated the above analyses but now varied the levels of efficacy against susceptibility (0.1 to 0.9) and efficacy against infectiousness (0.1 to 0.9). Here, we opted to compare studies conducted at a single time point (the point of highest positive SARS-CoV-2 epidemic growth [i.e., the first inflection point]) so that VE estimates could be easily compared across levels of vaccine efficacies.

We found that efficacy against susceptibility had a large influence on the magnitude of bias in both study designs (Fig. 4). Both designs were also capable of producing negative estimates of the true symptomatic VE when efficacy was low, even with moderately unequal testing (Fig. 3e, f). The moderating effect of efficacy against susceptibility was more pronounced as testing by vaccination status changed from moderately to highly unequal.

Fig. 4: Extent of bias in symptomatic vaccine effectiveness (symptomatic VE) estimates and the epidemic size by testing scenarios and levels of vaccine efficacy against susceptibility (VEsusc).
figure 4

The extent of bias—how much symptomatic VE estimates diverged from the vaccine’s “true” protection against symptomatic infection—is shown for estimates derived from the retrospective cohort design (VERR; (a)) and the retrospective test-negative design (VEOR; (c)). VERR and VEOR were calculated from studies conducted at the point of highest positive epidemic growth for each epidemic (i.e., the first inflection point), across three testing scenarios (green): equal testing by vaccination status, moderately unequal testing (vaccinated with 1.76 times higher testing), and highly unequal testing (vaccinated with 2.36 times higher testing). The epidemic size—the cumulative proportions of individuals with symptomatic SARS-CoV-2 who were unvaccinated (yellow) and vaccinated (dark gold) (b)—was also recorded at that time point. Simulated epidemics with fewer than 0.1% cumulative infections were removed. Each boxplot shows the median (centre line), the interquartile range (IQR; the box spans the 25th and 75th percentiles), and whiskers that extend to the most extreme data points within 1.5 times the IQR. The violin plot outlines illustrate the kernel probability density. Boxplots and violin plots are based on n = 100 epidemic realisations per testing scenario and level of VEsusc, except when VEsusc = 0.9 (n = 97). The median time to the point of highest positive epidemic growth by the level of VEsusc was t = 50 for 0.1, t = 58 for 0.3, t = 68 for 0.5, t = 92 for 0.7, and t = 185 for 0.9.

Efficacy against infectiousness also had a moderating effect on the magnitude of underestimates caused by unequal testing in both study designs (Fig. 5). The moderating effect was less pronounced than that of efficacy against susceptibility. The moderating effect of efficacy against infectiousness was present across both higher and lower levels of efficacy against susceptibility.

Fig. 5: Extent of bias in symptomatic vaccine effectiveness (symptomatic VE) estimates and the epidemic size by testing scenarios and levels of vaccine efficacy against infectiousness (VEinfect) given higher and lower vaccine efficacy against susceptibility (VEsusc; 0.55 and 0.1).
figure 5

The extent of bias—how much symptomatic VE estimates diverged from the vaccine’s “true” protection against symptomatic infection—is shown for estimates derived from the retrospective cohort design (VERR) ((a) and (b)) and the retrospective test-negative design (VEOR) ((e) and (f)). VERR and VEOR were calculated from studies conducted at the point of highest positive epidemic growth for each epidemic (i.e., the first inflection point), across three testing scenarios (green): equal testing by vaccination status, moderately unequal testing (vaccinated with 1.76 times higher testing), and highly unequal testing (vaccinated with 2.36 times higher testing). The epidemic size—the cumulative proportions of individuals with symptomatic SARS-CoV-2 who were unvaccinated (yellow) and vaccinated (dark gold) ((c) and (d))—was also recorded at that time point. Simulated epidemics with less than 0.1% cumulative infections were removed. Each boxplot shows the median (centre line), the interquartile range (IQR; the box spans the 25th and 75th percentiles), and whiskers that extend to the most extreme data points within 1.5 times the IQR. The violin plot outlines illustrate the kernel probability density. Boxplots and violin plots are based on n = 100 epidemic realisations per testing scenario and level of VEinfect, except when VEinfect = 0.9 (n = 78 when VEsusc is higher [0.55]; n = 91 when VEsusc is lower [0.1]). When VEsusc was higher (0.55), the median time to the point of highest positive epidemic growth across VEinfect was t = 69 for 0.1, t = 79 for 0.3, t = 93 for 0.5, t = 123 for 0.7, and t = 290 for 0.9; when VEsusc was lower (0.1), the median time was t = 48 for 0.1, t = 54 for 0.3, t = 64 for 0.5, t = 85 for 0.7, and t = 167 for 0.9.

Under unequal testing, the moderating effect of each mechanism of efficacy (efficacy against susceptibility, efficacy against infectiousness) also varied by study design. With differences in efficacy against susceptibility, the moderating effect was more pronounced with the cohort design than the test-negative design (noting also that the absolute magnitude of the bias was much smaller with the test-negative design) (Fig. 4a, c). With differences in efficacy against infectiousness, the moderating effect was more pronounced with the test-negative design than the cohort design and was most evident under scenarios of high efficacy against infectiousness (Fig. 5a, b, e, f).

Efficacy against susceptibility had a larger influence than efficacy against infectiousness and a different influence on the magnitude of bias by study design because of the differential effect that these efficacies had on the cumulative numbers of symptomatic infections by vaccination status—the main components of our VE calculations. Although higher efficacy against susceptibility and higher efficacy against infectiousness resulted in longer epidemic periods with smaller epidemic sizes (Supplementary Figs. 1012), changes in efficacy against susceptibility led to differential reductions in the cumulative proportions of symptomatic infections by vaccination status (Fig. 4b; Supplementary Fig. 10), while changes in efficacy against infectiousness led to more proportionate reductions by vaccination status (Fig. 5c, d; Supplementary Figs. 1112). Specifically, increasing efficacy against susceptibility caused symptomatic infections among the vaccinated to shrink more than among the unvaccinated, such that the absolute difference between these two groups (vaccinated versus unvaccinated) tended to monotonically increase until reaching very high levels of efficacy (Fig. 4b; Supplementary Fig. 10). The absolute difference was amplified when testing was more unequal. In contrast, changes in efficacy against infectiousness caused a similar reduction in the cumulative proportions of symptomatic infections by vaccination status as vaccinated and unvaccinated individuals both benefited from the reduction of infectiousness provided by those who were vaccinated (Fig. 5c, d; Supplementary Figs. 1112). The consequence of this on the calculations of relative risk and odds ratios explains the differences in the moderating effects by study design and are detailed in the Supplementary Information with examples (Supplementary 1: Moderating Effect Calculations; Supplementary Figs. 1316).

Influence of underlying epidemic potential (probability of transmission) on the magnitude of bias from testing differences

To determine whether changing epidemic properties could shape the magnitude of bias due to testing differences, we varied the underlying epidemic potential—the probability of transmission (0.033 to 0.1133,34,35,36)—and assessed the effect on the extent of bias in VE estimates. We again compared studies conducted at a single time point (the highest SARS-CoV-2 epidemic growth point). Similar to the effect of efficacy against infectiousness, decreasing the probability of transmission led to a similar, proportionate reduction in the cumulative proportions of symptomatic infections by vaccination status (Fig. 6c, d; Supplementary Figs. 1718). As such, the moderating influence of the probability of transmission demonstrated a similar pattern to that of efficacy against infectiousness. In the context of high transmission, changes in transmission had a negligible influence on the magnitude of bias produced in both study designs (similar to the pattern observed when efficacy against infectiousness was low) (Fig. 6a, b, e, f). However, in the context of low transmission, small changes in transmission led to larger changes in the magnitude of bias, but only in the test-negative design (similar to the pattern observed when efficacy against infectiousness was high) (Fig. 6e, f).

Fig. 6: Extent of bias in symptomatic vaccine effectiveness (symptomatic VE) estimates and the epidemic size by testing scenarios and probability of transmission (Ptrans) for higher and lower vaccine efficacy against susceptibility (VEsusc; 0.55 and 0.1).
figure 6

The extent of bias—how much symptomatic VE estimates diverged from the vaccine’s “true” protection against symptomatic infection—is shown for estimates derived from the retrospective cohort design (VERR; (a) and (b)) and the retrospective test-negative design (VEOR; (e) and (f)). VERR and VEOR were calculated from studies conducted at the point of highest positive epidemic growth for each epidemic (i.e., the first inflection point), across three testing scenarios (green): equal testing by vaccination status, moderately unequal testing (vaccinated with 1.76 times higher testing), and highly unequal testing (vaccinated with 2.36 times higher testing). The epidemic size—the cumulative proportions of individuals with symptomatic SARS-CoV-2 who were unvaccinated (yellow) and vaccinated (dark gold) ((c) and (d))—was also recorded at that time point. Simulated epidemics with less than 0.1% cumulative infections were removed. If the extent of bias could not be calculated, estimates were replaced with an asterisk. Each boxplot shows the median (centre line), the interquartile range (IQR; the box spans the 25th and 75th percentiles), and whiskers that extend to the most extreme data points within 1.5 times the IQR. The violin plot outlines illustrate the kernel probability density. Boxplots and violin plots are based on n = 100 epidemic realisations per testing scenario and level of Ptrans. When VEsusc was higher (0.55), the median time to the point of highest positive epidemic growth across Ptrans was t = 202 for 0.05, t = 108 for 0.07, t = 82 for 0.09, and t = 67 for 0.11; when VEsusc was lower (0.1), the median time was t = 224 for 0.03, t = 90 for 0.05, t = 65 for 0.07, t = 54 for 0.09, and t = 47 for 0.11.

The difference in the direction and magnitude of bias by study design occurred because of how the ratio of the numerators—the cumulative number of symptomatic SARS-CoV-2 cases in vaccinated versus unvaccinated individuals—changed with the probability of transmission in the cohort design (i.e., in the relative risk), and how both the ratio of cumulative symptomatic cases and the ratio of cumulative symptomatic controls in vaccinated versus unvaccinated individuals (i.e., the numerators and denominators, respectively) changed in the test-negative design (i.e., in the odds ratio) (Supplementary Figs. 1920). These patterns again are similar to those observed with efficacy against infectiousness. A detailed explanation and examples illustrating how these calculations vary with the probability of transmission and differ by study design are provided in the Supplementary Information (Supplementary 1: Moderating Effect Calculations; Supplementary Figs. 1920).

Discussion

Using simulation modelling and SARS-CoV-2 as the motivating example, we examined the magnitude of bias in symptomatic VE due to unequal testing by vaccination status and explored how epidemic and immunological properties could shape its magnitude across two commonly adopted observational VE study designs. Similar to Suah et al.28, we found that the presence of unequal testing by vaccination—where unvaccinated individuals tested less than vaccinated individuals—resulted in an underestimation of the true symptomatic VE in both the cohort and test-negative designs, with greater testing differences resulting in larger underestimates. As expected, the cohort design consistently produced larger underestimates (larger biases) than the test-negative design. The largest bias for both study designs occurred with low efficacy against susceptibility, but the moderating effect of efficacy against susceptibility on bias was more pronounced in the cohort design compared with the test-negative design. In contrast, the magnitude of bias due to unequal testing in the test-negative design was more likely than that in the cohort design to be influenced by the study’s selected sampling period, the efficacy against infectiousness, and the probability of transmission.

Our findings confirm and build on previous insights on the mechanisms by which unequal testing and healthcare engagement can create biased estimates of VE in cohort and test-negative designs6,7,8,21,22,23. First, our work expands on previous qualitative studies that showcased the potential for residual confounding and selection bias in VE estimates6,7,8. Specifically, we illustrated how selection bias in test-negative designs can play an important role in affecting estimates of true symptomatic VE—although the magnitude of the bias is likely to be small. Second, our work provides a detailed comparison of the magnitude of bias found across the two most commonly adopted study designs—the retrospective cohort and test-negative design. Our findings add support to the use of the test-negative design over that of the cohort design in estimating VE in the presence of unequal testing by vaccination status24. We further identified important epidemic and immunological properties that should be considered when interpreting VE estimates from both study designs: efficacy against susceptibility, efficacy against infectiousness, and the probability of transmission.

Efficacy against susceptibility had the largest influence on the magnitude of bias in VE estimates, with a more pronounced effect on estimates from the cohort design. Other studies have also found that efficacy against susceptibility can moderate the magnitude of bias produced by other mechanisms of bias, such as heterogeneity in contact levels by vaccination status37 and confounding bias due to correlated case and control vaccination behaviours2. The key implication from these findings is that as a virus undergoes genetic adaptations leading to immune escape, and/or waning against susceptibility occurs, the risk of larger biases in VE estimates increases. This is likely why several studies found estimates of negative VE (suggesting that the vaccine was increasing symptomatic infection) during the emergence of the Omicron variant of SARS-CoV-2 (e.g.,26,27)—a variant known to have higher immune escape38.

There are three important implications stemming from the influence of the study’s selected sampling period, efficacy against infectiousness, and the epidemic potential (probability of transmission)—all of which were more pronounced with the test-negative design than with the cohort design. First, in the context of an emerging and evolving outbreak, VE studies are routinely conducted over successive time periods. Thus, it becomes increasingly important in the context of unequal testing to repeat VE studies and consider the timing of sampling for interpretation of VE estimates. Early in the epidemic, a test-negative design offers a large advantage in terms of minimising the magnitude of bias due to unequal testing, but it calls for greater attention as the epidemic progresses. Even with equal testing, continued caution may still be warranted, as VE estimates can become overestimated when the epidemic is large and the study’s selected sampling period is longer and includes later epidemic stages. This overestimation occurs because, as the epidemic progresses, the pool of individuals who have never been infected with symptomatic SARS-CoV-2 shrinks—especially among the unvaccinated—leading to a control group that no longer accurately reflects the population’s vaccination distribution.

Second, although changes in efficacy against infectiousness play an overall smaller role in moderating the magnitude of bias from unequal testing compared to efficacy against susceptibility, their effects can still be prominent when using the test-negative design. Efficacy against infectiousness is therefore important to consider in the context of the test-negative design as efficacy against infectiousness can change as vaccines are updated and as viruses, such as influenza, develop immune escape that alters their ability to transmit39.

Third, although more pronounced in the context of low epidemic potential, the finding that changes in epidemic potential can moderate the magnitude of bias from unequal testing underscores a need for caution when comparing and interpreting VE estimates across different contexts. Respiratory viruses can have seasonal trends40,41 and, in cases such as SARS-CoV-2 and respiratory syncytial virus, can evolve to have different transmissibility32,42. This means that the epidemic potential of respiratory viruses—and potentially the magnitude of bias due to testing differences—could vary across seasons, regions, and time periods. Similarly, the prevalence of COVID-like symptoms from alternate etiologies, which also influenced the magnitude of bias in symptomatic VE estimates, may also exhibit distinct geographic, seasonal, and yearly patterns that differ from those of the infection of interest. Taken together, these considerations highlight the challenge of ensuring meaningful comparisons of symptomatic VE estimates and the potential importance of accounting for seasonal, geographic and temporal variability in these comparisons.

Taken together, when pooling or synthesising VE estimates, future work is needed to account for sources of heterogeneity that affect the magnitude of potential biases to strengthen the robustness of evidence synthesis. In the meantime, careful inclusion of nuanced DAGs (across study designs) and cautious and transparent interpretation to communicate the potential benefits of vaccination are essential. Good practices also include anticipating and recognising that observed VE may represent biased VE estimates in the presence of unequal testing by vaccination status. These practices are important given that addressing unequal testing remains challenging. Electronic health records in national databases typically have information on exposures and health outcomes, but contain limited data on confounders43. We therefore echo the sentiments of Lewnard et al.44 cautioning that the test-negative design—along with other types of study designs—should not be naively applied to testing datasets collected through administrative or public health surveillance data sources. An area of future methodological work includes using externally collected data on testing by vaccination status from representative surveys (e.g., refs. 9, 10) to conduct quantitative bias analysis, including simulation-based bias analysis45,46.

Our findings also have utility beyond VE studies as they can offer insights into how other forms of unequal testing may impact estimates of protective effects, such as natural immunity. Testing differences by prior infection status have been observed47. Had we replaced vaccination with prior infection in our study and instead estimated the protection of natural immunity, we would expect that unequal testing by prior infection would lead to biased estimates. This expectation aligns with other studies that have noted the potential for such unequal testing to compromise protective effect estimates48,49. Given this, the good practices and future methodological work for interpreting and addressing potential bias from unequal testing in VE estimates can also be applied to estimates of other protective effects, such as natural immunity, that may be affected by testing differences.

Our simulation study has several limitations. Chief among them is our assumption that vaccination coverage was static. Rolling out a vaccine over time could potentially reduce the magnitude of bias due to unequal testing if, for example, individuals who have high healthcare engagement (and hence are likely to be tested) also wanted to be vaccinated but were not able to immediately receive one. However, this effect could also be reversed if interventions initially reached those with the highest healthcare engagement first before being distributed to those with lower healthcare engagement. This pattern could exacerbate unequal testing by vaccination early in the epidemic before stabilising as the vaccine is distributed to those with lower healthcare engagement.

Second is our assumption that individuals can only be infected with SARS-CoV-2 once. The presence of prior infections can contribute to underestimating symptomatic VE50. Therefore, including prior undetected infections before the start of each study’s sampling period may have further exacerbated the underestimation of VE due to unequal testing by vaccination status. Allowing for multiple infections throughout the sampling period and estimating hybrid immunity could also have underestimated this protective effect if testing was unequal and unvaccinated individuals—with or without natural immunity—remained a reference group. The magnitude of this bias would, however, depend on how testing varied by vaccination status, prior infection status, and the interaction between the two.

Third, we had assumed that for those infected with SARS-CoV-2, the severity and duration of symptoms did not vary by vaccination status. Vaccination for SARS-CoV-2 has been shown to reduce both the severity of symptoms and symptom duration51,52,53, which could lead to vaccinated individuals testing less frequently when symptomatic or having fewer days to test. This could, in turn, reduce the magnitude of bias due to unequal testing. However, the magnitude of bias could remain unchanged if vaccinated individuals were also more likely than unvaccinated individuals to get tested with milder symptoms and to test earlier.

Fourth, our scenarios were conditioned on vaccinated individuals testing more than unvaccinated individuals based on several surveys related to SARS-CoV-2 testing and vaccination (e.g., refs. 9, 10). However, unequal testing has also occurred in the opposite direction54, which is an important consideration when comparing VE studies across different study settings, irrespective of pathogen. If unequal testing reflects higher testing among unvaccinated individuals, the direction of the bias would change, leading to an overestimation rather than an underestimation of symptomatic VE.

Finally, for simplicity, we assumed that the vaccine provided “all-or-nothing” protection rather than “leaky” protection against infection. This means the vaccine induced 100% protection for a subset of vaccinated individuals and none for the remainder rather than offering partial protection to every vaccinated individual55. Had we assumed leaky protection and not excluded those with past infection56, the increasing magnitude of bias in our symptomatic VE estimates with unequal testing would likely have increased further. This is because even with equal testing, leaky protection introduces downward bias in VE estimates over the course of an epidemic as more uninfected and unvaccinated individuals who are not at risk of infection are included in the analysis25,56.

Observational studies are key to assessing how well vaccines might be working in the real world2. We illustrated how unequal testing by vaccination status could bias estimates of true symptomatic VE across two commonly used study designs. Overall, using a test-negative design rather than a cohort design can reduce the magnitude of bias due to unequal testing when the mechanisms driving unequal testing are unmeasured. However, interpretation of VE estimates from both study designs still requires caution, especially given heterogeneity in epidemic and immunological properties—with the most important property being vaccine efficacy against susceptibility.

Methods

Mechanisms of interest

The causal relationship we are interested in is the effect of vaccination on symptomatic SARS-CoV-2 infection (Fig. 1). For simplification, we assumed that vaccination does not reduce symptoms in vaccinated individuals who acquire SARS-CoV-2, and thus the effect of vaccination on infection is the same as its effect on symptomatic infection. If infected, an individual may develop symptoms, which leads to an opportunity to engage in SARS-CoV-2 testing. Both infectious and non-infectious etiologies other than SARS-CoV-2 can lead to symptoms similar to those caused by SARS-CoV-2 (referred to as etiologies of COVID-like symptoms, Fig. 1)57.

We assumed that the association between vaccination and testing (pathway marked as “iii” in Fig. 1, which reflects unequal testing by vaccination status) is induced by a relationship between the level of healthcare engagement and vaccination (pathway marked as “i” in Fig. 1) and a relationship between the level of healthcare engagement and testing (pathway marked as “ii” in Fig. 1).

In the cohort design, everyone was included in the study sample. Figure 1a depicts how the assumed relationships between healthcare engagement, vaccination, and testing can lead to residual confounding in VE estimates when using the cohort design. Specifically, the existence of relationships between the level of healthcare engagement and vaccination (“i” in Fig. 1a) and between the level of healthcare engagement and testing (“ii” in Fig. 1a) creates a biasing pathway between vaccination and testing positive for SARS-CoV-2.

In the test-negative design, the study sample was restricted to individuals who were symptomatic and tested for SARS-CoV-2 (depicted by the two “conditioned upon” circles in Fig. 1b). Conditioning on testing aims to block the back-door path between vaccination and testing positive for SARS-CoV-2 that exists due to unmeasured healthcare engagement; however, this conditioning can also introduce a selection bias8. Selecting additionally on being symptomatic aims to block the selection bias, but it can also lead to another bias. Specifically, conditioning on testing introduces a relationship between healthcare engagement and being symptomatic (“iv” in Fig. 1b), while additionally conditioning on being symptomatic induces an association between healthcare engagement and SARS-CoV-2 infection (“v” in Fig. 1b). This induced association (“v” in Fig. 1b), together with the existing causal relationship between healthcare engagement and vaccination (“i” in Fig. 1b), results in confounding that biases the relationship between vaccination and SARS-CoV-2 infection (see Supplementary 2: Selection Bias for details).

To focus our examination on the mechanism of bias from unequal testing by vaccination status, we assumed no other sources of unmeasured confounding.

Model overview

We used an agent-based model to simulate SARS-CoV-2 dynamics and to model the individual-level variations in vaccination and testing (Fig. 2). We chose an agent-based model as it is a flexible approach that can capture emergent phenomena arising from complex interactions in a heterogeneous population58. In our model, each individual (i.e., agent) was assigned attributes to capture health states related to SARS-CoV-2: Susceptible, Exposed, Infectious, and Recovered (where individuals are assumed to be fully protected from reinfection). Other individual-level attributes that vary over time include: symptom status (symptomatic or asymptomatic), SARS-CoV-2 testing (tested or not tested); SARS-CoV-2 diagnosis if tested (tested positive vs. negative). Attributes that do not vary over time include: healthcare engagement (low or high) and vaccination status (vaccinated or unvaccinated). The model updates each individual’s attributes daily. Table 1 summarises the parameters and their corresponding data sources.

Contact patterns

Transmission occurs in daily time-steps through contacts in a homogenous and static network generated using an Erdős-Rényi random graph model59. The network is based on an average of 6 daily contacts60. Given the network is sparse, the distribution of contacts across the population follows a Poisson distribution61. The original placement of edges (representing contacts) is homogeneous by vaccination status, and all nodes (individuals) are connected to the main network (Supplementary 2: Contact Patterns).

SARS-CoV-2 transmission

Based on available data from the literature, we assigned a default value for the biological probability of transmission per contact per day, the duration of the exposure state, the duration of infectiousness, and the proportion asymptomatic if infected with SARS-CoV-2 (Table 1). Each of these processes is stochastic and was implemented using draws from statistical distributions (Supplementary 2: SARS-CoV-2 Transmission Dynamics).

COVID-like symptoms

At any given point in time, a stable proportion (default value of 0.152 in our primary analysis) of the simulated population experienced COVID-like symptoms from alternate etiologies62. We assumed symptoms lasted on average for 10 days, reflecting the average duration of symptoms from the “common cold”63 and influenza64 (Supplementary 2: COVID-like Symptoms). Individuals could experience COVID-like symptoms from alternative etiologies more than once.

Vaccination (coverage and mechanisms of protection)

We applied a fixed 75% vaccination coverage65 such that 75% of the simulated population was fully vaccinated prior to the start of the epidemic, with no further vaccination thereafter. This reflects a scenario wherein a variant may emerge, but a new or additional vaccination has not yet been rolled out.

We simulated vaccine efficacy against susceptibility to acquisition of SARS-CoV-2 and vaccine efficacy against infectiousness55. Efficacy against susceptibility was implemented as “all-or-nothing” protection. That is, the vaccine induced 100% protection for a subset of vaccinated individuals (equal to the efficacy against susceptibility) and zero protection for the remainder (i.e., 1 - efficacy)55. Efficacy against infectiousness was implemented as a reduced probability of SARS-CoV-2 transmission for vaccinated individuals infected with the virus, given contact with a susceptible individual. We assumed that vaccination did not reduce the probability of becoming symptomatic and that vaccine efficacies did not wane over time.

Most of our analyses used a default value for higher and lower efficacy against susceptibility (0.55, informed by an average estimate of VE against Omicron infection66; and 0.1, reflecting VE estimates taken 9 months after vaccination67). For efficacy against infectiousness, we used a default value of 0.268.

Symptomatic VE estimation

To mimic the retrospective cohort design in studies using population-level health administrative data, we assumed that symptomatic individuals with SARS-CoV-2 who were not tested would be misclassified as not infected when estimating true symptomatic VE using the cohort design (Eq. 1):

$${{VE}}_{{RR}}(t)=1-{RR}(t)$$
(1)

with:

$${RR}\left(t\right)=\frac{\frac{{{CI}}_{V}(t)}{{N}_{V}}}{\frac{{{CI}}_{U}(t)}{{N}_{U}}}$$
(2)

where \({RR}(t)\) is the relative risk for a study conducted at time t; \({{CI}}_{V}(t)\) and \({{CI}}_{U}(t)\) are the cumulative numbers of vaccinated and unvaccinated individuals, respectively, who are symptomatic and infected with SARS-CoV-2, and have been tested and diagnosed (tested positive) for a study conducted at time t; and \({N}_{V}\) and \({N}_{U}\) are the total numbers of vaccinated and unvaccinated individuals, respectively.

To assess the potential influence of time in the cohort design, additional sensitivity analyses were conducted, in which true symptomatic VE was estimated using a Cox model with vaccination status as a covariate (see Supplementary 2: Symptomatic VE Estimation for details). If an individual tested more than once during their symptomatic SARS-CoV-2 infection, only the first positive test was selected. Individuals who did not test positive were censored at the end of each study’s sampling period.

To mimic the test-negative design, we sampled from the simulated cohort individuals who were both symptomatic and tested. Using those sampled individuals, we estimated the true symptomatic VE using the test-negative design (Eq. 3):

$${{VE}}_{{OR}}(t)=1-{OR}(t)$$
(3)

with

$${OR}\left(t\right)=\frac{\frac{{{CI}}_{V}(t)}{{{CU}}_{V}(t)}}{\frac{{{CI}}_{U}(t)}{{{CU}}_{U}(t)}}$$
(4)

where \({OR}(t)\) is the odds ratio for a study conducted at time t; and \({{CU}}_{V}(t)\) and \({{CU}}_{U}(t)\) are the cumulative numbers of vaccinated and unvaccinated individuals, respectively, who are symptomatic due to alternate etiologies that cause COVID-like symptoms, who tested negative for SARS-COV-2, and had no prior positive tests for symptomatic SARS-COV-2 for a study conducted at time t. Each individual included in a study receives only one classification.

To assess the potential influence of time in the test-negative design, additional sensitivity analyses were conducted, in which true symptomatic VE was estimated using a logistic model with vaccination status and time since the start of the study (i.e., calendar day) as covariates (see Supplementary 2: Symptomatic VE Estimation for details). Individuals contributed only a single positive or negative test to a study: if an individual had at least one positive test, only the first positive test was selected; if an individual had multiple negative tests and no positive tests, one negative test was randomly selected.

Testing scenarios

We simulated three testing scenarios, each with a varying strength of the relationship between testing and vaccination (“iii” in Fig. 1): equal testing scenario (equal probability of testing by vaccination status); moderately unequal testing scenario (vaccinated with 1.76 times higher testing); and highly unequal testing scenario (vaccinated with 2.36 times higher testing). Here, the direction of the testing differences (i.e., vaccinated individuals testing more than unvaccinated) and the magnitude of the differences in the moderately unequal testing scenario are based on empirical survey results10. Varying “iii” also affected the timing of testing by vaccination status. Thus, in our testing scenarios, vaccinated individuals tested earlier than unvaccinated individuals when testing was unequal, with the difference in the timing growing as unequal testing went from moderately unequal to highly unequal (Supplementary 1: Timing of Testing).

The magnitude of “iii” is influenced by the likelihood of vaccination by the level of healthcare engagement (“i” in Fig. 1); and the likelihood of testing by the level of healthcare engagement (“ii” in Fig. 1). We varied values of “i” to simulate the three testing scenarios above (see Supplementary 2: Testing and Healthcare Engagement and Testing Scenarios for details). The magnitude of “iii” remained constant throughout each epidemic realisation.

Simulations and analyses

Each scenario consisted of 100 epidemic realisations and each epidemic realisation used a population size of 100,000. We selected these values because in scenarios using our default parameters, estimates of symptomatic VE had stabilised by the point of highest positive epidemic growth for SARS-CoV-2 (i.e., the first inflection point). All realisations were initialised with 10 randomly assigned individuals on day 1 of their infectiousness (see Supplementary 2: Event Scheduling). In our analyses, we estimated symptomatic VE using a cohort design and test-negative design (VERR and VEOR) under the three testing scenarios. As previously noted, vaccination coverage, immune escape, and differences in testing were all assumed to be constant over time and no additional time-related confounders were included in the model.

The magnitude of bias was measured as the difference between efficacy against susceptibility (the model input) and the estimates of symptomatic VE produced by the cohort design (VERR) and the test-negative design (VEOR). We assessed the magnitude of bias using efficacy against susceptibility as it represents the true symptomatic VE. Efficacy against susceptibility is equivalent to true symptomatic VE because (1) the vaccine provides all-or-nothing protection55; and (2) the vaccine does not reduce the likelihood of becoming symptomatic once infected. We performed model verification to confirm the accuracy of our model and our symptomatic VE estimates (Supplementary 2: Model Verification; Supplementary Figs. 2122). Results are presented as medians with variability summarised using either 2.5th–97.5th percentile ranges or boxplots.

To explore how estimates of symptomatic VE vary across study designs and sampling periods (Objective 1), we calculated VERR and VEOR (cumulative measurements as shown in formulas 1 and 2) across studies with varying sampling periods under the three testing scenarios. Specifically, we began on day 30 (representing a study with a 30-day sampling period) and then repeated the calculations at each subsequent time point (representing studies with progressively larger sampling periods) until the end of the epidemic. We conducted these analyses using both higher and lower default values of efficacy against susceptibility (0.55 and 0.1, respectively). To assess the potential influence of time on estimates of symptomatic VE, we performed additional sensitivity analyses using models that accounted for time since the start of the study. We also varied the prevalence of etiologies of COVID-like symptoms (0.05–0.2) and test sensitivity (0.5–1) to explore their potential influences on the magnitude of bias under the three testing scenarios. The influence of test sensitivity was assessed by considering only each individual’s first positive test within an infection period. As the effects of test sensitivity are known to change with the ratio of true disease cases to controls69, we evaluated its influence in studies conducted at two distinct stages of the epidemic: (1) the point of highest positive epidemic growth (i.e., the first inflection point); and (2) the epidemic peak.

After estimating symptomatic VE across studies with different sampling periods, we then calculated VERR and VEOR (with default parameters as per Table 1) under a range of epidemic and immunological properties to determine whether varying these properties could shape the magnitude of the bias due to unequal testing (Objectives 2 and 3). First, we varied efficacy against susceptibility (0.1–0.9) and efficacy against infectiousness (0.1–0.9). We then varied the underlying epidemic potential—the probability of transmission (0.033–0.1133,34,35,36).

Results for Objective 1 were reported across a range of study sampling periods (30–150 and 30–100 days for higher and lower efficacy against susceptibility, respectively); results for Objectives 2 and 3 were reported as the difference between VERR and VEOR and true symptomatic VE (i.e., the level of efficacy against susceptibility) for studies conducted at a single time point—the point of highest positive epidemic growth for SARS-CoV-2 (i.e., the first inflection point). A single time point was used for Objectives 2 and 3 so that symptomatic VE estimates could be easily compared across different levels of epidemic potential and immunological protection.

All simulations were conducted using R70 (version 4.3.1) with computations performed on the Niagara supercomputer at the SciNet HPC Consortium71,72 (Supplementary 2: Computing Resources).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.