Abstract
Aberrant immune responses to viral pathogens contribute to pathogenesis, but our understanding of pathological immune responses caused by viruses within the human virome, especially at a population scale, remains limited. We analyzed whole-genome sequencing datasets of 6,321 Japanese individuals, including patients with autoimmune diseases (psoriasis vulgaris, rheumatoid arthritis (RA), systemic lupus erythematosus (SLE), pulmonary alveolar proteinosis (PAP) or multiple sclerosis) and coronavirus disease 2019 (COVID-19), or healthy controls. We systematically quantified two constituents of the blood DNA virome, endogenous HHV-6 (eHHV-6) and anellovirus. Participants with eHHV-6B had higher risks of SLE and PAP; the former was validated in All of Us. eHHV-6B-positivity and high SLE disease activity index scores had strong correlations. Genome-wide association study and long-read sequencing mapped the integration of the HHV-6B genome to a locus on chromosome 22q. Epitope mapping and single-cell RNA sequencing revealed distinctive immune induction by eHHV-6B in patients with SLE. In addition, high anellovirus load correlated strongly with SLE, RA and COVID-19 status. Our analyses unveil relationships between the human virome and autoimmune and infectious diseases.
Similar content being viewed by others
Main
Well-regulated induction of immune responses targeting nonself-viral epitopes is beneficial for virus clearance. On the other hand, excessive or aberrant responses to virus infection can resemble those of autoimmune diseases and occasionally pose risks of self-reactive immune responses as in Guillain–Barré syndrome1. Similarly, epitope indistinguishability between EBNA1, encoded by Epstein–Barr virus (EBV), and host GlialCAM likely contributes to the autoreactive lymphocyte expansion seen in multiple sclerosis (MS)2,3. In the mouse model, persistent infection by Theiler’s virus causes an autoimmune response targeting myelin via epitope spreading, where self-proteins are aberrantly presented to the immune system4. In systemic lupus erythematosus (SLE) plasma, antibody titers against an envelope protein encoded by endogenous retrovirus correlate with interferon (IFN)-stimulated genes (ISGs) expression5. In addition, the immunopathogenic response recognized as a ‘cytokine storm’ is attributable to both virus infection, as seen in severe coronavirus disease 2019 (COVID-19), and autoimmune diseases6. Our understanding of the relationships between virus infection and autoimmune diseases is incomplete. For example, it is still unclear which autoimmune diseases are associated with virus infection and what factors license virus infection to initiate and exacerbate autoimmune disease.
DNA viruses often establish asymptomatic latent infections in healthy individuals. For example, EBV infection is highly prevalent, reaching more than 90% of adult individuals7. Anelloviruses are detectable in the blood of more than 8% of healthy people8. Beyond exogenous viruses, about 1% of humans carry endogenous HHV-6 (eHHV-6), a heritable HHV-6 genome integrated into the subtelomere region that has been proposed to be able to reactivate as a transmissible virus9,10. Because eHHV-6 is not included in reference human genomes, it is an overlooked genetic factor that may trigger antiviral responses. High-throughput sequencing has provided new methods to identify and study such viruses, characterizable as the human ‘virome’8. For example, whole-genome sequencing (WGS) of human specimens and cell lines has been used to study eHHV-6 prevalence and potential reactivation9,11,12. Population-scale sequencing can suggest relationships between the presence of specific human viruses and diseases or immune responses. For instance, the presence of EBV nucleic acid in the spleen and blood is associated with an increased plasma cell number13. The presence of an ssDNA virus family discovered by virome profiling, called Redondoviridae, is associated with periodontitis14. Anellovirus load correlates with immunosuppression and acute respiratory disease15,16. The presence of eHHV-6 has been associated with pre-eclampsia, angina pectoris and increased levels of HHV-6-specific antibody titers17,18,19. These findings highlight the need for systematic virome analysis to identify new associations between viruses and human diseases. The partially revealed associations between viruses and autoimmune diseases, for example, between EBV and MS, suggest that similar relationships may be discoverable with further investigation.
In this study, we analyzed WGS from >6,000 Japanese individuals, including patients with five autoimmune diseases, COVID-19 or healthy controls. We systematically detected eHHV-6 and anellovirus nucleic acids in these datasets and conducted case–control association studies that compared eHHV-6 prevalence or anellovirus abundance in unrelated individuals with or without these diseases (Fig. 1). To confirm the inherited nature of the detected eHHV-6 genome, we performed genome-wide association study (GWAS) of eHHV-6B positivity and PacBio HiFi long-read sequencing of eHHV-6B-positive patients with SLE. In addition, epitope mapping by phage immunoprecipitation sequencing (PhIP-seq) from four eHHV-6B-positive and 12 eHHV-6B-negative patients with SLE and single-cell RNA sequencing (scRNA-seq) data from four eHHV-6B-positive and 11 eHHV-6B-negative patients with SLE were compared.
We collected 6,321 WGS datasets from patients with five autoimmune diseases (PV, RA, SLE, PAP and MS) and COVID-19 and healthy controls. The datasets were analyzed to detect eHHV-6 and anellovirus infection. In addition, we conducted a large-scale replication study of the association between SLE and eHHV-6 using All of Us WGS data. We conducted GWAS of HHV-6B positivity. We performed long-read sequencing, PhIP-seq and scRNA-seq targeting eHHV-6B-positive patients with SLE (Methods).
Results
Overview
We analyzed 6,321 Japanese samples. Details of the sample characteristics, sample collection and the WGS platform are described in Methods. The WGS datasets were composed of 1,357 participants diagnosed with psoriasis vulgaris (PV), 378 with rheumatoid arthritis (RA), 251 with SLE, 216 with pulmonary alveolar proteinosis (PAP), 36 with MS, 1,164 with COVID-19 and 2,919 healthy controls. The majority of the patients with COVID-19 (n = 1,068; 91.8%) had severe disease (requiring oxygen support, artificial respiration and/or intensive care; Supplementary Table 1). Two bioinformatics pipelines were used to detect eHHV-6 and anellovirus infection (Fig. 1). Briefly, we remapped those WGS reads that did not map to the human reference genome to their respective viral reference genomes. For anellovirus detection, 3,333 human anellovirus sequences were used to accurately detect widely divergent anellovirus genomes. Logistic regression and Fisher’s exact were used to calculate the adjusted odds ratio (OR) for the risk of each disease with eHHV-6 or three anellovirus infection states defined from normalized anellovirus viral load (viral copies per cell). In each disease with a high OR, we estimated the relationship between clinical metrics in each disease and eHHV-6 or three anellovirus infection states using linear regression or logistic regression and Fisher’s test.
eHHV-6B associates with autoimmune diseases and SLE activity
We identified eHHV-6A in 16 participants and eHHV-6B in 22 participants (0.25% and 0.35% prevalence, respectively; Table 1, Supplementary Table 2 and Fig. 2a). Of these participants, one healthy control with eHHV-6A and four participants with eHHV-6B (one with PV, two with PAP and one healthy control) had evidence of the ‘solo-DR’ form of eHHV-6 (ref. 9).
a, Scatter plot of the prevalence of eHHV-6A versus eHHV-6B in participants with five autoimmune diseases, COVID-19 and healthy controls. The marker size indicates the sample size for each disease. The red line indicates that the prevalence of eHHV-6 = 1%. b, Box plot of SLEDAI scores for patients with SLE without and with eHHV-6B. SLEDAI scores in participants with eHHV-6B were significantly higher than those without eHHV-6B (mean SLEDAI = 30.5 and 6.0 for eHHV-6B-positive and eHHV-6B-negative patients with SLE, respectively; two-sided Wald test P = 1.3 × 10−8). The histogram shows the distribution of the number of patients with SLE without and with eHHV-6B (top). The horizontal histogram shows the distribution of SLEDAI scores of patients with SLE without and with eHHV-6B, with different colors (right). Boxes denote the IQR; the median is shown as horizontal bars; whiskers extend to 1.5× IQR; outliers are shown as individual points. IQR, interquartile range.
Next, we tested for case–control associations between eHHV-6 and each disease using the logistic regression model and Fisher’s test. No significant differences were observed in eHHV-6A prevalence in patients with any disease compared with healthy controls. In contrast, eHHV-6B was more prevalent in patients with SLE and PAP than in healthy controls (1.6% and 1.4% versus 0.21% prevalence; Fisher’s test mid-P = 0.0060 and 0.022, respectively). After adjustment for age and sex, we confirmed eHHV-6B associations with SLE and PAP with an adjusted OR of 6.9 in SLE (95% confidence interval (CI) = 1.61–27.2; Wald test P = 0.0061) and 7.0 in PAP (95% CI = 1.43–27.7; P = 0.0075). These observations suggest that carrying eHHV-6B contributes to the risk of these diseases with a large effect size relative to previously known genetic and environmental factors of the diseases20,21,22.
To identify a possible association between eHHV-6B and clinical metrics of SLE or PAP, we used the linear or logistic regression model and Fisher’s test (Supplementary Tables 1 and 3). eHHV-6B had no association with low serum complement levels, immunosuppressant use or prednisolone use in SLE (P = 0.46, 0.79 and 0.76, respectively). However, eHHV-6B had a significant positive correlation with high SLE disease activity index (SLEDAI) scores23 (mean SLEDAI = 30.5 and 6.0 for eHHV-6B-positive and eHHV-6B-negative patients with SLE, respectively; P = 1.3 × 10−8). Three of four participants with SLE carrying eHHV-6B had very high SLEDAI scores (≥18), suggesting that eHHV-6B may have a substantial clinical impact on disease activity (Fig. 2b). Of interest, three of four eHHV-6B-positive participants with SLE were positive for anti-RNP antibodies, which are typically seen in about 33% of patients with SLE24, while the fourth participant was positive for anti-Scl-70 antibodies, which is quite rare in SLE. Two eHHV-6B-positive patients with SLE developed complications from other autoimmune diseases. One patient who tested positive for anti-RNP antibodies had scleroderma and Sjögren’s syndrome, and the other patient positive for anti-Scl-70 antibodies had scleroderma and antiphospholipid syndrome. Given these unusual clinical manifestations, inborn eHHV-6B positivity may predispose to atypical SLE pathogenesis. eHHV-6B was not associated with the titer of antigranulocyte-macrophage colony-stimulating factor autoantibodies in PAP (P = 0.90), which are considered central to the pathogenesis of the adult sporadic type of PAP25.
eHHV-6B also associates with high SLE risk in All of Us
We conducted an independent large-scale replication study of the association between SLE and eHHV-6 using WGS data from the National Institutes of Health’s (NIH) All of Us program26. The participants in the All of Us cohort are genetically diverse, and the risk of SLE varies considerably by ancestry, so we performed association testing in the 121,494 individuals of European ancestry and the entire 243,247-person multi-ancestry dataset using a logistic regression model (Table 2). Concordant with the Japanese discovery cohort, no significant associations were observed in eHHV-6A prevalence in either population (P > 0.15). In contrast, eHHV-6B was associated with high SLE risk, with an adjusted OR of 2.7 in European ancestry (95% CI = 1.54–4.76; P = 0.0020) and 2.4 in multi-ancestry (95% CI = 1.47–3.82; P = 0.0012). These results indicate that eHHV-6B is associated with SLE in individuals of diverse ancestry.
Risk eHHV-6B of SLE locates on chromosome (chr) 22q
The sequencing depth of the HHV-6 genome detected in this study was nearly half that of the human genome in each participant, suggesting that the HHV-6 sequences are inherited along with single human haplotypes (Extended Data Fig. 1a,b). To understand the ___location of eHHV-6B in the human genome, we conducted a GWAS using common variants called the T2T-CHM13 reference. Comparison of 22 eHHV-6B-positive participants and 216 eHHV-6B-negative participants revealed a highly significant association with variants on the distal end of chr22q (Fig. 3a). This result was consistent with a previous GWAS using Fisher’s test9, but using T2T-CHM13 as a reference provided a more densely mapped set of associated variants. Focusing on the distal end of chr22q, we performed logistic regression including rare variants (Fig. 3b,c and Supplementary Table 4). The 16 variants with P < 1 × 10−4 spanned a 500-kb region including different gene regions and different linkage disequilibrium (LD) block structures, but all were in high LD with eHHV-6B. By using 30× PacBio HiFi long-read sequencing, we confirmed that these tagged variants were on the same haplotype in each of the four eHHV-6B-positive patients with SLE (Fig. 3d and Extended Data Fig. 1c). These GWAS results reflect the LD relationship with the HHV-6B genome, suggesting that eHHV-6B in the Japanese population is integrated to the distal end of chr22q and has been inherited together as a linked unit. All of the tagged variants are intergenic or intronic, thus not supporting their pathogenicity, whereas all four eHHV-6B-positive patients with SLE carried the full-length HHV-6B genome implicated with viral reactivation27 (Extended Data Fig. 1b). eHHV-6B itself rather than the tagged variants might therefore be involved in the pathogenesis of SLE.
a, Manhattan plot of common variant-based GWAS of eHHV-6B (22 cases versus 216 controls). The straight line indicates the genome-wide significance level (5 × 10−8). b, Regional association plot for the distal end of chr22q. The dotted line indicates the genome-wide significance level. The 16 variants with P < 1 × 10−4 were labeled by reference SNP ID (rs ID). Color shading represents the maximum value of LD (r2) with each of the lead variants in East Asian populations of 1KG. The circle size indicates LD (r2) with the HHV-6B genome in the present WGS data. c, LD map of the lead variants in East Asian populations of 1KG. d, Heatmap representing the dominant genotype of the 16 variants in 22 eHHV-6B-positive participants. We confirmed that the variants were on the same haplotype in four patients with SLE with eHHV-6B by 30× HiFi long-read sequencing (Extended Data Fig. 1c). Uncorrected P values from the GWAS analysis (two-sided Wald test) are shown in a and b. LRS, long-read sequencing; EAS, East Asian.
eHHV-6B results in distinctive immunity in patients with SLE
To investigate whether eHHV-6 can result in the induction of unique antiviral responses, we performed epitope mapping of anti-HHV-6 antibodies in 16 patients with SLE (four eHHV-6B-positive and 12 eHHV-6B-negative; Supplementary Table 5) by PhIP-seq (Fig. 4a)28,29. As a result of epitope mapping to 108 HHV-6B proteins, two HHV-6B virion proteins, antigenic virion protein and glycoprotein O, were frequently recognized as antigens across all participants, in agreement with the established notion that most adults have at one time been infected by exogenous HHV-6 and have acquired immunity against it30,31. To test whether there are epitopes specific to eHHV-6B carriers, we tested for an association between the presence of eHHV-6B and the antigenicity of HHV-6 peptides (Fig. 4b). Responses against one region (amino acids 476–490) in the immediate-early A (IE-A) transactivator were strongly detected in all eHHV-6B carriers, whereas responses against this region were not as strong in all eHHV-6 noncarriers (P = 3.5 × 10−6; Fig. 4c). This demonstrates that eHHV-6B results in distinctive immune responses against HHV-6B in patients with SLE. Because eHHV-6B originated as a virus but is now encoded in the human genome—it has the nature of both virus and self—eHHV-6B may be a heritable viral antigen to which immune responses represent autoimmunity, a phenomenon we term ‘endoimmunity’.
a, Detection of epitopes on HHV-6 proteins. Means of the PhIP-seq scores of the indicated participant groups are shown. Each dot shows a score for a peptide split by a 14 amino acids sliding window applied to HHV-6A and HHV-6B proteins. Specific proteins of interest are labeled. b, Detection of epitopes on the three HHV-6B proteins. Copy numbers of immunoprecipitated phage genomes encoding HHV-6B peptides are shown as a heatmap. Three HHV-6B proteins that harbor abundant epitopes are shown. The region of the IE-A transactivator associated with eHHV-6B positivity is labeled with an orange arrow (amino acids 476–490). c, Statistical association between the presence of eHHV-6 and the copy numbers of immunoprecipitated phage genomes encoding HHV-6 peptides. IE-A transactivator amino acids 476–490 are labeled in orange. Uncorrected P values from the two-sided Wald test are shown. Red horizontal lines show the threshold of P (<0.05) after Bonferroni correction. IS, immunosuppressant use; NC, no antibody control; M, male; F, female.
eHHV-6B induces antiviral response in SLE monocytes
Participants with both eHHV-6 and SLE showed evidence of distinctive humoral immunity to HHV-6 peptides. As SLE is characterized by immune cell dysfunction, we sought to further investigate the immunopathogenesis in eHHV-6B-positive participants with SLE. Peripheral blood mononuclear cells (PBMCs) were collected from the same individuals who were investigated by PhIP-seq (4 eHHV-6B-positive patients with SLE as cases and 11 eHHV-6B-negative patients with SLE as controls; Supplementary Table 5).
After processing using a unified single-cell analysis pipeline (Methods), we obtained 66,915 high-quality cells from PBMCs of 4 cases and 11 controls. There was no detectable expression of HHV-6B genes in any of the cells. Cells were annotated using Azimuth32 (Fig. 5a). To identify more differential expression (DE) patterns than the pseudobulk methods or the mixed models, we applied Milo33 and identified 2,818 neighborhoods, none of which showed evidence of differential abundance (DA; spatial false discovery rate (FDR) < 0.05), and 20 neighborhood groups consisting of neighborhoods that shared at least five cells (Fig. 5b and Extended Data Fig. 2a). After excluding several cell groups to avoid sampling bias, DE and Gene Ontology (GO) enrichment analyses were performed (Fig. 5c and Extended Data Fig. 2b–e). GO analyses showed that antiviral responses were enriched in cells in groups 6, 7 and 18, which were annotated as monocytes and dendritic cells (DCs). To evaluate the IFN response to viruses, we calculated antiviral ISG scores of cells in each neighborhood and compared them between cases and controls. In groups 6, 7 and 18, the antiviral ISG scores of the cases were significantly higher than those of the controls (FDR < 0.05; Fig. 5d). Within-group comparisons showed that the mean scores in the cases were significantly higher than those in the controls in groups 6, 7 and 18, and the differences were larger than those in the other groups (Fig. 5e). Differences in the antiviral ISGs scores described above were observed even between eHHV-6B-positive and eHHV-6B-negative patients with immunosuppressant use, suggesting that these increased antiviral responses in monocytes and DCs were not attributable to immunosuppressants (Extended Data Fig. 3a,b). These results are consistent with the distinctive immunity to HHV-6B between eHHV-6B carriers and noncarriers revealed by the epitope mapping.
a, UMAP embedding of scRNA-seq data for all 66,915 PBMCs from the participants with SLE (n = 4 for eHHV-6B-positive and 11 for eHHV-6B-negative). Thirty cell types were defined by RNA expression of marker genes using Azimuth. b, Graphical representation of neighborhoods identified by Milo. Nodes are neighborhoods, colored by neighborhood groups, and their sizes correspond to the number of cells in each neighborhood. Graph edges depict the number of cells shared between adjacent neighborhoods. c, Enriched biological processes by GO analysis of DE genes of each neighborhood group. The total number of DE genes in each group is indicated in brackets. Dot color indicates the statistical significance of the enrichment (adjusted P values via the Benjamini–Hochberg method), and dot size represents the gene ratio annotated to each term. d, Graphical representation of neighborhoods identified by Milo as in b. Node color indicates the difference between the mean of the antiviral ISG scores of cases and the mean of the scores of controls within the neighborhood (spatial FDR < 0.05), comparing eHHV-6B-positive patients with SLE as cases (n = 4) and eHHV-6B-negative patients with SLE as controls (n = 11). Neighborhoods with nonsignificant differences (spatial FDR ≥ 0.05) are colored white. Neighborhoods that contain only cells of either cases or controls are colored black. e, Comparison of differences within each neighborhood group shown in d using a two-sided paired t test. Diff is the average difference between the case and control means. Boxes denote the IQR; the median is shown as horizontal bars; whiskers extend to 1.5 times the IQR. *FDR < 1.0 × 10−10 and **FDR < 1.0 × 10−50. ASDC, Axl+ Siglec-6+ dendritic cell; pDC, plasmacytoid dendritic cell; HPSC, hematopoietic stem and progenitor cell; Treg, regulatory T cell; NK, natural killer; TEM, effector memory T cell; ILC, innate lymphoid cell; MAIT, mucosal-associated invariant T; GDT, γδ T cells; CTL, cytotoxic T cell; Padj, adjusted P value; Nhood, neighborhood.
Anellovirus associates with autoimmune diseases and COVID-19
Anellovirus was analyzed in 5,597 Japanese WGS samples, excluding participants whose WGS was conducted using DNA extracted from cell lines. The anellovirus load was estimated and stratified (Methods). Of the 5,597 participants, 689 anellovirus-positive participants (12.3%; that is, those with anellovirus load above the detection limit), 57 participants (1.0%) with anellovirus load >1.0 considered anellovirus-infected and 16 participants (0.29%) with anellovirus load >8.0 considered anellovirus-viremic were found (Table 3, Supplementary Table 6 and Extended Data Fig. 4).
Correlations between the three anellovirus detection states and each disease were assessed using the logistic regression model and Fisher’s test. The proportions of anellovirus infection were higher in patients with SLE, RA and COVID-19 than in healthy controls (4.8%, 3.7% and 1.7% versus 0.16%; mid-P = 3.6 × 10−10, 7.4 × 10−10 and 2.9 × 10−7, respectively), and the proportions of anellovirus viremia were higher in patients with SLE, RA and COVID-19 than in healthy controls (2.4%, 0.79% and 0.61% versus 0%; mid-P = 5.6 × 10−7, 0.0023 and 3.3 × 10−4, respectively). Anellovirus infection associated positively with SLE, RA and COVID-19, with an adjusted OR of 58.5 in SLE (95% CI = 15.6–293; P = 2.6 × 10−8), 15.4 in RA (95% CI = 4.89–68.2; P = 2.5 × 10−5) and 6.2 in COVID-19 (95% CI = 2.04–26.7; P = 0.0042). Again, the quantitative magnitudes of the observed OR were larger than those previously reported for genetic and environmental factors of the diseases.
Next, linear or logistic regression models and Fisher’s test were used to identify a possible association between anellovirus detection states and clinical metrics of SLE, RA and COVID-19 (Supplementary Tables 1 and 3). In SLE, anellovirus was positively associated with low serum complement levels, immunosuppressant use and prednisolone use, with an adjusted OR of 4.4 in participants with low serum complement levels (95% CI = 1.46–19.5, P = 0.020), 3.0 in participants treated with immunosuppressants (95% CI = 1.51–6.03, P = 0.0020) and 2.2 in participants treated with prednisolone (95% CI = 1.05–5.21, P = 0.047; Extended Data Fig. 5a–d). Anellovirus detection states were not associated with SLEDAI scores. Although anellovirus detection states were insignificantly associated with rheumatoid factor (RF) or anticitrullinated protein antibody (ACPA), all participants with RA who were viremic for anellovirus (n = 3) were RF- and ACPA-positive. Patients with severe COVID-19 (n = 1,068) were not statistically at higher risk of anellovirus infection, but all anellovirus-infected participants with COVID-19 (n = 20) had severe disease. The proportion of participants with anellovirus infection was statistically higher in participants who died of COVID-19 than in those who survived (mid-P = 0.033).
Discussion
Here 6,321 Japanese WGS datasets were used to test for associations between six immune-mediated diseases and two constituents of the human blood virome, eHHV-6 and anellovirus. We present robust evidence of positive associations between these viruses and several autoimmune diseases and COVID-19. The observed risks of the diseases were relatively larger than those of the previously known genetic and environmental factors of the diseases, although their prevalence remains rare, suggesting that these viruses substantially impact disease biology and are relevant to clinical medicine.
We used WGS data, a robust method for detecting eHHV-6 (ref. 11), to estimate the higher prevalence of eHHV-6B in patients with SLE and PAP. eHHV-6 prevalence among healthy controls was 0.59%, similar to those in previous studies of healthy Japanese participants34,35. Our previous survey of eHHV-6 in a disease-oriented Japanese biobank found a higher prevalence of HHV-6B than HHV-6A, the opposite of that we observed here using healthy controls9. This is consistent with the present finding that eHHV-6B, but not eHHV-6A, is associated with an increased risk of several diseases and should motivate additional biobank-scale studies. It is also consistent with the independent replication data from the All of Us cohort.
Multiple facets of our analysis support a meaningful association between SLE and eHHV-6B. Patients with SLE with eHHV-6B had higher SLEDAI scores, suggesting a role for HHV-6B in SLE pathogenesis and clinical activity, and half of them had other autoimmune complications. We identified the presence of distinctive immune responses against HHV-6B in eHHV-6B carriers with SLE. These findings are consistent with previous studies, reporting that people with eHHV-6 have higher antibody levels against HHV-6-encoded protein IE1 (ref. 19) and show different adaptive immune responses to other herpesviruses and immune gene expression patterns36. It is possible that the antibodies against the IE-A transactivator can act as autoantibodies; however, we cannot exclude the possibility that these are a consequence of broader immune activation against exogenous HHV-6 aided or triggered by eHHV-6B. Herpesviruses often encode immunomodulatory viral chemokine receptors37, and HHV-6 encodes the chemokine receptors U12 and U51 (ref. 38). The expression of these viral proteins may also contribute to pathological immune activation by eHHV-6B. A previous study found an association between SLE and circulating HHV-6 infection39. Single-cell analysis of PBMCs showed increased antiviral responses in monocytes and DCs. scRNA-seq data showed no eHHV-6 expression in PBMCs; however, this is far from mutually exclusive with the activation of humoral immunity against eHHV-6B we observed. eHHV-6 genes are weakly expressed in multiple human tissues other than blood cells, such as the brain and esophagus in the genotype-tissue expression, which recruited individuals without respect to specific diseases19. It may be possible that eHHV-6B carriers of SLE acquired antibodies to the HHV-6B protein after the onset of SLE. Although we cannot identify the timing of B cell activation against IE-A transactivator from the present dataset, it may be possible that eHHV-6B is one factor contributing to severity, rather than the initiating step of SLE. Further investigation into the ‘endoimmunity’ hypothesis as well as T cell activation characteristic of eHHV-6B carriers will shed additional light on the immunopathogenicity of eHHV-6B in SLE.
The variants in high LD with eHHV-6B found by the comprehensive analysis using T2T-CHM13 were unlikely to be pathogenic for SLE. Based on the present associations alone, especially in a single population, we cannot rule out the possibility that a variant that is in high LD with eHHV-6B is the mutation causative of increased SLE risk. However, eHHV-6B found in the Japanese and European populations is inserted in different chromosomes9. This implies that the variants in high LD with eHHV-6B in the Japanese population are different from those in the European population. Considering that eHHV-6B is associated with SLE in both populations, it is unlikely that a variant in high LD with eHHV-6B has a causal role in SLE. Furthermore, eHHV-6B is a polymorphism that has been overlooked in previous GWASs because it is not in the reference genome, complementarily enhancing our knowledge of human disease risk variants.
Anelloviruses are characterized by enormous genetic diversity40,41,42, which must be considered when detecting and estimating their viral load from WGS data. Here we did so by comprehensively using more than 3,000 anellovirus reference sequences. However, as with any mapping-based approach, we are limited by the degree to which these reference genomes represent the viral diversity in this population. Unlike germline integration of HHV-6, anellovirus status in the patient is not expected to match that of a cell line derived from the patient. Thus, 5,597 Japanese blood WGS samples were analyzed, and the proportions of three continuous anellovirus detection states were assessed. Elevated anellovirus load was frequently found in the blood of participants with SLE, RA and COVID-19. Interestingly, patients with COVID-19 were anellovirus-positive in a similar proportion as healthy controls but included significantly higher anellovirus-infected and anellovirus-viremic than healthy controls. A possible scenario is that the atypical immune states characteristic of these diseases (or treatments) allow anelloviruses to reach higher levels in the blood43,44. This would also be supported by the observed association between anellovirus positivity and immunosuppressant and/or prednisolone use in SLE. An association between anellovirus positivity and low complement levels in SLE suggests that, similar to other viral infections, complement deficiency predisposes patients with SLE to anellovirus infection45,46. It is also possible that higher anellovirus levels contribute to the immune responses and inflammation leading to these diagnoses. Previous studies have shown that anellovirus DNA can induce proinflammatory cytokines47,48, suggesting that the higher load of anellovirus is a driver of severe inflammation during acute respiratory virus infection and, potentially, autoimmune response. While high-quality evidence of a causal relationship between anellovirus and human disease is limited, clonally expanded CD4+ T cells recognizing an anellovirus, torque teno virus, have been isolated from the cerebrospinal fluid of a patient with MS49. Notably, our conclusion is similar to that of ref. 50 that the prevalence of detectable (at any level) torque teno virus DNA is not higher in patients with SLE or RA than in healthy controls. We did observe, however, that participants with SLE and RA are more likely to have elevated levels of anellovirus DNA, supporting the hypothesis that anellovirus replication may be involved in autoimmune diseases. Longitudinal rather than cross-sectional studies are needed to adequately assess this. Notably, in our results, the increased number of anellovirus-infected participants among those with fatal COVID-19 is consistent with the hypothesis that anellovirus load may be higher in patients with more severe COVID-19.
Although only a small number of individuals carry eHHV-6 or a high load of anellovirus, positivity appears to have a striking impact on disease risk and clinical phenotype, especially compared with previously identified genetic and environmental factors of these diseases20,21,22. This suggests that the presence of these viruses in an individual’s virome is a potentially clinically useful biomarker for these diseases, enabling a personalized medicine approach to prevent disease. In addition to disease risk, eHHV-6 may be an important factor in clinical practice, such as cell therapies51. With the expanding availability of clinical WGS, incorporating bioinformatic pipelines that assay the virome may be a cost-effective way to screen for these individuals. The link between endogenous viruses or viral infection and immune-related diseases is a longstanding mystery in basic and clinical science. This study provides part of the missing pieces and outlines a path through which the study of the human virome can be integrated into the genotype–phenotype association analysis for which human WGS are often collected. Additional clinical and translational studies are needed to better assess the pathologically relevant mechanisms through which eHHV-6 and anellovirus modulate immunity.
Methods
Ethics
We have complied with all relevant ethical regulations. This study was approved by the ethical committees of Osaka University Graduate School of Medicine and affiliated institutes. All the participants provided written informed consent, as approved by the ethical committees of the affiliated institutes (Osaka University, Kyoto University and Keio University). No compensation was provided to the participants.
Study populations
We enrolled 2,401 Japanese individuals comprising 1,062 PV, 378 RA, 251 SLE, 216 PAP, 36 MS, 94 COVID-19 patients and 364 healthy controls at Osaka University and related institutes. We also used CRAM files from WGS of 2,126 healthy controls generated by the Nagahama Study52 and related studies at Kyoto University and BAM files from WGS of 1,070 patients with COVID-19 generated by the Japan COVID-19 Task Force53. For all patients, blood was drawn when the disease was active. Especially for patients with COVID-19, blood samples were drawn immediately after diagnosis. Also, we used the genomic DNA of 295 patients with PV and 429 healthy controls obtained from EBV-transformed B-lymphoblast cell lines of unrelated Japanese individuals established by the Japan Biological Informatics Consortium (JBIC)54.
We conducted principal component analysis (PCA) against HapMap3 data55 using SNP data of the same individuals to confirm the East Asian genetic background.
WGS data preprocessing
Genomic DNA was extracted from the whole blood of individuals at Osaka University and related institutes, with all samples processed at Osaka University. The DNA samples from Osaka University and those from the JBIC cell lines were sequenced at Macrogen Japan Corporation. Sequencing libraries were prepared using reagents from the TruSeq DNA PCR-free Library Prep kit (Illumina) following the manufacturer’s protocols. Also, 2× 150 bp paired-end WGS was conducted on HiSeqX or NovaSeq (Illumina). The BWA-MEM (v0.7.13 or v0.7.8)56 was used to align the sequenced reads to the human reference build hs37d5 (ref. 57). The reads in the raw alignments were flagged for duplication using Picard MarkDuplicates (v2.10.10 or v2.17.2). The number of average aligned reads was 386 million per sample, corresponding to an average genomic coverage of 18×. WGS CRAM files from the Nagahama Study and related studies contained 370 million 2× 150 bp paired-end reads per sample, aligned to hs37d5 with BWA-MEM (v0.7.13), and followed by Picard MarkDuplicates (v2.10.10), with a mean coverage of 17×. WGS BAM files from the Japan COVID-19 Task Force contained 751 million 2× 150 bp paired-end reads per sample, aligned to hs38DH58 with BWA-MEM (v0.7.15), and followed by GATK4 MarkDuplicates59 (v4.1.0.0), with a mean coverage of 26×. In total, 6,321 BAM or CRAM files were prepared for downstream analyses.
Detection of eHHV-6A/eHHV-6B from WGS
The 6,321 WGS datasets were analyzed to detect eHHV-6 using a previously developed bioinformatics pipeline for eHHV-6 detection and reconstruction11. Briefly, we extracted reads that did not map to the human reference genome and mapped these reads to the reference HHV-6A (National Center for Biotechnology Information (NCBI) accession NC_001664.4) and HHV-6B (NC_000898.1), followed by flagging for duplication to remove PCR duplicates. Then, the mapping depth and coverage of each virus were calculated. When 5% of the HHV-6 genome was covered at more than 2× read depth, we considered the sample to reflect the carriage of eHHV-6. Details are described in a previous paper11. The workflow described here was compiled as a Python pipeline and is available from the following GitHub repository: https://github.com/shohei-kojima/integrated_HHV6_recon (https://doi.org/10.5281/zenodo.12528938)60. The ‘main.py’ in the repository wraps all the workflow described here.
Detection of ‘anellovirus reads’ from WGS
The 5,597 WGS datasets were analyzed to detect reads derived from anelloviruses, excluding 724 participants from JBIC cell lines because cell lines are unlikely to reflect anellovirus infection status in the participants. Due to the extensive sequence divergence among the anellovirus genomes41,42, we modified our previously reported pipeline to accurately detect and quantify anelloviruses. We downloaded human anellovirus FASTA files from NCBI Virus on 21 April 2022. At this time, 27,780 anellovirus sequences were deposited in NCBI. To find human anelloviruses with complete genomes, we referred to the metadata of the deposited sequences and extracted 3,333 complete human anellovirus sequences that were annotated as follows: ‘host’, ‘Homo sapiens’ and ‘Nuc_completeness’, ‘complete’. We found that some anellovirus genomes contain simple repeat sequences and regions with high similarity to human transposons. To avoid potential mismapping of human sequences to those regions, we masked simple repeats and human transposons by RepeatMasker61 (v4.1.1) with ‘-species 9606 -s -no_is’ options. As a result, 1.09% of the anellovirus genomes were masked. The majority of the masked regions (103,757 bp) were simple repeats, but 69 and 1,724 bp were masked as long interspersed nuclear element-1 and long terminal repeat transposons, respectively, potentially reflecting mis-assemblies of viral genomes from metagenome sequencing data. We used the masked 3,333 genomes as anellovirus reference genomes. The human genome-unmapped reads were aligned to these 3,333 human anellovirus genomes using hisat2 (ref. 62) with ‘—mp 2,1 –no-spliced-alignment –all --secondary’ options, followed by flagging for duplication using Picard MarkDuplicates. Then, the read pairs mapped to at least one anellovirus sequence were counted as ‘Anellovirus reads’. Read pairs that were flagged as duplicates were excluded from ‘Anellovirus reads’. We considered a read pair to be an ‘Anellovirus read’ when at least one of the read pairs was mapping to at least one anellovirus genome. The workflow described here was compiled as a Python pipeline and is available from the following GitHub repository: https://github.com/shohei-kojima/human_anellovirus_detection (https://doi.org/10.5281/zenodo.12528938)60. The ‘main.py’ in the repository wraps the workflow described here.
To evaluate the mismapping of human sequences, we mapped ‘Anellovirus reads’ in 2,715 participants (cohorts excluding COVID-19) to CHM13. We found no mismapping of human reads to anellovirus genomes in a majority of the samples (n = 2,627). In 88 samples (3.2% of participants), we observed mismapping of human reads to anelloviruses, but the number of mismappings was at most one read. This indicates that mismapping of human sequences (potentially nonreference sequences) is extremely rare, but a small amount is still present.
Calculation of anellovirus load
We calculated anellovirus load (anellovirus copies per cell) to normalize anellovirus reads by WGS coverage as below:
‘Anellovirus genome length’ is the mean sequence length of 2,907.3 bp, calculated from 3,333 anellovirus FASTA files. ‘Human reads’ is the number of properly paired read pairs obtained using the SAMtools flagstat63. ‘Diploid human genome length’ is 5,618,659,254 bp in hs37d5 and 5,777,320,084 bp in hs38DH, calculated as the sum of 22 autosomal pairs and sex chromosome lengths of the human reference genome excluding Ns and gaps. The sex of the participants was not considered in this calculation.
Thresholds for anellovirus infection states: positive, infection and viremia
Considering the possibilities of human-derived reads mappable on anellovirus sequences and index hopping, we did not simply define anellovirus-positive in the situation when a single (or small number of) ‘anellovirus read’ was detected. Therefore, we robustly defined ‘anellovirus reads’ >5 (anellovirus load > 0.08) as anellovirus-positive (Extended Data Fig. 6). To be sure, we tentatively loosened the threshold a little and evaluated ‘anellovirus reads’ >3 (anellovirus load > 0.05) as anellovirus-positive, but the conclusion remained the same (Supplementary Table 7). When anellovirus load = 1, there is one dsDNA genome per cell on average; therefore, participants with anellovirus load >1 were defined as anellovirus-infected. The maximum value of anellovirus load in the 2,490 healthy controls was 7.4. Thus, we defined participants with anellovirus load >8 as those demonstrating an atypical state of anellovirus viremia.
Statistical analysis
The logistic regression model was used to calculate the adjusted case–control OR with 95% CI for the risk of each disease with eHHV-6 or each anellovirus infection state. We also used a two-sided Fisher’s exact test to evaluate the hypothesis of no association between the prevalence of eHHV-6A/eHHV-6B and the proportions of each anellovirus infection state in each disease and healthy controls. We applied the Bonferroni correction to adjust the P value for multiple test corrections. Then, the multivariable logistic regression model was performed with age and sex as independent variables. Anellovirus viremia was undetected in patients with PV, PAP or MS; therefore, no regression model was used. Wald test P value < 0.05 and Fisher’s test mid-P value < 0.05 were considered significant. All analyses were conducted using R (v4.0.5) with the R package MASS (v7.3.58)64 and exact2x2 (v1.6.6)65. In each disease with a high OR, the linear or logistic regression model was used to estimate the relationship between the clinical metrics in each disease and eHHV-6 or each anellovirus infection state, with age and sex as independent variables. When the clinical metrics were binary, we also used Fisher’s test. Under regression analysis, listwise deletion of cases with missing values was applied.
Replication in All of Us
To find potential eHHV-6 carriers from biobank-scale WGS datasets, we developed a fast-screening method. This screening uses unmapped reads and detects HHV-6 reads by aligning with the reference genomes of HHV-6. It first extracts unmapped reads stored at the end of a BAM or CRAM file using the ‘samtools view’ command and converts reads to the fastq format. The number of unmapped reads varies between individuals and is influenced by mapping algorithms. To minimize the computational burden, our screening approach uses no more than 1 million unmapped reads by default. To avoid mismapping of telomere reads to the telomere-like repeats in the reference genomes of HHV-6, it filters out reads containing at least one unit of the telomere repeat (TAACCC). After filtering potential telomere reads, it aligns reads with the HHV-6A and HHV-6B reference genomes, NCBI accessions NC_001664.4 and NC_000898.1, respectively, using hisat2 with the relaxed mismatch penalties ‘--mp 2,1 --no-spliced-alignment’. To reduce the computational burden, it treats paired-end reads as single-end reads when mapping to HHV-6. Then, it counts reads successfully mapped to HHV-6. To avoid counting non-HHV-6 reads partially aligned with HHV-6, it counts reads without clipping and insertion–deletion (InDels). Finally, it outputs the numbers of reads screened and those mapping to HHV-6. This method is available as a script ‘quick_check.py’ deposited in GitHub—https://github.com/shohei-kojima/integrated_HHV6_recon (https://doi.org/10.5281/zenodo.12528938)60. Because WGS reads were mapped to the human reference genome by DRAGEN, most of the CRAM files contain more than 1 million unmapped reads (usually around 15 million unmapped reads). To reduce the computational burden, we used 1 million reads stored at the end of CRAM files for screening. In total, we screened 245,472 datasets in the All of Us Controlled Tier Dataset v7. For all datasets with six or more HHV-6 reads using the screening method, we applied the HHV-6 reconstruction method, ‘main.py’ in the integrated_HHV6_recon package to detect individuals carrying eHHV-6.
To define participants with SLE, we used diagnosis by the International Classification of Diseases, Ninth Revision (ICD9) (710.0) and ICD10 (M32) codes. To improve phenotype credibility, participants having two or more diagnoses on unique dates that were at least 1 year apart were defined as cases. As for PAP, another autoimmune disease with significant eHHV-6B risk in the Japanese population, we did not conduct a replication study given its ultra-rare disease prevalence (<10 cases per million population worldwide)66. We used all available participants for the association test, except for ones carrying ‘solo-DR’ and presumably two or more copies of full-length eHHV-6. We used the logistic regression model below. To avoid high type I errors, we performed a score test using saddle point approximation implemented in the R package SPAtest (https://cran.r-project.org/web/packages/SPAtest/index.html; v3.1.2).
GWAS of eHHV-6B positivity
From each disease, we randomly selected ten times as many eHHV-6B-negative participants as eHHV-6B-positive cases (22 eHHV-6B-positive versus 220 eHHV-6B-negative). The FASTQ reads were aligned to T2T-CHM13v2.0 (ref. 67) with BWA-MEM (v0.7.27), followed by GATK4 MarkDuplicates and Base Quality Score Recalibration (v4.2.6.1) according to the GATK Best Practice. Then, we performed per-sample SNP and InDel calling using GATK4 HaplotypeCaller and joint genotyping using GATK4 GenomicsDBImport and GenotypeGVCF. We conducted LD-based genotype refinement for low-confidence genotypes and missing sites in WGS data using BEAGLE (v5.4)68 with default settings. Variant quality control (QC) was accomplished by keeping autosomal SNPs and InDels with minor allele frequency (MAF) ≥ 0.05, while removing multi-allelic sites, spanning deletions and variants deviating from the Hardy–Weinberg equilibrium (P < 1 × 10−10 in cases and P < 1 × 10−6 in controls). Individuals were excluded if they showed conflicting sex assignments between genetically inferred sex by variants and WGS coverage, deviating heterozygosity rate (±3 s.d.) or cryptic relatedness (pi-hat > 0.2). Finally, 6,464,509 variants and 238 participants (22 eHHV-6-B-positive versus 216 eHHV-6B-negative) passed QC.
For GWAS, logistic regression analysis was conducted using PLINK (v2.0)69 to test the differences in the dominant genotype between eHHV-6B and controls. We used the dominant genotype because eHHV-6B was designated as binary without distinguishing the number of alleles.
Focusing on the distal end of chr22q, we performed logistic regression including rare variants (MAF < 0.05). Variant QC other than MAF filtering, sample QC and PCs followed the GWAS described above.
PacBio HiFi long-read WGS
DNA samples from four eHHV-6B-positive patients with SLE were sheared targeting the size of 20 kb using Megaruptor 3 (Diagenode). HiFi sequencing libraries were prepared using SMARTbell Expres Template Prep Kit 2.0 according to the manufacturer’s protocols. The libraries were further size-selected electrophoretically using either the SageELF or Blue Pippin Systems (Sage Science). The libraries were sequenced on the Sequel II (Pacific Bioscience) system using the Sequel II Binding Kit 2.0 and Sequel II Sequencing Kit 2.0. Based on the sequenced subreads, PacBio HiFi reads were generated using the circular consensus sequence tool in SMART Link v9.0.0 (Pacific Bioscience). The HiFi reads were aligned to T2T-CHM13v2.0 using pbmm2 (v1.13.1; https://github.com/PacificBiosciences/pbmm2). DeepVariant (v1.6.0)70 was used for SNP and InDel calling, and WhatsHap (v2.2)71 was used for haplotype phasing. Phased genotypes were visually inspected using Integrative Genomics Viewer (v2.8.9)72.
HHV-6 peptide phage library design
Open reading frame (ORF) annotations for HHV-6A strain U1102 (NC_001664.4 and KC465951.1) and HHV-6B strain Z29 (NC_000898.1, AF157706.1 and MW536483.1) were downloaded from NCBI. In total, 88 nonredundant ORFs for HHV-6A and 108 for HHV-6B were annotated. These ORFs were translated into amino acid sequences and then split into 64 amino acid peptide fragments with 50 amino acid overlaps. Fragments shorter than 64 amino acids at the C-terminus of a protein were extended with poly-alanine to reach the 64 amino acid length. This process yielded a total of 5,745 peptide fragments. For a positive control for the PhIP-seq library, we used a total of 248 human microbiota peptide antigens commonly detected in healthy individuals, as obtained from a previous study73. Using ICOR74, the peptide sequences were reverse-translated into DNA sequences optimized for Escherichia coli codon usage. HindIII and XhoI sites within the codon-optimized sequences were replaced with codons frequently used in E. coli (frequency >5% for each amino acid) in such a way that the protein sequence would be preserved.
DNA oligo library synthesis, cloning and phage library synthesis
DNA oligo pools encoding the peptide sequences of HHV-6 were synthesized by Twist Bioscience. Sequences of the DNA oligos are listed in Supplementary Data 1. DNA oligos were PCR-amplified, digested with HindIII and XhoI and subsequently cloned into the HindIII/XhoI site of the T7 Select 10-3b phage vector. A ligation reaction was performed using T4 DNA Ligase. The resultant phage genomes were then packaged into T7 bacteriophage using the T7 Select Packaging Kit and amplified for use in PhIP-seq according to the manufacturer’s protocol.
PhIP-seq
We adapted previously established PhIP-seq protocols, with modifications28,29. IgG concentrations in serum or plasma samples were quantified using the Human IgG ELISA Kit. To minimize nonspecific binding, 1.5 ml tubes and Dynabeads protein A/G were blocked overnight at 4 °C on a rotator with 3% BSA in TBS-T. Sera or plasma containing 5 μg IgG and the HHV-6/human microbiota phage library diluted to 2 × 105-fold representation (2 × 1010 plaque-forming units for a library representing 105 peptide antigen repertoires) were added to preblocked tubes. Three technical replicates were performed for each sample, alongside 12 technical replicates for a negative control lacking sera or plasma. Phage-antibody mixtures were incubated overnight at 4 °C on a rotator, and 20 µl of Dynabeads protein A/G mix was added to each reaction. Following immunoprecipitation at 4 °C for 1 h on a rotator, beads were washed three times with 1,500 μl of PhIP-seq wash buffer (50 mM Tris–HCl (pH 7.5), 150 mM NaCl and 0.1% Tween 20) and resuspended in lysogeny broth with antibiotics. E. coli culture (BLT5403 strain) was then added to resuspended beads and incubated at 37 °C with shaking until lysis. Phage lysates made 0.5 M in NaCl were centrifuged, and the supernatants diluted to 2 × 105-fold representation were used for the second round of immunoprecipitation. Immunoprecipitated beads were subsequently washed three times, resuspended and lysed at 95 °C for 10 min. Finally, the genomes of immunoprecipitated phages were amplified by PCR for Illumina sequencing. A total of 2,000 copies of spike-in DNA (enhanced green fluorescent protein sequence) for normalization were also included in each PCR reaction. Sequences of the spike-in DNA and primer sequences are listed in Supplementary Data 2 and 3. Following the PCR, technical replicates from the same sample were pooled at equimolar concentration and size-selected using SPRIselect. DNA concentrations were determined by quantitative PCR using the KAPA Library Quantification Kit.
PhIP-seq data analysis
Phage genomes were amplified and sequenced on the Illumina NextSeq platform for 2× 150 bp paired-end sequencing, yielding at least 0.24 M spots per sample. Amplicons should contain unique oligo sequences flanking phage genome sequences. First, we counted reads containing the flanking phage sequences. We allowed one base mismatch between next-generation sequencing (NGS) reads and the flanking phage genome sequences. Across all samples, at least 84% of reads contained phage genome sequences flanked by unique oligo sequences, validating that most PCR amplicons are derived from phages. Next, we detected the unique oligo sequences from the reads carrying the flanking phage sequences. We allowed one base mismatch between NGS reads and target oligos. Reads 1 and 2 containing the same oligo sequence were assigned to that oligo. This process enabled the assignment of at least 74% of the total reads to the oligo library across all samples.
To check whether most of the phages displaying HHV-6 peptides are included in the phage library, we amplified and sequenced them by the Illumina NextSeq platform. The 4.4 M spots were obtained, and, on average, 713 reads were detected for each oligo. In total, 313 oligos were not abundantly detected; the read counts were less than 5% of the average (<36 reads). We excluded those oligos from the data analysis.
The immunoprecipitated phage counts were calculated based on the number of reads detecting the spike-in oligo. Across all samples, the number of reads detecting the spike-in oligo was enough (>14,000 reads) compared to the actual spike-in oligo copy number (that is, 2,000), ensuring the accuracy of estimation of immunoprecipitated phage copy numbers. The immunoprecipitated phage counts were then normalized by the abundance of each oligo in the phage library. For this normalization, we used the normalization factor given below:
where n is the number of oligos (that is, 5,432), Ci is the read count of the ith oligo detected in the sequencing of the phage library and factori is the normalization factor for the ith oligo.
The PhIP-seq score was calculated for each 14 amino acid region by averaging over the technical replicates and over the log-transformed normalized immunoprecipitated phage counts encoding the 14 amino acid region. The phage oligo detection from the NGS reads, normalization and the calculation of the PhIP-seq score were done by a custom Python script.
The statistical association between the presence of eHHV-6 and the PhIP-seq score was assessed by linear regression. We fit a model below, and the P for the β estimate of eHHV-6 was calculated by the Wald test (H0: β = 0). The model was fitted by the Python statsmodels module.
scRNA-seq
Four eHHV-6B-positive patients with SLE and 11 eHHV-6B-negative patients with SLE who were close to them in age, sex and treatment history were selected. For both, blood was collected onto heparin tubes, and PBMCs were isolated using Leucosep (Greiner Bio-One) density gradient centrifugation according to the manufacturer’s instructions.
Single-cell suspensions were processed through the 10X Genomics Chromium Controller following the protocol outlined in the Chromium Single Cell V(D)J Reagent Kits (v1.1 Chemistry) User Guide. Chromium Next GEM Single Cell 5′ Library and Gel Bead Kit v1.1 (PN-1000167), Chromium Next GEM Chip G Single Cell Kit (PN-1000127) and Single Index Kit T Set A (PN-1000213) were applied during the process. Oil droplets of encapsulated single cells and barcoded beads (gel bead-in-emulsions (GEMs)) were subsequently reverse-transcribed in a Veriti Thermal Cycler (Thermo Fisher Scientific), resulting in cDNA tagged with a cell barcode and unique molecular index (UMI). Next, cDNA was then amplified to generate single-cell libraries according to the manufacturer’s protocol. Quantification was made with an Agilent Bioanalyzer High-Sensitivity DNA assay (High-Sensitivity DNA Kit; Agilent, 5067-4626). Subsequently amplified cDNA was enzymatically fragmented, end-repaired and polyA-tagged. Cleanup/size selection was performed on amplified cDNA using SPRIselect magnetic beads (Beckman-Coulter; SPRIselect, B23317). Next, Illumina sequencing adapters were ligated to the size-selected fragments and cleaned up using SPRIselect magnetic beads. Finally, sample indices were selected and amplified, followed by a double-sided size selection using SPRIselect magnetic beads. Final library quality was assessed using an Agilent Bioanalyzer High-Sensitivity DNA assay. Samples were then sequenced on an Illumina NovaSeq 6000 in paired-end mode.
scRNA-seq data processing
Raw sequencing data were first filtered and then mapped to a custom reference (GRCh38 + HHV-6A/HHV-6B) using Cell Ranger (v.7.0.1; 10x Genomics). Filtered expression matrices were used. Then, cell doublets were removed by scds (v1.10.0)75 for each sample. QC was performed to remove low-quality cells with <200 genes expressed, >6,000 genes expressed, >60,000 UMIs, < 5% of reads from ribosomal genes or >10% of reads from hemoglobin genes. Thresholds for filtering out low-quality cells with a high proportion of mitochondrial reads were determined using miQC (v1.2.0)76.
The R package Seurat (v4.3.0) was used for data scaling, transformation, clustering, dimensionality reduction and visualization32. Data were scaled and transformed using the SCTransform() function. Then, the first 2,000 highly variable genes were selected for PCA on gene expression using the SelectIntegrationFeatures() function. The batch effect among different patients was removed by the Harmony (v1.0)77. Uniform Manifold Approximation and Projection (UMAP) dimension reduction was generated based on the first 30 harmony-adjusted principal components. The nearest neighbor graph was calculated using the FindNeighbors() function, followed by clustering using the FindClusters() function. Cell types were annotated using the Azimuth human PBMC reference32,78.
DE testing and GO enrichment
We used Milo (v1.2.0)33 to test for the DE genes in cells within defined neighborhoods between the following two conditions: eHHV-6B-positive with SLE (cases) and eHHV-6B-negative with SLE (controls). We constructed a k-nearest neighbor graph and defined the representative neighborhoods on the k-nearest neighbor graph with default parameters. Then, we grouped overlapping neighborhoods that shared at least five cells and ignored their differences in abundance, although there was almost no difference. The neighborhood groups in which the number of cells in one case accounted for more than 60% of the total number of cells in the four cases were excluded from further analysis to avoid sampling bias. We performed DE analysis within these aggregated groups using the testDiffExp() function. We included age and sex in the model as covariates. Statistically significant DE genes were defined with FDR < 0.01 and |fold change | >2. To find the function of DE genes, GO biological process analysis with a P value < 0.01 and an FDR < 0.05 was performed using the enrichGO() and compareCluster() implemented in clusterProfiler (v4.7.1.3)79.
Antiviral ISGs scores
To evaluate IFN response to the virus, antiviral ISG scores were calculated using the AddModuleScore() function implemented in Seurat with default parameters. Thirty-three antiviral ISGs were selected based on previous reports80,81,82 (Supplementary Table 8). The ISGs scores of cells in each neighborhood were used to compare case–control, and the differences between their respective mean of the ISG scores in cells from two conditions were calculated. If there were two or more case samples and two or more control samples with cells in the neighborhood, Welch’s t test was performed. To control for multiple testing, we adapted the spatial FDR implemented in Milo and used 5% spatial FDR as a threshold for significance. The spatial FDR and the differences in the ISG scores between two conditions in each neighborhood were used for visualization. Two-sided paired t test was performed to compare the mean of the ISG scores between two conditions in the neighborhoods within each neighborhood group.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Individual-level eHHV-6 and anellovirus data of the cases are publicly deposited at the Japanese Genotype–phenotype Archive (JGA) with accession codes JGAS000741/JGAD000876 and JGAS000739/JGAD000874 under controlled access. These individual-level eHHV-6 and anellovirus data can also be accessed through an application at the National Bioscience Database Center (NBDC) with the accession codes hum0197 and hum0343. The GWAS summary statistics are publicly deposited at the NBDC with the accession code hum0197. The WGS data of the controls are available from the Human Genetic Variation Database (HGVD) (https://www.hgvd.genome.med.kyoto-u.ac.jp/). The All of Us data are available as a featured workspace to registered researchers of the All of Us Researcher Workbench (see details at https://www.researchallofus.org/). Short/long-read WGS of eHHV-6B-positive participants are publicly deposited at the European Genome–Phenome Archive (EGA) with the accession code EGAS00001007886 under controlled access. PhIP-seq data of eHHV-6B-positive and eHHV-6B-negative participants with SLE are publicly deposited at EGA with the accession code EGAS00001007955 under controlled access. scRNA-seq data of eHHV-6B-positive and eHHV-6B-negative participants with SLE are publicly deposited at the JGA with accession codes JGAS000741/JGAD000876 under controlled access. This scRNA-seq data can also be accessed through the application at the NBDC with the accession code hum0197 (https://humandbs.dbcls.jp/en/hum0197-latest). The 3,333 human anellovirus complete genomes were downloaded as FASTA files from the NCBI nucleotide database (https://www.ncbi.nlm.nih.gov) on 21 April 2022.
Code availability
The codes to detect virome, integrated_HHV6_recon and human_anellovirus_detection, and to perform WGS data preprocessing, GWAS, PhIP-seq analysis and scRNA-seq analysis are deposited in Zenodo (https://doi.org/10.5281/zenodo.12528938 (ref. 60) and https://doi.org/10.5281/zenodo.13953637 (ref. 83)).
References
Hardy, T. A., Blum, S., McCombe, P. A. & Reddel, S. W. Guillain–Barré syndrome: modern theories of etiology. Curr. Allergy Asthma Rep. 11, 197–204 (2011).
Bjornevik, K. et al. Longitudinal analysis reveals high prevalence of Epstein–Barr virus associated with multiple sclerosis. Science 375, 296–301 (2022).
Lanz, T. V. et al. Clonally expanded B cells in multiple sclerosis bind EBV EBNA1 and GlialCAM. Nature 603, 321–327 (2022).
Miller, S. D. et al. Persistent infection with Theiler’s virus leads to CNS autoimmunity via epitope spreading. Nat. Med. 3, 1133–1136 (1997).
Tokuyama, M. et al. Antibodies against human endogenous retrovirus K102 envelope activate neutrophils in systemic lupus erythematosus. J. Exp. Med. 218, e20191766 (2021).
Fajgenbaum, D. C. & June, C. H. Cytokine storm. N. Engl. J. Med. 383, 2255–2273 (2020).
Ikuta, K., Satoh, Y., Hoshikawa, Y. & Sairenji, T. Detection of Epstein–Barr virus in salivas and throat washings in healthy adults and children. Microbes Infect. 2, 115–120 (2000).
Moustafa, A. et al. The blood DNA virome in 8,000 humans. PLoS Pathog. 13, e1006292 (2017).
Liu, X. et al. Endogenization and excision of human herpesvirus 6 in human genomes. PLoS Genet. 16, e1008915 (2020).
Wood, M. L. et al. Variation in human herpesvirus 6B telomeric integration, excision, and transmission between tissues and individuals. eLife 10, e70452 (2021).
Kojima, S., Kamada, A. J. & Parrish, N. F. Virus-derived variation in diverse human genomes. PLoS Genet. 17, e1009324 (2021).
Aswad, A. et al. Evolutionary history of endogenous human Herpesvirus 6 reflects human migration out of Africa. Mol. Biol. Evol. 38, 96–107 (2021).
Kumata, R., Ito, J., Takahashi, K., Suzuki, T. & Sato, K. A tissue level atlas of the healthy human virome. BMC Biol. 18, 55 (2020).
Abbas, A. A. et al. Redondoviridae, a family of small, circular DNA viruses of the human oro-respiratory tract associated with periodontitis and critical illness. Cell Host Microbe 25, 719–729.e4 (2019).
Abbas, A. A. et al. The perioperative lung transplant virome: torque teno viruses are elevated in donor lungs and show divergent dynamics in primary graft dysfunction. Am. J. Transpl. 17, 1313–1324 (2017).
Maggi, F. et al. TT virus loads and lymphocyte subpopulations in children with acute respiratory diseases. J. Virol. 77, 9081–9083 (2003).
Gravel, A. et al. Inherited chromosomally integrated human herpesvirus 6 as a predisposing risk factor for the development of angina pectoris. Proc. Natl Acad. Sci. USA 112, 8058–8063 (2015).
Gaccioli, F. et al. Fetal inheritance of chromosomally integrated human herpesvirus 6 predisposes the mother to pre-eclampsia. Nat. Microbiol. 5, 901–908 (2020).
Peddu, V. et al. Inherited chromosomally integrated human herpesvirus 6 demonstrates tissue-specific RNA expression in vivo that correlates with an increased antibody immune response. J. Virol. 94, e01418-19 (2019).
Sakaue, S. et al. Genetic determinants of risk in autoimmune pulmonary alveolar proteinosis. Nat. Commun. 12, 1032 (2021).
Okada, Y. et al. A genome-wide association study identified AFF1 as a susceptibility locus for systemic lupus eyrthematosus in Japanese. PLoS Genet. 8, e1002455 (2012).
Ogawa, K. et al. Next-generation sequencing identifies contribution of both class I and II HLA genes on susceptibility of multiple sclerosis in Japanese. J. Neuroinflammation 16, 162 (2019).
Bombardier, C., Gladman, D. D., Urowitz, M. B., Caron, D. & Chang, C. H. Derivation of the SLEDAI. A disease activity index for lupus patients. The Committee on Prognosis Studies in SLE. Arthritis Rheum. 35, 630–640 (1992).
Arbuckle, M. R. et al. Development of autoantibodies before the clinical onset of systemic lupus erythematosus. N. Engl. J. Med. 349, 1526–1533 (2003).
Latzin, P. et al. Anti-GM-CSF antibodies in paediatric pulmonary alveolar proteinosis. Thorax 60, 39–44 (2005).
All of Us Research Program Investigators et al. The ‘All of Us’ research program. N. Engl. J. Med. 381,668–676 (2019).
Huang, Y. et al. Human telomeres that carry an integrated copy of human herpesvirus 6 are often short and unstable, facilitating release of the viral genome from the chromosome. Nucleic Acids Res. 42, 315–327 (2014).
Mohan, D. et al. PhIP-seq characterization of serum antibodies using oligonucleotide-encoded peptidomes. Nat. Protoc. 13, 1958–1978 (2018).
Shrock, E. L., Shrock, C. L. & Elledge, S. J. VirScan: high-throughput profiling of antiviral antibody epitopes. Bio Protoc. 12, e4464 (2022).
De Bolle, L., Naesens, L. & De Clercq, E. Update on human herpesvirus 6 biology, clinical features, and therapy. Clin. Microbiol. Rev. 18, 217–245 (2005).
Takemoto, K. et al. Time-series analysis comparing the prevalence of antibodies against nine viral species found in umbilical cord blood in Japan. Jpn. J. Infect. Dis. 69, 314–318 (2016).
Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587 (2021).
Dann, E., Henderson, N. C., Teichmann, S. A., Morgan, M. D. & Marioni, J. C. Differential abundance testing on single-cell data using k-nearest neighbor graphs. Nat. Biotechnol. 40, 245–253 (2022).
Tanaka-Taya, K. et al. Human herpesvirus 6 (HHV-6) is transmitted from parent to child in an integrated form and characterization of cases with chromosomally integrated HHV-6 DNA. J. Med. Virol. 73, 465–473 (2004).
Miura, H. et al. Chromosomally integrated human herpesvirus 6 in the Japanese population. J. Med. Virol. 90, 1636–1642 (2018).
Kumata, R., Ito, J. & Sato, K. Inherited chromosomally integrated HHV-6 possibly modulates human gene expression. Virus Genes 56, 386–389 (2020).
De Munnik, S. M., Smit, M. J., Leurs, R. & Vischer, H. F. Modulation of cellular signaling by herpesvirus-encoded G protein-coupled receptors. Front. Pharmacol. 6, 40 (2015).
Isegawa, Y., Ping, Z., Nakano, K., Sugimoto, N. & Yamanishi, K. Human herpesvirus 6 open reading frame U12 encodes a functional β-chemokine receptor. J. Virol. 72, 6104–6112 (1998).
Broccolo, F. et al. Selective reactivation of human herpesvirus 6 in patients with autoimmune connective tissue diseases. J. Med. Virol. 85, 1925–1934 (2013).
Worobey, M. Extensive homologous recombination among widely divergent TT viruses. J. Virol. 74, 7666–7670 (2000).
Cebriá-Mendoza, M. et al. Deep viral blood metagenomics reveals extensive anellovirus diversity in healthy humans. Sci. Rep. 11, 6921 (2021).
Arze, C. A. et al. Global genome analysis reveals a vast and dynamic anellovirus landscape within the human virome. Cell Host Microbe 29, 1305–1315.e6 (2021).
De Vlaminck, I. et al. Temporal response of the human virome to immunosuppression and antiviral therapy. Cell 155, 1178–1187 (2013).
Shibayama, T. et al. Inverse relationship between the titre of TT virus DNA and the CD4 cell count in patients infected with HIV. AIDS 15, 563–570 (2001).
Stoermer, K. A. & Morrison, T. E. Complement and viral pathogenesis. Virology 411, 362–373 (2011).
Jung, J.-Y. & Suh, C.-H. Infection in systemic lupus erythematosus, similarities, and differences with lupus flare. Korean J. Intern. Med. 32, 429–438 (2017).
Zheng, H. et al. Torque teno virus (SANBAN isolate) ORF2 protein suppresses NF-κB pathways via interaction with IκB kinases. J. Virol. 81, 11917–11924 (2007).
Rocchi, J. et al. Torquetenovirus DNA drives proinflammatory cytokines production and secretion by immune cells via toll-like receptor 9. Virology 394, 235–242 (2009).
Sospedra, M. et al. Recognition of conserved amino acid motifs of common viruses and its role in autoimmunity. PLoS Pathog. 1, e41 (2005).
Seemayer, C. A. et al. Prevalence of TTV DNA and GBV-C RNA in patients with systemic sclerosis, rheumatoid arthritis, and osteoarthritis does not differ from that in healthy blood donors. Ann. Rheum. Dis. 60, 806–809 (2001).
Lareau, C. A. et al. Latent human herpesvirus 6 is reactivated in CAR T cells. Nature 623, 608–615 (2023).
Yano, M., Matsuda, F., Sakuntabhai, A., & Hirota, S. (eds). Socio-Life Science and the COVID-19 Outbreak. Economics, Law, and Institutions in Asia Pacific (Springer, 2022).
Tanaka, H. et al. Clinical characteristics of patients with coronavirus disease (COVID-19): preliminary baseline report of Japan COVID-19 task force, a nationwide consortium to investigate host genetics of COVID-19. Int. J. Infect. Dis. 113, 74–81 (2021).
Hirata, J. et al. Variants at HLA-A, HLA-C, and HLA-DQB1 confer risk of psoriasis vulgaris in Japanese. J. Invest. Dermatol. 138, 542–548 (2018).
Altshuler, D. M. et al. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://doi.org/10.48550/arXiv.1303.3997 (2013).
1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526,68–74 (2015).
Church, D. M. et al. Extending reference assembly models. Genome Biol. 16, 13 (2015).
McKenna, A. et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Kojima, S. Software of ‘Blood DNA virome associates with autoimmune diseases and COVID-19’. Zenodo https://doi.org/10.5281/zenodo.12528938 (2024).
Smit, A., Hubley, R. & Green, P. RepeatMasker Open-4.0. www.repeatmasker.org (2013).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).
Venables, W. N. & Ripley, B. D. Modern Applied Statistics with S. Fourth Edition Vol. 53 (Springer, 2002).
Fay, M. P. Confidence intervals that match Fisher’s exact or Blaker’s exact tests. Biostatistics 11, 373–374 (2010).
Inoue, Y. et al. Characteristics of a large cohort of patients with autoimmune pulmonary alveolar proteinosis in Japan. Am. J. Respir. Crit. Care Med. 177, 752–762 (2008).
Rhie, A. et al. The complete sequence of a human Y chromosome. Nature 621, 344–354 (2023).
Browning, B. L., Zhou, Y. & Browning, S. R. A one-penny imputed genome from next-generation reference panels. Am. J. Hum. Genet. 103, 338–348 (2018).
Hill, A. et al. Stepwise distributed open innovation contests for software development: acceleration of genome-wide association analysis. GigaScience 6, 1–10 (2017).
Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36, 983–987 (2018).
Patterson, M. et al. WhatsHap: weighted haplotype assembly for future-generation sequencing reads. J. Comput. Biol. 22, 498–509 (2015).
Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
Vogl, T. et al. Population-wide diversity and stability of serum antibody epitope repertoires against human microbiota. Nat. Med. 27, 1442–1450 (2021).
Jain, R., Jain, A., Mauro, E., LeShane, K. & Densmore, D. ICOR: improving codon optimization with recurrent neural networks. BMC Bioinformatics 24, 132 (2023).
Bais, A. S. & Kostka, D. scds: computational annotation of doublets in single-cell RNA sequencing data. Bioinformatics 36, 1150–1158 (2020).
Hippen, A. A. et al. miQC: an adaptive probabilistic framework for quality control of single-cell RNA-sequencing data. PLoS Comput. Biol. 17, e1009290 (2021).
Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).
Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16, 284–287 (2012).
Schoggins, J. W. & Rice, C. M. Interferon-stimulated genes and their antiviral effector functions. Curr. Opin. Virol. 1, 519–525 (2011).
Schoggins, J. W. Interferon-stimulated genes: what do they all do? Annu. Rev. Virol. 6, 567–584 (2019).
Chemudupati, M. et al. From APOBEC to ZAP: Diverse mechanisms used by cellular restriction factors to inhibit virus infections. Biochim. Biophys. Acta Mol. Cell Res. 1866, 382–394 (2019).
Sasa, N. & Kojima, S. Scripts for ‘Blood DNA virome associates with autoimmune diseases and COVID-19’. Zenodo https://doi.org/10.5281/zenodo.13953637 (2024).
Acknowledgements
We thank all the participants involved in this study. We gratefully acknowledge All of Us participants for their contributions, without whom this research would not have been possible. We also thank the NIH’s All of Us Research Program for making available the participant data examined in this study. This research was supported by the Japan Society for the Promotion of Science KAKENHI (22H00476 and 24K02697 to Y.O., 20H05682 and 21H02972 to N.F.P., 22K15385 to S. Kojima and 24K18476 to R. Koide), Japan Agency for Medical Research and Development (AMED) (JP22ek0410075, JP23km0405211, JP23km0405217, JP23ek0109594, JP23ek0410113, JP223fa627002, JP223fa627010, JP233fa627011, JP23zf0127008, JP24wm0625504 and JP24gm1810011 to Y.O. and JP24gm7010004 to S. Kojima), JST Moonshot R&D (JPMJMS2021 and JPMJMS2024 to Y.O.), Takeda Science Foundation, Bioinformatics Initiative of Osaka University Graduate School of Medicine, Institute for Open and Transdisciplinary Research Initiatives, Center for Infectious Disease Education and Research, Center for Advanced Modality and Drug Delivery System (DDS), Osaka University, Cluster for Pioneering Research under the Hakubi fellowship program, the discretionary budget of the Director of the RIKEN Center for Integrative Medical Sciences and the RIKEN Special Postdoctoral Researcher Program. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the paper.
Author information
Authors and Affiliations
Consortia
Contributions
N.F.P. and Y.O. designed and supervised the study. N.S., S. Kojima and Y.O. conducted the data analysis. S. Kojima, N.S. and N.F.P. developed methodology. R. Koide performed PhIP-seq. N.S., S. Kojima, R. Koide, A.G.B., N.F.P. and Y.O. wrote the paper. All the other co-authors collected and managed the samples and data construction.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Mike Nalls and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 HHV-6 genome coverage and haplotypes on the distal end of chr22q in four eHHV-6B-positive patients with SLE.
a,b, The mean of HHV-6A/B genome coverages normalized by half of the human genome coverage. The gray shade area represents the 95% confidence interval. Typically, eHHV-6-positive subjects carry the whole HHV-6 genome, but some carry only one of the terminal direct repeats (DRs)—solo-DR—and not the unique (U) region of the HHV-6 genome9. All four eHHV-6B-positive patients with SLE carried the whole HHV-6B genome, and two out of three eHHV-6B-positive patients with PAP carried only solo-DR. The mean normalized coverage of full-length HHV-6 was approximately 1, which supports the HHV-6 genomes we found are heritable endogenous viruses. Since the aligner could not distinguish between DR-L and DR-R sequences, solo-DR-derived reads were mapped to both. The mean normalized coverage of solo-DR was approximately 0.5, which also supports the endogeneity of the HHV-6 genome. c, Phased genotypes of the 16 variants with uncorrected P < 1 × 10−4 in GWAS, derived from long-read sequencing of 4 eHHV-6B-positive patients with SLE. Dark red and black boxes indicate haploid visually confirmed with IGV. Light red and blue indicate genotypes from WGS joint genotyping. IGV, Integrative Genomics Viewer.
Extended Data Fig. 2 DA analysis, DE analysis and GO enrichment analysis of scRNA-seq data of PBMCs from eHHV-6B-positive and eHHV-6B-negative patients with SLE.
a, Beeswarm plot showing the distribution of adjusted log2(fold change) in abundance between cases and controls in neighborhoods, with eHHV-6B-positive patients with SLE as case (n = 4) and eHHV-6B-negative patients with SLE as control (n = 11). b, The DE analysis between cases and controls in neighborhood groups as in a. DE genes (FDR < 0.01 and |fold change| > 2) are colored in red and labeled by gene symbols. c–e, Left: gene-concept network (cneplot) in neighborhood groups 6, 7 and 18. It shows the linkage of the DE genes and the concepts (GO biological process). The color of the genes represents the fold change values, and the size of the GO terms represents the number of associated genes. Right: directed acyclic graph for GO enrichment analysis in neighborhood groups 6, 7 and 18. Circles represent the GO function, colored by their FDR. Arrows indicate the relationship between the upstream and downstream GO functions.
Extended Data Fig. 3 Comparison of the antiviral ISG scores in patients with SLE on immunosuppressants.
a, Graph representation of neighborhoods identified by Milo as in Fig. 5b. Node color indicates the difference between the mean of the antiviral ISGs scores of eHHV-6B-positive patients (case; n = 3) and eHHV-6B-negative patients (control; n = 6) with SLE on immunosuppressants (spatial FDR < 0.05). Neighborhoods with non-significant differences (spatial FDR ≥ 0.05) are colored white. Neighborhoods that contain only cells of either cases or controls are colored black. b, Comparison of differences within each neighborhood group shown in a using a two-sided paired t-test. Diff is the average difference between the case and control means. Boxes denote the IQR; the median is shown as horizontal bars; whiskers extend to 1.5 times the IQR. *FDR < 1.0 × 10−10, **FDR < 1.0 × 10−50.
Extended Data Fig. 4 Reverse cumulative distributions of anellovirus load.
The reverse cumulative distribution plot of anellovirus load in subjects with five autoimmune diseases and COVID-19 and healthy controls. Y-axis indicates the proportions of subjects with anellovirus load > values (x-axis). The proportions of anellovirus infection with anellovirus load >1 were significantly higher in patients with SLE, RA and COVID-19 than in healthy controls. The proportions of anellovirus viremia with anellovirus load >8.0 were significantly higher in patients with SLE, RA and COVID-19 than in healthy controls (dashed lines and asterisks).
Extended Data Fig. 5 Distribution of anellovirus load among various clinical metrics.
Two histograms show the distribution of each axis. a, Strip plot of anellovirus load for patients with SLE without and with low serum complement levels. Anellovirus-positive subjects associated significantly with low serum complement levels (OR 4.4, Wald test uncorrected P = 0.020). NA, not available. b, Strip plot of anellovirus load for patients with SLE treated without and with immunosuppressants. Anellovirus-positive and anellovirus-infected subjects had a significantly positive correlation with immunosuppressant use (OR 3.3 and 7.3, Wald test uncorrected P = 0.0050 and 0.012, respectively). c, Strip plot of anellovirus load for patients with RA stratified by RF and ACPA. All anellovirus-viremic subjects were RF-positive and ACPA-positive. d, Strip plot of anellovirus load for patients with COVID-19 treated without and with oxygen therapy. All subjects with a high anellovirus load were treated with oxygen therapy. Horizontal dashed lines in a–d represent thresholds for anellovirus positivity (0.08), anellovirus infection (1.0) and anellovirus viremia (8.0).
Extended Data Fig. 6 The detection limit of the anellovirus load.
Histogram of anellovirus load below 0.2. The colors of the bars indicate the number of ‘anellovirus reads’ in each sample. We set the detection limit of the anellovirus load to 0.08 (red dashed line). Subjects with ‘anellovirus reads’ >5 (anellovirus load >0.08) were defined as anellovirus-positive. Subjects with anellovirus load between 0 and 0.08 might be false-positive.
Supplementary information
Supplementary Information
Contents of Supplementary Tables.
Supplementary Tables
Supplementary Tables 1–8.
Supplementary Data 1
Sequences of the oligo pool used for HHV-6 epitope mapping by PhIP-seq.
Supplementary Data 2
Spike-in DNA sequence used in PhIP-seq.
Supplementary Data 3
Primer sequences used for PhIP-seq library preparation.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Sasa, N., Kojima, S., Koide, R. et al. Blood DNA virome associates with autoimmune diseases and COVID-19. Nat Genet 57, 65–79 (2025). https://doi.org/10.1038/s41588-024-02022-z
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s41588-024-02022-z
This article is cited by
-
Virome associations in autoimmunity and COVID-19
Nature Reviews Rheumatology (2025)