Introduction

Prospective cohort studies have identified associations of self-reported short and long sleep duration, insomnia (difficulty initiating or maintaining sleep), and chronotype (evening preference) with higher risks of type 2 diabetes (T2D)1,2,3, hyperglycaemia and insulin resistance4. A small number of studies have assessed sleep characteristics using accelerometry devices, on the assumption that these capture similar sleep characteristics with greater precision and less measurement error than self-report. Several observational studies showed that accelerometer-derived shorter sleep duration and lower sleep efficiency (an assumed indicator of insomnia5) were associated with higher glycated haemoglobin (HbA1c) levels in people with diabetes6,7. In a general population, higher sleep fragmentation8 (another indicator of insomnia9), but not shorter accelerometer-derived sleep duration10, was associated with higher HbA1c and glucose levels. However, these were relatively small studies (~ 170 to ~ 2107 participants) and are open to residual confounding and/or reverse causation. A meta-analysis of randomized controlled trials (RCTs) showed that sleep restriction had detrimental effects on insulin sensitivity11, and experimental data in healthy volunteers suggest that it also promotes hyperglycaemia12. Several mechanisms have been proposed to link sleep restriction to glycaemia, including physiological stress, activation of the sympathetic nervous system and/or circadian disruption, all of which might act on glucose levels via insulin signalling11. However, the relevance of experimental sleep restriction protocols to the sleep patterns experienced in the general population is unclear.

Mendelian randomization (MR) is increasingly used to explore lifelong effects of exposures because it is less prone to confounding by social, environmental, and behavioural factors13. Previous MR studies showed that self-reported frequent insomnia symptoms cause higher HbA1c14,15,16, whilst they found no evidence for effects of self-reported sleep duration or chronotype on T2D and/or glycaemic traits14,17. Recent MR studies in the UK Biobank (UKB) suggested causal effects of accelerometer-derived shorter sleep duration and lower sleep efficiency on higher waist-hip ratio, but not on T2D or other hyperglycaemic outcomes17,18.

Our aim was to explore potential effects of accelerometer-derived sleep traits (duration, mid-point of the least active 5 h (L5 timing), mid-point of the most active 10 h (M10 timing), sleep fragmentation, and sleep efficiency) on HbA1c. We undertook one-sample MR (1SMR) analyses using the UKB sub-sample with valid accelerometer measures (n = 73,797). Since those with accelerometer data were not a random sub-sample of UKB, we explored possible selection bias by re-running, in this sub-sample, all of our previous MR analyses of self-reported sleep traits (duration, chronotype, insomnia) with HbA1c that had been conducted in the larger UKB sample (n = 336,999)14. Additionally, we conducted two-sample MR (2SMR) analyses using summary outcome data from UKB and the Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC)19. To help understand any differences between self-reported and accelerometer-derived MR effects for assumed equivalent traits, we calculated phenotypic correlations and used cross-trait linkage disequilibrium score regression (LDSC)20 to estimate genetic correlations across all accelerometer-derived and self-reported sleep traits. To explore the possibility of reverse causality, we applied bidirectional 1SMR to assess the effects of glycaemic traits on sleep traits in UKB participants with valid accelerometer and genetic data (n = 73,797). We repeated all analyses with glucose as a secondary outcome.

Results

Baseline characteristics

Figure 1 shows the flow of participants in the UKB sub-sample in which the 1SMR analyses were conducted. Participants in the accelerometer-derived sub-sample were more likely to have never smoked and to have completed advanced-level education, and had a lower prevalence of diabetes and a lower mean BMI, than those in either comparison group (i.e., (1) UKB European participants without accelerometer-derived data and (2) all UKB European participants with available genetic data). Other characteristics, including self-reported sleep traits, were similar across the three groups (Table 1).

Figure 1

Flowchart of the participants included in the main analyses in the UK Biobank. *Quality control procedures were undertaken, and the derived files produced, by the MRC-IEU (University of Bristol) using the full UK Biobank genome-wide SNP data (version 3, March 2018). https://data.bris.ac.uk/data/dataset/1ovaau5sxunp2cv8rcy88688v. The number 79,460 was obtained after accounting for overlapping samples. Sensitivity analyses excluded participants with diabetes defined by the Eastwood algorithm (probable/possible type 1 diabetes and type 2 diabetes) and/or additionally those with a baseline HbA1c ≥ 48 mmol/mol.

Table 1 Characteristics of European UK Biobank participants with self-reported sleep trait data and genetic data and in subgroups: (a) with QC-checked accelerometer-derived sleep data; and (b) without accelerometer data.

MR results

In 1SMR analyses (with genetic instrument–exposure and genetic instrument–outcome associations both estimated in the UKB sub-sample with accelerometer-derived sleep data, n ~ 73,000), we generated unweighted allele scores for both accelerometer-derived and self-reported sleep traits as the total number of sleep trait-increasing alleles carried by each participant, based on SNPs identified in the relevant GWAS. Supplementary Table S1 provides details of each SNP. The variance (R2) explained by the allele scores ranged from 0.04% for M10 timing (F-statistic: 30) to 0.74% for sleep fragmentation (F-statistic: 553) among accelerometer-derived traits, and from 0.54% for sleep duration (F-statistic: 401) to 2.12% for chronotype (F-statistic: 1593) among self-reported traits (Supplementary Table S2). The allele scores for all sleep traits were normally distributed, except for M10 timing: only one SNP was available for M10 timing, so we used its per-allele count (0, 1, 2). The means and standard deviations (SD) of the allele scores are shown in Supplementary Table S2.
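Because each unweighted allele score enters the 1SMR model as a single instrument, its first-stage F-statistic can be approximated from R2 and the sample size. The sketch below uses the standard formula F = (R2/k) / ((1 − R2)/(n − k − 1)) with k = 1; it is an illustration, not the authors' code, and small discrepancies from the reported values are expected because the R2 values quoted above are rounded.

```python
def f_statistic(r2: float, n: int, k: int = 1) -> float:
    """First-stage F-statistic for k instruments explaining a proportion r2
    of exposure variance in n participants."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Illustration with rounded R2 values for the 1SMR sample (single allele score, k = 1).
n_1smr = 73_797
f_m10 = f_statistic(0.0004, n_1smr)          # accelerometer-derived M10 timing
f_sr_duration = f_statistic(0.0054, n_1smr)  # self-reported sleep duration
print(round(f_m10), round(f_sr_duration))    # 30 401
```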

We conducted two sets of 2SMR analyses. For both, the SNP–exposure associations were obtained from the relevant GWAS16,18,21,22 used for 1SMR, and the SNP–HbA1c associations were obtained from two different sources: (1) UKB participants who did not participate in the accelerometer-derived GWAS18 (referred to as 2SMR-UKB, n = ~ 292,000); and (2) the MAGIC consortium GWAS19 (referred to as 2SMR-MAGIC, n = ~ 147,000). In both sets of 2SMR (2SMR-UKB and 2SMR-MAGIC), the R2 and F-statistics for sleep traits were similar, ranging from 0.04% for M10 timing (mean F-statistic: 37) to 0.91% for sleep fragmentation (mean F-statistic: 37) among accelerometer-derived traits, and from 0.68% for sleep duration (mean F-statistic: 40) to 2.78% for chronotype (mean F-statistic: 57) among self-reported traits (Supplementary Table S2). Post hoc calculations indicated the minimum effects (in SD of outcome per SD of exposure; i.e., the equivalent of a Pearson's correlation coefficient) that we had 80% power to detect at a 0.05 significance level, given our fixed sample sizes, in 2SMR. These minimum effects ranged from 0.04 to 0.35, with all but one being equal to or less than 0.15, indicating that we had power to detect small effects; full results are shown in Supplementary Table S2 and further information on the calculations in the Supplementary Information.

1SMR provided weak evidence that longer mean accelerometer-derived sleep duration reduced mean HbA1c levels (− 0.11, 95% CI − 0.22 to 0.01 SD per hour longer over 24 h). However, this association was attenuated to the null in 1SMR sensitivity analyses accounting for possible horizontal pleiotropy (i.e., collider-corrected estimates23, see “Methods”), and 2SMR main and sensitivity analyses provided no robust evidence of an effect of accelerometer-derived sleep duration on HbA1c (Fig. 2 and Supplementary Table S3). For all other accelerometer-derived sleep traits, MR estimates provided no evidence of causal effects on HbA1c (Fig. 2 and Supplementary Table S3). Results (1SMR and 2SMR-UKB) were broadly consistent when participants with diabetes were excluded (Supplementary Tables S3 and S4). There was no evidence of an effect of any accelerometer-derived sleep trait on glucose (Fig. 3 and Supplementary Table S4). In 1SMR, the estimated effects of self-reported traits on HbA1c/glucose in the UKB sub-sample with accelerometer-derived data were consistent with those we previously published using the larger sample14, though with wider confidence intervals (Supplementary Fig. S1).

Figure 2

Associations of accelerometer-derived sleep traits with HbA1c in one-sample and two-sample Mendelian randomization. 1SMR-2SLS: one-sample MR with the two-stage least squares method. 1SMR-CC-IVW, MR_Egger, LADreg: one-sample MR with collider correction using inverse-variance weighted, MR-Egger, and LAD regression methods, respectively. 2SMR-UKB/MAGIC-IVW, WM, MR-Egger, MR-Egger_SiMEX: two-sample MR (in UKB and MAGIC) using inverse-variance weighted, weighted median, MR-Egger, and MR-Egger with simulation extrapolation (SIMEX) methods, respectively. 1SD HbA1c in the UK Biobank with accelerometer-derived data is 0.14 log mmol/mol; 1SD HbA1c in the sub-sample of UK Biobank without accelerometer-derived data is 0.15 log mmol/mol; 1SD HbA1c in MAGIC is 0.41%. Only 1 SNP predicting M10 timing was identified; as such, the 1SMR-CC estimates were not reliable in the simulation process, and the 2SMR-WM/Egger estimates were not available. AcD: accelerometer-derived.

Figure 3

Associations of accelerometer-derived sleep traits with glucose in one-sample Mendelian randomization in the UK Biobank and in two-sample Mendelian randomization in UK Biobank (UKB) and MAGIC. 1SMR-2SLS: one-sample MR with the two-stage least squares method. 1SMR-CC-IVW, MR_Egger, LADreg: one-sample MR with collider correction using inverse-variance weighted, MR-Egger, and LAD regression methods, respectively. 2SMR-UKB/MAGIC-IVW, WM, MR-Egger, MR-Egger_SiMEX: two-sample MR (in UKB and MAGIC) using inverse-variance weighted, weighted median, MR-Egger, and MR-Egger with simulation extrapolation (SIMEX) methods, respectively. 1SD glucose in the UK Biobank with accelerometer-derived data is 0.15 log mmol/l; 1SD glucose in the sub-sample of UK Biobank without accelerometer-derived data is 0.18 log mmol/l; 1SD glucose in MAGIC is 0.84 mmol/l. Only 1 SNP predicting M10 timing was identified; as such, the 1SMR-CC estimates were not reliable in the simulation process, and the 2SMR-WM/Egger estimates were not available. Non-fasting glucose was used in the 1SMR and 2SMR-UKB estimates; BMI-adjusted fasting glucose was used in the 2SMR-MAGIC estimates. AcD: accelerometer-derived.

Phenotypic and genetic correlations and MVMR

We used cross-trait LDSC20 to estimate genetic correlations across all accelerometer-derived18 and self-reported16,21,22 sleep traits and HbA1c/glucose19 using GWAS summary statistics. Strong genetic correlations were found among the three sleep timing traits (accelerometer-derived L5 timing, M10 timing, and self-reported chronotype; all RLDSC > 0.8). There was a modest genetic correlation between accelerometer-derived and self-reported sleep duration (RLDSC = 0.43) and a relatively strong genetic correlation between accelerometer-derived sleep duration and sleep efficiency (RLDSC = 0.72). Genetic correlations of self-reported insomnia with accelerometer-derived efficiency and fragmentation were both weak (|RLDSC| ≤ 0.18), and the genetic correlation between accelerometer-derived sleep fragmentation and sleep efficiency was modest (RLDSC = − 0.52). There were weak negative genetic correlations of self-reported sleep duration with HbA1c (RLDSC = − 0.07) and glucose (RLDSC = − 0.07), and weak positive genetic correlations of insomnia with HbA1c and glucose (RLDSC ≤ 0.1) (Fig. 4 and Supplementary Table S5). Most of the phenotypic correlations agreed with the LDSC genetic correlations, though they were weaker (Supplementary Fig. S2 and Supplementary Table S5).

Figure 4

Genetic correlations across accelerometer-derived and self-reported sleep traits and glycaemic traits. *p-value < 0.05. **p-value < 0.001. The glucose GWAS data were for fasting glucose adjusted for BMI. AcD: accelerometer-derived, SR: self-reported.

We repeated MR analyses with mutual adjustment using multivariable Mendelian randomization (MVMR)24 to account for strong correlations between accelerometer-derived sleep traits (i.e., between L5 and M10, and between accelerometer-derived sleep duration and efficiency). These results did not differ from the main results, suggesting no independent causal effect of L5, M10, accelerometer-derived sleep duration or efficiency on HbA1c or glucose (Supplementary Table S6).
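As a rough illustration of how mutual adjustment works with summary data, the sketch below fits a multivariable IVW model: SNP–outcome associations are regressed on the SNP associations with several exposures at once, without an intercept and weighted by the inverse squared outcome standard errors. This is a generic summary-data MVMR sketch under that assumption; the MVMR implementation used here (reference 24) may differ, and the input values are invented for illustration.

```python
import numpy as np


def mvmr_ivw(beta_x, beta_y, se_y):
    """Multivariable IVW estimates of the direct effect of each exposure.

    beta_x: (J, K) SNP associations with K exposures (e.g. L5 and M10 timing).
    beta_y, se_y: length-J SNP-outcome associations and their standard errors.
    """
    bx = np.asarray(beta_x, dtype=float)
    by = np.asarray(beta_y, dtype=float)
    root_w = 1.0 / np.asarray(se_y, dtype=float)  # square root of the IVW weights
    # Weighted least squares without intercept, solved as ordinary least
    # squares on rows rescaled by the square root of the weights.
    coef, *_ = np.linalg.lstsq(bx * root_w[:, None], by * root_w, rcond=None)
    return coef


# Invented summary statistics for 4 SNPs and 2 correlated exposures.
beta_x = np.array([[0.05, 0.04], [0.02, 0.03], [0.04, 0.01], [0.01, 0.05]])
beta_y = 0.2 * beta_x[:, 0] - 0.1 * beta_x[:, 1]  # true direct effects 0.2 and -0.1
se_y = np.full(4, 0.01)
print(mvmr_ivw(beta_x, beta_y, se_y))  # approximately [0.2, -0.1]
```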

Bidirectional MR

We assessed the effects of HbA1c and non-fasting glucose on both accelerometer-derived and self-reported sleep traits in UKB participants with valid accelerometer and genetic data. The two-stage least squares estimates suggested no effect of HbA1c on any sleep trait except L5 timing (− 0.36, 95% CI − 0.66 to − 0.07 h per log mmol/mol). There was some evidence that higher non-fasting glucose levels reduced the number of sleep episodes (− 2.3, 95% CI − 4.0 to − 0.5 episodes per log mmol/l) and increased sleep efficiency (3.9, 95% CI 0.4 to 7.4% per log mmol/l) (Supplementary Table S7).

Discussion

In this study, to the best of our knowledge the first MR study to explore causal effects of accelerometer-derived sleep traits on glycaemia, we found no robust evidence that any of the assessed sleep traits causally affected HbA1c or glucose, including across a suite of sensitivity analyses and in MVMR adjusting for between-trait correlations. The null effects of accelerometer-derived sleep traits were unlikely to be explained by selection bias. We showed strong positive genetic correlations between accelerometer-derived L5 and M10 timing and self-reported chronotype, suggesting that accelerometer-derived and self-reported measures of sleep timing capture the same trait. By contrast, the positive correlations between accelerometer-derived and self-reported sleep duration were modest, and those between self-reported insomnia and two accelerometer-derived measures that might be expected to relate to insomnia (i.e., low sleep efficiency and high sleep fragmentation) were weak. Lastly, we found no effect of sleep fragmentation or efficiency on HbA1c, though effects of self-reported insomnia were identified previously14. Accelerometer-derived measures of sleep duration and sleep quality might therefore not simply be ‘objective’ measures of self-reported sleep duration and insomnia, but might rather capture different underlying sleep characteristics.

Our MR findings do not support the observational associations of accelerometer-derived sleep measures (e.g., shorter sleep duration6, lower sleep efficiency7, higher sleep fragmentation8) with higher glycaemia. These observational relationships might be explained by residual confounding, or by reverse causality, as most previous observational studies were cross-sectional. For example, undiagnosed hyperglycaemia might cause nocturia25 and/or neuropathic pain26, which could reduce sleep duration and impair sleep quality. Our bidirectional 1SMR estimates indicated only potential effects of HbA1c on L5 timing and of glucose on sleep fragmentation and efficiency; these results will need to be independently replicated before they are assumed to be causal. Our MR findings also do not support data from randomised controlled trials showing that sleep restriction reduces insulin sensitivity, at least in short-term studies11.

Sleep characteristics might be captured differently by self-reported and accelerometer-derived assessments. For instance, the self-reported sleep duration question includes naps, whereas accelerometer-derived sleep duration does not. The phenotypic and genetic correlations between the two (R = 0.18 and RLDSC = 0.43) also indicated a weak-to-modest correlation, consistent with previous findings18,27. The null MR estimates of accelerometer-derived sleep fragmentation and efficiency (assumed measures of insomnia5,9) on HbA1c contrasted with previous MR results suggesting that self-reported frequent insomnia symptoms result in higher HbA1c levels14,15,16. Several factors could explain these differences. Self-reported insomnia is by definition experienced, and that experience, rather than the sleep disturbance itself, might cause or be a proxy for adverse mental or physical health, such as depression/anxiety28, endocrine disorders29, and/or appetite changes30, that could influence HbA1c. In addition, sleep can be disturbed in ways not detectable by actigraphy or even polysomnography. Therefore, accelerometer-derived sleep fragmentation and efficiency might reflect only the sleep changes associated with insomnia, not the accompanying mental or physical changes. The low phenotypic and genetic correlations of accelerometer-derived sleep fragmentation (R = 0.03 and RLDSC = 0.09) and efficiency (R = − 0.04 and RLDSC = − 0.18) with self-reported insomnia support this idea to some extent. It is also possible that the genetic contributions to self-reported and accelerometer-derived measures of insomnia/sleep quality differ, though heritability estimates using UKB data suggested that their heritabilities were similar (17% for self-reported insomnia31 and 22% for accelerometer-derived fragmentation18).
Further studies exploring what might contribute to the weak/modest correlations between self-reported and accelerometer-derived measures of sleep duration and quality/insomnia are important, noting that actigraphy provides limited information about sleep macro- or microstructure32. Lastly, sleep as defined neurophysiologically may differ from sleep as defined by accelerometry devices and self-reported questionnaires. Systematic comparisons in large studies with polysomnography as well as self-report and accelerometer data would be needed; we are currently not aware of any such studies.

A key strength of this study is its novelty in using MR to explore potential causal effects of accelerometer-derived sleep traits on HbA1c and glucose. We conducted 1SMR, 2SMR, and a range of sensitivity analyses to explore genetic instrument validity. The consistency of findings across these methods, and across samples, increases confidence in our conclusion that accelerometer-derived sleep traits do not have causal effects on HbA1c or glucose.

We acknowledge the following potential limitations. Whilst we have used the largest available cohort with accelerometer-derived sleep data and genomic data, and, to our knowledge, this is the first MR study of these exposures with HbA1c and glucose, statistical power may have been limited for some results. Although our post hoc calculations demonstrate that, for all but one of our 2SMR analyses, we had power to detect small effects of 0.15 SD/SD or less (i.e., the equivalent of a Pearson correlation coefficient ≤ 0.15), the value of such post hoc calculations is contested33,34,35. In studies like ours, the observed point estimates and their confidence intervals are a more informative way to interpret results and statistical power34. For example, our 2SMR suggested that one hour longer mean accelerometer-derived sleep duration would change mean HbA1c levels by − 0.09 SD (95% CI − 0.2 to 0.03; 1SD = 0.41%) (Fig. 2). Since the SD for HbA1c was 5.5 mmol/mol (Table 1), our data indicate that lengthening sleep duration by 1 h over 24 h is likely to change HbA1c by somewhere between − 1.1 (− 0.2 × 5.5) and + 0.2 (0.03 × 5.5) mmol/mol. In the setting of diabetes, a 3 mmol/mol decrease in HbA1c is generally considered to be ‘clinically important’ for reducing the risk of developing diabetes-related complications36. Thus, the 95% CI for our causal effect estimate excludes a ‘clinically significant’ effect. In summary, we had power in 2SMR to detect small effects, and our 95% confidence intervals suggest that clinically important effects are unlikely.
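The conversion in the paragraph above can be written out explicitly. This sketch simply rescales the reported 2SMR confidence limits (in SD units) by the HbA1c SD of 5.5 mmol/mol used in the text, then checks whether the whole interval lies inside ±3 mmol/mol, the change described as clinically important; the function and variable names are illustrative.

```python
HBA1C_SD_MMOL_MOL = 5.5         # SD used in the text (Table 1)
CLINICAL_CHANGE_MMOL_MOL = 3.0  # change described as clinically important


def sd_to_mmol_mol(effect_sd: float, sd: float = HBA1C_SD_MMOL_MOL) -> float:
    """Rescale an effect expressed in SD units to mmol/mol."""
    return effect_sd * sd


ci_sd = (-0.2, 0.03)  # 95% CI per hour longer sleep, as given in the text
ci_mmol_mol = tuple(sd_to_mmol_mol(limit) for limit in ci_sd)
# The CI excludes a clinically important change if both limits lie within +/- 3.
excludes_clinical_effect = all(abs(limit) < CLINICAL_CHANGE_MMOL_MOL for limit in ci_mmol_mol)
print([round(limit, 1) for limit in ci_mmol_mol], excludes_clinical_effect)  # [-1.1, 0.2] True
```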

Our results could be influenced by selection bias37, due to the low recruitment into UKB (5.5% participation38) as well as the non-random selection of UKB participants into the accelerometer-derived sub-sample, which resulted in a healthier sub-sample. Whilst the low participation into UKB could result in selection bias39, similar observational and MR associations with a range of outcomes have been obtained in meta-analyses with and without UKB participants, where the other cohorts had higher response rates (i.e., ≥ 70%)40,41. In addition, when we compared 1SMR estimates of self-reported sleep traits on HbA1c/glucose in the accelerometer-derived sub-sample with the same estimates in the much larger UKB sample that we previously published14, we found similar results, suggesting minimal bias due to selection. Results did not differ in sensitivity analyses that excluded participants with diabetes, suggesting that our results are not influenced by having diabetes or by treatment with glucose-lowering medication. We assumed that the genetic instrument reflects lifetime exposure. Although the accelerometer data were obtained some time after HbA1c was measured, this is unlikely to introduce reverse causality in an MR design. If it did (i.e., if there were reverse causality), we would expect the results to be biased away from the null, which is contrary to our findings. Our bidirectional 1SMR estimates did not suggest an effect of HbA1c on any of the sleep traits except L5 timing, and, as noted above, suggested effects of non-fasting glucose on accelerometer-derived sleep efficiency and sleep fragmentation (the measures assumed to reflect insomnia). These could be chance findings, because no consistent effect of HbA1c on other sleep timing traits was found.
In addition, collider bias is a potential concern because the SNP–glucose associations used in 2SMR-MAGIC came from a GWAS adjusted for BMI (i.e., SNP–glucose associations could be biased by the BMI adjustment); these results require replication42. We used genetic variants that passed a p-value threshold of p < 5 × 10⁻⁸ in UKB, but with limited evidence of replication in an independent cohort18. Without further replication in larger studies, it is possible that some of the 44 SNPs were false positives and/or had inflated associations with sleep traits, which could bias both our 1SMR and 2SMR results towards the null43. However, a recent study suggested that using SNPs from GWAS that have not been independently replicated may not result in notable bias44. Participants were predominantly of European ancestry, so our findings may not generalise to other ancestries. Lastly, our study assumed linear associations between accelerometer-derived sleep traits and HbA1c/glucose; if the true association were symmetrically U-shaped, this linear assumption would bias results toward the null.

Conclusions

We found little evidence to support causal effects of any accelerometer-derived sleep trait on HbA1c or glucose levels across a wide range of MR methods. We cannot rule out non-linear (e.g., U-shaped) effects and acknowledge the need for further GWAS and MR studies of accelerometer-derived traits in larger diverse populations.

Methods

This study is reported following the Strengthening the Reporting of Observational Studies in Epidemiology guideline for Mendelian randomisation (STROBE-MR)45 (Supplementary Information).

The UK Biobank

Between 2006 and 2010, the UKB recruited 503,317 adults (aged 40–69 years) from 9.2 million invited eligible adults (5.5% response)38. Information on socio-demographic characteristics and lifestyle, including self-reported sleep traits, was obtained using a touchscreen questionnaire at the baseline assessment. Venous blood samples were collected and processed at baseline. Between February 2013 and December 2015, randomly selected participants with valid email addresses, except for those from the North West region (who had been invited to participate in a separate sub-study), were invited by email to participate in the accelerometer study. From June 2013, those who agreed to participate were sent a triaxial accelerometer (Axivity AX3) in order of acceptance. The device was worn continuously for up to seven days by a sub-sample of participants (n = 103,711), on average five years after the baseline assessment (range 2.8–8.7 years)18,46. Figure 1 shows the flowchart of participants, from all those recruited to those included in our study. After applying pre-specified exclusion criteria, we included 73,797 European participants47 with accelerometer-derived sleep data in the analyses. Full details are presented in the Supplementary Information.

Accelerometer-derived sleep traits

  1. Accelerometer-derived nocturnal sleep duration was defined as the summed duration of all nocturnal sleep episodes within the sleep period time windows (SPT-windows). Sleep episodes were defined as any period of at least 5 min with no change larger than 5° in the angle of the accelerometer's z-axis48. The algorithm in GGIR (an R package) combined all sleep episodes separated by no more than 30 min into the SPT-window (of which there can be only one per day). Any sleep episodes outside this window were classified as naps and did not count towards the nocturnal sleep duration total. The total nocturnal sleep duration within all SPT-windows over the wear time was divided by the number of days (24-h periods) to give the mean sleep duration per day. Individuals with an average sleep duration < 3 h (n = 147) or > 12 h (n = 3) were set to missing in this study.

  2. Midpoint least-active 5-h (L5) timing was a measure of the midpoint of the least-active (i.e., with minimum average acceleration) 5 h of each day. The 5-h periods were defined on a rolling basis (e.g., 1:00–6:00, 2:00–7:00 and so on). For example, if the midpoint of the least-active 5 h was 24:00 (0:00) (i.e., the least-active rolling 5 h ran from 21:30 to 2:30), then L5 = 24 (i.e., 24 + 0); if the midpoint was 3:30, then L5 = 27.5 (i.e., 24 + 3.5); and if the midpoint was 20:30, then L5 = 20.5 (i.e., 24 − 3.5). Thus, a higher L5 value indicated that someone was least active later, into the morning, and was more likely to have an evening chronotype.

  3. Midpoint most-active 10-h (M10) timing was a measure of the midpoint of the most-active (i.e., with maximum average acceleration) 10-h period of the day, based on a 24-h clock. It was calculated in a similar way to L5 (see above), but with rolling periods of 10 h. A higher M10 value indicated someone who was most active in the evening and hence more likely to have an evening chronotype.

  4. The average number of nocturnal sleep episodes (defined above) was used as a measure of sleep fragmentation. Individuals with an average of ≤ 5 (n = 84) or ≥ 30 (n = 52) sleep episodes were set to missing in this study. We refer to a high number of sleep episodes as ‘sleep fragmentation’ throughout this paper.

  5. Mean sleep efficiency was calculated as the nocturnal sleep duration (defined above) divided by the time elapsed between the start of the first inactivity bout and the end of the last inactivity bout (which equals the SPT-window duration) across all valid nights. This was an approximate measure of the proportion of time spent asleep while in bed.
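The five definitions above can be sketched in code. This is a simplified illustration, not GGIR: it assumes sleep episodes and SPT-windows have already been detected (in minutes), pools sleep efficiency as total sleep over total SPT-window time across nights (an assumption), and maps L5 midpoints before noon to 24 + hours (a noon cut-off assumed here because it reproduces the worked examples above: 20.5, 24 and 27.5). All function names are hypothetical.

```python
import numpy as np

MINUTES_PER_DAY = 1440


def sleep_traits(nights):
    """Mean nocturnal sleep duration (h), mean episodes per night, and sleep
    efficiency from already-detected episodes.

    nights: list of (episodes, spt_window) per valid night, where episodes is a
    list of (start, end) minutes inside spt_window = (onset, wake) minutes.
    """
    sleep = np.array([sum(e - s for s, e in eps) for eps, _ in nights], dtype=float)
    n_episodes = np.array([len(eps) for eps, _ in nights], dtype=float)
    spt = np.array([wake - onset for _, (onset, wake) in nights], dtype=float)
    duration_h = sleep.mean() / 60
    episodes = n_episodes.mean()
    efficiency = sleep.sum() / spt.sum()  # assumption: pooled over nights
    # Exclusion rules from the text: implausible means are set to missing.
    if duration_h < 3 or duration_h > 12:
        duration_h = np.nan
    if episodes <= 5 or episodes >= 30:
        episodes = np.nan
    return duration_h, episodes, efficiency


def window_midpoint_hours(profile, window_min, most_active):
    """Clock-time midpoint (h) of the least- or most-active rolling window.

    profile: 1440 minute-level mean accelerations starting at 00:00;
    windows wrap around midnight.
    """
    profile = np.asarray(profile, dtype=float)
    extended = np.concatenate([profile, profile[: window_min - 1]])
    sums = np.convolve(extended, np.ones(window_min), mode="valid")  # one sum per start minute
    start = int(np.argmax(sums)) if most_active else int(np.argmin(sums))
    return ((start + window_min // 2) % MINUTES_PER_DAY) / 60


def l5_timing(profile):
    hours = window_midpoint_hours(profile, 5 * 60, most_active=False)
    return hours + 24 if hours < 12 else hours  # assumed noon cut-off


def m10_timing(profile):
    return window_midpoint_hours(profile, 10 * 60, most_active=True)


# Toy data: least active 21:30-02:30, most active 09:00-19:00,
# and a night of 8 x 50-min sleep episodes inside a 470-min SPT-window.
quiet = np.full(MINUTES_PER_DAY, 5.0)
quiet[21 * 60 + 30:] = 1.0
quiet[: 2 * 60 + 30] = 1.0
busy = np.ones(MINUTES_PER_DAY)
busy[9 * 60:19 * 60] = 5.0
night = ([(60 * i, 60 * i + 50) for i in range(8)], (0, 470))
print(l5_timing(quiet), m10_timing(busy), sleep_traits([night, night]))
```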

Genetic variants

The genetic variants associated with the five accelerometer-derived sleep traits were obtained from a genome-wide association study (GWAS) conducted in a UKB sub-sample (n = 85,670, White European), which identified 44 single nucleotide polymorphisms (SNPs) associated at genome-wide significance (p < 5 × 10⁻⁸) with at least one of the five accelerometer-derived traits (11 for sleep duration, 6 for L5 timing, 1 for M10 timing, 21 for sleep fragmentation, and 5 for sleep efficiency)18. This GWAS imputed 11,977,111 genetic variants using the Haplotype Reference Consortium imputation reference panel, with minor allele frequency (MAF) > 0.1% and imputation quality score (INFO) > 0.3. The genetic associations were estimated using a linear mixed model accounting for population structure and individual relatedness, and adjusting for age at accelerometer assessment, sex, study centre, season of accelerometer wear, and genotype array (Supplementary Information)18.

Supplementary Table S1 provides the list of SNPs used as instrumental variables for each of the accelerometer-derived sleep traits. The number of SNPs used for each accelerometer-derived sleep trait, the mean F-statistic, and variance (R2) across all SNPs, as well as the unweighted allele score, for each exposure are provided in Supplementary Table S2.

HbA1c and glucose measurement

HbA1c was measured in red blood cells by high-performance liquid chromatography (HPLC) on a Bio-Rad VARIANT II Turbo analyser, and glucose was assayed in serum by hexokinase analysis on a Beckman Coulter AU5800 analyser49. Samples were assumed to be non-fasting, because participants were not advised to fast before attending. Dilution factor and fasting time were accounted for in the corresponding glucose analyses; HbA1c measurements were not affected by these factors. We used HbA1c (a stable measure reflecting average glycaemia over the preceding ~ 2–3 months) as our primary outcome and explored non-fasting glucose as a secondary outcome (Supplementary Information).

Sex-combined meta-analysis summary statistics for genetic variants associated with HbA1c (%, n = 146,806, mean age 59.7 years, 57.9% female) and BMI-adjusted fasting glucose (mmol/l, n = 200,622, mean age 50.9 years, 51.2% female) were obtained from the GWAS led by Chen et al., downloaded from the MAGIC consortium19. Participants were of European descent without diagnosed diabetes. There was no sample overlap between the HbA1c/fasting glucose GWAS and either the accelerometer-derived or the self-reported sleep GWAS. Approximately 30.6 million and 31.0 million variants were directly genotyped or imputed after exclusions based on minor allele count (MAC < 3) and imputation quality (imputation r2 or INFO score < 0.40) in each cohort. The trait-specific estimates were obtained from fixed-effect meta-analyses within each ancestry using METAL50. Detailed information can be found in the outcome GWAS19.

Statistical analyses

UKB HbA1c/glucose data were right-skewed, and their units (HbA1c: mmol/mol; non-fasting glucose: mmol/l) differed from those used by MAGIC19 (HbA1c: %; BMI-adjusted fasting glucose: mmol/l). Therefore, we natural log-transformed HbA1c/glucose levels in UKB and then converted them into standard deviation (SD) units (HbA1c: 1 SD = 0.14 log mmol/mol; non-fasting glucose: 1 SD = 0.16 log mmol/l), and similarly converted the MAGIC19 estimates into SD units (HbA1c: 1 SD = 0.41%; fasting glucose: 1 SD = 0.83 mmol/l; these mean SD values were calculated across the European-ancestry cohorts included in the GWAS used for the 2SMR-MAGIC analysis19). As such, all analyses estimated the difference in mean HbA1c/glucose in SD units per unit increase in each accelerometer-derived sleep trait.
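A minimal sketch of the transformation described above, together with a back-of-envelope conversion of an estimate in log-SD units into an approximate percentage difference in geometric mean HbA1c. Using the 1SMR sleep duration estimate (− 0.11 SD per hour) and the UKB SD of 0.14 log mmol/mol is purely illustrative, the toy HbA1c values are invented, and the function names are hypothetical.

```python
import math

import numpy as np


def to_log_sd_units(values):
    """Natural-log transform, then standardise; returns (z-scores, SD of the logs)."""
    logged = np.log(np.asarray(values, dtype=float))
    sd_log = logged.std(ddof=1)
    return (logged - logged.mean()) / sd_log, sd_log


def sd_effect_to_percent(effect_sd, sd_log):
    """Approximate % difference in geometric mean for an effect in log-SD units."""
    return (math.exp(effect_sd * sd_log) - 1) * 100


z, sd_log = to_log_sd_units([31.0, 35.0, 38.0, 44.0])  # toy HbA1c values, mmol/mol
pct = sd_effect_to_percent(-0.11, 0.14)  # about -1.5% per hour longer sleep
print(round(pct, 2))
```

Working on the log scale means an SD-unit effect translates into a multiplicative (percentage) change rather than a fixed mmol/mol difference.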

Main analyses assessing the effects of accelerometer-derived sleep traits on HbA1c/glucose

1SMR

We identified SNPs in the UKB data corresponding to the genome-wide significant (p < 5 × 10⁻⁸) SNPs found in the discovery accelerometer-derived sleep trait GWAS18, aligned to the sleep trait-increasing allele. We then extracted these genetic variants from the UKB Haplotype Reference Consortium reference panel dataset. These data have undergone extensive quality control checks, including removal of related participants (third degree or closer). To avoid bias due to population substructure, non-White British participants were identified by self-reported ethnicity as well as by principal component analyses. We generated unweighted allele scores51 for the sleep traits by summing the number of effect alleles carried by each individual. An unweighted allele score can reduce bias, such as weak instrument bias52 towards the confounded association, when there is substantial overlap between the sample in which the exposure GWAS was undertaken (and from which the genetic instruments were selected) and the sample in which the one-sample MR is undertaken (as is the case here). Two-stage least squares instrumental variable analyses were performed to obtain the MR estimate of each trait on HbA1c/glucose. We adjusted for assessment centre and 40 genetic principal components to minimise confounding by population stratification53, and for baseline age, sex, genotyping chip, fasting time and dilution factor (for glucose only) to reduce random variation. Further details are presented in the Supplementary Information.
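The 1SMR steps above (allele alignment, unweighted allele score, two-stage least squares) can be illustrated on simulated data. This sketch omits the UKB quality control and fits both stages with plain least squares; covariates such as assessment centre and principal components could be passed as extra columns that enter both stages. It returns only the point estimate, because standard errors from a naive second-stage regression are not valid. The simulated effect sizes are arbitrary and the function names are hypothetical.

```python
import numpy as np


def allele_score(genotypes, flip):
    """Unweighted allele score: number of trait-increasing alleles per person.

    genotypes: (n, k) allele counts (0/1/2) of the coded allele at each SNP.
    flip: length-k booleans, True where the coded allele is not the
          trait-increasing allele in the exposure GWAS (count 2 - g instead).
    """
    aligned = np.where(flip, 2 - genotypes, genotypes)
    return aligned.sum(axis=1)


def two_stage_least_squares(y, x, z, covariates=None):
    """2SLS point estimate of the effect of exposure x on outcome y using instrument z.

    Covariates, if given, enter both stages.
    """
    n = len(y)
    exog = np.ones((n, 1)) if covariates is None else np.column_stack([np.ones(n), covariates])
    # Stage 1: exposure on instrument + covariates; keep fitted values.
    design1 = np.column_stack([z, exog])
    x_hat = design1 @ np.linalg.lstsq(design1, x, rcond=None)[0]
    # Stage 2: outcome on fitted exposure + covariates.
    design2 = np.column_stack([x_hat, exog])
    return np.linalg.lstsq(design2, y, rcond=None)[0][0]


# Simulated data with an unmeasured confounder u and a true causal effect of 0.2.
rng = np.random.default_rng(2023)
n, k = 20_000, 20
genotypes = rng.binomial(2, 0.3, size=(n, k))
score = allele_score(genotypes, flip=np.zeros(k, dtype=bool))
u = rng.normal(size=n)
x = 0.1 * score + u + rng.normal(size=n)
y = 0.2 * x + u + rng.normal(size=n)
print("2SLS:", two_stage_least_squares(y, x, score))  # close to 0.2
print("OLS: ", np.polyfit(x, y, 1)[0])                # inflated by confounding
```

The contrast between the two printed slopes shows why the instrument is used: ordinary regression absorbs the confounder, whereas 2SLS uses only the exposure variation explained by the allele score.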

2SMR

We used summary associations between the genetic instruments and accelerometer-derived sleep traits from the GWAS18 as Sample 1 (SNP-exposure associations). For Sample 2 (SNP-outcome associations), we used two independent samples. Sample 2-UKB: estimates of the associations between the genetic instruments and HbA1c/glucose were obtained from UKB participants who did not take part in the accelerometer GWAS18 (HbA1c: n = ~292,000; glucose: n = ~267,000). These SNP-outcome associations were estimated by multivariable linear regression adjusted for assessment centre, 40 genetic principal components, baseline age, sex, genotyping chip, fasting time and dilution factor (for glucose only). Sample 2-MAGIC: summary statistics were obtained from the MAGIC consortium19. We conducted inverse-variance weighted (IVW) regression of the Wald ratio for each SNP under a multiplicative random-effects model54 to obtain the causal estimates. Further details are presented in the Supplementary Information.
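The IVW step can be sketched in Python as follows. This is a minimal illustration, assuming first-order (delta-method-free) standard errors for each Wald ratio and the common convention of inflating the fixed-effect standard error by the residual dispersion only when that dispersion exceeds 1; the function name and the summary statistics are hypothetical.

```python
import math


def ivw_random_effects(bx, by, se_by):
    """IVW estimate from per-SNP Wald ratios with a multiplicative random-effects SE.

    bx: SNP-exposure associations (Sample 1); by, se_by: SNP-outcome associations
    and their standard errors (Sample 2).
    """
    ratios = [b_y / b_x for b_x, b_y in zip(bx, by)]  # Wald ratio per SNP
    ratio_se = [s / abs(b_x) for b_x, s in zip(bx, se_by)]  # first-order SE of each ratio
    w = [1 / s ** 2 for s in ratio_se]  # inverse-variance weights
    beta = sum(wi * r for wi, r in zip(w, ratios)) / sum(w)
    se_fixed = math.sqrt(1 / sum(w))
    # Residual dispersion: Cochran's Q divided by (number of SNPs - 1)
    q = sum(wi * (r - beta) ** 2 for wi, r in zip(w, ratios))
    phi = q / (len(bx) - 1)
    # Multiplicative random effects: inflate the SE only for over-dispersion (phi > 1)
    return beta, se_fixed * max(1.0, math.sqrt(phi))


# Hypothetical summary statistics for three SNPs (illustrative only)
beta, se = ivw_random_effects([0.1, 0.2, 0.15], [0.03, 0.05, 0.05], [0.01, 0.01, 0.01])
print(f"IVW estimate {beta:.3f} (SE {se:.3f})")
```

Weighting each Wald ratio by bx²/se_by² gives the same point estimate as a weighted regression of by on bx through the origin, which is why the two descriptions of IVW are used interchangeably.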

1SMR and 2SMR analyses with self-reported sleep traits (sleep duration, chronotype, insomnia symptoms) as the exposures were conducted for comparison. Details are presented in the Supplementary Information.

Sensitivity and additional analyses

Accounting for the impact of diabetes

To account for the potential impact of diabetes or its treatment on glycaemic levels, we repeated the analyses in UKB participants (1SMR and 2SMR-UKB) after excluding those with diabetes defined by the Eastwood algorithm (probable/possible type 1 diabetes and type 2 diabetes, based on self-reported medical history and medication)55, and, in a further analysis, after additionally excluding those with a baseline HbA1c ≥ 48 mmol/mol (≥ 6.5%, the diagnostic threshold for diabetes).
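The two exclusion sets can be expressed as simple filters, sketched below in Python. The record fields (`eastwood_diabetes`, `hba1c`) and the toy participants are hypothetical; the unit check uses the standard IFCC-to-NGSP master equation, NGSP (%) = 0.09148 × IFCC (mmol/mol) + 2.152, to confirm that 48 mmol/mol corresponds to about 6.5%.

```python
def ifcc_to_ngsp(hba1c_mmol_mol):
    """Convert HbA1c from IFCC units (mmol/mol) to NGSP/DCCT units (%)."""
    return 0.09148 * hba1c_mmol_mol + 2.152


def exclusion_sets(participants, hba1c_cutoff=48.0):
    """Return (no Eastwood-defined diabetes, no diabetes and HbA1c below the cutoff).

    Each participant is a dict with hypothetical keys 'eastwood_diabetes' (bool)
    and 'hba1c' (baseline HbA1c in mmol/mol).
    """
    no_diabetes = [p for p in participants if not p["eastwood_diabetes"]]
    no_diabetes_low_hba1c = [p for p in no_diabetes if p["hba1c"] < hba1c_cutoff]
    return no_diabetes, no_diabetes_low_hba1c


# Hypothetical participants (illustrative only)
cohort = [
    {"id": 1, "eastwood_diabetes": False, "hba1c": 36.0},
    {"id": 2, "eastwood_diabetes": True, "hba1c": 52.0},
    {"id": 3, "eastwood_diabetes": False, "hba1c": 49.0},  # undiagnosed, above threshold
    {"id": 4, "eastwood_diabetes": False, "hba1c": 40.0},
]
set_a, set_b = exclusion_sets(cohort)
print(len(set_a), len(set_b), round(ifcc_to_ngsp(48), 1))
```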

Assessing MR assumptions and evaluating bias

MR analysis requires three key assumptions to be satisfied in order to obtain valid causal estimates56. First, the genetic instrument should be robustly associated with the exposure. We investigated this using the first-stage F-statistic and R². An F-statistic < 10 has been proposed as indicating potential weak instrument bias57; however, this threshold is arbitrary, and in general the higher the R² and F-statistic, the lower the likelihood of weak instrument bias. In addition, we undertook a post-hoc calculation of the minimum effects (in SD of exposure per SD outcome units) that we could detect at 80% power and a 0.05 significance level, given our fixed sample sizes, for all of the 2SMR analyses. Further details are presented in the Supplementary Information. Second, there should be no confounding between the genetic instrument and the outcome. Such confounding can arise from population stratification, which we attempted to minimise by restricting analyses to participants of European ancestry and adjusting for genetic principal components and assessment centre53. Third, the genetic instrument should influence the outcome exclusively through its effect on the exposure. This assumption would be violated by unbalanced horizontal pleiotropy (i.e., a pathway from the genetic variant to the outcome that does not pass through the exposure). We undertook the following sensitivity analyses to explore potential bias due to horizontal pleiotropy.
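The instrument-strength and power quantities can be sketched in Python. The F-statistic formula is the standard one for k instruments explaining R² of the exposure variance in n individuals. The minimum detectable effect uses a large-sample normal approximation, assuming standardised exposure and outcome and a small causal effect, so that the standard error of the MR estimate is roughly 1/√(n·R²); this is an illustrative assumption, not necessarily the exact calculation described in the Supplementary Information. The function names and input values are hypothetical.

```python
import math
from statistics import NormalDist


def first_stage_f(r2, n, k=1):
    """First-stage F-statistic for k instruments explaining r2 of exposure variance in n people."""
    return (r2 / k) * (n - k - 1) / (1 - r2)


def min_detectable_effect(r2, n, alpha=0.05, power=0.80):
    """Approximate smallest detectable causal effect (in SD units, both variables standardised).

    Normal approximation: SE(beta) ~ 1 / sqrt(n * r2), and power is reached when
    beta / SE = z_(1 - alpha/2) + z_(power).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) / math.sqrt(n * r2)


# Hypothetical values: instruments explaining 0.5% of exposure variance
print(round(first_stage_f(0.005, 73_797), 1))
print(round(min_detectable_effect(0.005, 292_000), 3))
```

Because the detectable effect scales with 1/√(n·R²), weak instruments can be offset only partly by larger outcome samples.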

In 1SMR, we explored between-SNP heterogeneity, potentially due to horizontal pleiotropy, using the Sargan over-identification test58. Additionally, we applied the Collider-Correction23 method to implement three further pleiotropy sensitivity analyses commonly used in 2SMR: IVW, MR-Egger, and least absolute deviation regression (LADreg), which is analogous to the weighted median (WM) approach. Collider-Correction was needed in 1SMR to account jointly for pleiotropy and weak instrument bias57 (Supplementary Information). We refer to 1SMR with Collider-Correction as 1SMR-CC (i.e., 1SMR-CC-IVW, 1SMR-CC-MR-Egger, 1SMR-CC-LADreg). In 2SMR, we explored unbalanced horizontal pleiotropy by comparing the IVW results with those from standard pleiotropy-robust MR methods, WM and MR-Egger, referred to as 2SMR-UKB/MAGIC WM and 2SMR-UKB/MAGIC MR-Egger. To account for weak instrument bias in the 2SMR MR-Egger estimates, we used simulation extrapolation (SiMEX)59; we refer to these estimates as 2SMR-UKB/MAGIC MR-Egger_SiMEX.
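The two 2SMR pleiotropy-robust point estimators can be sketched in Python. MR-Egger is shown as a weighted regression of SNP-outcome on SNP-exposure associations with an intercept, after orienting each SNP so that its exposure association is positive; the weighted median interpolates the 50th percentile of the ordered Wald ratios using inverse-variance weights. This is a minimal sketch under stated assumptions: standard errors (often bootstrapped for WM), SiMEX correction and Collider-Correction are not shown, and the function names and data are hypothetical.

```python
def mr_egger(bx, by, se_by):
    """MR-Egger point estimates (intercept, slope) via weighted least squares."""
    # Orient every SNP so that its SNP-exposure association is positive
    sign = [1 if b >= 0 else -1 for b in bx]
    x = [s * b for s, b in zip(sign, bx)]
    y = [s * b for s, b in zip(sign, by)]
    w = [1 / se ** 2 for se in se_by]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope  # intercept estimates directional pleiotropy


def weighted_median(ratios, weights):
    """Weighted median of Wald ratios, interpolating between neighbouring ratios."""
    pairs = sorted(zip(ratios, weights))
    r = [p[0] for p in pairs]
    total = sum(p[1] for p in pairs)
    w = [p[1] / total for p in pairs]
    s, cum = [], 0.0
    for wi in w:
        cum += wi
        s.append(cum - wi / 2)  # standardised cumulative weight at each ratio
    for j in range(len(r) - 1):
        if s[j] < 0.5 <= s[j + 1]:
            return r[j] + (r[j + 1] - r[j]) * (0.5 - s[j]) / (s[j + 1] - s[j])
    return r[0] if 0.5 <= s[0] else r[-1]


# Hypothetical data generated with intercept 0.02 and slope 0.3 (after orientation)
bx = [0.1, 0.2, 0.15, -0.05]
by = [0.02 + 0.3 * 0.1, 0.02 + 0.3 * 0.2, 0.02 + 0.3 * 0.15, -0.02 + 0.3 * -0.05]
print(mr_egger(bx, by, [0.01] * 4))
print(weighted_median([0.1, 0.2, 0.3], [1.0, 1.0, 1.0]))
```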

Exploring selection bias

Because participants in the accelerometer-derived sub-sample were not recruited at random, we compared distributions of HbA1c, glucose, diabetes prevalence, BMI, and a range of socioeconomic and behavioural characteristics between those included in the UKB sub-sample with accelerometer-derived data (n = 73,797) and those not in this sub-sample (n = 306,317), as well as the whole available UKB sample (n = 385,163). In addition, we compared the 1SMR estimates of self-reported sleep traits (sleep duration, chronotype, insomnia symptoms) on HbA1c/glucose in this study (n = 73,797) with the 1SMR estimates previously published for nearly all UKB participants14 (n = 336,999, White British ancestry). Similar estimates would suggest limited risk of selection bias.

Phenotypic and genetic correlation between sleep traits

We used adjusted Pearson correlations to assess the correlations between sleep traits, as well as their correlations with HbA1c and glucose; for consistency we used Pearson correlations throughout, even though some of the sleep traits were categorical (e.g., SR sleep duration, chronotype, insomnia). A Pearson correlation can be interpreted as the regression coefficient obtained by regressing one variable on another after both have been standardised to SD units. We adjusted for baseline age, sex, genotyping chip, assessment centre and 40 genetic principal components.
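The interpretation of a Pearson correlation as a standardised regression slope, and one way of adjusting it for covariates (residualising both variables first), can be checked with a short Python sketch. The adjustment here uses a single hypothetical covariate (age) and made-up values for illustration; the analysis described above adjusted for several covariates, and the function names are assumptions.

```python
import statistics


def residualize(v, covariate):
    """Residuals of v after simple linear regression (with intercept) on one covariate."""
    mc, mv = statistics.fmean(covariate), statistics.fmean(v)
    slope = (sum((c - mc) * (x - mv) for c, x in zip(covariate, v))
             / sum((c - mc) ** 2 for c in covariate))
    return [x - (mv + slope * (c - mc)) for c, x in zip(covariate, v)]


def standardize(v):
    """Rescale to mean 0 and (population) SD 1."""
    m, s = statistics.fmean(v), statistics.pstdev(v)
    return [(x - m) / s for x in v]


def pearson(x, y):
    """Pearson correlation as the mean product of standardised values."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def slope_of_standardized(x, y):
    """OLS slope from regressing standardised y on standardised x."""
    zx, zy = standardize(x), standardize(y)
    # Both have mean 0, so the slope is sum(zx * zy) / sum(zx ** 2)
    return sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)


# Hypothetical values (illustrative only)
sleep_hours = [6.1, 7.4, 5.8, 8.0, 6.9, 7.2]
hba1c_sd = [0.4, -0.1, 0.6, -0.3, 0.1, 0.0]
age = [55, 48, 62, 45, 58, 50]

sleep_adj, hba1c_adj = residualize(sleep_hours, age), residualize(hba1c_sd, age)
print(round(pearson(sleep_adj, hba1c_adj), 3), round(slope_of_standardized(sleep_adj, hba1c_adj), 3))
```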

As an additional analysis, we used linkage disequilibrium score regression (LDSC)20 (Supplementary Information) to aid interpretation of the MR results for the accelerometer-derived sleep traits and of any differences between these results and our previously reported MR effects of self-reported sleep traits on HbA1c/glucose14. We assessed genetic correlations between all accelerometer-derived and self-reported sleep traits. For completeness, we also explored genetic correlations of each accelerometer-derived and self-reported trait with HbA1c and glucose. The full summary statistics of all sleep traits were obtained from the Sleep Disorder Knowledge Portal https://sleep.hugeamp.org/. Those for HbA1c and glucose were from the MAGIC consortium19.

Whenever we observed a strong genetic correlation (≥ 0.7) between any two accelerometer-derived sleep traits, which could affect the interpretation of the univariable MR estimates, we undertook multivariable Mendelian randomization (MVMR)24 to explore whether we could determine the direct effect of each individual accelerometer-derived sleep trait (Supplementary Information).
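One common MVMR estimator, multivariable IVW, regresses the SNP-outcome associations on the SNP-exposure associations for both exposures jointly, without an intercept and with inverse-variance weights. The Python sketch below solves the 2 × 2 normal equations for two exposures; it is a hypothetical illustration (function name and data are assumptions), omits standard errors and conditional instrument-strength statistics, and is not necessarily the exact MVMR implementation described in the Supplementary Information.

```python
def mvmr_ivw_two_exposures(bx1, bx2, by, se_by):
    """Multivariable IVW: weighted regression of by on bx1 and bx2 without intercept."""
    w = [1 / s ** 2 for s in se_by]
    a11 = sum(wi * u * u for wi, u in zip(w, bx1))
    a22 = sum(wi * v * v for wi, v in zip(w, bx2))
    a12 = sum(wi * u * v for wi, u, v in zip(w, bx1, bx2))
    c1 = sum(wi * u * y for wi, u, y in zip(w, bx1, by))
    c2 = sum(wi * v * y for wi, v, y in zip(w, bx2, by))
    det = a11 * a22 - a12 ** 2
    if det == 0:
        raise ValueError("SNP-exposure associations are collinear; direct effects not identifiable")
    beta1 = (a22 * c1 - a12 * c2) / det  # direct effect of exposure 1, holding exposure 2 fixed
    beta2 = (a11 * c2 - a12 * c1) / det  # direct effect of exposure 2, holding exposure 1 fixed
    return beta1, beta2


# Hypothetical SNP associations generated with direct effects 0.2 and -0.1
bx1 = [0.1, 0.05, 0.2, 0.08]
bx2 = [0.02, 0.15, 0.1, 0.12]
by = [0.2 * u - 0.1 * v for u, v in zip(bx1, bx2)]
print(mvmr_ivw_two_exposures(bx1, bx2, by, [0.01] * 4))
```

When the two traits are very highly genetically correlated, the determinant approaches zero and the direct effects become imprecise, which is the practical limit on separating their effects.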

Bidirectional MR

To explore whether variation in HbA1c and glucose might influence variation in sleep traits, we selected genome-wide significant independent SNPs predicting HbA1c (74 SNPs) and BMI-adjusted fasting glucose (66 SNPs) from a large multi-ancestry GWAS, using the European-ancestry-specific results19 (Supplementary Information). As in the main analysis, we then generated unweighted allele scores51 for HbA1c and non-fasting glucose in the UKB sub-sample with valid accelerometer measures (n = 73,797). Two-stage least squares instrumental variable analyses were performed to obtain MR estimates of HbA1c and non-fasting glucose on each of the sleep traits. We adjusted for assessment centre and 40 genetic principal components to minimize confounding by population stratification53, as well as for baseline age, sex, genotyping chip, fasting time and dilution factor (for glucose only) to reduce random variation.

Ethics declarations

The UKB has received ethical approval from the U.K. National Health Service National Research Ethics Service (London, U.K.) (ref 11/NW/0382). The need for informed consent was waived by U.K. National Health Service National Research Ethics Service. This manuscript does not contain any personal or medical information about an identifiable individual. All methods were performed in accordance with the relevant guidelines and regulations.