Introduction

Accelerated kidney function decline can lead to renal failure, which necessitates dialysis or kidney transplantation. Age-related decline in kidney function is expected as part of the natural aging process with an average decline of -1mL/min/1.73m2 per year of age1,2,3. However, the rate of decline is highly variable in the population and the underlying reasons not fully understood4,5. To investigate these reasons, large epidemiological data on kidney function over time in the general population are required. However, such longitudinal datasets on kidney function over time (kidney function trajectories) are sparse. Emerging large biobanks can provide such longitudinal data by integrating information from electronic Medical Records (eMR) on serum creatinine measurements derived from clinical routine.

Data based on eMR are potentially subject to various sources of bias6. In the context of eMR-based creatinine values, this might involve selection bias, when individuals with many eMR-based creatinine measurements have the many measurements due to suspected or diagnosed kidney impairment7,8. This might also involve technical measurement bias when records of creatinine values extend far into the past and were based on assays before manufacturers have started improving standardization. One initial approach to make old assay creatinine measurements usable for epidemiological studies was to correct values from Jaffe assays used before 2009 by a factor of 0.959. Such a correction can eliminate systematic bias on average, allowing the use of bias-corrected values for epidemiological research.

UK Biobank10 provides an opportunity to advance our understanding of kidney function decline in a general adult population. With its large cross-sectional dataset from the study center (SC) assessments, encompassing ~ 500,000 participants, it provides an unprecedented resource for studying cross-sectional kidney function. However, longitudinal UK Biobank data on serum creatinine as kidney function biomarker are currently limited to ~ 15,000 individuals assessed at SC four years after baseline. The recently emerging eMR data from general practitioners (GPs) for UK Biobank participants offer the opportunity to augment existing SC data with longitudinal information. These data were released in September 2019 including multiple serum creatinine measurements over time and recorded kidney-relevant diagnostic codes. The usability of these eMR-based UK Biobank data and its potential for meaningful integration with the SC data is yet to be explored.

Our main objective was thus to augment the existing UK Biobank data on SC-based eGFR with eMR-based data from GPs (“GP-clinical”, application number 20272) to provide a longitudinal data resource for studying kidney function decline. Specifically, we (i) extracted and quality-controlled serum creatinine values derived from UK Biobank GP-eMR data, (ii) compared the eMR-based serum creatinine with SC-based creatinine values; (iii) derived eGFR for the combined eMR- and SC-based creatinine values, (iv) included information on GP-recorded onset of kidney diseases, renal replacement therapy, or other kidney-relevant diagnostic codes, and (v) assessed the utility of these combined data for studying kidney function decline.

Materials and methods

UK Biobank data on serum creatinine from SC assessment and GP-clinical

UK Biobank data and SC assessment at baseline and, for a smaller subset of participants, at a 4-year follow-up were described previously10. Briefly, UK Biobank is a prospective cohort study that included approximately 500,000 individuals aged 40–69 years at baseline recruited at 22 SCs in the United Kingdom. The SC assessment involved collecting participants’ blood samples and storing aliquots frozen at − 80 °C for further analysis11.

Creatinine was measured in serum for all individuals from SC blood drawn at baseline and follow-up in a central laboratory according to standardized protocols (Enzymatic Beckman Coulter AU5800). We obtained these SC-based serum creatinine values (creaSC), the date of the SC visit baseline and follow-up (date-of-exam), age at that date (age-at-exam), and sex of participants (data fields 30700, 31, 34, 52 and 53).

Based on the UK Biobank “GP-clinical” table, we obtained raw eMR-based creatinine values (creaeMR) via read codes12 (Supplementary Table 1). We quality-controlled eMR-based values, excluding technical errors, duplicates, values below the level of detection (LOD) or above the highest recorded value of 6524 µmol/L13. Values below the level of quantification (LoQ) were set to the LoQ. Quality-controlled eMR-based values were in µmol/L with recorded measurement dates assumed to be equivalent or close to the date of blood draw (Supplementary Note 1).

The UK Biobank data utilized here included SC data downloaded in September 2019 and information from “GP-clinical” downloaded in May 2023 (i.e. records until September 2017, depending on country Scotland, Wales or England).

Comparison of creaSC and creaeMR and error model

Next, we merged creaeMR to creaSC for each person by date of blood draw (“date-of-exam”). CreaeMR, being derived from clinical routine, were measured by different assays in different laboratories; “GP-clinical” does not contain information on the specific assays. Since in-vitro diagnostic manufacturers were starting to improve standardization of assays around 2009, we investigated the possibility of a systematic technical bias in creaeMR that decreased the closer the measurement date approached 2009 or some years after (assuming a heterogeneous onset of use of standardized products). We considered creaSC gold standard, since these measurements were conducted by one assay and one laboratory via a standardized protocol in the years 2015–2017. We compared distributions, values, and differences of creaSC versus creaeMR, focusing on individuals with both from the same calendar year (using creaeMR closest in time to SC blood draw).

We evaluated whether creaeMR exhibited a systematic bias: we assumed normally distributed creaSC and creaeMR on log-scale, \(\:X{:=}\text{l}\text{n}\left({crea}_{SC}\:\right)\sim\:N(\mu\:,{\sigma\:}^{2})\) and\(\:\:{X}^{*}{:=}\text{l}\text{n}\left({crea}_{eMR}\right)\sim\:N({\mu\:}^{*},\:{\sigma\:}^{*2})\), respectively. We also assumed an additive error consisting of a random component, \(\:{E}_{r}^{*}\sim N(0,{\sigma}^{{*}^{2}})\), and a systematic bias, s,

$$\:{X}^{*}=X+s+{E}_{r}^{*}.$$

This implies that the expected value of X* is\(\:\:E\left[{X}^{*}\right]=E\left[X+s+{E}_{R}^{*}\right]=E\left[X\right]+E\left[s\right]+E\left[{E}_{R}^{*}\right]\), so that\(\:\:{\widehat{\mu\:}}^{*}=\:\widehat{\mu\:}+s+0\), yielding\(\:\:s={\widehat{\mu\:}}^{*}-\widehat{\mu\:}\). On the original scale, we can derive the geometric means of creaeMR and creaSC,\(\:\:\text{e}\text{x}\text{p}\left({\mu\:}^{*}\right)\) and\(\:\:\text{e}\text{x}\text{p}\left(\mu\:\right)\), respectively. Then the bias is multiplicative and given as the ratio of these geometric means,

$$\:\text{exp}\left(s\right)=\text{e}\text{x}\text{p}\left({\widehat{\mu\:}}^{*}\right)/\text{exp}\left(\mu\:\right).$$

We assumed a differential bias by calendar years (larger in earlier years). Thus, we evaluated differences between creaeMR and creaSC among participants with measurements in the same calendar year. For each calendar year with measurements from both creaeMR and creaSC, we estimated \(\:{\mu\:}^{*year}\)and \(\:{\mu\:}^{year}\)and quantified the multiplicative bias (on the original scale) per calendar year, \(\:\text{exp}\left({s}^{year}\right)=\text{exp}\left({\mu\:}^{*year}\right)/\text{e}\text{x}\text{p}\left({\mu\:}^{year}\right)\)

Bias-corrected creaeMR

The above stated error model and bias quantification can also be used to derive bias-corrected creaeMR. We used the estimated \(\:{s}^{year}\), to correct creaeMR (on ln-scale) measured in the respective calendar year by

$$\:{X}_{corrected}^{*year}={X}^{*}-{s}^{year},$$

where year is the calendar year of the creaeMR measurement. The expected value of the corrected creaeMR is then \(\:{E[X}_{corrected}^{*year}]=E[{X}^{*}-{s}^{year}]=E[{X}^{*}]-{s}^{year}=(\widehat{\mu\:}+{s}^{year})-{s}^{year}=\widehat{\mu\:}\), thus yielding \(\:{X}_{corrected}^{*year}\sim N(\mu\:,{{\sigma}}^{{*}^{2}})\), with a purely random error compared to creaSC, \(\:{X}_{corrected}^{*year}=X+{E}_{r}^{*}\).

For calendar years without individuals that had both creaeMR and creaSC available to estimate the year-specific correction factor, we used\(\:{\:s}^{year}=0.5*{s}^{year+1}+{s}^{year-1}\), or \(\:{s}^{year}\:\)from the first or last year. We utilized bias-corrected creaeMR for further analyses and merged them to creaSC by person and date-of-exam.

Generating the eGFR data based on creaSC and creaeMR

We utilized the combined data of creaSC and bias-corrected creaeMR to derive eGFR via the most recent, race-term free, formula14. For this, we used age-at-exam (i.e. difference between date-at-exam and date-of-birth) and sex (reported at baseline SC visit). We defined Chronic Kidney Disease (CKD) as eGFR < 60 mL/min/1.73m2).

Integrating GP-eMR data on kidney function relevant diagnostic codes

We also extracted recorded information on kidney function relevant diagnostic codes from UK Biobank “GP-clinical” and the corresponding date: (i) severe kidney disease (End-stage Kidney Disease, ESKD; Acute Kidney Injury, AKI), (ii) intervention (nephrectomy, dialysis, kidney transplantation) and (iii) other conditions relevant to kidney function (pregnancy, diabetes). For this, we used the Coding System Lookups and Mappings Dictionary (Version 3, May 2021) to identify the respective Read v2 and CTV3 read codes (Supplementary Table 1, Supplementary Note 2). These data were then merged with serum creatinine data based on date-of-exam.

Statistical analyses on final dataset

For our final dataset including kidney-relevant diagnostic codes and eGFR based on creaSC or bias-corrected creaeMR, we derived descriptive statistics of the included UK Biobank study participants. For this, we used SC-baseline information regarding lifestyle (smoking; body-mass-index, BMI), diabetes (HbA1c > 6.5%, self-report or medication) and CKD status (SC-based eGFR).

We evaluated a potential selection of UK Biobank participants into the “GP-clinical”. We thus compared descriptive statistics of individuals in “GP-clinical” (GP-clinical members) versus individuals not in “GP-clinical” regarding lifestyle, diabetes or CKD status assessed at SC-baseline. We also compared descriptive statistics between individuals with many versus few assessments regarding (≥ 10 versus ≥ 2 eGFR assessments).

We then investigated whether the derived data were suitable for research on eGFR decline. For this, we restricted to individuals with ≥ 2 eGFR assessments at least one year apart, censoring eGFR values after onset of kidney-relevant events (for pregnancy, excluding values ± 6 months before and after). We derived person-specific annual eGFR decline as difference between last and 1st eGFR assessment divided by the number of years-in-between. First, we derived mean annual eGFR decline (and 95%-CI) with and without bias-correction for creaeMR. Second, we used multivariable linear regression to quantify risk factor association with annual eGFR decline. For this, we applied two models: (i) one model including age at first assessment (centered at 40 years), sex, and one genetic variant known for strong association with eGFR decline15 (UMOD, rs77924615) and (ii) another model without the genetic variant but adding smoking (self-report), obesity (body-mass-index, BMI, ≥ versus < 30 kg/m2; measured weight and height) and diabetes (self-report; all from SC-baseline, assuming that this is a reasonable proxy for these risk factors at first assessment). Third, we restricted to individuals with any AKI record and estimated mean annual eGFR decline (and 95%-CI) around an incident AKI event using eGFR assessments ≥ 6 months (as close as possible to the 6 months) before and after AKI.

Results

Serum creatinine measurements from the SC visits and GP-eMR

When extracting creaSC from the baseline SC visit (year of exam 2006–2010), we yielded measurements for 425,147 individuals. For 16,446 individuals, creaSC was available from the follow-up SC visit (year of exam 2012–2013). Together, this resulted in 15,314 individuals with 2 creatinine values over up to 6.1 years (median time = 4.4 years) and 410,965 individuals with exactly one creatinine value.

When extracting creaeMR from “GP-clinical”12, we yielded 1,701,710 raw creatinine values for 199,482 individuals. After quality control, this resulted in 1,660,581 creaeMR values for 199,396 individuals (Supplementary Fig. 1). This included 23,188 individuals with exactly 2 creatinine measurements and 151,728 with ≥ 3 measurements (median number of measurements per person = 7.00, max = 288). Year of exam (i.e. year of measurement) was as early as 1950 up to 2017 resulting in a time between 1st and last measurement of up to 60.2 years (median time = 8.12 years).

When merging creaeMR and creaSC by date-of-exam (i.e. date of measurement for eMR; date of SC-visit for SC), we yielded 2,102,174 creatinine values for 454,907 individuals. Thus, creaeMR substantially extended the longitudinal information on eGFR for UK Biobank participants from 4 years to up to 60.2 years and from a maximum of 2 to up to 289 measurements per person (Fig. 1A, B). The dataset included 206,063 individuals with ≥ 2 measurements (median time between 1st and last assessment = 8.71 years; median number of measurements per person = 6.00, Fig. 1C, D).

Fig. 1
figure 1

Number of creatinine measurements from study center (SC) and electronic Medical Records (eMR). When combining SC- and eMR-based creatinine values after quality-control, we obtained 2,102,174 creatinine values from 454,907 individuals. We show (A) the distribution of individuals by their number of available creatinine measurements over time (n = 248,844 with = 1 measurement indicated as SC- or eMR-derived in black or red, respectively; n = 206,063 with ≥ 2 measurements over time in gray), (B) the distribution of individuals by the time between 1st and last measurement (“assessment period”), (C) the distribution of the number of measurements by calendar year of assessment, and (D) the distribution of the number of measurements available by age groups (age-at-exam, “exam” referring to the GP-record for eMR or to the examination at the SC visit). For C&D, red colour represents eMR and black colour represents SC data.

Bias-corrected creaeMR values

We hypothesized that creaeMR measurements conducted in years substantially earlier than 2009 were subject to a measurement bias. We further hypothesized that this bias became smaller the closer the measurement year approached 2009, where in-vitro diagnostic manufacturers started to improve standardization of creatinine assays. To investigate this, we focused on the 70,231 individuals with both creaSC and creaeMR available from the same calendar year (i.e. year-of-exam 2007–2010 and 2011&2012). We observed higher median creaeMR compared to creaSC in earlier years, which gradually converged towards nearly identical median values in 2013 (Fig. 2A, Supplementary Fig. 2).

Fig. 2
figure 2

Bias-correction of serum creatinine values from electronic Medical Records (eMR) data. We contrasted serum creatinine values from study center (SC) versus eMR by calendar year-of-exam (i.e. year of blood draw & measurement for eMR, year of blood draw for SC with centralized measurement in 2007–2010 and 2012–2013). (A) We show the distributions of creatinine from eMR (red; quality-controlled, not bias-corrected) and SC (black). (B) We show creatinine values from SC- versus eMR (quality-controlled, not bias-corrected) among 70,231 individuals with both measurements from the same calendar year (using the eMR-creatinine value closest in time to the SC-value). Also shown is the estimated bias (‘bias’, i.e. ratio of geometric means of SC-values and eMR-values) and Spearman correlation coefficient (‘corr’). Grey lines indicate the identity. Red lines indicate the line through null with a bias slope. Sex is color-coded (blue: men, pink: women). (C) Shown are the distributions of bias-corrected creatinine from eMR (green; bias-corrected) and SC (black). The year-specific bias estimate was used as correction factor; for other years, we used proxies (2011: 0.93 derived as exp of average between ln(bias) for 2010 and 2012; years before 2007: 0.84; years after 2013: 1.0 - no correction).

We quantified the bias of creaeMR based on an additive error model on the log-scale as the difference between mean ln(creaeMR) and mean ln(creaSC) per calendar year, \(\:{s}^{year}\). This can also be interpreted as multiplicative bias on the original scale, \(\:\text{e}\text{x}\text{p}\left({s}^{year}\right)\), derived as ratio of the geometric means of creaeMR and creaSC (“Methods”). We found a bias of factor 0.84 for the year 2007, which converged to near unity until year 2013 (0.84, 0.84, 0.90, 0.91, 0.95 and 0.97 for the years 2007–2010 and 2012–2013, respectively; Fig. 2B). We obtained nearly the same bias estimates when winsorizing extreme values (beyond mean ± 3 standard deviations on log-scale:\(\:\:\text{e}\text{x}\text{p}\left({s}^{year}\right)\) =0.84, 0.84, 0.90, 0.91, 0.95 and 0.98), documenting the lack of influence of extreme values.

In order to obtain bias-corrected creaeMR for all individuals, we used the derived year-specific correction factors \(\:\text{e}\text{x}\text{p}\left({s}^{year}\right)\) to correct creaeMR measurements from the respective year (i.e. 2007–2010, 2012 and 2013). For years where the correction factor could not be estimated directly, we used proxy correction factors: (i) average of 2010 and 2012 on log-scale for 2011 (0.93); (ii) the 2007 factor for measurements before 2007 (0.84), (iii) and 1.0 (no correction) for measurements after 2013. When comparing the year-specific distributions of corrected creaeMR with creaSC distributions, we now found similar distributions and median values (Fig. 2C).

Description of the resulting UK Biobank dataset on 454,907 individuals with eGFR including eGFR trajectories

Next, we merged creaSC and creaeMR by date-of-exam and derived eGFR using CKD-EPI 2021 14. When comparing creaeMR-based eGFR (eGFReMR) with creaSC-based eGFR (eGFRSC), we observed a bias in eGFReMR without bias-correction of creaeMR and no bias with correction (Supplementary Figs. 3, 4). We used bias-corrected creaeMR and corresponding eGFReMR for the following. We found that the distributions of eGFReMR and eGFRSC were comparable by 5-year age groups (Fig. 3A).

Fig. 3
figure 3

Descriptive of combined data from electronic Medical Records (eMR)- and study center (SC)-based eGFR. The dataset for eGFR from SC and eMR combined comprised 454,907 individuals with ≥ 1 eGFR assessment and overall 2,102,174 assessments of eGFR derived from bias-corrected creatinine. (A) Shown are age-group-specific eGFR distributions14 based on data from SC (black) and eMR (bias-corrected, green). This was limited to the 1st eGFR assessment from SC or eMR, respectively. We also show eGFR distributions at (B) first and (C) last assessment (omitting 48 extreme values ≥ 150mL/min/1.73m2).

The final data comprised 454,907 individuals with ≥ 1 eGFR assessment based on creaSC and/or creaeMR (54.2% women) and with overall 2,102,174 eGFR assessments (Table 1). Mean age was 55.9 years and 59.8 years at 1st and last eGFR assessment respectively; mean eGFR was 95.1 and 91.1 mL/min/1.73m2, respectively (Fig. 3B, C). The data consisted of (i) 248,844 individuals with = 1 eGFR assessment, mostly from SC baseline visit (98.26% from SC baseline, 0.32% from SC follow-up, 2.21% from eMR), (ii) 33,851 individuals with = 2 assessments (60.0% of these with one assessment from SC and one from eMR), and (iii) 172,212 with ≥ 3 assessments (i.e. ≥k-2 assessments from eMR, k being the number of assessments per person). Thus, the dataset now included 33,851 + 172,212 = 206,063 individuals with repeated eGFR values over time (eGFR trajectories).

Table 1 Participant descriptive statistics for the UK Biobank data on eGFR including eGFR trajectories.We show participant characteristics using information from the study center (SC) visit at baseline and descriptive statistics of temporal aspects of the combined SC- and (bias-corrected) eMR-based data on eGFR14. These descriptive statistics are shown for all individuals and restricted to individuals with ≥2 or ≥10 eGFR assessments (further details in Supplementary Table 2). In the table header, ‘n’ denotes the number of individuals and ‘m’ denotes the number of eGFR assessments. Presented are mean and standard deviations, if not stated otherwise.

The characteristics of these individuals with ≥ 2 eGFR assessments (regarding age, %women, %smoking, BMI, eGFR, and %CKD from SC-baseline; n = 206,063) were similar as in the overall data, but individuals with ≥ 10 eGFR assessments (n = 69,506) were older, with higher BMI, lower eGFR, higher %CKD (Table 1).

This difference between individuals with many versus few assessments was also observable when restricting to individuals that were part of “GP-clinical” (GP-clinical members, i.e. individuals with ≥ 1 eGFR assessed from creaeMR; n = 199,396), while there was no difference between GP-clinical members and individuals that were not in GP-clinical (n = 199,396 versus 255,511, respectively; Supplementary Table 2). Thus, we found evidence for a selection towards older and less healthy individuals among those with many creaEMR values over time compared to fewer creaeMR values, in line with literature7,8. However, there was no selection observed for being a GP-clinical member versus all UK Biobank participants.

Description of UK Biobank eGFR trajectories data for 206,063 individuals augmented with kidney function relevant diagnostic codes

The generated UK Biobank eGFR trajectories dataset consisted of 206,063 individuals with ≥ 2eGFR assessments over time (total of 1,853,330 eGFR assessments; Table 1). Median time between 1st and last assessment was 8.5 years (maximum 60.2 years), median number of eGFR assessments over time per person was 6.0 (maximum 289 assessments). Most of these individuals had ≥ 1 eGFR assessed from creaeMR, (n = 195,885; 95.1%), thus were members of GP-clinical.

For GP-clinical members, we extracted records of kidney disease or intervention (AKI, ESKD, Nephrectomy, Dialysis, kidney transplantation) as well as other kidney function related codes (pregnancy, Diabetes) and the corresponding date from UK Biobank “GP-clinical” (Supplementary Table 1). We merged this information to the eGFR trajectories by date-of-exam. For the 195,885 GP-clinical members with ≥ 2 eGFR assessments over time, we yielded 74,501 records of any kidney disease, intervention, pregnancy or diabetes. These included 94, 49, 33, 30, 428 individuals with any record of AKI, ESKD, kidney transplant, dialysis, or nephrectomy before the 1st eGFR assessment and 562, 228, 180, 142, or 536 after (Table 2).

Table 2 Recorded kidney function relevant diagnostic codes among UK Biobank individuals in GP-clinical. We show descriptive statistics on recorded kidney disease, renal replacement therapy, or other diagnostic codes that are kidney function relevant. We show this for all individuals in “GP-clinical” (i.e. have ≥1 eGFR assessment from creaeMR; n=199,396) and restricted to individuals with ≥2 or ≥10 eGFR assessments. In the table header, ‘n’ denotes the number of individuals. We differentiated between diagnostic codes recorded before and after the first available creatinine measurement. If a diagnostic code is reported multiple times for an individual, it is counted once in each category. Results are presented as absolute count and as percentage in parentheses.

The proportion of individuals with kidney disease or intervention recorded after the 1st eGFR assessment was slightly lower among individuals with ≥ 2 eGFR assessments than when restricting to individuals with ≥ 10 assessments (Table 2; e.g. 0.3% versus 0.7% for AKI). This data can be utilized e.g. for focused analyses on patients using trajectories after AKI, or for analyses on individuals without AKI, ESKD, or intervention affecting eGFR.

Some aspects of UK Biobank eGFR trajectories exemplifying its utility

To document the utility of the generated longitudinal eGFR data with bias-corrected creaeMR, we estimated annual eGFR decline without and with bias-correction. For this, we restricted to the 202,794 individuals with ≥ 2 eGFR assessments at least 1 year in-between, censoring individuals after observed eGFR < 15 mL/min/1.73m2 or recorded kidney disease or renal replacement therapy (or nephrectomy of any causes). In the data integrating eMR- and SC-based eGFR, we estimated mean annual decline (difference between last and 1st eGFR assessment divided by years in-between) at 1.04 mL/min/1.73m2 per year (95%-CI = 1.03–1.05) with bias correction and 0.11 mL/min/1.73m2 per year (95%-CI = 0.10–0.12) without. This documented a plausible eGFR decline in line with literature1,2,3 in the data with bias correction, but not without.

We also evaluated the impact of bias-correction on risk factor association with annual decline using multivariable linear regression (“Methods”, considering a risk factor associated when the 95% CI did not overlap zero): (i) in a model with age at first assessment, sex, and the genetic variant, rs77924615, known for its large effect on eGFR16 and eGFR decline15, we observed a slightly deflated genetic effect size without versus with bias-correction (Table 3); (ii) in a model without the genetic variant but adding smoking, obesity, and diabetes, we found smoking and diabetes (but not obesity independent of diabetes) associated with increased annual decline in the data with bias-correction, but smoking was not associated in the data without bias-correction. Thus, the estimation of risk factor associations appeared to be affected by the bias, but was less affected than the estimation of mean eGFR decline.

Table 3 Impact of bias-correction on risk factor association with annual decline. We were interested in the influence of the bias-correction of creatinine from electronic medical records (eMR) on risk factor association with annual decline. For this, we present the results of a multivariable linear regression model examining annual decline computed as annualized difference between two eGFR assessments as the outcome. The model includes the covariables (i) age at first exam, sex, and the genetic variant rs7792461515,16 and (ii) adding smoking, obesity, and diabetes at SC baseline (without the genetic variant). We present beta-estimates [95%-CIs] of (left side) not-bias corrected creatinine and (right side) bias-corrected creatinine.

To provide a further example, we evaluated whether the (bias-corrected) eGFR trajectories reflected impaired kidney function during AKI: for the 354 individuals with incident AKI and ≥ 1 eGFR assessment 6 months before and after AKI, mean annual eGFR decline was four times larger than the overall decline (3.93 mL/min/1.73m2 per year, 95%-CI = 2.82–5.05). An example of an eGFR trajectory with such a rapid eGFR decline around AKI is illustrated in Fig. 4A. Two further examples illustrate eGFR trajectories with recorded ESKD (also eGFR < 15 mL/min/1.73m2) followed by transplantation and of an individual with dialysis, respectively (Fig. 4B, C).

Fig. 4
figure 4

Showcases of eGFR trajectories for individuals with recorded kidney-relevant kidney data. Shown are three examples of eGFR trajectories: (A) a trajectory with an episode of Acute Kidney Injury (AKI) and subsequent partial eGFR recovery, (B) a trajectory with eGFR decline down to 10 mL/min/1.73m2, a recording of ESKD, and eGFR recovery after kidney transplantation and (C) a trajectory with eGFR decline down to 15 mL/min/1.73m2 and a record of dialysis. The x-axis depicts the time since 1st eGFR assessment in years, the y-axis depicts the eGFR value in mL/min/1.73m2. Red and black dots indicated whether the eGFR was based on electronic Medical Records or from study center assessment, respectively. The shaded area indicates eGFR < 60 mL/min/1.73m2 (i.e. defining Chronic Kidney Disease, CKD). Events of ESKD, AKI and transplant are visualized at vertical dotted lines in green, red and blue, respectively.

Overall, we curated UK Biobank data and generated new variables for 454,907 individuals with at least one eGFR assessment, including 206,063 individuals with eGFR trajectories (i.e. at least 2 eGFR assessments) and kidney-relevant diagnostic codes.

Discussion

With this work, we present quality-controlled UK Biobank data of 2,102,174 SC- and eMR-based creatinine measurements and calculated eGFR in 454,907 individuals. For 206,063 of these individuals, eGFR trajectories are available with two or more eGFR assessments over time with a medium time between 1st and last assessment of 8.5 years. Purely SC-based information on creatinine and eGFR at two time points was available for 15,314 individuals with medium follow-up time of 4.4 years. Thus, by including eMR-based information on serum creatinine from GP-clinical, we extended the longitudinal UK Biobank data on eGFR by > 10-fold in terms of number of individuals and > 2-fold in terms of medium length of follow-up. We augmented this dataset with kidney-relevant diagnostic codes from GP-clinical (Fig. 5).

Fig. 5
figure 5

Workflow from raw data to the curated data. This schematic illustrates the analytical flow of the data generation. We show which variables were used from study center (SC) regarding creatinine and other SC-based values and what eMR data were used from GP-clinical. We used eMR-data for creatinine12 and other kidney function relevant diagnostic codes (onset of Acute kidney injury, AKI or Endstage kidney disease, ESKD; interventions nephrectomy, dialysis or transplantation and other kidney relevant codes on Diabetes17 or pregancy). Date of recorded “event” was used from data field “event_dt’. We derived calendar-specific bias correction factor for creaeMR in individuals with creaeMR and creaSC available in the same calendar year (dashed lines) and used proxy correction factors for other calendar years. We combined creaSC and bias-corrected creaeMR with other SC-based data and eMR-based diagnostic codes. ‘n’ denotes the number of individuals and ‘m’ the number creatinine measures. We colour-coded uncorrected eMR-creatinine (red) and bias-corrected eMR-creatinine (green).

One key aspect of integrating eMR-based creatinine measurements to SC-based measurements in UK Biobank was the quantification and accounting for systematic measurement error. The creaeMR values stemmed from routine outpatient care and were prone to bias inherent to using creatinine assays before improved standardization. UK Biobank provided the opportunity to compare these measurements with creaSC, which were measured centrally in 2019 using standardized biobanked serum and a modern enzymatic assay calibrated to National Institute Standardized Technology (NIST, reference material 967). The observed bias of factor 0.84 for calendar years 2007 and 2008, which diminished to factor 0.90 for the year 2009 was in line with correcting values from Jaffe assays used before 20099. The observed bias of 0.95 and 0.97 for the years 2012 and 2013 reflected the increased use of standardized assay. When using these bias estimates to correct creaeMR values, we were able to show that the average creaeMR and derived eGFReMR aligned with the average creaSC and eGFRSC, respectively. Furthermore, we showed that the estimated annual eGFR decline was 0.11 versus 1.04 mL/min/1.73m2 per year without and with the bias-correction, with the latter being in line with literature1,2,3. When exploring the impact of bias-correction on risk factor association with eGFR decline, we found slightly deflated risk factor estimates in the data without compared to with bias-correction, but less impact than for the estimation of mean eGFR decline. This underscored the utility of this bias-correction when integrating eMR-based eGFR from past creatinine measurements in UK Biobank GP-clinical. The notable change before and after correcting for bias also calls for cautious interpretation of eGFR decline estimates in other data when creatinine values from standardized assessments via SC are integrated with values from eMR obtained before laboratory standards were implemented and assays were unknown. Overall, we provide an approach to correct for systematic measurement error in eMR-based serum creatinine, when gold standard measurements of creatinine were available. We considered SC-based creatinine measurements gold standard, since these measurements were all conducted with one assay in one laboratory in a standardized way.

For studies with past eMR-based creatinine measurements but no gold standard measurements, this might not be fully generalizable. When the studies’ eMR-based creatinine measurements are derived in a comparable fashion as in UK Biobank ‘GP-clinical’, a similar calendar year dependent bias for measurements before 2009 might apply. However, the generalizability of the derived correction factors to other countries and health care systems is probably limited. Still, the here presented correction factors might provide a reasonable proxy of this bias, at least they give a notion of the extent of the problem, and accounting for some bias might be better than assuming no bias. Further evaluations of the bias in creatinine measurements before laboratory standards became available in other GP-eMR data in UK or other countries would be warranted. This extends to other biomarkers of kidney function like Cystatin C – or, actually, all laboratory biomarkers.

There are some limitations to be acknowledged. Our bias-correction works on average, which makes the data usable to estimate average kidney function decline or risk factor association. However, the bias-correction does not yield valid person-specific corrected creatinine and eGFR values, since the specific assay for an individual’s measurements was unknown. Furthermore, the bias quantification was only possible in calendar years 2007 to 2010 and from 2012 to 2013 with both creaeMR and creaSC values available and the correction factors had to be extrapolated for any other years. This does not reflect the large timespan of up to 60 years and measurements made as early as 1950, which implies changes in technology and laboratory practice that are not covered by our evaluations. Finally, we have only applied one approach to bias-quantification. For example, we derived an error model and bias per calendar year, while other error models might use the time of array manufacturing continuously. More detailed calibration data from laboratories or manufacturers’ manifests would be required, but might be difficult to obtain for arrays manufactured as far back as 1950.

Another key aspect to consider when using eMR-based data for epidemiological research is the potential enrichment of individuals with disease among individuals with eMR-based data7,8. This might be applicable to UK Biobank GP-clinical in the sense that individuals with suspected kidney impairment obtained more regularly creatinine measurements. While we found UK Biobank individuals with many eMR-records on creatinine to be less healthy than individuals with few records, we found little difference between participants that had eMR-records (i.e. were members of the GP-clinical) versus other participants. This might be due to the fact that serum creatinine is a typical routine biomarker, which is not only assessed upon indication for kidney impairment. The UK Biobank data on eGFR trajectories augmented with diagnostic codes from GP-clinical can be utilized for various aspects of analyses focusing on individuals with or without certain diagnostic codes.

In summary, we provide one approach to quantify bias in eMR-based creatinine measurements in UK Biobank that may be applicable also generally when gold standard measurements are available. We also provide a substantially extended longitudinal UK Biobank data resource on eGFR by integrating SC-based and bias-corrected eMR-based creatinine measurements that is usable for future research on eGFR decline. Our results may increase awareness of potential bias in past eMR-based biomarkers obtained before laboratory standards were available and when utilized assays were unknown. However, our results also underscore the opportunities from eMR-based data to enhance longitudinal information on kidney function, which is important for future studies on mechanisms of kidney function decline in the population.