Table 1 Characteristics of the Danish and US-VA datasets

From: A deep learning algorithm to predict risk of pancreatic cancer from disease trajectories

| General cohort information | Danish dataset | US-VA dataset |
| --- | --- | --- |
| Dataset timeline | 1977–2018 | 1999–2020 |
| Total n patients | 8,110,706 | 2,962,383 |
| Male (%) | 4,030,504 (49.7%) | 2,538,762 (85.7%) |
| Female (%) | 4,080,202 (50.3%) | 423,621 (14.3%) |
| Median n disease codes per patient | 22 | 188 |
| Median length of trajectory in years | 23.0 | 12.0 |
| **PC cohort information** | | |
| Total n patients | 23,985 | 3,869 |
| Male (%) | 11,880 (49.5%) | 3,741 (96.7%) |
| Female (%) | 12,105 (50.5%) | 128 (3.3%) |
| Median n disease codes per patient | 18 | 121 |
| Median length of trajectory in years | 17.0 | 8.0 |
| Median age at PC diagnosis | 70.0 | 68.0 |
| n disease codes 0–3 months pre-PC | 95,358 | 125,305 |
| n disease codes 3–6 months pre-PC | 27,131 | 56,198 |
| n disease codes 6–12 months pre-PC | 38,109 | 97,911 |
| n disease codes >12 months pre-PC | 480,830 | 1,188,199 |

  1. PC, pancreatic cancer.