Extended Data Fig. 3: Preprocessing and filtering of US-VA disease trajectory datasets. | Nature Medicine

From: A deep learning algorithm to predict risk of pancreatic cancer from disease trajectories

Filtering of the US-VA patient registries prior to training. For the US-VA dataset, around 3 million patients were randomly sampled owing to computational limitations, and patients with ICD-9/10 codes for pancreatic cancer but without entries in the US-VA cancer registry were excluded. As in the Danish dataset filtering, short trajectories (<5 diagnosis codes) were removed, and patients were split into Training (80%), Validation (10%) and Test (10%) sets.
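
The filtering steps described in this caption can be illustrated with a minimal Python sketch. It is not the authors' code. The data layout (a dictionary mapping patient IDs to lists of diagnosis codes), the function names, the code prefixes used to flag pancreatic cancer (ICD-9 157, ICD-10 C25), the order of the steps, and the rounding of the split sizes are all assumptions made for illustration.

```python
import random

# Assumed prefixes for pancreatic cancer codes (ICD-9 157.x, ICD-10 C25.x).
# The caption does not list the exact codes used in the study.
PC_PREFIXES = ("157", "C25")


def has_pc_code(codes):
    """Return True if any diagnosis code looks like a pancreatic cancer code."""
    return any(code.startswith(PC_PREFIXES) for code in codes)


def filter_and_split(trajectories, registry_ids, sample_size, min_codes=5, seed=0):
    """Hypothetical helper: sample, filter, and split patients 80/10/10.

    trajectories: dict of patient ID -> list of diagnosis codes
    registry_ids: set of patient IDs that have a cancer registry entry
    """
    rng = random.Random(seed)

    # Step 1: random subsample of patients (about 3 million in the caption).
    ids = sorted(trajectories)
    sampled = rng.sample(ids, min(sample_size, len(ids)))

    kept = []
    for pid in sampled:
        codes = trajectories[pid]
        # Step 2: drop patients with a pancreatic cancer code but no registry entry.
        if has_pc_code(codes) and pid not in registry_ids:
            continue
        # Step 3: drop short trajectories (fewer than min_codes diagnosis codes).
        if len(codes) < min_codes:
            continue
        kept.append(pid)

    # Step 4: shuffle and split; the test set takes whatever remains after rounding down.
    rng.shuffle(kept)
    n_train = int(0.8 * len(kept))
    n_val = int(0.1 * len(kept))
    return {
        "train": kept[:n_train],
        "val": kept[n_train:n_train + n_val],
        "test": kept[n_train + n_val:],
    }
```

Splitting at the patient level, as the caption describes, keeps all of a patient's diagnosis codes within a single partition.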
