Fig. 3: Leveraging GAI to create an eDRS4C and validation on an independent case-control dataset. | Nature Medicine

From: A microRNA-based dynamic risk score for type 1 diabetes

a, Rationale for using GAI. Left, a theoretical distribution of a single variable across individuals from four contexts is shown. GAI allows the creation of synthetic samples that can maximize all probabilities of variable expression (gray filled plots on the right), while preserving the original data distribution. b, We used the Gaussian copula Synthetic Data Vault (SDV) workflow to create 1,000, 10,000 or 100,000 synthetic control samples. The principal component analysis (PCA) plots present the distribution of real (control and T1D) and synthetic (gray) datasets. These augmented (real + synthetic) datasets containing 1,000, 10,000 or 100,000 synthetic samples were used to develop the eDRS4C model. c, Performance characteristics of the eDRS4C models developed from the augmented datasets containing 1,000, 10,000 and 100,000 synthetic samples on 662 samples (controls n = 364 and T1D n = 298) in an independent validation cohort. a.u., arbitrary units.