Extended Data Fig. 1: Statistics of pretraining and fine-tuning datasets.
From: A generalist vision–language foundation model for diverse biomedical tasks

(a) Modality distribution of the pretraining data used in BiomedGPT. (b) Training and testing splits of the datasets used in downstream fine-tuning; each dataset is reported in the format "number of training samples/number of validation samples/number of test samples". Further details of the data splits are provided in Supplementary Table 7.