Fig. 2: Comparison of model performance for the i2b2 (n2c2) Clinical Trial Eligibility Challenge.
From: Synthetic data distillation enables the extraction of clinical information at scale

Evaluation includes the training set (a, c) because these data were not used in any pre-processing, hyperparameter selection or fine-tuning of the models. All evaluations are zero-shot, but performance on the training set (a, c) is shown separately from the test set (b, d) for clarity. The 70B and 8B models are Llama-3.1; the 3B and 1B models are Llama-3.2.