Fig. 2: In-context learning for vision-language models.
From: In-context learning enables multimodal large language models to classify cancer pathology images

Panel A shows that classification accuracy on a simple task, detecting tumor (TUM) versus non-tumor (NORM) tiles from the CRC100K dataset, can be improved drastically by leveraging ICL with randomly sampled few-shot image examples. Panel B compares random and kNN-based image sampling on two datasets and shows that kNN-based sampling improves classification performance on both MHIST (left) and PatchCamelyon (right), especially as the number of few-shot samples is scaled up. Note that samples have been slightly shifted along the x-axis for visibility. In both panels, the y-axis denotes the mean accuracy with lower and upper 2.5% confidence intervals (CIs) from 100,000 bootstrap iterations. Source data are provided as a Source Data file.
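The kNN-based sampling compared in Panel B can be sketched as follows: rather than drawing few-shot examples at random, the pool images nearest to the query image in an embedding space are selected as in-context examples. This is a minimal illustration, not the authors' implementation; the embedding model, distance metric, and function names here are assumptions.

```python
import numpy as np

def knn_few_shot(query_emb, pool_embs, pool_labels, k):
    """Pick the k pool images closest to the query in embedding space.

    Embeddings are L2-normalised so the dot product equals cosine
    similarity. Returns the indices of the k nearest neighbours and
    their labels, to be used as few-shot examples in the prompt.
    (Hypothetical helper; the paper's actual pipeline may differ.)
    """
    q = np.asarray(query_emb, dtype=float)
    q = q / np.linalg.norm(q)
    p = np.asarray(pool_embs, dtype=float)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    sims = p @ q                      # cosine similarity to the query
    nearest = np.argsort(-sims)[:k]   # indices of the k most similar
    return nearest, [pool_labels[i] for i in nearest]
```

Scaling the number of few-shot samples then simply means increasing `k`, which is the trend shown on the x-axis of Panel B.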
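The bootstrapped confidence intervals on the y-axis can be reproduced, under the standard percentile-bootstrap assumption, by resampling per-image correctness indicators with replacement and taking the 2.5% and 97.5% quantiles of the resampled mean accuracies. This sketch is illustrative only; the function name and seed are assumptions.

```python
import numpy as np

def bootstrap_accuracy_ci(correct, n_boot=100_000, alpha=0.05, seed=0):
    """Mean accuracy with a percentile-bootstrap CI.

    correct: binary array, 1 where the model's prediction was right.
    Returns (mean accuracy, lower alpha/2 bound, upper 1 - alpha/2 bound).
    """
    rng = np.random.default_rng(seed)
    correct = np.asarray(correct, dtype=float)
    n = correct.size
    # Resample the n predictions with replacement, n_boot times,
    # and average each resample to get a bootstrap distribution.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = correct[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return correct.mean(), lo, hi
```

With `alpha=0.05` this yields the lower and upper 2.5% bounds the caption refers to.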