Main

Phenotypic screening is a powerful approach for identifying clinically relevant treatments1,2,3,4,5. To date, successful efforts have used basic readouts (for example, viability) in simple models of infectious or monogenic diseases where the underlying mechanisms are easy to recapitulate faithfully4. However, most human diseases arise in complex multigenic and multicellular settings that are not well captured by simple assays and model systems. While recent advances in phenotypic screening can now directly link high-content readouts to genetic perturbation libraries (for example, Perturb-seq6 and compressed Perturb-seq7), these methods still have several drawbacks. For example, genetic approaches remain difficult to use in some models and do not mirror the pleiotropic manner in which compounds and intercellular signals act in vivo. Furthermore, most genetic hits require follow-up pharmacologic development, disrupting the chain of translatability4,7. As a result, there remains a wide array of human maladies for which phenotypic drug discovery cannot be applied effectively. To accelerate drug discovery efforts, we require improved phenotypic screens that have three properties. First, they should leverage in vitro or ex vivo cellular models that maintain high-fidelity to in vivo disease contexts such as cell type, epigenetic state or tissue of residency. Second, they should use high-content readouts, such as single-cell genomics or high-content imaging, that can comprehensively capture the complex cellular phenotypes associated with a disease state. Third, they should prioritize biochemical perturbation libraries built with small molecules and/or protein ligands that, unlike genetic perturbations, can be directly brought forward as therapeutic substrates.

At present, it is challenging to conduct such high-content phenotypic biochemical perturbation screens because of two limitations of scale. First, high-content readouts, such as single-cell RNA sequencing (scRNA-seq), are orders of magnitude more expensive than simple functional assays such as cell viability, growth or secretion. While methods exist for increasing efficiency (for example, compressed sensing8 and antibody- or lipid-based sample multiplexing9, they can still be costly to implement on a per-compound basis and/or have fundamental throughput limitations. Second, higher-fidelity models, such as those derived from clinical specimens, are more challenging to generate at scale than less physiologically representative systems, like cell lines. In the most representative cases that use tissue explants or specific model organisms, biomass limitations practically restrict the number of perturbations that can be tested; in expandable models of intermediate complexity, expansion can be constrained by phenotypic changes over time, as evidenced by epigenetic drift in intestinal organoids10, clonal drift in glioblastoma organoids11 and altered RNA and genetic states in pancreatic cancer organoids12. As one approach to enhance scale, past studies have pooled perturbations together.13 While this fundamental approach could improve throughput and reduce assay costs per compound, previous efforts have been limited to simple readouts and libraries of mostly inactive small molecules, leaving unanswered questions about the use of such approaches and how to effectively implement them.

Here, we introduce a generalizable, scalable and efficient method for compressing phenotypic screens with high-content readouts. Our approach pools perturbations to reduce sample input, cost and labor requirements and infers single perturbation effects with a regression-based framework to help prioritize follow-up investigation (Fig. 1). Unlike for cell-intrinsic perturbations, such as pooled clustered regularly interspaced short palindromic repeats (CRISPR) screens, where one can simultaneously probe multiple features in a single sample in parallel with tagging strategies that link each specific perturbation to an individual cell, there is no analogous method for pooling cell-extrinsic factors, such as small-molecule drugs. To establish the feasibility and limits of our approach, we conducted a series of phenotypic screens—both conventional and compressed—using a bioactive small-molecule library paired with a high-content imaging readout. Across a wide range of pool sizes, we consistently identified the compounds with the largest ground-truth (GT) effects as hits in our compressed screens (CSs), thoroughly vetting the robustness of the approach. To demonstrate the power of our method and its generalizability to other models and readouts, we applied it in two biological discovery settings. In the first, we examined the impact of tumor microenvironment (TME)-relevant recombinant protein ligands on early-passage patient-derived pancreatic ductal adenocarcinoma (PDAC) organoids with an scRNA-seq readout, as a subset of these factors had been shown previously to alter PDAC cell states and therapeutic responses12. We found that almost all our top compressed hits drove conserved transcriptional responses when screened individually. Moreover, we observed that transcriptional responses to specific cytokines were distinct from those described in canonical reference databases and uniquely associate with clinical outcomes in a separate PDAC cohort. In the second, we generated a systems-level map of drug effects in diverse cell types by measuring the immunomodulatory impact of a small-molecule mechanism of action (MOA) library on lipopolysaccharide (LPS) and interferon-β (IFNβ) responses in human peripheral blood mononuclear cells (PBMCs). Working in a multi-cell type model with multilayered perturbations, we uncovered several compounds with pleiotropic effects on different gene expression programs (GEPs), both within and across different cell types, and confirmed the heterogeneous effects of one such hit. In summary, our method provides a general means to conduct high-content phenotypic screens in biological models of varying complexity with substantially reduced sample, reagent and labor costs. Our results demonstrate the feasibility of pooled phenotypic screens compressing small molecules and biologics, and may unlock new routes for discovery efforts by connecting experimental perturbations to basic and clinical observations.

Fig. 1: Compressed screening with high-fidelity model systems and high-content assays.
figure 1

a, Conceptual visualization of how assay and biological model complexity may limit the scalability of conventional screens and how this boundary may be shifted by compressed screening. b, In compressed screening, a set of N perturbations is combined into pools of size P with each drug replicated R times across all pools. Pooling so as to ensure specific perturbations are not repeatedly paired enables the use of linear deconvolution to identify the effects of each. Overall, this reduces the number of samples, S, required to conduct a conventional phenotypic screen (\({S}_{{\rm{conventional}}}=N\times R\)) by a factor of P (\({S}_{{\rm{compressed}}}=\frac{N\times R}{P}\)). The example illustrates these ideas for N = 8, R = 4 and P = 4. c, Visualization of the construction of a CS with an acoustic liquid handler. d, Regression framework for inferring the effects of individual perturbations in a CS: we solve for the coefficient matrix (β) that describes the effect of perturbations (whose assignment to pools is denoted in the design matrix X) on the measured features of the screen (matrix Y).

Results

Scaling up biochemical phenotypic screens with compression

To increase the throughput of phenotypic screens and unlock the use of high-throughput assays and model systems of increasing complexity (Fig. 1a), we examined the feasibility of compression through pooling external perturbations, such as biological ligands or chemical compounds (Fig. 1b). In this approach, we combine N perturbations into unique pools of size P, ensuring each perturbation appears in R distinct pools overall (Fig. 1b,c). Relative to a conventional screen where replicates of each perturbation are screened individually, a CS reduces the sample number, cost and labor requirements by a factor of P, which we refer to as P-fold compression. To analyze the output of a CS, we developed an assay-independent computational framework for deconvoluting the effects of individual perturbations on the basis of regularized linear regression and permutation testing (Fig. 1d), inspired by previous work inferring the effects of guide RNAs on genes in pooled CRISPR screens6,14.

Developing and benchmarking CS with high-content imaging

We conducted a series of GT experiments in which we examined the impact of a 316-compound Food and Drug Administration (FDA) drug repurposing library (Supplementary Table 1) on the morphology of U2OS cells to establish a reference dataset for benchmarking the feasibility of compression (Fig. 2a). We chose this library because it would represent a challenging use case for pooling and inferring the impact of individual perturbations (because bioactive compounds with large effects would frequently co-occur in our pools); we selected this model, meanwhile, so that we could readily establish GT (as we could generate enough input material to test each perturbation independently, as well as multiple compressions). Given the size of the GT screen, we opted to measure cellular phenotypes with Cell Painting15, a cost-effective high-content morphological profiling readout that multiplexes six fluorescent dyes (probes: nuclei, Hoechst 33342; endoplasmic reticulum, concanavalin A–AlexaFluor 488; mitochondria, MitoTracker Deep Red; F-actin, phalloidin–AlexaFluor 568; Golgi apparatus and plasma membranes, wheat germ agglutinin–AlexaFluor 594; nucleoli and cytoplasmic RNA, SYTO14), imaged in five channels, to examine multiple cellular components and organelles (Fig. 2b). We constructed and applied a pipeline to analyze our Cell Painting datasets, inclusive of illumination correction, quality control, cell segmentation, morphological feature extraction, plate normalization and highly variable feature selection, yielding 886 informative morphological attributes for further analysis16 (Supplementary Table 2 and Methods).

Fig. 2: Compressed screening identifies compounds with largest effects in a GT setting.
figure 2

a, Overview of CS benchmarking experiments used to assess the morphological impacts of 316 FDA-approved compounds on U2OS cells. b, Overview of conventional GT screen designed to address a (Methods). c, The effects of the 316 compounds were calculated relative to DMSO controls using the MD and the drug × feature matrix was clustered to identify GT drug-associated phenotypes. d, One-sided Fisher’s exact enrichments (−log10(P value)) of the features differentially enriched in each GT phenotype (log2 fold change > 3) from the seven classes of Cell Painting features (five cellular components plus area or shape and neighborhood features). Each of the seven is further broken down on the basis of whether the observation pertains to the entire cell, the cytoplasm only or the nucleus only. The righthand bar visualizes the mean number of cells per well across all samples in each GT phenotype. e, Heat map (−log10(P value), one-sided permutation test with no correction) showing the top five drugs associated with each of the eight GT clusters. f, Overview of our compressed screening benchmarking experiment, which tested a range of compressions (\({S}_{{\rm{compressed}}}=\frac{N\times R}{P}\)) for the same 316 drugs (N) by examining several pool sizes (P) and replications (R) to identify the capabilities and limits of the approach. g, CS experimental overview. h, Analytical approach for inferring the effects of each perturbation using regularized linear regression and solving for the coefficient matrix (β). i, Inferred perturbation effects in a CS (scaled L1 norm, y axis) versus those from the GT screen (MD, x axis) for two replicate runs (r, Pearson correlation; two-sided correlation test CS run 1, P value < 2.2 × 10−16; CS run 2, P value < 2.2 × 10−16). j, Correlation of the effect sizes between GT and CS runs across all pool sizes for the perturbations that were significantly associated with any of the phenotypic clusters in the GT screen (one-sided permutation test without multiple hypothesis correction, P value < 0.01). k, Receiver operating characteristic (ROC) curves (false positive rate versus true positive rate) calculated to show the performance of the CS screens in correctly identifying GT hits for each pool size while varying the deconvolution permutation testing threshold (from 0 to 1 in steps of 0.01). l, Mean scaled L1 norm of the perturbations called as hits (scaled L1 norm > 0) in both replicate CSs at each pool size (y axis). The drugs plotted are those that resulted in a statistically significant effect in any pool size (one-sided permutation test without multiple hypothesis correction, P value < 0.01). m, False negative rate, calculated as perturbations with significant MDs in the GT screen but unrecovered in CS among all significant GT perturbations, as a function of pool size in the CS.

To find optimal conditions for our screen, we conventionally assayed nine concentration and time point combinations, evaluating the effects of different treatments, durations, doses and batches on cellular morphology (Extended Data Fig. 1a,b) and viability (Extended Data Fig. 1c,d and Methods). To quantify an overall morphological effect, we computed a vector of median values for each of the 886 morphologically informative features in each sample well and then calculated the average Mahalanobis Distance (MD) between the control and perturbation vectors for each compound (Fig. 2c). The MD is a multidimensional generalization of the z-score2,3,17,18 that has been used extensively to discern effect size in studies deploying high-content cellular morphological assays19,20. On the basis of these pilots, we selected the 24-h time point and 1 μM concentration combination to generate the full GT dataset because these conditions yielded the largest MD coefficient of variation (Extended Data Fig. 1a,b). We next screened all 316 compounds individually with six-fold replication, resulting in 2,088 samples (that is, wells; inclusive of 1,896 perturbation and 192 control (DMSO) samples) to define our GT (Fig. 2b). To identify shared phenotypic responses among these data, we performed dimensionality reduction over the aforementioned 886 morphologically informative features for all wells with a sufficient number of cells (≥50), yielding eight GT phenotypic clusters enriched for distinct morphological properties related to the intensity and distribution of each stain, as well as overall cell shape and structure (Fig. 2c,d, Extended Data Fig. 1e–g and Methods). One of the clusters (cluster 1) was dominated by DMSO control samples, whereas the others (clusters 2–8) were enriched for specific drug perturbations (Fig. 2e). Reassuringly, drugs driving defined morphological responses, albeit with different effect sizes (as determined by MD), shared common MOAs (for example, antineoplastics in cluster 7; Extended Data Fig. 1h,i and Supplementary Table 1).

We next performed a series of matched CSs. Using the same U2OS model, Cell Painting readout and 316-drug library, we thoroughly tested the feasibility and limits of our approach by spanning different degrees of compression (3–80 drugs per pool) and replication (each drug appearing in a total of three, five or seven pools; Fig. 2f,g). After deconvolving the impact of each drug on the 886 morphological features using regularized linear regression and performing quality controls for toxicity (Fig. 2h and Extended Data Fig. 2a–c), we directly compared the CS and GT results. In the CS, the MD between a perturbation and the control cannot be computed directly because the deconvolution results in a point estimate for the effect of each perturbation (coefficient weights), including the controls, rather than a distribution. Therefore, after comparing several metrics, we chose to summarize the effect of each perturbation in the CS using the max-scaled L1 norm of all significant regression coefficients (permutation test; Extended Data Fig. 2d and Methods). We observed a strong correlation between the perturbation effects computed for the GT and CS screens for pool sizes of up to 40 perturbations, with good agreement for results derived from screens with the same pool size but distinct combinations of drugs in each pool (Fig. 2i,j and Extended Data Fig. 2e). By varying the significance threshold for our permutation tests, we found that we could tune the deconvolution to identify either more hits with a higher false positive rate or fewer hits that were more likely to be true positives (Fig. 2k and Extended Data Fig. 2f). Across all pool sizes, we consistently detected significance in the CS (scaled L1 norm > 0) for those compounds with the largest GT effects, while those with a more modest impact were only captured at lower compressions (Fig. 2l,m). Together, our benchmarking results demonstrate that CS is efficient, accurate and reproducible, and that the degree of compression can be tuned by a user to balance throughput with stringency.

Examining TME ligands impact on PDAC organoids with CS

While TME signals are critical features dictating patient survival across myriad cancers21, ex vivo systems typically overlook these microenvironmental features. This can result in a loss of therapeutically relevant transcriptional features in vitro, leading to decreased concordance with in vivo biology12,22. Understanding how multiple TME factors influence response signatures requires systematic perturbation experiments in patient-derived models that can scale with high-dimensional readouts such as scRNA-seq; however, cost, labor and biological material input requirements have limited these measurements using conventional approaches. To overcome this barrier and evaluate the utility of our approach to enable biological discovery, we used CSs to characterize transcriptional responses to 68 TME ligands in early-passage, patient-derived PDAC organoids, a model system known to drift transcriptionally in culture12,23 (Fig. 3a).

Fig. 3: CS of biological ligands on PDAC organoids reveals major axes of transcriptional response that are recapitulated in single-ligand validation screens with clinical relevance.
figure 3

a, Overview of a CS designed to explore the effects of macrophage-derived ligands on patient-derived PDAC organoid transcriptional states. b, Overview of experimental setup for the 68 biological ligand CS (with select single-ligand landmark perturbations) on PDAC organoids with a scRNA-seq readout. c, Overview of computational analyses on scRNA-seq results from the CS used to identify GEPs and the ligands that induce them. d, Scatter plot of significant ligand effects on cNMF modules (deconvolution regression coefficients) from two CSs with distinct random pooling. e, Heat map of the mean significant ligand effects on cNMF modules over both CS runs. f, Top, overview of a single-ligand validation experiment. The top 11 hits (adjusted P value < 0.05) that emerged in both CSs were validated by performing single-ligand treatments in duplicate. Hits are grouped by the GEPs they induce, and landmark perturbations are noted with an asterisk. DMSO-treated wells served as negative controls. Bottom, Heat map visualizing the Pearson correlation between the effects of each ligand on the CS cNMF GEPs in the CS (rows) and the effects of the same ligands on the CS cNMF GEPs in the single-ligand experiment (columns). g, Top, Venn diagrams depicting the number of shared and unique genes between the cNMF PDAC IL-4 and IL-13 response GEP and corresponding signatures in MsigDB. Bottom, Kaplan–Meier survival plots of TCGA PAAD cohort (n = 182, bulk RNA-seq expression data) based on (left to right) our PDAC IL-4 and IL-13 response module or three gene signatures (Reactome, Biocarta and Lu et al. IL-4) from MsigDB.

More specifically, we conducted two PDAC organoid CSs with the same model, ligand library and compression scheme (replicates = 5, mean pool size = 4.75), but distinct ligand poolings (Fig. 3b). In addition to negative controls (wells containing only organoids and media), we individually screened three ligands (tumor necrosis factor (TNF), transforming growth factor-β1 (TGFβ1) and interferon-γ (IFNγ)) with known activity in PDAC12,24 as positive controls (landmarks). After sample demultiplexing and quality control, we analyzed 5,662 cells and 10,881 cells from the two runs, with a mean ± s.d. of cells per compressed pool of 59 ± 32 and 113 ± 65, respectively (Extended Data Fig. 3a,b).

As PDAC cells in these screens were simultaneously exposed to multiple perturbations, we reasoned that a given cell might concurrently respond to one or more ligands within the pool by expressing one or more GEPs. We discerned such GEPs by applying consensus non-negative matrix factorization (cNMF)25 (Fig. 3c). This approach revealed 13 GEPs with highly variable activity across cells that were consistent between the two runs (Extended Data Fig. 3c, Supplementary Table 3 and Methods). We contextualized these GEPs by correlating GEP expression scores with those for canonical gene signatures previously used to characterize PDAC cell states (Moffitt et al. classical and basal state signatures26), existing MsigDB and PROGENy genesets, signatures associated with our landmark (that is, single cytokine) perturbations (TNF, TGFβ and IFNγ), cell-cycle scores27 (G2M and S phase) and cell quality metrics (‘percentage ribosomal’, ‘percentage mitochondrial’ and ‘number of genes’). This allowed us to annotate 11 of the GEPs as nuclear factor (NF)-κβ activation, TGFβ response, type I IFN response, organoid classical, cell cycle S phase, cell cycle G2M phase, mitochondrial module, ribosomal module and three GEPs primarily expressed in low-complexity cells (Extended Data Fig. 3d,e and Methods)28. One of the two remaining GEPs displayed a negligible association (R < 0.03) with the Hallmark IFNγ response in MsigDB, yet clearly correlated with the module score for genes differentially expressed in the IFNγ-positive control wells (‘landmark perturbation effects’, Extended Data Fig. 3d). The final GEP did not clearly correlate (R > 0.25) with any signature in MsigDB or PROGENY (Extended Data Fig. 3d). We annotated it as the ‘PDAC interleukin (IL)-4 and IL-13’ signature on the basis of the ligands inferred to drive the observed response, supported by literature suggesting that some of its top ranked genes (CD55, PIGR and SERPINB3) can be induced by IL-4 and IL-13 in epithelial cells29,30,31 (Extended Data Fig. 3e) and the presence of detectable IL4I1, a downstream target gene directly induced by IL-4, in SPP1+ macrophages in the primary patient PDAC data from Raghavan et al.12 (Extended Data Fig. 3f). We note that related attempts to use gene-based deconvolution were highly overdetermined (given the ratio of ligands to genes), yielding a small, nondeterministic set of genes associated with each ligand that were difficult to interpret using standard enrichment analyses.

Next, we applied our deconvolution framework to infer which ligands drove the activity of each cNMF GEP in the CS (Fig. 3d,e and Extended Data Fig. 3g,h). Encouragingly, IFNγ was associated with the IFNγ response GEP, IFNɑ2 was associated with the type I IFN signaling GEP, TGFβ and inhibin-βB (INHBB; a member of the TGFβ superfamily) were associated with the TGFβ response GEP, IL-1A, IL-1B and TNF (known activators of NF-κβ32) were associated with the NF-κβ response GEP and the mitogens Wnt7A and R-Spondin 3 (RSPO3) were associated with the ribosomal module GEP33,34. The low-complexity GEP 1, meanwhile, did not consistently associate with any ligand and was ignored as a quality artifact in downstream analyses. As stated above, we annotated the PDAC IL-4 and IL-13 response GEP on the basis of its association with IL-4 and IL-13, as well as with adiponectin (ADIPOQ). Interestingly, we would not have been able to infer which ligands were most active in the organoids by examining cognate receptor expression alone, as our active and inactive ligands did not have discernable patterns of cognate receptor gene expression in untreated organoid cells or in patient-matched primary tumor cells (Extended Data Fig. 3i and Supplementary Table 4). Together, our deconvolution framework paired the major axes of induced phenotypic variation in PDAC organoids with the TME ligands driving those transcriptional variations.

To validate our CS results, we individually tested the 11 top-scoring ligands (IL-1A, IL-1B, TNF, IL-4, IL-13, ADIPOQ, TGFβ1, INHBB, IFNγ, Wnt7A and RSPO3) significantly associated with five of the aforementioned GEPs (NF-κβ response, PDAC IL-4 and IL-13 response, TGFβ response, IFNγ response and ribosomal module; Fig. 3f)23. We conducted two validation experiments on separate days (biological replicates) by screening six replicates of all ligands, including a set of negative control organoids in each run (Extended Data Fig. 4a). We then pooled these data with the negative controls and individual landmark ligand positive controls from the original CSs to form a final single-ligand perturbation dataset. To directly control for batch variation between runs (Extended Data Fig. 4a and Methods), we performed cNMF over the single-ligand dataset and then ran a linear model to predict the effect of each ligand on each single-ligand cNMF GEP while including experimental batch as a covariate. This identified single-ligand GEPs with similar enrichments to, and high correlations with, each of the GEPs from the CS dataset (Extended Data Fig. 4b–d, Supplementary Table 3 and Methods). For example, there was significant overlap between the CS and single-ligand PDAC IL-4 and IL-13 response GEPs (Fisher’s exact test P value = 1.62 × 10−26), and the overlapping genes ranked highly by gene loading in both modules (Extended Data Fig. 4e).

More critically, these data revealed that the majority of the validation ligands significantly induced expression of the expected single-ligands GEPs (that is, those most similar to the CS GEPs discovered in the CS), with all five target GEPs induced by at least one validation ligand (Extended Data Fig. 4f). To cross-check these results for congruence, we also ran a linear fit for the validation data using the CS GEPs in place of the single-ligand ones and obtained similar results (Extended Data Fig. 4g). Lastly, we observed a strong correlation between ligand effects on CS cNMF GEPs in both the CS and the single-ligand validation experiments (Fig. 3f and Methods), demonstrating that compressed screening can robustly identify the transcriptional effects of TME ligands on patient-derived cancer models.

In examining the GEPs in our CS and validation experiments and their associated ligands, we observed limited overlap with matched canonical ligand signatures, suggesting the possibility of disease- and context-specific transcriptional responses to TME signals (Extended Data Figs. 3d and 5a). For example, the PDAC IL-4 and IL-13 response GEP did not correlate well with other signatures within established geneset databases12,35 or those associated with PDAC outcomes26 (Extended Data Figs. 3d and 5a–e), likely explained by the limited overlap between the PDAC IL-4 and IL-13 response GEP and publicly available IL-4 and IL-13 gene signatures (Fig. 3g and Supplementary Table 5). Even for other response GEPs that had more obvious correlations with existing gene signatures (for example, those for TNF; Extended Data Figs. 3d and 4b), we observed only modest overlaps; for example, our PDAC NF-κB GEP shared only 20 genes with its matched PROGENY signature (Jaccard index = 0.058), our PDAC TGFβ GEP shared only 6 genes with the Hallmark TGFβ signature (Jaccard index = 0.047) and our PDAC IFNγ GEP shared only 12 genes with the Hallmark MSigDB signature (Jaccard index = 0.031; Extended Data Fig. 5f). Taken together, this reveals the importance of understanding and considering the contextual background against which perturbations are introduced as specific systems may respond differently to the same perturbation.

To explore the potential clinical relevance of these disease-specific responses, we examined the prognostic value of our PDAC IL-4 and IL-13 response GEP and existing IL-4 and IL-13 response signatures in bulk RNA-seq samples from PDAC tumors in The Cancer Genome Atlas (TCGA)35. Comparing the top and bottom tertile for each IL-4 and IL-13 signature, we found that only our newly identified PDAC IL-4 and IL-13 GEP significantly stratified patients by survival, with higher expression associating with worse outcome (Fig. 3g). The same was true for our PDAC IFNγ and TGFβ GEPs, both of which had minimal overlap with their corresponding public signatures and showed better prognostic ability than their literature counterparts (Extended Data Fig. 5f,g). By further analyzing genes differentially expressed between IL-4- and DMSO-treated cells in our single-ligand perturbations (Extended Data Fig. 5h), we found that the top genes expressed in the IL-4 treatment group (MMP1, CD55 and DUOX2, also observed in our single-ligand PDAC IL-4 and IL-13 response GEP; Extended Data Fig. 4c) are known to be involved in, and enriched in a signature for, epithelial-to-mesenchymal transition (EMT), a process that promotes metastasis of cancer cells36,37,38, potentially explaining the lower overall survival we observed for patients highly expressing this GEP (Fig. 3g and Extended Data Fig. 5i,j). Collectively, these results suggest that the PDAC IL-4 and IL-13, IFNγ and TGFβ response GEPs discovered through our approach represent disease-specific responses with prognostic implications that would have been missed using existing databases. More broadly, our data highlight that disease- and context-specific transcriptional responses to TME signals may result in distinct functional consequences, and emphasize the importance of augmenting and carefully considering existing gene signature databases.

CS dissects systems-level drug responses in primary samples

Many therapies act in a pleiotropic manner across the diverse cell types in a tissue, yet most of our drug screening is carried out in homogenous, immortalized cell lines. Accordingly, we often have a limited understanding of how perturbing a target molecule and its associated signaling network will impact complex cellular systems, ultimately registering therapeutic benefit or toxicity. As one foray into this problem, we leveraged a CS approach to scale small-molecule screening in fresh human PBMCs, a complex multicellular system, with a high-content scRNA-seq readout.

PBMCs are composed of a heterogeneous mixture of T cells, natural killer (NK) cells, B cells and monocytes, among others. A key advantage of using freshly isolated PBMCs is their close representation of peripheral host physiology. Here, we simulated immune activation in this compartment with both LPS (a Toll-like receptor 4 agonist) and IFNβ (a key antiviral cytokine), canonical stimuli that can be sensed by multiple immune subsets through well-described pathways, and that have been used to explore immune responses to Gram-negative bacteria and viruses, respectively39,40. Critically, while immunomodulators are commonly used in the clinic41,42,43,44,45, it is challenging to screen the transcriptional impact of multiple compounds on fresh human PBMCs because of material, cost and labor constraints. Thus, we hypothesized that a CS could serve as a bridge in the ‘chain of translatability’, allowing a better understanding of how potent and specific molecules manifest desirable or deleterious systems-level effects.

To examine the power of a CS in this setting, we tested the effects of 90 chemical compounds with a wide range of known MOAs curated by the Broad Drug Repurposing Hub (https://github.com/jump-cellpainting/JUMP-MOA) on immune responses among PBMCs (Fig. 4a). To execute these CSs, we generated identical pool sets (R = 3 replicates per drug and P = 6 drugs per well) in three 96-well plates with DMSO controls, added fresh PBMCs, incubated overnight and then stimulated with IFNβ or LPS for 4 h the following day while leaving one plate unstimulated (Fig. 4b). Following an evaluation of cell viability, each well was hashed using oligo-tagged antibodies and scRNA-seq was performed. After sample demultiplexing and quality control, we retained a total of 20,062 control cells (mean cell count per well ± s.d., 418 ± 351), 25,617 in the IFNβ plate (524 ± 353) and 65,868 in the LPS plate (1,372 ± 805). These yields highly correlated with our cell viability data, and we observed that both stimulations maintained cell counts relative to the unstimulated condition (Wilcoxon test one-sided P value < 0.01; Fig. 4c and Supplementary Fig. 1a,b). Running deconvolution on cell viability data revealed a handful of compounds with significant effects (Supplementary Fig. 1c). To understand how each compound impacted responses to each stimulus, we merged cell types from the control and stimulus conditions, and then ran cNMF on those cell types individually to identify GEPs (Fig. 4c, Supplementary Fig. 1d–f, Supplementary Table 6 and Methods). We avoided running cNMF on all three conditions at once to prevent effects from one stimulation being masked by the other. This analysis produced two main categories of GEPs: (1) ‘stimulant–response modules’, which changed significantly in response to stimulation, and (2) ‘cell-identity modules’, which did not.

Fig. 4: CS of an MOA drug library identifies modulators of immune responses across human PBMCs.
figure 4

a, Overview of CS for examining the immunomodulatory effects of a MOA drug library on the responses of human PBMCs to stimulation with either LPS or IFNβ. b, Experimental overview. c, Left, overview of data processing and analysis workflow. Right, overview of the computational workflow used to analyze the CS data. d, Summary statistics of the number of affected stimulant–response GEPs and number of affected cell types for each molecule in each stimulation condition (LPS or IFNβ). Permutation test P-value cutoff = 0.01. e, Classification of the impact of a compound on cell-type-specific immune modulation. f, Example of a compound (δ-tocotrienol) with diverse effects within the same cell type and across different cell types. The usage scores for each GEP were normalized against the median of the DMSO + IFNβ condition. A Mann–Whitney U-test (two-sided, no correction) was performed to test score differences (Methods). The bottom, center and top lines of the box plots indicate the 25th percentile, median and 75th percentile, respectively. Whiskers are drawn to the farthest datapoints within 1.5× the interquartile range from the nearest hinge. Sample size by stimulation condition: DMSO, n = 396 (monocytes), n = 1,730 (CD4) and n = 404 (NK cells); IFNβ, n = 197 (monocytes), n = 1,276 (CD4) and n = 265 (NK cells); IFNβ + δ-tocotrienol, n = 41(monocytes), n = 631 (CD4) and n = 53 (NK cells). g, Ruxolitinib acts as an IFNβ interferer for CD4 T cell GEP3. A Mann–Whitney U-test (two-sided, no correction) was performed to test score differences (Methods). The bottom, center and top lines of the box plots indicate the 25th percentile, median and 75th percentile, respectively. Whiskers are drawn to the farthest datapoints within 1.5× the interquartile range from the nearest hinge. Sample sizes: DMSO, n = 1,730; IFNβ, n = 1,276; IFNβ + ruxolitinib, n = 1,356. h, Heat map showing the effects of all 90 compounds on PBMC responses to IFNβ (top) and LPS (bottom). Color bars represent the strength of the effect. Negative values denote a negative potentiator or interferer while positive values denote a positive potentiator or interferer. Permutation test P-value cutoff = 0.01. i, Low-dimensional embedding (UMAP) of single-compound validation experiment of ruxolitinib’s effect on LPS responses. j, Evaluation of CS LPS response module (CD8 T cell GEP3) in ruxolitinib-only perturbation. The bottom, center and top lines of the box plots indicate the 25th percentile, median and 75th percentile, respectively. Whiskers are drawn to the farthest datapoints within 1.5× the interquartile range from the nearest hinge. Sample sizes by condition: DMSO, n = 112 (B cells), n = 648 (CD4), n = 376 (CD8), n = 27 (dendritic cells (DCs)), n = 215 (monocytes), n = 188 (NK cells) and n = 27 (γδ T cells); LPS, n = 130 (B cells), n = 1,004 (CD4), n = 478 (CD8), n = 10 (DCs), n = 48 (monocytes), n = 264 (NK cells) and n = 41 (γδ T cells); LPS + ruxolitinib, n = 62 (B cells), n = 475 (CD4), n = 299 (CD8), n = 5 (DCs), n = 19 (monocytes), n = 152 (NK cells) and n = 28 (γδ T cells).

To determine the impact of each stimulant and drug and their potential synergies, we modified our deconvolution framework to account for possible interactions between the immune stimulant and each pooled drug. Specifically, we included a Boolean interaction term in our design matrix to denote whether the drug and stimulant coexisted in a well (Xinteraction) and modified our coefficient matrix β to include fits for all three features (that is, βstimulants, βdrugs and βinteractions). Overall, we observed that several cell type stimulant–response modules were influenced by the compounds in our MOA library, with some significantly influencing multiple cell types and/or response GEPs (Fig. 4d). Our modified deconvolution method enabled us to compare the signs of βinteractions and βdrugs against those in βstimulants for specific combinations to understand the nature of stimulant–drug interactions in each cell type (Supplementary Figs. 2 and 3). Specifically, we divided our drugs into three categories on the basis of the significance of their impact on stimulant–responses in each cell type: (1) potentiators, which strengthen the effect of stimulant-mediated downstream signaling; (2) interferers, which oppose the effect of the response to a given stimulant; and (3) noninteractors, which do not have an effect on stimulant responses (Fig. 4e, Extended Data Fig. 6 and Methods). For potentiators and interferers, we further annotated them as ‘positive’ or ‘negative’ on the basis of whether the stimulant increases or decreases the GEP (Methods). As in the previous examples, one of the key parameters for this computational workflow is the P-value threshold we used in the deconvolution permutation test; as the P-value cutoff becomes more stringent, fewer drugs are nominated to have effects in each cell type (Extended Data Fig. 7a).

Our IFNβ CS identified compounds with diverse effects on multiple GEP and cell type combinations. In the 90-molecule library, 87 molecules had an impact on IFNβ responses in at least one cell type, and 73 had an impact in multiple cell types (≥2). Notably, 20 molecules had statistically significant effects in four or five cell types, often with high pleiotropy. δ-Tocotrienol (vitamin E, an HMG-CoA reductase inhibitor), for example, has been described as a broadly anti-inflammatory agent; here, we observed variable effects on different immune cell subset GEPs, including opposing effects (potentiating and interfering) on different IFNβ response modules (GEP1 and GEP5) in monocytes, as well as interference with response GEPs in CD4 T cells (GEP3) and NK cells (GEP6) (Fig. 4f). Similarly, ponatinib, known to target multiple receptor tyrosine kinases46, elicited significant effects on IFNβ responses across multiple cell types, acting as a positive interferer of IFNβ GEP3 in B cells (Janus kinase (JAK)– signal transducer and activator of transcription (STAT) and TNF signaling), IFNβ GEP3 in CD4 T cells (JAK–STAT signaling) and IFNβ GEP1 in CD8 T cells (JAK–STAT and TNF signaling), but also as a negative interferer of IFNβ GEP8 in B cells (cell cycle and NF-κB) and IFNβ GEP3 in CD8 T cells (mitochondrial activity and adenosine triphosphate synthesis; Fig. 4f and Extended Data Fig. 7b–e). We note that, although three of the targeted IFNβ response GEPs were annotated as JAK–STAT signaling, ponatinib modulated both shared and unique genes across the cell types, once again highlighting context-specific signatures (Extended Data Fig. 7b, bottom).

Next, we examined drugs with targets known to modulate immune responses. One of the major effects of type I IFN stimulation is the activation of JAK–STAT pathways41,42,43,44,45,47. Ruxolitinib, a known JAK–STAT inhibitor48, acted as a positive interferer of a CD4 T cell GEP (IFNβ GEP3), which positively correlated with JAK–STAT signature expression, giving us confidence that our CS is detecting real biological signals (Fig. 4g and Extended Data Fig. 7d). Relatedly, our screen also nominated rapamycin, a mammalian target of rapamycin (mTOR) inhibitor49, as a negative interferer of a GEP in B cells (IFNβ GEP8), which negatively correlated with JAK–STAT and TNF signaling; it was also a positive interferer of another GEP in B cells (IFNβ GEP3) that positively correlated with JAK–STAT and TNF signaling (Extended Data Fig. 7c and Extended Data Fig. 8a). These results are in line with the known cooperativity between mTOR and JAK–STAT pathways, and the previously reported role of rapamycin in destabilizing TNF expression50,51. Taken together, we found rich response patterns across most of the chemical compounds in the library (Fig. 4h, Extended Data Figs. 7 and 8 and Supplementary Figs. 2 and 3), with monocyte GEPs impacted by the greatest number of molecules (n = 64), followed by B cells (n = 15) and CD4 T cells (n = 14) (permutation test P value < 0.01).

Relative to IFNβ, a similar analysis applied to LPS resulted in fewer stimulant–response modules at the same P-value threshold (Extended Data Fig. 9a); however, the impact of the JUMP MOA library on those modules was equally complex (Fig. 4h, Extended Data Fig. 6 and Supplementary Fig. 2). In total, 86 of 90 molecules had some effect on LPS responses in at least one cell type, 65 molecules had some effect in two or more cell types and 8 moelcules had some effect in four or five cell types. A representative compound with significant effects on LPS responses across multiple cell types is merimepodib, an inosine monophosphate dehydrogenase inhibitor that reduces intracellular guanine-based nucleotides, leading to the inhibition of DNA and RNA synthesis52. This broad-impacting MOA translated in our CS into pleiotropic effects across multiple cell types: Merimepodib acted as a positive interferer of B cell LPS GEP3 (TNF signaling and amine biosynthesis and metabolic processes) and CD8 T cell LPS GEP3 (JAK–STAT signaling), a positive potentiator of monocyte LPS GEP7 (translational and ribosomal activities) and a negative interferer of CD4 T cell LPS GEP1 (mitochondrial activities) (Extended Data Fig. 9b–f). BIX02188, a known mitogen-activated protein kinase (MAPK) kinase inhibitor, meanwhile, was found to be a positive interferer of LPS GEP4 in NK cells, a module highly correlated with TNF signaling and chemokine receptor (NK T cell pathways) expression (Extended Data Fig. 10a,b). Notably, it was reported that LPS-induced TNF production requires activation of the MAPK pathway53,54. In addition, PF 477736, a checkpoint kinase 1 inhibitor55, was one of the top positive potentiators of LPS GEP7 in monocytes, a module highly correlated with ribosomal events and translation elongation (Extended Data Figs. 9c and 10b). This is biologically consistent with the known role of PF 477736 in abrogating cell-cycle arrest, which in turn increases the rate of ribosomal and translational events. Interestingly and unexpectedly, we also found that AZD7545 and BX-912, two drugs with the same MOA annotations (pyruvate dehydrogenase kinase inhibitor), had divergent effects on the same LPS GEP7 in monocytes, a module highly correlated with RNA translation activity (Extended Data Fig. 9c and 10c). In the LPS stimulation screen, monocytes were again the cell type impacted by the largest number of molecules (n = 40), followed by CD8 T cells (n = 30) and NK cells (n = 11) (permutation test P value < 0.01).

Across the entire library, 53 compounds had effects on both LPS and IFNβ responses in multiple cell types. For example, ML298 hydrochloride, a selective phospholipase 2 inhibitor56, was a positive interferer of NK cell LPS GEP7 (KRAS downregulated signal), a positive potentiator of monocyte LPS GEP7 (translational and ribosomal activities) and a negative interferer of CD4 T cell LPS GEP1 (mitochondrial activities). Meanwhile, it was a positive interferer of B cell IFNβ GEP3 (JAK–STAT and TNF signaling) and CD8 T cell IFNβ GEP1 (JAK–STAT and TNF signaling), a negative interferer of B cell IFNβ GEP8 (cell cycle and NF-κB) and a negative potentiator of monocyte IFNβ GEP5 (transmembrane transport and generation of precursor metabolite and energy) (Extended Data Figs. 7c,e, 8b, 9c,f and 10a,d). Furthermore, as with IFNβ GEP3 in CD4 T cells, we found a similar interfering effect by ruxolitinib on LPS GEP3 in CD8 T cells treated with LPS, a module also highly correlated with JAK–STAT signaling (Extended Data Fig. 9e and Extended Data Fig. 10e). Taken together, the consistency of our findings for ruxolitinib in both the LPS and the IFNβ screens and the identification of several new stimulant–drug interactions lend credence to the power of CS in identifying phenotypic and functional effects even in a complex multicellular system.

To independently corroborate pleiotropy, we tested the effects of ruxolitinib on LPS stimulation conventionally. Briefly, cells were seeded in triplicate at the same density as in the CS, treated with ruxolitinib overnight and then acutely stimulated with LPS for 4 h, followed by scRNA-seq (Fig. 4i). These data replicated ruxolitinib’s ability to prevent LPS-induced activation of the JAK–STAT pathway in CD8 T cells (GEP3). We observed that ruxolitinib also inhibited the expression of genes associated with CD8 T cell GEP3 in CD4 T cells, γδ T cells, B cells and NK cells (Fig. 4j). In our LPS CS screen, ruxolitinib was also nominated as a positive interferer of LPS GEP3 in B cells, a module correlated with TNF signaling and amine biosynthesis and metabolic processes (Extended Data Fig. 9d). Our validation data confirmed this effect in B cells, and identified similarly significant interference effects in CD4 and CD8 T cells, NK cells and γδ T cells (Extended Data Fig. 10f). Intriguingly, ruxolitinib upregulated the expression of both these modules in monocytes (Fig. 4j and Extended Data Fig. 10f). We note that these opposite trends in monocytes were unexpected, were not technical in nature, and call for more detailed investigation.

Overall, our results in PBMCs show how different compounds can either potentiate or suppress LPS and IFNβ GEPs in a cell-type-specific manner, even producing divergent responses in different cell types. Here, relative to the PDAC CS, we were able to achieve a second layer of compression by simultaneously profiling the impact of multiple drugs on multiple cell types. More generally, these results help establish the power of the CS framework for revealing systems-level information from phenotypic screens while reducing sample material, reagent and labor costs (here, by a factor of 5.5, roughly corresponding to the size of our pools P, relative to a conventional screen; Supplementary Table 7).

Discussion

Scalability presents a major obstacle for phenotypic screening with cell-extrinsic perturbations such as small molecules or recombinant proteins, often precluding the use of complex model systems and high-content readouts. To address this challenge, we developed CS, an assay- and model-independent approach to increase throughput while reducing cost, labor and material requirements by pooling perturbations and computationally inferring the effects of each using deconvolution. This method empowers chemical and biological ligand perturbation screens, which can often be more easily applied to complex models and more readily advanced as therapeutic substrates. We benchmarked CS in U2OS cells with a high-content imaging readout, demonstrating that the top hits across a wide range of compressions corresponded to those perturbations with the largest effects when screened individually. Next, we applied the technology to identify the phenotypic effects of TME ligands in primary PDAC organoids, discovering multiple classes of ligands that drive conserved, clinically relevant transcriptional response patterns distinct from canonical signatures. Lastly, by performing a CS campaign in PBMCs, a heterogeneous primary sample, we uncovered the ability of small molecules to modulate both unique and diverse immune response programs within and across different cell types. Over a range of systems of varying complexity, our results demonstrate the broad applicability of CS for translating phenotypic screens to larger search spaces, even when paired with high-content readouts, including imaging and single-cell genomics.

When developing the CS approach in U2OS cells, we found that a linear regression-based deconvolution approach could accurately identify compounds with the greatest effects in a GT screen across a variety of compressions (pool sizes varying from 3 to 80 perturbations). This suggests that it is possible to combine perturbations and that nonlinear interactions do not confound interpretation until pool sizes grow large. This is in line with other work that used high-content imaging to investigate all pairwise interactions between bioactive small molecules, finding that only 5% of all possible interactions were nonlinear57. In the highly bioactive library we tested, the effects we inferred began to deviate from GT at compression levels above 40 perturbations per pool. This may be because of the fact that, as pool size increases, the number of possible pairwise interactions between perturbations scales quadratically (N choose 2) = \(\frac{1}{2}N(N-1)\), suggesting that, even if the fraction of interactions that are nonlinear remains small, the absolute number of nonlinear interactions will drive a large effect at a large enough pool size. While our findings suggest a rough upper bound for compression with highly bioactive perturbation libraries, we note that the limits of compression may increase with the richness of the assay readout used in a screen. In addition, much greater compression should be feasible in libraries with fewer bioactive perturbations (for example, design-of-synthesis libraries or discovery compound decks), enabling rapid evaluation of large compound sets in complex biological models. These findings also suggest a compelling regime in which to consider searches for combinatorial effects by conducting a CS to find significant individual perturbations and then either screening all pairwise combinations of these significant perturbations or performing a screen in conjunction with one of the factors being held constant to find synergistic pairs.

We further demonstrated that our compressed screening and deconvolution framework is tunable to the needs of the user. By altering the significance threshold for calling a hit, we show that it is possible to conduct more permissive screens (greater true hit identification along with more false positives) or less permissive ones (fewer true hits but with fewer false positives). This flexibility empowers users to adjust their screening strategy on the basis of the scope of planned follow-up validation experiments. While we conducted CSs in duplicate with distinct randomization to develop the technology, we observed reproducible top hits between replicates. This suggests that doing a single CS may be sufficient for new lead discovery, albeit with a slightly higher false positive rate than with replicate screens. Sequential screens can also be performed at decreasing compression levels to iteratively converge toward top hits in a resource-efficient manner.

Our results in PDAC organoids and PBMCs support the potential of CS as a strategy to uncover novel insights and drive translational discovery through phenotypic screening in complex biological models. While numerous gene signature databases exist, our work demonstrates that phenotypic response signatures can be disease- and context-specific, and that the identification of disease-relevant signatures may uncover important relationships to clinical outcomes. Illustratively, our PDAC GEPs stratify patients by outcome while matched canonical gene signatures do not. While further mechanistic work is needed to dissect the potential sources, roles and prognostic capabilities of signals such as IL-4 and IL-13 in PDAC, our data highlight the use of CS for uncovering model-specific perturbation responses to inform follow-up investigations.

Working with PBMCs, meanwhile, we simultaneously assayed the immunomodulatory properties of a 90-compound MOA library to reveal distinct effect fingerprints that could prove useful in shaping overall immune responses. By performing compression, we reduced the total experimental cost for the three screens (control, IFNβ and LPS) by 5.5-fold (using a pool size, P, of 6) while similarly reducing the manual labor and number of primary cells required. This allowed a single investigator to perform these screens starting from a single blood draw. Had this work been performed using a conventional approach, it would have required over 80 million cells as input and the processing of nine 96-well plates in parallel, with additional associated reagent costs. Critically, our ability to efficiently assay primary cells directly ex vivo portends a route to applying similar strategies to other complex multicellular specimens, such as biopsies. Such an application could both hasten the discovery of critical aspects of tissue biology and power more translatable therapeutic discovery and selection pipelines.

As currently designed, compression is intended to support ‘screening’. While our technology can identify the perturbations with the largest effects in a library, it cannot perfectly characterize the phenotypic shifts driven by all perturbations in a library. Thus, CSs should always be followed by a validation screen that examines the top hits individually. When developing our CSs, we only included one concentration for each perturbation; however, in many discovery-oriented applications, the best concentration to use may be unclear. As such, future iterations of this technology may benefit from exploring the feasibility of conducting CSs that include multiple doses of the same compound and deconvolution methods to infer both the effects of, and the dose responses to, individual compounds. Furthermore, because of the low frequency at which any two perturbations co-occur in our current design, our approach is not optimized to identify nonlinear effects within the CS. However, this limitation may not ultimately matter in clinical settings such as chemotherapy, where analysis suggests that the vast majority of combination therapies do not exploit additive or synergistic interactions between compounds but rather are successful combinations because of the heterogeneity of patient populations where each agent within the combination regimen is effective on distinct subsets of patients or cells within patients31.

In summary, compressed screening is a powerful strategy for enhancing the utility of phenotypic screens, high-content readouts and model systems with varying levels of complexity. We anticipate that compression will prove useful across a wide array of complex model systems and readouts where scalability is a major challenge. This new scalability will enable researchers to map the context-specific phenotypic effects of perturbations, facilitating interpretation of existing and to-be-collected datasets. It will also support the development of new therapeutics in complex systems that are highly relevant to human biology, helping connect in vitro discoveries with clinical implementations.

Methods

CS design principles

To design a CS, we use a set of user-specified design criteria to randomly assign perturbations to pools while satisfying the following: each pool contains a unique set of perturbations, each perturbation in the library (of size N perturbations) occurs in Rcs replicate pools and each pool is of average size P. The user specifies the library N, the compressed replicates Rcs, the conventional replicates Rconv and the desired fold compression C. Fold sample savings of the pools relative to a conventional screen with Sconv samples (\({S}_{{\rm{conv}}}=N\times {R}_{{\rm{conv}}}\) total samples) is C compression (\(P=\frac{{R}_{{\rm{cs}}}C}{{R}_{{\rm{conv}}}}\)). In each CS, we included a set of negative controls (no perturbation added) and a set of landmark positive controls (treated with individual perturbations known to have large effects). To build the plates for the CS, we transferred perturbations from the source library plates to the pool plates using an Echo 650 Acoustic liquid handler.

Cell Painting experiments

For our Cell Painting experiments, we plated U20S cells in 96-well black, clear-bottom, TC-tread plates (Corning, 3712BC) at 2,000 cells per well in 50 µl per well of DMEM (American Type Culture Collection (ATCC)) + 10% FBS + penicillin–streptomycin. Cells were then incubated overnight at 37 °C in 5% CO2 to allow cell adhesion and equilibrium. For conventional screening, compounds (316 FDA-approved small molecules; Supplementary Table 1) were arrayed in 384-well format as stocks that were 1,000× the final concentration in DMSO. We then pin-transferred (V&P Scientific, mounted onto an MCA96 head of Tecan Freedom Evo 150) 50 nl of compound into the 50 µl of medium per well of the assay plate. Cells were then incubated for 6, 24 or 48 h at 37 °C in 5% CO2. For compressed screening with the same compound library, we designed screens with the following compression schemes: C = 6, Rcs = 3; C = 6, Rcs = 5; C = 6, Rcs = 7; C = 12, Rcs = 3; C = 12, Rcs = 5; C = 12, Rcs = 7; C = 24, Rcs = 3; C = 24, Rcs = 5; C = 24, Rcs = 7; C = 48, Rcs = 5; C = 96, Rcs = 5. Respectively, these designs contained 3, 5, 7, 6, 10, 14, 12, 20, 28, 40 and 80 drugs per pool. Furthermore, for each compression scheme, we designed two screens with differing random assignment of drugs to pool. Each plate design was then dispensed using an Echo 650 Acoustic liquid handler into 50 µl of medium in the 384-well format, where the DMSO concentration in negative controls was held constant relative to the total compound dispensed for that assay plate. Assay plates were incubated for 24 h before the Cell Painting assay.

Following compound treatment, cells were stained for the mitochondria (500 nM MitoTracker Deep Red (Thermo Fisher, M22425) in DMSO; 7.5 µl of stock per assay plate) and for the Golgi apparatus and plasma membrane (60 µg ml−1 wheat germ agglutinin–AlexaFluor 594 (Thermo Fisher W11262) in water; 900 µl of stock per assay plate) in 30 µl of medium per well and incubated at 37 °C for 30 min. We then fixed the cells by adding 10 µl of 16% methanol-free PFA (Thermo Fisher, 28908) per well and incubated at room temperature for 20 min. Next, we aspirated the solution, washed cells with 70 µl per well of 1× Hanks’ balanced salt solution (HBSS; Invitrogen, 14065-56) and permeabilized cells with 30 µl per well of 0.1% Triton X-100 in 1× HBSS at room temperature for 20 min. We then again washed cells with 70 µl per well of 1× HBSS and stained for the endoplasmic reticulum, nucleolus, F-actin, nucleus and cellular RNA by adding to each plate 1.5 ml of 100 µg ml−1 concanavalin A–AlexaFluor 488 (Thermo Fisher, C11252) in 0.1 M sodium bicarbonate, 37.5 µl of 200 U per ml of phalloidin–AlexaFluor 594 (Thermo Fisher, A12381) in methanol, 9 µl of SYTO14 green fluorescent nucleic acid stain (Thermo Fisher, S7576), 7.5 µl of 5 µg ml−1 Hoechst 33342 (Thermo Fisher, H3570) in water, 15 ml of 3 µM SYTO14 (Thermo Fisher, S7576) and 15 ml of 1× HBSS in 1% BSA. Cells were then incubated at room temperature for 30 min. We then aspirated the solution and washed cells three times with 70 µl of 1× HBSS, and then filled the wells with 70 µl of HBSS, sealed the plates and stored plates at 4 °C until ready for imaging. Cells were imaged on an ArrayScan VTI with a ×20 objective, using the Hoescht channel to autofocus, and we set the Z and autoexposure for each channel. Nine fields were captured in the middle of each well.

PDAC organoid culture

Pancreatic cancer organoids were derived as previously described from persons with metastatic pancreatic cancer who provided written, informed consent at the Dana-Farber Cancer Institute for tissue acquisition and research studies12. Organoids were seeded in growth factor reduced Matrigel (Corning), fed with human complete organoid medium containing advanced DMEM/F12 (Gibco), 10 mM HEPES (Gibco), 1× GlutaMAX (Gibco), 500 nM A83-01 (Tocris), 50 ng ml−1 mouse epidermal growth factor (Peprotech), 100 ng ml−1 mNoggin (Peprotech), 100 ng ml−1 human fibroblast growth factor 10 (Peprotech), 10 nM human Gastrin I (Sigma), 1.25 mM N-acetylcysteine (Sigma), 10 mM nicotinamide (Sigma), 1× B27 supplement (Gibco), RSPO1-conditioned medium (10% final), Wnt3A-conditioned medium (50% final), 100 U per ml of penicillin–streptomycin (Gibco) and 1× Primocin (Invivogen) and maintained at 37 °C in 5% CO2 (Supplementary Table 8). The medium was changed every 6–7 days. Established organoid models were passaged by dissociation with TrypLE Express (Thermo Fisher) for 30 min and reseeded into Matrigel droplets and fresh culture medium.

Ligand library selection

To generate a library of relevant, effective ligands, we compiled multiple ligand–receptor datasets19,32,33. We first prioritized ligands expressed by myeloid cells targeting structural cell types in a variety of organs or ligands expressed by human monocyte-derived macrophages or human macrophages in vivo. These 200+ ligands were then filtered on the basis of their overall expression and applicability in vitro as a secreted, independent effector molecule. Additional nongenetically encoded molecules were added on the basis of well-established effects, such as LPS-EK, leukotriene B4, oxidized low-density lipoprotein and Pam3CSK4. Manual curation and extensive literature research established the final library and concentrations on the basis of previously reported concentrations and results. All tested ligands are listed in Supplementary Table 1.

Compressed ligand screening

To perform compressed ligand screening on PDAC organoids, cells were expanded as described, resuspended in OWRNA medium with 10% (v/v) Matrigel and seeded at 20,000 single cells in 35 µl per well into a flat-bottom ultralow attachment 96-well assay plate (Corning, 3474) 1 day before dispensing ligands. A CS was designed by randomly assigning 68 ligands (Supplementary Table 1) to 72 pools such that each pool contained four or five ligands (average of 4.75 ligands per pool). Two such plates were designed, with distinct random assignment of ligands to pools in each plate. Ligands were first dispensed into 25 µl per well of OWRNA medium in a 384-well format transfer plate (in a quadrant-wise manner). After ligand dispense, each well was backfilled with an additional 50 µl per well of OWRNA medium. Next, 65 µl per well was transferred with a Tecan Freedom Evo 150 from each transfer plate quadrant into the 96-well assay plate with PDAC organoids, making a final total volume of 100 µl per well in the assay plate. Ligand concentration was modified to account for the multistage transfer such that final assay plate concentrations were as reported (Supplemental Table 1). Assay plates were cultured for 7 days at 37 °C in 5% CO2 and processed for scRNA-seq with cell hashing as described below.

Single-ligand screening

A total of 11 candidate ligands were tested independently for their effects on cell state: 50 ng ml−1 recombinant human (rHu) Wnt7A, 500 ng ml−1 rHu RSPO3, 10 ng ml−1 rHu TNF, 10 ng ml−1 rHu IL-13, 10 ng ml−1 rHu IL-4, 10 ng ml−1 rHu IFNγ, 10 ng ml−1 rHu IL-1A, 10 ng ml−1 rHu IL-1B, 10 ng ml−1 rHu TGFβ1, 25 ng ml−1 rHu ADIPOQ and 10 ng ml−1 rHu activin A (Peprotech). The same batch of organoids used in the original screen (PANFR0562) was used for the validation experiment; the organoids were all cultured and maintained in ‘OWRNA’ medium. Additional details on culture conditions and processing can be found in Supplementary Information.

PDAC scRNA-seq with cell hashing

PDAC organoids for CSs and validation experiments were dissociated to single cells, hashed with antibody-derived oligo tags, pooled and captured for scRNA-seq by Honeycomb Bio HIVEs. Full details can be found in Supplementary Information.

Compressed 90-drug MOA screen

PBMCs were isolated from whole blood by density centrifugation with Ficoll-Paque PLUS (Cytiva, 17144002), followed by red blood cell (RBC) lysis with 1× RBC lysis solution (Invitrogen, 50-112-9751). PBMCs were resuspended and cultured in ATCC-modified RPMI 1640 medium (Thermo Fisher, A1049101) supplemented with 10% FBS. For viability screens and scRNA-seq experiments, fresh PBMCs were seeded at a density of 3 × 105 cells in 200 μl per well in either a canonical v-bottom 96-well plate (Caplugss Evergreen, 222-8031-1V)—to be processed for sequencing—or a white, flat-bottom 96-well plate (Perkin Elmer, 6005680)—to be processed for cell viability—preloaded with drugs to be screened using the Echo 555 Liquid Handler (Beckman Coulter, E555-1115). Briefly, the Echo uses acoustic droplet ejection technology, whereby sound waves eject precisely sized droplets from the compound source plate onto the destination plate suspended above. A CS was designed by randomly assigning 90 drugs (https://github.com/jump-cellpainting/JUMP-MOA) to 45 pools (along with three DMSO-only wells for a total of 48 wells) such that each pool contained six drugs, each at a volume of 0.1 μl with a final concentration of 3 μM for each drug after the addition of 200 μl of PBMCs. Two such plates were designed, with distinct random assignment of ligands to pools in each plate. Cells were then allowed to rest for 16 h at 37 °C in 5% CO2 before acute stimulation with 5 μg ml−1 LPS (Thermo Fisher, 00-4976-93), 10 ng ml−1 IFNβ (Peprotech, 300-02BC-5UG) or no-stimulant control for 4 h. A total of 300,000 cells were sufficient to test viability; however, to ensure that we obtained an adequate number of cells for sequencing, we had duplicate plates of each condition, giving us a total of 600,000 cells per condition.

To evaluate viability, CellTiter-Glo solution was added to cells after stimulation as per the manufacturer’s instructions (Promega, G7573) and luminescence was read using the Envision plate reader (Perkin Elmer, 106406).

Ruxolitnib validation experiment

The effects of ruxolitnib on PBMC responses to LPS were individually validated. Additional details can be found in Supplementary Information.

PBMC scRNA-seq with cell hashing

For the CS and ruxolitinib experiments, cell hashing on samples was performed with antibody tagging using BioLegend’s TotalSeq-B with the 10X feature barcoding technology protocol. Additional details can be found in Supplementary Information.

Quantification and statistical analysis

Deconvolution of CSs with regularized linear regression

To deconvolute the effects of individual perturbations in a CS, we adopted a regularized linear regression approach that was previously used to infer the effects of guide RNAs on gene expression in pooled CRISPR screens6. This general framework for deconvolution requires a design matrix X (pools by perturbations) and an assay readout matrix Y (assay readout by pools), which we fit with a linear model using elastic net regression to infer the coefficient matrix β describing the association between perturbations and assay features. The ElasticNet approach is already a stringent method for variable selection (removing insignificant coefficients) and shrinkage (penalized L1 norm), helping to address correlated covariates and noisy data. We note that, for these reasons, this approach has been used in similar settings by others6. We fit the regression and used cross-validation to determine the optimal L1 ratio using the MultiTaskElasticNetCV function in sklearn (1.4.2) with possible L1 ratios varying from 0.01 to 1. We then permuted the perturbations labels and reran the elastic net regression 1,000 times to obtain a null distribution for the perturbation-feature coefficients. We then filtered out any coefficients in our nonpermuted regression that occurred more than the P-value threshold 1,000 times in the permuted data.

Cell Painting data preprocessing

We used the Distributed Cell Profiler software on AWS to process the Cell Painting images (https://github.com/CellProfiler/Distributed-CellProfiler), followed by a custom pipeline for feature selection and effect quantification. Detailed steps can be found in Supplementary Information.

Identifying optimum doses and time points for Cell Painting screens

We determined the dose–time point combination with the broadest range of effects with an approach based on MD. Additional details can be found in Supplementary Information.

Evaluation of cell viability in Cell Painting GT screen

We used the count of nuclei per well as a proxy for understanding the impact of the compounds on viability. Additional details can be found in Supplementary Information.

Identification of cellular phenotypes in GT Cell Painting data

To identify the biological cellular morphological phenotypes in the GT dataset, we first needed to account for substantial batch effects that existed between the first two replicates in our GT dataset (obtained during the dose–time point analysis) and the final four replicates that we collected later. These batch effects were consistent with large batch effects observed in previous studies using Cell Painting34. To account for this technical variation, we applied Harmony (a batch correction technique that uses soft k-means clustering to identify dataset-specific batch effects) to our GT datasets. After running Harmony, we identified the number of Harmony principal components (PCs) that explained 90% of variance in the data and used these top PCs as the features for identifying the cellular phenotypes in the GT data.

To identify cellular phenotypes, we followed the following workflow for clustering the GT Cell Painting samples. First, we computed the neighborhood graph for all GT samples over the Harmonized features, using the pp.neighbors function in scanpy (1.9.8) and a nearest neighbor value of \(\frac{1}{2}\sqrt{{\rm{number}}\; {\rm{of}}\; {\rm{samples}}}\). We then clustered on this neighborhood graph using the Leiden algorithm implemented in the tl.leiden function in scanpy. To determine clustering resolution, we reran clustering over resolution values from 0.01 to 2.0 and identified the resolution value (0.65) where greater than half (~52%) of DMSO samples resided in a single cluster, each cluster had significant perturbations and few perturbations (<0.5%) were significant in more than one cluster (based on phenotypic fingerprinting described below). This identified eight clusters (GT phenotypes) in the GT Cell Painting data. For each cluster, we then used the tl.rank_gene_groups function in scanpy to run a Wilcoxon test over the processed and feature selection Cell Painting features to identify features enriched in each cluster.

To determine perturbations with a significant effect, we used an approach for identifying the ‘phenotypic fingerprints’ of the perturbations16. For each perturbation, we counted the number of samples in each of the eight GT phenotypic clusters, then randomly permuted the perturbation labels 10,000 times and recalculated the samples in each cluster. Using these randomly permuted labels as a null distribution, we calculated the z scores and P values for the true counts of samples in each cluster. We filtered these results to perturbations with significant enrichment values (maximum z score > 2 and maximum P value < 0.01). These perturbations were then named ‘fingerprinting drugs’ for the enriched GT phenotypic clusters.

Compressed Cell Painting screen design, deconvolution and comparison to GT

To test the limitations of compressed screening, we conducted Cell Painting screens using a wide array of two parameters: replicates R (the number of pools in which each individual drug occurs) and compression C (the fold sample savings in the CS relative to our six replicate GT data). We tested R values of 3, 5 and 7 and C values of 6, 12, 24, 48 and 96. This resulted in 11 different parameter schemes consisting of 3, 5, 6, 7, 10, 12, 14, 20, 28, 40 and 80 drugs per pool. For each parameter scheme, we conducted two screens that were designed with distinct pool randomization, which we refer to as CS run 1 and CS run 2.

To deconvolute the compressed screening data, we passed the preprocessed and feature-selected Cell Painting features in our general regression framework for deconvolution. To assess the total magnitude of the inferred effect of each perturbation, we then calculated the L1 norm of the regression coefficients across all features for each perturbation. We then calculated the Pearson correlation in R (4.3) using function ‘cor.test’ between these values across all compounds and the GT MD value to assess how well the inferred compressed effects agreed with the effects of perturbations in the GT data.

CS scRNA-seq alignment, hash demultiplexing and initial quality control

For both the PDAC screen and the PBCM screen sequencing runs, raw sequencing reads were quantified and quality-controlled using standard pipelines and publicly available software. Full details can be found in Supplementary Information.

PDAC CS scRNA-seq analysis

We ran cNMF on the combined counts matrices from both compressed runs21. The choice of cNMF was because of the fact that it effectively reduces the number of target variables in our system of equations for deconvolution, which helps with finding reliable solutions, and it works well to find coordinated core transcriptional programs on noisy biological data as would be expected for our PDAC CS. We choose an optimal K value (number of cNMF GEPs) by combining a data-driven approach with existing knowledge of PDAC biology. We tested K values ranging from 5 to 50 and then identified candidate optimal K values by plotting the tradeoff between stability and error and focusing on the K values at local stability maxima. We then examined the GEPs at each of these K values and found that, at higher K values, a distinct TGFβ (a known major driver of PDAC biology and one of the landmark positive controls in the screen) GEP was present, whereas, at lower values, the TGFβ signaling GEP was collapsed into other GEPs. Thus, we chose the smallest of the candidate K values at which we observed a clear TGFβ signaling GEP.

We proceeded to analyze the combined scRNA-seq dataset in scanpy (1.9.8) using the highly variable genes output from cNMF and Pearson residual normalization. We visualized the full dataset by running PC analysis with the number of PCs chosen by running KneeLocator from package kneed (0.8.5) in Python to find the knee of the variance explained by the PCs and then running Harmony to integrate the data by batch for visualization purposes only, before visualizing the data in a uniform manifold approximation and projection (UMAP).

The usage of the cNMF GEPs was normalized within each cell. We next correlated the usage of the cNMF GEPs across cells and found that a subset described basal expression states that were broadly expressed across all cells, whereas the remaining were variably expressed across cells. We focused our downstream analysis on these variably expressed cNMF programs (Extended Data Fig. 3c). To annotate these GEPs, we first aimed to match them with known gene signatures. We defined genes significantly upregulated in the TNF, IFNγ and TGFβ1 landmark samples compared to the control samples (using function ‘rank_genes_groups’ by scanpy and then running KneeLocator to find top genes) and calculated module scores for these landmark differential expression (DE) genesets using ‘score_genes’ from scanpy. Then, we used decoupler and the ‘score_genes’ function in scanpy to generate module scores for all PROGENy and MsigDB genesets (using functions ‘run_consensus’ and ‘run_ora’ from decoupler, respectively) for the top 25 genes in the basal and classical signatures from Moffitt et al.26 and for gene signatures for cycling cells and for scRNA-seq quality metrics (number of unique molecular identifiers, number of features, percentage of mitochondrial reads and percentage of ribosomal reads)35. We also used the ‘score_genes’ function in scanpy to generate module scores for the cNMF GEPs for fair comparison. We then correlated all of these module scores with the GEP module scores across all cells and where GEPs highly correlated with a given module; we used this information to manually annotate the GEPs (Extended Data Fig. 3d). With this approach, we were able to annotate all but one highly variable GEP.

Then, we passed the GEP usage scores (output from the cNMF workflow; Extended Data Fig. 3g) for each highly variable cNMF GEP across cells in the compressed wells into our deconvolution code to infer the effect of perturbations on each GEP. We used these results to annotate the one GEP that we could not annotate on the basis of gene signatures (the PDAC IL-4 and IL-13 module).

For ligand–receptor analysis, we took the gene names for the 68 ligands in the CS and searched against the CellphoneDB database using the command line call ‘cellphonedb’ query ‘find_interactions_by_element’. In situations where the identified interacting partner for a ligand was a receptor complex, the complex was expanded to the individual components. The expression of cognate receptors was evaluated in PDAC organoid data generated from the CS.

PDAC single-ligand validation screen scRNA-seq analysis

We adopted a similar approach as in the CS to run cNMF, process scRNA-seq data and identify and annotate highly variable GEPs in the single-ligand data. More details can be found in Supplementary Information.

Evaluation of cNMF GEPs on existing PDAC tumor datasets

To project the validation of cNMF GEPs onto existing PDAC tumor datasets from TCGA and Raghavan et al., we first generated a representative geneset for each GEP by running KneeLocator in Python on the sorted gene spectra scores to identify the top genes contributing to each GEP. We then generated a module score for each of these genesets, the top 25 classical and basal genes from Moffitt et al.26 and IL-4 and IL-13 signaling genesets from MsigDB in both PDAC tumor datasets (for each tumor in TCGA and each cell in Raghavan et al.). We then correlated the GEP module scores and MsigDB module scores with the Moffitt et al. module scores. To look for further evidence of IL-4 and IL-13 signaling associated with classical tumors in Raghavan et al., we performed a targeted analysis of the genes differentially expressed in SPP1+ macrophages, as this cell population was in significantly higher abundance in the TME of classical tumors in Raghavan et al.

Differential analysis in IL-4-treated cells

Density estimations of IL-4-treated versus control cells were calculated using the ‘embedding_density’ function in scanpy on the precomputed UMAP of all single-ligand perturbation data. DE was examined using the ‘rank_genes_groups’ function with control cells as the reference group and IL-4-treated cells as the treatment group. Using the Wilcoxon test, the top 25 DE genes were ranked by the z score underlying the computation of P value for each gene. Next, IL-4 DE genes were filtered for adjusted P value < 0.01 and z score > 0 and then ranked by score. All significant DE genes after filtering were used in overrepresentation analysis on MsigDB Hallmark genesets using the ‘get_ora_df’ function by decoupler.

Survival analysis on IL-4 modules

To evaluate the ability of our PDAC IL-4 and IL-13 response module derived from the single-ligand PDAC screen, along with other existing IL-4 modules from Reactome, BioCarta and Lu et al., we took TCGA bulk RNA-seq data on PDAC from the Genomic Data Commons TCGA pancreatic adenocarcinoma (PAAD) cohort (n = 182) along with paired survival data. In total, 181 of 182 tumors had matched overall survival information. The log2(fragments per kilobase of transcript per million mapped reads + 1)-normalized data were used to score for the PDAC IL-4 and IL-13 response module geneset selected by KneeLocator as mentioned above and the other genesets. All modules scored were then scaled across all samples and we assigned high versus low expression by taking the top and bottom tertile for each module. Kaplan–Meier curves were fitted using KaplanMeierFitter and P values were calculated using ‘logrank_test’ from package lifelines (0.27.7).

Cell viability calculation for PBMC CSs

Cell viability from drug and stimulation treatments was read out through the CellTiter-Glo viability assay. For a 96-well plate, we took the mean of each row i’s empty wells (columns 7–12) as the background measurement (\({V}_{{\rm{empty}}}(i)\)) for each row’s readout and subtracted this value from each nonempty well: \({V}_{{\rm{adj}}}(i,{j})=V(i,j)-{V}_{{\rm{empty}}}(i)\).

In each treatment plate, we then calculated the mean viability of DMSO-only wells after subtracting the background signal (\({V}_{\rm{adj,DMSO}}\)). All nonempty wells were normalized against that value: \({V}_{\rm{norm}}(i,j)={V}_{\rm{adj}}(i,j)/{{V}_{\rm{adj,DMSO}}}\).These values were then compared against each other in mean pool values for each compound and run through linear deconvolution to check significant effects by compounds.

PBMC screen cell type annotation

Cell type annotation was first performed on the control plate using the Azimuth label transfer method from the pretrained PBMC reference model58 with level 1 and 2 predicted cell types (https://github.com/satijalab/azimuth) and the results were inspected by checking top cluster markers through unbiased Louvain clustering. We further integrated the cells from the two stimulation condition plates with the control plate and ran unbiased clustering on the entire dataset. We annotated the IFNβ and LPS plates by inspecting the annotation of control cells in each cluster and independent Azimuth cell type prediction results on these plates.

PBMC drug effect measurement

We split the combined two-plate datasets by annotated cell types, excluding platelet cells. cNMF was run on the raw counts of filtered cells of each cell type from the LPS plate + control plate and IFNβ plate + control plate for a direct comparison of stimulant effects, with similar methods to those used in the PDAC screen. We first identified significant stimulant effects in the cell type by evaluating the normalized GEP usage scores between DMSO cells in the stimulant plate and control plate. We filtered out GEPs where the usage scores were significantly different and had a large effect size between the stimulant and the controls (P value < 0.05, absolute log2 fold change (LFC) > 0.75) to be stimulant effect modules. We further separated these effect modules into stimulant-upregulated (LFC > 0) and stimulant-downregulated (LFC < 0) modules.

Similar to the PDAC analysis, we next passed the GEP usage scores across cells in the compressed wells from the cNMF runs into our deconvolution code to infer the effect of perturbations on each GEP in each cell type. We made a modification to the design matrix with additional interaction terms between the stimulant and the drug (for example, GEP 1 ≈ LPS + drug A + LPS × drug A). This allowed us to evaluate whether there was any interaction between the drugs and the stimulant. For visualizing the deconvolution results across cell types in LPS and IFNβ stimulations (Supplementary Figs. 2 and 3), we showed all regression coefficients on compounds and interaction terms after masking out insignificant coefficients (P value < 0.01), which we replaced with 0.

To determine consensus stimulant–response modules, we used a permutation test P-value threshold of 0.05 to filter out significant regression coefficients after deconvolution. We then evaluated the previously selected stimulant effect modules to see whether they still had a significant effect (that is, coefficient > 0 for upregulated modules and coefficient < 0 for downregulated modules) and retrieved the consensus result for downstream analysis.

To classify and evaluate the effect of each drug on a stimulant, we defined our own metrics from the deconvolution results. More details can be found in Supplementary Information.

Inclusion and ethics

We worked to ensure diversity in experimental samples through the selection of the biological models. We worked to ensure diversity in experimental samples through the selection of the genomic datasets. One or more of the authors of this paper self-identify as underrepresented ethnic minorities in science. The author list of this paper includes contributors from the ___location where the research was conducted who participated in the data collection, design, analysis and/or interpretation of the work.

Ethical approval statement

Patients at the Dana-Farber Cancer Institute gave their consent to Dana-Farber and Harvard Cancer Center Institutional Review Board (IRB)-approved protocols. PBMC samples were derived from MGB under IRB-approved protocols.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.