Scalable, compressed phenotypic screening using pooled perturbations

Liu, Nuo; Kattan, Walaa E.; Mead, Benjamin E.; Kummerlowe, Conner; Cheng, Thomas; Ingabire, Sarah; Cheah, Jaime H.; Soule, Christian K.; Vrcic, Anita; McIninch, Jane K.; Triana, Sergio; Guzman, Manuel; Dao, Tyler T.; Peters, Joshua M.; Lowder, Kristen E.; Crawford, Lorin; Amini, Ava P.; Blainey, Paul C.; Hahn, William C.; Cleary, Brian; Bryson, Bryan; Winter, Peter S.; Raghavan, Srivatsan; Shalek, Alex K.

doi:10.1038/s41587-024-02403-z

Download PDF

Article
Open access
Published: 07 October 2024

Scalable, compressed phenotypic screening using pooled perturbations

Nuo Liu^1,2,3,4,5^na1,
Walaa E. Kattan ORCID: orcid.org/0000-0002-0599-9744^1,2,3,4^na1,
Benjamin E. Mead ORCID: orcid.org/0000-0001-7258-7777^1,2,3,4^na1,
Conner Kummerlowe^1,2,3,4,5^na1,
Thomas Cheng^1,2,3,4,
Sarah Ingabire^1,2,3,4,
Jaime H. Cheah^2,6,
Christian K. Soule^2,6,
Anita Vrcic⁶,
Jane K. McIninch⁶,
Sergio Triana^1,2,3,4,
Manuel Guzman^1,2,3,4,
Tyler T. Dao^1,2,3,4,7,
Joshua M. Peters^4,7,
Kristen E. Lowder^3,8,
Lorin Crawford^9,10,11,
Ava P. Amini ORCID: orcid.org/0000-0002-8601-6040⁹,
Paul C. Blainey ORCID: orcid.org/0000-0002-4889-8783^2,3,7,
William C. Hahn ORCID: orcid.org/0000-0003-2840-9791^3,8,12,
Brian Cleary ORCID: orcid.org/0000-0003-0825-7129^13,14,15,
Bryan Bryson ORCID: orcid.org/0000-0003-1716-6712^4,7,
Peter S. Winter³^na2,
Srivatsan Raghavan ORCID: orcid.org/0000-0002-5374-9918^3,8,12^na2 &
…
Alex K. Shalek ORCID: orcid.org/0000-0001-5670-8778^{1,2,3,4,5,16,17}^na2

Nature Biotechnology (2024)Cite this article

21k Accesses
5 Citations
45 Altmetric
Metrics details

Subjects

Abstract

High-throughput phenotypic screens using biochemical perturbations and high-content readouts are constrained by limitations of scale. To address this, we establish a method of pooling exogenous perturbations followed by computational deconvolution to reduce required sample size, labor and cost. We demonstrate the increased efficiency of compressed experimental designs compared to conventional approaches through benchmarking with a bioactive small-molecule library and a high-content imaging readout. We then apply compressed screening in two biological discovery campaigns. In the first, we use early-passage pancreatic cancer organoids to map transcriptional responses to a library of recombinant tumor microenvironment protein ligands, uncovering reproducible phenotypic shifts induced by specific ligands distinct from canonical reference signatures and correlated with clinical outcome. In the second, we identify the pleotropic modulatory effects of a chemical compound library with known mechanisms of action on primary human peripheral blood mononuclear cell immune responses. In sum, our approach empowers phenotypic screens with information-rich readouts to advance drug discovery efforts and basic biological inquiry.

Machine learning phenomics (MLP) combining deep learning with time-lapse-microscopy for monitoring colorectal adenocarcinoma cells gene expression and drug-response

Article Open access 20 May 2022

Testing the predictive power of reverse screening to infer drug targets, with the help of machine learning

Article Open access 09 May 2024

Drug screening at single-organoid resolution via bioprinting and interferometry

Article Open access 06 June 2023

Main

Phenotypic screening is a powerful approach for identifying clinically relevant treatments^1,2,3,4,5. To date, successful efforts have used basic readouts (for example, viability) in simple models of infectious or monogenic diseases where the underlying mechanisms are easy to recapitulate faithfully⁴. However, most human diseases arise in complex multigenic and multicellular settings that are not well captured by simple assays and model systems. While recent advances in phenotypic screening can now directly link high-content readouts to genetic perturbation libraries (for example, Perturb-seq⁶ and compressed Perturb-seq⁷), these methods still have several drawbacks. For example, genetic approaches remain difficult to use in some models and do not mirror the pleiotropic manner in which compounds and intercellular signals act in vivo. Furthermore, most genetic hits require follow-up pharmacologic development, disrupting the chain of translatability^4,7. As a result, there remains a wide array of human maladies for which phenotypic drug discovery cannot be applied effectively. To accelerate drug discovery efforts, we require improved phenotypic screens that have three properties. First, they should leverage in vitro or ex vivo cellular models that maintain high-fidelity to in vivo disease contexts such as cell type, epigenetic state or tissue of residency. Second, they should use high-content readouts, such as single-cell genomics or high-content imaging, that can comprehensively capture the complex cellular phenotypes associated with a disease state. Third, they should prioritize biochemical perturbation libraries built with small molecules and/or protein ligands that, unlike genetic perturbations, can be directly brought forward as therapeutic substrates.

At present, it is challenging to conduct such high-content phenotypic biochemical perturbation screens because of two limitations of scale. First, high-content readouts, such as single-cell RNA sequencing (scRNA-seq), are orders of magnitude more expensive than simple functional assays such as cell viability, growth or secretion. While methods exist for increasing efficiency (for example, compressed sensing⁸ and antibody- or lipid-based sample multiplexing⁹, they can still be costly to implement on a per-compound basis and/or have fundamental throughput limitations. Second, higher-fidelity models, such as those derived from clinical specimens, are more challenging to generate at scale than less physiologically representative systems, like cell lines. In the most representative cases that use tissue explants or specific model organisms, biomass limitations practically restrict the number of perturbations that can be tested; in expandable models of intermediate complexity, expansion can be constrained by phenotypic changes over time, as evidenced by epigenetic drift in intestinal organoids¹⁰, clonal drift in glioblastoma organoids¹¹ and altered RNA and genetic states in pancreatic cancer organoids¹². As one approach to enhance scale, past studies have pooled perturbations together.¹³ While this fundamental approach could improve throughput and reduce assay costs per compound, previous efforts have been limited to simple readouts and libraries of mostly inactive small molecules, leaving unanswered questions about the use of such approaches and how to effectively implement them.

Here, we introduce a generalizable, scalable and efficient method for compressing phenotypic screens with high-content readouts. Our approach pools perturbations to reduce sample input, cost and labor requirements and infers single perturbation effects with a regression-based framework to help prioritize follow-up investigation (Fig. 1). Unlike for cell-intrinsic perturbations, such as pooled clustered regularly interspaced short palindromic repeats (CRISPR) screens, where one can simultaneously probe multiple features in a single sample in parallel with tagging strategies that link each specific perturbation to an individual cell, there is no analogous method for pooling cell-extrinsic factors, such as small-molecule drugs. To establish the feasibility and limits of our approach, we conducted a series of phenotypic screens—both conventional and compressed—using a bioactive small-molecule library paired with a high-content imaging readout. Across a wide range of pool sizes, we consistently identified the compounds with the largest ground-truth (GT) effects as hits in our compressed screens (CSs), thoroughly vetting the robustness of the approach. To demonstrate the power of our method and its generalizability to other models and readouts, we applied it in two biological discovery settings. In the first, we examined the impact of tumor microenvironment (TME)-relevant recombinant protein ligands on early-passage patient-derived pancreatic ductal adenocarcinoma (PDAC) organoids with an scRNA-seq readout, as a subset of these factors had been shown previously to alter PDAC cell states and therapeutic responses¹². We found that almost all our top compressed hits drove conserved transcriptional responses when screened individually. Moreover, we observed that transcriptional responses to specific cytokines were distinct from those described in canonical reference databases and uniquely associate with clinical outcomes in a separate PDAC cohort. In the second, we generated a systems-level map of drug effects in diverse cell types by measuring the immunomodulatory impact of a small-molecule mechanism of action (MOA) library on lipopolysaccharide (LPS) and interferon-β (IFNβ) responses in human peripheral blood mononuclear cells (PBMCs). Working in a multi-cell type model with multilayered perturbations, we uncovered several compounds with pleiotropic effects on different gene expression programs (GEPs), both within and across different cell types, and confirmed the heterogeneous effects of one such hit. In summary, our method provides a general means to conduct high-content phenotypic screens in biological models of varying complexity with substantially reduced sample, reagent and labor costs. Our results demonstrate the feasibility of pooled phenotypic screens compressing small molecules and biologics, and may unlock new routes for discovery efforts by connecting experimental perturbations to basic and clinical observations.

**Fig. 1: Compressed screening with high-fidelity model systems and high-content assays.**

Results

Scaling up biochemical phenotypic screens with compression

To increase the throughput of phenotypic screens and unlock the use of high-throughput assays and model systems of increasing complexity (Fig. 1a), we examined the feasibility of compression through pooling external perturbations, such as biological ligands or chemical compounds (Fig. 1b). In this approach, we combine N perturbations into unique pools of size P, ensuring each perturbation appears in R distinct pools overall (Fig. 1b,c). Relative to a conventional screen where replicates of each perturbation are screened individually, a CS reduces the sample number, cost and labor requirements by a factor of P, which we refer to as P-fold compression. To analyze the output of a CS, we developed an assay-independent computational framework for deconvoluting the effects of individual perturbations on the basis of regularized linear regression and permutation testing (Fig. 1d), inspired by previous work inferring the effects of guide RNAs on genes in pooled CRISPR screens^6,14.

Developing and benchmarking CS with high-content imaging

We conducted a series of GT experiments in which we examined the impact of a 316-compound Food and Drug Administration (FDA) drug repurposing library (Supplementary Table 1) on the morphology of U2OS cells to establish a reference dataset for benchmarking the feasibility of compression (Fig. 2a). We chose this library because it would represent a challenging use case for pooling and inferring the impact of individual perturbations (because bioactive compounds with large effects would frequently co-occur in our pools); we selected this model, meanwhile, so that we could readily establish GT (as we could generate enough input material to test each perturbation independently, as well as multiple compressions). Given the size of the GT screen, we opted to measure cellular phenotypes with Cell Painting¹⁵, a cost-effective high-content morphological profiling readout that multiplexes six fluorescent dyes (probes: nuclei, Hoechst 33342; endoplasmic reticulum, concanavalin A–AlexaFluor 488; mitochondria, MitoTracker Deep Red; F-actin, phalloidin–AlexaFluor 568; Golgi apparatus and plasma membranes, wheat germ agglutinin–AlexaFluor 594; nucleoli and cytoplasmic RNA, SYTO14), imaged in five channels, to examine multiple cellular components and organelles (Fig. 2b). We constructed and applied a pipeline to analyze our Cell Painting datasets, inclusive of illumination correction, quality control, cell segmentation, morphological feature extraction, plate normalization and highly variable feature selection, yielding 886 informative morphological attributes for further analysis¹⁶ (Supplementary Table 2 and Methods).

**Fig. 2: Compressed screening identifies compounds with largest effects in a GT setting.**

To find optimal conditions for our screen, we conventionally assayed nine concentration and time point combinations, evaluating the effects of different treatments, durations, doses and batches on cellular morphology (Extended Data Fig. 1a,b) and viability (Extended Data Fig. 1c,d and Methods). To quantify an overall morphological effect, we computed a vector of median values for each of the 886 morphologically informative features in each sample well and then calculated the average Mahalanobis Distance (MD) between the control and perturbation vectors for each compound (Fig. 2c). The MD is a multidimensional generalization of the z-score^2,3,17,18 that has been used extensively to discern effect size in studies deploying high-content cellular morphological assays^19,20. On the basis of these pilots, we selected the 24-h time point and 1 μM concentration combination to generate the full GT dataset because these conditions yielded the largest MD coefficient of variation (Extended Data Fig. 1a,b). We next screened all 316 compounds individually with six-fold replication, resulting in 2,088 samples (that is, wells; inclusive of 1,896 perturbation and 192 control (DMSO) samples) to define our GT (Fig. 2b). To identify shared phenotypic responses among these data, we performed dimensionality reduction over the aforementioned 886 morphologically informative features for all wells with a sufficient number of cells (≥50), yielding eight GT phenotypic clusters enriched for distinct morphological properties related to the intensity and distribution of each stain, as well as overall cell shape and structure (Fig. 2c,d, Extended Data Fig. 1e–g and Methods). One of the clusters (cluster 1) was dominated by DMSO control samples, whereas the others (clusters 2–8) were enriched for specific drug perturbations (Fig. 2e). Reassuringly, drugs driving defined morphological responses, albeit with different effect sizes (as determined by MD), shared common MOAs (for example, antineoplastics in cluster 7; Extended Data Fig. 1h,i and Supplementary Table 1).

We next performed a series of matched CSs. Using the same U2OS model, Cell Painting readout and 316-drug library, we thoroughly tested the feasibility and limits of our approach by spanning different degrees of compression (3–80 drugs per pool) and replication (each drug appearing in a total of three, five or seven pools; Fig. 2f,g). After deconvolving the impact of each drug on the 886 morphological features using regularized linear regression and performing quality controls for toxicity (Fig. 2h and Extended Data Fig. 2a–c), we directly compared the CS and GT results. In the CS, the MD between a perturbation and the control cannot be computed directly because the deconvolution results in a point estimate for the effect of each perturbation (coefficient weights), including the controls, rather than a distribution. Therefore, after comparing several metrics, we chose to summarize the effect of each perturbation in the CS using the max-scaled L₁ norm of all significant regression coefficients (permutation test; Extended Data Fig. 2d and Methods). We observed a strong correlation between the perturbation effects computed for the GT and CS screens for pool sizes of up to 40 perturbations, with good agreement for results derived from screens with the same pool size but distinct combinations of drugs in each pool (Fig. 2i,j and Extended Data Fig. 2e). By varying the significance threshold for our permutation tests, we found that we could tune the deconvolution to identify either more hits with a higher false positive rate or fewer hits that were more likely to be true positives (Fig. 2k and Extended Data Fig. 2f). Across all pool sizes, we consistently detected significance in the CS (scaled L₁ norm > 0) for those compounds with the largest GT effects, while those with a more modest impact were only captured at lower compressions (Fig. 2l,m). Together, our benchmarking results demonstrate that CS is efficient, accurate and reproducible, and that the degree of compression can be tuned by a user to balance throughput with stringency.

Examining TME ligands impact on PDAC organoids with CS

While TME signals are critical features dictating patient survival across myriad cancers²¹, ex vivo systems typically overlook these microenvironmental features. This can result in a loss of therapeutically relevant transcriptional features in vitro, leading to decreased concordance with in vivo biology^12,22. Understanding how multiple TME factors influence response signatures requires systematic perturbation experiments in patient-derived models that can scale with high-dimensional readouts such as scRNA-seq; however, cost, labor and biological material input requirements have limited these measurements using conventional approaches. To overcome this barrier and evaluate the utility of our approach to enable biological discovery, we used CSs to characterize transcriptional responses to 68 TME ligands in early-passage, patient-derived PDAC organoids, a model system known to drift transcriptionally in culture^12,23 (Fig. 3a).

**Fig. 3: CS of biological ligands on PDAC organoids reveals major axes of transcriptional response that are recapitulated in single-ligand validation screens with clinical relevance.**

More specifically, we conducted two PDAC organoid CSs with the same model, ligand library and compression scheme (replicates = 5, mean pool size = 4.75), but distinct ligand poolings (Fig. 3b). In addition to negative controls (wells containing only organoids and media), we individually screened three ligands (tumor necrosis factor (TNF), transforming growth factor-β1 (TGFβ1) and interferon-γ (IFNγ)) with known activity in PDAC^12,24 as positive controls (landmarks). After sample demultiplexing and quality control, we analyzed 5,662 cells and 10,881 cells from the two runs, with a mean ± s.d. of cells per compressed pool of 59 ± 32 and 113 ± 65, respectively (Extended Data Fig. 3a,b).

As PDAC cells in these screens were simultaneously exposed to multiple perturbations, we reasoned that a given cell might concurrently respond to one or more ligands within the pool by expressing one or more GEPs. We discerned such GEPs by applying consensus non-negative matrix factorization (cNMF)²⁵ (Fig. 3c). This approach revealed 13 GEPs with highly variable activity across cells that were consistent between the two runs (Extended Data Fig. 3c, Supplementary Table 3 and Methods). We contextualized these GEPs by correlating GEP expression scores with those for canonical gene signatures previously used to characterize PDAC cell states (Moffitt et al. classical and basal state signatures²⁶), existing MsigDB and PROGENy genesets, signatures associated with our landmark (that is, single cytokine) perturbations (TNF, TGFβ and IFNγ), cell-cycle scores²⁷ (G2M and S phase) and cell quality metrics (‘percentage ribosomal’, ‘percentage mitochondrial’ and ‘number of genes’). This allowed us to annotate 11 of the GEPs as nuclear factor (NF)-κβ activation, TGFβ response, type I IFN response, organoid classical, cell cycle S phase, cell cycle G2M phase, mitochondrial module, ribosomal module and three GEPs primarily expressed in low-complexity cells (Extended Data Fig. 3d,e and Methods)²⁸. One of the two remaining GEPs displayed a negligible association (R < 0.03) with the Hallmark IFNγ response in MsigDB, yet clearly correlated with the module score for genes differentially expressed in the IFNγ-positive control wells (‘landmark perturbation effects’, Extended Data Fig. 3d). The final GEP did not clearly correlate (R > 0.25) with any signature in MsigDB or PROGENY (Extended Data Fig. 3d). We annotated it as the ‘PDAC interleukin (IL)-4 and IL-13’ signature on the basis of the ligands inferred to drive the observed response, supported by literature suggesting that some of its top ranked genes (CD55, PIGR and SERPINB3) can be induced by IL-4 and IL-13 in epithelial cells^29,30,31 (Extended Data Fig. 3e) and the presence of detectable IL4I1, a downstream target gene directly induced by IL-4, in SPP1⁺ macrophages in the primary patient PDAC data from Raghavan et al.¹² (Extended Data Fig. 3f). We note that related attempts to use gene-based deconvolution were highly overdetermined (given the ratio of ligands to genes), yielding a small, nondeterministic set of genes associated with each ligand that were difficult to interpret using standard enrichment analyses.

Next, we applied our deconvolution framework to infer which ligands drove the activity of each cNMF GEP in the CS (Fig. 3d,e and Extended Data Fig. 3g,h). Encouragingly, IFNγ was associated with the IFNγ response GEP, IFNɑ2 was associated with the type I IFN signaling GEP, TGFβ and inhibin-βB (INHBB; a member of the TGFβ superfamily) were associated with the TGFβ response GEP, IL-1A, IL-1B and TNF (known activators of NF-κβ³²) were associated with the NF-κβ response GEP and the mitogens Wnt7A and R-Spondin 3 (RSPO3) were associated with the ribosomal module GEP^33,34. The low-complexity GEP 1, meanwhile, did not consistently associate with any ligand and was ignored as a quality artifact in downstream analyses. As stated above, we annotated the PDAC IL-4 and IL-13 response GEP on the basis of its association with IL-4 and IL-13, as well as with adiponectin (ADIPOQ). Interestingly, we would not have been able to infer which ligands were most active in the organoids by examining cognate receptor expression alone, as our active and inactive ligands did not have discernable patterns of cognate receptor gene expression in untreated organoid cells or in patient-matched primary tumor cells (Extended Data Fig. 3i and Supplementary Table 4). Together, our deconvolution framework paired the major axes of induced phenotypic variation in PDAC organoids with the TME ligands driving those transcriptional variations.

To validate our CS results, we individually tested the 11 top-scoring ligands (IL-1A, IL-1B, TNF, IL-4, IL-13, ADIPOQ, TGFβ1, INHBB, IFNγ, Wnt7A and RSPO3) significantly associated with five of the aforementioned GEPs (NF-κβ response, PDAC IL-4 and IL-13 response, TGFβ response, IFNγ response and ribosomal module; Fig. 3f)²³. We conducted two validation experiments on separate days (biological replicates) by screening six replicates of all ligands, including a set of negative control organoids in each run (Extended Data Fig. 4a). We then pooled these data with the negative controls and individual landmark ligand positive controls from the original CSs to form a final single-ligand perturbation dataset. To directly control for batch variation between runs (Extended Data Fig. 4a and Methods), we performed cNMF over the single-ligand dataset and then ran a linear model to predict the effect of each ligand on each single-ligand cNMF GEP while including experimental batch as a covariate. This identified single-ligand GEPs with similar enrichments to, and high correlations with, each of the GEPs from the CS dataset (Extended Data Fig. 4b–d, Supplementary Table 3 and Methods). For example, there was significant overlap between the CS and single-ligand PDAC IL-4 and IL-13 response GEPs (Fisher’s exact test P value = 1.62 × 10⁻²⁶), and the overlapping genes ranked highly by gene loading in both modules (Extended Data Fig. 4e).

More critically, these data revealed that the majority of the validation ligands significantly induced expression of the expected single-ligands GEPs (that is, those most similar to the CS GEPs discovered in the CS), with all five target GEPs induced by at least one validation ligand (Extended Data Fig. 4f). To cross-check these results for congruence, we also ran a linear fit for the validation data using the CS GEPs in place of the single-ligand ones and obtained similar results (Extended Data Fig. 4g). Lastly, we observed a strong correlation between ligand effects on CS cNMF GEPs in both the CS and the single-ligand validation experiments (Fig. 3f and Methods), demonstrating that compressed screening can robustly identify the transcriptional effects of TME ligands on patient-derived cancer models.

In examining the GEPs in our CS and validation experiments and their associated ligands, we observed limited overlap with matched canonical ligand signatures, suggesting the possibility of disease- and context-specific transcriptional responses to TME signals (Extended Data Figs. 3d and 5a). For example, the PDAC IL-4 and IL-13 response GEP did not correlate well with other signatures within established geneset databases^12,35 or those associated with PDAC outcomes²⁶ (Extended Data Figs. 3d and 5a–e), likely explained by the limited overlap between the PDAC IL-4 and IL-13 response GEP and publicly available IL-4 and IL-13 gene signatures (Fig. 3g and Supplementary Table 5). Even for other response GEPs that had more obvious correlations with existing gene signatures (for example, those for TNF; Extended Data Figs. 3d and 4b), we observed only modest overlaps; for example, our PDAC NF-κB GEP shared only 20 genes with its matched PROGENY signature (Jaccard index = 0.058), our PDAC TGFβ GEP shared only 6 genes with the Hallmark TGFβ signature (Jaccard index = 0.047) and our PDAC IFNγ GEP shared only 12 genes with the Hallmark MSigDB signature (Jaccard index = 0.031; Extended Data Fig. 5f). Taken together, this reveals the importance of understanding and considering the contextual background against which perturbations are introduced as specific systems may respond differently to the same perturbation.

To explore the potential clinical relevance of these disease-specific responses, we examined the prognostic value of our PDAC IL-4 and IL-13 response GEP and existing IL-4 and IL-13 response signatures in bulk RNA-seq samples from PDAC tumors in The Cancer Genome Atlas (TCGA)³⁵. Comparing the top and bottom tertile for each IL-4 and IL-13 signature, we found that only our newly identified PDAC IL-4 and IL-13 GEP significantly stratified patients by survival, with higher expression associating with worse outcome (Fig. 3g). The same was true for our PDAC IFNγ and TGFβ GEPs, both of which had minimal overlap with their corresponding public signatures and showed better prognostic ability than their literature counterparts (Extended Data Fig. 5f,g). By further analyzing genes differentially expressed between IL-4- and DMSO-treated cells in our single-ligand perturbations (Extended Data Fig. 5h), we found that the top genes expressed in the IL-4 treatment group (MMP1, CD55 and DUOX2, also observed in our single-ligand PDAC IL-4 and IL-13 response GEP; Extended Data Fig. 4c) are known to be involved in, and enriched in a signature for, epithelial-to-mesenchymal transition (EMT), a process that promotes metastasis of cancer cells^36,37,38, potentially explaining the lower overall survival we observed for patients highly expressing this GEP (Fig. 3g and Extended Data Fig. 5i,j). Collectively, these results suggest that the PDAC IL-4 and IL-13, IFNγ and TGFβ response GEPs discovered through our approach represent disease-specific responses with prognostic implications that would have been missed using existing databases. More broadly, our data highlight that disease- and context-specific transcriptional responses to TME signals may result in distinct functional consequences, and emphasize the importance of augmenting and carefully considering existing gene signature databases.

CS dissects systems-level drug responses in primary samples

Many therapies act in a pleiotropic manner across the diverse cell types in a tissue, yet most of our drug screening is carried out in homogenous, immortalized cell lines. Accordingly, we often have a limited understanding of how perturbing a target molecule and its associated signaling network will impact complex cellular systems, ultimately registering therapeutic benefit or toxicity. As one foray into this problem, we leveraged a CS approach to scale small-molecule screening in fresh human PBMCs, a complex multicellular system, with a high-content scRNA-seq readout.

PBMCs are composed of a heterogeneous mixture of T cells, natural killer (NK) cells, B cells and monocytes, among others. A key advantage of using freshly isolated PBMCs is their close representation of peripheral host physiology. Here, we simulated immune activation in this compartment with both LPS (a Toll-like receptor 4 agonist) and IFNβ (a key antiviral cytokine), canonical stimuli that can be sensed by multiple immune subsets through well-described pathways, and that have been used to explore immune responses to Gram-negative bacteria and viruses, respectively^39,40. Critically, while immunomodulators are commonly used in the clinic^{41,42,43,44,45}, it is challenging to screen the transcriptional impact of multiple compounds on fresh human PBMCs because of material, cost and labor constraints. Thus, we hypothesized that a CS could serve as a bridge in the ‘chain of translatability’, allowing a better understanding of how potent and specific molecules manifest desirable or deleterious systems-level effects.

To examine the power of a CS in this setting, we tested the effects of 90 chemical compounds with a wide range of known MOAs curated by the Broad Drug Repurposing Hub (https://github.com/jump-cellpainting/JUMP-MOA) on immune responses among PBMCs (Fig. 4a). To execute these CSs, we generated identical pool sets (R = 3 replicates per drug and P = 6 drugs per well) in three 96-well plates with DMSO controls, added fresh PBMCs, incubated overnight and then stimulated with IFNβ or LPS for 4 h the following day while leaving one plate unstimulated (Fig. 4b). Following an evaluation of cell viability, each well was hashed using oligo-tagged antibodies and scRNA-seq was performed. After sample demultiplexing and quality control, we retained a total of 20,062 control cells (mean cell count per well ± s.d., 418 ± 351), 25,617 in the IFNβ plate (524 ± 353) and 65,868 in the LPS plate (1,372 ± 805). These yields highly correlated with our cell viability data, and we observed that both stimulations maintained cell counts relative to the unstimulated condition (Wilcoxon test one-sided P value < 0.01; Fig. 4c and Supplementary Fig. 1a,b). Running deconvolution on cell viability data revealed a handful of compounds with significant effects (Supplementary Fig. 1c). To understand how each compound impacted responses to each stimulus, we merged cell types from the control and stimulus conditions, and then ran cNMF on those cell types individually to identify GEPs (Fig. 4c, Supplementary Fig. 1d–f, Supplementary Table 6 and Methods). We avoided running cNMF on all three conditions at once to prevent effects from one stimulation being masked by the other. This analysis produced two main categories of GEPs: (1) ‘stimulant–response modules’, which changed significantly in response to stimulation, and (2) ‘cell-identity modules’, which did not.

**Fig. 4: CS of an MOA drug library identifies modulators of immune responses across human PBMCs.**

To determine the impact of each stimulant and drug and their potential synergies, we modified our deconvolution framework to account for possible interactions between the immune stimulant and each pooled drug. Specifically, we included a Boolean interaction term in our design matrix to denote whether the drug and stimulant coexisted in a well (X_interaction) and modified our coefficient matrix β to include fits for all three features (that is, β_stimulants, β_drugs and β_interactions). Overall, we observed that several cell type stimulant–response modules were influenced by the compounds in our MOA library, with some significantly influencing multiple cell types and/or response GEPs (Fig. 4d). Our modified deconvolution method enabled us to compare the signs of β_interactions and β_drugs against those in β_stimulants for specific combinations to understand the nature of stimulant–drug interactions in each cell type (Supplementary Figs. 2 and 3). Specifically, we divided our drugs into three categories on the basis of the significance of their impact on stimulant–responses in each cell type: (1) potentiators, which strengthen the effect of stimulant-mediated downstream signaling; (2) interferers, which oppose the effect of the response to a given stimulant; and (3) noninteractors, which do not have an effect on stimulant responses (Fig. 4e, Extended Data Fig. 6 and Methods). For potentiators and interferers, we further annotated them as ‘positive’ or ‘negative’ on the basis of whether the stimulant increases or decreases the GEP (Methods). As in the previous examples, one of the key parameters for this computational workflow is the P-value threshold we used in the deconvolution permutation test; as the P-value cutoff becomes more stringent, fewer drugs are nominated to have effects in each cell type (Extended Data Fig. 7a).

Our IFNβ CS identified compounds with diverse effects on multiple GEP and cell type combinations. In the 90-molecule library, 87 molecules had an impact on IFNβ responses in at least one cell type, and 73 had an impact in multiple cell types (≥2). Notably, 20 molecules had statistically significant effects in four or five cell types, often with high pleiotropy. δ-Tocotrienol (vitamin E, an HMG-CoA reductase inhibitor), for example, has been described as a broadly anti-inflammatory agent; here, we observed variable effects on different immune cell subset GEPs, including opposing effects (potentiating and interfering) on different IFNβ response modules (GEP1 and GEP5) in monocytes, as well as interference with response GEPs in CD4 T cells (GEP3) and NK cells (GEP6) (Fig. 4f). Similarly, ponatinib, known to target multiple receptor tyrosine kinases⁴⁶, elicited significant effects on IFNβ responses across multiple cell types, acting as a positive interferer of IFNβ GEP3 in B cells (Janus kinase (JAK)– signal transducer and activator of transcription (STAT) and TNF signaling), IFNβ GEP3 in CD4 T cells (JAK–STAT signaling) and IFNβ GEP1 in CD8 T cells (JAK–STAT and TNF signaling), but also as a negative interferer of IFNβ GEP8 in B cells (cell cycle and NF-κB) and IFNβ GEP3 in CD8 T cells (mitochondrial activity and adenosine triphosphate synthesis; Fig. 4f and Extended Data Fig. 7b–e). We note that, although three of the targeted IFNβ response GEPs were annotated as JAK–STAT signaling, ponatinib modulated both shared and unique genes across the cell types, once again highlighting context-specific signatures (Extended Data Fig. 7b, bottom).

Next, we examined drugs with targets known to modulate immune responses. One of the major effects of type I IFN stimulation is the activation of JAK–STAT pathways^{41,42,43,44,45,47}. Ruxolitinib, a known JAK–STAT inhibitor⁴⁸, acted as a positive interferer of a CD4 T cell GEP (IFNβ GEP3), which positively correlated with JAK–STAT signature expression, giving us confidence that our CS is detecting real biological signals (Fig. 4g and Extended Data Fig. 7d). Relatedly, our screen also nominated rapamycin, a mammalian target of rapamycin (mTOR) inhibitor⁴⁹, as a negative interferer of a GEP in B cells (IFNβ GEP8), which negatively correlated with JAK–STAT and TNF signaling; it was also a positive interferer of another GEP in B cells (IFNβ GEP3) that positively correlated with JAK–STAT and TNF signaling (Extended Data Fig. 7c and Extended Data Fig. 8a). These results are in line with the known cooperativity between mTOR and JAK–STAT pathways, and the previously reported role of rapamycin in destabilizing TNF expression^50,51. Taken together, we found rich response patterns across most of the chemical compounds in the library (Fig. 4h, Extended Data Figs. 7 and 8 and Supplementary Figs. 2 and 3), with monocyte GEPs impacted by the greatest number of molecules (n = 64), followed by B cells (n = 15) and CD4 T cells (n = 14) (permutation test P value < 0.01).

Relative to IFNβ, a similar analysis applied to LPS resulted in fewer stimulant–response modules at the same P-value threshold (Extended Data Fig. 9a); however, the impact of the JUMP MOA library on those modules was equally complex (Fig. 4h, Extended Data Fig. 6 and Supplementary Fig. 2). In total, 86 of 90 molecules had some effect on LPS responses in at least one cell type, 65 molecules had some effect in two or more cell types and 8 moelcules had some effect in four or five cell types. A representative compound with significant effects on LPS responses across multiple cell types is merimepodib, an inosine monophosphate dehydrogenase inhibitor that reduces intracellular guanine-based nucleotides, leading to the inhibition of DNA and RNA synthesis⁵². This broad-impacting MOA translated in our CS into pleiotropic effects across multiple cell types: Merimepodib acted as a positive interferer of B cell LPS GEP3 (TNF signaling and amine biosynthesis and metabolic processes) and CD8 T cell LPS GEP3 (JAK–STAT signaling), a positive potentiator of monocyte LPS GEP7 (translational and ribosomal activities) and a negative interferer of CD4 T cell LPS GEP1 (mitochondrial activities) (Extended Data Fig. 9b–f). BIX02188, a known mitogen-activated protein kinase (MAPK) kinase inhibitor, meanwhile, was found to be a positive interferer of LPS GEP4 in NK cells, a module highly correlated with TNF signaling and chemokine receptor (NK T cell pathways) expression (Extended Data Fig. 10a,b). Notably, it was reported that LPS-induced TNF production requires activation of the MAPK pathway^53,54. In addition, PF 477736, a checkpoint kinase 1 inhibitor⁵⁵, was one of the top positive potentiators of LPS GEP7 in monocytes, a module highly correlated with ribosomal events and translation elongation (Extended Data Figs. 9c and 10b). This is biologically consistent with the known role of PF 477736 in abrogating cell-cycle arrest, which in turn increases the rate of ribosomal and translational events. Interestingly and unexpectedly, we also found that AZD7545 and BX-912, two drugs with the same MOA annotations (pyruvate dehydrogenase kinase inhibitor), had divergent effects on the same LPS GEP7 in monocytes, a module highly correlated with RNA translation activity (Extended Data Fig. 9c and 10c). In the LPS stimulation screen, monocytes were again the cell type impacted by the largest number of molecules (n = 40), followed by CD8 T cells (n = 30) and NK cells (n = 11) (permutation test P value < 0.01).

Across the entire library, 53 compounds had effects on both LPS and IFNβ responses in multiple cell types. For example, ML298 hydrochloride, a selective phospholipase 2 inhibitor⁵⁶, was a positive interferer of NK cell LPS GEP7 (KRAS downregulated signal), a positive potentiator of monocyte LPS GEP7 (translational and ribosomal activities) and a negative interferer of CD4 T cell LPS GEP1 (mitochondrial activities). Meanwhile, it was a positive interferer of B cell IFNβ GEP3 (JAK–STAT and TNF signaling) and CD8 T cell IFNβ GEP1 (JAK–STAT and TNF signaling), a negative interferer of B cell IFNβ GEP8 (cell cycle and NF-κB) and a negative potentiator of monocyte IFNβ GEP5 (transmembrane transport and generation of precursor metabolite and energy) (Extended Data Figs. 7c,e, 8b, 9c,f and 10a,d). Furthermore, as with IFNβ GEP3 in CD4 T cells, we found a similar interfering effect by ruxolitinib on LPS GEP3 in CD8 T cells treated with LPS, a module also highly correlated with JAK–STAT signaling (Extended Data Fig. 9e and Extended Data Fig. 10e). Taken together, the consistency of our findings for ruxolitinib in both the LPS and the IFNβ screens and the identification of several new stimulant–drug interactions lend credence to the power of CS in identifying phenotypic and functional effects even in a complex multicellular system.

To independently corroborate pleiotropy, we tested the effects of ruxolitinib on LPS stimulation conventionally. Briefly, cells were seeded in triplicate at the same density as in the CS, treated with ruxolitinib overnight and then acutely stimulated with LPS for 4 h, followed by scRNA-seq (Fig. 4i). These data replicated ruxolitinib’s ability to prevent LPS-induced activation of the JAK–STAT pathway in CD8 T cells (GEP3). We observed that ruxolitinib also inhibited the expression of genes associated with CD8 T cell GEP3 in CD4 T cells, γδ T cells, B cells and NK cells (Fig. 4j). In our LPS CS screen, ruxolitinib was also nominated as a positive interferer of LPS GEP3 in B cells, a module correlated with TNF signaling and amine biosynthesis and metabolic processes (Extended Data Fig. 9d). Our validation data confirmed this effect in B cells, and identified similarly significant interference effects in CD4 and CD8 T cells, NK cells and γδ T cells (Extended Data Fig. 10f). Intriguingly, ruxolitinib upregulated the expression of both these modules in monocytes (Fig. 4j and Extended Data Fig. 10f). We note that these opposite trends in monocytes were unexpected, were not technical in nature, and call for more detailed investigation.

Overall, our results in PBMCs show how different compounds can either potentiate or suppress LPS and IFNβ GEPs in a cell-type-specific manner, even producing divergent responses in different cell types. Here, relative to the PDAC CS, we were able to achieve a second layer of compression by simultaneously profiling the impact of multiple drugs on multiple cell types. More generally, these results help establish the power of the CS framework for revealing systems-level information from phenotypic screens while reducing sample material, reagent and labor costs (here, by a factor of 5.5, roughly corresponding to the size of our pools P, relative to a conventional screen; Supplementary Table 7).

Discussion

Scalability presents a major obstacle for phenotypic screening with cell-extrinsic perturbations such as small molecules or recombinant proteins, often precluding the use of complex model systems and high-content readouts. To address this challenge, we developed CS, an assay- and model-independent approach to increase throughput while reducing cost, labor and material requirements by pooling perturbations and computationally inferring the effects of each using deconvolution. This method empowers chemical and biological ligand perturbation screens, which can often be more easily applied to complex models and more readily advanced as therapeutic substrates. We benchmarked CS in U2OS cells with a high-content imaging readout, demonstrating that the top hits across a wide range of compressions corresponded to those perturbations with the largest effects when screened individually. Next, we applied the technology to identify the phenotypic effects of TME ligands in primary PDAC organoids, discovering multiple classes of ligands that drive conserved, clinically relevant transcriptional response patterns distinct from canonical signatures. Lastly, by performing a CS campaign in PBMCs, a heterogeneous primary sample, we uncovered the ability of small molecules to modulate both unique and diverse immune response programs within and across different cell types. Over a range of systems of varying complexity, our results demonstrate the broad applicability of CS for translating phenotypic screens to larger search spaces, even when paired with high-content readouts, including imaging and single-cell genomics.

When developing the CS approach in U2OS cells, we found that a linear regression-based deconvolution approach could accurately identify compounds with the greatest effects in a GT screen across a variety of compressions (pool sizes varying from 3 to 80 perturbations). This suggests that it is possible to combine perturbations and that nonlinear interactions do not confound interpretation until pool sizes grow large. This is in line with other work that used high-content imaging to investigate all pairwise interactions between bioactive small molecules, finding that only 5% of all possible interactions were nonlinear⁵⁷. In the highly bioactive library we tested, the effects we inferred began to deviate from GT at compression levels above 40 perturbations per pool. This may be because of the fact that, as pool size increases, the number of possible pairwise interactions between perturbations scales quadratically (N choose 2) = \(\frac{1}{2}N(N-1)\), suggesting that, even if the fraction of interactions that are nonlinear remains small, the absolute number of nonlinear interactions will drive a large effect at a large enough pool size. While our findings suggest a rough upper bound for compression with highly bioactive perturbation libraries, we note that the limits of compression may increase with the richness of the assay readout used in a screen. In addition, much greater compression should be feasible in libraries with fewer bioactive perturbations (for example, design-of-synthesis libraries or discovery compound decks), enabling rapid evaluation of large compound sets in complex biological models. These findings also suggest a compelling regime in which to consider searches for combinatorial effects by conducting a CS to find significant individual perturbations and then either screening all pairwise combinations of these significant perturbations or performing a screen in conjunction with one of the factors being held constant to find synergistic pairs.

We further demonstrated that our compressed screening and deconvolution framework is tunable to the needs of the user. By altering the significance threshold for calling a hit, we show that it is possible to conduct more permissive screens (greater true hit identification along with more false positives) or less permissive ones (fewer true hits but with fewer false positives). This flexibility empowers users to adjust their screening strategy on the basis of the scope of planned follow-up validation experiments. While we conducted CSs in duplicate with distinct randomization to develop the technology, we observed reproducible top hits between replicates. This suggests that doing a single CS may be sufficient for new lead discovery, albeit with a slightly higher false positive rate than with replicate screens. Sequential screens can also be performed at decreasing compression levels to iteratively converge toward top hits in a resource-efficient manner.

Our results in PDAC organoids and PBMCs support the potential of CS as a strategy to uncover novel insights and drive translational discovery through phenotypic screening in complex biological models. While numerous gene signature databases exist, our work demonstrates that phenotypic response signatures can be disease- and context-specific, and that the identification of disease-relevant signatures may uncover important relationships to clinical outcomes. Illustratively, our PDAC GEPs stratify patients by outcome while matched canonical gene signatures do not. While further mechanistic work is needed to dissect the potential sources, roles and prognostic capabilities of signals such as IL-4 and IL-13 in PDAC, our data highlight the use of CS for uncovering model-specific perturbation responses to inform follow-up investigations.

Working with PBMCs, meanwhile, we simultaneously assayed the immunomodulatory properties of a 90-compound MOA library to reveal distinct effect fingerprints that could prove useful in shaping overall immune responses. By performing compression, we reduced the total experimental cost for the three screens (control, IFNβ and LPS) by 5.5-fold (using a pool size, P, of 6) while similarly reducing the manual labor and number of primary cells required. This allowed a single investigator to perform these screens starting from a single blood draw. Had this work been performed using a conventional approach, it would have required over 80 million cells as input and the processing of nine 96-well plates in parallel, with additional associated reagent costs. Critically, our ability to efficiently assay primary cells directly ex vivo portends a route to applying similar strategies to other complex multicellular specimens, such as biopsies. Such an application could both hasten the discovery of critical aspects of tissue biology and power more translatable therapeutic discovery and selection pipelines.

As currently designed, compression is intended to support ‘screening’. While our technology can identify the perturbations with the largest effects in a library, it cannot perfectly characterize the phenotypic shifts driven by all perturbations in a library. Thus, CSs should always be followed by a validation screen that examines the top hits individually. When developing our CSs, we only included one concentration for each perturbation; however, in many discovery-oriented applications, the best concentration to use may be unclear. As such, future iterations of this technology may benefit from exploring the feasibility of conducting CSs that include multiple doses of the same compound and deconvolution methods to infer both the effects of, and the dose responses to, individual compounds. Furthermore, because of the low frequency at which any two perturbations co-occur in our current design, our approach is not optimized to identify nonlinear effects within the CS. However, this limitation may not ultimately matter in clinical settings such as chemotherapy, where analysis suggests that the vast majority of combination therapies do not exploit additive or synergistic interactions between compounds but rather are successful combinations because of the heterogeneity of patient populations where each agent within the combination regimen is effective on distinct subsets of patients or cells within patients³¹.

In summary, compressed screening is a powerful strategy for enhancing the utility of phenotypic screens, high-content readouts and model systems with varying levels of complexity. We anticipate that compression will prove useful across a wide array of complex model systems and readouts where scalability is a major challenge. This new scalability will enable researchers to map the context-specific phenotypic effects of perturbations, facilitating interpretation of existing and to-be-collected datasets. It will also support the development of new therapeutics in complex systems that are highly relevant to human biology, helping connect in vitro discoveries with clinical implementations.

Methods

CS design principles

To design a CS, we use a set of user-specified design criteria to randomly assign perturbations to pools while satisfying the following: each pool contains a unique set of perturbations, each perturbation in the library (of size N perturbations) occurs in R_cs replicate pools and each pool is of average size P. The user specifies the library N, the compressed replicates R_cs, the conventional replicates R_conv and the desired fold compression C. Fold sample savings of the pools relative to a conventional screen with S_conv samples (\({S}_{{\rm{conv}}}=N\times {R}_{{\rm{conv}}}\) total samples) is C compression (\(P=\frac{{R}_{{\rm{cs}}}C}{{R}_{{\rm{conv}}}}\)). In each CS, we included a set of negative controls (no perturbation added) and a set of landmark positive controls (treated with individual perturbations known to have large effects). To build the plates for the CS, we transferred perturbations from the source library plates to the pool plates using an Echo 650 Acoustic liquid handler.

Cell Painting experiments

For our Cell Painting experiments, we plated U20S cells in 96-well black, clear-bottom, TC-tread plates (Corning, 3712BC) at 2,000 cells per well in 50 µl per well of DMEM (American Type Culture Collection (ATCC)) + 10% FBS + penicillin–streptomycin. Cells were then incubated overnight at 37 °C in 5% CO₂ to allow cell adhesion and equilibrium. For conventional screening, compounds (316 FDA-approved small molecules; Supplementary Table 1) were arrayed in 384-well format as stocks that were 1,000× the final concentration in DMSO. We then pin-transferred (V&P Scientific, mounted onto an MCA96 head of Tecan Freedom Evo 150) 50 nl of compound into the 50 µl of medium per well of the assay plate. Cells were then incubated for 6, 24 or 48 h at 37 °C in 5% CO₂. For compressed screening with the same compound library, we designed screens with the following compression schemes: C = 6, R_cs = 3; C = 6, R_cs = 5; C = 6, R_cs = 7; C = 12, R_cs = 3; C = 12, R_cs = 5; C = 12, R_cs = 7; C = 24, R_cs = 3; C = 24, R_cs = 5; C = 24, R_cs = 7; C = 48, R_cs = 5; C = 96, R_cs = 5. Respectively, these designs contained 3, 5, 7, 6, 10, 14, 12, 20, 28, 40 and 80 drugs per pool. Furthermore, for each compression scheme, we designed two screens with differing random assignment of drugs to pool. Each plate design was then dispensed using an Echo 650 Acoustic liquid handler into 50 µl of medium in the 384-well format, where the DMSO concentration in negative controls was held constant relative to the total compound dispensed for that assay plate. Assay plates were incubated for 24 h before the Cell Painting assay.

Following compound treatment, cells were stained for the mitochondria (500 nM MitoTracker Deep Red (Thermo Fisher, M22425) in DMSO; 7.5 µl of stock per assay plate) and for the Golgi apparatus and plasma membrane (60 µg ml⁻¹ wheat germ agglutinin–AlexaFluor 594 (Thermo Fisher W11262) in water; 900 µl of stock per assay plate) in 30 µl of medium per well and incubated at 37 °C for 30 min. We then fixed the cells by adding 10 µl of 16% methanol-free PFA (Thermo Fisher, 28908) per well and incubated at room temperature for 20 min. Next, we aspirated the solution, washed cells with 70 µl per well of 1× Hanks’ balanced salt solution (HBSS; Invitrogen, 14065-56) and permeabilized cells with 30 µl per well of 0.1% Triton X-100 in 1× HBSS at room temperature for 20 min. We then again washed cells with 70 µl per well of 1× HBSS and stained for the endoplasmic reticulum, nucleolus, F-actin, nucleus and cellular RNA by adding to each plate 1.5 ml of 100 µg ml⁻¹ concanavalin A–AlexaFluor 488 (Thermo Fisher, C11252) in 0.1 M sodium bicarbonate, 37.5 µl of 200 U per ml of phalloidin–AlexaFluor 594 (Thermo Fisher, A12381) in methanol, 9 µl of SYTO14 green fluorescent nucleic acid stain (Thermo Fisher, S7576), 7.5 µl of 5 µg ml⁻¹ Hoechst 33342 (Thermo Fisher, H3570) in water, 15 ml of 3 µM SYTO14 (Thermo Fisher, S7576) and 15 ml of 1× HBSS in 1% BSA. Cells were then incubated at room temperature for 30 min. We then aspirated the solution and washed cells three times with 70 µl of 1× HBSS, and then filled the wells with 70 µl of HBSS, sealed the plates and stored plates at 4 °C until ready for imaging. Cells were imaged on an ArrayScan VTI with a ×20 objective, using the Hoescht channel to autofocus, and we set the Z and autoexposure for each channel. Nine fields were captured in the middle of each well.

PDAC organoid culture

Pancreatic cancer organoids were derived as previously described from persons with metastatic pancreatic cancer who provided written, informed consent at the Dana-Farber Cancer Institute for tissue acquisition and research studies¹². Organoids were seeded in growth factor reduced Matrigel (Corning), fed with human complete organoid medium containing advanced DMEM/F12 (Gibco), 10 mM HEPES (Gibco), 1× GlutaMAX (Gibco), 500 nM A83-01 (Tocris), 50 ng ml⁻¹ mouse epidermal growth factor (Peprotech), 100 ng ml⁻¹ mNoggin (Peprotech), 100 ng ml⁻¹ human fibroblast growth factor 10 (Peprotech), 10 nM human Gastrin I (Sigma), 1.25 mM N-acetylcysteine (Sigma), 10 mM nicotinamide (Sigma), 1× B27 supplement (Gibco), RSPO1-conditioned medium (10% final), Wnt3A-conditioned medium (50% final), 100 U per ml of penicillin–streptomycin (Gibco) and 1× Primocin (Invivogen) and maintained at 37 °C in 5% CO₂ (Supplementary Table 8). The medium was changed every 6–7 days. Established organoid models were passaged by dissociation with TrypLE Express (Thermo Fisher) for 30 min and reseeded into Matrigel droplets and fresh culture medium.

Ligand library selection

To generate a library of relevant, effective ligands, we compiled multiple ligand–receptor datasets^19,32,33. We first prioritized ligands expressed by myeloid cells targeting structural cell types in a variety of organs or ligands expressed by human monocyte-derived macrophages or human macrophages in vivo. These 200+ ligands were then filtered on the basis of their overall expression and applicability in vitro as a secreted, independent effector molecule. Additional nongenetically encoded molecules were added on the basis of well-established effects, such as LPS-EK, leukotriene B4, oxidized low-density lipoprotein and Pam3CSK4. Manual curation and extensive literature research established the final library and concentrations on the basis of previously reported concentrations and results. All tested ligands are listed in Supplementary Table 1.

Compressed ligand screening

To perform compressed ligand screening on PDAC organoids, cells were expanded as described, resuspended in OWRNA medium with 10% (v/v) Matrigel and seeded at 20,000 single cells in 35 µl per well into a flat-bottom ultralow attachment 96-well assay plate (Corning, 3474) 1 day before dispensing ligands. A CS was designed by randomly assigning 68 ligands (Supplementary Table 1) to 72 pools such that each pool contained four or five ligands (average of 4.75 ligands per pool). Two such plates were designed, with distinct random assignment of ligands to pools in each plate. Ligands were first dispensed into 25 µl per well of OWRNA medium in a 384-well format transfer plate (in a quadrant-wise manner). After ligand dispense, each well was backfilled with an additional 50 µl per well of OWRNA medium. Next, 65 µl per well was transferred with a Tecan Freedom Evo 150 from each transfer plate quadrant into the 96-well assay plate with PDAC organoids, making a final total volume of 100 µl per well in the assay plate. Ligand concentration was modified to account for the multistage transfer such that final assay plate concentrations were as reported (Supplemental Table 1). Assay plates were cultured for 7 days at 37 °C in 5% CO₂ and processed for scRNA-seq with cell hashing as described below.

Single-ligand screening

A total of 11 candidate ligands were tested independently for their effects on cell state: 50 ng ml⁻¹ recombinant human (rHu) Wnt7A, 500 ng ml⁻¹ rHu RSPO3, 10 ng ml⁻¹ rHu TNF, 10 ng ml⁻¹ rHu IL-13, 10 ng ml⁻¹ rHu IL-4, 10 ng ml⁻¹ rHu IFNγ, 10 ng ml⁻¹ rHu IL-1A, 10 ng ml⁻¹ rHu IL-1B, 10 ng ml⁻¹ rHu TGFβ1, 25 ng ml⁻¹ rHu ADIPOQ and 10 ng ml⁻¹ rHu activin A (Peprotech). The same batch of organoids used in the original screen (PANFR0562) was used for the validation experiment; the organoids were all cultured and maintained in ‘OWRNA’ medium. Additional details on culture conditions and processing can be found in Supplementary Information.

PDAC scRNA-seq with cell hashing

PDAC organoids for CSs and validation experiments were dissociated to single cells, hashed with antibody-derived oligo tags, pooled and captured for scRNA-seq by Honeycomb Bio HIVEs. Full details can be found in Supplementary Information.

Compressed 90-drug MOA screen

PBMCs were isolated from whole blood by density centrifugation with Ficoll-Paque PLUS (Cytiva, 17144002), followed by red blood cell (RBC) lysis with 1× RBC lysis solution (Invitrogen, 50-112-9751). PBMCs were resuspended and cultured in ATCC-modified RPMI 1640 medium (Thermo Fisher, A1049101) supplemented with 10% FBS. For viability screens and scRNA-seq experiments, fresh PBMCs were seeded at a density of 3 × 10⁵ cells in 200 μl per well in either a canonical v-bottom 96-well plate (Caplugss Evergreen, 222-8031-1V)—to be processed for sequencing—or a white, flat-bottom 96-well plate (Perkin Elmer, 6005680)—to be processed for cell viability—preloaded with drugs to be screened using the Echo 555 Liquid Handler (Beckman Coulter, E555-1115). Briefly, the Echo uses acoustic droplet ejection technology, whereby sound waves eject precisely sized droplets from the compound source plate onto the destination plate suspended above. A CS was designed by randomly assigning 90 drugs (https://github.com/jump-cellpainting/JUMP-MOA) to 45 pools (along with three DMSO-only wells for a total of 48 wells) such that each pool contained six drugs, each at a volume of 0.1 μl with a final concentration of 3 μM for each drug after the addition of 200 μl of PBMCs. Two such plates were designed, with distinct random assignment of ligands to pools in each plate. Cells were then allowed to rest for 16 h at 37 °C in 5% CO₂ before acute stimulation with 5 μg ml⁻¹ LPS (Thermo Fisher, 00-4976-93), 10 ng ml⁻¹ IFNβ (Peprotech, 300-02BC-5UG) or no-stimulant control for 4 h. A total of 300,000 cells were sufficient to test viability; however, to ensure that we obtained an adequate number of cells for sequencing, we had duplicate plates of each condition, giving us a total of 600,000 cells per condition.

To evaluate viability, CellTiter-Glo solution was added to cells after stimulation as per the manufacturer’s instructions (Promega, G7573) and luminescence was read using the Envision plate reader (Perkin Elmer, 106406).

Ruxolitnib validation experiment

The effects of ruxolitnib on PBMC responses to LPS were individually validated. Additional details can be found in Supplementary Information.

PBMC scRNA-seq with cell hashing

For the CS and ruxolitinib experiments, cell hashing on samples was performed with antibody tagging using BioLegend’s TotalSeq-B with the 10X feature barcoding technology protocol. Additional details can be found in Supplementary Information.

Quantification and statistical analysis

Deconvolution of CSs with regularized linear regression

To deconvolute the effects of individual perturbations in a CS, we adopted a regularized linear regression approach that was previously used to infer the effects of guide RNAs on gene expression in pooled CRISPR screens⁶. This general framework for deconvolution requires a design matrix X (pools by perturbations) and an assay readout matrix Y (assay readout by pools), which we fit with a linear model using elastic net regression to infer the coefficient matrix β describing the association between perturbations and assay features. The ElasticNet approach is already a stringent method for variable selection (removing insignificant coefficients) and shrinkage (penalized L₁ norm), helping to address correlated covariates and noisy data. We note that, for these reasons, this approach has been used in similar settings by others⁶. We fit the regression and used cross-validation to determine the optimal L₁ ratio using the MultiTaskElasticNetCV function in sklearn (1.4.2) with possible L₁ ratios varying from 0.01 to 1. We then permuted the perturbations labels and reran the elastic net regression 1,000 times to obtain a null distribution for the perturbation-feature coefficients. We then filtered out any coefficients in our nonpermuted regression that occurred more than the P-value threshold 1,000 times in the permuted data.

Cell Painting data preprocessing

We used the Distributed Cell Profiler software on AWS to process the Cell Painting images (https://github.com/CellProfiler/Distributed-CellProfiler), followed by a custom pipeline for feature selection and effect quantification. Detailed steps can be found in Supplementary Information.

Identifying optimum doses and time points for Cell Painting screens

We determined the dose–time point combination with the broadest range of effects with an approach based on MD. Additional details can be found in Supplementary Information.

Evaluation of cell viability in Cell Painting GT screen

We used the count of nuclei per well as a proxy for understanding the impact of the compounds on viability. Additional details can be found in Supplementary Information.

Identification of cellular phenotypes in GT Cell Painting data

To identify the biological cellular morphological phenotypes in the GT dataset, we first needed to account for substantial batch effects that existed between the first two replicates in our GT dataset (obtained during the dose–time point analysis) and the final four replicates that we collected later. These batch effects were consistent with large batch effects observed in previous studies using Cell Painting³⁴. To account for this technical variation, we applied Harmony (a batch correction technique that uses soft k-means clustering to identify dataset-specific batch effects) to our GT datasets. After running Harmony, we identified the number of Harmony principal components (PCs) that explained 90% of variance in the data and used these top PCs as the features for identifying the cellular phenotypes in the GT data.

To identify cellular phenotypes, we followed the following workflow for clustering the GT Cell Painting samples. First, we computed the neighborhood graph for all GT samples over the Harmonized features, using the pp.neighbors function in scanpy (1.9.8) and a nearest neighbor value of \(\frac{1}{2}\sqrt{{\rm{number}}\; {\rm{of}}\; {\rm{samples}}}\). We then clustered on this neighborhood graph using the Leiden algorithm implemented in the tl.leiden function in scanpy. To determine clustering resolution, we reran clustering over resolution values from 0.01 to 2.0 and identified the resolution value (0.65) where greater than half (~52%) of DMSO samples resided in a single cluster, each cluster had significant perturbations and few perturbations (<0.5%) were significant in more than one cluster (based on phenotypic fingerprinting described below). This identified eight clusters (GT phenotypes) in the GT Cell Painting data. For each cluster, we then used the tl.rank_gene_groups function in scanpy to run a Wilcoxon test over the processed and feature selection Cell Painting features to identify features enriched in each cluster.

To determine perturbations with a significant effect, we used an approach for identifying the ‘phenotypic fingerprints’ of the perturbations¹⁶. For each perturbation, we counted the number of samples in each of the eight GT phenotypic clusters, then randomly permuted the perturbation labels 10,000 times and recalculated the samples in each cluster. Using these randomly permuted labels as a null distribution, we calculated the z scores and P values for the true counts of samples in each cluster. We filtered these results to perturbations with significant enrichment values (maximum z score > 2 and maximum P value < 0.01). These perturbations were then named ‘fingerprinting drugs’ for the enriched GT phenotypic clusters.

Compressed Cell Painting screen design, deconvolution and comparison to GT

To test the limitations of compressed screening, we conducted Cell Painting screens using a wide array of two parameters: replicates R (the number of pools in which each individual drug occurs) and compression C (the fold sample savings in the CS relative to our six replicate GT data). We tested R values of 3, 5 and 7 and C values of 6, 12, 24, 48 and 96. This resulted in 11 different parameter schemes consisting of 3, 5, 6, 7, 10, 12, 14, 20, 28, 40 and 80 drugs per pool. For each parameter scheme, we conducted two screens that were designed with distinct pool randomization, which we refer to as CS run 1 and CS run 2.

To deconvolute the compressed screening data, we passed the preprocessed and feature-selected Cell Painting features in our general regression framework for deconvolution. To assess the total magnitude of the inferred effect of each perturbation, we then calculated the L₁ norm of the regression coefficients across all features for each perturbation. We then calculated the Pearson correlation in R (4.3) using function ‘cor.test’ between these values across all compounds and the GT MD value to assess how well the inferred compressed effects agreed with the effects of perturbations in the GT data.

CS scRNA-seq alignment, hash demultiplexing and initial quality control

For both the PDAC screen and the PBCM screen sequencing runs, raw sequencing reads were quantified and quality-controlled using standard pipelines and publicly available software. Full details can be found in Supplementary Information.

PDAC CS scRNA-seq analysis

We ran cNMF on the combined counts matrices from both compressed runs²¹. The choice of cNMF was because of the fact that it effectively reduces the number of target variables in our system of equations for deconvolution, which helps with finding reliable solutions, and it works well to find coordinated core transcriptional programs on noisy biological data as would be expected for our PDAC CS. We choose an optimal K value (number of cNMF GEPs) by combining a data-driven approach with existing knowledge of PDAC biology. We tested K values ranging from 5 to 50 and then identified candidate optimal K values by plotting the tradeoff between stability and error and focusing on the K values at local stability maxima. We then examined the GEPs at each of these K values and found that, at higher K values, a distinct TGFβ (a known major driver of PDAC biology and one of the landmark positive controls in the screen) GEP was present, whereas, at lower values, the TGFβ signaling GEP was collapsed into other GEPs. Thus, we chose the smallest of the candidate K values at which we observed a clear TGFβ signaling GEP.

We proceeded to analyze the combined scRNA-seq dataset in scanpy (1.9.8) using the highly variable genes output from cNMF and Pearson residual normalization. We visualized the full dataset by running PC analysis with the number of PCs chosen by running KneeLocator from package kneed (0.8.5) in Python to find the knee of the variance explained by the PCs and then running Harmony to integrate the data by batch for visualization purposes only, before visualizing the data in a uniform manifold approximation and projection (UMAP).

The usage of the cNMF GEPs was normalized within each cell. We next correlated the usage of the cNMF GEPs across cells and found that a subset described basal expression states that were broadly expressed across all cells, whereas the remaining were variably expressed across cells. We focused our downstream analysis on these variably expressed cNMF programs (Extended Data Fig. 3c). To annotate these GEPs, we first aimed to match them with known gene signatures. We defined genes significantly upregulated in the TNF, IFNγ and TGFβ1 landmark samples compared to the control samples (using function ‘rank_genes_groups’ by scanpy and then running KneeLocator to find top genes) and calculated module scores for these landmark differential expression (DE) genesets using ‘score_genes’ from scanpy. Then, we used decoupler and the ‘score_genes’ function in scanpy to generate module scores for all PROGENy and MsigDB genesets (using functions ‘run_consensus’ and ‘run_ora’ from decoupler, respectively) for the top 25 genes in the basal and classical signatures from Moffitt et al.²⁶ and for gene signatures for cycling cells and for scRNA-seq quality metrics (number of unique molecular identifiers, number of features, percentage of mitochondrial reads and percentage of ribosomal reads)³⁵. We also used the ‘score_genes’ function in scanpy to generate module scores for the cNMF GEPs for fair comparison. We then correlated all of these module scores with the GEP module scores across all cells and where GEPs highly correlated with a given module; we used this information to manually annotate the GEPs (Extended Data Fig. 3d). With this approach, we were able to annotate all but one highly variable GEP.

Then, we passed the GEP usage scores (output from the cNMF workflow; Extended Data Fig. 3g) for each highly variable cNMF GEP across cells in the compressed wells into our deconvolution code to infer the effect of perturbations on each GEP. We used these results to annotate the one GEP that we could not annotate on the basis of gene signatures (the PDAC IL-4 and IL-13 module).

For ligand–receptor analysis, we took the gene names for the 68 ligands in the CS and searched against the CellphoneDB database using the command line call ‘cellphonedb’ query ‘find_interactions_by_element’. In situations where the identified interacting partner for a ligand was a receptor complex, the complex was expanded to the individual components. The expression of cognate receptors was evaluated in PDAC organoid data generated from the CS.

PDAC single-ligand validation screen scRNA-seq analysis

We adopted a similar approach as in the CS to run cNMF, process scRNA-seq data and identify and annotate highly variable GEPs in the single-ligand data. More details can be found in Supplementary Information.

Evaluation of cNMF GEPs on existing PDAC tumor datasets

To project the validation of cNMF GEPs onto existing PDAC tumor datasets from TCGA and Raghavan et al., we first generated a representative geneset for each GEP by running KneeLocator in Python on the sorted gene spectra scores to identify the top genes contributing to each GEP. We then generated a module score for each of these genesets, the top 25 classical and basal genes from Moffitt et al.²⁶ and IL-4 and IL-13 signaling genesets from MsigDB in both PDAC tumor datasets (for each tumor in TCGA and each cell in Raghavan et al.). We then correlated the GEP module scores and MsigDB module scores with the Moffitt et al. module scores. To look for further evidence of IL-4 and IL-13 signaling associated with classical tumors in Raghavan et al., we performed a targeted analysis of the genes differentially expressed in SPP1⁺ macrophages, as this cell population was in significantly higher abundance in the TME of classical tumors in Raghavan et al.

Differential analysis in IL-4-treated cells

Density estimations of IL-4-treated versus control cells were calculated using the ‘embedding_density’ function in scanpy on the precomputed UMAP of all single-ligand perturbation data. DE was examined using the ‘rank_genes_groups’ function with control cells as the reference group and IL-4-treated cells as the treatment group. Using the Wilcoxon test, the top 25 DE genes were ranked by the z score underlying the computation of P value for each gene. Next, IL-4 DE genes were filtered for adjusted P value < 0.01 and z score > 0 and then ranked by score. All significant DE genes after filtering were used in overrepresentation analysis on MsigDB Hallmark genesets using the ‘get_ora_df’ function by decoupler.

Survival analysis on IL-4 modules

To evaluate the ability of our PDAC IL-4 and IL-13 response module derived from the single-ligand PDAC screen, along with other existing IL-4 modules from Reactome, BioCarta and Lu et al., we took TCGA bulk RNA-seq data on PDAC from the Genomic Data Commons TCGA pancreatic adenocarcinoma (PAAD) cohort (n = 182) along with paired survival data. In total, 181 of 182 tumors had matched overall survival information. The log₂(fragments per kilobase of transcript per million mapped reads + 1)-normalized data were used to score for the PDAC IL-4 and IL-13 response module geneset selected by KneeLocator as mentioned above and the other genesets. All modules scored were then scaled across all samples and we assigned high versus low expression by taking the top and bottom tertile for each module. Kaplan–Meier curves were fitted using KaplanMeierFitter and P values were calculated using ‘logrank_test’ from package lifelines (0.27.7).

Cell viability calculation for PBMC CSs

Cell viability from drug and stimulation treatments was read out through the CellTiter-Glo viability assay. For a 96-well plate, we took the mean of each row i’s empty wells (columns 7–12) as the background measurement (\({V}_{{\rm{empty}}}(i)\)) for each row’s readout and subtracted this value from each nonempty well: \({V}_{{\rm{adj}}}(i,{j})=V(i,j)-{V}_{{\rm{empty}}}(i)\).

In each treatment plate, we then calculated the mean viability of DMSO-only wells after subtracting the background signal (\({V}_{\rm{adj,DMSO}}\)). All nonempty wells were normalized against that value: \({V}_{\rm{norm}}(i,j)={V}_{\rm{adj}}(i,j)/{{V}_{\rm{adj,DMSO}}}\).These values were then compared against each other in mean pool values for each compound and run through linear deconvolution to check significant effects by compounds.

PBMC screen cell type annotation

Cell type annotation was first performed on the control plate using the Azimuth label transfer method from the pretrained PBMC reference model⁵⁸ with level 1 and 2 predicted cell types (https://github.com/satijalab/azimuth) and the results were inspected by checking top cluster markers through unbiased Louvain clustering. We further integrated the cells from the two stimulation condition plates with the control plate and ran unbiased clustering on the entire dataset. We annotated the IFNβ and LPS plates by inspecting the annotation of control cells in each cluster and independent Azimuth cell type prediction results on these plates.

PBMC drug effect measurement

We split the combined two-plate datasets by annotated cell types, excluding platelet cells. cNMF was run on the raw counts of filtered cells of each cell type from the LPS plate + control plate and IFNβ plate + control plate for a direct comparison of stimulant effects, with similar methods to those used in the PDAC screen. We first identified significant stimulant effects in the cell type by evaluating the normalized GEP usage scores between DMSO cells in the stimulant plate and control plate. We filtered out GEPs where the usage scores were significantly different and had a large effect size between the stimulant and the controls (P value < 0.05, absolute log₂ fold change (LFC) > 0.75) to be stimulant effect modules. We further separated these effect modules into stimulant-upregulated (LFC > 0) and stimulant-downregulated (LFC < 0) modules.

Similar to the PDAC analysis, we next passed the GEP usage scores across cells in the compressed wells from the cNMF runs into our deconvolution code to infer the effect of perturbations on each GEP in each cell type. We made a modification to the design matrix with additional interaction terms between the stimulant and the drug (for example, GEP 1 ≈ LPS + drug A + LPS × drug A). This allowed us to evaluate whether there was any interaction between the drugs and the stimulant. For visualizing the deconvolution results across cell types in LPS and IFNβ stimulations (Supplementary Figs. 2 and 3), we showed all regression coefficients on compounds and interaction terms after masking out insignificant coefficients (P value < 0.01), which we replaced with 0.

To determine consensus stimulant–response modules, we used a permutation test P-value threshold of 0.05 to filter out significant regression coefficients after deconvolution. We then evaluated the previously selected stimulant effect modules to see whether they still had a significant effect (that is, coefficient > 0 for upregulated modules and coefficient < 0 for downregulated modules) and retrieved the consensus result for downstream analysis.

To classify and evaluate the effect of each drug on a stimulant, we defined our own metrics from the deconvolution results. More details can be found in Supplementary Information.

Inclusion and ethics

We worked to ensure diversity in experimental samples through the selection of the biological models. We worked to ensure diversity in experimental samples through the selection of the genomic datasets. One or more of the authors of this paper self-identify as underrepresented ethnic minorities in science. The author list of this paper includes contributors from the ___location where the research was conducted who participated in the data collection, design, analysis and/or interpretation of the work.

Ethical approval statement

Patients at the Dana-Farber Cancer Institute gave their consent to Dana-Farber and Harvard Cancer Center Institutional Review Board (IRB)-approved protocols. PBMC samples were derived from MGB under IRB-approved protocols.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

This study did not generate new unique reagents. The cell lines, small molecules and recombinant protein ligands used in this study are available commercially. The organoid model used in this study is available upon request with a materials transfer agreement. More details can be found in the key resources table (Supplementary Information). Summarized and deidentified scRNA-seq PDAC data are available through the Broad Institute’s Single Cell Portal (https://singlecell.broadinstitute.org/single_cell/study/SCP2621/compressed-phenotypic-screening-empowers-scalable-biological-discovery-pdac-compressed-screen-and-validation-screen). Summarized and deidentified scRNA-seq PBMC data are available through the Broad Institute’s Single Cell Portal (https://singlecell.broadinstitute.org/single_cell/study/SCP2622/compressed-phenotypic-screening-empowers-scalable-biological-discovery-pbmc-screens). Raw sequencing data for the PDAC screen can be accessed from the Gene Expression Omnibus under accession number GSE267243 (ref. ²³). Links and accession numbers for the publicly available data analyzed in this study are listed in the key resources table (Supplementary Information). Cell Painting images can be accessed through BioImage Archive (BIA) with accession number S-BIAD1319.

Code availability

Code is available on GitHub (https://github.com/ShalekLab/compressed_screening)¹⁶. Any additional information required to reanalyze the data reported in this paper is available from the corresponding author upon request.

References

Joyce, A. R. & Palsson, B. Ø. The model organism as a system: integrating ‘omics’ data sets. Nat. Rev. Mol. Cell Biol. 7, 198–210 (2006).
PubMed CAS Google Scholar
Eder, J., Sedrani, R. & Wiesmann, C. The discovery of first-in-class drugs: origins and evolution. Nat. Rev. Drug Discov. 13, 577–587 (2014).
PubMed CAS Google Scholar
Swinney, D. C. & Anthony, J. How were new medicines discovered? Nat. Rev. Drug Discov. 10, 507–519 (2011).
PubMed CAS Google Scholar
Moffat, J. G., Vincent, F., Lee, J. A., Eder, J. & Prunotto, M. Opportunities and challenges in phenotypic drug discovery: an industry perspective. Nat. Rev. Drug Discov. 16, 531–543 (2017).
PubMed CAS Google Scholar
Swinney, D. C. Phenotypic vs. target-based drug discovery for first-in-class medicines. Clin. Pharmacol. Ther. 93, 299–301 (2013).
PubMed CAS Google Scholar
Dixit, A. et al. Perturb-Seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell 167, 1853–1866 (2016).
PubMed PubMed Central CAS Google Scholar
Yao, D. et al. Scalable genetic screening for regulatory circuits using compressed Perturb-Seq. Nat. Biotechnol. 42, 1282–1295 (2024).
PubMed CAS Google Scholar
Cleary, B., Cong, L., Cheung, A., Lander, E. S. & Regev, A. Efficient generation of transcriptomic profiles by random composite measurements. Cell 171, 1424–1436(2017).
PubMed PubMed Central CAS Google Scholar
Mylka, V. et al. Comparative analysis of antibody- and lipid-based multiplexing methods for single-cell RNA-seq. Genome Biol. 23, 55 (2022).
PubMed PubMed Central CAS Google Scholar
Thalheim, T. et al. Epigenetic drifts during long-term intestinal organoid culture. Cells 10, 1718 (2021).
PubMed PubMed Central CAS Google Scholar
Klein, E., Hau, A.-C., Oudin, A., Golebiewska, A. & Niclou, S. P. Glioblastoma organoids: pre-clinical applications and challenges in the context of immunotherapy. Front. Oncol. 10, 604121 (2020).
PubMed PubMed Central Google Scholar
Raghavan, S. et al. Microenvironment drives cell state, plasticity, and drug response in pancreatic cancer. Cell 184, 6119–6137(2021).
PubMed PubMed Central CAS Google Scholar
Kainkaryam, R. M. & Woolf, P. J. Pooling in high-throughput drug screening. Curr. Opin. Drug Discov. Devel. 12, 339–350 (2009).
PubMed PubMed Central CAS Google Scholar
Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 67, 301–320 (2005).
Google Scholar
Bray, M.-A. et al. Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes. Nat. Protoc. 11, 1757–1774 (2016).
PubMed PubMed Central CAS Google Scholar
Liu, N. et al. compressed_screening: code for the manuscript ‘Compressed phenotypic screening empowers scalable biological discovery’. GitHub https://github.com/ShalekLab/compressed_screening/tree/master (2024).
Ziegler, S., Sievers, S. & Waldmann, H. Morphological profiling of small molecules. Cell Chem. Biol. 28, 300–319 (2021).
PubMed CAS Google Scholar
Hutz, J. E. et al. The multidimensional perturbation value: a single metric to measure similarity and activity of treatments in high-throughput multidimensional screens. J. Biomol. Screen. 18, 367–377 (2013).
PubMed Google Scholar
Mead, B. E. et al. Screening for modulators of the cellular composition of gut epithelia via organoid models of intestinal stem cell differentiation. Nat. Biomed. Eng. 6, 476–494 (2022).
PubMed PubMed Central CAS Google Scholar
Dahlin, J. L. et al. Reference compounds for characterizing cellular injury in high-content cellular morphology assays. Nat. Commun. 14, 1364 (2023).
PubMed PubMed Central CAS Google Scholar
de Visser, K. E. & Joyce, J. A. The evolving tumor microenvironment: from cancer initiation to metastatic outgrowth. Cancer Cell 41, 374–403 (2023).
PubMed Google Scholar
Warren, A. et al. Global computational alignment of tumor and cell line transcriptional profiles. Nat. Commun. 12, 22 (2021).
PubMed PubMed Central CAS Google Scholar
Liu, N. et al. Scalable, compressed phenotypic screening using pooled perturbations (PDAC screens). Gene Expression Omnibus https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE267243 (2024).
Tu, M. et al. TNF-α-producing macrophages determine subtype identity and prognosis via AP1 enhancer reprogramming in pancreatic cancer. Nat. Cancer 2, 1185–1203 (2021).
PubMed CAS Google Scholar
Kotliar, D. et al. Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-Seq. eLife 8, e43803 (2019).
PubMed PubMed Central Google Scholar
Moffitt, R. A. et al. Virtual microdissection identifies distinct tumor- and stroma-specific subtypes of pancreatic ductal adenocarcinoma. Nat. Genet. 47, 1168–1178 (2015).
PubMed PubMed Central CAS Google Scholar
Tirosh, I. et al. Single-cell RNA-seq supports a developmental hierarchy in human oligodendroglioma. Nature 539, 309–313 (2016).
PubMed PubMed Central Google Scholar
Schubert, M. et al. Perturbation-response genes reveal signaling footprints in cancer gene expression. Nat. Commun. 9, 20 (2018).
PubMed PubMed Central Google Scholar
Bunney, P. E., Zink, A. N., Holm, A. A., Billington, C. J. & Kotz, C. M. Orexin activation counteracts decreases in nonexercise activity thermogenesis (NEAT) caused by high-fat diet. Physiol. Behav. 176, 139–148 (2017).
PubMed PubMed Central CAS Google Scholar
Schjerven, H., Brandtzaeg, P. & Johansen, F.-E. Mechanism of IL-4-mediated up-regulation of the polymeric Ig receptor: role of STAT6 in cell type-specific delayed transcriptional response. J. Immunol. 165, 3898–3906 (2000).
PubMed CAS Google Scholar
Nasu, J. et al. Cytokine-stimulated release of decay-accelerating factor (DAF; CD55) from HT-29 human intestinal epithelial cells. Clin. Exp. Immunol. 113, 379–385 (1998).
PubMed PubMed Central CAS Google Scholar
Chandel, N. S., Trzyna, W. C., McClintock, D. S. & Schumacker, P. T. Role of oxidants in NF-κB activation and TNF-α gene transcription induced by hypoxia and endotoxin. J. Immunol. 165, 1013–1021 (2000).
PubMed CAS Google Scholar
Jia, B. et al. Wnt7A predicts poor prognosis and contributes to growth and metastasis in tongue squamous cell carcinoma. Oncol. Rep. 41, 1749–1758 (2019).
PubMed CAS Google Scholar
Harnack, C. et al. R-Spondin 3 promotes stem cell recovery and epithelial regeneration in the colon. Nat. Commun. 10, 4368 (2019).
PubMed PubMed Central Google Scholar
Weinstein, J. N. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
PubMed PubMed Central Google Scholar
Wu, Y. et al. IL-4 and IL-17A cooperatively promote hydrogen peroxide production, oxidative DNA damage, and upregulation of dual oxidase 2 in human colon and pancreatic cancer cells. J. Immunol. 203, 2532–2544 (2019).
PubMed PubMed Central CAS Google Scholar
Luan, H., Jian, L., Huang, Y., Guo, Y. & Zhou, L. Identification of novel therapeutic target and prognostic biomarker in matrix metalloproteinase gene family in pancreatic cancer. Sci. Rep. 13, 17211 (2023).
PubMed PubMed Central CAS Google Scholar
He, Z., Wu, H., Jiao, Y. & Zheng, J. Expression and prognostic value of CD97 and its ligand CD55 in pancreatic cancer. Oncol. Lett. 9, 793–797 (2015).
PubMed Google Scholar
Abdolvahab, M. H., Mofrad, M. R. K. & Schellekens, H. Interferon β: from molecular level to therapeutic effects. Int. Rev. Cell Mol. Biol. 326, 343–372 (2016).
CAS Google Scholar
Ciesielska, A., Matyjek, M. & Kwiatkowska, K. TLR4 and CD14 trafficking and its influence on LPS-induced pro-inflammatory signaling. Cell. Mol. Life Sci. 78, 1233–1261 (2021).
PubMed CAS Google Scholar
Valesini, G. et al. Biological and clinical effects of anti-TNFα treatment. Autoimmun. Rev. 7, 35–41 (2007).
PubMed CAS Google Scholar
Hu, X., li, J., Fu, M., Zhao, X. & Wang, W. The JAK/STAT signaling pathway: from bench to clinic. Signal Transduct. Target. Ther. 6, 402 (2021).
PubMed PubMed Central Google Scholar
Gong, J., Chehrazi-Raffle, A., Reddi, S. & Salgia, R. Development of PD-1 and PD-L1 inhibitors as a form of cancer immunotherapy: a comprehensive review of registration trials and future considerations. J. Immunother. Cancer 6, 8 (2018).
PubMed PubMed Central Google Scholar
Topalian, S. L. et al. Safety, activity, and immune correlates of anti-PD-1 antibody in cancer. N. Engl. J. Med. 366, 2443–2454 (2012).
PubMed PubMed Central CAS Google Scholar
Jin, J., Chang, Y. & Wei, W. Clinical application and evaluation of anti-TNF-α agents for the treatment of rheumatoid arthritis. Acta Pharmacol. Sin. 31, 1133–1140 (2010).
PubMed PubMed Central CAS Google Scholar
Tan, F. H., Putoczki, T. L., Stylli, S. S. & Luwor, R. B. Ponatinib: a novel multi-tyrosine kinase inhibitor against human malignancies. Onco Targets Ther. 12, 635–645 (2019).
PubMed PubMed Central CAS Google Scholar
Platanias, L. C. Mechanisms of type-I- and type-II-interferon-mediated signalling. Nat. Rev. Immunol. 5, 375–386 (2005).
PubMed CAS Google Scholar
Ajayi, S. et al. Ruxolitinib. In Small Molecules in Hematology (ed. Martens, U. M.) (Springer International Publishing, 2018).
Lamming, D. W. Inhibition of the mechanistic target of rapamycin (mTOR)—rapamycin and beyond. Cold Spring Harb. Perspect. Med. 6, a025924 (2016).
PubMed PubMed Central Google Scholar
Park, J.-W. et al. Destabilization of TNF-α mRNA by rapamycin. Biomol. Ther. 20, 43–49 (2012).
CAS Google Scholar
Saleiro, D. & Platanias, L. C. Intersection of mTOR and STAT signaling in immunity. Trends Immunol. 36, 21–29 (2015).
PubMed CAS Google Scholar
Tong, X. et al. Merimepodib, an IMPDH inhibitor, suppresses replication of Zika virus and other emerging viral pathogens. Antiviral Res. 149, 34–40 (2018).
PubMed CAS Google Scholar
Dumitru, C. D. et al. TNF-α induction by LPS is regulated posttranscriptionally via a Tpl2/ERK-dependent pathway. Cell 103, 1071–83 (2000).
PubMed CAS Google Scholar
Bruggen, T. V. D., Nijenhuis, S., Raaij, E. V., Verhoef, J. & Asbeck, B. S. V. Lipopolysaccharide-induced tumor necrosis factor α production by human monocytes involves the Raf-1/MEK1–MEK2/ERK1–ERK2 pathway. Infect. Immun. 67, 3824–3829 (1999).
PubMed PubMed Central Google Scholar
Blasina, A. et al. Breaching the DNA damage checkpoint via PF-00477736, a novel small-molecule inhibitor of checkpoint kinase 1. Mol. Cancer Ther. 7, 2394–2404 (2008).
PubMed CAS Google Scholar
O’Reilly, M. C. et al. Development of dual PLD1/2 and PLD2 selective inhibitors from a common 1,3,8-triazaspiro[4.5]decane core: discovery of Ml298 and Ml299 that decrease invasive migration in U87-MG glioblastoma cells. J. Med. Chem. 56, 2695–2699 (2013).
PubMed PubMed Central Google Scholar
Caldera, M. et al. Mapping the perturbome network of cellular perturbations. Nat. Commun. 10, 5140 (2019).
PubMed PubMed Central Google Scholar
Butler, A., Darby, C., Hao, Y., Hoffman, P. & Satija, R. Azimuth: a Shiny app demonstrating a query-reference mapping algorithm for single-cell data. GitHub https://github.com/satijalab/azimuth (2022).

Download references

Acknowledgements

This work was supported in part by the Koch Institute (support grant 5P30-CA014051 from the National Cancer Institute). We thank the Koch Institute’s Robert A. Swanson (1969) Biotechnology Center for technical support, specifically the Genomics Core Facility and High-Throughput Sciences Facility. A.K.S. was supported, in part, by the National Institutes of Health (NIH) National Institute on Drug Abuse DP1 Avant-Garde Pioneer Award (1DP1DA053731), the Searle Scholars Program, the Sloan Fellowship in Chemistry, the Bill and Melinda Gates Foundation, the John W. Jarve Seed Fund for Science and the Ragon Institute. B.B. was supported by the NIH (R01A1022553 and R01AR073252). S.R. was supported by the Claudia Adams Barr Program for Innovative Basic Cancer Research, the Dana-Farber Cancer Institute Hale Family Center for Pancreatic Cancer Research and the NIH (K08CA260442). W.C.H. and S.R. were supported by the NIH (U01 CA250549). W.E.K. was supported by the King Abdulaziz City for Science and Technology–MIT Ibn Khaldun Fellowship for Saudi Arabian Women and the King Abdullah University of Science and Technology Ibn Rushd Postdoctoral Fellowship. T.C. was supported by the Croucher Scholarship for Doctoral Study. S.T. was supported in part by a Pew Latin American Fellowship Program in Biomedical Sciences.

Author information

These authors contributed equally: Nuo Liu, Walaa E. Kattan, Benjamin E. Mead, Conner Kummerlowe.
These authors jointly supervised this work: Peter S. Winter, Srivatsan Raghavan, Alex K. Shalek.

Authors and Affiliations

Institute for Medical Engineering and Science (IMES) and Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
Nuo Liu, Walaa E. Kattan, Benjamin E. Mead, Conner Kummerlowe, Thomas Cheng, Sarah Ingabire, Sergio Triana, Manuel Guzman, Tyler T. Dao & Alex K. Shalek
Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
Nuo Liu, Walaa E. Kattan, Benjamin E. Mead, Conner Kummerlowe, Thomas Cheng, Sarah Ingabire, Jaime H. Cheah, Christian K. Soule, Sergio Triana, Manuel Guzman, Tyler T. Dao, Paul C. Blainey & Alex K. Shalek
Broad Institute of MIT and Harvard, Cambridge, MA, USA
Nuo Liu, Walaa E. Kattan, Benjamin E. Mead, Conner Kummerlowe, Thomas Cheng, Sarah Ingabire, Sergio Triana, Manuel Guzman, Tyler T. Dao, Kristen E. Lowder, Paul C. Blainey, William C. Hahn, Peter S. Winter, Srivatsan Raghavan & Alex K. Shalek
Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, USA
Nuo Liu, Walaa E. Kattan, Benjamin E. Mead, Conner Kummerlowe, Thomas Cheng, Sarah Ingabire, Sergio Triana, Manuel Guzman, Tyler T. Dao, Joshua M. Peters, Bryan Bryson & Alex K. Shalek
Program in Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
Nuo Liu, Conner Kummerlowe & Alex K. Shalek
Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Jaime H. Cheah, Christian K. Soule, Anita Vrcic & Jane K. McIninch
Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Tyler T. Dao, Joshua M. Peters, Paul C. Blainey & Bryan Bryson
Dana Farber Cancer Institute, Boston, MA, USA
Kristen E. Lowder, William C. Hahn & Srivatsan Raghavan
Microsoft Research, Cambridge, MA, USA
Lorin Crawford & Ava P. Amini
Center for Computational Biology, Brown University, Providence, RI, USA
Lorin Crawford
Department of Biostatistics, Brown University, Providence, RI, USA
Lorin Crawford
Harvard Medical School, Boston, MA, USA
William C. Hahn & Srivatsan Raghavan
Faculty of Computing and Data Sciences, Boston University, Boston, MA, USA
Brian Cleary
Department of Biomedical Engineering, Boston University, Boston, MA, USA
Brian Cleary
Department of Biology, Boston University, Boston, MA, USA
Brian Cleary
Program in Immunology, Harvard Medical School, Boston, MA, USA
Alex K. Shalek
Harvard Stem Cell Institute, Cambridge, MA, USA
Alex K. Shalek

Authors

Nuo Liu
View author publications
Search author on:PubMed Google Scholar
Walaa E. Kattan
View author publications
Search author on:PubMed Google Scholar
Benjamin E. Mead
View author publications
Search author on:PubMed Google Scholar
Conner Kummerlowe
View author publications
Search author on:PubMed Google Scholar
Thomas Cheng
View author publications
Search author on:PubMed Google Scholar
Sarah Ingabire
View author publications
Search author on:PubMed Google Scholar
Jaime H. Cheah
View author publications
Search author on:PubMed Google Scholar
Christian K. Soule
View author publications
Search author on:PubMed Google Scholar
Anita Vrcic
View author publications
Search author on:PubMed Google Scholar
Jane K. McIninch
View author publications
Search author on:PubMed Google Scholar
Sergio Triana
View author publications
Search author on:PubMed Google Scholar
Manuel Guzman
View author publications
Search author on:PubMed Google Scholar
Tyler T. Dao
View author publications
Search author on:PubMed Google Scholar
Joshua M. Peters
View author publications
Search author on:PubMed Google Scholar
Kristen E. Lowder
View author publications
Search author on:PubMed Google Scholar
Lorin Crawford
View author publications
Search author on:PubMed Google Scholar
Ava P. Amini
View author publications
Search author on:PubMed Google Scholar
Paul C. Blainey
View author publications
Search author on:PubMed Google Scholar
William C. Hahn
View author publications
Search author on:PubMed Google Scholar
Brian Cleary
View author publications
Search author on:PubMed Google Scholar
Bryan Bryson
View author publications
Search author on:PubMed Google Scholar
Peter S. Winter
View author publications
Search author on:PubMed Google Scholar
Srivatsan Raghavan
View author publications
Search author on:PubMed Google Scholar
Alex K. Shalek
View author publications
Search author on:PubMed Google Scholar

Contributions

Conceptualization, B.E.M., C.K. and A.K.S. Methodology, B.E.M., C.K., N.L., W.E.K., T.C., J.H.C., C.K.S., J.P., B.B., P.S.W., S.R. and A.K.S. Software, B.E.M., C.K., N.L. and T.C. Validation, B.E.M., C.K., N.L., W.E.K., P.S.W. and S.R. Formal analysis, B.E.M., C.K., N.L. and T.C. Investigation, B.E.M., C.K., W.E.K., N.L., J.H.C., C.K.S., S.I., A.V., J.M., S.T., M.G. and T.T.D. Resources, J.P., P.C.B., B.B., P.S.W., S.R. and A.K.S. Data curation, B.E.M., C.K., N.L. and T.C. Writing—original draft, B.E.M., C.K., W.E.K. and N.L. Writing—review and editing, N.L., W.E.K., B.E.M., C.K., B.C., A.P.A., L.C., P.S.W., S.R. and A.K.S. Visualization, B.E.M., C.K., N.L. and W.E.K. Supervision, P.S.W., S.R. and A.K.S. Project administration, B.E.M., C.K., N.L., W.E.K., P.S.W., S.R. and A.K.S. Funding acquisition, W.C.H., P.C.B., B.B., S.R. and A.K.S.

Corresponding author

Correspondence to Alex K. Shalek.

Ethics declarations

Competing interests

A.K.S. reports compensation for consulting and/or scientific advisory board (SAB) membership from Honeycomb Biotechnologies, Cellarity, Ochre Bio, Bio-Rad Laboratories, Relation Therapeutics, IntrECate biotherapeutics, Fog Pharma, Passkey Therapeutics and Dahlia Biosciences unrelated to this work. B.E.M. reports compensation for consulting from Empress Therapeutics unrelated to this work. S.R. holds equity in Amgen and receives research funding from Microsoft. P.S.W. receives research funding from Microsoft and reports compensation for consulting from The Engine Venture unrelated to this work. P.C.B. is a consultant to or holds equity in 10X Genomics, General Automation Lab Technologies/Isolation Bio, Celsius Therapeutics, Next Gen Diagnostics, Cache DNA, Concerto Biosciences, Stately, Ramona Optics and Bifrost. W.C.H. is a consultant for Thermo Fisher, Solasta Ventures, KSQ Therapeutics, Jubilant Therapeutics, RAPPTA Therapeutics, Function Oncology, Riva Therapeutics, Serinus Biosciences, Crane Biosciences, Frontier Medicines, Kestrel Therapeutics and Calyx. W.H. reports SAB membership from Kestrel Therapeutics. L.C. and A.P.A. are both employees of Microsoft Research and hold equities in Microsoft. The other authors declare no competing interests.

Peer review

Peer review information

Nature Biotechnology thanks Xin Jin and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Developing compressed screening by testing 316 small molecules in the U2OS cell line with a Cell Painting readout.

a. Histogram of the log₁₀ Mahalanobis Distance (MD) values calculated for each small molecule perturbation relative to the mean of the negative controls (DMSO) over morphological features quantified by the Cell Paiting assay at 6 hours, 24 hours, and 28 hours. For each time point, the coefficient of variation (std./mean) of the log₁₀MD is reported. b. Histogram of log₁₀MD values calculated for each small molecule perturbation relative to the mean of the negative controls (DMSO) over morphological features quantified by the Cell Paiting assay at the 24-hour timepoint at three drug doses: 0.1, 1, and 10 µM. For each dose, the coefficient of variation (std./mean) of the log₁₀MD is reported. c. Histogram of cell number distribution across 192 DMSO control wells for all imaging plates. d. Histogram of cell number distribution in control (right) and perturbation (left) wells, stratified by drug dosage in the GT. e. Violin plot of cell number for perturbations associated each ground truth (GT) phenotypic cluster, plotted by GT phenotypic cluster. f. Scatterplot of non-zero enrichment scores from one-sided permutation test for each perturbation in each GT phenotype. g. Representative Cell Painting images from the morphological phenotype clusters identified in the GT screen. Cells were stained with dyes marking nuclei (Hoechst 33342), endoplasmic reticula (Concanavalin A-AlexaFluor 488), mitochondria (MitoTracker Deep Red), F-actin (Phalloidin-AlexaFluor 568), Golgi apparati and plasma membranes (Wheat Germ Agglutinin-AlexaFluor 594), and nucleoli/cytoplasmic RNA (SYTO14). Diverse phenotypes were observed in compounded-treated U2OS cells, such as: altered nuclei shape and size (Cluster 7); redistribution of ER to one side of nucleus (Cluster 6); mitochondrial puncta (Clyster 8); and, bright, abundant Golgi staining (Cluster 2). Scale bars: 200 μm. h. Bar plot of fingerprint drug categories and their representation in each cluster. i. Representative images of drugs (anti-neoplastic agents) associated with phenotypic cluster 7 with a range of effect sizes (MDs) in the GT screen. The extent of variation in staining of DNA (Hoechst 33342) and nucleolus and cellular RNA (SYTO14) correlate with MD. Scale bars: 200 μm.

Extended Data Fig. 2 Additional analytical results from the CS Cell Painting benchmark.

a. Bar graph of top and bottom 10 perturbations in GT screen by mean cell number across n = 6 replicate wells with that perturbation. Error bar indicates 95% confidence interval. b. Histogram of cell number distribution in perturbation wells, stratified by extent of compression. Coefficient of variance (CV; std./mean) = 0.088, R² = 0.037 amongst compressed run mean cell counts; CV = 0.22, R² = 0.048 amongst uncompressed and compressed run mean cell count. c. Histogram of cell number distribution in GT perturbation wells stratified by whether the perturbation was identified as a hit in the compressed screen. d. Comparison of different statistical methods to evaluate the correlation of effect sizes between GT and CS. Error bar indicates 95% confidence interval. e. Scatterplot of correlation between effect sizes (L1 norms of regression coefficients) in two CS runs (permutation test, p-value < 0.01. f. Using GT screen significant perturbations as ground truth, ROC curves (true positive rate vs. false positive rate) show the performance of the CS screens in run 2 for correctly identifying hits at each pool size while varying the permutation testing threshold (from 0 to 1 in steps of 0.01) in the deconvolution algorithm.

Extended Data Fig. 3 PDAC compressed screen scRNA-seq quality metrics and cNMF modules.

a. Scatter plot of the number of cells per perturbation across all pools in each replicate plate. b. Violin plots of the number of UMIs, the number of unique genes, and the percent of genes that are mitochondrial in the compressed screen scRNA-seq datasets. c. Heatmaps visualizing the Pearson correlation across cells for their usage of cNMF gene expression programs. Bolded GEPs are selected as highly variable and annotated as response GEPs. d. Heatmaps visualizing the Pearson correlation across cells for their module scores of cNMF gene expression programs and their module scores for existing gene signatures. e. Top three genes by gene spectra score for the highly variable cNMF modules. f. Violin plot of IL4I1 expression in macrophage subtypes in the Raghavan et al. dataset. Two-sided Mann-Whitney U test without correction was performed to test score differences (Methods). All the box plots indicate 25^th percentile at the bottom, median in the middle and 75^th percentile at the top. Whiskers are drawn to the farthest datapoints within 1.5* interquartile range from the nearest hinge. Sample sizes: C1Q1⁺ TAM n = 2063, FCN1⁺ TAM n = 1963, SPP1⁺ TAM n = 567. g. UMAP visualizations of all cells from both compressed screens, colored by cNMF module usage. h. UMAP visualizations of all cells from both compressed screens, colored by density of cells from pools containing specific ligands. i. Top: Scatter plot of mean cognate receptor(s) expression (log-normalized) for each screened ligand over control PDAC organoid cells in the compressed scRNA-seq dataset. Bottom: Scatter plot of mean cognate receptor(s) expression (log-normalized) for each screened ligand over primary tumor cells from the matched patient in the Raghavan et al. study. Ligands are ordered by mean expression of receptor(s) in each dataset and ligands with significant (p value< 0.01, one-sided permutation test without correction) effects on identified cNMF GEPs are colored in orange.

Extended Data Fig. 4 Single-ligand perturbation experiment scRNA-seq quality metrics and cNMF modules.

a. Violin plots of the number of UMIs, the number of unique genes, and the percent of mitochondrial genes in the single-ligand scRNA-seq dataset. b. Heatmaps visualizing the Pearson correlation across cells for their module scores of single-ligand cNMF gene expression programs and their scores for existing gene signatures. c. Heatmap of the top three genes by gene spectra score for the single-ligand cNMF modules that corresponded (correlated highly) with the highly variable compressed cNMF modules. d. Heatmap visualizing the Pearson correlation between compressed screen cNMF GEPs gene spectra scores (rows) and single-ligand cNMF GEPs gene scores (columns). e. Ranking of top 50 genes in compressed screen (CS) PDAC IL-4/IL-13 response GEP (top) and single-ligand PDAC IL-4/IL-13 response GEP (bottom) by gene loading values from respective cNMF runs. Genes common to both are highlighted in orange. f. Heatmap depicting regression coefficients for the single-ligand validation data using the single-ligand screen cNMF GEPs as response variables. g. Heatmap depicting regression coefficients for the single-ligand validation data using the CS cNMF GEPs as response variables instead of the single-ligand experiment cNMF GEPs.

Extended Data Fig. 5 PDAC organoid specific IL-4/IL-13 response module is context specific and unique from other IL-4/IL-13 signatures.

a. Left: Heatmap showing the Pearson correlation of the expression of the Moffitt classical and basal transcriptional signatures with the expression of each CS cNMF module over bulk PDAC RNA-seq data from TCGA. Right: Heatmap of the Pearson correlation of the Moffitt classical and basal transcriptional signatures with the expression of each CS cNMF module over malignant PDAC single cells from Raghavan et al. b. Scatterplots showing the correlation between the Moffitt classical score and our PDAC IL-4/IL-13 IL-4/IL-13 response GEP (top) or three IL-4/IL-13 transcriptional signatures from MsigDB as computed over bulk RNA-seq data from PDAC tumors available in TCGA. c. Scatterplots showing the correlation between the Moffitt classical score and our PDAC IL-4/IL-13 response GEP (top) or three IL-4/IL-13 transcriptional signatures from MsigDB as computed across malignant PDAC cells from Raghavan et al. d. Scatterplots showing the correlation between the Moffitt basal score and our PDAC IL-4/IL-13 response GEP (top) or three IL-4/IL-13 transcriptional signatures from MsigDB as computed over bulk RNA-seq data from PDAC tumors available in TCGA. e. Scatterplots showing the correlation between the Moffitt basal score and our PDAC IL-4/IL-13 response GEP (top) or three IL-4/IL-13 transcriptional signatures from MsigDB as computed across malignant PDAC cells from Raghavan et al. f. Venn diagrams for the overlap between single-ligand validation experiment cNMF GEP module genes and genes from existing signatures in the literature. g. Kaplan–Meier survival plots of a pancreatic cancer patient cohort (TCGA PAAD, n = 182, bulk RNA-seq expression data) based on single-ligand IFNγ GEP and TGF-β GEP (left column) compared with their literature counterparts (right column). h. Integrated low-dimensional embedding (UMAP) of the scRNA-seq data from the IL-4 single-ligand perturbation experiment. Color scale indicates the density of the ligand treatment. i. Top 25 differentially expressed genes between IL-4 and DMSO-only treated groups (two-sided Wilcoxon test, BH adjusted p-value < 0.01). j. Overrepresentation Analysis (ORA) of all significant (two-sided Wilcoxon test, BH adjusted p-value < 0.05) differentially expressed genes in IL-4 vs control cells using MsigDB Hallmark dataset. For the ORA, one-sided Fisher’s exact test is computed and FDR correction is applied to the p-value.

Extended Data Fig. 6 Modulation effect of 90 MOA compounds on LPS or IFNβ response across PBMC cell types.

a. Potentiation or interference strength of 90 compounds on cell type specific GEPs, including non-stimulant response GEPs, derived from cNMF runs using data from LPS+control screens (top) or IFNβ+control screens (bottom).

Extended Data Fig. 7 Evaluation of the effects of MOA compounds on IFNβ responses in PBMCs.

a. Line plots of the number of detected positive/negative IFNβ response potentiators and interferers per cell type as a function of the p-value cut-off used in one-sided permutation tests for the significance of regression coefficients. b. Top: Ponatinib induces multiple different effects on IFNβ response modules across cell types. Bottom: Evaluation of genes comprising IFNβ JAK-STAT related response GEPs. ‘IFNβ JAK-STAT common genes’ are the intersection of selected top genes from B cell GEP3, CD4 T cell GEP3, and CD8 T cell GEP1. ‘B cell GEP3 unique genes’ are the genes unique to the selected top genes from B cell GEP3. All scores are calculated using scanpy ‘score_genes’ function and normalized to median of DMSO + IFNβ group. Two-sided Mann-Whitney U test was performed to test score differences (Methods). All the box plots indicate 25^th percentile at the bottom, median in the middle and 75^th percentile at the top. Whiskers are drawn to the farthest datapoints within 1.5* interquartile range from the nearest hinge. Sample sizes in each stimulation condition: DMSO n = 198 (B cell), n = 1,730 (CD4) n = 201 (CD8); DMSO + IFNβ n = 174 (B cell), n = 1,276 (CD4), n = 87 (CD8); Ponatinib+IFNβ n = 119 (B cell), n = 1,145 (CD4), n = 66 (CD8). c. Annotation of B cell cNMF GEP modules from IFNβ + control condition. d. Annotation of CD4 T cell cNMF GEP modules from IFNβ + control condition seen in Fig. 4f. GEP3 highly correlates with JAK-STAT pathway signature. e. Annotation of CD8 T cell cNMF GEP modules from IFNβ + control condition.

Extended Data Fig. 8 Evaluation of the effects of MOA compounds on IFNβ responses in PBMCs (continued).

a. Bar plot of compounds with the strongest effects on IFNβ response in each cell type. b. Annotation of monocyte cNMF GEP modules from IFNβ + control condition.

Extended Data Fig. 9 Evaluation of the effects of MOA compounds on LPS responses in PBMCs.

a. Line plots of the number of detected positive/negative LPS response potentiators and interferers per cell type as a function of the p-value cut-off used in permutation tests for the significance of regression coefficients. b. Merimepodib exerts multiple different effects on LPS response modules across cell types. Two-sided Mann-Whitney U test without multiple hypothesis correction was performed to test score differences (Methods). All the box plots indicate 25^th percentile at the bottom, median in the middle and 75^th percentile at the top. Whiskers are drawn to the farthest datapoints within 1.5* interquartile range from the nearest hinge. Sample sizes in each stimulation condition: DMSO n = 198 (B), n = 1730 (CD4), n = 201 (CD8), n = 396 (monocyte); DMSO + LPS n = 507 (B), n = 3,259 (CD4), n = 61(CD8), n = 1,395 (monocyte); Merimepodib+LPS n = 233 (B), n = 2,865 (CD4), n = 375 (CD8), n = 165 (monocyte). c. Annotation of monocyte cNMF GEP modules from LPS+control condition. d. Annotation of B cell GEPs from LPS+control condition. GEP3 highly correlates with amine biosynthesis/metabolic processes and TNF pathway signatures. e. Annotation of CD8 T cell GEPs from LPS+control condition. GEP3 highly correlates with JAK-STAT pathway signature. f. Annotation of CD4 T cell cNMF GEP modules from LPS+control condition.

Extended Data Fig. 10 Evaluation of the effects of MOA compounds on LPS responses in PBMCs (continued).

a. Annotation of NK cell cNMF GEP modules from LPS+control condition. b. Bar plot of compounds with the strongest effects on LPS response in each cell type. c. Divergent effects on LPS GEP7 in monocytes by two pyruvate dehydrogenase kinase inhibitors where BX-912 is a positive potentiator and AZD7545 is a positive interferer. Two-sided Mann-Whitney U test without correction was performed to test score differences (Methods). All the box plots indicate 25^th percentile at the bottom, median in the middle and 75^th percentile at the top. Whiskers are drawn to the farthest datapoints within 1.5* interquartile range from the nearest hinge. d. ML298 Hydrochloride as a representative compound with pleiotropic effects across cell types in both IFNβ (right) and LPS (left) stimulation conditions. Module scores were normalized against median in DMSO+stimulation condition in each comparison. Two-sided Mann-Whitney U test without multiple hypothesis correction was performed to test score differences (Methods). All the box plots indicate 25th percentile at the bottom, median in the middle and 75th percentile at the top. Whiskers are drawn to the farthest datapoints within 1.5* interquartile range from the nearest hinge. e. Ruxolitinib as a positive interferer of LPS-GEP3 in CD8 T cells. Two-sided Mann-Whitney U test without correction was performed to test score differences (Methods). All the box plots indicate 25^th percentile at the bottom, median in the middle and 75^th percentile at the top. Whiskers are drawn to the farthest datapoints within 1.5* interquartile range from the nearest hinge. f. Evaluation of CS LPS response module (B cell GEP3) in Ruxolitinib-only perturbation. Two-sided Mann-Whitney U test without correction was performed to test score differences (Methods). All the box plots indicate 25^th percentile at the bottom, median in the middle and 75^th percentile at the top. Whiskers are drawn to the farthest datapoints within 1.5* interquartile range from the nearest hinge. Despite a statistically insignificant difference in monocytes between DMSO-only and DMSO + LPS treated groups, the large shift between the module scores warrants the inclusion of Ruxolitinib’s effect as an example of a positive potentiator in this cell type.

Supplementary information

Supplementary Information

Key resources table, supplementary table legends, Supplementary Figs. 1–3, supplementary figure legends and Supplementary Methods.

Reporting Summary

Supplementary Table 1

Supplementary Tables 1–8 in a single worksheet.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Liu, N., Kattan, W.E., Mead, B.E. et al. Scalable, compressed phenotypic screening using pooled perturbations. Nat Biotechnol (2024). https://doi.org/10.1038/s41587-024-02403-z

Download citation

Received: 06 January 2023
Accepted: 26 August 2024
Published: 07 October 2024
DOI: https://doi.org/10.1038/s41587-024-02403-z

This article is cited by

Adapting systems biology to address the complexity of human disease in the single-cell era
- David S. Fischer
- Martin A. Villanueva
- Alex K. Shalek
Nature Reviews Genetics (2025)
New approaches to secondary metabolite discovery from anaerobic gut microbes
- Lazarina V. Butkovich
- Oliver B. Vining
- Michelle A. O’Malley
Applied Microbiology and Biotechnology (2025)