NiCo identifies extrinsic drivers of cell state modulation by niche covariation analysis

Agrawal, Ankit; Thomann, Stefan; Basu, Sukanya; Grün, Dominic

doi:10.1038/s41467-024-54973-w

Download PDF

Article
Open access
Published: 05 December 2024

NiCo identifies extrinsic drivers of cell state modulation by niche covariation analysis

Nature Communications volume 15, Article number: 10628 (2024) Cite this article

11k Accesses
3 Citations
51 Altmetric
Metrics details

Subjects

Abstract

Cell states are modulated by intrinsic driving forces such as gene expression noise and extrinsic signals from the tissue microenvironment. The distinction between intrinsic and extrinsic cell state determinants is essential for understanding the regulation of cell fate in tissues during development, homeostasis and disease. The rapidly growing availability of single-cell resolution spatial transcriptomics makes it possible to meet this challenge. However, available computational methods to infer topological tissue domains, spatially variable genes, or ligand-receptor interactions are limited in their capacity to capture cell state changes driven by crosstalk between individual cell types within the same niche. We present NiCo, a computational framework for integrating single-cell resolution spatial transcriptomics with matched single-cell RNA-sequencing reference data to infer the influence of the spatial niche on the cell state. By applying NiCo to mouse embryogenesis, adult small intestine and liver data, we demonstrate the ability to predict novel niche interactions that govern cell state variation underlying tissue development and homeostasis. In particular, NiCo predicts a feedback mechanism between Kupffer cells and neighboring stellate cells dampening stellate cell activation in the normal liver. NiCo provides a powerful tool to elucidate tissue architecture and to identify drivers of cellular states in local niches.

Identification and characterization of cell niches in tissue from spatial omics data at single-cell resolution

Article Open access 16 February 2025

Quantitative characterization of cell niches in spatially resolved omics data

Article Open access 18 March 2025

Unraveling cell–cell communication with NicheNet by inferring active ligands from transcriptomics data

Article 04 March 2025

Introduction

Cellular states are marked by distinct molecular configurations which determine fate and function^1,2, and are intricately regulated by a complex interplay of myriad molecules, interacting stochastically, only biased by their affinities. Despite this complexity, cell states exhibit remarkable robustness ensuring reproducible cell fate decision and differentiation. Within tissues, cell state variability can be dissected into intrinsic and extrinsic determinants.

For instance, stochastic binding of transcription factors to regulatory sites may lead to strong fluctuations of mRNA molecules across cells within the same state. This transcriptional bursting represents a cell-intrinsic source of variability and could even be actively modulated and exploited for the regulation of developmental processes³. Extrinsic molecular determinants, on the other hand, arise from cell-cell communication in tissues, encompassing molecular signaling, physical interactions, and competition for metabolites. Cell-cell communication is pivotal for physiological processes such as development, homeostasis, disease progression, and regeneration. Deciphering the gene programs orchestrating spatiotemporal communication between cells within tissues remains an open challenge for single-cell biology⁴.

The ability to dissect intrinsic and extrinsic determinants of a cell state would allow us to acquire a deeper understanding of cell state regulation within tissues. Although single-cell RNA-sequencing (scRNA-seq) permits the characterization of cell state heterogeneity and gene expression noise^5,6,7, and the prediction of ligand-receptor interactions^8,9, spatial context is lacking, and a distinction between intrinsic and extrinsic drivers of cell state changes is not possible.

Over the last decade, spatially resolved high-resolution transcriptomics technologies have rapidly advanced¹⁰. Following the development of sequencing-based spatial transcriptomics with a resolution of ~100 µm¹¹, several improved methods have been introduced increasing the resolution to the sub-micrometer range (e.g., Stereo-seq¹², Seq-Scope¹³). However, owing to the low sensitivity of transcript detection, pixels typically need to be aggregated into larger bins, thereby sacrificing single-cell resolution¹⁰. As a complementary approach, highly-multiplexed single-molecule FISH (smFISH), pioneered by MERFISH¹⁴ and seqFISH¹⁵, or in situ sequencing^16,17,18 rely on imaging-based quantification of individual mRNA molecules in tissue sections, typically for a few hundred genes. Crucially, the assignment of single mRNA molecules to specific cells is a prerequisite for measuring covariation of gene expression in neighboring cells to understand extrinsic divers of cell state changes, and can currently only be achieved by imaging-based methods allowing cell segmentation. Availability of commercial solutions for multiplexed smFISH and in situ sequencing is rapidly growing enabling the broader community to explore fundamental aspects of cell state regulation in tissues.

Extrinsic determinants of the transcriptional state of a cell originating from the microenvironment can be revealed by gene expression covariance in neighboring cells. We developed NiCo with the goal to infer Niche Covariation of gene expression programs at cell type resolution on a transcriptome-wide scale by integrating imaging-based spatial transcriptomics with matched scRNA-seq reference data. Available computational approaches for the analysis of spatial transcriptomics are focused on the inference of spatially variable genes (e.g., SpatialDE¹⁹), on spatial modeling of ligand-receptor interactions (e.g., SpaOTsc²⁰ and COMMOT²¹) or local gene-gene dependencies (e.g., MISTy²² or GCNG²³), or on explaining intra-cell-type variance by niche composition (NCEM)²⁴. These methods are typically restricted to the set of genes directly measured in the spatial modality, and none of these methods was designed towards inferring cell state dependencies of co-localized cell types. To overcome these limitations NiCo integrates sensitive genome-wide quantification of gene expression by scRNA-seq data with single-cell resolution spatial transcriptomics. NiCo does not rely on mapping individual cells from the scRNA-seq data into space or vice versa, which is computationally expensive and inherently noisy. Instead, NiCo derives interpretable latent factors reflecting cell state variability of each cell type captured within both scRNA-seq reference and spatial data. These latent factors are then used to infer covariation of gene programs in neighboring cell types from the spatial modality and to associate these factors with transcriptome-wide expression patterns by utilizing the scRNA-seq data. We demonstrate that this approach has the capacity to recover well-known functional niche interactions in the developing mouse embryo, as well as in mouse intestine and liver. NiCo provides novel insights into the cell states involved in these interactions and their spatial configuration. NiCo complements existing methods specialized for spatial ligand-receptor interaction analysis to infer cell state covariation in local niches and illuminates the consequences of local niche interactions for cell state control.

Results

Inferring covariation of gene programs in local tissue niches at single-cell resolution with NiCo

NiCo integrates imaging-based spatial transcriptomics data, obtained by multiplexed smFISH or in situ sequencing technologies such as MERFISH¹⁴, seqFISH¹⁵, or STARmap¹⁸, and commercial implementations of these approaches (e.g., Vizgen MERSCOPE, 10x Xenium, Nanostring CosMx) which typically measure targeted libraries of up to 500 genes, with scRNA-seq reference data that provide genome-wide coverage but lack spatial information. NiCo requires a cell-by-gene count matrix and two-dimensional cell-center coordinates from imaging-based spatial transcriptomics data that have undergone cell segmentation, as well as a cell-by-gene count matrix of a scRNA-seq reference dataset along with cell type labels comprising all cell types expected to occur in the spatial data.

NiCo was designed as a multi-step pipeline, facilitating (1) cell type annotation of the spatial modality by label transfer from the scRNA-seq reference data, (2) recovery of niche architecture for each cell type, (3) inference of covarying latent factors in co-localized cell types, and association of these latent factors with functional pathways and molecular signaling interactions (Fig. 1).

**Fig. 1: Exploring cell state covariation in local niches with NiCo.**

NiCo employs an iterative approach to perform the cell type annotations (Fig. 1, “Annotations”). First, mutual nearest neighbors are identified in the spatial and scRNA-seq data modalities after normalization to eliminate technical variability. After pruning scattered anchors utilizing Leiden clusters of the spatial modality, non-anchors are iteratively annotated based on anchors among their k-nearest neighbors (Methods).

After annotating cell types in the spatial ___domain, NiCo interrogates niche architecture for each “central” cell type (CC) in order to identify neighboring niche cell types with a high predictive capacity for the central cell type identity. NiCo trains a regularized logistic regression classifier to predict the identity of the central cell type from the normalized cell type frequencies in local niche neighborhoods (Methods). The regression coefficients of all niche cell types permit prioritization of potential interaction partners within local neighborhoods (Fig. 1, “Interactions”).

As a first step towards inferring covariation of gene programs within locally interacting cell types, NiCo has implemented different versions of non-negative matrix factorization (NMF). For each cell type, NiCo infers latent factors by integrative NMF^25,26 to capture common variability in the spatial and the scRNA-seq data modality on the shared subset of genes. Alternatively, if gene expression variability in the spatial modality is dominated by technical artefacts, e.g., due to imperfect segmentation giving rise the “spill-over” between cell types, NiCo offers conventional NMF on the shared set of genes only for the scRNA-seq modality and infers the cell loadings of these factors in the spatial modality. The output of this step is a set of latent factors capturing the cell state variability of each cell type based on the shared set of genes between both modalities (Fig. 1, “Covariations”).

Finally, NiCo performs ridge regression of each latent factor of the central cell on the latent factors of all predictive niche cell types obtained by step 2 (“Interactions”). Significant regression coefficients indicate either positive or negative covariation between the respective factors (Fig. 1, “Covariations”). Functional gene modules can be associated with covarying factors by performing pathway enrichment analysis on the top (anti-)correlating genes for each factor obtained from the transcriptome-wide scRNA-seq data. In addition, these genes can be interrogated for covarying ligands and receptors as potential mediators of this crosstalk.

We note that NiCo can also be applied to genome-wide sequencing- or imaging-based spatial transcriptomics data without requiring scRNA-seq reference data. In this case, the “Interactions” and “Covariations” modules can directly be run on annotated spatial data. This multifaceted analysis strategy overcomes limitations of available methods and permits identification of covarying gene programs in local niches at single cell type resolution.

NiCo enables accurate and scalable cell type annotation of imaging-based spatial transcriptomics data

Cell type annotation of imaging-based spatial transcriptomics data on limited subsets of genes is a challenging problem. Available computational methods addressing these tasks were primarily optimized for spot-deconvolution of sequencing-based spatial transcriptomics. Nonetheless, state-of-the-art methods such as cell2location²⁷, Tangram²⁸, TACCO²⁹, and uniPort³⁰, can be applied to imaging-based data that have undergone cell segmentation. In a rigorous benchmarking analysis, we conducted a comprehensive comparison of NiCo annotations against annotations generated by Tangram, uniPort, TACCO and cell2location on published datasets, comprising mouse small intestinal MERFISH data³¹, primary motor cortex MERFISH data³², and mouse embryo seqFISH data³³ (Supplementary Fig. 1a). Considering the authors’ annotation as ground truths, we evaluated the performance of these methods with well-established scoring metrics, i.e., adjusted Rand index and Jaccard similarity (Methods). Overall, NiCo outperformed Tangram and uniPort on all three datasets and exhibited higher ground truth consistency on Primary Motor cortex and the embryo data than TACCO (Fig. 2a). On the intestinal data, NiCo yielded a higher Jaccard similarity but a lower adjusted Rand index than TACCO. Cell2location exhibited higher overlap with the ground truths on the intestinal dataset, and showed comparable performance on the Primary Motor Cortex data, while NiCo was more consistent with the ground truths for the embryo data (Fig. 2a). However, on large datasets such as the MERSCOPE liver dataset with ~400k cells (Data Availability), cell2location took more than two days to run, while NiCo finished in less than three hours on an AMD Ryzen 9, 5950X, 16-core processor with 128 GiB memory (Fig. 2b). Therefore, NiCo enables scalable reference-based cell type annotation of imaging-based spatial transcriptomics data on a standard laptop.

**Fig. 2: Benchmarking NiCo annotation and interaction inference.**

NiCo identifies predictive niche interactions

To test NiCo’s ability to predict niche interactions, we simulated spatial architectures of tissues with six cell types. To simulate preferential niche interactions between these cell types, we modeled mutual affinities by specifying Lennard-Jones potentials of varying attractive force ε between specific pairs of cell types (Methods). The simulated particle configurations thus reflect preferential niche interactions, which can be ranked by the strengths of their relative attractive forces. We assessed NiCo’s regularized logistic regression classifier on these simulated data. Inspection of the confusion matrix, comparing predicted and ground truths cell types for two scenarios, demonstrates NiCo’s capacity to predict cell type identities based on niche composition as a function of the simulated interaction strength (Fig. 2c–e). Magnitude and signs of the regression coefficients were consistent with the simulated interactions, confirming NiCo’s capability for niche prediction. In the first scenario (Fig. 2c), NiCo predicted the strongest interaction for the central cell type T3 with T5 (ε = 5), and for central cell type T0, NiCo predicted the strongest interaction with T2 (ε = 3). Testing a more complex configuration (Fig. 2d), NiCo predicted the strongest interaction for the central cell type T3 with T2 (ε = 10), the second-strongest interaction with T1 (ε = 8), and the third-strongest with T5 (ε = 5). Similar conclusions hold true for the other cell types. For a larger neighborhood radius (R = 5, Fig. 2e), the relative rankings are not always mirrored by the inferred predictors due to stronger second-order effects. In general, we recommend running NiCo with radius R = 0 to focus on juxtacrine interactions before exploring larger radii. MISTy²² is an available machine learning framework for predicting structural and functional interactions from spatial omics data. To benchmark with NiCo’s interaction module, we applied MISTy to the second simulated scenario for juxtacrine interactions (Fig. 2d, Methods). MISTy recovered the relative ranking by interaction strength with some exceptions (Fig. 2f). For instance, the best predictor of T5 were T0 and T4 instead of T3. However, since MISTy interaction coefficients reflect predictor importance derived from random forests regression, a high value could indicate positive or negative interactions, which makes a direct interpretation difficult. Similarly, MISTy only picks up T3 as predictor of T2, but fails to detect T0. Moreover, T5 was not inferred as predictor of T3. The predictive contribution of higher-order interactions becomes apparent when inspecting MISTy’s paraview and comparing to NiCo with radius R = 5 (Fig. 2e), e.g., for predicting T5: both methods now detect T4 and T2 as best predictors. However, NiCo reveals that T4 is depleted while T2 is enriched in the paraview. Overall, NiCo’s predictions were more consistent with the ranking by simulated interaction strengths.

Predicted niche interactions are consistent with tissue ___domain architecture

A key objective of spatial transcriptomics analysis is the identification of topological tissue domains with distinct cell type architectures. Although NiCo was not designed specifically for this task, its predicted cell type interactions are reflective of tissue ___domain architecture. To benchmark NiCo with tissue ___domain detection methods, including Stagate³⁴, Seurat BuildNicheAssay³⁵, CellCharter³⁶, SpaGCN³⁷, SpatialPCA³⁸, and Banksy³⁹ we ran each method on spatial MERFISH whole mouse brain data with available ground truths annotation of six regional tissue domains comprising 17 cell types in total (Methods)⁴⁰. For each ___domain, we selected the NiCo cell type with the highest correlation of the regression coefficients to the ___domain-specific cell type frequencies, and compared to the cell type enrichment computed for the domains predicted by each method (Supplementary Fig. 1b, c and 2). Overall, the interactions predicted by NiCo were on average more consistent with ground truths ___domain annotations than the ___domain predictions obtained from available methods (Fig. 2g). On another dataset of mouse visual cortex STARmap data with ground truths annotation of neocortical layers¹⁸, NiCo’s performance was comparable to the ___domain detection methods (Fig. 2h and Supplementary Fig. 3). We note that NiCo predicts global interaction coefficients as a basis for inferring cell state dependencies between interacting cell types, while ___domain detection methods predict local tissue regions with unique cell type compositions. The observation that NiCo interaction coefficients of specific cell types frequently reflect ___domain architecture indicates that these cell types dominate particular domains. Benchmarking of memory usage and computation time for niche ___domain detection on an AMD Ryzen 9, 5950X, 16-core processor (Fig. 2b) suggests that NiCo is both fast and memory-efficient compared to available niche detection methods and does not require any pre-specified number of clusters or cluster resolution parameters.

NiCo deciphers cellular crosstalk during mouse organogenesis

In the following, we demonstrate the predictive capacity of NiCo on three different tissue environments with varying complexities and distinct architectural features. As a first scenario, we interrogate niche architecture and covariation of gene programs between local neighbors in the context of mouse embryonic development. We applied NiCo to seqFISH data of E8.5 embryos³³, where a diversity of organ tissues has already developed, and utilized an scRNA-seq reference data of the same developmental stage⁴¹. To facilitate data integration of the two modalities, we annotated cell types within the seqFISH dataset using the NiCo annotation module with default parameters to transfer the cell type labels from the scRNA-seq clusters (Fig. 3a, b).

**Fig. 3: NiCo identifies niche architecture and covariation during mouse embryogenesis.**

With the emergence of organ structures and the establishment of three-dimensional spatial tissue domains, embryonic tissues pose a unique challenge when deciphering niche interactions^42,43. We applied the interaction module of NiCo to interrogate the dynamic interplay within and across these tissue domains. The level of confusion of NiCo’s cell type classifier reflects the predictive capacity of the niche composition for each cell type. The majority of cell types in the E8.5 embryo were predicted well from their niche composition (Fig. 3c) with a weighted average precision of 0.71. This observation is not unexpected given the topological ordering of many cell types into tissue domains (Fig. 3a), e.g., gut, forebrain/midbrain/hindbrain, and heart tissue (cardiomyocytes). In these cases, the relative magnitude of regression coefficients suggests that the highest predictive capacity is contributed by neighbors of the same cell type identity (Fig. 3d). Conversely, elevated values in the off-diagonal entries of the confusion matrix reflect similarities in the niche composition of distinct cell types, e.g., for endothelium and haematoendothelial progenitors (Fig. 3c). Hence, inspection of the confusion matrix enables delineation of cell types with highly specific niche composition, and pairs or even larger groups of cell types with similar niche composition.

The regression coefficients of each cell type can be utilized to construct a spatial cell type neighborhood graph highlighting dominant niche interactions. The sparsity of this graph can be controlled by a cutoff on the regression coefficients. After applying a cutoff of 0.01, the neighborhood graph (Fig. 3e) unveils well known tissue co-localization patterns, such as extraembryonic (ExE) endoderm—definitive endoderm – gut, reflecting the endodermal origin of gut epithelium, or endothelium – haematoendothelium – blood progenitors (Fig. 3d), consistent with the emergence of primitive blood cells from yolk sac endothelium at this stage. The high connectivity of blood progenitors and erythroid cells is likely due to presence of these cells within the vasculature.

NiCo’s covariation module identifies covarying molecular programs in co-localized cell types encoded by cell type-specific latent variables, or factors. To explore covariation between co-localized cell types in common niches, we inferred three latent factors for each cell type, capturing major intra-cell-type variability (Methods). Using NiCo’s ridge regression step, we detected covariation between these factors across pairs of co-localized cells.

Focusing on cardiac tissue, this analysis revealed strong positive covariation of each cardiomyocyte factor with the same factor in neighboring cardiomyocytes, suggesting cell state synchronization in neighboring cardiomyocytes (Fig. 3f). Another significant covariation (log₁₀P = −3.92) was detected between cardiomyocyte factor (Fa) 2 and pharyngeal mesoderm Fa1, supported by 223 co-localized pairs out of 853 pharyngeal mesoderm cells and 773 cardiomyocytes (Fig. 3f). During mouse embryonic development, the vital role of the pharyngeal mesoderm is well documented, contributing significantly to extensive regions of both head and heart muscle⁴⁴. Heart formation involves two distinct cardiac progenitor cell populations: the first heart field (FHF) and the second heart field (SHF)⁴⁵. The T-box transcription factor Tbx5 marks the FHF, which is responsible for the left ventricle and atria development⁴⁶. Conversely, around E8.0, the SHF, originating from the pharyngeal mesoderm and characterized by Islet1 (Isl1) expression, populates the cardiac outflow tract and right ventricle^47,48. Studies suggest that canonical and noncanonical WNT signaling are essential for progenitor cell proliferation in pharyngeal mesoderm and the development of SHF heart differentiation⁴⁹. Anomalies in the signaling communication between the pharyngeal mesoderm and cardiomyocytes are associated with congenital heart diseases⁵⁰.

To interpret this covariation pattern, we inspected the top positively and negatively correlating genes with cardiomyocyte Fa2 and pharyngeal mesoderm Fa1, computed from the scRNA-seq reference data to leverage genome-wide information (Fig. 3g). While key markers for cardiac progenitors such as Nkx2-5, Isl1, Mef2c, and Fgf8 were negatively correlated with pharyngeal mesoderm Fa1, mature cardiac genes involved in muscle contraction, e.g., cardiac troponins (Tnnt2, Tnni3, Tnnc1, Tnnt1, Tnni1), and other mature cardiomyocyte markers such as Myl2, Myl3 and Acta anticorrelated with cardiomyocyte Fa2. These associations were also reflected by enriched biological processes among the top positively and negatively correlating genes which further highlights an association of pharyngeal mesoderm Fa1 with translational activity (Supplementary Fig. 4a, b). To explore signaling interactions that may mediate this interaction, we inspected ligands and receptors correlated to the covarying factors, and identified a potential interaction between Wnt2 on cardiomyocytes and Fzd2 on pharyngeal mesodermal cells (Fig. 3h), consistent with the known requirement of WNT signaling for SHF progenitor proliferation and differentiation from pharyngeal mesoderm⁴⁹. Moreover, the secreted signaling mediator Hmgb1 is among the top three genes most highly correlated to Pharyngeal mesoderm Fa1. Early knockout of this gene in the developing hearts leads to cardiac growth retardation⁵¹. Therefore, Hmgb1 signaling from the pharyngeal mesoderm could be an inducer of cardiomyocyte maturation (Fig. 3g). These observations indicate the co-localization of immature cardiomyocytes with translationally active putative cardiac progenitors in the pharyngeal mesoderm, consistent with the SHF developmental origin, highlighting the capability of NiCo to detect cell state covariation in local niches underlying embryonic tissue development.

NiCo unveils covariation of intestinal stem cell and Paneth cell states

The intestinal epithelium undergoes permanent homeostatic turnover fueled by an active stem cell compartment. This renewal process maintains the relative proportions of various cell types along the crypt-villus axis. Lgr5+ stem cells, localized at the bottom of the intestinal crypts, give rise to proliferating transient-amplifying (TA) progenitors and eventually differentiate into mature epithelial cell types⁵². This differentiation process is orchestrated with an upward migration from the crypt bottom to the tip of the villus. Recent observations indicate that Paneth cells, a known stem cell niche, could differentiate directly from stem cells⁵³.

To conduct a spatially resolved analysis of intercellular crosstalk within the mouse small intestine, we applied NiCo to published MERFISH data³¹. Integration with a comprehensive scRNA-seq dataset⁵⁴ facilitated cell type annotation in the spatial ___domain reflecting known localization of cell types along the crypt-villus axis (Fig. 4a, b). While stem and TA cells (stem/TA) as well as Paneth cells were localized to the crypt bottom, enterocytes segregated into bottom, medium, and top zone enterocytes. Goblet, tuft, and enteroendocrine cells were distributed more randomly along the crypt villus axis. NiCo’s interaction analysis revealed a predictive capacity of the niche composition for the majority of cell types (Fig. 4c). Off-diagonal elements in the confusion matrix, indicative of a similar niche composition for the respective cell types, suggest that the goblet cells share a similar niche with the stem/TA cells and bottom zone enterocytes, in line with their preferential localization to the lower crypt region. Inspection of the regression coefficients indicated that stem/TA cells were predicted by the presence of goblet and Paneth cells within their niche, and, vice versa, Paneth cells were predicted by co-localized Paneth and stem/TA cells (Fig. 4d). The cell type neighborhood graph derived from the regression coefficients segregated into a module of epithelial cells, and a module connecting different types of immune cells, lymphatic endothelial cells, as well as vascular cells (Fig. 4e).

**Fig. 4: NiCo infers cell state covariation between mouse intestinal stem cells and Paneth cell progenitors.**

Focusing on the stem/TA cell niche, NiCo’s covariation module inferred a significant covariation of stem/TA cell Fa1 with goblet cell Fa2 (log₁₀P = −4.00) and Paneth cell Fa1 (log₁₀P = −1.43) supported by 217 co-localized pairs out of 1107 stem/TA cells and 184 Paneth cells (Fig. 4f). Inspection of the top-correlating genes with stem/TA Fa1 in the scRNA-seq data revealed several known stem cell markers such as Olfm4, Lgr5, and Hopx. Paneth cell Fa1, on the other hand, strongly correlated with markers of Paneth cell progenitors (Fig. 4g and SupplementaryFig. 5a, b). This pattern suggests that the stemness signature covaries with a progenitor signature in neighboring Paneth cells. We note that Paneth cells can emerge directly from stem cells and the nascent Paneth cell progenitors may thus signal to the sister stem cells in order to maintain their stemness⁵³ (Supplementary Fig. 5c, d). Interrogation of ligand-receptor pairs correlated to the covarying factors recovered the Wnt3 ligand correlating to Paneth cell Fa1 and Fzd2 as well as Fzd7 receptors correlating to stem/TA cell Fa1 (Fig. 4h). Canonical Wnt signaling through the Wnt3-Fzd7 axis is essential for the survival of Lgr5+ intestinal stem cells⁵⁵. Reassuringly, regulation of canonical Wnt signaling as well as negative regulation of development were identified among the enriched gene ontology (GO) biological processes enriched in top correlating genes of stem/TA Fa1 (Supplementary Fig. 5e). Our spatial covariation analysis extends previous findings by identifying Paneth cell progenitors that likely directly emerged from the stem cell pool as the source of the critical Wnt ligand required for maintenance of the stem cell⁵³. Consistently, Wnt3 is downregulated during Paneth cell maturation following the trend of Fa1 (Fig. 4i). We note that existing methods for the inference of spatial cell-cell crosstalk, such as Niche-DE⁵⁶, COMMOT²¹, stLearn ⁵⁷and CellNeighborEX ⁵⁸did not recover this interaction due to their limitation to directly measured gene sets (Methods and Supplementary Fig. 6). In summary, NiCo recovers the critical Paneth cell niche of intestinal stem cell maintenance without any prior information and predicts nascent Paneth cell progenitors as a critical Wnt ligand source.

NiCo predicts covariation of stellate cell and Kupffer cell states in the mouse liver

The liver is the major metabolic organ in the body and possesses a remarkable regenerative capacity⁵⁹. Liver tissue is functionally organized into hexagonally shaped units termed lobules⁶⁰. Within each lobule, hepatocytes are organized into distinct zones along the axis connecting the central vein in the center with the portal vessels at the boundary. Metabolic and signaling pathways exhibit zonal gradients, a phenomenon also reflected by transcriptome zonation in hepatocytes and sinusoids connecting the central vein with the portal vein and the hepatic artery^61,62. Achieving spatial single-cell resolution is imperative to reveal the intricate cellular interactions within zonated liver niches. We applied NiCo to publicly available highly-multiplexed smFISH data on 347 genes in the mouse liver generated with the MERSCOPE platform (Data Availability) and integrated these data with a comprehensive single-cell/single-nucleus RNA-seq mouse liver atlas⁶³.

NiCo could annotate 375,161 out of 391,679 cells (95.8%) in the spatial modality (Fig. 5a). Since no author annotation was available as ground truths, we inspected spatial cell type localization within the lobule. According to NiCo’s annotation, portal vein endothelial cells localized to the margin of tissue areas that morphologically resemble large blood vessels, distant from smaller areas demarcated by the presence of central vein endothelial cells. Reassuringly, portal hepatocytes were annotated in vicinity to portal vein endothelial cells, while central hepatocytes co-localized with central vein endothelial cells, and the intermediate zone was populated by mid-zonal hepatocytes (Fig. 5b, c). This observation demonstrates NiCo’s capacity to correctly annotate cell states with more subtle transcriptional differences such as hepatocytes within different zones. A comparison with alternative state-of-the-art annotation tools such as cell2location, Tangram, TACCO and uniPort revealed that these methods struggle with the recovery of zonated hepatocyte states (SupplementaryFig. 7). Apart from NiCo, TACCO and cell2location managed to annotate hepatocytes of different zones correctly, although the separation of mid-zonal and portal hepatocytes was less clear. Moreover, on this dataset it took more than two days of computation time on a standard machine to run cell2location, while NiCo finished in few hours (Fig. 2b).

**Fig. 5: NiCo recovers niche architecture of the mouse liver.**

The logistic regression classifier of NiCo’s interaction module identified cell types with a highly predictive niche, such as portal and central vein endothelial cells, or cholangiocytes (Fig. 5d), and the regression coefficients confirmed known associations such as co-localization of central hepatocytes and central vein endothelial cells (Fig. 5e). Various immune cell populations, on the other hand, such as dendritic cells, monocytes, or neutrophils, were not predicted by the cell type composition of their niche suggesting a variable neighborhood for these migratory cells. The cell type neighborhood graph derived from the regression coefficients segregated into a portal cell type module, comprising portal vein and lymphatic endothelial cells, as well as cholangiocytes and portal hepatocytes, and a mid-central module of central and mid-central hepatocytes, central vein and sinusoidal endothelial cells (Fig. 5f). Both modules were connected to fibroblasts, which are known to reside around the central and portal blood vessels.

To understand whether the niche interactions identified from the regression coefficients were meaningful even in situations where the predictive capacity was reduced, we focused on the stellate cell niche, which exhibited a classification accuracy of 0.06. Nonetheless, the logistic regression classifier suggested specific positive associations with sinusoidal endothelial cells, Kupffer cells, and central/mid-zonal hepatocytes (Fig. 5e). This niche composition aligns perfectly with a recent characterization of the stellate cell niche⁶⁴. We demonstrate robustness of this niche prediction across technical replicates. Running NiCo on a second liver slice analyzed with MERSCOPE recovers liver cell type interactions in general, and the stellate cell-Kupffer cell niche composition in particular (Supplementary Fig. 8). Moreover, to test the robustness of NiCo to sample size, we repeated the analysis on a smaller sub-region containing only 88,772 (22.7%) cells. Again, NiCo recovered liver cell type interactions and the stellate-Kupffer cell niche (Supplementary Fig. 9). To explore the consequence of these niche interactions for the state of stellate cells, we applied NiCo’s covariation module to the full slice data. The most significant covariation (log₁₀P = −3.08) was inferred between stellate cell Fa2 and Kupffer cell Fa1 (Fig. 6a), supported by 7781 co-localized pairs out of 11,881 stellate cells and 38,932 Kupffer cells. Stellate cells are quiescent in the normal liver, but undergo activation and transdifferentiate into myofibroblasts producing alpha-smooth muscle actin (SMA) and collagen after chronic liver damage^65,66. Kupffer cells function as resident macrophages and shield the liver from bacterial infections⁶⁷. The top correlating genes with stellate cells Fa2 (Fig. 6b) were enriched in viral defense and antigen presentation pathways (Supplementary Fig. 10a, b), indicative of a proposed function as innate immune cell⁶⁸. On the other hand, proteoglycans such as Dcn and Bgn were among the top correlating genes. Antigen presentation and complement activation pathways were highly enriched among the top correlating genes with Kupffer cell Fa1 reflecting an activated Kupffer cell state (Fig. 6b and Supplementary Fig. 10a–e). Focusing on ligand-receptor pairs correlated with the covarying factors, we detected the ligand Tgfb1 on Kupffer cells and the receptor Tgfbr3 on stellate cells (Fig. 6c). Tgf-β acts as a potent inducer of fibrosis in the liver^69,70. Proteoglycans such as Dcn and Bgn are produced by stellate cells and exert known anti-fibrogenic functions; they inhibit Tgf-β signaling by sequestering Tgf-β ligands at the extracellular matrix or cell surface⁷¹. The predicted correlation of the Tgfb1-Tgfbr3 interaction with antigen presentation and immune activation in Kupffer cells and similar functions as well as proteoglycan production in co-localized stellate cells (Fig. 6d, e and Supplementary Fig. 10c–e), may indicate a functional dependence of these processes in the healthy liver. Sensing of low levels of viruses and other pathogens may lead to activation of Kupffer cells and induction of Tgf-β signaling, which is received by interacting stellate cells. In stellate cells, this signal could induce proteoglycans which sequester Tgf-β ligands to dampen pro-fibrogenic signaling in order to avoid stellate cell activation at low pathogen levels in the healthy liver. Upon chronic stimulation, this mechanism may be insufficient to inhibit Tgf-β signaling surpassing a critical threshold, and thus leading to full stellate cells activation and collagen production.

**Fig. 6: NiCo predicts crosstalk between stellate cells and Kupffer cells in the mouse liver.**

Since Tgfb1 and Dcn were not directly measured in the original MERSCOPE experiment, we independently validated the inferred covariation between Tgfb1 in Kupffer cells marked by Clec4f and Dcn in stellate cells by single-molecule hybridization chain reaction (smHCR)⁷² (Methods). Indeed, this highly sensitive assay confirmed a strong correlation (Pearson’s correlation coefficient r = 0.59) between Tgfb1 and Dcn in co-localized pairs of stellate and Kupffer cells within healthy liver tissue (Fig. 6f, g and Methods). In comparison, correlation of Dcn in stellate cells with the general marker Clec4f in co-localized Kupffer cells was markedly reduced (Pearson’s correlation coefficient r = 0.24) indicating the specificity of the covariation between Dcn and Tgfb1 in pairs of stellate and Kupffer cells (Fig. 6g). We finally validated dampening of hepatic stellate cell (HSC) activation by Dcn in vitro (Methods). qPCR quantification after in vitro culture of HSCs stimulated with Tgf-b in the presence (HSC+Tgfb+Dcn) or absence (HSC+Tgfb) of Dcn suggested dampened activation when Dcn was administered to the culture. Specifically, HSC stimulation by Tgf-b in the presence of Dcn led to significantly lower induction of activation markers Col1a1 (P < 0.01) and Pdgfrb (P < 0.04), while the quiescent HSC marker gene Reln remained unaffected. (Methods, Fig. 6h and Supplementary Fig. 10f). Induction of activation markers Lox and Acta2 was also reduced although the fold-reduction did not reach significance. These observations support the hypothesis that secreted Dcn may indeed dampen HSC activation by sequestering Tgf-b. The key insight infered by NiCo is the induction of Dcn expression in HSCs as a result of Tgf-b upregulation in co-localized Kupffer cells, supported by the validated covariation pattern (Fig. 6g). This example showcases the ability of NiCo to generate insightful hypotheses about the functional implications of cell type-specific niche interactions in the liver.

NiCo detects covariation patterns in sequencing-based mouse cerebellum spatial transcriptomics data

Although NiCo was primarily designed for the analysis of single-cell resolution spatial transcriptome data, we evaluated its capabilities on a mouse cerebellum dataset generated with Slide-seqV2, a sequencing-based spatial transcriptomics method with 10 μm resolution⁷³. Since pixels typically consist of 1-2 cells, interaction and covariation inference may be confounded to a certain extent. Utilizing cell type annotations for each pixel generated by the authors, we first ran NiCo’s interaction module, which recovered well known interactions, e.g., between Purkinje neurons and a specialized form of astrocytes called Bergmann glial cells⁷⁴ (Supplementary Fig. 11). Covariation analysis between these co-localized cell types recovered covarying secretory programs, e.g., calcium signaling from Bergmann cells to Purkinje neurons⁷⁵ (Supplementary Fig. 11e). This vignette showcases the general applicability of NiCo to sequencing-based spatial transcriptomics.

Discussion

With the availability of single-cell sequencing methods, our understanding of the molecular definition of cell types has substantially advanced. As a key insight, it was recognized that a cell type cannot be defined by a unique transcriptome state, but rather populates a continuous region within the high-dimensional transcriptome space, reflecting systematic cell state variability. Such variability could in theory be accounted for by cell-intrinsic factors driven by the architecture of the gene regulatory network governing a cell type, which could give rise to multi-stability due to stochasticity of molecular interactions⁷⁶. However, cells within tissues are exposed to extrinsic inputs from the microenvironment, which could be another mechanism to induce cell state modulations. The ability to deconvolve intrinsic and extrinsic cell state determinants represents a critical step forward towards understanding robustness and plasticity of cell types in multicellular tissue environments.

Application of single-cell resolution spatial transcriptomics permits addressing this challenge. NiCo explains cell state variability by gene expression covariates within cell types populating the same niche. To avoid strong assumptions on the signaling mediators, we did not incorporate ligand-receptor interactions as a constraint. Hence, predicted covariations may result from direct signaling interactions but also from indirect sources such as cellular adaptation towards biomechanical cues, or competition for metabolites⁷⁷. Nonetheless, NiCo proposes candidate signaling pathways mediating predicted covariations by identifying ligands and receptors correlating to the covarying factors within the interacting cell types. However, we believe that alternative drivers of cell state covariation in local niches are important to consider, and therefore provide a tool to detect the downstream effects on the transcriptional state of a cell unconstrained by assumptions on the mediators of this interaction.

Since imaging-based technologies typically only measure few hundred genes, integration with scRNA-seq data is critical for the functional interpretation of predicted covariations between co-localized cells. NiCo solves this challenge by inferring a small set of latent factors capturing dominant cell state variability for each cell type, and utilizes these factors to infer cell-cell covariation and to associate these covariates with full transcriptome variability. It is important to stress that cell state covariation inference requires mapping of individual mRNA molecules to the cell of origin. This is currently not possible with sequencing-based spatial transcriptomics, which require local pixel aggregation and subsequent deconvolution into cell types due to limited resolution and sensitivity. This process destroys local covariance structure between cell types in aggregated spots. With technological advancements, sequencing-based methods may be able to achieve single-cell resolution in the future, and NiCo will be directly applicable to this data type. Similarly, NiCo can be applied to future generations of full transcriptome imaging-based spatial technologies without requiring reference scRNA-seq data.

By applying NiCo to seqFISH data of E8.5 mouse embryos undergoing hierarchical dynamic organization into tissue domains, we showcase that NiCo identifies cell-cell interactions underlying the emergence of cardiac progenitors from the sub-pharyngeal mesoderm and recovers Wnt signaling as a critical pathway regulating this process. For mouse intestinal MERFISH data, NiCo provides deeper insights into the well-known Paneth cell niche of Lgr5+ intestinal stem cells, and associates Wnt activity with the Paneth cell progenitor state. In the mouse liver, NiCo captures cell state covariation between stellate cells and Kupffer cells, which could be essential for limiting stellate cell activation in the healthy liver, supported by Dcn-mediated dampening of HSC activation in vitro. These examples demonstrate that NiCo can illuminate the effect of cell-cell interactions in local tissue environments.

NiCo relies on regularized regression analysis to identify predictive niche interactions and to infer factor covariation. Both steps are limited in their capacity to infer higher-order interactions and the covariation analysis is insensitive to strongly non-linear covariation patterns. While NiCo’s regression framework in principle could accommodate for higher-order interaction terms, this would massively increase model complexity, which would be detrimental for robustness and interpretability. Future extensions should explore non-linear methods such as graph attention networks⁷⁸ for the exploration of higher-order interactions and non-linear niche covariation.

In summary, NiCo fills a critical gap for identifying spatial dependencies between cell states in common tissue niches and represents a significant step forward towards learning and interpreting extrinsic driving forces of cellular states. With the rapidly increasing availability of commercial platforms for high-resolution spatial transcriptomics we anticipate that NiCo will be a key component to gain biological insights from such data.

Methods

Dataset description

We analyzed spatial data and reference scRNA-seq of four different mouse tissues across seven studies.

Organogenesis

We analysed spatial transcriptomics data at single-cell resolution from mouse organogenesis generated by seqFISH³³. The spatial data was captured during the embryonic time point E(8.5–8.75) and comprised 19,451 individual cells and 351 genes. For this dataset, 24 distinct cell types were annotated by the authors, which were used as ground truths for benchmarking cell type annotation. As reference dataset, a published scRNA-seq resource covering different stages of mouse organogenesis was used⁴¹, which was subset to the E8.5 timepoint corresponding to the spatial dataset. Cells designated as NaN cluster type were removed, resulting in 16,909 cells and 19,371 genes across 29 clusters. After merging the two blood progenitor clusters and three erythroid subtypes, 26 distinct cell types were identified by the authors, which were utilized for NiCo’s annotation module.

Small intestine

We obtained single-cell resolution spatial transcriptomics data of the mouse small intestine generated with the MERFISH technology³¹. Corresponding scRNA-seq reference data of the same tissue were obtained from an independent study⁵⁴. The published MERFISH annotation comprises 19 unique cell types, profiling 241 genes across 7416 individual cells. The scRNA-seq data encompass 32,287 genes, and 2239 cells that are stratified into two clustering partitions, i.e., high-resolution cell states and coarse cell types. Within the cell state category, 26 distinct cell states were identified, while the coarse category encompasses 17 unique cell types. To apply the NiCo annotation module, we merged all populations of stem cell-related states, goblet-related cells, and stroma-related populations, respectively, resulting in 19 distinct cell type clusters. The intersection of the scRNAseq and MERFISH data comprises 240 common genes.

Liver

We utilized single-cell resolution spatial transcriptomics data of the mouse adult liver obtained with the Vizgen MERSCOPE platform (available from https://vizgen.com/data-release-program/). Corresponding scRNA-seq reference data from the Liver Cell Spatial Proteogenomics atlas were utilized⁶³. The MERSCOPE dataset includes 347 genes and 391,679 individual cells, with cell filtration based on a minimum count of 5. The scRNA-seq atlas comprises 185,894 cells and 24,857 genes. We utilized the ‘mouseStStAll’ cell type annotation provided by the authors. This annotation was modified by merging specific immune cell types (CD8 Effector Memory T cells, CTLs, NKT cells, Naïve CD4 + T cells, Naïve CD8 + T cells, Th1s, TRegs, and Th17 cells into single T cell cluster; Patrolling Monocytes, Trans. Monocytes and Monocytes-derived cells into a single Monocyte cluster; and MoMac1, MoMac2, and Peritoneal Macrophages into a single Macrophage cluster) and by annotating subclusters of hepatocytes. The hepatocyte population in ‘mouseStStAll’ was divided into eight subclusters, which we categorized into four subpopulations of zonal hepatocytes based on their expression of zonation markers⁶¹.

Primary motor cortex

We accessed MERFISH data of the mouse primary motor cortex³² and corresponding scRNA-seq data⁷⁹. The MERFISH dataset comprises 280,186 cells, encompassing 254 genes and 24 cell types as defined by the authors. These cells were sourced from 64 tissue slides derived from two mice litter. From this dataset, we selected a specific slide (mouse1, slice153) due to its continuous structure and the highest cell count, totaling 7626 cells. The scRNA-seq data, termed ‘BICCN MOp scRNA 10X v3 Analysis AIBS’ were obtained from the NeMo archive; we removed ‘Low Quality’ and ‘doublet’ subclass labels. This dataset consisted of 71,183 cells and 30,982 genes, and we retained 20 subclass labels as defined by the authors. Nineteen of these cell types mirrored those identified in the MERFISH data.

Cerebellum Slide-seqV2

The Slide-seqV2 mouse cerebellum dataset⁷³ comprises 44,091 cells and 5160 genes, with cell types annotated using the RCTD⁸⁰ method by the authors. We retained the singlet and doublet_uncertain categories, resulting in 30,575 cells and 18 cell types. The NiCo interaction module was applied to this dataset with default parameters and the covariation module was applied with the unimodal version of ordinary NMF.

STARmap mouse visual cortex dataset

STARmap mouse visual cortex data¹⁸ comprise 1207 cells and 1020 genes, 16 cell types, annotated into 7 neocortical layers L1, L2/3, L4, L5, L6, HPC (hippocampus) and cc (corpus callosum).

MERFISH mouse brain atlas

The MERFISH atlas of the whole mouse brain⁴⁰ comprises 59 brain sections with ~4 million cells. We analyzed z-slice 11.2, containing 44,699 cells and 500 non-blank genes, falling into 17 cell type classes and 6 neighborhood classes with regional specificity. The six neighborhood classes are denoted as D1 (HY-EA-Glut-GABA), D2 (NN-IMN-GC), D3 (Pallium-Glut), D4 (Subpallium-GABA), D5 (Subpallium-GABA; HY-EA-Glut-GABA), D6 (Subpallium-GABA; NN-IMN-GC).

Niche Covariation (NiCo) algorithm

The NiCo cell type annotation module

We adopted a multistep approach to integrate single-cell resolution spatial transcriptomics with scRNA-seq reference data, assuming a given cell type annotation of the reference data available as clustering partition. We first give a brief overview of the pipeline, followed by a detailed explanation of each step: (1) Determining anchors: Following preprocessing, NiCo first identifies anchor cells between the spatial query data and the scRNA-seq reference data by the mutual nearest neighbor (MNN) method⁸¹. (2) Anchor filtering: We made use of gene expression-based clustering of the spatial data using the Leiden community detection algorithm to filter out dispersed anchors mapping to reference clusters at low frequencies. (3) Iterative annotation of non-anchors: Non-anchor cells in the spatial query data receive a cell type label based on the cell type frequency of anchors among their K-nearest neighbors (KNN) and join the anchor set. This process is iterated.

Determining anchors

NiCo’s annotation module takes cell-by-gene count matrices of query and reference data, denoted as M_q and M_r, as input data. After sub-setting to the shared set of genes measured in both modalities, normalization using the scTransform method⁸² yields Pearson’s residuals for downstream analysis. Next, we extracted the top 50 principal components (PCs) from each normalized matrix, transforming them into latent dimension matrices, d_q and d_r. These 50 PCs retained the majority of the variance in the data. We standardized d_q and d_r and employed a KD-tree algorithm to find mutual nearest neighbor (MNN) pairs applying a soft criterion. Specifically, we identify for each query cell q the set of KNN in the reference data, termed s_q. Similarly, we identify for each reference cell r the set of KNN in the query data, termed s_r. By default, we use K = 50. Query cell q and reference cell r are considered MNN, if q∈s_r and r∈s_q. This neighborhood-based definition of MNNs permits for the existence of multiple mutual nearest neighbor cells in the scRNA-seq reference dataset for a given query cell. In this case, the query cell is considered a “confused anchor”. This is resolved by drawing information from the non-confused anchor cells among the KNNs (K = 50 by default) of the query cell. After pruning neighboring anchors that do not belong to the same spatial guiding cluster (see “Anchor filtering”) as the query cell, a unique cell type is assigned based on the largest proportion of pruned neighboring cells by majority vote. If this results in a tie or if no filtered neighboring cells belong to the same spatial guiding cluster as the query cell, it is designated ‘NM’ (Not mapped). If this procedure results in too many cells that could not be mapped, we implemented a weighted score instead of a majority vote for cell type assignment based on neighboring anchor information. The weighted score is computed based on the inverse distance between the confused query cell and its neighboring anchors after pruning. Scores are aggregated across neighbors with the same assigned cell types, and the cell type with the largest score is assigned to the query cell.

Anchor filtering

For each pair of MNN the cell type label is transferred from the scRNA-seq reference annotation to the spatial query data. We next perform low resolution Leiden clustering of the spatial query data d_q to infer clusters for anchor pruning (spatial guide clustering). These guiding clusters are utilized to filter out noisy anchors. Anchors of a given cell type label are considered noisy, or dispersed, if they map to a spatial cluster with a frequency lower than the dispersion parameter r (r = 0.15 by default). This step implements information sharing between anchors of the same cell type label by keeping anchors falling into the same spatial guiding cluster at high frequency and removing anchors falling into spatial guiding clusters at low frequency. The resolution of the spatial guide clustering can be adjusted; lower values retain more anchors but may decrease specificity of the annotation.

Iterative annotation of non-anchors

Anchor cells serve as a seed for annotating the remaining non-anchor cells in the spatial query data. Each non-anchor cell is annotated by a majority vote based on cell type proportions of anchor cells among its KNN. If the cell type with the highest proportion by majority vote and the queried non-anchor cell end up in same guiding cluster, the queried non-anchor cell is assigned to this cell type. Moreover, if a query cell’s majority vote for cell type assignment results in a tie, NiCo marks it as ‘NM’ (Not Mapped). This process is iterated, and after each iteration anchors are updated with the annotated non-anchors. This process is only iterated three times to avoid low confidence annotations. NiCo successfully mapped ~96% of query cells in embryo and liver data, ~99% of query cells in intestine data, and ~90% of query cells in brain data. To further reduce the number of non-mapped cells, cell type annotations of neighbors can optionally be weighted by their relative distances as explained in the step 1 to the query cell, which avoids ties but may reduce specificity and is therefore not used as default setting.

Benchmarking cell type annotations

In our comparative analysis, we benchmarked NiCo against four state-of-the-art methods: cell2location²⁷, Tangram²⁸, TACCO²⁹ and uniPort³⁰. The evaluation was conducted based on two primary criteria: the adjusted rand score (ARI) and Jaccard similarity (JAC). The author’s annotation of each dataset was considered as ground truth. To compute these metrics, the cell type names for the reference annotation had to match exactly the author-defined annotation. We identified 14, 20, and 17 matched cell types for intestine, embryo, and cortex data, respectively, out of 19, 26, and 20 total scRNA-seq reference clusters. Initially, all datasets were preprocessed with a filtering step, setting the minimum total transcript count to 5 in order to eliminate low quality cells. For NiCo, the resolution parameter for spatial guide Leiden clustering was set to 0.4 for intestine, cortex, and embryo datasets, and 0.5 for the liver data.

Tangram was executed with default parameters, mapping cells to space using the ‘clusters mode’ and reference cluster labels from scRNA-seq data. uniPort was run with default parameters, employing label transfer from reference data to query data for spatial cell type annotations. TACCO was run with default parameters. For cell2location, parameters “N cells per Location=1” and “detection alpha=20” were applied, with all other parameters set to their default values. To avoid memory overflow, cell abundance was estimated from the posterior distribution summary in the large liver dataset by directly using quantile computation. For other datasets, the quantile computation uses 1000 “num_samples” to generate a posterior distribution and uses the 0.05 quantiles to obtain spatial cell type annotations, following the developer’s recommendation.

Calculation of computation time and memory usage for cell type annotation and niche prediction tasks

The computation time and memory requirement for different tasks were measured using Python’s time and memory_profiler modules. Specifically, these module were called at the beginning and end of the script to calculate the total execution time and memory usage.

The NiCo interaction module

To infer significant cell type interactions based on spatial co-localization, NiCo employs a logistic regression classifier to predict the cell-type identity of a “central” cell from the cell type composition of its neighborhood. In short, the method proceeds in three steps. (1) Inference of niche composition: We first determine the local-niche neighbors for each cell. Subsequently, we infer global expected cell type frequencies and determine fold enrichments for every cell type in each neighborhood. (2) Logistic regression classifier: We then train a logistic regression classifier to predict the central cell type identity from the cell type enrichment ratios in the local niche. (3) Cell type interaction graph: The regression coefficients are then utilized to construct a cell type interaction graph.

Inference of niche composition

Local neighborhood information is deduced based on cell centroids. Two distinct schemes are employed for defining neighborhood composition. The first scheme is based on juxtacrine signaling and includes only direct neighbors. This approach, also referred to as ‘radius=0’, is executed through Delaunay triangulation. Since Delaunay triangulation can include distant neighbors due to wide intercellular space in the tissue, we define a cutoff of 100μm and discard neighbor with a centroid-to-centroid distance beyond this limit. The second scheme accommodates paracrine signaling by including all neighbors whose centroid is localized within a given radius R around the central cell’s centroid.

Once the neighbors are identified, we create a cell type count matrix Λ_ij and a vector y_i with i = 1,…,m and j = 1,…,n. n corresponds to the number of cell types, and m is equal to the total number of cells in the dataset. Λ_ij denotes the count of cell type j in the neighborhood of the central cell i. The cell type identity of cell i is stored in y_i ∈ (1,…, K). The number of classes K corresponds to the number of cell types and is therefore identical to the number of features n. The counts Λ_ij are normalized by expected global cell type frequencies to obtain the neighborhood enrichment matrix

$${X}_{{ij}}={\Lambda }_{{ij}}/{\sum }_{k}{f}_{j}\cdot {\Lambda }_{{ik}}$$

(1)

where f_j corresponds to the global frequency of cell type j and ${\sum }_{j}{\,f}_{j}=1$.

Logistic regression classifier

NiCo employs a multi-class logistic regression classifier to learn a set of coefficients reflecting the relevance of each niche cell type for predicting a given central cell type. This classifier computes the probability $P\left({y}_{i}=k,|,{{{\bf{x}}}}_{i}{;W}\right)$ of central cell i to be of cell type k given the enrichment ratios ${{{\bf{x}}}}_{{i}}$ of all cell types in the neighborhood of cell i:

$$P({y}_{i}={k|}{{{\bf{x}}}}_{i};W)=\frac{\exp ({{{\bf{w}}}}_{k}{{{\bf{x}}}}_{i}+{w}_{0,k})}{{\sum }_{j=1}^{K}\exp ({{{\bf{w}}}}_{j}{{{\bf{x}}}}_{i}+{w}_{0,j})}$$

(2)

${{{\bf{x}}}}_{i}$ corresponds to the i-th row of the neighborhood enrichment matrix $X$ and ${{{\bf{w}}}}_{k}$ corresponds to the k-th column of $W$.

The algorithm incorporates an L2 penalty for regularization, with its hyperparameter, denoted as C, determined using the GridSearchCV function of the scikit-learn python library by minimizing the negative log-likelihood

$${\min }_{w}\left(-C{\sum }_{i=1}^{n}{\sum }_{k=1}^{K}[{y}_{i}=k]\log \left(P\left({y}_{i}=k|{{{\bf{x}}}}_{i};W\right)\right)+\frac{1}{2}{\sum }_{i=1}^{n}{\sum }_{j=1}^{K}{w}_{j,i}^{2}\right)$$

(3)

Stratified 5-fold cross-validation is applied. The input feature matrix $X$ is normalized using the StandardScaler function. Furthermore, class weights are adjusted using the balanced mode to account for class imbalance. The ‘class_weight’ function of the scikit-learn python library assigns higher weights to minority classes, allowing the model to pay more attention to its patterns and reducing bias towards majority classes. In this mode, values of y adjust weights inversely proportional to class frequencies in the input data.

The L-BFGS method implemented in the ‘lbfgs’ solver function of the scikit-learn python library is employed during the training step, alongside the ‘multinomial’ logistic loss function encompassing the entire probability distribution.

The vector ${{{\bf{w}}}}_{i}^{f}=\left({w}_{i1}^{f},\ldots,{w}_{{in}}^{f}\right)$ corresponds to the i-th column of the coefficient matrix ${W}^{f}$ for cross fold f (f = 5), i.e., contains the cell type coefficients for predicting each central cell type i from its niche composition ${x}_{i}$. It is averaged across all data cross folds $f=1,\ldots,F$:

$${{{\mathbf{\beta }}}}_{i}=\frac{1}{F}{\sum }_{f=1}^{F}{{{\bf{\,w}}}}_{i}^{f}$$

(4)

The standard deviation of ${{{\bf{w}}}}_{i}^{\,f}$ reflects the uncertainty of the coefficients and serves to estimate error bars. For follow-up analysis we recommend disregarding coefficients with error bars of similar magnitude to the coefficient value itself.

To assess the predictive capacity of the niche composition for each cell type, we inspect the confusion matrix, which compares the predicted cell type label to the ground truths label corresponding to the NiCo annotation of the central cell. We report the average confusion scores across data folds and normalize each row of the confusion matrix (ground truths cell type labels) to one.

To facilitate the visualization of the coefficients for each central cell type i, we plot the mean ${{{\mathbf{\beta }}}}_{i}$ in descending order of their absolute value along the x-axis and indicate uncertainty by the standard deviation across data folds. Positive coefficients ${\beta }_{{ij}} > 0$ indicate that the neighboring cell type j preferentially co-localizes with central cell type i, whereas negative coefficients ${\beta }_{{ij}} < 0$ signify preferential absence of cell type j from the niche of cell type i.

Cell type interaction graph

To create a cell type interaction map highlighting cell type co-localization patterns, we normalize ${{{\mathbf{\beta }}}}_{i}$ by dividing it by the maximum absolute value of its components ${\beta }_{{ij}}$. After applying a cutoff on ${\beta }_{{ij}}$, we draw edges connecting the central cell type i and niche cell type j with directionality pointing from j to i, resulting in a graph structure comprising all cell types and their interaction partners. A global cutoff can be applied to ${\beta }_{{ij}}$ to control the sparsity of the graph. The edge thickness scales with ${\beta }_{{ij}}$. We employed a force-directed layout within the framework of the Graphviz layout.

Testing the NiCo interaction module on simulated ground truths data

To test the capacity of NiCo’s logistic regression classifier to predict the central cell type’s class from the cell type neighborhood, we conducted a simulation, where we created six cell types, each containing 200 cells, in a 2D space using the LAMMPS molecular dynamics package⁸³. This simulation was carried out with a varied Lennard-Jones (LJ) pairwise potential for modeling interactions between cell types. LJ potentials describe interactions that are repulsive on short range (particle size σ), to simulate physical cell boundaries, and attractive at longer distances with a given range (interaction cutoff range parameter rc) and strength (potential well ε), to simulate preferential co-localization of defined pairs of cell types.

Fig. 2c: We first initialized all (6 + 6 choose 2) = 21 pairwise interactions, including self-interactions, with the following parameters: potential well ε = 1, particle size σ = 2, and interaction cutoff range parameter rc = 5. Subsequently, we introduced stronger interaction potentials between T0-T2 (ε = 3) and T3-T5 (ε = 5). This allowed us to create a scenario where T0 was expected to be in the neighborhood of T2, and vice versa, and T3 should be in proximity to T5, and vice versa. The simulation box had dimensions of −50 to +50 in both X and Y, and periodic boundary conditions were assumed. All masses were set to 1, initial velocities were assigned to achieve a temperature of T = 1 at the normalized kB unit, and equilibration was attained using NVE and Langevin’s thermostat. This ensured that each cell maintained the desired temperature of T = 1, and the enforce2d command was employed to constrain cell movement along the Z direction to mimic the simulated locations in the 2D plane. The final configuration of the simulation served as the input for NiCo to predict neighborhood niche interactions. The confusion matrix and coefficients for predicting each cell type were computed for the default parameter of radius=0, considering only immediate neighbors.

Fig. 2d: To test a more challenging situation, we simulated a more complex scenario. All six pairwise interactions, including self-interactions, were initialized with parameters ε = 1, σ = 2, and rc = 5. We then altered the ε parameters for T0-T2, T3-T5, T2-T3, and T1-T3 interactions to 3, 5, 10, and 8, respectively, to establish a distinct order among the cell types. The confusion matrix and coefficients for predicting each cell type were computed for the default parameter of radius=0, considering only immediate neighbors.

Benchmarking the NiCo interaction module with MISTy

In our comparative study between NiCo and MISTy²² for niche prediction, we transformed cell types for the more complex, second simulated scenario into a one-hot encoding scheme. Subsequently, we input these data into MISTy, generating a juxtaview with default parameters and paraview with radius five. Inferred importance, i.e., z-transformed total variance reduction scaled by (1 - p-value), was interpreted as indicating cell type interactions between a central cell type (target) and niche-cell types (predictors).

Detecting covariation of gene expression in co-localized cells with NiCo

To detect covariation of co-localized cell states in local niches, NiCo’s covariation module implements the following steps: (1) Latent variable inference: Identification of a set of latent variables for each cell type that capture intra-cell type variability. (2) Latent variable annotation: Inference of genes and pathways correlated to the latent factors, including ligand and receptor pairs to predict signaling mediators. In this step, we focus on the scRNA-seq data to obtain full transcriptome coverage. (3) Identification of niche covariation: NiCo applies regularized linear regression to identify latent factors in co-localized cell types exhibiting covariation.

Latent variable inference

NiCo employs an integrated Non-Negative Matrix Factorization (iNMF) approach alongside ordinary NMF (oNMF) to identify latent factors from the shared set of genes in the spatial transcriptomics and the scRNA-seq data. Both datasets are scaled but not mean-centered. These NMF methods learn a lower-dimensional embedding space, encompassing spatial cell factors ${H}_{{sp}}\in {{\mathbb{R}}}_{+}^{{c}_{{sp}}{\mbox{x}}{LF}}$, scRNA-seq cell factors ${H}_{{sc}}\in {{\mathbb{R}}}_{+}^{{c}_{{sc}}{\mbox{x}}{LF}}$, and shared gene embedding factors $W\in {{\mathbb{R}}}_{+}^{{LF}{\mbox{x}}G}$, where ${LF}$ corresponds to the user-defined number of factors, $G$ denotes the number of shared genes across modalities, and ${c}_{{sp}}$ and ${c}_{{sc}}$ denote the number of cells for a particular cell type in the spatial and the scRNA-seq data, respectively. The iNMF methodology^25,26 draws inspiration from the LIGER package^25,26, and within NiCo, it simultaneously learns common gene factors $W$, ${H}_{{sc}}$ factors and ${H}_{{sp}}$ factors from cell type-specific scRNA-seq reference cluster ${E}_{{sc}}$ and query spatial cluster ${E}_{{sp}}$ after sub-setting to the shared set of genes. The objective function of iNMF is designed to learn a shared factor $W$ across all datasets while capturing data heterogeneity through factors ${H}_{x}$ and ${V}_{x}$ with x denoting the modality:

$${{\rm{argmin}}}\left(\right.{{||}{E}_{{sc}}-{H}_{{sc}}\left(W+{V}_{{sc}}\right){||}}_{F}^{2}+\lambda {\parallel {H}_{{sc}}{V}_{{sc}}\parallel }_{F}^{2}$$

$$+{{{\rm{||}}}{E}_{{sp}}-{H}_{{sp}}(W+{V}_{{sp}}){{\rm{||}}}}_{F}^{2}+\lambda {{{\rm{||}}}{H}_{{sp}}{V}_{{sp}}{{\rm{||}}}}_{F}^{2}\left.\right)$$

(5)

Optimization incorporates penalizing the Frobenius norm (denoted by F) after accounting for heterogeneous effects across modalities, ${{H}_{{sc}}V}_{{sc}}$ and ${H}_{{sp}}{V}_{{sp}}$, and regularization of these components with parameter $\lambda$. The matrix product ${H}_{x}{V}_{x}$ encodes data-specific constraints for modality $x$ tailored for heterogenous multi-modal data, following the principles outlined in ref. ²⁶. Importantly, all learned matrices, ${H}_{x}$, ${V}_{x}$, and $W,$ are constrained to be non-negative. The selection of $\lambda$, a tuning parameter, is critical and preferably small for a mixture of homogenous datasets and larger for a mixture of heterogeneous datasets. To determine the optimal $\lambda$ from ${E}_{{sc}}$ and ${E}_{{sp}}$, we execute iNMF for increasing values of $\lambda$ set to even integers ($\lambda={\mathrm{0,2,4,6}},\ldots$), and selecting the $\lambda$ that stabilizes the alignment score (Eq. (6)), following the methodology outlined in ref. ⁸⁴. In brief, we first randomly downsampled the larger dataset (${H}_{{sc}}$ or ${H}_{{sp}}$) to match the size of the smaller dataset and merged them into a unified dataset. We determine $K$ as 1% of the total number of cells in the smaller dataset, denoted as $n$, and identified the $K$-nearest neighbors for each cell in the unified dataset. For every cell, we calculated how many of its KNN belonged to the same dataset and average this across all cells to obtain $\bar{x}$. In an ideal scenario where ${H}_{{sc}}$ and ${H}_{{sp}}$ are learned perfectly, each cell’s K-nearest neighbors would be evenly distributed across the two datasets. We then normalized this value by the expected number of cells from the same dataset and scaled it to range from 0 to 1. Ultimately, the alignment score between two successive $\lambda$ iterations was considered stable when the difference between them fell below the threshold of 0.001, determining our final $\lambda$.

$${{\rm{score}}}=1-\frac{\bar{x}-\frac{K}{n}}{K-\frac{K}{n}}$$

(6)

For the MERSCOPE liver dataset, we noticed intermingling of all cell types in the dimensional reduction representation of the spatial data (Fig. 5a). Potential explanations for this observation are imperfect cell segmentation or high background. This leads to a challenge during the factorization process of iNMF, since factors could become strongly associated with spill-over and/or background genes, which would not be indicative of their actual prevalence within certain cell types. To address this issue, we implemented an alternative approach known as ordinary NMF (oNMF). In oNMF, the matrices ${H}_{{sc}}$ and $W$ are only learned from reference data ${E}_{{sc}}$. We use the iterative element-wise multiplicative update solver⁸⁵. The objective function in oNMF aims to minimize the ‘Kullback-Leibler’ divergence, denoted as ${d}_{{KL}}$, which measures the dissimilarity between matrices $X$ and $Y$: ${d}_{{KL}}\left(X,Y\right)={\sum}_{i,j}({X}_{i,j}\log \left(\frac{{X}_{i,j}}{{Y}_{i,j}}\right)-{X}_{i,j}+{Y}_{i,j})$. We initialize the matrices $W$ and ${H}_{{sc}}$ using the nndsvda method of the scikit-learn python library (Non-negative Double Singular Value Decomposition with zeros filled with the average of E_sc). Our optimization process runs for a maximum of 1000 iterations, and no regularization parameters are applied to matrices ${H}_{{sc}}$ and $W$. Following the derivation of $W$ from the scRNA-seq data, we fix these factors for the spatial data while optimizing ${H}_{{sp}}$. This is achieved through an analogous element-wise iterative multiplicative update rule keeping $W$ constant:

$${H}_{{sp}}\left[i,j\right]=\frac{{H}_{{sp}}\left[i,j\right]*({W}^{T}{E}_{{sp}}^{T})[i,j]}{\left({W}^{T}W{H}_{{sp}}\right)[i,j]}$$

(7)

Importantly, we do not update the matrix $W$ during this process. This strategy enables us to infer spatial cell factors reflecting the gene expression variability information contained in the scRNA-seq data.

The number of factors ${LF}$ per cell type is a user-defined parameter. Since cell type populations are already homogenous and intra-cell type variability is typically limited, we execute NiCo with a small number of factors per cell types (${LF}=3$ by default). By performing consensus-NMF analysis⁸⁶ on the intestinal datasets, we found that factor stability frequently drops significantly for larger numbers of factors, and tuning the number of factors individually for each cell type would require substantial user intervention. We recommend running NiCo with three to five factors per cell type.

The factors obtained from NMF lack an inherent order. To address this, we introduce a quantification of disorder within each factor $i$ of cell type $k$, referred to as entropy:

$${E}^{k}[i]=\frac{-{\sum}_{i}({H}_{{sc}}^{k}\left[:,i\right]*{\log }_{2}({H}_{{sc}}^{k}[:,i]))}{{\log }_{2}\left({{\rm{length}}}\big({H}_{{sc}}^{k}[:,i]\big)\right)}$$

(8)

This normalized entropy scales between 0 and 1. We calculate the entropy of scRNA-seq cell factors (${H}_{{sc}}$) and reorder the factors by increasing entropy. Subsequently, the same ordering is applied to the spatial cell factors (${H}_{{sp}}$). We found that the normalized entropy of iNMF-derived factors is typically higher than for oNMF-derived factors. This sorting process provides a more structured and informative representation of the factors.

Latent variable annotation

Finally, both NMF approaches yield a factors-by-genes matrix $W$; each factor thus corresponds to a linear combination of genes, representing specific gene modules or meta-genes for each cell type. Since factors are derived from the shared set of genes, they do not provide transcriptome-wide information. To conduct comprehensive gene covariation analysis, pathway enrichment, and ligand-receptor analysis, we require information associating the whole transcriptome with each of these factors. To achieve this, we utilize two distinct metrics, the Spearman correlation and cosine similarity. Specifically, the Spearman correlation measures the association of each gene with every factor by calculating correlations between each column of ${E}_{{sc}}$ (representing cell type-specific scRNA-seq whole gene expression profiles) to each column of ${H}_{{sc}}$. Accordingly, the cosine similarity is obtained by taking the dot product between a column of ${E}_{{sc}}$ and the column of ${H}_{{sc}}$. Top positively or negatively correlating genes for each factor could then be inspected and functionally annotated, e.g., by pathway enrichment analysis. Ligands and receptors among these genes can be nominated for inferring intercellular signaling underlying observed covariations of factors between cells.

Identification of niche covariation

We applied regularized linear regression analysis to elucidate covariation between latent factors of co-localized cell types residing in the same tissue niche. NiCo explores these relationships between factors corresponding to a central cell type and those pertaining to co-localized neighboring cell types as identified by the interaction module. By design, NiCo is sensitive to approximately linear relationships and returns regression coefficients between the dependent variable (representing factors of central cell types) and the independent variables (comprising weighted averages of factors from neighboring cell types).

Given a central cell type $k$, we include all neighboring cell types $j$ occurring across instances of cell type $k$ with regression coefficients ${\beta }_{{kj}} \, > \, c$ for a given cutoff $c$. By increasing $c$, neighboring cell types of limited predictive capacity as inferred by NiCo’s interaction module can be discarded.

For this analysis, let ${H}_{{sp}}^{k}[j,i]$ denote the i-th factor of the j-th instance of the central cell type, and consider cell factors ${H}_{{sp}}^{m}$ of neighboring cell types $m\in \left\{1,\ldots,M\right\}$ with ${\beta }_{{km}} > c$. We infer covariation between ${H}_{{sp}}^{k}[,i]$ and all factors ${H}_{{sp}}^{m}$ across all instances $j$ by ridge regression:

$${\min }_{\alpha }\left(\left\|{\sum }_{m}{\sum }_{l=1}^{{LF}}{\alpha }_{l,m}^{k,i}\frac{1}{{N}_{j}^{k,m}}{\sum }_{h\in {S}_{j}^{k,m}}{H}_{{sp}}^{m}\left[h,l\right]-{H}_{{sp}}^{k}\left[j,i\right]\right\|_{2}^{2}+\eta \left\|{\alpha }_{l,m}^{k,i}\right\|_{2}^{2}\right)$$

(9)

Here, ${\alpha }_{l,m}^{k,i}$ denotes the regression coefficients between factor $i$ of the central cell type $k$ and the latent factor $l$ of neighboring cell type $m$. ${S}_{j}^{k,m}$ denotes the set of all cells of cell type $m$ in the neighborhood of the j-th instance of central cell type $k$ and ${N}_{j}^{k,m}$ denotes the number of these cells. Hence, NiCo averages a latent factor across all cells of a given neighboring cell type.

The ridge regression regularization parameter η is optimized through the following procedure: The initial η parameter range spans powers of 2, ranging from [2⁻¹⁰, 2⁻⁹,…, 2]¹⁰. GridSearchCV calculates the negative mean squared error using the leave-one-out cross-validation strategy to determine the optimal η parameter.

Ligand-receptor interaction analysis

To identify ligand-receptor signaling interactions as potential mediators of the inferred covariation of factors in co-localized cells, we first compiled a ligand-receptor database from available resources. We created NiCoLRdb by merging ligand-receptor pairs from Omnipath⁸⁷, NATMI⁸⁸, and CellPhoneDB^89,90. Multimeric complexes from CellPhoneDB were not included. From this database we extract ligand-receptor (LR) cognate pairs within the cell type of interest and interacting neighboring cell types included in the covariation analysis. Since only limited numbers of ligand and receptor genes are typically directly measured in the spatial data, we leverage information from the genome-wide scRNA-seq data. For a pair of interacting cell types with covarying factors inferred by NiCo, we only retain ligand-receptor pairs where the ligand and receptor genes are expressed in a minimum fraction ${f}_{{LR}}$ of cells for the interacting cell types. Furthermore, we require a minimum absolute Spearman correlation ${c}_{{LR}}$ of the ligand and receptor genes with the respective factor in the covarying cell type. To increase sensitivity, ${f}_{{LR}}$ and ${c}_{{LR}}$ can be lowered, and higher values of these thresholds select candidates with stronger associations with the covarying factors. However, it is not necessarily expected that both ligand and receptor correlate with the covarying factors. For example, the receptor could be constitutively expressed in the receiving cell type, while the ligand is induced together with a gene program associated with a factor in the sending cell type. Therefore, it is recommended to apply low cutoff values for ${f}_{{LR}}$ and ${c}_{{LR}}$ and visually inspect the metrics for each pair. For visualization, we devised a quadrilateral (rectangle) plot uniting the fraction of expressing cells and factor correlation values within a dot-plot-like visualization to highlight LR crosstalk. Four isosceles triangles form this unit, each oriented towards one of the rectangle’s edges and joined in the center. The western and eastern faces represent the fraction of cells expressing the ligand and receptor, respectively, while the northern and southern faces highlight ligand and receptor factor correlation values. Ligands are consistently mapped on the y-axis, while receptors are denoted on the x-axis.

Pathway enrichment analysis for factor-associated genes

We conducted pathway enrichment analysis using the Enrichr module within the GSEApy package⁹¹. The objective was to assess whether a predefined set of genes displayed statistical significance or relevance with respect to specific gene sets or pathways from the Enrichr library. We supplied the top positively or negatively correlated genes for a given factor as our input ‘gene list’. This gene list was then queried against various Enrichr libraries, such as ‘BioPlanet 2019’, ‘Reactome 2016’, or ‘GO Biological Process 2021’ to assess the enrichment within diverse cellular processes or pathways. Enrichr visually presents significant cellular processes as a dot plot, where the Y-axis represents these processes as a function of a combined score. The size of each dot indicates the percentage of genes shared within specific cellular processes, while the color code reflects the associated p-value. The combined score, denoted as ‘c’ is defined as the product of the logarithm of the p-value and the z-score. The p-value is calculated using the Enrichr Fisher’s exact test, while the z-score is determined through the Enrichr correction applied to the Fisher’s exact test⁹².

smFISH experiment

Sample preparation and image acquisition

Single molecule FISH by smHCR was performed on 5 μm cut, snap-frozen, methanol fixed mouse liver tissue sections from C57B/6 J males with an age of 12−16 weeks (n = 3). Coding-sequences of Clec4f (NM 016751.3), Tgfb1 (NM 011577.2), Dcn (NM 001190451.2) were used to design smFISH probes of 20 nt length. Probes were designed by Molecular Instruments (Los Angeles, United States) and the hybridization protocol was performed according to the manufacturer. Samples were imaged using an Olympus IX83 CSUW1-TS2 spinning disc confocal microscope equipped with 405, 488, 561 and 640 nm LED lasers using a 60x oil immersion objective. Image acquisition was performed with a resolution of (149.76 × 149.76) microns equivalent to 2304 × 2304 px (binning 1 × 1) and images opened with the Bio-Formats Importer and further processed using FIJI⁹³.

Detection of RNA spots and cell segmentation

We employed the STARFISH package⁹⁴ for finding the precise localization of RNA spots and harnessed the QuPath software⁹⁵ to perform the cell segmentation based on extended DAPI regions with default parameters. Our smFISH data acquisition leveraged four lasers, each capturing distinct molecular signals across different areas of tissue into four channels: Tgfb1(Red 561 nm laser), Dcn (Far-red 639 nm laser), Clec4f (Green 488 nm laser), and DAPI (Blue 405 nm laser). We initiated our analysis with a white top hat filter to remove the background noise with a masking radius of 3. Subsequently, we normalized and enhanced each channel’s intensity using the STARFISH ‘ClipPercentileToZero’ function within the percentile intensities parameter range of p min = 80 to p max = 100. To ensure the robustness of detected spots, we wrote a custom script to convert detected spot coordinates into Tif images and then systematically overlaid the Tif images of raw signals. This way we verified detected spots for a range of parameters in FIJI⁹³ for each channel across different fields of view (FOV). Once we achieved a satisfactory level of spot detection, we proceeded to select optimal parameter values. These included the minimum and maximum sigma parameters (1.5, 3), numerical sigma (10), and the intensity threshold parameter (95th percentile), facilitating spot detection across all FOVs. We saved all the detected spot centroid parameters (Xc, Yc) and the radius parameter r of the spots for further analysis.

Analysis of expression correlation in co-localized pairs of stellate cells and Kupffer cells

The cell segmentation data, derived from QuPath, were processed using the regionprops command from the scikit-image python library. This command stores the pixel coordinates of the true cell morphological region as a labeled mask. We generated discrete region masks associated with individual spots bounded in x dimension within the interval [ceil(Xc − r), floor(Xc + r)] and bounded in y dimension within [ceil(Yc − r), floor(Yc + r)]. By assessing the containment of a spot’s mask within a cell’s mask, we effectively determined the complete overlap of the spots region with the cell region. Spots failing to satisfy this criterion were excluded from further analysis. This process was iteratively applied, resulting in a tally of associated spots for each cell across the Clec4f, Dcn, and Tgfb1 gene transcripts.

To ascertain whether the cell’s identity belongs to the Kupffer cell (KC) or stellate cell population, we first selected all the cells exhibiting over 20 transcripts for either Dcn or Clec4f, along with satisfactory segmentation accuracy. The maximum number of expressed transcripts in a single cell could be over 150 in such a setting. We labeled cells as stellate cells if Dcn transcript counts exceed Clec4f transcript counts, and as KC if Clec4f transcript counts are greater than Dcn transcript counts. This soft criterion accounts for imperfect segmentation due to the difficult-to-segment shape of stellate cells with long protrusions. Cells not meeting these criteria remained unclassified. To explore spatial relationships, we constructed cell neighborhoods based on Delaunay Triangulation using the centroid positions. Complex units comprising adjacent KC and stellate cells were identified from the colocalized neighborhood. In total, 93 such complex units were discerned for three FOVs. From all complex units, we generated a scatter plot, with Dcn transcript levels from stellate cells represented on the y-axis and Tgfb1 transcripts from KC on the x-axis. Pearson correlation coefficients were computed, and linear regression was applied for quantification.

Hepatic stellate cell culture experiment

Animal work

Male C57BL/6 mice were used and all the animal experiments that were performed, were in accordance with the local and institutional regulations for the Protection of Animal Welfare (Regierung Unterfranken, Bavaria, Germany).

Isolation and cultivation of mouse hepatic stellate cells (HSCs) and real-time PCR

HSCs were isolated from 25 to 35 weeks old, male C57BL/6 mice as previously described⁹⁶. In brief, in situ perfusion with EGTA buffer and Collagenase solution was followed by a successive Collagenase and Pronase in vitro digest. HSCs were washed and enriched by non-ionic gradient centrifugation and seeded in 24 wells (6 × 10⁴ cells per well; Thermo Scientific, Waltham, United States). Prior to cell stimulation, HSCs were cultured for 72 h in DMEM (Thermo Scientific) supplemented with 10% FBS (Corning, New York, United States) and 1% penicillin–streptomycin (Thermo Scientific). To activate Tgf-β signaling, cells were washed with PBS (Thermo Scientific) and treated with recombinant mouse TGF-β1 (10 ng/ml; Biolegend, San Diego, United States), under FBS free conditions and compared with pretreated HSCs, which received recombinant mouse Decorin (10 μg/ml; R&D Systems, Minneapolis, United States) for 1 h before TGF-β1 was added. After 72 h of treatment, cells were harvested for RNA extraction, cDNA synthesis and qPCR analysis, as previously described⁹⁷. Total RNA was extracted using Arcturus PicoPure RNA Isolation Kit and cDNA was synthesized using RevertAid First Strand cDNA Synthesis kit (both Thermo Scientific) according to the manufacturer’s instructions. For qPCR data analysis, the relative gene expression values for the target mRNA were normalized to Gapdh, which was selected from a set of 3 housekeeping genes (Gapdh, Hprt, Ppia), based on the normqPCR algorithm⁹⁸. Used primers are listed in Supplementary Table 1.

Analysis of qPCR data

We performed three technical replicates and five biological replicates for the experiment, measuring the C_q (Quantification Cycle) values of Acta2, Col1a1, Lox, Pdgfrb, and Reln genes. For each biological replicate, we first averaged the technical replicates and quantified the relative expression of each gene i of interest using: $\Delta {{c}_{q}}_{i}^{j}={2}^{({{C}_{q}}_{{{\rm{Gapdh}}}}^{j}-{{C}_{q}}_{i}^{j})}$, where j represents the condition from the following groups: control (HSC), HSC+Tgfb, HSC+Dcn, and HSC+Tgfb+Dcn. We then quantified the relative gene expression over the control group: ${\Delta \Delta {c}_{q}}_{i}^{j}=\frac{{\Delta {c}_{q}}_{i}^{j}}{{\Delta {c}_{q}}_{i}^{{{\rm{control}}}}}$. Finally, we performed a two sample paired t-test between ${\Delta \Delta {c}_{q}}_{i}^{{{\rm{HSC}}}+{{\rm{Tgfb}}}}$ and ${\Delta \Delta {c}_{q}}_{i}^{{{\rm{HSC}}}+{{\rm{Tgfb}}}+{{\rm{DCN}}}}$ across the data points of biological replicates and reported the p-value in the figure.

Benchmarking NiCo cell type niche interactions with tissue niche detection methods

To compare NiCo predictions for cell-cell interactions derived from logistic regression coefficients with topological tissue niche domains predicted by alternative methods, we define two scoring schemes:

Niche composition Z-score

For a given tissue ___domain $d,$ we compute the frequency ${f}_{{dj}}$ of cell type $j$, and mean ${\mu }_{j}$ and standard deviation ${\sigma }_{j}$ of cell type $j$ across all domains. From these numbers, we compute a z-score for the enrichment of cell type $j$ in ___domain $d$:

$${Z}_{{dj}}=\frac{{f}_{{dj}}-{\mu }_{j}}{{\sigma }_{j}}$$

(10)

Cellcharter neighborhood enrichment score (CC enrichment)

As alternative scoring metric we compute the function cc.gr.enrichment from the CellCharter method. cc.gr.enrichment takes an anndata object along with ‘group_key’ and ‘label_key’ as input and computes the enrichment of label_key in group_key. The function returns the ratio of observed to expected values and indicates whether a cell type is over- or under-represented in a given tissue ___domain.

We compare the cell type composition between the ground truth (GT) domains and method-detected domains using the Z-score and the CC enrichment. In both scenarios, we compute the Pearson correlation of each of these quantities between a GT ___domain and all method-detected domains and consider the method-detected domains with the highest correlation as best match. We then compare the distribution of maximum correlations for all GT domains across methods.

Niche-detection benchmarking on Allen Brain spatial MERFISH atlas data and STARmap mouse visual cortex data

We used the default parameters for CellCharter, Banksy, SpaGCN, SpatialPCA, SeuratV5, and Stagate, except for the clustering resolution parameter or the required number of clusters. We adjusted this parameter to retrieve a number of predicted domains close to the number of GT domains (6). For the Allen Brain MERFISH data, CellCharter identified 6 clusters that maximized the stability parameter. Banksy, with the default Leiden clustering resolution of 0.9 and lambda of 0.8, identified 15 clusters, which exceeds the ground truth (GT) of 6 domains. We also tested another clustering resolution of 0.2 with a lambda of 0.8, which resulted in 9 clusters. SpaGCN, with the default Louvain parameter, identified 23 clusters, while resolutions of 0.2, 0.1, and 0.05 identified 13, 9, and 3 clusters, respectively. For SpatialPCA, we specified the clusternum parameter as 6 at the louvain_clustering step. Similarly, in SeuratV5, we set the niches.k parameter to 6 in the BuildNicheAssay step. Stagate, with the default resolution of 1, identified 22 clusters; resolution of 0.5, 0.2, and 0.1 identified 15, 9, and 4 clusters, respectively.

After adjustment of clustering resolution or cluster number, all methods identified 7 clusters for STARmap mouse visual cortex data, equal to the number of GT domains, except for CellCharter, which identified only five clusters. All methods were used with default parameters except for the clustering resolution parameter or the required number of clusters. We set the niches.k parameter to 7 for Seurat and clusternum parameter to 7 for SpatialPCA. We used Leiden clustering with a resolution of 0.5 in Stagate, a resolution of 0.9 in Banksy, and a resolution of 1 in SpaGCN. CellCharter identified 5 clusters from stability analysis that were used to define the tissue domains.

Cell-cell interaction benchmarking on Intestinal data

We benchmarked Niche-DE⁵⁶, COMMOT²¹, cellNeighborEX⁵⁸, and stLearn⁵⁷ on the intestinal data to predict ligand-receptor pairs enriched in pairs of interacting cell types. Niche-DE⁵⁶ identified 6 genes (Mptx2, Slc12a2, Maoa, Clca3b, Apob, Stmn1) between Paneth niche cells and the stem/TA index cell type, and 4 genes (Stmn1, H2-Eb1, Maoa, Slc12a2) between stem/TA niche cells and the Paneth index cell. CellNeighborEX⁵⁸ identified Mptx2, Lgr5, and Kit as cell-contact dependent differentially expressed genes between stem/TA and Paneth cells.

COMMOT²¹ using default parameters that filter LR pairs where both ligand and receptor are expressed in at least 5% of the spots, identified 0 pairs with the CellChat database and 5 pairs (Apob-Sdc1, Cadm1-Cadm1, Cd14-Itgam, Vcan-Cd44, Vim-Cd44) with the NiCoLRdb database. stLearn⁵⁷ detected five pairs of LR interactions (Vim-Cd44, Vcan-Cd44, Cadm1-Cadm1, Gzmb-Chrm3, Cd34-Sell) using the connectomeDB2020 database.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All datasets used in this paper are publicly available. Detailed preprocessing steps and cell type annotations are described in the “Dataset Description” section of the Methods. The scRNA-seq tissue data are available from the following sources: the mouse organogenesis data are available at ArrayExpress under accession number E-MTAB-6967; the mouse intestinal data are available at GEO under accession number GSE190037; the mouse liver cell atlas data (StSt) are available at https://www.livercellatlas.org/download.php; the mouse primary motor cortex data are available at https://assets.nemoarchive.org/dat-ch1nqb7.

The spatial datasets are available from the following sources: the mouse organogenesis seqFISH data are available at https://content.cruk.cam.ac.uk/jmlab/SpatialMouseAtlas2020/; the mouse small intestine MERFISH data are available at https://datadryad.org/stash/dataset/doi:10.5061/dryad.jm63xsjb2; the mouse liver MERFISH data are available at the Vizgen MERSCOPE website https://info.vizgen.com/mouse-liver-access/; the mouse primary motor cortex (coronal slice 153, mouse1) MERFISH data aren available at https://download.brainimagelibrary.org/29/3c/293cc39ceea87f6d/processed_data/. The cerebellum Slide-seqV2 data “myRCTD_cer_reps.rds” file was obtained using the scripts at https://raw.githack.com/dmcable/spacexr/master/AnalysisCSIDE/Figures/figure3.html and converted into a scanpy object that is available under the NiCo tutorial link https://nico-sc-sp.readthedocs.io/en/latest/tutorial2.html. The visual cortex STARmap data are available at https://github.com/prabhakarlab/Banksy_py/tree/main/data/starmap. The MERFISH Brain Atlas data C57BL6J-638850-raw.h5ad were downloaded from https://alleninstitute.github.io/abc_atlas_access/descriptions/MERFISH-C57BL6J-638850.html. The intestinal stem cell lineage scRNA-seq data are available at GEO under accession number [GSE94092]. The Supplementary Fig. source data are available at https://github.com/ankitbioinfo/nico_tutorial/tree/main/Supplementary_data. The processed source data, simulation datasets, benchmark data, and experimental data presented in the main figures are available at https://zenodo.org/records/13895622.

Code availability

The open-source code for NiCo is available at a PyPI repository (https://pypi.org/project/nico-sc-sp/) and on Zenodo (https://zenodo.org/records/13902981)⁹⁹. We uploaded all codes and scripts used for this study’s analyses and figure plotting functions to the NiCo package. The documentation, usage instructions and other relevant information related to NiCo can be accessed through the tutorial (https://github.com/ankitbioinfo/nico_tutorial) and documentation (https://nico-sc-sp.readthedocs.io/en/latest/).

References

Stein-O’Brien, G. L., Ainslie, M. C. & Fertig, E. J. Forecasting cellular states: from descriptive to predictive biology via single-cell multiomics. Curr. Opin. Syst. Biol. 26, 24–32 (2021).
Article PubMed PubMed Central Google Scholar
Sagar & Grün, D. Deciphering cell fate decision by integrated single-cell sequencing analysis. Annu. Rev. Biomed. data Sci. 3, 1–22 (2020).
Article CAS PubMed PubMed Central Google Scholar
Leyes Porello, E. A., Trudeau, R. T. & Lim, B. Transcriptional bursting: stochasticity in deterministic development. Development 150, dev201546 (2023).
Velten, B. & Stegle, O. Principles and challenges of modeling temporal and spatial omics data. Nature Methods 20, 1462–1474 (2023).
Grün, D. Revealing dynamics of gene expression variability in cell state space. Nat. methods 17, 45–49 (2020).
Article PubMed Google Scholar
Rosales-Alvarez, R. E. et al. VarID2 quantifies gene expression noise dynamics and unveils functional heterogeneity of ageing hematopoietic stem cells. Genome Biol. 24, 1–30 (2023).
Article Google Scholar
Vallejos, C. A., Marioni, J. C. & Richardson, S. BASiCS: Bayesian analysis of single-cell sequencing data. PLoS computational Biol. 11, e1004333 (2015).
Article ADS Google Scholar
Browaeys, R., Saelens, W. & Saeys, Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat. methods 17, 159–162 (2020).
Article CAS PubMed Google Scholar
Hu, Y., Peng, T., Gao, L. & Tan, K. CytoTalk: De novo construction of signal transduction networks using single-cell transcriptomic data. Sci. Adv. 7, eabf1356 (2021).
Article ADS CAS PubMed PubMed Central Google Scholar
Williams, C. G., Lee, H. J., Asatsuma, T., Vento-Tormo, R. & Haque, A. An introduction to spatial transcriptomics for biomedical research. Genome Med. 14, 1–18 (2022).
Article Google Scholar
Ståhl, P. L. et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353, 78–82 (2016).
Article ADS PubMed Google Scholar
Chen, A. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell 185, 1777–1792 (2022).
Article CAS PubMed Google Scholar
Cho, C.-S. et al. Microscopic examination of spatial transcriptome using Seq-Scope. Cell 184, 3559–3572 (2021).
Article CAS PubMed PubMed Central Google Scholar
Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).
Article PubMed PubMed Central Google Scholar
Shah, S., Lubeck, E., Zhou, W. & Cai, L. In situ transcription profiling of single cells reveals spatial organization of cells in the mouse hippocampus. Neuron 92, 342–357 (2016).
Article CAS PubMed PubMed Central Google Scholar
Ke, R. et al. In situ sequencing for RNA analysis in preserved tissue and cells. Nat. methods 10, 857–860 (2013).
Article CAS PubMed Google Scholar
Lee, J. H. et al. Fluorescent in situ sequencing (FISSEQ) of RNA for gene expression profiling in intact cells and tissues. Nat. Protoc. 10, 442–458 (2015).
Article CAS PubMed PubMed Central Google Scholar
Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361, eaat5691 (2018).
Article PubMed PubMed Central Google Scholar
Svensson, V., Teichmann, S. A. & Stegle, O. SpatialDE: identification of spatially variable genes. Nat. methods 15, 343–346 (2018).
Article CAS PubMed PubMed Central Google Scholar
Cang, Z. & Nie, Q. Inferring spatial and signaling relationships between cells from single cell transcriptomic data. Nat. Commun. 11, 2084 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Cang, Z. et al. Screening cell–cell communication in spatial transcriptomics via collective optimal transport. Nat. Methods 20, 218–228 (2023).
Article CAS PubMed PubMed Central Google Scholar
Tanevski, J., Flores, R. O. R., Gabor, A., Schapiro, D. & Saez-Rodriguez, J. Explainable multiview framework for dissecting spatial relationships from highly multiplexed data. Genome Biol. 23, 1–31 (2022).
Article Google Scholar
Yuan, Y. & Bar-Joseph, Z. GCNG: graph convolutional networks for inferring gene interaction from spatial transcriptomics data. Genome Biol. 21, 1–16 (2020).
Article Google Scholar
Fischer, D. S., Schaar, A. C. & Theis, F. J. Modeling intercellular communication in tissues using spatial graphs of cells. Nat. Biotechnol. 41, 332–336 (2023).
Article CAS PubMed Google Scholar
Welch, J. D. et al. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 177, 1873–1887 (2019).
Article CAS PubMed PubMed Central Google Scholar
Yang, Z. & Michailidis, G. A non-negative matrix factorization method for detecting modules in heterogeneous omics multi-modal data. Bioinformatics 32, 1–8 (2016).
Article PubMed Google Scholar
Kleshchevnikov, V. et al. Cell2location maps fine-grained cell types in spatial transcriptomics. Nat. Biotechnol. 40, 661–671 (2022).
Article CAS PubMed Google Scholar
Biancalani, T. et al. Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nat. methods 18, 1352–1362 (2021).
Article PubMed PubMed Central Google Scholar
Mages, S. et al. TACCO unifies annotation transfer and decomposition of cell identities for single-cell and spatial omics. Nat. Biotechnol. 41, 1465–1473 (2023).
Article CAS PubMed PubMed Central Google Scholar
Cao, K., Gong, Q., Hong, Y. & Wan, L. A unified computational framework for single-cell data integration with optimal transport. Nat. Commun. 13, 7419 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Petukhov, V. et al. Cell segmentation in imaging-based spatial transcriptomics. Nat. Biotechnol. 40, 345–354 (2022).
Article CAS PubMed Google Scholar
Zhang, M. et al. Spatially resolved cell atlas of the mouse primary motor cortex by MERFISH. Nature 598, 137–143 (2021).
Article ADS CAS PubMed PubMed Central Google Scholar
Lohoff, T. et al. Integration of spatial and single-cell transcriptomic data elucidates mouse organogenesis. Nat. Biotechnol. 40, 74–85 (2022).
Article CAS PubMed Google Scholar
Dong, K. & Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat. Commun. 13, 1739 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Hao, Y. et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat. Biotechnol. 42, 293–304 (2024).
Article CAS PubMed Google Scholar
Varrone, M., Tavernari, D., Santamaria-Martínez, A., Walsh, L. A. & Ciriello, G. CellCharter reveals spatial cell niches associated with tissue remodeling and cell plasticity. Nat. Genet. 56, 74–84 (2024).
Article CAS PubMed Google Scholar
Hu, J. et al. SpaGCN: Integrating gene expression, spatial ___location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat. methods 18, 1342–1351 (2021).
Article PubMed Google Scholar
Shang, L. & Zhou, X. Spatially aware dimension reduction for spatial transcriptomics. Nat. Commun. 13, 7203 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Singhal, V. et al. BANKSY unifies cell typing and tissue ___domain segmentation for scalable spatial omics data analysis. Nature Genetics 1–11 (2024).
Yao, Z. et al. A high-resolution transcriptomic and spatial atlas of cell types in the whole mouse brain. Nature 624, 317–332 (2023).
Article ADS CAS PubMed PubMed Central Google Scholar
Pijuan-Sala, B. et al. A single-cell molecular map of mouse gastrulation and early organogenesis. Nature 566, 490–495 (2019).
Article ADS CAS PubMed PubMed Central Google Scholar
Qiu, C. et al. Systematic reconstruction of cellular trajectories across mouse embryogenesis. Nat. Genet. 54, 328–341 (2022).
Article CAS PubMed PubMed Central Google Scholar
Qu, F. et al. Three-dimensional molecular architecture of mouse organogenesis. Nat. Commun. 14, 4599 (2023).
Article ADS CAS PubMed PubMed Central Google Scholar
Kelly, R. G. & Evans, S. M. The Second Heart Field. in Heart Development and Regeneration 143–169 (Academic Press, 2010).
Kuhn, E. N. & Wu, S. M. Origin of cardiac progenitor cells in the developing and postnatal heart. J. Cell. Physiol. 225, 321–325 (2010).
Article CAS PubMed PubMed Central Google Scholar
Palpant, N. J. et al. Transmembrane protein 88: a Wnt regulatory protein that specifies cardiomyocyte development. Development 140, 3799–3808 (2013).
Article CAS PubMed PubMed Central Google Scholar
Cai, C.-L. et al. Isl1 identifies a cardiac progenitor population that proliferates prior to differentiation and contributes a majority of cells to the heart. Developmental cell 5, 877–889 (2003).
Article CAS PubMed PubMed Central Google Scholar
Zhuang, S. et al. Expression of Isl1 during mouse development. Gene Expr. Patterns 13, 407–412 (2013).
Article CAS PubMed PubMed Central Google Scholar
Kelly, R. G., Buckingham, M. E. & Moorman, A. F. Heart fields and cardiac morphogenesis. Cold Spring Harb. Perspectives Med. 4, a015750 (2014).
Bruneau, B. G. The developmental genetics of congenital heart disease. Nature 451, 943–948 (2008).
Article ADS CAS PubMed Google Scholar
Yu, P. et al. Cardiomyocyte-restricted high-mobility group box 1 (HMGB1) deletion leads to small heart and glycolipid metabolic disorder through GR/PGC-1α signalling. Cell death Discov. 6, 106 (2020).
Article CAS PubMed PubMed Central Google Scholar
Beumer, J. & Clevers, H. Regulation and plasticity of intestinal stem cells during homeostasis and regeneration. Development 143, 3639–3649 (2016).
Article CAS PubMed Google Scholar
Böttcher, A. et al. Non-canonical Wnt/PCP signalling regulates intestinal stem cell lineage priming towards enteroendocrine and Paneth cell fates. Nat. cell Biol. 23, 23–31 (2021).
Article PubMed Google Scholar
Niec, R. E. et al. Lymphatics act as a signaling hub to regulate intestinal stem cell activity. cell stem cell 29, 1067–1082 (2022).
Article CAS PubMed PubMed Central Google Scholar
Flanagan, D. J. et al. Frizzled7 functions as a Wnt receptor in intestinal epithelial Lgr5+ stem cells. Stem cell Rep. 4, 759–767 (2015).
Article CAS Google Scholar
Mason, K. et al. Niche-DE: niche-differential gene expression analysis in spatial transcriptomics data identifies context-dependent cell-cell interactions. Genome Biol. 25, 14 (2024).
Article CAS PubMed PubMed Central Google Scholar
Pham, D. et al. Robust mapping of spatiotemporal trajectories and cell–cell interactions in healthy and diseased tissues. Nat. Commun. 14, 7739 (2023).
Article ADS CAS PubMed PubMed Central Google Scholar
Kim, H. et al. CellNeighborEX: deciphering neighbor-dependent gene expression from spatial transcriptomics data. Mol. Syst. Biol. 19, e11670 (2023).
Article CAS PubMed PubMed Central Google Scholar
Michalopoulos, G. K. & Bhushan, B. Liver regeneration: biological and pathological mechanisms and implications. Nat. Rev. Gastroenterol. Hepatol. 18, 40–55 (2021).
Article PubMed Google Scholar
Arroyo, V. et al. Acute-on-chronic liver failure in cirrhosis. Nat. Rev. Dis. Prim. 2, 1–18 (2016).
Google Scholar
Halpern, K. B. et al. Single-cell spatial reconstruction reveals global division of labour in the mammalian liver. Nature 542, 352–356 (2017).
Article ADS CAS PubMed PubMed Central Google Scholar
Aizarani, N. et al. A human liver cell atlas reveals heterogeneity and epithelial progenitors. Nature 572, 199–204 (2019).
Article CAS PubMed PubMed Central Google Scholar
Guilliams, M. et al. Spatial proteogenomics reveals distinct and evolutionarily conserved hepatic macrophage niches. Cell 185, 379–396 (2022).
Article CAS PubMed PubMed Central Google Scholar
Bonnardel, J. et al. Stellate cells, hepatocytes, and endothelial cells imprint the Kupffer cell identity on monocytes colonizing the liver macrophage niche. Immunity 51, 638–654 (2019).
Article CAS PubMed PubMed Central Google Scholar
Friedman, S. L. Hepatic stellate cells: protean, multifunctional, and enigmatic cells of the liver. Physiological Rev. 88, 125–172 (2008).
Article CAS Google Scholar
Yang, W. et al. Single-cell transcriptomic analysis reveals a hepatic stellate cell–activation roadmap and myofibroblast origin during liver fibrosis in mice. Hepatology 74, 2774–2790 (2021).
Article CAS PubMed Google Scholar
Nguyen-Lefebvre, A. T. & Horuzsko, A. Kupffer cell metabolism and function. J. Enzymol. Metaboli. 1, 101 (2015).
Wang, Y., Li, J., Wang, X., Sang, M. & Ho, W. Hepatic stellate cells, liver innate immunity, and hepatitis C virus. J. Gastroenterol. Hepatol. 28, 112–115 (2013).
Article CAS PubMed PubMed Central Google Scholar
Baghy, K., Iozzo, R. V. & Kovalszky, I. Decorin–TGFβ axis in hepatic fibrosis and cirrhosis. J. Histochemistry Cytochemistry 60, 262–268 (2012).
Article CAS Google Scholar
Dewidar, B., Meyer, C., Dooley, S. & Meindl-Beinker, N. TGF-β in hepatic stellate cell activation and liver fibrogenesis—updated 2019. Cells 8, 1419 (2019).
Article CAS PubMed PubMed Central Google Scholar
Li, Y., Fan, W., Link, F., Wang, S. & Dooley, S. Transforming growth factor β latency: a mechanism of cytokine storage and signalling regulation in liver homeostasis and disease. JHEP Rep. 4, 100397 (2022).
Article PubMed Google Scholar
Shah, S. et al. Single-molecule RNA detection at depth by hybridization chain reaction and tissue hydrogel embedding and clearing. Development 143, 2862–2867 (2016).
Article CAS PubMed PubMed Central Google Scholar
Cable, D. M. et al. Cell type-specific inference of differential expression in spatial transcriptomics. Nat. methods 19, 1076–1087 (2022).
Article CAS PubMed PubMed Central Google Scholar
Bellamy, T. C. Interactions between Purkinje neurones and Bergmann glia. cerebellum 5, 116–126 (2006).
Article PubMed Google Scholar
Metea, M. R. & Newman, E. A. Calcium signaling in specialized glial cells. Glia 54, 650–655 (2006).
Article PubMed PubMed Central Google Scholar
Raj, A. & Van Oudenaarden, A. Nature, nurture, or chance: stochastic gene expression and its consequences. Cell 135, 216–226 (2008).
Article CAS PubMed PubMed Central Google Scholar
Esteban-Martínez, L. & Torres, M. Metabolic regulation of cell competition. Developmental Biol. 475, 30–36 (2021).
Article Google Scholar
Veličković, P. et al. Graph attention networks. arXiv preprint https://doi.org/10.48550/arXiv.1710.10903 (2017).
Yao, Z. et al. A transcriptomic and epigenomic cell atlas of the mouse primary motor cortex. Nature 598, 103–110 (2021).
Article ADS CAS PubMed PubMed Central Google Scholar
Cable, D. M. et al. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat. Biotechnol. 40, 517–526 (2022).
Article CAS PubMed Google Scholar
Haghverdi, L., Lun, A. T., Morgan, M. D. & Marioni, J. C. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36, 421–427 (2018).
Article CAS PubMed PubMed Central Google Scholar
Hafemeister, C. & Satija, R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol. 20, 296 (2019).
Article CAS PubMed PubMed Central Google Scholar
Plimpton, S. Fast parallel algorithms for short-range molecular dynamics. J. computational Phys. 117, 1–19 (1995).
Article ADS CAS Google Scholar
Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).
Article CAS PubMed PubMed Central Google Scholar
Lee, D. D. & Seung, H. S. Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791 (1999).
Article ADS CAS PubMed Google Scholar
Kotliar, D. et al. Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-Seq. Elife 8, e43803 (2019).
Article PubMed PubMed Central Google Scholar
Türei, D., Korcsmáros, T. & Saez-Rodriguez, J. OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nat. methods 13, 966–967 (2016).
Article PubMed Google Scholar
Hou, R., Denisenko, E., Ong, H. T., Ramilowski, J. A. & Forrest, A. R. Predicting cell-to-cell communication networks using NATMI. Nat. Commun. 11, 5011 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Efremova, M., Vento-Tormo, M., Teichmann, S. A. & Vento-Tormo, R. CellPhoneDB: inferring cell–cell communication from combined expression of multi-subunit ligand–receptor complexes. Nat. Protoc. 15, 1484–1506 (2020).
Article CAS PubMed Google Scholar
Garcia-Alonso, L. et al. Single-cell roadmap of human gonadal development. Nature 607, 540–547 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Fang, Z., Liu, X. & Peltz, G. GSEApy: a comprehensive package for performing gene set enrichment analysis in Python. Bioinformatics 39, btac757 (2023).
Article CAS PubMed Google Scholar
Chen, E. Y. et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinforma. 14, 1–14 (2013).
Article Google Scholar
Schindelin, J. et al. Fiji: an open-source platform for biological-image analysis. Nat. methods 9, 676–682 (2012).
Article CAS PubMed Google Scholar
Axelrod, S. et al. starfish: scalable pipelines for image-based transcriptomics. J. Open Source Softw. 6, 2440 (2021).
Article ADS Google Scholar
Bankhead, P. et al. QuPath: Open source software for digital pathology image analysis. Sci. Rep. 7, 1–7 (2017).
Article CAS Google Scholar
Mederacke, I., Dapito, D. H., Affò, S., Uchinami, H. & Schwabe, R. F. High-yield and high-purity isolation of hepatic stellate cells from normal and fibrotic mouse livers. Nat. Protoc. 10, 305–315 (2015).
Article CAS PubMed PubMed Central Google Scholar
Thomann, S. et al. YAP orchestrates heterotypic endothelial cell communication via HGF/c-MET signaling in liver tumorigenesis. Cancer Res. 80, 5502–5514 (2020).
Article CAS PubMed Google Scholar
Perkins, J. R. et al. ReadqPCR and NormqPCR: R packages for the reading, quality checking and normalisation of RT-qPCR quantification cycle (Cq) data. BMC genomics 13, 1–8 (2012).
Article Google Scholar
Agrawal, A., Thomann, S., Basu, S. & Dominic, G. NiCo identifies extrinsic drivers of cell state modulation by niche covariation analysis. https://doi.org/10.5281/zenodo.13902981 (2024).

Download references

Acknowledgements

We thank the members of the Grün lab for their advice and encouragement throughout this project. We also thank Julio Saez-Rodriguez, Jovan Tanevski, Moritz Lampert and Ingo Scholtes for helpful discussions and suggestions. We thank Carlos Talavera-Lopéz for critical reading and feedback on the manuscript. This work was supported by the German Research Foundation (DFG) (SPP1937 GA 2129/2-2, SFB1425 Project #422681845 and INST 93/1072-1 Project #471222118, to D.G.), by the CZI Seed Networks for the Human Cell Atlas (to D.G.), by the ERC (818846 - ImmuNiche - ERC-2018-COG, to D.G.), and by the Bundesministerium für Bildung und Forschung (BMBF) (TissueNet − 031L0311A and CureFib − 01EJ2201C, to D.G.).

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Würzburg Institute of Systems Immunology, Julius‐Maximilians-Universität Würzburg, Würzburg, Germany
Ankit Agrawal, Stefan Thomann, Sukanya Basu & Dominic Grün
CAIDAS - Center for Artificial Intelligence and Data Science, Würzburg, Germany
Dominic Grün

Authors

Ankit Agrawal
View author publications
Search author on:PubMed Google Scholar
Stefan Thomann
View author publications
Search author on:PubMed Google Scholar
Sukanya Basu
View author publications
Search author on:PubMed Google Scholar
Dominic Grün
View author publications
Search author on:PubMed Google Scholar

Contributions

D.G. conceived the project and proposed the algorithm. D.G. and A.A. designed the algorithm. A.A. implemented the algorithm, collected and analyzed all data. S.B. performed in vitro culture experiments. S.T. carried out smFISH experiments. D.G. and A.A. wrote the manuscript. D. G. supervised the project. All authors reviewed the manuscript.

Corresponding author

Correspondence to Dominic Grün.

Ethics declarations

Competing interests

D.G. serves on the scientific advisory board of Gordian Biotechnology. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Transparent Peer Review file

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Agrawal, A., Thomann, S., Basu, S. et al. NiCo identifies extrinsic drivers of cell state modulation by niche covariation analysis. Nat Commun 15, 10628 (2024). https://doi.org/10.1038/s41467-024-54973-w

Download citation

Received: 17 August 2024
Accepted: 22 November 2024
Published: 05 December 2024
DOI: https://doi.org/10.1038/s41467-024-54973-w

This article is cited by

Decoding cell–cell communication using spatial transcriptomics
- Ankit Agrawal
Nature Reviews Genetics (2025)