Abstract
Differentiation of induced pluripotent stem cells (iPSCs) into specialized cell types is essential for uncovering cell-type specific molecular mechanisms and interrogating cellular function. Transcription factor screens have enabled efficient production of a few cell types; however, engineering cell types that require complex transcription factor combinations remains challenging. Here, we report an iterative, high-throughput single-cell transcription factor screening method that enables the identification of transcription factor combinations for specialized cell differentiation, which we validated by differentiating human microglia-like cells. We found that the expression of six transcription factors, SPI1, CEBPA, FLI1, MEF2C, CEBPB, and IRF8, is sufficient to differentiate human iPSC into cells with transcriptional and functional similarity to primary human microglia within 4 days. Through this screening method, we also describe a novel computational method allowing the exploration of single-cell RNA sequencing data derived from transcription factor perturbation assays to construct causal gene regulatory networks for future cell fate engineering.
Introduction
Recent advances and applications of single-cell assays, exemplified by collaborative efforts such as the Human Cell Atlas (HCA)1, have begun to provide a comprehensive view of cell types and cellular states within the human body. Such maps are crucial for understanding tissue heterogeneity during human development and disease. Furthermore, these maps can be mined for promising targets for cell fate engineering, with significant implications for disease modeling, cell therapy, and regenerative medicine. Previously, our lab generated the first comprehensive library of 1732 human transcription factor (TF) isoforms and developed an unbiased screening approach to discover 290 TFs that differentiate human induced pluripotent stem cells (iPSCs) into various functional cell types2. This approach illustrates the power of unbiased screening and led to the discovery of differentiation protocols for several cell types; however, creating a user-specified cell type of interest remains an open area for innovation. To this end, we developed a novel screening platform that enables the identification of TF combinations for differentiating iPSCs into any given cell type using a combination of pooled transfections of a barcoded TF library, single-cell RNA sequencing (scRNA-seq) with TF ranking (Fig. 1a), and subsequent validation and characterization. As proof of principle, we sought to determine TF combinations capable of producing microglia in an efficient, rapid, and reproducible fashion and perform a comparative analysis to other differentiation methods.
a Workflow of the first pooled TF screen. Two independent transductions were performed for the first pooled TF screen. This panel is generated using BioRender under an academic license. b Flow cytometry analysis of stem cell (TRA-1-60) and microglia (P2RY12, CD11b, CX3CR1) proteins in the PGP1 + 40 TF pool before and after Dox induction from two independent pooled TF transductions. c Representative image of cells with low TRA-1-60 expression in the Dox+ group to be sorted for scRNA-seq. d UMAP clustering of two independently transfected and differentiated PGP1 iPSC pools. Colors represent clusters identified by Seurat at 0.3 resolution. e Expression of microglia (ITGAM, CX3CR1, TMEM119, P2RY12, TREM2) and spiked-in stem cell (POU5F1) genes in scRNA-seq. f Primer designs for co-amplification of TF and cell barcodes in 10x Genomics 3’ workflow. This panel is generated using BioRender under an academic license. g Number of TFs per cell counted from normalized and binarized TF expression matrix. h Ranking of the 40 TFs after one-sided Wilcoxon rank sum test with the two tested groups being with or without microglia gene expression. Blue highlights top-ranking TFs. i Flow cytometry validation of a single transduction of top-ranking TFs for inducing microglia protein expression. C = CEBPA, F = FLI1, S = SPI1. “Pool” refers to simultaneous transfection of multiple single-TF plasmids, no polycistronic cassette used. All source data are provided as a Source Data file. Raw and processed data are provided in GEO accession GSE287847.
Research on microglia has long relied on rodents, immortalized cell lines, or the isolation of primary microglia from human brain biopsies. However, many critical transcriptomic and functional differences exist between rodent and human microglia, making rodent models unsuitable for disease modeling and drug testing3,4,5. Using small molecules to differentiate iPSCs into microglia has greatly improved the availability of human microglia for in vitro studies. However, these protocols involve formulating complex cocktails of small molecules and growth factors, require extended differentiation periods, and sometimes necessitate co-culturing with neurons6,7,8,9,10,11,12,13,14. Current TF-based microglia differentiation protocols are largely based on literature searches and have thus been restricted to a limited set of known TFs15,16. In contrast, our method systematically and iteratively tests a wide array of TF candidates to identify an optimal set of TFs that effectively yields a desired cell identity. Additionally, variability in microglia yield and quality across different iPSC lines remains a significant bottleneck. While our previous studies have demonstrated the reproducibility of this approach across three distinct iPSC lines2, we have chosen to use a single founder line for screening and TF identification to enhance efficiency.
In this report, we conduct two sequential iterations of pooled TF screening. Each screening round involves creating a barcoded TF library, performing pooled transfection into iPSCs to induce differentiation, and subsequently analyzing the results with scRNA-seq. By ranking TFs based on their ability to drive microglial gene expression, we identify and validate a combination of six TFs. This approach captures the previously reported TFs SPI1, CEBPA, and IRF8 while also identifying FLI1, MEF2C, and CEBPB. Overexpressing these six TFs in iPSCs enables generation of microglia-like cells (TFiMGLs) within four days without compromising cell identity. In addition to shared transcriptional and molecular signatures, we confirm that these cells exhibit key functional features of human primary microglia and other TF-based microglia. Notably, we achieve this differentiation process in standard culture media without the need for additional factors, offering a potential avenue for defining media compositions for further microglia specification in the future. Our barcoding and amplification strategy allows for high-throughput, simultaneous detection of single-cell gene expression and TF barcodes, which provides a template for others seeking to discover TFs for cell fate programming. Finally, we use the TF-perturbation data we generated in a stepwise regression model to construct gene regulatory networks (GRNs) from pooled TF screens. This perturbation-based approach to the construction of regulatory relationships offers insight distinct from that of inferential GRNs, thus providing a computational framework for using high-throughput screening data to better understand regulatory relationships for future cell engineering.
Results
First round of pooled screening identified initial TFs for inducing microglia gene expression
To identify TFs that differentiate iPSCs to microglia, we first surveyed previous literature on microglial development17,18,19,20,21, epigenetic and transcriptomic patterns3,22,23,24,25, and gene regulatory networks26 and shortlisted 40 TFs for the first pooled TF screen (Supplementary Data S1). To ensure genomic integration of several TF copies, we cloned each TF into the pBAN2 vector for integration with PiggyBac transposase and doxycycline (Dox)-inducible expression. Moreover, to distinguish between exogenous and endogenous TF transcripts, we added a 20-nucleotide (nt) barcode between the stop codon and the poly-A sequence of each TF (Supplementary Fig. S1). We then transfected the 40 TF vectors into 600,000 hiPSCs from a healthy donor (PGP1) in duplicate with a mass ratio of 4:1 between TF and transposase DNA. We added different amounts of DNA during TF nucleofection and quantified copy numbers of PiggyBac vector genomic integration. We determined that a DNA dose of 5 µg was most consistent for integrating a single-digit copy number of at least 5 TFs per cell (Supplementary Fig. S2a, b). After puromycin selection for TF-integrated cells, we induced differentiation by Dox treatment for four days. Fluorescence-activated cell sorting (FACS) of two separate differentiations showed that 0.3–0.5% of the cells expressed the consensus microglial surface proteins CX3CR1, P2RY12, and CD11b (Fig. 1b). In addition, we found that 30% of the cells lost expression of the stem cell marker TRA-1-60 (Fig. 1c). To pinpoint which of the 40 TF(s) induced microglial gene expression, we sorted all differentiated cells (TRA-1-60 negative) for scRNA-seq. Then, we performed scRNA-seq on 10,000 cells, reasoning that since not all TF combinations are divergent or functionally unique, this quantity would provide sufficient information for the initial screen and TF prioritization (Fig. 1a). We performed two independent transfections (Supplementary Fig. S3) and included a 10% spike-in of non-induced hiPSCs in each replicate as an undifferentiated control during scRNA-seq. After observing a consistent differentiation pattern between the two replicates (Fig. 1d), we pooled the data for downstream analysis. We observed expression of microglia genes (ITGAM, P2RY12, CX3CR1, TMEM119, TREM2) in cells that separated from a cluster with high expression of POU5F1, a marker for stem cells (Fig. 1d, e). We quantified the expression of exogenous TFs through amplicon sequencing of co-amplified TF and cell barcodes from cDNAs (Fig. 1f and Supplementary Fig. S4). An average of 6.9 TFs (median 6, first quartile 4, third quartile 9) were expressed per cell, while 877 (8.5%) out of 10,285 single cells lacked TF expression, consistent with the 10% stem cell spike-in (Fig. 1g).
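The per-cell TF count summarized above can be illustrated with a minimal sketch. It assumes a cells × TFs matrix of normalized barcode values and a hypothetical detection cutoff (`threshold`); the text does not state the cutoff used, and the toy matrix below is illustrative only, not data from the study.

```python
from statistics import mean, median

def tfs_per_cell(norm_matrix, threshold=0.0):
    """Binarize a cells x TFs matrix and count detected TFs per cell.

    `threshold` is a hypothetical cutoff; the text does not specify one.
    """
    return [sum(1 for value in row if value > threshold) for row in norm_matrix]

# Toy data (illustrative only): 4 cells x 5 TF barcodes.
toy = [
    [0.0, 1.2, 0.0, 0.8, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],  # no TF barcode, e.g. a spiked-in hiPSC
    [2.1, 0.4, 0.9, 0.0, 1.5],
    [0.0, 0.0, 3.3, 0.0, 0.0],
]
counts = tfs_per_cell(toy)
no_tf_fraction = counts.count(0) / len(counts)
print(counts, mean(counts), median(counts), no_tf_fraction)

# Consistency check on the reported numbers: 877 of 10,285 cells without TFs.
reported_no_tf_pct = round(100 * 877 / 10285, 1)
print(reported_no_tf_pct)
```

The last line reproduces the 8.5% figure quoted in the text from the reported cell counts.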
We compared TF expression levels in cells with or without microglial RNA expression and identified three TFs (SPI1, FLI1, and CEBPA) as the most likely to induce microglial gene expression (Fig. 1h). SPI1, which encodes the PU.1 protein, is a known TF required for microglia development20,21. CEBPA is a known critical regulator of myeloid differentiation27. Although FLI1 has not previously been reported to drive microglial differentiation, it has been reported to interact with TFs RUNX128 and SPI129, which are both indispensable for tissue-resident macrophage development30. We further tested whether these TFs could lead to microglial differentiation individually or in combination. Individual expression of CEBPA and FLI1 in hiPSCs led to almost complete cell death observed by microscopy, while expression of SPI1 alone was insufficient for differentiation, as only 3% of all cells showed CD11b induction (Supplementary Fig. S5), indicating the need for multiple TFs for microglia differentiation. Thus, we tested various combinations of these TFs. Pooled transfection of CEBPA + FLI1 (“C + F pool”) and CEBPA + SPI1 (“C + S pool”) led to improved microglial marker expression, while CEBPA + FLI1 + SPI1 (“C + F + S pool”) produced the most positive cells, reaching 14% CD11b+ and 54% P2RY12+ after four days (Fig. 1i, Supplementary Fig. S6). However, we did not observe expression of CX3CR1, a chemokine receptor important for microglia activation and migration31,32. We hypothesized that this could be due to incomplete integration of different TFs. Therefore, to ensure integration and expression of all three TFs, we generated polycistronic expression cassettes by linking the TFs with 2A peptides. Previous studies reported that gene position in the cassette affects relative expression levels, with the first gene being the most highly expressed33,34,35. Therefore, we arranged the TFs in different orders (Supplementary Fig. S7).
We named each construct by assigning letters based on the order of TFs on the plasmid. For example, SPI1-T2A-FLI1-P2A-CEBPA was named “MG3.1-SFC”. Transfection and induction of the MG3.1-CFS and FCS cassettes both led to almost complete cell death by day 4, consistent with the previous observation that expression of CEBPA or FLI1 alone caused cell death. MG3.1-SFC, which had SPI1 positioned at the front, produced CD11b+ and P2RY12+ cells (37% and 6% of cells, respectively), but no CX3CR1+ cells (Fig. 1i). The difference in CD11b and P2RY12 expression between cells derived using the MG3.1-SFC cassette and the C + F + S pool is potentially due to different TF dosages. While individual cells within the C + F + S pool may have expressed variable dosage combinations of the three TFs, MG3.1-SFC likely induced a fixed dosage ratio for all cells. Critically, the lack of CX3CR1 expression and the low percentage of CD11b- and P2RY12-positive cells from all 3-TF conditions indicated that additional TFs were needed for efficient microglia differentiation from hiPSCs.
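The TF ranking used in this screen (a one-sided Wilcoxon rank sum test comparing cells with and without microglial gene expression, Fig. 1h) can be sketched as follows. This is a minimal standard-library illustration, not the authors' pipeline: it uses the normal approximation with tie correction, and the names `rank_sum_greater`, `rank_tfs`, `TF_A`, and `TF_B` as well as the toy values are hypothetical.

```python
from statistics import NormalDist

def rank_sum_greater(x, y):
    """One-sided Wilcoxon rank-sum (Mann-Whitney U) p-value for 'x tends to exceed y'.

    Normal approximation with tie correction, no continuity correction.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        return 1.0
    n = n1 + n2
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * n
    tie_term = 0.0
    start = 0
    while start < n:
        end = start
        while end + 1 < n and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # average rank for a block of ties
        for k in range(start, end + 1):
            ranks[pooled[k][1]] = avg_rank
        t = end - start + 1
        tie_term += t**3 - t
        start = end + 1
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return 1.0
    z = (u - n1 * n2 / 2) / var**0.5
    return 1 - NormalDist().cdf(z)

def rank_tfs(tf_expr, expresses_marker):
    """Rank TFs by p-value; tf_expr maps TF name -> per-cell barcode values."""
    pvals = {}
    for tf, values in tf_expr.items():
        pos = [v for v, m in zip(values, expresses_marker) if m]
        neg = [v for v, m in zip(values, expresses_marker) if not m]
        pvals[tf] = rank_sum_greater(pos, neg)
    return sorted(pvals.items(), key=lambda item: item[1])

# Toy data (illustrative only): 8 cells, the first 4 express a microglia gene.
labels = [True, True, True, True, False, False, False, False]
toy_expr = {"TF_A": [5, 6, 7, 8, 1, 2, 3, 4], "TF_B": [1, 5, 2, 6, 3, 7, 4, 8]}
ranking = rank_tfs(toy_expr, labels)
print(ranking)
```

In this toy example `TF_A` ranks first because its barcode values are consistently higher in marker-expressing cells. For real data, a library routine with exact small-sample handling would be preferable to this approximation.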
Second iteration of pooled TF screen using MG3.1-SFC as baseline identified additional TFs for improved microglia differentiation
To identify TFs to supplement SPI1, FLI1, and CEBPA, we pursued a second iteration of our screen. We used the top three TFs from the first iteration as a baseline and tested the addition of other TFs (3 + X) (Fig. 2a). To determine what TFs should be included in the second pool, we performed bulk RNA-seq analysis of MG3.1-SFC and compared it with published data on human primary microglia (GSE89189, GSE99074)7,22. Based on differential gene expression analysis using DESeq236, we first picked 25 TFs included in the first pool that showed lower expression levels in MG3.1-SFC than primary microglia. We then included six additional TFs that showed significantly higher gene expression in primary microglia. We also included IRF2 and ELF1 since these TFs showed the highest enrichment in downregulated genes in MG3.1-SFC when using Molecular Signatures Database (MSigDB)37 regulatory target gene sets. Additionally, using CellNet38, a computational tool that can classify bulk transcriptomic data and predict missing gene regulators, we added six more TFs to the pool. Lastly, following a recent single-cell study on fetal microglia development39, we added SPIB, ETS1, and ELK3. In summary, the second TF pool contained a total of 42 TFs (Supplementary Data S2).
a Workflow of the second pooled TF screen performed in duplicate. This panel is generated using BioRender under an academic license. b Polycistronic cassette design for performing dual-drug selection to achieve 3 + X TF screen. c Normalized mRNA expression from the polycistronic cassette (SPI1, FLI1, CEBPA) and stem cells (POU5F1) from two independent TF transductions. d TF barcode counting enabled the identification of stem cells (“No TF BC”), MG3.1-SFC, and cells with additional TFs (“SFC + X”). e Example histograms of TF barcode raw counts in single cells. f Number of TFs per cell counted from normalized and binarized TF expression matrix. g Ranking of the 42 TFs after one-sided Wilcoxon rank sum test with the two tested groups being with or without microglia gene expression. Blue highlights top-ranking TFs. Grey highlights the SFC polycistronic cassette. h Flow cytometry validation of a single transduction of the top-ranking TFs for improving microglia protein expression. i Polycistronic cassette designs for varying TF orders. This panel is generated using BioRender under an academic license. j Flow cytometry analysis of a single transduction of different arrangements of the six-TF recipe in comparison with MG3.1-SFC. All source data are provided as a Source Data file. Raw and processed data are provided in GEO accession GSE287850.
To ensure each cell expressed both the SPI1-T2A-FLI1-P2A-CEBPA cassette and additional TFs from the second pool, we cloned the SFC cassette into a bleomycin-resistant vector and the new TF pool into a puromycin-resistant vector (Fig. 2b). We transfected 600,000 PGP1 hiPSCs in duplicates and performed dual-drug selection. After four days of TF expression, we performed the same process of scRNA-seq and TF barcode amplicon sequencing as in the first iteration (Supplementary Fig. S8). As controls, we spiked in 5% undifferentiated hiPSCs and 10% MG3.1-SFC during single-cell encapsulation to mark the differentiation starting point of two iterations. We applied this approach to two independent pools of cells.
When we analyzed TF barcodes in this experiment, we observed two clusters of cells on UMAP that corresponded to (1) hiPSCs (POU5F1 positive, no TF barcode) and (2) MG3.1-SFC along with other cells expressing additional TFs (Fig. 2c, d). Upon counting TF barcodes, we observed that out of the total 8051 single cells from two independent transfections, 284 (3.5%) cells had no TF barcode, and 613 (7.6%) cells had only the barcode for MG3.1-SFC. On average, each cell expressed five TFs (median 4, first quartile 3, third quartile 7) (Fig. 2e, f, Supplementary Fig. S9), with most cells (88.9%) expressing the SFC cassette plus at least one other TF.
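The three barcode-defined groups (Fig. 2d) and their reported proportions can be checked with a short sketch. The label `"SFC"` for the cassette barcode, the function name, and the toy cells are hypothetical; the sketch also assumes that every cell with a barcode beyond the cassette belongs to the "SFC + X" group, which is consistent with the reported 88.9% but is not stated explicitly.

```python
def classify_cell(barcodes, baseline="SFC"):
    """Assign a cell to a barcode-defined group from its set of detected barcodes."""
    if not barcodes:
        return "No TF BC"
    if barcodes == {baseline}:
        return "MG3.1-SFC"
    if baseline in barcodes:
        return "SFC + X"
    return "other"  # extra TFs without the cassette; not discussed in the text

# Toy cells (illustrative only).
cells = [set(), {"SFC"}, {"SFC", "MEF2C"}, {"SFC", "IRF8", "KLF6"}]
labels = [classify_cell(c) for c in cells]
print(labels)

# Reported counts: 8051 cells, 284 without barcode, 613 with the cassette only.
total, no_barcode, sfc_only = 8051, 284, 613
sfc_plus_x = total - no_barcode - sfc_only  # assumption: remainder is SFC + X
percentages = {
    "No TF BC": round(100 * no_barcode / total, 1),
    "MG3.1-SFC": round(100 * sfc_only / total, 1),
    "SFC + X": round(100 * sfc_plus_x / total, 1),
}
print(percentages)
```

The computed percentages match the 3.5%, 7.6%, and 88.9% quoted in the text.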
To determine which of the new TFs lead to improved microglia differentiation, we analyzed their effects on microglial gene expression. We were especially interested in increasing the expression of CX3CR1, which was not expressed in MG3.1-SFC. We observed significantly higher (p < 0.01) numbers of MEF2C and KLF6 barcodes in cells expressing CX3CR1 (Fig. 2g), suggesting that these TFs can induce CX3CR1 expression. MEF2C was also present in the first screen but failed to reach significance for upregulating CX3CR1, indicating that using the SFC cassette as a baseline enabled other influential TFs to be found. MEF2C and BHLHE41 also ranked highly for TMEM119 (Fig. 2g). The SFC cassette ranked at the top for inducing ITGAM and P2RY12 expression, as expected from the results of the first iteration. In this round of screening, CEBPB and IRF8 emerged as high-potential TFs for promoting ITGAM or P2RY12 expression (Fig. 2g). In summary, the second pooled TF screen identified MEF2C, CEBPB, IRF8, KLF6, and BHLHE41 as additional TFs of interest.
To validate that these additional TFs can promote microglial gene expression, we individually expressed them in addition to SFC (SFC + 1). When compared with MG3.1-SFC, SFC + CEBPB increased the percentage of CD11b+ cells from 37% to 98% (Fig. 2h) but led to more cell death at day 4. SFC + MEF2C and SFC + IRF8 increased P2RY12 expression from 6% to 45% (Fig. 2h). Notably, SFC + MEF2C and SFC + KLF6 increased CX3CR1+ cells from 0% to 20% and 2% respectively (Fig. 2h). These results corresponded well with the predictions from single-cell TF barcode analysis, indicating the validity of using pooled TF screening for inferring causality between TF and target gene expression.
To test if microglia differentiation could be further promoted by delivering more TFs to each cell, we chose the three TFs from the SFC + 1 experiment that led to the highest increase in the percentage of microglial gene-expressing cells (CEBPB, IRF8, and MEF2C) to add to the SFC set. We combined MEF2C, CEBPB, and IRF8 into polycistronic cassettes. Because MEF2C demonstrated the ability to induce both CX3CR1 and P2RY12, we placed it first and varied the positions of CEBPB and IRF8, producing two cassettes: MIC and MCI (Fig. 2i). We also varied the positions of FLI1 and CEBPA in the first construct to produce SFC and SCF, keeping SPI1 at the front to avoid excessive cell death during differentiation. We tested all four combinations of the two 3-TF cassettes (SFC-MIC, SFC-MCI, SCF-MIC, SCF-MCI) for their ability to induce microglia differentiation (Fig. 2i). Encouragingly, all 6-TF cocktails produced cell pools with increased expression of microglial proteins when compared with MG3.1-SFC (Fig. 2j). We observed that the most effective combination was MG6.4-SCF-MCI, resulting in 66% CD11b+, 93% P2RY12+, and 16% CX3CR1+ cells at day 4, compared with 37%, 6%, and 0%, respectively, for MG3.1-SFC, the baseline of the second iteration. These results highlighted the value of the second iteration and demonstrated the utility of iterative TF screening for cell fate engineering. We define cells differentiated using MG6.4-SCF-MCI as Transcription Factor-induced MicroGlial-Like cells, or TFiMGLs.
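The four 6-TF designs follow from pairing each ordering of the first cassette with each ordering of the second. A minimal sketch of that enumeration is given below; the dictionary and function names are hypothetical, and the letter "C" is resolved per cassette (CEBPA in the first, CEBPB in the second), as implied by the text. The 2A linker order of the new cassettes is not stated and is therefore not modeled.

```python
from itertools import product

# Letter codes per cassette, as implied by the text.
FIRST_CASSETTE = {"S": "SPI1", "F": "FLI1", "C": "CEBPA"}
SECOND_CASSETTE = {"M": "MEF2C", "I": "IRF8", "C": "CEBPB"}

def expand(code, letters):
    """Translate a cassette code such as 'SCF' into an ordered TF list."""
    return [letters[ch] for ch in code]

designs = {
    f"{first}-{second}": expand(first, FIRST_CASSETTE) + expand(second, SECOND_CASSETTE)
    for first, second in product(["SFC", "SCF"], ["MIC", "MCI"])
}
for name, tfs in designs.items():
    print(name, tfs)
```

The `SCF-MCI` entry expands to SPI1, CEBPA, FLI1, MEF2C, CEBPB, IRF8, the order in which the six TFs are listed in the abstract.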
TFiMGLs share molecular signatures with primary microglia
To determine the differentiation dynamics of TFiMGLs, we performed bulk RNA-seq analysis of the cells on days 0, 1, 2, 3, 4, and 6 post TF induction. We observed a rapid induction of all six TFs accompanied by a quick downregulation of POU5F1 on day 1 (Fig. 3a). TF induction reached a plateau on day 2, aligning with the sustained upregulation of microglial genes from that point onward (Fig. 3b). Similarly, principal component analysis (PCA) showed rapid differentiation on days 1 and 2, followed by a gradual deceleration from day 3 to day 6 (Fig. 3c). Moreover, we assessed how accurately the TFiMGL transcriptome recapitulates that of human microglia by comparing TFiMGL bulk RNA-seq data to previously published bulk RNA-seq data from human primary microglia, iPSC-derived microglia (GSE89189, GSE99074), and TF-based microglia differentiation protocols3,7,15,16,22. To minimize potential batch effects that might hinder meaningful comparison between datasets before downstream analysis, we aligned all raw FASTQ files to the same reference genome and applied a negative binomial regression-based batch effect correction method, ComBat-seq40. The PCA shows that the TFiMGL transcriptomes from days 2–6 are more similar to primary microglia from various sources than to either iPSCs or hematopoietic progenitors (HPCs), suggesting successful microglial fate induction (Fig. 3d, Supplementary Fig. S10a). Additionally, we observed that TFiMGLs were distinct from monocytes or dendritic cells, two related cell types from the myeloid lineage. Importantly, Gene Set Enrichment Analysis (GSEA)41 on TFiMGLs versus iPSCs using two microglial gene collections from the MSigDB database derived from human brain scRNA-seq (M40168; M39077) revealed significant enrichment scores with both gene collections (M40168: score = 0.72, p-value = 9.01e−10, gene set size = 313; M39077: score = 0.66, p-value = 9.01e−10, gene set size = 391), demonstrating successful derivation of microglia-like cells (Fig. 3e). To further investigate the transcriptomic similarity of TFiMGLs to primary microglia, we assessed gene clustering after adding a previously published collection of 881 microglia-enriched genes3. The transcriptomes of day-1 TFiMGLs clustered closer to iPSCs, while from day 2 onwards the upregulation of key microglial genes resulted in closer clustering with primary microglia, distinct from monocytes or dendritic cells (Supplementary Fig. S10b). These results demonstrate that induction of SPI1, CEBPA, FLI1, MEF2C, CEBPB, and IRF8 in iPSCs for 2 days is sufficient to obtain cells with a transcriptional profile characteristic of microglia.
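For readers unfamiliar with the enrichment score quoted for GSEA, the sketch below shows the classic weighted running-sum statistic on which GSEA is based. It is an illustration under stated assumptions, not the software or settings used in this study: the function name, the `weight` parameter default, and the placeholder gene names and scores are hypothetical, and significance estimation is omitted.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score over a ranked gene list.

    Walks down the ranking, stepping up at gene-set members (weighted by
    |score|**weight) and down at non-members, and returns the maximum
    deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_miss = hits.count(False)
    hit_norm = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for s, h in zip(scores, hits):
        running += abs(s) ** weight / hit_norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Placeholder ranking (illustrative only), sorted from most up- to most down-regulated.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es_top = enrichment_score(genes, scores, {"g1", "g2"})
es_bottom = enrichment_score(genes, scores, {"g5", "g6"})
print(es_top, es_bottom)
```

A gene set concentrated at the top of the ranking yields a positive score, and one concentrated at the bottom yields a negative score.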
a Expression of the six induced TFs over time measured by bulk RNA-seq (n = 2 for each day). b Expression of stem cell (POU5F1) and microglia (ITGAM, CX3CR1, TMEM119, P2RY12, TREM2) genes over time (n = 2 for each day) measured by bulk RNA-seq. c PCA plot for the bulk transcriptome of TFiMGLs (MG6.4) over time (n = 5 for day 0, n = 2 for days 1–6). d PCA of bulk RNA-seq data from multiple sources containing primary microglia. MG Microglia, DC dendritic cell, HPC hematopoietic progenitor, iMGL growth factor-induced microglia-like cell, Mono monocyte. e GSEA of TFiMGLs versus iPSCs using two microglia marker gene sets from MSigDB: M40168 and M39077. Raw and processed data are provided in GEO accession GSE287851.
Further survival and maturation of TFiMGLs might require adapting the culture media composition. Thus, we conducted a time course study to assess transcriptional profiles of TFiMGLs under two different cell culture conditions. In the first condition, TFiMGLs were induced following the conditions of the TF screen—in short, cells were continuously cultured in mTeSR Plus (+MT) media supplemented with Dox for 6 days. Whereas in the second condition, mTeSR Plus media was switched on day 2 to commercially available microglia-specific media (+MG) supplemented with Dox until day 6. Cells grown in both conditions were collected at days 2, 4, and 6 post induction for scRNA-seq, allowing us to explore the impact of the media compositions on TFiMGL development. Visualizing TFiMGLs gene expression patterns using UMAP revealed two distinct clusters. By color coding cells based on the sample condition, we observed that TFiMGLs at 2 days post induction (D2) localized to a single cluster, whereas all other conditions localized to the second cluster (Fig. 4a). TFiMGLs at 4- or 6-days post induction grown in either +MG or +MT did not show a discrete separation on the second cluster suggesting that media composition has limited effects on RNA expression patterns at such early induction time points. In addition, plotting a correlation matrix of scRNA-seq data revealed a considerably high correlation between D4 and D6 despite culturing in different media (Supplementary Fig. S11a). We compared differentially expressed genes (log2 fold change ≥ 2, p value ≤ 0.05) from each TFiMGL sample versus all other samples and used the Enrichr search engine42,43 to query annotated cell-type gene sets from other RNA-seq or scRNA-seq databases (Fig. 4b, c, Supplementary Data S3–5). 
Importantly, all TFiMGL samples revealed microglia as the top significantly enriched cell type using the scRNA-seq reference databases Descartes44 and Cell Marker Augmented45, suggesting that these specific media compositions do not alter TFiMGL identity between days 2 and 6 post induction. Since P2RY12 and CX3CR1 are consensus microglial markers, we further sub-sampled P2RY12- or CX3CR1-positive cells from our scRNA-seq data to determine the proportion of cells expressing these markers and the identity of those cells. Interestingly, we observed an increase in the expression of both markers from D2 to D4 post induction; however, this increase was maintained until D6 only in the +MT condition, while the +MG condition reverted to levels comparable to D2 (Fig. 4d, g). When tracing both sub-sampled populations back onto UMAP space, we observed that the P2RY12+ population was evenly dispersed in both clusters, whereas the CX3CR1+ population was slightly more biased towards the D4 and D6 cluster (Fig. 4e, h). We performed gene enrichment analysis for each sub-sampled population using Descartes as a reference dataset, which revealed that both sub-sampled populations displayed a significant enrichment for microglial cell types from various brain regions (Fig. 4f, i, Supplementary Data S6–7).
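The enrichment analysis described here combines a differential expression filter (log2 fold change ≥ 2, p value ≤ 0.05) with an over-representation test and Benjamini-Hochberg correction. The sketch below shows those three steps with the standard library only. It is not the Enrichr implementation; all function names, gene names, and numbers are placeholders for illustration.

```python
from math import comb

def select_de_genes(stats, min_log2fc=2.0, max_p=0.05):
    """Keep genes passing the thresholds; stats maps gene -> (log2fc, p)."""
    return {g for g, (lfc, p) in stats.items() if lfc >= min_log2fc and p <= max_p}

def hypergeom_pvalue(overlap, n_query, n_set, n_background):
    """P(X >= overlap) for a hypergeometric draw (one-sided Fisher's exact test)."""
    upper = min(n_query, n_set)
    tail = sum(
        comb(n_set, k) * comb(n_background - n_set, n_query - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(n_background, n_query)

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Placeholder statistics (illustrative only): gene -> (log2 fold change, p value).
toy_stats = {"g1": (3.1, 0.001), "g2": (2.0, 0.05), "g3": (1.9, 0.001), "g4": (4.0, 0.2)}
de_genes = select_de_genes(toy_stats)

# 4 query genes, reference set of 5 genes, background of 20 genes, 3 shared.
p_enrich = hypergeom_pvalue(overlap=3, n_query=4, n_set=5, n_background=20)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
print(sorted(de_genes), p_enrich, adjusted)
```

With these placeholder inputs, only `g1` and `g2` pass both thresholds, and the over-representation p-value is 155/4845.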
a UMAP clustering of a single TFiMGL induction at 2, 4, and 6 days post induction, switching to either stem cell media or microglia media from day 2. Colors represent clusters identified by day and media condition. D2: day 2, D4 + MT: day 4 stem cell media, D4 + MG: day 4 microglia media, D6 + MT: day 6 stem cell media, D6 + MG: day 6 microglia media. Top 5 cell-type enrichments per TFiMGL condition using (b) Descartes single cell atlas or (c) Cell Marker Augmented single cell atlas as reference. Enrichment p-value was computed using Fisher’s exact test or the hypergeometric test and adjusted p-value was computed using the Benjamini-Hochberg method for correction for multiple hypotheses. d Proportion of cells per TFiMGL condition expressing microglia marker gene P2RY12. e Normalized mRNA expression of P2RY12 on overall TFiMGL population. f Top 5 cell-type enrichments of P2RY12+ cell subset using Descartes single cell atlas as reference. g Proportion of cells per TFiMGL condition expressing microglia marker gene CX3CR1. h Normalized mRNA expression of CX3CR1 on overall TFiMGL population. i Top 5 cell-type enrichments of CX3CR1+ cell subset using Descartes single cell atlas as reference. j Mean expression levels of known pan-microglial and mature human microglia genes from different TFiMGL conditions (rows). Microglial marker genes were obtained from previously published studies. Genes directly overexpressed are outlined in red. k–m Mean expression levels of (k) proliferation-associated microglia, (l) neural-associated microglia, and (m) immune-associated microglia transcriptional signatures found in various regions of the developing human brain. All source data are provided as a Source Data file. Raw and processed data are provided in GEO accession GSE287852.
In addition, we curated gene marker lists from previously published studies involving in vitro microglia cultures3,46 as well as mature human microglia47 and determined gene expression overlap of our transcriptomic dataset for each TFiMGL condition and timepoint (Supplementary Data S8). All TFiMGL conditions exhibited high expression of ‘human microglia’ genes, including ENTPD148, a purinergic receptor modulating immune responses and inflammation, ELMO149, a key regulator of microglial function, and NEAT150, an immune response and inflammatory process regulator in microglia. However, low expression was observed for other microglial genes, such as GPR34, a receptor mediating microglial chemotaxis and neuroinflammation, ADRB2, an immune response regulator via adrenergic signaling, and CSF1R, a receptor critical for microglial survival, differentiation, and proliferation. This might suggest conversion to microglia of different activation stages or subtypes of microglia (Fig. 4j). Different timepoints and media conditions had moderate effects on expression of B2M and HLA-B but didn’t change gene expression patterns drastically. To further validate microglia diversity, we identified and visualized collective gene expression of previously reported, defined microglia signatures (Supplementary Data S9) within our scRNA-seq data set (Fig. 4k–m). The cluster comprising the day 2 TFiMGL subpopulation showed enriched gene expression signatures associated with proliferating microglia (Fig. 4k), while neural associated microglia gene signatures were spread throughout the differentiation time points and media conditions (Fig. 4l). Notably, immune associated microglia gene expression patterns are mostly found in TFiMGLs at days 4 and 6 cultured with mTeSR1 media (Fig. 4m). Overall, TFiMGLs show clear microglia identity and display a variety of different microglia signatures depending on maturation and irrespective of media composition.
TFiMGLs are responsive to disease-relevant stimuli and have phagocytic activity
Since TFiMGL transcriptomes on day 4 displayed mature microglia identity, we chose this timepoint for functional characterization of TFiMGLs. Immunofluorescence analysis of TFiMGLs after Dox induction showed the loss of pluripotency marker OCT4 and expression of key microglial proteins: CD11b, P2RY12, and CX3CR1 (Fig. 5a). Additional brightfield microscopy analysis of TFiMGLs confirmed rapid morphological change from day 1 to day 6 (Supplementary Fig. S12). Importantly, TFiMGLs demonstrated reproducible differentiation between replicates, with 53.9 ± 0.57% (SD, n = 3) CD11b+, 93.1 ± 0.50% (SD, n = 3) P2RY12+, and 14.8 ± 0.68% (SD, n = 3) CX3CR1+ cells (Fig. 5b). Since microglia respond to signals associated with brain infection and inflammation in vivo, we assessed the ability of TFiMGLs to respond to various stimuli. Specifically, we tested their response to interferon gamma (IFNγ), beta amyloid (Aβ), and TDP-43. IFNγ, secreted by T lymphocytes, is a known microglial activator51. Aβ has been shown to elicit microglial response in the context of Alzheimer’s Disease (AD) pathology52, while TDP-43 aggregation, a hallmark of amyotrophic lateral sclerosis (ALS) pathology, has also been shown to activate microglia53. To investigate how TFiMGLs respond to IFNγ, fibrillar Aβ (fAβ), and TDP-43, we treated TFiMGLs in triplicate with each of the three molecules for 24 h and harvested cells for RNA-seq. PCA revealed transcriptomic changes in the IFNγ- and TDP-43-treated groups, while the fAβ-treated group showed minimal differences (Fig. 5c, Supplementary Fig. S13). We confirmed fAβ formation by conducting an in vitro amyloid fibrillation experiment that showed the Aβ peptide could form fibrils after 1 h of incubation (Supplementary Fig. S14). Pathways enriched among differentially expressed genes in the IFNγ-treated group included “response to virus” and “response to bacterium” (Fig. 5d, Supplementary Fig. S15), corresponding to the role of IFNγ production as a response to infection. Top upregulated genes by IFNγ included CXCL10, CXCL11, IRF1, and IL18BP (Supplementary Fig. S15), aligning with the IFNγ response genes revealed by an independent single-cell level human-derived macrophage stimulation study54. For the TDP-43-treated cells, top differentially regulated pathways included “myeloid leukocyte mediated immunity” and “myeloid cell activation involved in immune response” (Fig. 5e, Supplementary Fig. S16), demonstrating that TFiMGLs were activated by the TDP-43 treatment. Collectively, these results suggest that TFiMGLs exhibited microglia-like responses to infection and ALS-related TDP-43 aggregation.
a Representative immunofluorescence images of stem cell (OCT4), Dox-induced (PU.1), and microglia (CD11b, P2RY12, CX3CR1) proteins on day 4. Scale bar: 20 µm. b Flow cytometry quantification of microglia protein expression on day 4 (n = 3). c PCA of TFiMGL transcriptomes after 24 h treatment with IFNγ, fAβ, or TDP-43 (n = 3 per treatment). d, e Pathway analysis of significantly differentially expressed genes after treatment with IFNγ or TDP-43. One-sided Fisher’s exact test was used to identify enriched categories, with FDR control using the Benjamini–Hochberg procedure. f Representative graph of flow cytometry analysis of the uptake of pHrodo-labeled S. aureus BioParticles over time (n = 3). g Microscopy analysis of particle uptake combined with microglia surface protein staining. Graphs shown are representative of three independent experiments with similar results. h Calcium imaging with Fluo-4 AM after stimulation with 150 µM ADP and peak quantification. Images were taken once every 3 s. ADP was added at t0. Graphs shown are representative of two independent experiments with similar results. i Quantification of fluorescent signals from all cells in the field of view in panel h over a period of 10 min. j Peak dynamics analysis shows a fast rise and slow decay pattern of the intracellular calcium concentration. All source data are provided as a Source Data file. Raw and processed data are provided in GEO accession GSE287853.
Another key role of microglia is their ability to respond to signals from degenerating or injured neurons and to subsequently phagocytose them7. To assess this phagocytic activity, we incubated TFiMGLs with pHrodo green labeled S. aureus particles for 0.5, 2, and 4 h in triplicate, and performed flow cytometry and microscopy. While 0.5-h incubation showed minimal phagocytic activity, nearly all cells were positive for pHrodo green at 2 h with fluorescence intensity reaching its peak at 4 h (Fig. 5f, Supplementary Video S1-2). Additionally, co-staining with microglial surface proteins CD11b, P2RY12, and CX3CR1 confirmed the intracellular localization of pHrodo green labeled particles (Fig. 5g). Injured neurons release adenosine diphosphate (ADP) and stimulate microglial responses55,56. To verify ADP response capabilities, we incubated TFiMGLs with the calcium indicator Fluo-4 AM and then stimulated them with ADP containing media. We imaged the cells at a three-second interval and observed a rapid increase in calcium signal following ADP exposure, suggesting that TFiMGLs are responsive to ADP stimulation (Fig. 5h–j; Supplementary Video S3).
Regression analysis reveals causal TF-gene regulatory relationships
An important area of continuing progress in cell fate engineering is the derivation of a complete knowledge map for accurately predicting TF expression levels along with their respective downstream gene expression signatures. Currently, insight on this subject has been gained by leveraging computational methods for inferring TF-gene regulatory networks (GRNs)57,58 and databases based on TF-binding sites59,60, TF-gene co-expression61,62, and protein-protein interaction63,64. However, determining causal TF-gene regulatory relationships requires direct TF perturbations with downstream transcriptomics, and doing this at an omics scale requires scaling classic single-gene perturbation methods to high throughput. In this context, the barcoding and amplification strategy presented in this study enables the readout of single-cell gene expression after multiplexed TF perturbation and links the transcriptomic results with TF barcode quantification in the same individual cells. Thus, we utilized TF-perturbation data from the two pooled screens to develop a stepwise regression model for constructing GRNs to better understand TF regulatory relationships for future cell engineering (Fig. 6 and Methods). In our dataset, each TF transgene was represented by two distinct values: counts from barcode amplicon sequencing, and counts from their RNA molecules in scRNA-seq. Although the counts from scRNA-seq might contain reads from endogenous TF expression, the two measurements correlated well for most TFs (Supplementary Fig. S17), the exception being a few TFs with low expression. We reasoned that TFs with a higher correlation between their barcode and RNA measurements demonstrated higher consistency between experiments, making them more likely to produce accurate regression results using the two matrices. 
Thus, by selecting TFs with a correlation coefficient greater than 0.3 between both measurements, we identified 18 TFs from the first iteration and 21 TFs from the second for regression analysis. We observed extensive gene expression changes caused by CEBPA and the triple TF cassette MG3.1_SFC expression (Fig. 6a, b), as well as slightly smaller networks from CIITA, SPI1, EGR2, JUN, CEBPB, ZFP36, and BHLHE41 (Fig. 6c–h, Supplementary Fig. S18). Among the 672 edges, 76% were positive regulations. Some TFs (CIITA, SPI1, JUN) showed only positive edges under the current thresholding conditions (Abs(coefficient) >0.1 & −log10(p-value) >20), indicating they were mostly activating other genes. Other TFs (CEBPB, ZFP36, BHLHE41) showed negative edges, indicating their repressive roles. We also observed several genes simultaneously connected with more than one TF (Supplementary Fig. S19). For example, RAB13, a membrane trafficking regulator, was upregulated by both CEBPA and CEBPB; HMGA1, a master regulator of chromatin structure, was downregulated by both BHLHE41 and CEBPA; FLNC, an actin crosslinking protein, was upregulated by JUN while downregulated by CEBPA. Many more regulatory relationships from these two pooled screens are listed in detail in Supplementary Data S10, 11. Although this work provides a foundation, larger perturbation libraries, higher-throughput scRNA-seq, and more scalable regression analysis methods should make it possible to build a complete knowledge map of causal TF-gene regulatory relationships in the near future, greatly facilitating cell fate engineering efforts.
The width of each edge scales with its coefficient value: the larger the value, the wider the edge. A red edge means upregulation while a blue edge means downregulation. Edges were selected with these criteria: Abs(coefficient) >0.1 and −log10(p-value) >20. F-test was used to determine if any of the independent variables were significant in the regression model. a Global network for the first pooled screen. b Global network for the second pooled screen. Sub-network for (c) CIITA, (d) JUN, (e) SPI1, (f) CEBPB, (g) ZFP36, (h) EGR2. Source data are provided in Supplementary Data file.
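The two selection steps described in this section (retaining TFs whose barcode and RNA counts correlate above 0.3, then retaining regression edges with Abs(coefficient) >0.1 and −log10(p-value) >20) can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names, the use of Pearson correlation (the text does not state which coefficient was used), and the toy rows are assumptions made for illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def select_tfs(barcode_counts, rna_counts, min_r=0.3):
    """Keep TFs whose per-cell barcode and scRNA-seq counts correlate above min_r."""
    return [tf for tf in barcode_counts
            if pearson(barcode_counts[tf], rna_counts[tf]) > min_r]

def select_edges(results, min_abs_coef=0.1, min_neg_log10_p=20.0):
    """Filter (tf, gene, coefficient, p_value) rows into signed network edges."""
    edges = []
    for tf, gene, coef, p in results:
        neg_log10_p = math.inf if p == 0 else -math.log10(p)
        if abs(coef) > min_abs_coef and neg_log10_p > min_neg_log10_p:
            edges.append((tf, gene, "up" if coef > 0 else "down", coef))
    return edges

# Hypothetical toy regression output, not values from the study.
toy = [("CEBPA", "GENE_A", 0.42, 1e-35),
       ("CEBPA", "GENE_B", -0.25, 1e-40),
       ("JUN", "GENE_C", 0.05, 1e-50),   # fails the coefficient threshold
       ("SPI1", "GENE_D", 0.30, 1e-5)]   # fails the p-value threshold
edges = select_edges(toy)
frac_positive = sum(e[2] == "up" for e in edges) / len(edges)
```

The sign of the coefficient decides whether an edge would be drawn as upregulation or downregulation, and the fraction of positive edges is computed in the same way as the percentage reported for the 672 edges.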
Discussion
The ability to obtain highly specialized cell types that are not routinely accessible through biopsies is essential to understanding basic cellular mechanisms, molecular disease drivers, and therapeutics development. While engineered expression of transcription factors (TFs) identified through TF screens in iPSCs can rapidly and efficiently produce some cell types, the generation of other cell types that require complex combinations of TFs has been difficult. Here, we addressed this challenge by combining an iterative genetic library screen in hiPSCs with high-throughput scRNA-seq for cell fate engineering and validated this approach by generating microglia-like cells. We conducted two iterations of our design-screen-validate workflow and discovered that the combination of TFs SPI1, CEBPA, FLI1, MEF2C, CEBPB, and IRF8 effectively drives microglia differentiation from hiPSCs within 4 days, a substantial reduction from the standard 35 days required by growth factor-based protocols11. Characterizations of TFiMGLs demonstrated transcriptomic and functional resemblance to primary human microglia. We further uncovered TF-gene regulatory relationships in these pooled screens through genome-wide regression analysis between TF barcode counts and gene expression levels.
During this study, we noted several technological challenges that could be addressed in future studies. Most current TF-based differentiation protocols rely on a one-time induction of TF expression and lack the capability for sequential induction. Despite this limitation, current strategies have successfully generated certain cell types, including multiple types of neurons65 and endothelial cells2, and have enabled the induction of iPSCs themselves66. However, during development in vivo, coordinated gene programs are sequentially activated, as observed in time-resolved transcriptomic analysis of developing tissues67. This feature could be re-created by identifying an orthogonal induction system with a strength comparable to that of the doxycycline-inducible system or by developing tunable gene circuits. With these tools, it will be possible to test whether sequential TF expression can lead to improved differentiation accuracy. In addition to temporal control of TF expression, the ability to regulate expression levels of individual TFs could also lead to improvements in differentiation. There were two manifestations of this pattern in the current study. In the first case, although CEBPA and FLI1 expressed individually led to cell death, their presence in the SFC cassette enabled cell survival and differentiation. The reduced expression levels of CEBPA and FLI1 and their potential interactions with other TFs could also explain why we were able to observe an extensive GRN for CEBPA in the first pooled screen, which would otherwise not have been possible due to toxicity in CEBPA-expressing cells. The second case can be observed in the finding that different sequential arrangements of TFs in the polycistronic cassettes led to different levels of downstream microglial protein expression. The effect of TF stoichiometry on differentiation efficiency has also been observed for cardiac myocyte programming68 and was shown to influence iPSC reprogramming34,35. 
While the positional effects in a polycistronic cassette offer one way to explore the stoichiometry space, the development of new titratable promoters allowing the tuning of individual genes could be an important tool for cell fate engineering.
Establishing a specialized cell fate from stem cells requires orchestrated interactions between many genes. To increase the likelihood of priming iPSCs to the targeted microglia cell fate, we designed the pooled screen to have multiple TFs delivered to each cell by transposase. Because the combinatorial space of the TFome can quickly become astronomical and cost-prohibitive, especially when scRNA-seq is involved, we decided to prioritize a small subset of the TFome based on existing literature evidence for microglia development and differentiation. It is impossible to pre-determine how many TFs will be necessary to achieve effective differentiation. While having a large number (>10) of TFs overexpressed in a single stem cell could increase the chance of having the correct combination, it also raises the risk of biological noise, making the screening results less interpretable. With this in mind, we aimed for approximately five TFs per cell and adopted an iterative screening strategy. Although using fewer TFs per cell reduces the potential to discover the complete set of necessary TFs in a single screen, this issue can be mitigated by identifying top TFs from each round of screening and keeping them constant in subsequent screen iterations. We believe this stepwise approach gives us a better chance of disentangling the complex gene regulatory network needed for establishing a specific cell fate.
Microglia are resident immune cells of the brain which originate from erythro-myeloid progenitors (EMPs) in the yolk sac17,69. They play important and diverse roles in brain development and maintaining homeostasis18,19,70,71,72. Recent studies have demonstrated the link between neuroinflammation and neurodegenerative disease, such as Alzheimer’s Disease (AD)73,74, and microglia have been shown to be an important cell type in AD and other neurodegenerative diseases75,76,77,78. However, functional studies to define microglia-targeting therapeutics have been greatly hindered by the limited availability of human brain biopsies79,80. This supply issue cannot be simply mitigated by using murine models given that interspecies differences limit the transferability of knowledge4,5. Various research groups have developed human microglia-like cells from iPSCs in efforts to fill this gap. Several studies have accomplished this goal through a process of embryoid body formation, growth factor treatment, and, in some cases, co-culturing with neurons6,7,8,9,10,11,12,13,14. These protocols draw inspiration from the natural developmental stages of microglia and have timelines ranging from 30–74 days. More recently, a study showed microglia differentiation through forced expression of both SPI1 and CEBPA16. Comparison of RNAseq datasets showed similarities between SPI1/CEBPA16, PU.1/MAFB/CEBPα/CEBPβ/IRF5/IRF815 induced iPSCs and TFiMGLs at early time points, however at later time points TFiMGLs showed closer similarity to microglia derived through longer iPSC differentiation processes involving hematopoiesis and subsequent microglia differentiation7. This suggests that through the induction of additional TFs, we leveraged complex TF-gene regulatory relationships which altered the overall developmental trajectory of TFiMGLs. In the future, the identity of TFiMGLs could be more precisely defined by optimizing culture conditions and co-culturing with neural cells or organoids. 
This approach would enhance the transcriptional cascades triggered by TF overexpression, enabling a deeper investigation into their similarity to in vivo microglia under various conditions. Differentiating human cell types from stem cells is essential for basic research and therapeutics development, especially when the desired cell types are not easily obtainable from accessible human tissues. Advances in the understanding of developmental biology have fueled the discovery and application of protocols to differentiate specific cell types from iPSCs. Some of this work has been translated into treatment strategies that are now being investigated with clinical trials for devastating diseases like age-related macular degeneration81 and type 1 diabetes82. Differentiated iPSCs have now also become routinely used in laboratories for studying disease mechanisms and testing drugs83. Recent global efforts on building single-cell atlases of cellular development have expanded the knowledge of human development and diseases and also present a key resource for cell fate engineering. Combining technological advancements in genetic library construction with high-throughput screening and sequencing technologies, this work lays the groundwork for the developmental biology field to engineer cell fates in a more systematic and multiplexed fashion.
Methods
Barcoded TF expression vector construction
All TFs used in this study were obtained from the TFome collection2 in pDONR format. For expression in hiPSCs, a PiggyBac integrating Dox-inducible vector pBAN2 was used. To create the barcoded pBAN expression vector (pBAN-BC), the original pBAN was digested with AgeI and KpnI, followed by ligation of a gBlock (IDT DNA) containing the same excised piece with an additional 20-bp random barcode. After bacterial transformation, individual colonies were expanded and plasmid DNA was extracted. Gateway cloning was used to transfer each TF from pDONR to the pBAN-BC vector. The barcode sequence for each TF was confirmed by Sanger sequencing.
Cell culture
The human male iPSC line GM23338 was obtained from Coriell. No commonly misidentified cell lines were used in the study. hiPSCs were cultured in mTeSR Plus media (Stemcell Technologies, 100-0276) on multi-well plates coated with Matrigel (Corning, 354277) or Cultrex (Bio-Techne Corporation, 3434-005-02). For passaging, cells were dissociated with TrypLE Express (Life Technologies, 12604013) and seeded into freshly coated plates and media containing 10 µM Y-27632 ROCK inhibitor (Millipore, 688001) for 24 h. Daily media change was performed until cells were ready for another passaging or downstream experiments.
Nucleofection, TF integration and differentiation
TF (pBAN-TF-BC) and Super PiggyBac (SPB) Transposase (System Biosciences, PB210PA-1) expression vectors were mixed at a mass ratio of 4:1 and transfected into hiPSCs using P3 Primary Cell 4D-Nucleofector X Kit L (Lonza, V4XP-3024) on a 4D-Nucleofector X Unit (Lonza, AAF-1002X) following the manufacturer’s instructions. For the two pooled TF screenings, 600,000 cells were transfected with 5 µg of DNA and seeded into one well of a 6-well plate. For individual TF combinations, 120,000 cells were transfected with 2.5 µg of DNA and seeded into one well of a 12-well plate. For the PiggyBac dose curve used for copy number quantification by qPCR, 600,000 cells were nucleofected with vectors containing puromycin- and bleomycin-resistance genes at total DNA amounts of 1.25 µg, 2.5 µg, 5 µg, and 10 µg, and SPB transposase was mixed at a mass ratio of 4:1. Nucleofected cells were seeded into one well of a 6-well plate. Program CB150 was used for the nucleofections. 48 h after nucleofection, 1 µg/mL of puromycin (Gibco, A1113803) or 50 µg/mL of zeocin (Gibco, R25001) was added to the culture for the selection of TF-integrated cells. Cells were passaged again when reaching 80% confluency. For induction of TF expression, cells were seeded into mTeSR Plus media containing 0.5 µg/mL doxycycline (Sigma-Aldrich, D3072) and 10 µM Y-27632 ROCK inhibitor and were changed into media containing only doxycycline after 24 h.
PiggyBac dose curve copy number quantification
iPSCs integrated with vectors containing puromycin- and bleomycin-resistance genes at DNA amounts of 1.25 µg, 2.5 µg, 5 µg, and 10 µg or with 5 µg of DNA of MG6.4-SCF-MCI TFs were harvested by first washing with PBS (- calcium, - magnesium) and then dissociating them into single cell suspensions with Accutase (StemCell Technologies, 07920) for 10 min at 37 °C. Cultures were then quenched with DMEM/F12 (Gibco, 11320033) and subsequently centrifuged at 300 g for 3 min. The cell pellet was rinsed with PBS (- calcium, - magnesium), centrifuged at 300 g for 3 min and flash frozen at −110 °C for PiggyBac copy number qPCR quantification.
gDNA extraction was conducted using the PiggyBac Copy Number qPCR Kit (System Biosciences, PBC100A-1) following the manufacturer’s protocol. The qPCR reaction mixes comprised 1x DyNAmo ColorFlash SYBR Green qPCR Mix (Thermo Scientific, F-416L), 840 nM primers, and 1 µL of gDNA (10-50 ng). Primers targeting rtTA, bleomycin resistance gene, and puromycin resistance gene were employed to amplify PiggyBac integrations, while UCR1 primers (System Biosciences) were utilized for the normalization to the cell genome copies. Quantitative PCR was performed using the QuantStudio 7 Flex system (Applied Biosystems) with thermocycling settings recommended by the PiggyBac Copy Number qPCR Kit. Calculations to determine integration copies were performed in accordance with the manufacturer’s protocol.
Primer sequences:
pBAN-rtTA-fwd CGGAACAACGCCAAGTCATA
pBAN-rtTA-rev GATGCTCCTGTTCCTCCAATAC
pBAN-Puro-fwd GTCACCGAGCTGCAAGAA
pBAN-Puro-rev AGGAGGCCTTCCATCTGT
pBAN-Bleo-fwd CAAGTTGACCAGTGCCGTTC
pBAN-Bleo-rev ATGAACAGGGTCACGTCGTC
Flow cytometry and sorting
For cytometry analysis, cells were dissociated with TrypLE Express for 5 min at 37 °C, diluted with twice the volume of Cell Staining Buffer (Biolegend, 420201) and centrifuged at 200 g for 3 min to remove the digesting enzyme. Cells were then incubated with 25 µg/mL of Human Fc Block (BD Biosciences, 564219) diluted in Cell Staining Buffer for 15 min on ice, followed immediately by staining with fluorescently conjugated antibodies or isotype controls at 5 µl per million cells in 100 µl staining volume for 30 min on ice. Antibodies were diluted in Cell Staining Buffer and Human Fc Block was not removed from the mixture. After antibody staining, cells were washed twice with Cell Staining Buffer before being put through 35 µm nylon mesh into a 5 mL round bottom polystyrene tube (Falcon, 352235). Flow cytometry data were acquired on a BD LSRFortessa Cell Analyzer. For cell sorting, the staining protocol was the same except that Cell Staining Buffer was replaced with mTeSR Plus media in order to maintain the best viability of cells. Cell sorting was performed on a BD FACSAria Cell Sorter. Flow cytometry antibodies used in this study were: FITC-TRA-1-60 (BD Biosciences, 560380), BV421-CX3CR1 (Biolegend, 341620), PE-P2RY12 (Biolegend, 392104), APC-CD11b (Biolegend, 101212). Isotype controls used were: BV421- Rat IgG2b (Biolegend, 400640), PE-Mouse IgG2a (Biolegend, 400214), APC- Rat IgG2b (Biolegend, 400612).
scRNA-seq library preparation for TF barcode screening
scRNA-seq experiments were performed using 10x Genomics Chromium Single Cell 3’ Reagent Kits v3 or v3.1 following the manufacturer’s instruction. 5000 single cells were calculated as targeted input for each sample. For the first iteration, 10% of stem cells were spiked in as undifferentiated control. For the second iteration, 5% stem cells and 5% MG3.1-SFC were spiked in as undifferentiated and initial differentiation control. The only modification made to the protocol was at the Sample Index PCR step, where 5 µL of the PCR mix was taken out and mixed with 0.5 µL 1000X SYBR Gold (Invitrogen, S11494) for a qPCR reaction. The optimal amplification cycle was determined as the cycle just before half maximum of the total signal. Final libraries were sequenced on NextSeq 500 or NovaSeq with a goal of at least 30,000 reads per cell.
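The cycle-selection rule used for the SYBR Gold spike-in qPCR (the cycle just before half maximum of the total signal) can be sketched as follows. This is an illustrative reading of the rule rather than the authors' procedure: the function name is hypothetical, "total signal" is assumed to mean the highest fluorescence reading in the run, and no baseline subtraction is applied.

```python
def optimal_cycle(fluorescence):
    """Return the 1-based cycle just before the signal first reaches half of
    its maximum. fluorescence[i] is the reading taken after cycle i + 1."""
    if not fluorescence:
        raise ValueError("no fluorescence readings")
    peak = max(fluorescence)
    if peak <= 0:
        raise ValueError("expected a positive fluorescence signal")
    half_max = peak / 2
    for cycle, signal in enumerate(fluorescence, start=1):
        if signal >= half_max:
            # Never report cycle 0, even if the first reading is already high.
            return max(cycle - 1, 1)

# Hypothetical amplification curve with a plateau at 100 units.
readings = [1, 1, 2, 4, 8, 16, 32, 60, 90, 100, 100]
n_cycles = optimal_cycle(readings)
```

In this toy curve the signal first reaches half of the plateau at cycle 8, so the sketch selects 7 cycles for the preparative PCR.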
TF barcode amplicon library preparation
Because the amplicons produced by the cDNA amplification step in the 10x scRNA-seq protocol contained cell barcodes, UMIs, and TF barcodes, these cDNAs could be used as the template for further amplification of TF-cell barcodes. Two sequential PCR reactions were performed, each accompanied by a SYBR Gold spike-in qPCR to determine the optimal cycle number as described in “scRNA-seq library preparation for TF barcode screening”. For PCR1, NGS10x-F-i7-BC-PCR1F and i5000 were used as primers. A 50 µL PCR1 reaction contained 25 µL Q5 Hot Start High-Fidelity 2X Master Mix (New England Biolabs, M0494L), 5 µL amplified cDNA, 2.5 µL of each primer at 10 µM stock concentration, and 15 µL nuclease-free water. The PCR1 program was: initial denaturation, 98 °C, 30 s; 11–13 cycles (qPCR determined) of 98 °C, 10 s, 67 °C, 30 s, 72 °C, 30 s; final extension, 72 °C, 2 min. The PCR1 reaction was purified with 1.2X SPRIselect beads (Beckman Coulter, B23318) following the standard protocol. The sample was eluted in 20 µL water. For PCR2, i7000, P5, and P7 were used as primers. A 50 µL PCR2 reaction contained 25 µL Q5 Hot Start High-Fidelity 2X Master Mix, 10 µL PCR1 product, 2.5 µL of each of the three primers at 10 µM stock concentration, and 7.5 µL nuclease-free water. The PCR2 program was: initial denaturation, 98 °C, 30 s; 4–5 cycles (qPCR determined) of 98 °C, 10 s, 67 °C, 30 s, 72 °C, 30 s; final extension, 72 °C, 2 min. The PCR2 product was purified in the same way as PCR1. Final libraries were submitted for MiSeq v3 with paired-end reads of 80 cycles from either direction.
Primer sequences:
NGS10x-F-i7-BC-PCR1F: GGAGTTCAGACGTGTGCTCTTCCGATCTCTTTTCCAAGCACCTGCTACATAG
i5000: AATGATACGGCGACCACCGAGATCTACACaactcgctACACTCTTTCCCTACACGACGCTCTTCCGATCT (lower case region represents a sample-specific barcode)
i7000: CAAGCAGAAGACGGCATACGAGATtcgccttaGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (lower case region represents a sample-specific barcode)
P5: AATGATACGGCGACCACCGA
P7: CAAGCAGAAGACGGCATACGA
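As a quick arithmetic check of the PCR1 and PCR2 reaction compositions given above, the following Python sketch recomputes the water volume needed to reach 50 µL and the final concentration of each primer. The component volumes and the 10 µM stock concentration are taken from the text; the helper names are hypothetical.

```python
def water_volume(total_ul, components):
    """Nuclease-free water needed to bring the listed components to total_ul."""
    used = sum(components.values())
    if used > total_ul:
        raise ValueError("components exceed the reaction volume")
    return total_ul - used

def final_primer_nM(stock_uM, primer_ul, total_ul):
    """Final primer concentration in nM after dilution into the reaction."""
    return stock_uM * primer_ul / total_ul * 1000

# Volumes in µL, as stated in the protocol text.
pcr1 = {"Q5 2X master mix": 25.0, "amplified cDNA": 5.0,
        "NGS10x-F-i7-BC-PCR1F": 2.5, "i5000": 2.5}
pcr2 = {"Q5 2X master mix": 25.0, "PCR1 product": 10.0,
        "i7000": 2.5, "P5": 2.5, "P7": 2.5}

pcr1_water = water_volume(50.0, pcr1)        # 15 µL, matching the text
pcr2_water = water_volume(50.0, pcr2)        # 7.5 µL, matching the text
primer_nM = final_primer_nM(10.0, 2.5, 50.0)  # each primer ends up at 500 nM
```

Both mixes sum to 50 µL with the stated water volumes, and each primer is present at a final concentration of 500 nM.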
Analysis of scRNA-seq and TF barcode-seq data
For scRNA-seq, raw FASTQ files were aligned to GRCh38 and quantified using Cell Ranger. Detailed information about cell number, read depth and genes detected is visualized in Supplementary Figs. S3 and S8. Seurat was used to perform cell filtering, data normalization, and clustering. The generated Seurat object also contained the single-cell raw expression matrix for all genes. For TF barcode-seq, in the paired-end MiSeq data, one read of each pair contains the 20 bp TF barcode while the other contains the 16 bp cell barcode and the 12 bp UMI. By matching the names of the reads within each pair, the three sequences were compiled into one table with three columns: TF-BC, cell-BC, UMI. To remove duplicated reads from the same molecule, duplicated rows that had the same value for all three columns were removed. The table was then counted and reshaped into a frequency table in which rows represent cells and columns represent TFs. This table contains the raw counts of each TF barcode in all single cells. Because the TF barcodes were amplified from the cDNA during library preparation, we normalized the TF barcode count by the number of total RNA UMIs detected in each cell, reasoning that cells with more total UMIs were likely to have more reads for TF barcodes. The raw gene expression matrix and normalized TF count matrix were used to identify which TF barcodes were likely to induce microglial gene expression. Specifically, the expression of microglial genes was binarized, with any cell that had non-zero expression assigned a value of 1. Then, between the two groups of cells (0 or 1 for microglial gene expression), a one-sided Wilcoxon rank sum test was performed for each barcoded TF to determine which TF(s) had higher expression in cells expressing microglial genes. The TFs were ranked by -log10(p-value).
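The barcode counting and TF ranking steps described above can be sketched in plain Python. This is an illustrative reimplementation under stated assumptions, not the analysis code of the study: all function names and toy reads are hypothetical, and the one-sided Wilcoxon rank sum test is computed with a tie-corrected normal approximation without continuity correction, which may differ slightly from the exact implementation the authors used.

```python
import math
from collections import Counter, defaultdict

def count_tf_barcodes(reads):
    """Collapse duplicate (TF-BC, cell-BC, UMI) rows, then count UMIs per cell and TF."""
    counts = defaultdict(Counter)
    for tf_bc, cell_bc, _umi in set(reads):
        counts[cell_bc][tf_bc] += 1
    return counts

def normalize(counts, total_umis):
    """Divide each TF barcode count by the total RNA UMIs detected in that cell."""
    return {cell: {tf: n / total_umis[cell] for tf, n in per_tf.items()}
            for cell, per_tf in counts.items()}

def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank.
    Also returns tie-group sizes for the variance correction."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks, ties, i = [0.0] * len(values), [], 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        ties.append(j - i + 1)
        i = j + 1
    return ranks, ties

def ranksum_greater(pos, neg):
    """One-sided p-value for 'pos tends to be larger than neg' (normal approximation)."""
    n1, n2 = len(pos), len(neg)
    if n1 == 0 or n2 == 0:
        return 1.0
    n = n1 + n2
    ranks, ties = midranks(list(pos) + list(neg))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in ties) / (n * (n - 1))
    var = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var <= 0:
        return 1.0  # all values tied: no evidence either way
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2))

def rank_tfs(norm, marker_positive, tfs):
    """Rank TFs by -log10(p); marker_positive[cell] is the binarized marker call.
    Cells without any reads for a TF barcode contribute a value of 0."""
    scores = {}
    for tf in tfs:
        pos = [norm.get(c, {}).get(tf, 0.0) for c, on in marker_positive.items() if on]
        neg = [norm.get(c, {}).get(tf, 0.0) for c, on in marker_positive.items() if not on]
        p = ranksum_greater(pos, neg)
        scores[tf] = -math.log10(p) if p > 0 else math.inf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy reads; the second row duplicates the first and is collapsed.
reads = [("TF_A", "cell1", "AAA"), ("TF_A", "cell1", "AAA"),
         ("TF_A", "cell1", "CCC"), ("TF_B", "cell2", "GGG")]
counts = count_tf_barcodes(reads)
```

In this sketch a TF ranks highly when its normalized barcode counts are shifted upward in the cells that express the microglial marker, which mirrors the ranking by -log10(p-value) described in the text.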
Bulk RNA-seq library preparation
Cultured cells were lysed directly in TRIzol (Thermo Fisher Scientific, 15596018) for total RNA purification with Direct-zol RNA MiniPrep Kit (Zymo Research, R2050). RNA concentration was quantified with Qubit RNA HS Assay Kit (Thermo Fisher Scientific, Q32852). RNA integrity was confirmed by the presence of 18S and 28S bands on a 2% E-Gel EX Agarose Gel (Thermo Fisher Scientific, G402002). Between 100 ng and 1000 ng total RNA was used as input for mRNA enrichment using NEBNext Poly(A) mRNA Magnetic Isolation Module (New England Biolabs, E7490), followed by library construction with NEBNext Ultra II Directional RNA Library Prep Kit (New England Biolabs, E7760S) following the manufacturer’s instructions. The Biopolymers Facility at Harvard Medical School performed library QC and sequencing.
Analysis of bulk RNA-seq data
For both in-house generated samples and datasets downloaded from GEO, raw FASTQ files were aligned to GRCh38 and quantified using STAR (version 2.5.2b). Regularized-logarithm (rlog) transformation was applied to the raw counts before visualization using PCA. For analyses involving data from multiple sources, ComBat-seq was used for batch correction before PCA. Differential gene expression analysis was conducted with DESeq236. Pathway enrichment and GSEA analysis were performed with clusterProfiler84. Bulk RNA-seq data generated and analyzed in this study have been deposited in Gene Expression Omnibus (GEO).
TFiMGL Single-cell RNA sequencing processing and analysis
TFiMGLs were either cultured throughout in stem cell media, mTeSR Plus (MT), or switched to commercially available microglia media (MG) (Stemcell Technologies, 100-0020) from day 2 until day 6. All media were supplemented with 0.5 µg/mL doxycycline for the duration of the experiment. Undifferentiated iPSCs and TFiMGLs at days 2, 4, and 6 post induction (both MT and MG media conditions for days 4 and 6) were harvested by first washing with PBS (- calcium, - magnesium) followed by dissociating cultures into single cell suspensions with Accutase (StemCell Technologies, 07920) for 10 min at 37 °C. Cultures were then quenched with DMEM/F12 (Gibco, 11320033) and subsequently centrifuged at 300 g for 3 min. The cell pellet was rinsed with PBS (- calcium, - magnesium) and centrifuged again at 300 g for 3 min prior to fixation. Single cell suspensions from each sample were processed using the Parse Biosciences Fixation Kit following the manufacturer’s instructions (SB1001) and subsequently stored at −80 °C until all samples of the time course were harvested. Libraries for all samples were prepared using the Parse Whole Transcriptome Mini V1 kit following the manufacturer’s instructions (EC-W01010), generating libraries with an average size of 450 bp. Sequencing of libraries was carried out on an Illumina NextSeq 500 High Output v2.5 (300 Cycles) (sequenced at Biopolymers Facility - Harvard Medical School) using a single index (6 bp) and 5% PhiX spike-in as per Parse Biosciences sequencing instructions. Sequenced data were processed using Parse Batch Analysis (version 1.0.9) on the DNAnexus platform to demultiplex into library fastq files and count matrices. A total of 3,121 cells were analyzed, of which 1,178 cells corresponded to D2 TFiMGLs, 308 cells to D4 + MG, 727 cells to D4 + MT, 438 cells to D6 + MG, and 470 cells to D6 + MT. 
Processing and analysis of scRNA-seq data was carried out using the Python package Scanpy85 (version 1.9.5). Data processing included initial filtering out of cells that had fewer than 200 genes as well as genes found in fewer than 2 cells, followed by concatenating all of the samples. Further pre-processing of the data included removing cells with gene counts higher than 7000 as well as cells with mitochondrial gene expression higher than 12%. The dataset was then normalized using a scaling factor of 10,000 and log transformed. Downstream analyses (e.g. dimensionality reduction, clustering and plotting of graphs) followed standard procedures as per Scanpy documentation (https://scanpy.readthedocs.io/en/stable/). For differential gene expression analysis of samples, we compared each condition to all other conditions and set thresholds of log2 fold change ≥ 2 and p-value ≤ 0.05. This generated differently sized lists of genes for each condition (Supplementary Data S3). For consistency, the top 122 genes for all conditions were used and cell type annotations were assessed with EnrichR (https://maayanlab.cloud/Enrichr/#libraries) through either the Descartes or Cell Marker Augmented databases. We obtained the top enriched cell marker annotation hits for each database by considering adjusted p-value ≤ 0.05 (Supplementary Data S4–5). An expression threshold method was used when we subsetted cells based on the expression of a specific gene of interest (CX3CR1 or P2RY12). This approach involved selecting cells that exhibited expression levels above a predefined threshold of 0.5. The threshold was determined based on the distribution of expression values within the dataset, ensuring that only cells with significant expression of the target gene were included for subsequent analyses. 
For differential gene expression analysis of the subsetted cluster we used the top 200 genes and input them in the python package Gseapy (version 1.0.5) to analyze GO term pathways in addition to cell type annotation assessment through Descartes cell atlas database (https://maayanlab.cloud/Enrichr/#libraries) (Supplementary Data S6-7). For cell type annotation, we used marker gene sets from the following published datasets as references35 (Supplementary Data S8–9). scRNA-seq data generated and analyzed in this study have been deposited in Gene Expression Omnibus (GEO).
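The quality-control thresholds described above can be made concrete with a small plain-Python sketch. It does not reproduce the Scanpy calls used in the study; it only illustrates the filtering logic on a dictionary-based count table. The function names, the interpretation of "gene counts" as the number of detected genes per cell, and the "MT-" prefix for mitochondrial genes are assumptions for illustration.

```python
import math

def qc_filter(cells, min_genes=200, min_cells=2, max_genes=7000,
              max_pct_mt=12.0, mt_prefix="MT-"):
    """cells maps cell id -> {gene: raw count}. Returns the filtered mapping."""
    # Step 1: drop cells with too few detected genes.
    kept = {c: g for c, g in cells.items()
            if sum(1 for n in g.values() if n > 0) >= min_genes}
    # Step 2: drop genes detected in too few of the remaining cells.
    n_cells = {}
    for genes in kept.values():
        for gene, n in genes.items():
            if n > 0:
                n_cells[gene] = n_cells.get(gene, 0) + 1
    kept = {c: {gene: n for gene, n in g.items()
                if n_cells.get(gene, 0) >= min_cells}
            for c, g in kept.items()}
    # Step 3: drop cells with too many genes or too much mitochondrial signal.
    out = {}
    for c, g in kept.items():
        total = sum(g.values())
        mt = sum(n for gene, n in g.items() if gene.startswith(mt_prefix))
        n_genes = sum(1 for n in g.values() if n > 0)
        if total > 0 and n_genes <= max_genes and 100 * mt / total <= max_pct_mt:
            out[c] = g
    return out

def normalize_log(cell, target_sum=10_000):
    """Scale one cell's counts to target_sum in total, then apply log(1 + x)."""
    total = sum(cell.values())
    return {gene: math.log1p(n / total * target_sum) for gene, n in cell.items()}

# Hypothetical toy data with scaled-down thresholds so the example stays small.
toy_cells = {"c1": {"G1": 5, "G2": 3, "MT-CO1": 1},   # 11.1% mitochondrial: kept
             "c2": {"G1": 2, "G2": 1, "MT-CO1": 7},   # 70% mitochondrial: removed
             "c3": {"G1": 1}}                          # only one gene: removed
filtered = qc_filter(toy_cells, min_genes=2, min_cells=2, max_genes=10)
normalized = {c: normalize_log(g) for c, g in filtered.items()}
```

With the default arguments the function applies the thresholds stated in the text (200 genes, 2 cells, 7000 genes, 12% mitochondrial expression, and a scaling factor of 10,000).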
Immunofluorescence (IF)
IF experiments were performed in µ-Plate 96 Well Black plates (ibidi, 89626). After media removal, cells were fixed with 4% paraformaldehyde (Electron Microscopy Sciences, 15710) in 1x phosphate-buffered saline (PBS) (Thermo Fisher Scientific, 10010072) for 15 min at room temperature (RT). Cells were rinsed three times with PBS before proceeding to permeabilization or blocking. Cells were permeabilized for staining of Oct-3/4 and PU.1, but not for staining of cell surface proteins. Permeabilization was conducted with 0.25% Triton X-100 (Thermo Fisher Scientific, 85111) in 1x PBS for 15 min at RT, followed by three rinses with PBS. Cells were then blocked with 1% bovine serum albumin (BSA) in PBS for 1 h at RT. For primary and secondary antibody staining, antibodies were diluted in PBS with 1% BSA and incubated with cells for 1 h at RT. Three 5-min washes with PBS were used to remove excess antibody after staining. Cells were imaged directly in the plate on a Nikon Ti2 Eclipse inverted microscope with a Plan Apo Lambda DM 60× (1.4 NA, Ph3) oil objective and an Andor Zyla sCMOS camera. Images were acquired with NIS-Elements AR software. All antibodies were used at 1:200 dilution. Primary IF antibodies used in this study were: Oct-3/4 (Santa Cruz Biotechnology, sc-5279), PU.1 (Thermo Fisher Scientific, PA5-17505), CD11b (BioLegend, 101202), P2RY12 (Thermo Fisher Scientific, 702516), CX3CR1 (Abcam, ab8021).
Phagocytosis assay
Differentiated cells were incubated with 20 µg/mL pHrodo Green S. aureus BioParticles (Thermo Fisher Scientific, P35382) for 0–4 h in mTeSR Plus media in the presence of 100 µg/mL Penicillin-Streptomycin (Corning, 30-002-CI). After removal of excess particles with PBS washes, cells were harvested for antibody (CX3CR1, P2RY12, CD11b) staining and flow cytometry analysis as described in the previous section. Stained cells remaining after flow cytometry were transferred into a µ-Plate 96 Well Black plate for fluorescence microscopy to confirm the intracellular localization of the particles. This step must be performed promptly after flow cytometry to avoid changes in cellular morphology due to cell death.
Calcium imaging
Calcium imaging experiments were conducted in standard 12-well cell culture plates. Differentiated cells were incubated with 1 µg/mL Fluo-4 AM calcium indicator (Thermo Fisher Scientific, F23917) in 1 mL of mTeSR Plus media for 30 min in a cell culture incubator. Excess dye was removed with two 1 mL media washes. After adding 1 mL of fresh mTeSR Plus, the cells were placed on the stage of a microscope housed inside the incubator. Images were first acquired without stimulation for 90 s to determine the baseline signal. One image was acquired every 3 s, the fastest rate possible on the instrument. After 90 s, 1 mL of media containing 150 µM ADP was added to the cells while imaging continued. The total imaging time was 10 min. The fluorescent signal was quantified and plotted using MATLAB.
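The text does not specify how the fluorescent signal was quantified. One common convention, assumed here purely for illustration, is to express each frame relative to the mean pre-stimulation baseline (ΔF/F0). With one frame every 3 s, the 90 s baseline corresponds to the first 30 frames. The sketch below is in Python rather than MATLAB, and the function name `delta_f_over_f0` is hypothetical.

```python
def delta_f_over_f0(trace, frame_interval_s=3.0, baseline_s=90.0):
    """Normalize a fluorescence trace to its pre-stimulation baseline (dF/F0).

    `trace` is a sequence of mean fluorescence values, one per frame.
    """
    # Number of frames acquired before stimulation (30 for 90 s at 3 s/frame).
    n_baseline = int(baseline_s // frame_interval_s)
    if not 0 < n_baseline <= len(trace):
        raise ValueError("trace is shorter than the baseline window")
    f0 = sum(trace[:n_baseline]) / n_baseline
    return [(f - f0) / f0 for f in trace]
```

A value of 0.5 in the output would then mean a 50% increase over baseline at that frame.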
Amyloid fibrillation
Aβ fibrillation experiments were performed using the SensoLyte Thioflavin T β-Amyloid (1-42) Aggregation Kit (AnaSpec, AS-72214) according to the manufacturer’s instructions. The reaction was set up in a µ-Plate 96 Well Black plate. Data were acquired on a plate reader with excitation/emission = 440 nm/484 nm at 37 °C once every 5 min for 3 h.
TFiMGL disease-relevant stimuli for bulk RNAseq
Day 4 TFiMGLs were plated at a seeding density of 120,000 cells per well of a 12-well plate, in triplicate for each condition. Cells were incubated for 24 h with either 250 μg/mL Aβ (AnaSpec, AS-72214), 20 ng/mL IFNγ (Novus Biologicals, NBP2-34992) or 2.8 μg/mL TDP-43 (R&D Systems, AP-190) added to the culture media. After the incubation, cells were harvested for bulk RNA sequencing as previously described.
TF-gene stepwise regression model construction
Stepwise regression (or stepwise selection) iteratively adds and removes predictors in a regression model to find the subset of variables in the data set that gives the best performance, thereby lowering the prediction error of the model. During the process, the p-value of the statistical test is used to screen the variables: if the p-value is less than or equal to 0.05, the variable enters the regression model as an independent variable. The model was constructed as follows:
Step 1: Establish \(P\) univariate regression models between each of the independent variables \(X_{1}, X_{2}, \ldots, X_{P}\) and the dependent variable \(Y\).
Calculate the F-test statistic of the regression coefficient in each model, \(F_{1}^{(1)}, \ldots, F_{P}^{(1)}\), and take the maximum value \(F_{i_1}^{(1)}\).
For a given significance level \(\alpha\), record the corresponding critical value as \(F^{(1)}\). If \(F_{i_1}^{(1)} > F^{(1)}\), then \(X_{i_1}\) is included in the regression model and the set of selected variable indices is recorded as \(I_{1}\).
Step 2: Establish bivariate regression models of the dependent variable \(Y\) on each of the independent variable subsets \(\{X_{i_1}, X_{1}\}, \ldots, \{X_{i_1}, X_{i_1-1}\}, \{X_{i_1}, X_{i_1+1}\}, \ldots, \{X_{i_1}, X_{P}\}\). Calculate the F-test statistic of the regression coefficient of the newly added variable in each model, \(F_{k}^{(2)}\) (\(k \notin I_{1}\)), and take the maximum value \(F_{i_2}^{(2)}\).
For a given significance level \(\alpha\), record the corresponding critical value as \(F^{(2)}\). If \(F_{i_2}^{(2)} > F^{(2)}\), then \(X_{i_2}\) is introduced into the regression model. Otherwise, the variable introduction process is terminated.
Step 3: Repeat Step 2 with the variable subsets \(\{X_{i_1}, X_{i_2}, X_{k}\}\), where \(X_{k}\) is an independent variable not yet introduced into the regression model. This step is repeated, each time considering the variables not yet in the model, until the F-test admits no further variable.
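The forward-entry procedure in Steps 1 to 3 can be sketched in Python with NumPy as follows. This is an illustrative re-implementation, not the authors' R script. It covers only variable entry (no removal step), and it takes a fixed critical value `f_crit` as an assumed parameter, whereas in practice the critical value would be derived from the F distribution at the chosen \(\alpha\) and the current degrees of freedom. The function names `rss` and `forward_stepwise` are hypothetical.

```python
import numpy as np

def rss(X, cols, y):
    """Residual sum of squares of a least-squares fit of y on X[:, cols] plus intercept."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def forward_stepwise(X, y, f_crit=4.0):
    """Greedy forward selection; returns the chosen column indices in entry order."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n, p = X.shape
    selected = []
    rss_current = rss(X, selected, y)  # intercept-only model
    while len(selected) < p:
        best_f, best_j, best_rss = -np.inf, None, None
        for j in range(p):
            if j in selected:
                continue
            df_resid = n - (len(selected) + 2)  # intercept + selected + candidate
            if df_resid <= 0:
                continue
            rss_new = rss(X, selected + [j], y)
            # Partial F statistic for adding candidate j to the current model.
            f_stat = (rss_current - rss_new) * df_resid / max(rss_new, 1e-12)
            if f_stat > best_f:
                best_f, best_j, best_rss = f_stat, j, rss_new
        if best_j is None or best_f <= f_crit:
            break  # no remaining variable passes the F-test
        selected.append(best_j)
        rss_current = best_rss
    return selected
```

In the setting of this study, each column of `X` would correspond to one TF and `y` to the expression of one target gene, so the function would be run once per gene.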
TF-gene network visualization
In the regression analysis, coefficients and p-values together identify the significant TF-gene relationships in the model. The coefficients describe the mathematical relationship between each independent variable and the dependent variable, and the p-values for the coefficients indicate whether these relationships are statistically significant. Using the criteria |coefficient| > 0.1 and -log10(p-value) > 20, we selected 250 TF-gene combinations from the first pooled screen and 422 from the second, then visualized them using Cytoscape86 (version 3.8). Briefly, source and target TF node interactions were visualized using a node degree sorting scheme. The sign of each interaction was represented by edge color (red for positive, blue for negative) and the edge weight was represented by edge width.
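The selection criteria and edge styling described above can be sketched as a small filter that turns regression results into an edge list. The tuple layout of `results`, the function name `select_edges`, and the dictionary keys are assumptions for illustration; how such a table is loaded into Cytoscape is not covered here.

```python
import math

def select_edges(results, min_abs_coef=0.1, min_neglog10_p=20.0):
    """Keep TF-gene pairs passing both cutoffs and annotate them for plotting.

    `results` is a list of (tf, gene, coefficient, p_value) tuples, a
    hypothetical layout for the regression output.
    """
    edges = []
    for tf, gene, coef, pval in results:
        # A p-value of exactly 0 (underflow) is treated as infinitely significant.
        neglog10_p = math.inf if pval == 0 else -math.log10(pval)
        if abs(coef) > min_abs_coef and neglog10_p > min_neglog10_p:
            edges.append({
                "source": tf,
                "target": gene,
                "weight": abs(coef),                     # drawn as edge width
                "color": "red" if coef > 0 else "blue",  # sign of the coefficient
            })
    return edges
```

Each returned dictionary corresponds to one directed TF-to-gene edge, with the coefficient magnitude as the weight and its sign mapped to color.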
Statistics and reproducibility
Sample sizes were chosen based on existing studies in the field to enable statistical analyses and reproducibility. For each pooled screen, 600,000 cells were transfected in order to obtain sufficient numbers of cells after purification and scRNA-seq processing. We performed scRNA-seq on 10,000 cells, reasoning that since not all TF combinations are divergent or functionally unique, this quantity would provide sufficient information for the initial screen and TF prioritization. No data were excluded from the relevant analyses. Both pooled screens were performed in duplicate, TFiMGL characterizations were performed in triplicate, and PiggyBac dose curve copy number quantification was performed in replicate. TF-induced iPSC differentiation has demonstrated high reproducibility across experiments and studies. Randomization was not utilized in this study since the cell products were derived from the same cell source. The investigators were not blinded to allocation during experiments and outcome assessment.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Bulk RNA-seq data and scRNA-seq data generated and analyzed in this study have been deposited in Gene Expression Omnibus (GEO) with the accession codes: GSE287847, GSE287850, GSE287851, GSE287852, and GSE287853. The raw data generated in this study are provided in the Supplementary Information/Source Data file. Source data are provided with this paper.
Code availability
An R script for analyzing gene expression data to identify relationships between target genes and transcription factors (TFs) using linear regression models is available on GitHub87 (https://github.com/churchlab/Microglia/tree/main) and Zenodo (https://doi.org/10.5281/zenodo.15026232).
References
Regev, A., Teichmann, S., Lander, E., Amit, I. & Benoist, C. Science forum: the human cell atlas. elife 6, e27041 (2017).
Ng, A. H. M. et al. A comprehensive library of human transcription factors for cell fate engineering. Nat. Biotechnol. 39, 510–519 (2021).
Gosselin, D. et al. An environment-dependent transcriptional network specifies human microglia identity. Science 356, eaal3222 (2017).
Smith, A. M. & Dragunow, M. The human side of microglia. Trends Neurosci. 37, 125–135 (2014).
Watkins, L. R. & Hutchinson, M. R. A concern on comparing ‘apples’ and ‘oranges’ when differences between microglia used in human and rodent studies go far, far beyond simply species: comment on Smith and Dragunow. Trends Neurosci. 37, 189–190 (2014).
Muffat, J. et al. Efficient derivation of microglia-like cells from human pluripotent stem cells. Nat. Med. 22, 1358–1367 (2016).
Abud, E. M. et al. iPSC-derived human microglia-like cells to study neurological diseases. Neuron 94, 278–293.e9 (2017).
Haenseler, W. et al. A highly efficient human pluripotent stem cell microglia model displays a neuronal-co-culture-specific expression profile and inflammatory response. Stem Cell Rep. 8, 1727–1742 (2017).
Douvaras, P. et al. Directed differentiation of human pluripotent stem cells to microglia. Stem Cell Rep. 8, 1516–1524 (2017).
Takata, K. et al. Induced-pluripotent-stem-cell-derived primitive macrophages provide a platform for modeling tissue-resident macrophage differentiation and function. Immunity 47, 183–198.e6 (2017).
McQuade, A. et al. Development and validation of a simplified method to generate human microglia from pluripotent stem cells. Mol. Neurodegener. 13, 67 (2018).
Speicher, A. M., Wiendl, H., Meuth, S. G. & Pawlowski, M. Generating microglia from human pluripotent stem cells: novel in vitro models for the study of neurodegeneration. Mol. Neurodegener. 14, 46 (2019).
Xu, R. et al. Human iPSC-derived mature microglia retain their identity and functionally integrate in the chimeric mouse brain. Nat. Commun. 11, 1577 (2020).
Pandya, H. et al. Differentiation of human and murine induced pluripotent stem cells to microglia-like cells. Nat. Neurosci. 20, 753–759 (2017).
Dräger, N. M. et al. A CRISPRi/a platform in human iPSC-derived microglia uncovers regulators of disease states. Nat. Neurosci. 25, 1149–1162 (2022).
Chen, S.-W. et al. Efficient conversion of human induced pluripotent stem cells into microglia by defined transcription factors. Stem Cell Rep. 16, 1363–1380 (2021).
Ginhoux, F. et al. Fate mapping analysis reveals that adult microglia derive from primitive macrophages. Science 330, 841–845 (2010).
Nayak, D., Roth, T. L. & McGavern, D. B. Microglia development and function. Annu. Rev. Immunol. 32, 367–402 (2014).
Matcovitch-Natan, O. et al. Microglia development follows a stepwise program to regulate brain homeostasis. Science 353, aad8670 (2016).
Kierdorf, K. et al. Microglia emerge from erythromyeloid precursors via Pu.1- and Irf8-dependent pathways. Nat. Neurosci. 16, 273–280 (2013).
Smith, A. M. et al. The transcription factor PU.1 is critical for viability and function of human brain microglia. Glia 61, 929–942 (2013).
Galatro, T. F. et al. Transcriptomic analysis of purified human cortical microglia reveals age-associated changes. Nat. Neurosci. 20, 1162–1171 (2017).
Zhong, S. et al. A single-cell RNA-seq survey of the developmental landscape of the human prefrontal cortex. Nature 555, 524–528 (2018).
Olah, M. et al. A transcriptomic atlas of aged human microglia. Nat. Commun. 9, 539 (2018).
Butovsky, O. et al. Identification of a unique TGF-β-dependent molecular and functional signature in microglia. Nat. Neurosci. 17, 131–143 (2014).
Wehrspaun, C. C., Haerty, W. & Ponting, C. P. Microglia recapitulate a hematopoietic master regulator network in the aging human frontal cortex. Neurobiol. Aging 36, 2443.e9–2443.e20 (2015).
Avellino, R. & Delwel, R. Expression and regulation of C/EBPα in normal myelopoiesis and in malignant transformation. Blood 129, 2083–2091 (2017).
Lichtinger, M. et al. RUNX1 reshapes the epigenetic landscape at the onset of haematopoiesis. EMBO J. 31, 4318–4333 (2012).
Starck, J. et al. Spi-1/PU.1 is a positive regulator of the Fli-1 gene involved in inhibition of erythroid differentiation in friend erythroleukemic cell lines. Mol. Cell Biol. 19, 121–135 (1999).
Hoeffel, G. & Ginhoux, F. Ontogeny of tissue-resident macrophages. Front. Immunol. 6, 486 (2015).
Bazan, J. F. et al. A new class of membrane-bound chemokine with a CX3C motif. Nature 385, 640–644 (1997).
Hughes, P. M., Botham, M. S., Frentzel, S., Mir, A. & Perry, V. H. Expression of fractalkine (CX3CL1) and its receptor, CX3CR1, during acute and chronic inflammation in the rodent CNS. Glia 37, 314–327 (2002).
Liu, Z. et al. Systematic comparison of 2A peptides for cloning multi-genes in a polycistronic vector. Sci. Rep. 7, 2193 (2017).
Okita, K., Nakagawa, M., Hyenjong, H., Ichisaka, T. & Yamanaka, S. Generation of mouse induced pluripotent stem cells without viral vectors. Science 322, 949–953 (2008).
Carey, B. W. et al. Reprogramming factor stoichiometry influences the epigenetic state and biological properties of induced pluripotent stem cells. Cell Stem Cell 9, 588–598 (2011).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Liberzon, A. et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 1, 417–425 (2015).
Cahan, P. et al. CellNet: network biology applied to stem cell engineering. Cell 158, 903–915 (2014).
Kracht, L. et al. Human fetal microglia acquire homeostatic immune-sensing properties early in development. Science 369, 530–537 (2020).
Zhang, Y., Parmigiani, G. & Johnson, W. E. ComBat-seq: batch effect adjustment for RNA-seq count data. NAR Genom. Bioinform 2, lqaa078 (2020).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545–15550 (2005).
Chen, E. Y. et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinform. 14, 128 (2013).
Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44, W90–W97 (2016).
Cao, J. et al. A human cell atlas of fetal gene expression. Science 370, eaba7721 (2020).
Zhang, X. et al. CellMarker: a manually curated resource of cell markers in human and mouse. Nucleic Acids Res. 47, D721–D728 (2019).
Dolan, M.-J. et al. Exposure of iPSC-derived human microglia to brain substrates enables the generation and manipulation of diverse transcriptional states in vitro. Nat. Immunol. 24, 1382–1390 (2023).
Li, Y. et al. Decoding the temporal and regional specification of microglia in the developing human brain. Cell Stem Cell 29, 620–634.e6 (2022).
Jakovljevic, M. et al. Induction of NTPDase1/CD39 by reactive microglia and macrophages is associated with the functional state during EAE. Front Neurosci. 13, 410 (2019).
Sierra, A., Abiega, O., Shahraz, A. & Neumann, H. Janus-faced microglia: beneficial and detrimental consequences of microglial phagocytosis. Front. Cell. Neurosci. 7, 6 (2013). https://pubmed.ncbi.nlm.nih.gov/23386811/.
Guvenek, A. et al. Transcriptional profiling in microglia across physiological and pathological states identifies a transcriptional module associated with neurodegeneration. Commun. Biol. 7, 1168 (2024).
Ivashkiv, L. B. IFNγ: signalling, epigenetics and roles in immunity, metabolism, disease and cancer immunotherapy. Nat. Rev. Immunol. 18, 545–558 (2018).
Zhong, L. et al. Amyloid-beta modulates microglial responses by binding to the triggering receptor expressed on myeloid cells 2 (TREM2). Mol. Neurodegener. 13, 15 (2018).
Zhao, W. et al. TDP-43 activates microglia through NF-κB and NLRP3 inflammasome. Exp. Neurol. 273, 24–35 (2015).
Zhang, F. et al. IFN-γ and TNF-α drive a CXCL10 + CCL2+ macrophage phenotype expanded in severe COVID-19 lungs and inflammatory diseases with tissue inflammation. Genome Med. 13, 64 (2021).
Inoue, K. Purinergic systems in microglia. Cell Mol. Life Sci. 65, 3074–3080 (2008).
Di Virgilio, F., Ceruti, S., Bramanti, P. & Abbracchio, M. P. Purinergic signalling in inflammation of the central nervous system. Trends Neurosci. 32, 79–87 (2009).
Van de Sande, B. et al. A scalable SCENIC workflow for single-cell gene regulatory network analysis. Nat. Protoc. 15, 2247–2276 (2020).
Hecker, M., Lambeck, S., Toepfer, S., van Someren, E. & Guthke, R. Gene regulatory network inference: data integration in dynamic models-a review. Biosystems 96, 86–103 (2009).
Matys, V. et al. TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 34, D108–D110 (2006).
Castro-Mondragon, J. A. et al. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 50, D165–D173 (2022).
Kanehisa, M., Furumichi, M., Sato, Y., Ishiguro-Watanabe, M. & Tanabe, M. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res. 49, D545–D551 (2021).
Liu, Z.-P., Wu, C., Miao, H. & Wu, H. RegNetwork: an integrated database of transcriptional and post-transcriptional regulatory networks in human and mouse. Database (Oxford) 2015 (2015).
Oughtred, R. et al. The BioGRID database: a comprehensive biomedical resource of curated protein, genetic, and chemical interactions. Protein Sci. 30, 187–200 (2021).
Szklarczyk, D. et al. The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 49, D605–D612 (2021).
Flitsch, L. J., Laupman, K. E. & Brüstle, O. Transcription factor-based fate specification and forward programming for neural regeneration. Front. Cell. Neurosci. 14 (2020).
Takahashi, K. & Yamanaka, S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663–676 (2006).
Haniffa, M. et al. A roadmap for the Human Developmental Cell Atlas. Nature 597, 196–205 (2021).
Wang, L. et al. Stoichiometry of Gata4, Mef2c, and Tbx5 influences the efficiency and quality of induced cardiac myocyte reprogramming. Circ. Res 116, 237–244 (2015).
Hoeffel, G. et al. C-Myb(+) erythro-myeloid progenitor-derived fetal monocytes give rise to adult tissue-resident macrophages. Immunity 42, 665–678 (2015).
Crotti, A. & Ransohoff, R. M. Microglial Physiology and Pathophysiology: Insights from Genome-wide Transcriptional Profiling. Immunity 44, 505–515 (2016).
Nimmerjahn, A., Kirchhoff, F. & Helmchen, F. Resting microglial cells are highly dynamic surveillants of brain parenchyma in vivo. Science 308, 1314–1318 (2005).
Salter, M. W. & Beggs, S. Sublime microglia: expanding roles for the guardians of the CNS. Cell 158, 15–24 (2014).
Colonna, M. & Butovsky, O. Microglia function in the central nervous system during health and neurodegeneration. Annu. Rev. Immunol. 35, 441–468 (2017).
Salter, M. W. & Stevens, B. Microglia emerge as central players in brain disease. Nat. Med. 23, 1018–1027 (2017).
Mathys, H. et al. Single-cell transcriptomic analysis of Alzheimer’s disease. Nature 570, 332–337 (2019).
Keren-Shaul, H. et al. A Unique Microglia Type Associated with Restricting Development of Alzheimer’s Disease. Cell 169, 1276–1290.e17 (2017).
Yeh, F. L., Hansen, D. V. & Sheng, M. TREM2, microglia, and neurodegenerative diseases. Trends Mol. Med. 23, 512–533 (2017).
Ulland, T. K. et al. TREM2 maintains microglial metabolic fitness in alzheimer’s disease. Cell 170, 649–663.e13 (2017).
Dello Russo, C. et al. The human microglial HMC3 cell line: where do we stand? A systematic literature review. J. Neuroinflammation 15, 259 (2018).
Timmerman, R., Burm, S. M. & Bajramovic, J. J. An Overview of in vitro Methods to Study Microglia. Front. Cell Neurosci. 12, 242 (2018).
Maeda, T., Sugita, S., Kurimoto, Y. & Takahashi, M. Trends of stem cell therapies in age-related macular degeneration. J. Clin. Med 10, 1785 (2021).
de Klerk, E. & Hebrok, M. Stem cell-based clinical trials for diabetes mellitus. Front. Endocrinol. (Lausanne) 12, 631463 (2021).
Shi, Y., Inoue, H., Wu, J. C. & Yamanaka, S. Induced pluripotent stem cell technology: a decade of progress. Nat. Rev. Drug Discov. 16, 115–130 (2017).
Wu, T. et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innovation 2, 100141 (2021). https://pubmed.ncbi.nlm.nih.gov/34557778/.
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Liu, S. et al. Iterative transcription factor screening enables rapid generation of microglia-like cells from human iPSC, churchlab/Microglia:Microglia V1.0 (Microglia) (Microglia). Zenodo. https://doi.org/10.5281/zenodo.15026232 (2025).
Acknowledgements
The authors would like to acknowledge Gabriel Filsinger, Yu Wang, Cory Smith, Dima Ter-Ovanesyan, and Michael Chou for helpful discussions. The authors would also like to thank the Biopolymers Facility at HMS, Research Computing at HMS, Harvard Chan Bioinformatics Core, and HMS Immunology Flow Cytometry Core Facility for technical assistance. This work has been supported by the National Human Genome Research Institute of the National Institutes of Health [RM1 HG008525 to G.M.C.] and the Lipper Foundation. Some schematics in this paper were created using BioRender.com as specified in the figure legends.
Author information
Contributions
G.M.C. and S.L. conceived the project. S.L., with guidance from A.H.M.N. and P.K., performed early exploratory experiments. S.L., G.M.C., L.L., F.Z. designed overall experimental and analytical strategies. S.L. performed experiments with help from B.S., Y.C., E.A., M.G-C., C-T.W., J.Y.H., Y.T., P.F., and G.C. L.L. and R.N.R. helped with single-cell TF barcode quantification and regression analysis. Y.R.L. and J.C. performed PiggyBac copy number quantification. F.Z. and S.R. helped with cell atlas integration and mapping. J.A., J.M.T., E.L., and K.M. provided significant discussion and input over project design. S.L. and K.M. wrote the manuscript with help from L.L. and M.G-C., and with input and feedback from all authors.
Ethics declarations
Competing interests
S.L. and G.M.C. are listed as inventors of a patent related to work on this article. G.M.C., P.K., and A.H.M.N. are co-founders/employees/advisors at, and have equity in, GC Therapeutics, Inc., and are inventors on patents filed by the President and Fellows of Harvard College. Full disclosure for G.M.C. is available at arep.med.harvard.edu/gmc/tech.html. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Keisuke Kaji, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Liu, S., Li, L., Zhang, F. et al. Iterative transcription factor screening enables rapid generation of microglia-like cells from human iPSC. Nat Commun 16, 5136 (2025). https://doi.org/10.1038/s41467-025-59596-3
DOI: https://doi.org/10.1038/s41467-025-59596-3