Extended Data Fig. 1: Establishing the TeloHAEC CRISPRi model and Perturb-seq details.
From: Convergence of coronary artery disease genes onto endothelial cell programs

a, Enrichment of CAD heritability in TeloHAEC enhancers, from stratified linkage disequilibrium score regression analysis (S-LDSC, see Methods), where enrichment is the percentage of heritability explained by variants in enhancers (%heritability) divided by the percentage of variants in enhancers (%SNPs). Enhancers in TeloHAEC (treated under the indicated conditions) were identified by the Activity-by-Contact model from ATAC-seq and H3K27ac ChIP-seq data (n = 6 for control ATAC, 3 each for IL-1β, TNFα or VEGF ATAC, 4 for control ChIP, and 2 each for IL-1β, TNFα or VEGF ChIP). Error bars: standard error of the enrichment estimate, calculated by S-LDSC using a jackknife (which resamples the data used for calculating heritability enrichment). P-values were calculated using the S-LDSC method (ref. 28), and FDR by the Benjamini-Hochberg method. *: FDR < 0.05, with FDR values of 0.037 (Ctrl), 0.015 (IL-1β), 0.020 (TNFα) and 0.041 (VEGF). Full S-LDSC results can be found in Supplementary Table 27. b, Scatter density plot of pseudobulk gene expression from single-cell RNA-seq of human right coronary artery endothelial cells (RCAECs; from ref. 69) versus TeloHAEC pseudobulk gene expression, for genes perturbed in this study. 2,107 of the 2,285 perturbed genes are expressed at TPM > 1 in healthy or diseased RCAECs. R and P-values are from a two-sided Pearson correlation test. c, As in b, but for the 41 V2G2P genes. d, Heatmap of gene expression (log10 TPM) of the 41 V2G2P genes in diseased RCAECs and in TeloHAEC. All genes except FBN2 are expressed at >1 TPM in RCAECs. e, FACS showing doxycycline (dox) inducibility of KRAB-dCas9-IRES-BFP in TeloHAEC, after sorting but before the screen. Left panels: gating for viable individual cells. Right panels: counts of gated cells by fluorescence intensity in the BFP/PB450 channel. f, BFP channel counts of cells grown in parallel and concurrently with cells for the Perturb-seq screen.
After expansion to 120 million cells, transduction, selection and 5-day doxycycline treatment, 92% of cells remain BFP positive. g, Cumulative distribution of duplication levels for unique CBC-UMI-guide combinations (“unique UMIs”, red) or all guide reads (blue) in deeply sequenced dialout libraries. Requiring four duplicates (dotted line) eliminates 90% of CBC-UMI-guide combinations (likely PCR chimaeras), while retaining >85% of total guide reads. h, UMI counts for the top guide per CBC. Arrow: the chosen threshold of 4 UMIs. i, Counts of singlets (1 gRNA, black bar), doublets (2) and higher multimers, as well as cells with no guide called (0), at the chosen thresholds of at least 4 UMIs for the top guide and at least fourfold fewer UMIs for the next most frequent guide. j, Histogram of counts of singlet cells per target. Dotted line: average. k, As in j, but for singlet cells per guide. l, UMI counts for all transcripts per cell, by singlet/multiplet status. The median UMI count for doublets was 37% higher than that for singlets. Assuming that droplets with two cells have twice as many reads as singlets, this suggests that 37% of doublets are due to two cells (9.3% of cells with guides) while the remainder (15.7% of cells with guides) are due to two guides in one cell, very close to the expectation from the infection MOI of 15%. n = 352686, 214449, 79744, 19195 and 5345 cells with 0, 1, 2, 3, or 4 guides, respectively. Boxplot centre line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range; points, outliers. m, Distribution of knockdown efficiency across target genes (log2 ratio of expression in cells containing guide RNAs targeting the gene versus cells containing negative control guide RNAs). Grey line: all targeted genes. Yellow and red lines: genes expressed at >30 and >300 TPM, respectively. Red dotted vertical line: 40% knockdown (average for target genes expressed at >300 TPM).
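The doublet decomposition described for panel l can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `split_doublets` and its arguments are hypothetical and not taken from the study. It makes the same assumption as the legend, namely that a droplet with two cells yields twice the UMIs of a singlet, so the excess of the doublet median over the singlet median is read directly as the two-cell share of doublets.

```python
# Cell counts by number of guides called, as listed for panel l.
CELLS_BY_GUIDE_COUNT = {0: 352686, 1: 214449, 2: 79744, 3: 19195, 4: 5345}


def split_doublets(cells_by_guide_count, doublet_to_singlet_umi_ratio):
    """Split guide doublets into two-cell droplets and two-guide single cells.

    Assumption (from the legend): a two-cell droplet has 2x the UMIs of a
    singlet and a two-guide single cell has 1x, so a ratio of (1 + f) implies
    that a fraction f of doublets are two-cell droplets.
    Returns both classes as fractions of all cells with at least one guide.
    """
    cells_with_guides = sum(
        n for guides, n in cells_by_guide_count.items() if guides >= 1
    )
    doublet_fraction = cells_by_guide_count[2] / cells_with_guides
    two_cell_share = doublet_to_singlet_umi_ratio - 1.0
    two_cell = doublet_fraction * two_cell_share
    two_guide = doublet_fraction * (1.0 - two_cell_share)
    return two_cell, two_guide


# Doublet median UMI count is 37% higher than the singlet median (panel l).
two_cell, two_guide = split_doublets(CELLS_BY_GUIDE_COUNT, 1.37)
print(f"two-cell droplets: {two_cell:.1%} of cells with guides")
print(f"two guides in one cell: {two_guide:.1%} of cells with guides")
```

Under these assumptions the two fractions land within rounding of the 9.3% and 15.7% quoted in the legend.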
n, Distribution of fitness effects across all guide RNAs (log2 ratio of guide frequency in singlet cells from the Perturb-seq experiment after 5 days of CRISPRi induction to guide frequency in the original guide RNA library). Guides targeting common essential genes (red) were depleted more frequently than guides targeting other genes. o, Relative count frequencies for the number of nominally significant differentially expressed (DE) genes per perturbed target (log2 of the number of genes with raw P < 0.01 and fold change > 1.15 from edgeR DE analysis), for the indicated subclasses of targets. p, Volcano plot showing log2[(number of DE genes for target)/(average number of DE genes for non-expressed controls)] versus -log10 FDR (capped at 100). Significance was assessed by a two-sided binomial test versus DE gene counts for the 48 perturbed non-expressed control genes. Right: symbols for target genes with the strongest effects. 10.7% of all targets had a significant effect on transcription (FDR < 0.05 for increased DE gene count), including 31.9% of common essential genes and 9.0% of other genes. q, Percentage of perturbations that have a significant transcriptional effect in Perturb-seq, as defined by either (i) “DE Genes”, as per p, or (ii) “DE Programs”: perturbations that lead to significant changes in program expression by the MAST package (ref. 50) with 10X lane correction (FDR < 0.05), for each indicated class: Permuted Controls (statistical tests performed on randomly drawn cells with negative control or safe-targeting guides), Expressed (≥1 TPM in TeloHAEC bulk RNA-seq), Low or No Expression (<1 TPM), Common essential (as identified in DepMap, ref. 119), TeloHAEC Proliferation (showing fitness effects, as per n, of ±15%, FDR < 0.05), Gene near CAD GWAS signals (expressed genes near any CAD GWAS signal, see Methods: Defining variants in CAD GWAS signals), Gene near IBD signals (perturbed expressed genes near 10 selected IBD GWAS signals, with no genes overlapping those for CAD signals).
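The fitness measure in panel n is a log2 ratio of guide frequencies, which can be sketched as follows. This is a minimal illustration rather than the study's pipeline: the function name `guide_log2_fitness`, the dictionary inputs and the pseudocount are all assumptions introduced here (the pseudocount only keeps guides with zero observed cells finite).

```python
import math


def guide_log2_fitness(singlet_counts, library_counts, pseudocount=0.5):
    """log2(guide frequency in singlet cells / guide frequency in the library).

    singlet_counts: guide -> number of singlet cells carrying that guide.
    library_counts: guide -> read count in the original guide library.
    pseudocount: hypothetical smoothing term, not taken from the source;
    with pseudocount=0 every guide must have a non-zero count.
    """
    guides = list(library_counts)
    n_guides = len(guides)
    singlet_total = (
        sum(singlet_counts.get(g, 0) for g in guides) + pseudocount * n_guides
    )
    library_total = sum(library_counts.values()) + pseudocount * n_guides
    fitness = {}
    for g in guides:
        singlet_freq = (singlet_counts.get(g, 0) + pseudocount) / singlet_total
        library_freq = (library_counts[g] + pseudocount) / library_total
        # Negative values indicate depletion relative to the starting library.
        fitness[g] = math.log2(singlet_freq / library_freq)
    return fitness


# Toy counts for illustration only (not data from the study).
toy_library = {"guide_A": 100, "guide_B": 100}
toy_singlets = {"guide_A": 50, "guide_B": 150}
print(guide_log2_fitness(toy_singlets, toy_library))
```

Normalising each count by its own total makes the ratio insensitive to the differing depths of the cell-based and library measurements.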