Abstract
Biomolecular condensates organize numerous subcellular processes and have been implicated in diseases, including neurodegeneration and cancer. Protein sequences intrinsically encode their propensity to form condensates, but specific sequence features that regulate this behavior have not been systematically explored at scale. Here, we develop CondenSeq, a high-throughput pooled imaging with in situ sequencing approach to measure propensities of thousands of protein sequences to form nuclear condensates. Leveraging the large scale of these experiments, we evaluated the impacts of dozens of sequence features across a wide range of sequence contexts, identifying several features with highly consistent, context-independent effects and others with less-consistent effects. We also identified multiple classes of condensates and discovered distinct sequence properties that drive their formation. Our results provide a systematic overview of the relationships between protein sequences and nuclear condensate formation and establish a general approach for further dissecting these relationships at scale.
Data availability
Images have been deposited to the BioImage Archive (ref. 77; accession no. S-BIAD1738, https://doi.org/10.6019/S-BIAD1738). Processed data for all protein sequences are available in Supplementary Data 1–3. FINCHES predictions for the CondenSeq large library sequences are available on Zenodo at https://doi.org/10.5281/zenodo.15098929 (ref. 78). Previously published databases that we used for sequence design or analysis are publicly available at MobiDB (https://mobidb.org/), LLPSDB (http://bio-comp.org.cn/llpsdb/home.html), DisProt (https://disprot.org/) and PhaSePro (https://phasepro.elte.hu/).
Code availability
Code for analyzing SBS data is available at https://github.com/kkappel1/OpticalPooledScreens2023 (a slightly modified version of previously published code; ref. 51). Additional image analysis code, including examples, is available at https://github.com/kkappel1/ops_analysis.
References
Banani, S. F., Lee, H. O., Hyman, A. A. & Rosen, M. K. Biomolecular condensates: organizers of cellular biochemistry. Nat. Rev. Mol. Cell Biol. 18, 285–298 (2017).
Shin, Y. & Brangwynne, C. P. Liquid phase condensation in cell physiology and disease. Science 357, eaaf4382 (2017).
Lyon, A. S., Peeples, W. B. & Rosen, M. K. A framework for understanding the functions of biomolecular condensates across scales. Nat. Rev. Mol. Cell Biol. 22, 215–235 (2021).
Nedelsky, N. B. & Taylor, J. P. Bridging biophysics and neurology: aberrant phase transitions in neurodegenerative disease. Nat. Rev. Neurol. 15, 272–286 (2019).
Alberti, S. & Hyman, A. A. Biomolecular condensates at the nexus of cellular stress, protein aggregation disease and ageing. Nat. Rev. Mol. Cell Biol. 22, 196–213 (2021).
Boija, A., Klein, I. A. & Young, R. A. Biomolecular condensates and cancer. Cancer Cell 39, 174–192 (2021).
McSwiggen, D. T., Mir, M., Darzacq, X. & Tjian, R. Evaluating phase separation in live cells: diagnosis, caveats, and functional consequences. Genes Dev. 33, 1619–1634 (2019).
Peng, A. & Weber, S. C. Evidence for and against liquid-liquid phase separation in the nucleus. Noncoding RNA 5, 50 (2019).
Sabari, B. R., Dall’Agnese, A. & Young, R. A. Biomolecular condensates in the nucleus. Trends Biochem. Sci. 45, 961–977 (2020).
Borcherds, W., Bremer, A., Borgia, M. B. & Mittag, T. How do intrinsically disordered protein regions encode a driving force for liquid-liquid phase separation? Curr. Opin. Struct. Biol. 67, 41–50 (2021).
Holehouse, A. S. & Kragelund, B. B. The molecular basis for cellular function of intrinsically disordered protein regions. Nat. Rev. Mol. Cell Biol. 25, 187–211 (2024).
Chen, J. & Kriwacki, R. W. Intrinsically disordered proteins: structure, function and therapeutics. J. Mol. Biol. 430, 2275–2277 (2018).
Schuster, B. S. et al. Biomolecular condensates: Sequence determinants of phase separation, microstructural organization, enzymatic activity, and material properties. J. Phys. Chem. B 125, 3441–3451 (2021).
Nott, T. J. et al. Phase transition of a disordered nuage protein generates environmentally responsive membraneless organelles. Mol. Cell 57, 936–947 (2015).
Martin, E. W. et al. Valence and patterning of aromatic residues determine the phase behavior of prion-like domains. Science 367, 694–699 (2020).
Wang, J. et al. A molecular grammar governing the driving forces for phase separation of prion-like RNA binding proteins. Cell 174, 688–699.e16 (2018).
Pak, C. W. et al. Sequence determinants of intracellular phase separation by complex coacervation of a disordered protein. Mol. Cell 63, 72–85 (2016).
Bremer, A. et al. Deciphering how naturally occurring sequence features impact the phase behaviours of disordered prion-like domains. Nat. Chem. 14, 196–207 (2022).
Schuster, B. S. et al. Identifying sequence perturbations to an intrinsically disordered protein that determine its phase-separation behavior. Proc. Natl Acad. Sci. USA 117, 11421–11431 (2020).
Greig, J. A. et al. Arginine-enriched mixed-charge domains provide cohesion for nuclear speckle condensation. Mol. Cell 77, 1237–1250.e4 (2020).
Quiroz, F. G. & Chilkoti, A. Sequence heuristics to encode phase behaviour in intrinsically disordered protein polymers. Nat. Mater. 14, 1164–1171 (2015).
Yang, Y., Jones, H. B., Dao, T. P. & Castañeda, C. A. Single amino acid substitutions in stickers, but not spacers, substantially alter UBQLN2 phase transitions and dense phase material properties. J. Phys. Chem. B 123, 3618–3629 (2019).
Tripathi, S. et al. Defining the condensate landscape of fusion oncoproteins. Nat. Commun. 14, 6008 (2023).
Rekhi, S. et al. Expanding the molecular language of protein liquid-liquid phase separation. Nat. Chem. 16, 1113–1124 (2024).
Patil, A. et al. A disordered region controls cBAF activity via condensation and partner recruitment. Cell 186, 4936–4955.e26 (2023).
Joseph, J. A. et al. Physics-driven coarse-grained model for biomolecular phase separation with near-quantitative accuracy. Nat. Comput. Sci. 1, 732–743 (2021).
Ruff, K. M., Pappu, R. V. & Holehouse, A. S. Conformational preferences and phase behavior of intrinsically disordered low complexity sequences: insights from multiscale simulations. Curr. Opin. Struct. Biol. 56, 1–10 (2019).
Harmon, T. S., Holehouse, A. S., Rosen, M. K. & Pappu, R. V. Intrinsically disordered linkers determine the interplay between phase separation and gelation in multivalent proteins. eLife 6, e30294 (2017).
Lin, Y.-H., Brady, J. P., Forman-Kay, J. D. & Chan, H. S. Charge pattern matching as a ‘fuzzy’ mode of molecular recognition for the functional phase separations of intrinsically disordered proteins. N. J. Phys. 19, 115003 (2017).
Zheng, W. et al. Hydropathy patterning complements charge patterning to describe conformational preferences of disordered proteins. J. Phys. Chem. Lett. 11, 3408–3415 (2020).
Das, R. K. & Pappu, R. V. Conformations of intrinsically disordered proteins are influenced by linear sequence distributions of oppositely charged residues. Proc. Natl Acad. Sci. USA 110, 13392–13397 (2013).
Weiner, B. G., Pyo, A. G., Meir, Y. & Wingreen, N. S. Motif-pattern dependence of biomolecular phase separation driven by specific interactions. PLoS Comput. Biol. 17, e1009748 (2021).
Statt, A., Casademunt, H., Brangwynne, C. P. & Panagiotopoulos, A. Z. Model for disordered proteins with strongly sequence-dependent liquid phase behavior. J. Chem. Phys. 152, 075101 (2020).
Choi, J.-M., Dar, F. & Pappu, R. V. LASSI: A lattice model for simulating phase transitions of multivalent proteins. PLoS Comput. Biol. 15, e1007028 (2019).
Krainer, G. et al. Reentrant liquid condensate phase of proteins is stabilized by hydrophobic and non-ionic interactions. Biophys. J. 120, 28a (2021).
Maharana, S. et al. RNA buffers the phase separation behavior of prion-like RNA binding proteins. Science 360, 918–921 (2018).
Alberti, S., Gladfelter, A. & Mittag, T. Considerations and challenges in studying liquid-liquid phase separation and biomolecular condensates. Cell 176, 419–434 (2019).
Datta, D. et al. Nucleo-cytoplasmic environment modulates spatiotemporal p53 phase separation. Sci. Adv. 10, eads0427 (2024).
Mitrea, D. M. et al. Methods for physical characterization of phase-separated bodies and membrane-less organelles. J. Mol. Biol. 430, 4773–4805 (2018).
Saar, K. L. et al. Protein Condensate Atlas from predictive models of heteromolecular condensate composition. Nat. Commun. 15, 5418 (2024).
Hadarovich, A. et al. PICNIC accurately predicts condensate-forming proteins regardless of their structural disorder across organisms. Nat. Commun. 15, 10668 (2024).
Kilgore, H. R. et al. Protein codes promote selective subcellular compartmentalization. Science 387, 1095–1101 (2025).
von Bülow, S., Tesei, G., Zaidi, F. K., Mittag, T. & Lindorff-Larsen, K. Prediction of phase-separation propensities of disordered proteins from sequence. Proc. Natl Acad. Sci. USA 122, e2417920122 (2025).
Ginell, G. M. et al. Sequence-based prediction of intermolecular interactions driven by disordered regions. Science 388, eadq8381 (2025).
Saar, K. L. et al. Theoretical and data-driven approaches for biomolecular condensates. Chem. Rev. 123, 8988–9009 (2023).
Chu, X. et al. Prediction of liquid-liquid phase separating proteins using machine learning. BMC Bioinform. 23, 72 (2022).
Cai, H., Vernon, R. M. & Forman-Kay, J. D. An interpretable machine-learning algorithm to predict disordered protein phase separation based on biophysical interactions. Biomolecules 12, 1131 (2022).
Erkamp, N. A., Qi, R., Welsh, T. J. & Knowles, T. P. J. Microfluidics for multiscale studies of biomolecular condensates. Lab Chip 23, 9–24 (2022).
Alberti, S. et al. A user’s guide for phase separation assays with purified proteins. J. Mol. Biol. 430, 4806–4820 (2018).
Chen, T., Lei, Q., Shi, M. & Li, T. High-throughput experimental methods for investigating biomolecular condensates. Quant. Biol. 9, 255–266 (2021).
Feldman, D. et al. Pooled genetic perturbation screens with image-based phenotypes. Nat. Protoc. 17, 476–512 (2022).
Feldman, D. et al. Optical pooled screens in human cells. Cell 179, 787–799.e17 (2019).
Irgen-Gioro, S., Yoshida, S., Walling, V. & Chong, S. Fixation can change the appearance of phase separation in living cells. eLife 11, e79903 (2022).
Schmidt, H. B., Barreau, A. & Rohatgi, R. Phase separation-deficient TDP43 remains functional in splicing. Nat. Commun. 10, 4890 (2019).
Altmeyer, M. et al. Liquid demixing of intrinsically disordered proteins is seeded by poly(ADP-ribose). Nat. Commun. 6, 8088 (2015).
Saito, M. et al. Acetylation of intrinsically disordered regions regulates phase separation. Nat. Chem. Biol. 15, 51–61 (2019).
Andrusiak, M. G. et al. Inhibition of axon regeneration by liquid-like TIAR-2 granules. Neuron 104, 290–304.e8 (2019).
Bracha, D. et al. Mapping local and global liquid phase behavior in living cells using photo-oligomerizable seeds. Cell 175, 1467–1480.e13 (2018).
Garcia-Jove Navarro, M. et al. RNA is a critical element for the sizing and the composition of phase-separated RNA–protein condensates. Nat. Commun. 10, 3230 (2019).
Rana, U. et al. Asymmetric oligomerization state and sequence patterning can tune multiphase condensate miscibility. Nat. Chem. 16, 1073–1082 (2024).
Crabtree, M. D. et al. Ion binding with charge inversion combined with screening modulates DEAD box helicase phase transitions. Cell Rep. 42, 113375 (2023).
Bah, A. & Forman-Kay, J. D. Modulation of intrinsically disordered protein function by post-translational modifications. J. Biol. Chem. 291, 6696–6705 (2016).
Hofweber, M. & Dormann, D. Friend or foe—post-translational modifications as regulators of phase separation and RNP granule dynamics. J. Biol. Chem. 294, 7137–7150 (2019).
Lin, Y., Currie, S. L. & Rosen, M. K. Intrinsically disordered sequences enable modulation of protein phase separation through distributed tyrosine motifs. J. Biol. Chem. 292, 19110–19120 (2017).
Dzuricky, M. et al. De novo engineering of intracellular condensates using artificial disordered proteins. Nat. Chem. 12, 814–825 (2020).
Maristany, M. J. et al. Decoding Phase Separation of Prion-Like Domains through Data-Driven Scaling Laws (eLife Sciences Publications, 2024).
Pesce, F. et al. Design of intrinsically disordered protein variants with diverse structural properties. Sci. Adv. 10, eadm9926 (2024).
Kobayashi, H., Cheveralls, K. C., Leonetti, M. D. & Royer, L. A. Self-supervised deep learning encodes high-resolution features of protein subcellular localization. Nat. Methods 19, 995–1003 (2022).
Martin, R. M. et al. Principles of protein targeting to the nucleolus. Nucleus 6, 314–325 (2015).
Klosin, A. et al. Phase separation provides a mechanism to reduce noise in cells. Science 367, 464–468 (2020).
Riback, J. A. et al. Composition-dependent thermodynamics of intracellular phase separation. Nature 581, 209–214 (2020).
Dörner, K. et al. Tag with caution: how protein tagging influences the formation of condensates. Preprint at bioRxiv https://doi.org/10.1101/2024.10.04.616694 (2024).
Ginell, G. M. & Holehouse, A. S. Analyzing the sequences of intrinsically disordered regions with CIDER and localCIDER. Methods Mol. Biol. 2141, 103–126 (2020).
Mészáros, B., Erdős, G. & Dosztányi, Z. IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic Acids Res. 46, W329–W337 (2018).
Cohan, M. C., Shinn, M. K., Lalmansingh, J. M. & Pappu, R. V. Uncovering non-random binary patterns within sequences of intrinsically disordered proteins. J. Mol. Biol. 434, 167373 (2022).
Lancaster, A. K., Nutter-Upham, A., Lindquist, S. & King, O. D. PLAAC: a web and command-line application to identify proteins with prion-like amino acid composition. Bioinformatics 30, 2501–2502 (2014).
Hartley, M. et al. The BioImage Archive – building a home for life-sciences microscopy data. J. Mol. Biol. 434, 167505 (2022).
Kappel, K. CondenSeq large sequence library: FINCHES predictions. Zenodo https://doi.org/10.5281/zenodo.15098929 (2025).
Acknowledgements
We thank D. Abbondanza for assistance with confocal imaging for pilot experiments; M. Alimova, R. Muraleedharan, P. Byrne and the entire CDoT High-Content Imaging Facility team for imaging assistance and for maintaining the Opera Phenix High-Content Screening System at the Broad Institute; A. Singh, R. Walton, O. Ursu, X. Chen, K.G.-Schuller, P. Thakore and T. Harvey for advice and discussions about initial experiments. We thank all members of the Zhang laboratory for helpful discussions and support. K.K. was supported by the Schmidt Science Fellows, in partnership with the Rhodes Trust and the HHMI Hanna H. Gray Fellows Program. A.R. was an HHMI Investigator when this study was initiated. Work was supported by the Klarman Cell Observatory (A.R.). D.S. was supported by fellowships from the Swiss National Science Foundation (P400PB_199261 and P2ELP3_187926). K.K.E. is supported by the Helen Hay Whitney Foundation Postdoctoral Fellowship. Work was supported by HHMI (F.Z.).
Author information
Contributions
K.K.: Conceptualization, Methodology, Software, Formal analysis, Investigation, Resources, Writing – Original Draft, Writing – Review & Editing, Visualization, Supervision, Funding acquisition. D.S.: Investigation, Scientific Discussion, Writing – Review & Editing. K.H.K.E.: Scientific Discussion, Writing – Review & Editing, Visualization. S.V.: Investigation, Writing – Review & Editing. C.V.: Scientific Discussion, Writing – Review & Editing. T.B.: Resources, Writing – Review & Editing. S.F.: Resources, Writing – Review & Editing. R.M.: Supervision, Writing – Review & Editing, Visualization. F.Z.: Conceptualization, Writing – Review & Editing, Supervision, Funding acquisition. A.R.: Conceptualization, Writing – Review & Editing, Supervision, Funding acquisition.
Ethics declarations
Competing interests
A.R. is a founder and equity holder of Celsius Therapeutics, an equity holder in Immunitas Therapeutics and until 31 August 2020 was a scientific advisory board member of Syros Pharmaceuticals, Neogene Therapeutics, Asimov and Thermo Fisher Scientific. Since 1 August 2020, A.R. has been an employee of Genentech, a member of the Roche Group, with equity in Roche. F.Z. is a scientific advisor and cofounder of Beam Therapeutics, Pairwise Plants, Arbor Biotechnologies, Aera Therapeutics and Moonwalk Biosciences. F.Z. is also a scientific advisor for Octant. All other authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks Pilong Li and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Lei Tang, in collaboration with the Nature Methods team. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Live cell timelapse experiments.
(A) Schematic overview of the live cell timelapse experiments. (B) Example traces from live cell timelapse imaging (barcode: TCGGG). Left: The presence of condensates in three example cells as a function of total protein concentration. The dashed black line denotes the average threshold concentration (Cthresh), the concentration at which condensates are first observed, for these cells. Right: Images of the same three cells (nuclei, masked) over time. Dashed pink boxes denote the first frame in which condensates appear. Two independent biological replicates of this experiment were performed with the same results. (C) Reproducibility between replicates for live cell timelapse experiments. For the fraction of cells with condensates (fcondensates; left), each point represents fcondensates for one protein sequence within a defined concentration bin. For the fraction of GFP in condensates, each point represents the mean value of the fraction of the GFP signal in condensates over all cells that express a particular protein sequence within a defined concentration bin. (D) Reproducibility of threshold concentrations determined from independent replicates of live cell timelapse experiments. Each point represents the threshold concentration for a single protein sequence. (E) The fraction of the total GFP signal found in condensates for the pooled versus arrayed experiments. Each point represents the mean value of the fraction of the GFP signal in condensates over all cells that express a particular protein sequence within the medium concentration bin. Pearson’s correlation is noted on the plot. (F) Representative images of nuclei (masked) from arrayed and pooled experiments (all sequences were fused to the 24-mer oligomerization domain and GFP). Barcodes are indicated on the left of each pair of images. Scale bar = 5 µm. Two independent biological replicates of this experiment were performed with the same results.
(G) Cthresh for protein sequences corresponding to previously studied constructs. Each red point denotes Cthresh for a single cell. Each gray point represents the maximum concentration (Cmax) observed over the full timelapse for cells that do not form condensates. Black lines in the violin plots show the medians of Cthresh for each protein sequence. The expected change in Cthresh relative to the corresponding wild type (WT) fragment, based on previous studies, is indicated with an arrow pointing up for increased Cthresh or down for decreased Cthresh. Solid horizontal lines are shown for WT sequences. * denotes statistically significant difference compared to the corresponding wild type threshold intensity (two-sided t-test, p values adjusted for multiple comparisons by applying the Bonferroni correction). The dashed black line denotes 0.06 µM protein concentration, the lowest protein concentration that we could reliably distinguish from background. p values: hnRNPA1 add many Y = 2×10^−32; hnRNPA1 add fewer Y = 2×10^−51; hnRNPA1 add 1 Y = 0.19; DDX4 all R to A = 4×10^−124; DDX4 all F to A = 3×10^−10; DDX4 1 F to A = 0.43; TDP-43 hydrophobic & aromatic to S = 7×10^−87; TDP-43 aromatic to S = 8×10^−58; TDP-43 all hydrophobic & aromatic to S = 7×10^−83; EWS all Q to R = 2×10^−86; EWS 3 polar to R = 2×10^−119; EWS 1 polar to R = 2×10^−16; DDX3 all K to Q = 2×10^−13; DDX3 3 K to Q = 0.26; DDX3 1 K to Q = 0.73; TIAR-2 all Y to G = 2×10^−41; TIAR-2 all S to A = 1.0; TIAR-2 1 Y to G = 0.0002.
hnRNPA1 WT fragment, n = 102 cells; hnRNPA1 add many Y, n = 116 cells; hnRNPA1 add fewer Y, n = 204 cells; hnRNPA1 add 1 Y, n = 100 cells; DDX4 WT fragment, n = 203 cells; DDX4 all R to A, n = 307 cells; DDX4 all F to A, n = 194 cells; DDX4 1 F to A, n = 149 cells; TDP-43 WT fragment, n = 234 cells; TDP-43 hydrophobic & aromatic to S, n = 181 cells; TDP-43 aromatic to S, n = 219 cells; TDP-43 all hydrophobic & aromatic to S, n = 306 cells; EWS WT fragment, n = 224 cells; EWS all Q to R, n = 146 cells; EWS 3 polar to R, n = 275 cells; EWS 1 polar to R, n = 82 cells; DDX3 WT fragment, n = 206 cells; DDX3 all K to Q, n = 134 cells; DDX3 3 K to Q, n = 186 cells; DDX3 1 K to Q, n = 140 cells; TIAR-2 WT fragment, n = 154 cells; TIAR-2 all Y to G, n = 297 cells; TIAR-2 all S to A, n = 88 cells; TIAR-2 1 Y to G, n = 258 cells.
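As an illustration of the Cthresh definition in panels (B) and (G), the following minimal Python sketch takes a per-cell timelapse trace and returns the concentration in the first frame where condensates are observed. The function name, the trace layout and the example numbers are hypothetical and are not taken from the authors' analysis code.

```python
from statistics import mean
from typing import Optional, Sequence


def threshold_concentration(
    concentrations: Sequence[float],
    has_condensates: Sequence[bool],
) -> Optional[float]:
    """Return the concentration in the first frame that shows condensates.

    Returns None when no frame shows condensates; such cells are
    summarized by their maximum concentration (Cmax) instead.
    """
    for conc, present in zip(concentrations, has_condensates):
        if present:
            return conc
    return None


# Hypothetical traces for three cells expressing the same sequence
# (protein concentration in µM per frame, condensate flag per frame).
cells = [
    ([0.10, 0.20, 0.35, 0.50], [False, False, True, True]),
    ([0.08, 0.15, 0.25, 0.40], [False, False, False, True]),
    ([0.05, 0.09, 0.12, 0.14], [False, False, False, False]),
]

thresholds = [threshold_concentration(c, flags) for c, flags in cells]

# Average Cthresh over cells that formed condensates.
mean_cthresh = mean(t for t in thresholds if t is not None)

# Cmax for cells that never formed condensates.
cmax_non_forming = [max(c) for (c, _), t in zip(cells, thresholds) if t is None]
```

Cells that never form condensates are kept separate here, mirroring the gray Cmax points in panel (G), rather than being assigned a threshold.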
Extended Data Fig. 2 Details of protein sequence libraries.
(A) Composition of the large protein sequence library. (B) Principal component analysis of the amino acid composition and dipeptide composition of all sequences in the large protein sequence library, as well as all sequences in the human proteome, and all disordered regions in the human proteome.
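The composition features that feed the principal component analysis in panel (B) can be sketched as a 420-dimensional vector per sequence (20 amino acid fractions plus 400 dipeptide fractions). This is a minimal sketch under assumptions: the normalization, the handling of non-standard residues and the function name are illustrative, and the PCA step itself is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def composition_features(sequence: str) -> list[float]:
    """Return 20 amino acid fractions followed by 400 dipeptide fractions."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    # Fraction of each of the 20 standard amino acids.
    aa_fracs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

    # Fraction of each overlapping dipeptide (assumed normalization:
    # divide by the number of adjacent residue pairs).
    n_pairs = max(len(seq) - 1, 1)
    pair_counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in pair_counts:  # skip pairs with non-standard residues
            pair_counts[pair] += 1
    dipep_fracs = [pair_counts[d] / n_pairs for d in DIPEPTIDES]

    return aa_fracs + dipep_fracs


# Hypothetical short sequence, used only to show the output layout.
features = composition_features("GRGGYGRG")
```

Stacking these vectors for all library, proteome and disordered-region sequences would give the matrix on which a PCA could then be run.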
Extended Data Fig. 3 Assessing the impact of valence on condensate formation.
(A) Schematic of the experiment to test the effect of protein valence on condensate formation. The small sequence library is fused to GFP and four different oligomerization domains resulting in valence 1, 4, 6, or 24, then cells are imaged and barcodes are read out. (B) Fraction of cells that contain condensates for the small sequence library fused to GFP and each of the four different oligomerization domains. Each point represents one protein sequence. Black lines show the means. The increases in fcondensates as valence is increased are all statistically significant (valence = 1 vs 4: p = 0.002; 4 vs 6: p = 2×10^−6; 6 vs 24: p = 6×10^−10, two-sided paired t-test, after Bonferroni correction, medium test protein concentration bin). (C) Example images of cells (masked nuclei) expressing protein sequences (rows) fused to GFP and each oligomerization domain (columns). These example images are representative of the following numbers of cells for which we collected data in our defined concentration bins: 1933 (AAGCG, valence=1), 1629 (AAGCG, valence=4), 1040 (AAGCG, valence=6), 760 (AAGCG, valence=24), 1360 (TCGCC, valence=1), 1730 (TCGCC, valence=4), 1492 (TCGCC, valence=6), 1348 (TCGCC, valence=24), 2775 (AACCT, valence=1), 3722 (AACCT, valence=4), 3445 (AACCT, valence=6), 2238 (AACCT, valence=24), 629 (AAAGA, valence=1), 761 (AAAGA, valence=4), 736 (AAAGA, valence=6), 487 (AAAGA, valence=24). Scale bars denote 5 µm.
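The Bonferroni correction applied to the pairwise valence comparisons in panel (B) multiplies each raw p value by the number of comparisons and caps the result at 1. The sketch below shows only that adjustment; the raw p values are hypothetical placeholders, not values from this study, and the paired t-test itself is not reimplemented.

```python
def bonferroni(p_values: list[float]) -> list[float]:
    """Adjust p values for multiple comparisons (Bonferroni), capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p values for three pairwise comparisons.
comparisons = ["1 vs 4", "4 vs 6", "6 vs 24"]
raw_p = [0.01, 0.04, 0.5]

adjusted = dict(zip(comparisons, bonferroni(raw_p)))
significant = [name for name, p in adjusted.items() if p < 0.05]
```

With three comparisons, only the first hypothetical comparison stays below 0.05 after adjustment.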
Extended Data Fig. 4 Systematic assessment of the effects of amino acid patterning on condensate formation.
All data presented in this figure are for sequences fused to GFP in the medium concentration bin. (A) Schematic of patterning parameters. Each string of circles represents a protein sequence, with each circle representing a single amino acid. Negative z-scores for patterning parameters indicate well-mixed amino acids of the specified type, while more positive z-scores indicate higher segregation of the specified amino acids. z-scores are computed with NARDINI1. (B) Each violin shows fcondensates for all scrambled versions of the specified base sequence. Each black dot represents a single scrambled sequence. Red bars denote the values for the base sequences. Black bars denote the means of the scrambled sequences. Violins are ordered by the difference between the means of the base and scrambled sequences (low to high). * denotes base sequence values (red bars) that are statistically unlikely, given the distribution of fcondensates values for all of the scrambled variants of that base sequence (black dots) (smoothed empirical CDF test, see Supplementary Note 1 for detailed description of this test; p values: NUP100 = 0.0003, DYRK1A = 0.006, RBM14 = 0.003, SYN1 = 0.004, NAB3 = 5×10^−13). (C) The change in fcondensates for patterning mutants versus unpatterned sequences that do not form condensates (patterning score near 0, Supplementary Note 1). Each violin contains sequences that test the effect of a different patterning parameter. “>” and “<” indicate mutants that increase or decrease the designated patterning parameter. The mutants shown here have a substantial change only in the designated patterning parameter; for example, for δ+- mutants, there is little change in other patterning parameters. The colors and sizes of the dots indicate the change in the patterning value of the mutant sequence relative to the unpatterned sequence.
Asterisks denote groups with a significant change in the fraction of cells with condensates and red lines show their mean values (two-sided Wilcoxon signed-rank test; >δ+- p value = 0.04). Gray lines denote the mean values for other groups. p values are adjusted for multiple comparisons by applying the Bonferroni correction. The dashed black line is shown as a reference point marking a change of 0, that is no difference between mutant and base sequences. (D) Correlation between sequence features and fcondensates for all large library sequences that contain at least 5% positively charged, negatively charged, aromatic, hydrophobic, and polar amino acids (so that all patterning parameters can be computed for all sequences). The colors of the bars represent the Pearson correlation (r value); bars are only shown if two-sided p values are less than 0.05. p values are adjusted for multiple comparisons by applying the Bonferroni correction. Black outlines denote bars for patterning parameters.
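To make the logic of panel (B) concrete, the sketch below asks how unusual a base sequence's fcondensates value is relative to its scrambled variants. It is a plain empirical two-sided test with add-one smoothing, used here as a stand-in: it is not the smoothed empirical CDF test described in Supplementary Note 1, and the function name and example values are hypothetical.

```python
def empirical_two_sided_p(base_value: float, scrambled_values: list[float]) -> float:
    """Empirical two-sided p value of a base value against scrambled variants.

    Assumption: add-one smoothing so the p value is never exactly zero.
    This is NOT the smoothed empirical CDF test used in the paper.
    """
    n = len(scrambled_values)
    if n == 0:
        raise ValueError("need at least one scrambled value")
    n_low = sum(v <= base_value for v in scrambled_values)
    n_high = sum(v >= base_value for v in scrambled_values)
    p_low = (n_low + 1) / (n + 1)
    p_high = (n_high + 1) / (n + 1)
    return min(1.0, 2 * min(p_low, p_high))


# Hypothetical fcondensates values for nine scrambled variants.
scrambled = [0.10, 0.12, 0.15, 0.18, 0.20, 0.22, 0.25, 0.30, 0.33]

p_extreme = empirical_two_sided_p(0.90, scrambled)  # base far above all scrambles
p_typical = empirical_two_sided_p(0.20, scrambled)  # base at the median
```

With only nine scrambled variants the smallest attainable p value is 0.2, which is one reason a smoothed estimate of the distribution can be preferable when very small p values need to be resolved.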
Extended Data Fig. 5 Assessing the impacts of different types of mutations across many sequence contexts.
(A) The change in the propensities for mutant sequences versus base sequences to form condensates. The values in the heatmap are the mean values of fcondensates for all mutations of the specified type minus fcondensates for the base sequence. All values plotted are for the GFP fusions in the medium concentration bin. There may not be data for a given mutation type (box colored yellow) for one of two reasons: (1) it was not possible to make the mutation type for that sequence (for example, it is not possible to make a –R mutant if the base sequence does not contain any R residues); or (2) the sequence was not expressed within the GFP fusion medium concentration bin. The top two rows show consistency scores over the base sequences for which fcondensates is less than 0.5 or greater than 0.5, respectively. The consistency score indicates the fraction of base sequences over which the sequence feature has the most common effect (1.0 indicates that the sequence feature has the given effect across 100% of the base sequences) (Supplementary Note 1). The dot size indicates the number of base sequences for which there is data for the given sequence feature. The two rows below the consistency scores show the mean Δ fcondensates values for the base sequences for which fcondensates is less than 0.5 or greater than 0.5, respectively.
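The consistency score described above (the fraction of base sequences over which a mutation type has its most common effect) can be sketched as follows. How an "effect" is classified is defined in Supplementary Note 1; the sign-based classification and the tolerance parameter `tol` used here are assumptions for illustration, as are the example values.

```python
from collections import Counter
from typing import Optional


def effect_sign(delta: float, tol: float = 0.0) -> str:
    """Classify a change in fcondensates (assumed sign-based rule)."""
    if delta > tol:
        return "increase"
    if delta < -tol:
        return "decrease"
    return "none"


def consistency_score(
    deltas: list[Optional[float]], tol: float = 0.0
) -> tuple[str, float]:
    """Return the most common effect and the fraction of base sequences showing it.

    Entries of None mark base sequences with no data for this mutation type.
    """
    effects = [effect_sign(d, tol) for d in deltas if d is not None]
    if not effects:
        raise ValueError("no base sequences with data")
    effect, count = Counter(effects).most_common(1)[0]
    return effect, count / len(effects)


# Hypothetical mean changes in fcondensates for one mutation type across five
# base sequences; None marks a base sequence without data.
deltas = [0.20, 0.15, -0.05, 0.30, None]
effect, score = consistency_score(deltas)
```

A score of 1.0 would mean that the mutation type shifts fcondensates in the same direction for every base sequence with data.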
Extended Data Fig. 6 Features of sequences in clusters with distinct intermolecular chemical specificities.
(A) Mean features of the sequences in each cluster (Fig. 5b) with expression in the medium concentration bin (GFP fusions). Homotypic ε is the FINCHES interaction parameter for the interaction of the test protein sequence with itself. The fraction heterotypic ε < homotypic ε is the fraction of the FINCHES interaction parameters for a test protein sequence with all human IDRs that are less than the homotypic ε value for that test protein sequence. The number of favorable interactions means the number of human IDRs with which a test protein has an attractive FINCHES interaction parameter (less than −3). The minimum and maximum values for the color scale are as follows: 0 to 0.27 for the fraction of each individual amino acid (for example, fraction A); 0 to 0.4 for the fraction of each group of amino acids (for example, fraction ILMV); 0 to 1.0 for the relative fractions of amino acids or amino acid groups (for example, fraction R/RK or fraction FWY//FWYILV); −1.31 to 1.31 for the patterning features (for example, δ+-); −0.21 to 0.21 for NCPR; 1 to 6 for mean hydropathy; −12 to 12 for homotypic ε; 0 to 1 for the fraction heterotypic ε < homotypic ε; 0 to 3000 for the number of favorable interactions; 0 to 1 for fcondensates (medium concentration bin, GFP fusion). (B) Standard deviations of the sequence features for each cluster. Clusters are the same as those shown in (A). 
The minimum and maximum values for the color scale are as follows: 0 to 0.15 for the fraction of each individual amino acid (for example, fraction A); 0 to 0.15 for the fraction of each group of amino acids (for example, fraction ILMV); 0 to 0.44 for the relative fractions of amino acids or amino acid groups (for example, fraction R/RK or fraction FWY//FWYILV); 0 to 2 for the patterning features (for example, δ+-); 0 to 0.1 for NCPR; 0 to 1 for mean hydropathy; 0 to 5 for homotypic ε; 0 to 0.3 for the fraction heterotypic ε < homotypic ε; 0 to 900 for the number of favorable interactions; 0 to 0.4 for fcondensates (medium concentration bin, GFP fusion).
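Two of the FINCHES-derived features defined in panel (A) are simple summaries over a list of interaction parameters: the fraction of heterotypic ε values that fall below the homotypic ε, and the number of human IDRs with an attractive interaction parameter (less than −3). The sketch below computes both from precomputed values; it does not call FINCHES, and the function name and example numbers are hypothetical.

```python
def interaction_summary(
    homotypic_eps: float,
    heterotypic_eps: list[float],
    favorable_cutoff: float = -3.0,
) -> tuple[float, int]:
    """Summarize precomputed interaction parameters for one test sequence.

    Returns (fraction of heterotypic eps < homotypic eps,
             number of partners with eps < favorable_cutoff).
    """
    if not heterotypic_eps:
        raise ValueError("need at least one heterotypic value")
    frac_below = sum(e < homotypic_eps for e in heterotypic_eps) / len(heterotypic_eps)
    n_favorable = sum(e < favorable_cutoff for e in heterotypic_eps)
    return frac_below, n_favorable


# Hypothetical values: one test sequence against five human IDRs.
frac_below, n_favorable = interaction_summary(
    homotypic_eps=-4.0,
    heterotypic_eps=[-6.0, -3.5, -1.0, 2.0, -5.0],
)
```

A high fraction indicates that many partners are predicted to interact with the test sequence more favorably than it interacts with itself.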
Supplementary information
Supplementary Information
Supplementary Results, Supplementary Discussion, Supplementary Notes, Supplementary Figs. 1–15 and Supplementary Tables 1, 2, 4 and 6.
Supplementary Tables
Supplementary Table 3: P values and n values for Fig. 3. Supplementary Table 5: Colocalization of GFP and SNAP-tag sequences with endogenous nuclear condensates. Supplementary Table 7: Effects of mutating positively charged residues on nucleolar and chromatin localization. Localization is classified as nucleolar, chromatin, other (the sequence forms condensates, but does not localize to the nucleolus or chromatin) or none (the sequence does not form condensates). All data are shown for GFP fusions in the medium concentration bin. Supplementary Table 8: Primer sequences and DNA sequences encoding constructs for arrayed experiments. Supplementary Table 9: P values and n values for Supplementary Fig. 12.
Supplementary Data 1
Data for small sequence library.
Supplementary Data 2
Long sequence library information and data.
Supplementary Data 3
Data for large sequence library.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kappel, K., Strebinger, D., Edmonds, K.K. et al. Characterizing protein sequence determinants of nuclear condensates by high-throughput pooled imaging with CondenSeq. Nat Methods (2025). https://doi.org/10.1038/s41592-025-02726-y
DOI: https://doi.org/10.1038/s41592-025-02726-y