Characterizing protein sequence determinants of nuclear condensates by high-throughput pooled imaging with CondenSeq

Kappel, Kalli; Strebinger, Daniel; Edmonds, KeHuan K.; Chau-Duy-Tam VO, Samuel; Vockley, Christopher M.; Biswas, Tridib; Farhi, Samouil L.; Macrae, Rhiannon; Zhang, Feng; Regev, Aviv

doi:10.1038/s41592-025-02726-y

Article
Published: 16 June 2025

Nature Methods (2025)

Abstract

Biomolecular condensates organize numerous subcellular processes and have been implicated in diseases, including neurodegeneration and cancer. Protein sequences intrinsically encode their propensity to form condensates, but specific sequence features that regulate this behavior have not been systematically explored at scale. Here, we develop CondenSeq, a high-throughput pooled imaging with in situ sequencing approach to measure propensities of thousands of protein sequences to form nuclear condensates. Leveraging the large scale of these experiments, we evaluated the impacts of dozens of sequence features across a wide range of sequence contexts, identifying several features with highly consistent, context-independent effects and others with less-consistent effects. We also identified multiple classes of condensates and discovered distinct sequence properties that drive their formation. Our results provide a systematic overview of the relationships between protein sequences and nuclear condensate formation and establish a general approach for further dissecting these relationships at scale.

**Fig. 1: CondenSeq: pooled image-based characterization of condensates.**

**Fig. 2: Characterization of a diverse library of protein sequences.**

**Fig. 3: Large-scale mutagenesis to assess effects of amino acid composition on condensate formation.**

**Fig. 4: Image-based classification of different types of condensates.**

**Fig. 5: Classifying sequences based on predicted intermolecular chemical specificity.**

Data availability

Images have been deposited to the BioImage Archive⁷⁷ (accession no. S-BIAD1738, https://doi.org/10.6019/S-BIAD1738). Processed data for all protein sequences are available in Supplementary Data 1–3. FINCHES predictions for the CondenSeq large library sequences are available on Zenodo at https://doi.org/10.5281/zenodo.15098929 (ref. ⁷⁸). Previously published databases that we used for sequence design or analysis are publicly available at MobiDB (https://mobidb.org/), LLPSDB (http://bio-comp.org.cn/llpsdb/home.html), DisProt (https://disprot.org/) and PhasePro (https://phasepro.elte.hu/).

Code availability

Code for analyzing SBS data is available at https://github.com/kkappel1/OpticalPooledScreens2023 (this is a slightly modified version of previously published code⁵¹). Additional image analysis code, including examples, is available at https://github.com/kkappel1/ops_analysis.

References

Banani, S. F., Lee, H. O., Hyman, A. A. & Rosen, M. K. Biomolecular condensates: organizers of cellular biochemistry. Nat. Rev. Mol. Cell Biol. 18, 285–298 (2017).
Shin, Y. & Brangwynne, C. P. Liquid phase condensation in cell physiology and disease. Science 357, eaaf4382 (2017).
Lyon, A. S., Peeples, W. B. & Rosen, M. K. A framework for understanding the functions of biomolecular condensates across scales. Nat. Rev. Mol. Cell Biol. 22, 215–235 (2021).
Nedelsky, N. B. & Taylor, J. P. Bridging biophysics and neurology: aberrant phase transitions in neurodegenerative disease. Nat. Rev. Neurol. 15, 272–286 (2019).
Alberti, S. & Hyman, A. A. Biomolecular condensates at the nexus of cellular stress, protein aggregation disease and ageing. Nat. Rev. Mol. Cell Biol. 22, 196–213 (2021).
Boija, A., Klein, I. A. & Young, R. A. Biomolecular condensates and cancer. Cancer Cell 39, 174–192 (2021).
McSwiggen, D. T., Mir, M., Darzacq, X. & Tjian, R. Evaluating phase separation in live cells: diagnosis, caveats, and functional consequences. Genes Dev. 33, 1619–1634 (2019).
Peng, A. & Weber, S. C. Evidence for and against liquid-liquid phase separation in the nucleus. Noncoding RNA 5, 50 (2019).
Sabari, B. R., Dall’Agnese, A. & Young, R. A. Biomolecular condensates in the nucleus. Trends Biochem. Sci. 45, 961–977 (2020).
Borcherds, W., Bremer, A., Borgia, M. B. & Mittag, T. How do intrinsically disordered protein regions encode a driving force for liquid-liquid phase separation? Curr. Opin. Struct. Biol. 67, 41–50 (2021).
Holehouse, A. S. & Kragelund, B. B. The molecular basis for cellular function of intrinsically disordered protein regions. Nat. Rev. Mol. Cell Biol. 25, 187–211 (2024).
Chen, J. & Kriwacki, R. W. Intrinsically disordered proteins: structure, function and therapeutics. J. Mol. Biol. 430, 2275–2277 (2018).
Schuster, B. S. et al. Biomolecular condensates: Sequence determinants of phase separation, microstructural organization, enzymatic activity, and material properties. J. Phys. Chem. B 125, 3441–3451 (2021).
Nott, T. J. et al. Phase transition of a disordered nuage protein generates environmentally responsive membraneless organelles. Mol. Cell 57, 936–947 (2015).
Martin, E. W. et al. Valence and patterning of aromatic residues determine the phase behavior of prion-like domains. Science 367, 694–699 (2020).
Wang, J. et al. A molecular grammar governing the driving forces for phase separation of prion-like RNA binding proteins. Cell 174, 688–699.e16 (2018).
Pak, C. W. et al. Sequence determinants of intracellular phase separation by complex coacervation of a disordered protein. Mol. Cell 63, 72–85 (2016).
Bremer, A. et al. Deciphering how naturally occurring sequence features impact the phase behaviours of disordered prion-like domains. Nat. Chem. 14, 196–207 (2022).
Schuster, B. S. et al. Identifying sequence perturbations to an intrinsically disordered protein that determine its phase-separation behavior. Proc. Natl Acad. Sci. USA 117, 11421–11431 (2020).
Greig, J. A. et al. Arginine-enriched mixed-charge domains provide cohesion for nuclear speckle condensation. Mol. Cell 77, 1237–1250.e4 (2020).
Quiroz, F. G. & Chilkoti, A. Sequence heuristics to encode phase behaviour in intrinsically disordered protein polymers. Nat. Mater. 14, 1164–1171 (2015).
Yang, Y., Jones, H. B., Dao, T. P. & Castañeda, C. A. Single amino acid substitutions in stickers, but not spacers, substantially alter UBQLN2 phase transitions and dense phase material properties. J. Phys. Chem. B 123, 3618–3629 (2019).
Tripathi, S. et al. Defining the condensate landscape of fusion oncoproteins. Nat. Commun. 14, 6008 (2023).
Rekhi, S. et al. Expanding the molecular language of protein liquid-liquid phase separation. Nat. Chem. 16, 1113–1124 (2024).
Patil, A. et al. A disordered region controls cBAF activity via condensation and partner recruitment. Cell 186, 4936–4955.e26 (2023).
Joseph, J. A. et al. Physics-driven coarse-grained model for biomolecular phase separation with near-quantitative accuracy. Nat. Comput. Sci. 1, 732–743 (2021).
Ruff, K. M., Pappu, R. V. & Holehouse, A. S. Conformational preferences and phase behavior of intrinsically disordered low complexity sequences: insights from multiscale simulations. Curr. Opin. Struct. Biol. 56, 1–10 (2019).
Harmon, T. S., Holehouse, A. S., Rosen, M. K. & Pappu, R. V. Intrinsically disordered linkers determine the interplay between phase separation and gelation in multivalent proteins. eLife 6, e30294 (2017).
Lin, Y.-H., Brady, J. P., Forman-Kay, J. D. & Chan, H. S. Charge pattern matching as a ‘fuzzy’ mode of molecular recognition for the functional phase separations of intrinsically disordered proteins. New J. Phys. 19, 115003 (2017).
Zheng, W. et al. Hydropathy patterning complements charge patterning to describe conformational preferences of disordered proteins. J. Phys. Chem. Lett. 11, 3408–3415 (2020).
Das, R. K. & Pappu, R. V. Conformations of intrinsically disordered proteins are influenced by linear sequence distributions of oppositely charged residues. Proc. Natl Acad. Sci. USA 110, 13392–13397 (2013).
Weiner, B. G., Pyo, A. G., Meir, Y. & Wingreen, N. S. Motif-pattern dependence of biomolecular phase separation driven by specific interactions. PLoS Comput. Biol. 17, e1009748 (2021).
Statt, A., Casademunt, H., Brangwynne, C. P. & Panagiotopoulos, A. Z. Model for disordered proteins with strongly sequence-dependent liquid phase behavior. J. Chem. Phys. 152, 075101 (2020).
Choi, J.-M., Dar, F. & Pappu, R. V. LASSI: A lattice model for simulating phase transitions of multivalent proteins. PLoS Comput. Biol. 15, e1007028 (2019).
Krainer, G. et al. Reentrant liquid condensate phase of proteins is stabilized by hydrophobic and non-ionic interactions. Biophys. J. 120, 28a (2021).
Maharana, S. et al. RNA buffers the phase separation behavior of prion-like RNA binding proteins. Science 360, 918–921 (2018).
Alberti, S., Gladfelter, A. & Mittag, T. Considerations and challenges in studying liquid-liquid phase separation and biomolecular condensates. Cell 176, 419–434 (2019).
Datta, D. et al. Nucleo-cytoplasmic environment modulates spatiotemporal p53 phase separation. Sci. Adv. 10, eads0427 (2024).
Mitrea, D. M. et al. Methods for physical characterization of phase-separated bodies and membrane-less organelles. J. Mol. Biol. 430, 4773–4805 (2018).
Saar, K. L. et al. Protein Condensate Atlas from predictive models of heteromolecular condensate composition. Nat. Commun. 15, 5418 (2024).
Hadarovich, A. et al. PICNIC accurately predicts condensate-forming proteins regardless of their structural disorder across organisms. Nat. Commun. 15, 10668 (2024).
Kilgore, H. R. et al. Protein codes promote selective subcellular compartmentalization. Science 387, 1095–1101 (2025).
von Bülow, S., Tesei, G., Zaidi, F. K., Mittag, T. & Lindorff-Larsen, K. Prediction of phase-separation propensities of disordered proteins from sequence. Proc. Natl Acad. Sci. USA 122, e2417920122 (2025).
Ginell, G. M. et al. Sequence-based prediction of intermolecular interactions driven by disordered regions. Science 388, eadq8381 (2025).
Saar, K. L. et al. Theoretical and data-driven approaches for biomolecular condensates. Chem. Rev. 123, 8988–9009 (2023).
Chu, X. et al. Prediction of liquid-liquid phase separating proteins using machine learning. BMC Bioinform. 23, 72 (2022).
Cai, H., Vernon, R. M. & Forman-Kay, J. D. An interpretable machine-learning algorithm to predict disordered protein phase separation based on biophysical interactions. Biomolecules 12, 1131 (2022).
Erkamp, N. A., Qi, R., Welsh, T. J. & Knowles, T. P. J. Microfluidics for multiscale studies of biomolecular condensates. Lab Chip 23, 9–24 (2022).
Alberti, S. et al. A user’s guide for phase separation assays with purified proteins. J. Mol. Biol. 430, 4806–4820 (2018).
Chen, T., Lei, Q., Shi, M. & Li, T. High-throughput experimental methods for investigating biomolecular condensates. Quant. Biol. 9, 255–266 (2021).
Feldman, D. et al. Pooled genetic perturbation screens with image-based phenotypes. Nat. Protoc. 17, 476–512 (2022).
Feldman, D. et al. Optical pooled screens in human cells. Cell 179, 787–799.e17 (2019).
Irgen-Gioro, S., Yoshida, S., Walling, V. & Chong, S. Fixation can change the appearance of phase separation in living cells. eLife 11, e79903 (2022).
Schmidt, H. B., Barreau, A. & Rohatgi, R. Phase separation-deficient TDP43 remains functional in splicing. Nat. Commun. 10, 4890 (2019).
Altmeyer, M. et al. Liquid demixing of intrinsically disordered proteins is seeded by poly(ADP-ribose). Nat. Commun. 6, 8088 (2015).
Saito, M. et al. Acetylation of intrinsically disordered regions regulates phase separation. Nat. Chem. Biol. 15, 51–61 (2019).
Andrusiak, M. G. et al. Inhibition of axon regeneration by liquid-like TIAR-2 granules. Neuron 104, 290–304.e8 (2019).
Bracha, D. et al. Mapping local and global liquid phase behavior in living cells using photo-oligomerizable seeds. Cell 175, 1467–1480.e13 (2018).
Garcia-Jove Navarro, M. et al. RNA is a critical element for the sizing and the composition of phase-separated RNA–protein condensates. Nat. Commun. 10, 3230 (2019).
Rana, U. et al. Asymmetric oligomerization state and sequence patterning can tune multiphase condensate miscibility. Nat. Chem. 16, 1073–1082 (2024).
Crabtree, M. D. et al. Ion binding with charge inversion combined with screening modulates DEAD box helicase phase transitions. Cell Rep. 42, 113375 (2023).
Bah, A. & Forman-Kay, J. D. Modulation of intrinsically disordered protein function by post-translational modifications. J. Biol. Chem. 291, 6696–6705 (2016).
Hofweber, M. & Dormann, D. Friend or foe—post-translational modifications as regulators of phase separation and RNP granule dynamics. J. Biol. Chem. 294, 7137–7150 (2019).
Lin, Y., Currie, S. L. & Rosen, M. K. Intrinsically disordered sequences enable modulation of protein phase separation through distributed tyrosine motifs. J. Biol. Chem. 292, 19110–19120 (2017).
Dzuricky, M. et al. De novo engineering of intracellular condensates using artificial disordered proteins. Nat. Chem. 12, 814–825 (2020).
Maristany, M. J. et al. Decoding Phase Separation of Prion-Like Domains through Data-Driven Scaling Laws (eLife Sciences Publications, 2024).
Pesce, F. et al. Design of intrinsically disordered protein variants with diverse structural properties. Sci. Adv. 10, eadm9926 (2024).
Kobayashi, H., Cheveralls, K. C., Leonetti, M. D. & Royer, L. A. Self-supervised deep learning encodes high-resolution features of protein subcellular localization. Nat. Methods 19, 995–1003 (2022).
Martin, R. M. et al. Principles of protein targeting to the nucleolus. Nucleus 6, 314–325 (2015).
Klosin, A. et al. Phase separation provides a mechanism to reduce noise in cells. Science 367, 464–468 (2020).
Riback, J. A. et al. Composition-dependent thermodynamics of intracellular phase separation. Nature 581, 209–214 (2020).
Dörner, K. et al. Tag with caution: how protein tagging influences the formation of condensates. Preprint at bioRxiv https://doi.org/10.1101/2024.10.04.616694 (2024).
Ginell, G. M. & Holehouse, A. S. Analyzing the sequences of intrinsically disordered regions with CIDER and localCIDER. Methods Mol. Biol. 2141, 103–126 (2020).
Mészáros, B., Erdős, G. & Dosztányi, Z. IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic Acids Res. 46, W329–W337 (2018).
Cohan, M. C., Shinn, M. K., Lalmansingh, J. M. & Pappu, R. V. Uncovering non-random binary patterns within sequences of intrinsically disordered proteins. J. Mol. Biol. 434, 167373 (2022).
Lancaster, A. K., Nutter-Upham, A., Lindquist, S. & King, O. D. PLAAC: a web and command-line application to identify proteins with prion-like amino acid composition. Bioinformatics 30, 2501–2502 (2014).
Hartley, M. et al. The BioImage Archive – building a home for life-sciences microscopy data. J. Mol. Biol. 434, 167505 (2022).
Kappel, K. CondenSeq large sequence library: FINCHES predictions. Zenodo https://doi.org/10.5281/zenodo.15098929 (2025).

Acknowledgements

We thank D. Abbondanza for assistance with confocal imaging for pilot experiments; M. Alimova, R. Muraleedharan, P. Byrne and the entire CDoT High-Content Imaging Facility team for imaging assistance and for maintaining the Opera Phenix High-Content Screening System at the Broad Institute; A. Singh, R. Walton, O. Ursu, X. Chen, K.G.-Schuller, P. Thakore and T. Harvey for advice and discussions about initial experiments. We thank all members of the Zhang laboratory for helpful discussions and support. K.K. was supported by the Schmidt Science Fellows, in partnership with the Rhodes Trust and the HHMI Hanna H. Gray Fellows Program. A.R. was an HHMI Investigator when this study was initiated. Work was supported by the Klarman Cell Observatory (A.R.). D.S. was supported by fellowships from the Swiss National Science Foundation (P400PB_199261 and P2ELP3_187926). K.K.E. is supported by the Helen Hay Whitney Foundation Postdoctoral Fellowship. Work was supported by HHMI (F.Z.).

Author information

Aviv Regev
Present address: Genentech, South San Francisco, CA, USA

Authors and Affiliations

Howard Hughes Medical Institute, Cambridge, MA, USA
Kalli Kappel, Daniel Strebinger, KeHuan K. Edmonds, Samuel Chau-Duy-Tam VO, Rhiannon Macrae & Feng Zhang
Broad Institute of MIT and Harvard, Cambridge, MA, USA
Kalli Kappel, Daniel Strebinger, KeHuan K. Edmonds, Samuel Chau-Duy-Tam VO, Christopher M. Vockley, Rhiannon Macrae, Feng Zhang & Aviv Regev
McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
Kalli Kappel, Daniel Strebinger, KeHuan K. Edmonds, Samuel Chau-Duy-Tam VO, Rhiannon Macrae & Feng Zhang
Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA, USA
Kalli Kappel, Daniel Strebinger, KeHuan K. Edmonds, Samuel Chau-Duy-Tam VO, Rhiannon Macrae & Feng Zhang
Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Kalli Kappel, Daniel Strebinger, KeHuan K. Edmonds, Samuel Chau-Duy-Tam VO, Rhiannon Macrae & Feng Zhang
Spatial Technology Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Tridib Biswas & Samouil L. Farhi

Contributions

K.K.: Conceptualization, Methodology, Software, Formal analysis, Investigation, Resources, Writing – Original Draft, Writing – Review & Editing, Visualization, Supervision, Funding acquisition. D.S.: Investigation, Scientific Discussion, Writing – Review & Editing. K.H.K.E.: Scientific Discussion, Writing – Review & Editing, Visualization. S.V.: Investigation, Writing – Review & Editing. C.V.: Scientific Discussion, Writing – Review & Editing. T.B.: Resources, Writing – Review & Editing. S.F.: Resources, Writing – Review & Editing. R.M.: Supervision, Writing – Review & Editing, Visualization. F.Z.: Conceptualization, Writing – Review & Editing, Supervision, Funding acquisition. A.R.: Conceptualization, Writing – Review & Editing, Supervision, Funding acquisition.

Corresponding authors

Correspondence to Kalli Kappel or Aviv Regev.

Ethics declarations

Competing interests

A.R. is a founder and equity holder of Celsius Therapeutics, an equity holder in Immunitas Therapeutics and until 31 August 2020 was a scientific advisory board member of Syros Pharmaceuticals, Neogene Therapeutics, Asimov and Thermo Fisher Scientific. Since 1 August 2020, A.R. has been an employee of Genentech, a member of the Roche Group, with equity in Roche. F.Z. is a scientific advisor and cofounder of Beam Therapeutics, Pairwise Plants, Arbor Biotechnologies, Aera Therapeutics and Moonwalk Biosciences. F.Z. is also a scientific advisor for Octant. All other authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Pilong Li and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Lei Tang, in collaboration with the Nature Methods team. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Live cell timelapse experiments.

(A) Schematic overview of the live cell timelapse experiments. (B) Example traces from live cell timelapse imaging (barcode: TCGGG). Left: The presence of condensates in three example cells as a function of total protein concentration. The dashed black line denotes the average threshold concentration (C_thresh), the concentration at which condensates are first observed, for these cells. Right: Images of the same three cells (nuclei, masked) over time. Dashed pink boxes denote the first frame in which condensates appear. Two independent biological replicates of this experiment were performed with the same results. (C) Reproducibility between replicates for live cell timelapse experiments. For the fraction of cells with condensates (f_condensates; left), each point represents f_condensates for one protein sequence within a defined concentration bin. For the fraction of GFP in condensates, each point represents the mean value of the fraction of the GFP signal in condensates over all cells that express a particular protein sequence within a defined concentration bin. (D) Reproducibility of threshold concentrations determined from independent replicates of live cell timelapse experiments. Each point represents the threshold concentration for a single protein sequence. (E) The fraction of the total GFP signal found in condensates for the pooled versus arrayed experiments. Each point represents the mean value of the fraction of the GFP signal in condensates over all cells that express a particular protein sequence within the medium concentration bin. Pearson’s correlation is noted on the plot. (F) Representative images of nuclei (masked) from arrayed and pooled experiments (all sequences were fused to the 24-mer oligomerization domain and GFP). Barcodes are indicated on the left of each pair of images. Scale bar = 5 µm. Two independent biological replicates of this experiment were performed with the same results.
(G) C_thresh for protein sequences corresponding to previously studied constructs. Each red point denotes C_thresh for a single cell. Each gray point represents the maximum concentration (C_max) observed over the full timelapse for cells that do not form condensates. Black lines in the violin plots show the medians of C_thresh for each protein sequence. The expected change in C_thresh relative to the corresponding wild type (WT) fragment, based on previous studies, is indicated with an arrow pointing up for increased C_thresh or down for decreased C_thresh. Solid horizontal lines are shown for WT sequences. * denotes statistically significant difference compared to the corresponding wild type threshold intensity (two-sided t-test, p values adjusted for multiple comparisons by applying the Bonferroni correction). The dashed black line denotes 0.06 µM protein concentration, the lowest protein concentration that we could reliably distinguish from background. p values: hnRNPA1 add many Y = 2×10⁻³²; hnRNPA1 add fewer Y = 2×10⁻⁵¹; hnRNPA1 add 1 Y = 0.19; DDX4 all R to A = 4×10⁻¹²⁴; DDX4 all F to A = 3×10⁻¹⁰; DDX4 1 F to A = 0.43; TDP-43 hydrophobic & aromatic to S = 7×10⁻⁸⁷; TDP-43 aromatic to S = 8×10⁻⁵⁸; TDP-43 all hydrophobic & aromatic to S = 7×10⁻⁸³; EWS all Q to R = 2×10⁻⁸⁶; EWS 3 polar to R = 2×10⁻¹¹⁹; EWS 1 polar to R = 2×10⁻¹⁶; DDX3 all K to Q = 2×10⁻¹³; DDX3 3 K to Q = 0.26; DDX3 1 K to Q = 0.73; TIAR-2 all Y to G = 2×10⁻⁴¹; TIAR-2 all S to A = 1.0; TIAR-2 1 Y to G = 0.0002. 
hnRNPA1 WT fragment, n = 102 cells; hnRNPA1 add many Y, n = 116 cells; hnRNPA1 add fewer Y, n = 204 cells; hnRNPA1 add 1 Y, n = 100 cells; DDX4 WT fragment, n = 203 cells; DDX4 all R to A, n = 307 cells; DDX4 all F to A, n = 194 cells; DDX4 1 F to A, n = 149 cells; TDP-43 WT fragment, n = 234 cells; TDP-43 hydrophobic & aromatic to S, n = 181 cells; TDP-43 aromatic to S, n = 219 cells; TDP-43 all hydrophobic & aromatic to S, n = 306 cells; EWS WT fragment, n = 224 cells; EWS all Q to R, n = 146 cells; EWS 3 polar to R, n = 275 cells; EWS 1 polar to R, n = 82 cells; DDX3 WT fragment, n = 206 cells; DDX3 all K to Q, n = 134 cells; DDX3 3 K to Q, n = 186 cells; DDX3 1 K to Q, n = 140 cells; TIAR-2 WT fragment, n = 154 cells; TIAR-2 all Y to G, n = 297 cells; TIAR-2 all S to A, n = 88 cells; TIAR-2 1 Y to G, n = 258 cells.
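
The threshold concentration described in panels B and G can be illustrated with a minimal Python sketch. It assumes that each cell's timelapse is available as two parallel per-frame lists (total protein concentration and a condensate-present flag); the function names, the data layout and the example numbers are hypothetical and are not taken from the authors' code. The Bonferroni helper shows the standard adjustment, in which each p value is multiplied by the number of tests and capped at 1.

```python
from statistics import mean


def threshold_concentration(concentrations, has_condensates):
    """Summarize one cell's timelapse trace.

    Returns ("C_thresh", c) where c is the concentration in the first frame
    with condensates, or ("C_max", c) with the highest concentration observed
    when the cell never forms condensates.
    """
    for conc, flag in zip(concentrations, has_condensates):
        if flag:
            return ("C_thresh", conc)
    return ("C_max", max(concentrations))


def bonferroni(p_values):
    """Standard Bonferroni adjustment: p * number of tests, capped at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical traces: (concentration per frame in µM, condensate flag per frame).
cells = [
    ([0.10, 0.25, 0.40, 0.60], [False, False, True, True]),
    ([0.08, 0.20, 0.50, 0.70], [False, False, False, True]),
    ([0.05, 0.10, 0.15, 0.20], [False, False, False, False]),
]
results = [threshold_concentration(c, f) for c, f in cells]

# Average C_thresh over the cells that formed condensates.
mean_c_thresh = mean(v for kind, v in results if kind == "C_thresh")
adjusted = bonferroni([0.01, 0.5, 0.2])
```

Cells that never form condensates are kept separate from the C_thresh average here, mirroring how the caption plots them as C_max points rather than as thresholds.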

Extended Data Fig. 2 Details of protein sequence libraries.

(A) Composition of the large protein sequence library. (B) Principal component analysis of the amino acid composition and dipeptide composition of all sequences in the large protein sequence library, as well as all sequences in the human proteome and all disordered regions in the human proteome.
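
As a rough illustration of the inputs to such an analysis, the sketch below builds an amino acid plus dipeptide composition vector for a single sequence. The feature ordering, the normalization by counts and the example sequence are assumptions made for illustration; the caption does not specify how the authors constructed or scaled their features, and the principal component analysis itself is not shown.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def composition_features(seq):
    """Return a 420-element list: 20 amino acid fractions, then 400 dipeptide fractions."""
    aa_counts = Counter(seq)
    di_counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_aa = len(seq)
    n_di = max(len(seq) - 1, 1)  # avoid dividing by zero for 1-residue input
    aa_part = [aa_counts[a] / n_aa for a in AMINO_ACIDS]
    di_part = [di_counts[d] / n_di for d in DIPEPTIDES]
    return aa_part + di_part


# Hypothetical example sequence (not from the CondenSeq library).
features = composition_features("GRGGSYGG")
```

Stacking one such vector per sequence gives a matrix that could then be passed to any standard PCA routine.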

Extended Data Fig. 3 Assessing the impact of valence on condensate formation.

(A) Schematic of the experiment to test the effect of protein valence on condensate formation. The small sequence library is fused to GFP and four different oligomerization domains resulting in valence 1, 4, 6, or 24, then cells are imaged and barcodes are read out. (B) Fraction of cells that contain condensates for the small sequence library fused to GFP and each of the four different oligomerization domains. Each point represents one protein sequence. Black lines show the means. The increases in f_condensates as valence is increased are all statistically significant (valence = 1 vs 4: p = 0.002; 4 vs 6: p = 2×10⁻⁶; 6 vs 24: p = 6×10⁻¹⁰, two-sided paired t-test, after Bonferroni correction, medium test protein concentration bin). (C) Example images of cells (masked nuclei) expressing protein sequences (rows) fused to GFP and each oligomerization domain (columns). These example images are representative of the following numbers of cells for which we collected data in our defined concentration bins: 1933 (AAGCG, valence=1), 1629 (AAGCG, valence=4), 1040 (AAGCG, valence=6), 760 (AAGCG, valence=24), 1360 (TCGCC, valence=1), 1730 (TCGCC, valence=4), 1492 (TCGCC, valence=6), 1348 (TCGCC, valence=24), 2775 (AACCT, valence=1), 3722 (AACCT, valence=4), 3445 (AACCT, valence=6), 2238 (AACCT, valence=24), 629 (AAAGA, valence=1), 761 (AAAGA, valence=4), 736 (AAAGA, valence=6), 487 (AAAGA, valence=24). Scale bars denote 5 µm.
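
The comparison in panel B pairs each protein sequence with itself across two valences. The sketch below, using only the Python standard library, computes f_condensates from per-cell flags and the paired t statistic for two hypothetical valence conditions. All names and numbers are illustrative assumptions; converting the statistic to a p value requires the t distribution (for example via SciPy's `scipy.stats.ttest_rel`), which is left out here to keep the example dependency-free.

```python
from math import sqrt
from statistics import mean, stdev


def f_condensates(cell_has_condensates):
    """Fraction of cells (booleans) that contain at least one condensate."""
    return sum(cell_has_condensates) / len(cell_has_condensates)


def paired_t_statistic(low_valence, high_valence):
    """Paired t statistic for per-sequence values measured at two valences.

    Both lists must be ordered by the same protein sequences.
    """
    diffs = [hi - lo for lo, hi in zip(low_valence, high_valence)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical per-cell flags for one sequence in one concentration bin.
example_fraction = f_condensates([True, False, False, True])

# Hypothetical f_condensates for four sequences at valence 1 and valence 4.
f_valence_1 = [0.1, 0.2, 0.0, 0.3]
f_valence_4 = [0.2, 0.4, 0.1, 0.6]
t_stat = paired_t_statistic(f_valence_1, f_valence_4)
```

A positive statistic indicates that f_condensates tends to be higher at the higher valence for the same sequences.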

Extended Data Fig. 4 Systematic assessment of the effects of amino acid patterning on condensate formation.

All data presented in this figure are for sequences fused to GFP in the medium concentration bin. (A) Schematic of patterning parameters. Each string of circles represents a protein sequence, with each circle representing a single amino acid. Negative z-scores for patterning parameters indicate well-mixed amino acids of the specified type, while more positive z-scores indicate higher segregation of the specified amino acids. z-scores are computed with NARDINI⁷⁵. (B) Each violin shows f_condensates for all scrambled versions of the specified base sequence. Each black dot represents a single scrambled sequence. Red bars denote the values for the base sequences. Black bars denote the means of the scrambled sequences. Violins are ordered by the difference between the means of the base and scrambled sequences (low to high). * denotes base sequence values (red bars) that are statistically unlikely, given the distribution of f_condensates values for all of the scrambled variants of that base sequence (black dots) (smoothed empirical CDF test, see Supplementary Note 1 for detailed description of this test; p values: NUP100 = 0.0003, DYRK1A = 0.006, RBM14 = 0.003, SYN1 = 0.004, NAB3 = 5×10⁻¹³). (C) The change in f_condensates for patterning mutants versus unpatterned sequences that do not form condensates (patterning score near 0, Supplementary Note 1). Each violin contains sequences that test the effect of a different patterning parameter. “>” and “<” indicate mutants that increase or decrease the designated patterning parameter. The mutants shown here have a substantial change only in the designated patterning parameter; for example, for δ_+- mutants, there is little change in other patterning parameters. The colors and sizes of the dots indicate the change in the patterning value of the mutant sequence relative to the unpatterned sequence.
Asterisks denote groups with a significant change in the fraction of cells with condensates and red lines show their mean values (two-sided Wilcoxon signed-rank test; >δ_+- p value = 0.04). Gray lines denote the mean values for other groups. p values are adjusted for multiple comparisons by applying the Bonferroni correction. The dashed black line is shown as a reference point marking a change of 0, that is, no difference between mutant and base sequences. (D) Correlation between sequence features and f_condensates for all large library sequences that contain at least 5% positively charged, negatively charged, aromatic, hydrophobic, and polar amino acids (so that all patterning parameters can be computed for all sequences). The colors of the bars represent the Pearson correlation (r value); bars are only shown if two-sided p values are less than 0.05. p values are adjusted for multiple comparisons by applying the Bonferroni correction. Black outlines denote bars for patterning parameters.
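The selection rule in panel D (at least 5% of each residue class) followed by a Pearson correlation can be sketched as follows. This is an illustrative sketch only: the residue groupings in `GROUPS` are an assumption, since the legend does not list which amino acids belong to each class, and the function names are hypothetical.

```python
import math

# ASSUMED residue classes; the legend does not define them.
GROUPS = {
    "positive": set("KR"),
    "negative": set("DE"),
    "aromatic": set("FWY"),
    "hydrophobic": set("ILMV"),
    "polar": set("STNQCH"),
}


def passes_composition_filter(seq, min_fraction=0.05):
    """True if every residue class makes up at least min_fraction of seq."""
    n = len(seq)
    if n == 0:
        return False
    return all(
        sum(aa in members for aa in seq) / n >= min_fraction
        for members in GROUPS.values()
    )


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)


# Toy usage: keep only sequences that pass the filter before correlating
# a feature (hypothetical values) with f_condensates.
library = ["KDFISKDFISKDFISKDFIS", "AAAAAAAAAAAAAAAAAAAK"]
kept = [s for s in library if passes_composition_filter(s)]
```

The filter exists so that every patterning parameter is defined for every retained sequence; a sequence with no aromatic residues, for example, has no aromatic patterning value to correlate.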

Extended Data Fig. 5 Assessing the impacts of different types of mutations across many sequence contexts.

(A) The change in the propensities for mutant sequences versus base sequences to form condensates. The values in the heatmap are the mean values of f_condensates for all mutations of the specified type minus f_condensates for the base sequence. All values plotted are for the GFP fusions in the medium concentration bin. There may not be data for a given mutation type (box colored yellow) for one of two reasons: (1) it was not possible to make the mutation type for that sequence (for example, it is not possible to make a –R mutant if the base sequence does not contain any R residues); or (2) the sequence was not expressed within the GFP fusion medium concentration bin. The top two rows show consistency scores over the base sequences for which f_condensates is less than 0.5 or greater than 0.5, respectively. The consistency score indicates the fraction of base sequences over which the sequence feature has the most common effect (1.0 indicates that the sequence feature has the given effect across 100% of the base sequences) (Supplementary Note 1). The dot size indicates the number of base sequences for which there are data for the given sequence feature. The two rows below the consistency scores show the mean Δ f_condensates values for the base sequences for which f_condensates is less than 0.5 or greater than 0.5, respectively.
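One way to read the consistency score described above is as the share of base sequences that show the most frequent direction of effect. The sketch below follows that reading. It is not the authors' implementation: the exact definition is in Supplementary Note 1, and the way effects are binned here (increase, decrease or none, with a tolerance `tol`) is an assumption, as are the function names.

```python
from collections import Counter


def effect_direction(delta, tol=0.0):
    """Classify a change in f_condensates (ASSUMED three-way binning)."""
    if delta > tol:
        return "increase"
    if delta < -tol:
        return "decrease"
    return "none"


def consistency_score(deltas, tol=0.0):
    """Return (most common effect, its fraction, number of base sequences).

    deltas holds one mean change in f_condensates per base sequence;
    None marks a base sequence with no data (a yellow box in the heatmap).
    """
    observed = [d for d in deltas if d is not None]
    if not observed:
        return None, None, 0
    counts = Counter(effect_direction(d, tol) for d in observed)
    effect, n = counts.most_common(1)[0]
    return effect, n / len(observed), len(observed)


# Toy values (hypothetical): three of four measured base sequences increase.
effect, score, n_bases = consistency_score([0.2, 0.1, -0.05, None, 0.3])
```

Returning the number of base sequences alongside the score mirrors the dot size in the figure, which shows how many sequences each score is based on.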

Extended Data Fig. 6 Features of sequences in clusters with distinct intermolecular chemical specificities.

(A) Mean features of the sequences in each cluster (Fig. 5b) with expression in the medium concentration bin (GFP fusions). Homotypic ε is the FINCHES interaction parameter for the interaction of the test protein sequence with itself. The fraction heterotypic ε < homotypic ε is the fraction of the FINCHES interaction parameters for a test protein sequence with all human IDRs that are less than the homotypic ε value for that test protein sequence. The number of favorable interactions means the number of human IDRs with which a test protein has an attractive FINCHES interaction parameter (less than −3). The minimum and maximum values for the color scale are as follows: 0 to 0.27 for the fraction of each individual amino acid (for example, fraction A); 0 to 0.4 for the fraction of each group of amino acids (for example, fraction ILMV); 0 to 1.0 for the relative fractions of amino acids or amino acid groups (for example, fraction R/RK or fraction FWY//FWYILV); −1.31 to 1.31 for the patterning features (for example, δ_+-); −0.21 to 0.21 for NCPR; 1 to 6 for mean hydropathy; −12 to 12 for homotypic ε; 0 to 1 for the fraction heterotypic ε < homotypic ε; 0 to 3000 for the number of favorable interactions; 0 to 1 for f_condensates (medium concentration bin, GFP fusion). (B) Standard deviations of the sequence features for each cluster. Clusters are the same as those shown in (A). 
The minimum and maximum values for the color scale are as follows: 0 to 0.15 for the fraction of each individual amino acid (for example, fraction A); 0 to 0.15 for the fraction of each group of amino acids (for example, fraction ILMV); 0 to 0.44 for the relative fractions of amino acids or amino acid groups (for example, fraction R/RK or fraction FWY//FWYILV); 0 to 2 for the patterning features (for example, δ_+-); 0 to 0.1 for NCPR; 0 to 1 for mean hydropathy; 0 to 5 for homotypic ε; 0 to 0.3 for the fraction heterotypic ε < homotypic ε; 0 to 900 for the number of favorable interactions; 0 to 0.4 for f_condensates (medium concentration bin, GFP fusion).
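The two heterotypic summary features defined in panel A can be computed from a list of interaction parameters as in the sketch below. It assumes the ε values have already been obtained (for example, from FINCHES) and makes no calls to that tool; the function name, argument names and toy numbers are hypothetical. The cutoff of −3 for a favorable interaction is taken from the legend.

```python
def interaction_summary(homotypic_eps, heterotypic_eps, favorable_cutoff=-3.0):
    """Summarize one test protein's interaction parameters.

    homotypic_eps: epsilon for the test sequence interacting with itself.
    heterotypic_eps: epsilon values for the test sequence with each
        partner IDR (precomputed elsewhere).
    """
    if not heterotypic_eps:
        raise ValueError("need at least one heterotypic epsilon value")
    n_below_homotypic = sum(e < homotypic_eps for e in heterotypic_eps)
    n_favorable = sum(e < favorable_cutoff for e in heterotypic_eps)
    return {
        "fraction_heterotypic_below_homotypic":
            n_below_homotypic / len(heterotypic_eps),
        "n_favorable_interactions": n_favorable,
    }


# Toy values (hypothetical, not taken from the paper).
summary = interaction_summary(-2.0, [-5.0, -3.5, -1.0, 0.5])
```

A high fraction means that many partner IDRs interact with the test sequence more attractively than the sequence interacts with itself.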

Extended Data Table 1 Reproducibility and numbers of barcode and protein sequences for small libraries


Extended Data Table 2 Reproducibility between libraries


Extended Data Table 3 Reproducibility and numbers of barcodes and protein sequences for large libraries


Extended Data Table 4 Number of protein sequences in each dataset


Supplementary information

Supplementary Information

Supplementary Results, Supplementary Discussion, Supplementary Notes, Supplementary Figs. 1–15 and Supplementary Tables 1, 2, 4 and 6.

Reporting Summary

Peer Review File

Supplementary Tables

Supplementary Table 3: P values and n values for Fig. 3. Supplementary Table 5: Colocalization of GFP and SNAP-tag sequences with endogenous nuclear condensates. Supplementary Table 7: Effects of mutating positively charged residues on nucleolar and chromatin localization. Localization is classified as nucleolar, chromatin, other (the sequence forms condensates, but does not localize to the nucleolus or chromatin) or none (the sequence does not form condensates). All data are shown for GFP fusions in the medium concentration bin. Supplementary Table 8: Primer sequences and DNA sequences encoding constructs for arrayed experiments. Supplementary Table 9: P values and n values for Supplementary Fig. 12.

Supplementary Data 1

Data for small sequence library.

Supplementary Data 2

Long sequence library information and data.

Supplementary Data 3

Data for large sequence library.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Kappel, K., Strebinger, D., Edmonds, K.K. et al. Characterizing protein sequence determinants of nuclear condensates by high-throughput pooled imaging with CondenSeq. Nat Methods (2025). https://doi.org/10.1038/s41592-025-02726-y


Received: 09 September 2024
Accepted: 16 May 2025
Published: 16 June 2025
DOI: https://doi.org/10.1038/s41592-025-02726-y

This article is cited by

How to spy on condensates
- Vivien Marx
Nature Methods (2025)
