Fig. 4: Structural characterization of the NMPFs.
From: Unraveling the functional dark matter through global metagenomics

a, Protein clusters with at least 16 effective sequences (eff. seqs) or many contacts were submitted to AlphaFold. The results were filtered to include structures with high predicted confidence (pTM ≥ 0.70), which were then clustered on the basis of pairwise TM-score calculation. All of the subsequent steps of the workflow display the number of unique clusters followed by the total number of NMPFs in parentheses. As filtering was performed at the NMPF level, only the numbers in parentheses will sum, as it is possible for members of the same cluster to fall on different sides of each TM-score filtering step. Each predicted structure was aligned against SCOPe domains. Models with no hits to SCOPe were further aligned and filtered if there were any hits to full PDB assemblies or one of the SCOPe domains aligned to at least 50% of the predicted structure. The domains (from SCOPe matches) or multi-domain (from PDB matches) were further screened using HHsearch against the PDB. The PDB of the top hit was compared to the prediction. b, Models with no significant hits to either SCOPe or PDB were considered to be potential novel folds. pLDDT, per-residue confidence score. c, Models with hits to either SCOPe domains or PDB biological assemblies with no significant HHsearch hits (HMM-TM-score < 0.5) were considered to be novel assignments.
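
The decision logic described in panels a to c can be sketched in Python as follows. This is a minimal illustration and not the authors' pipeline: the `Model` record, its field names and the returned labels are hypothetical, and the upstream alignment and HHsearch steps are assumed to have already been reduced to a boolean flag and a single score. Only the two thresholds (pTM ≥ 0.70 and HMM-TM-score < 0.5) come from the caption.

```python
from dataclasses import dataclass
from typing import Optional

PTM_MIN = 0.70     # confidence cutoff stated in the caption
HMM_TM_MIN = 0.5   # HMM-TM-score cutoff stated in the caption


@dataclass
class Model:
    """Hypothetical per-NMPF record; field names are illustrative only."""
    nmpf_id: str
    ptm: float
    # True if the model has a significant hit to a SCOPe domain or a PDB assembly.
    has_structural_hit: bool
    # TM-score between the prediction and the top HHsearch hit; None if no hit.
    hmm_tm_score: Optional[float] = None


def classify(model: Model) -> str:
    """Assign a model to one of the outcomes named in the caption."""
    if model.ptm < PTM_MIN:
        return "filtered_low_confidence"   # removed before clustering
    if not model.has_structural_hit:
        return "potential_novel_fold"      # panel b
    if model.hmm_tm_score is None or model.hmm_tm_score < HMM_TM_MIN:
        return "novel_assignment"          # panel c
    return "known_structure_and_sequence"


models = [
    Model("F1", ptm=0.65, has_structural_hit=False),
    Model("F2", ptm=0.82, has_structural_hit=False),
    Model("F3", ptm=0.91, has_structural_hit=True, hmm_tm_score=0.31),
    Model("F4", ptm=0.77, has_structural_hit=True, hmm_tm_score=0.74),
]
labels = {m.nmpf_id: classify(m) for m in models}
```

Because each NMPF is classified independently, two members of the same structural cluster can receive different labels, which is why only the per-NMPF counts in parentheses sum across the workflow.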