Fig. 2: Large-scale sequence similarity network for more than 6 million UniRef50 cluster representatives with high predicted accuracy models in AFDB (AFDB90). | Nature

From: Uncovering new families and folds in the natural protein universe

a, Layout of the resulting network, as computed with Cosmograph (https://cosmograph.app/). The network contained 4,270,404 nodes connected by 10,339,158 edges, reduced for simplicity to a set of 688,852 communities connected by a total of 1,488,764 edges (see Methods section ‘Large-scale sequence similarity network’ for details). The 1,865,917 UniRef50 clusters that did not connect to any other in the MMseqs2 searches were excluded. Only the 473,612 communities that have at least one inbound or outbound edge (degree of at least 1) are shown in the figure. Nodes are coloured by the average functional brightness of the UniRef50 clusters included in the corresponding community. An interactive version is available at https://uniprot3d.org/atlas/AFDB90v4. b, Histograms of functional brightness content for connected components with more than 50,000 nodes and with only two to five nodes (UniRef50 clusters), highlighting their different darkness content. c, Scatter plot of the component size (that is, number of UniRef50 clusters) cut-off and the percentage of functionally dark UniRef50 clusters. d, Histogram of the average (avg.) brightness per connected component. e,f, Size distribution for fully dark connected components (e, average brightness less than 5%) and fully bright connected components (f, average brightness more than 95%).
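
The quantities summarised in panels d to f can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that functional brightness is given as a percentage per UniRef50 cluster and that the network is an undirected edge list. The function names (`connected_components`, `summarise_components`) and the toy data are hypothetical. Only the degree filter (at least one edge) and the 5% and 95% thresholds are taken from the caption.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Group nodes into connected components with a simple union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    return list(groups.values())


def summarise_components(brightness, edges, dark_below=5.0, bright_above=95.0):
    """Return size, average brightness and a label for each component.

    Assumption: `brightness` maps a cluster ID to a percentage (0-100) and
    every edge endpoint is a key of `brightness`.
    """
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    # Drop clusters without any edge, mirroring the exclusion in the caption.
    connected = [n for n in brightness if degree[n] >= 1]

    summaries = []
    for component in connected_components(connected, edges):
        avg = sum(brightness[n] for n in component) / len(component)
        if avg < dark_below:
            label = "dark"
        elif avg > bright_above:
            label = "bright"
        else:
            label = "mixed"
        summaries.append(
            {"size": len(component), "avg_brightness": avg, "label": label}
        )
    return summaries


# Hypothetical toy data: six clusters, two of which (E, F) have no edges.
toy_brightness = {"A": 0.0, "B": 4.0, "C": 100.0, "D": 98.0, "E": 50.0, "F": 0.0}
toy_edges = [("A", "B"), ("C", "D")]

summaries = summarise_components(toy_brightness, toy_edges)
for row in sorted(summaries, key=lambda r: r["avg_brightness"]):
    print(row)
```

In this toy example the two unconnected clusters are dropped and the remaining clusters form one fully dark and one fully bright component. A histogram of `avg_brightness` over all components would correspond to the kind of summary shown in panel d, and the `size` values of the `dark` and `bright` components to those in panels e and f.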
