Fig. 1: Temporal and spatial signature of spread in clusters of identical SARS-CoV-2 sequences.
From: Fine-scale patterns of SARS-CoV-2 spread from identical pathogen sequences

a, Clustering of identical pathogen sequences across population groups reflects underlying disease transmission patterns at the population level and can be used to characterize spread patterns between groups. Each colour represents a different cluster of identical sequences. b, Probability that two individuals separated by a fixed number of transmission generations are infected by viruses at a given genetic distance, assuming a Poisson process for the occurrence of substitutions (at a rate μ = 8.98 × 10⁻² substitutions per day) and a gamma-distributed generation time (mean, 5.9 days; s.d., 4.8 days). c, Size distribution of clusters of identical sequences in the WA dataset. Clusters of size 1 correspond to singletons and are therefore not included in the RR computations. d, Spatiotemporal dynamics of sequence collection in two large clusters of identical sequences. The black diamonds indicate the location of Seattle, the largest city in WA. e, Radius of clusters of identical sequences (red line) and probability that all sequences within a cluster of identical sequences remain in the same spatial unit (black lines) as a function of time since first sequence collection. In e, the cluster radius is computed as the mean spatial expansion of clusters of identical sequences. f, Definition of the RR of observing pairs of sequences in two subgroups as a measure of enrichment. g, RR of observing pairs of sequences within the same county as a function of the genetic distance separating them. The grey points correspond to values for individual counties. The orange triangles correspond to the median across counties. For a, d and f, maps were generated using shapefiles from the US Census Bureau (ref. 44).
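
The probability described in b can be sketched numerically. The following is a minimal illustration, not the authors' code. It assumes that the evolutionary time separating two individuals g generations apart is the sum of g independent gamma-distributed generation times, and that substitutions accumulate over that time as a Poisson process at rate μ. Under these assumptions the number of substitutions follows a negative binomial distribution (a gamma mixture of Poisson counts). The function name `prob_genetic_distance` and the moment-matched gamma parameters are introduced here for illustration only.

```python
import math

# Values quoted in the caption for panel b.
MU = 8.98e-2      # substitutions per day
GT_MEAN = 5.9     # mean generation time (days)
GT_SD = 4.8       # s.d. of generation time (days)

# Gamma shape and scale obtained by moment matching (an assumption of this sketch).
SHAPE = (GT_MEAN / GT_SD) ** 2
SCALE = GT_SD ** 2 / GT_MEAN


def prob_genetic_distance(n_mutations: int, n_generations: int) -> float:
    """P(n_mutations substitutions | n_generations transmission generations).

    The sum of g i.i.d. Gamma(SHAPE, SCALE) times is Gamma(g * SHAPE, SCALE).
    Mixing a Poisson(MU * T) count over that gamma-distributed T gives a
    negative binomial with r = g * SHAPE and p = 1 / (1 + MU * SCALE).
    """
    r = n_generations * SHAPE
    p = 1.0 / (1.0 + MU * SCALE)
    log_pmf = (
        math.lgamma(n_mutations + r)
        - math.lgamma(r)
        - math.lgamma(n_mutations + 1)
        + r * math.log(p)
        + n_mutations * math.log1p(-p)
    )
    return math.exp(log_pmf)


if __name__ == "__main__":
    # Probability of identical sequences (distance 0) after 1 to 5 generations.
    for g in range(1, 6):
        print(g, round(prob_genetic_distance(0, g), 3))
```

Under this sketch, the probability of observing identical sequences falls as the number of generations grows, which is the intuition behind using clusters of identical sequences as a proxy for closely linked infections.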