Introduction

Cerebrovascular disease (CVD) is defined as neurological deficits caused by arterial insufficiency or occlusion, venous occlusive disease, or hemorrhage, and can manifest as an acute nonfatal or fatal event, with stroke being the primary disease type1. There are approximately 795,000 acute strokes every year in the US, which carry an annual healthcare cost of $17.9 billion2. As a result, CVD is a leading cause of serious long-term disability and the second leading cause of death worldwide, posing a significant global health challenge.

There are many risk factors for CVD, the most common of which include high blood pressure, high BMI, diabetes, smoking, age, carotid artery disease (stenosis and atherosclerosis), and previous CVD events3,4,5. Despite improved management of these risk factors, CVD still carries a large public health burden, creating urgency to develop new methods for characterizing disease earlier and more accurately in order to prevent fatal and non-fatal events attributed to CVD.

Recent research has proposed brain-based subtyping methods in neuroimaging as a step towards precision medicine in diagnostics and therapeutics6. Subtyping methods use biomarkers to categorize subjects into homogeneous subgroups, which in the context of CVD, can be used to categorize subjects into risk factor groups for targeted therapy and clinical trials. Additionally, novel brain subtypes can be used for early detection, finding distinct disease phenotypes, and to learn more about mechanisms of disease. In Drysdale et al., the authors use brain subtyping methods with fMRI biomarkers to uncover novel signatures for different disease courses in brain disorders such as depression7. As many neurological disorders are comorbid and often present overlapping clinical symptoms, subtyping can also be used to identify distinguishing features of disease7,8.

MR imaging provides insight into CVD manifestation in the brain, with existing studies showing that in addition to major events such as stroke, MRI can visualize other pathologies caused by CVD such as lacunes, infarcts and white matter lesions (WML)9,10. The T2 Fluid Attenuation Inversion Recovery (FLAIR) MRI sequence is commonly used to identify CVD pathology as the suppressed cerebrospinal fluid (CSF) signal enhances contrast between healthy tissue and high signal due to ischemia. While WML in FLAIR MRI have been heavily studied11,12, there is growing interest in other features from FLAIR that may be related to neurodegeneration and CVD. Previous research suggests FLAIR intensity is related to myelination and water content13,14, while FLAIR texture was found to be related to microstructural tissue integrity and organization14,15. FLAIR texture biomarkers in WM tracts and WML penumbra (boundary) regions differentiated patients with mixed disease (vascular and dementia) and subcortical vascular disease, respectively, from healthy and demented non-vascular disease patients15. As a routinely acquired imaging sequence, FLAIR holds clinical promise as an attractive and cost-effective modality for stratification of subjects with CVD. In this work, we aim to uncover clinically distinct subgroups of CVD patients through unsupervised clustering of explainable FLAIR biomarkers and to analyze clinical information to further characterize the subgroup phenotypes.

We hypothesize regional FLAIR texture and intensity biomarkers can discern changes in brain health linked to WM disease and CVD risk levels. Low, moderate and high CVD risk levels are considered based on the burden of vascular risk factors, existence of infarcts and high WML volume16, and subsequent occurrence of ischemic events. The characterization of homogeneous subgroups can aid in better understanding of CVD and may further facilitate personalized treatment decisions through stratification.

Methods

Data

This study was approved by the local institutional review board of a Canadian university (Toronto Metropolitan University). Due to the retrospective nature of the study, informed consent was waived by the local research ethics board (2021-430-3). The Canadian Atherosclerosis Imaging Network (CAIN) dataset is a multicenter pan-Canadian study containing 379 baseline FLAIR imaging volumes of patients with atherosclerotic disease pertaining to carotid artery disease17. The inclusion criteria of the study were the following: 1) male and female patients over the age of 18 years, and 2) patients with mild to severe carotid artery disease (carotid stenosis ≥ 30%). Data on the occurrence of ischemic events, including stroke and transient ischemic attacks (TIAs), were acquired during yearly follow-up imaging. FLAIR images were acquired using GE, Philips, and Siemens scanners from 8 different imaging centers with a magnetic field strength of 3 T and acquisition parameters TR, TE, and TI of 9000–11000 ms, 117–141 ms, and 2200–2500 ms, respectively. FLAIR images had voxel sizes of 0.4286–1 mm × 0.4286–1 mm × 3 mm. Carotid artery imaging was completed with a volumetric high-resolution T1 fat-saturated gradient echo image for intraplaque hemorrhage (IPH) identification and MRI angiography for stenosis measurements (MRIPH)17. On baseline images, established lacunar and territorial infarcts were identified by a neuroradiologist (P.M.). Cohort demographics are shown in Table 1. Additionally, the FLAIR Brainder atlas18 and the blood supply territory (BST) atlas developed in Chan et al. 15 were used.

Table 1 Summary of CAIN clinical data

Image pre-processing

FLAIR volumes were intensity standardized19, brain extracted20 and registered to atlas space15 using the Advanced Normalization Tools (ANTs) symmetric normalization20,21, resulting in dimensions of 256 × 256 × 55 for all volumes. The registered FLAIR volumes were then used to generate corresponding fractional anisotropy (FA) volumes for each patient, from which WM tract masks were segmented as described below.

To extract white matter (WM) tract regions, a generative adversarial network (GAN) model was trained following the method in Chan et al. 22, which demonstrated high accuracy specifically for diffusion tensor imaging (DTI) fractional anisotropy (FA) volumes, particularly in WM regions. The motivation for generating FA volumes from registered FLAIR images stems from the limitations of FLAIR images in distinguishing fine white matter tracts, as this level of precision is necessary for studies of brain connectivity and tract-specific pathology. While FLAIR imaging is effective for identifying lesions and gross WM abnormalities, it does not provide sufficient contrast for isolating individual tracts, which requires the finer detail provided by FA maps derived from DTI. By synthesizing FA volumes, we aimed to leverage the higher WM contrast from FA without acquiring DTI for all patients. The synthetic FA maps were generated by training the GAN to model the mapping between FLAIR and FA, ensuring tract delineation could be performed in a manner similar to tractography from actual DTI datasets. The GAN model was trained and evaluated using 420 FLAIR and DTI volumes (11,957 images), including volumes from the CAIN dataset alongside a separate multi-center cohort of dementia and vascular disease patients22.

Once the FA maps were generated, the unsupervised tract segmentation method from Chan et al. 15 was applied. This involved using K-means clustering to identify the WM tract regions from the FA volumes, followed by morphological operations to refine the binary tract masks. The synthetic tract masks were then compared against 107 real FA volumes, acquired through DTI from the same patients, using the DSC to assess the accuracy of the segmentation. The resulting mean DSC of 0.644 demonstrates comparable performance to existing unsupervised brain tissue segmentation techniques23,24,25,26, while providing a more accessible method for WM tract segmentation in datasets where DTI is not readily available. Additional evaluation metrics of the WM tract segmentations are shown in Supplementary Table 1.
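As a point of reference for the DSC reported above, the following is a minimal sketch of how the Dice Similarity Coefficient can be computed for two binary masks, DSC = 2|A ∩ B| / (|A| + |B|). The function name, the flattened-list representation, and the toy masks are illustrative assumptions and are not taken from the study's pipeline.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice Similarity Coefficient between two binary masks.

    Masks are flat sequences of 0/1 (or False/True) values of equal length.
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Convention assumed here: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy flattened masks (hypothetical values, not study data).
synthetic_mask = [1, 1, 1, 0, 0, 0, 1, 0]
real_mask = [1, 1, 0, 0, 0, 1, 1, 0]
print(dice_coefficient(synthetic_mask, real_mask))  # 2 * 3 / (4 + 4) = 0.75
```

A value of 1 indicates identical masks and 0 indicates no overlap, so a mean DSC of 0.644 corresponds to partial but substantial agreement between synthetic and real tract masks.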

Regions of interest

Three major regions of interest are included in this work: the WM tracts, WML penumbra, and BST regions (Fig. 1). The WM tract region is analyzed as an entire region of all tracts combined. WM tract biomarkers were found in an existing study to be optimal in differentiating patients with Mixed (vascular and dementia) disease from other disease groups15. This is expected as the tracts are at the distal borders of BSTs, where they are susceptible to vascular insufficiencies. The WML penumbra, which is the boundary region directly surrounding WMLs, was found in previous studies to carry signs of abnormal diffusion and cerebral blood flow alterations related to vascular disease27,28,29. Additionally, studies have found biomarkers extracted from penumbra regions to be optimal for identifying subcortical vascular MCI patients from other disease groups15, and for observing progression of WM injury27. The WML penumbra was segmented into five sub-regions, P1 (adjacent to WML) to P5 (most distal), using the methods outlined in Chan et al. 15. Each penumbra region is one voxel (0.86 mm) further from the WML than the previous. Lastly, the BST atlas was employed to segment FLAIR volumes into regions supplied by the middle cerebral artery (MCA), posterior cerebral artery (PCA), and anterior cerebral artery (ACA), as the BSTs are directly related to cerebral vascularization and are likely to be affected by vascular disease. The FLAIR Brainder atlas with the masks of the MCA, PCA, and ACA territories is shown in Supplementary Fig. 1. This resulted in a total of nine regions of interest (the combined WM tracts, five penumbra regions, and three BSTs).

Fig. 1: Visuals of the ROIs in this work.

A Sample mean texture maps in ACA, MCA, and PCA BST regions. B Sample microstructural integrity maps of WM tract regions from sample slices of a volume, lower to higher slices from left to right and top to bottom. C Middle slice of original FLAIR volume (left), and WML penumbra regions P1 (green) to P5 (light pink) delineated on FLAIR NABM slices.

Biomarker extraction

Three FLAIR texture markers - damage, integrity and wavelet biomarkers - and FLAIR intensity were computed to provide a comprehensive assessment of WM alterations, which are critical markers for assessing CVD and neurodegeneration. These biomarkers were specifically chosen based on their strong correlations found in previous studies14,15 with fractional anisotropy (FA) and mean diffusivity (MD), two widely recognized diffusion metrics used to characterize microstructural integrity and tissue changes in the brain. This resulted in four biomarkers from each of the nine regions for a total of 36 biomarkers per subject.

Damage biomarker

Measures fluctuations in intensity in a local window of an image, thus describing heterogeneity of tissue intensity. Higher damage values indicate more roughness in the tissue which is associated with lower cognitive scores and increased water diffusion described by DTI MD values14,15 (Eq. 1).

$$\mathrm{Damage}=\sum_{i=1}^{N}\sum_{j=1}^{N}W_{ij}\,U_{ij}$$
(1)

where Wij is the distance between pixels si and sj and Uij is the absolute difference of their intensities14.
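To make the double sum in Eq. 1 concrete, the sketch below evaluates it over every pair of pixels in one small 2D window. It assumes that N is the number of pixels in the local window and that the distance is Euclidean, neither of which is stated explicitly in the text. The function name and toy windows are hypothetical, and any normalization or windowing strategy used in the published implementation is not reproduced here.

```python
from math import hypot


def damage_in_window(window):
    """Evaluate sum_i sum_j W_ij * U_ij over all pixel pairs in one 2D window.

    W_ij: Euclidean distance between the two pixel positions (assumed metric).
    U_ij: absolute difference of the two pixel intensities.
    """
    pixels = [(r, c, value)
              for r, row in enumerate(window)
              for c, value in enumerate(row)]
    total = 0.0
    for ri, ci, vi in pixels:
        for rj, cj, vj in pixels:
            w_ij = hypot(ri - rj, ci - cj)
            u_ij = abs(vi - vj)
            total += w_ij * u_ij
    return total


# Toy windows (hypothetical intensities): one uniform, one checkerboard-like.
flat = [[5, 5], [5, 5]]
rough = [[0, 9], [9, 0]]
print(damage_in_window(flat))   # 0.0, no intensity fluctuation
print(damage_in_window(rough))  # 72.0, eight ordered pairs at distance 1 with difference 9
```

The uniform window scores zero while the heterogeneous window scores high, which matches the interpretation of damage as a measure of local intensity roughness.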

Integrity biomarker

Measures the repetition of local texture patterns, in which a higher number of similar repeating structured patterns indicates more tissue integrity which is correlated with better cognition and lower MD/higher FA values14,15 (Eq. 2).

$$\mathrm{Integrity}=\sum_{p=0}^{P-1}s\left(I_{p}-I_{c}\right)2^{p},\quad \text{where}\ s(x)=\begin{cases}1, & x>0\\ 0, & \text{otherwise}\end{cases}$$
(2)

where P is the number of neighbours, Ip is the intensity of the neighbouring pixel, and Ic is the intensity of the central pixel14.
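The thresholded-neighbour sum in Eq. 2 can be illustrated for a single pixel as follows. This sketch assumes P = 8 neighbours in a 3 × 3 window, a fixed clockwise neighbour ordering, and that the sum runs over p = 0, …, P − 1 with each term weighted by 2^p. These choices, along with the function and variable names, are assumptions for illustration. How the per-pixel codes are aggregated into the final integrity value is described in refs. 14,15 and is not reproduced here.

```python
# Offsets of the P = 8 neighbours around the central pixel, clockwise from the
# top-left corner. The ordering is an assumption made for this illustration.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def local_pattern_code(image, row, col):
    """Return sum over p of s(I_p - I_c) * 2**p for one interior pixel."""
    centre = image[row][col]
    code = 0
    for p, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        neighbour = image[row + dr][col + dc]
        s = 1 if neighbour - centre > 0 else 0  # s(x) = 1 when x > 0, else 0
        code += s * 2 ** p
    return code


# Toy 3 x 3 patch (hypothetical intensities): bright corners, dark edges.
patch = [[9, 1, 9],
         [1, 5, 1],
         [9, 1, 9]]
print(local_pattern_code(patch, 1, 1))  # 1 + 4 + 16 + 64 = 85
```

Each distinct code corresponds to one local texture pattern, so counting how often similar codes repeat across a region gives a measure of structured, repeating texture.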

A3 mean (wavelet) biomarker

Computed as the mean of the approximation coefficients from the three-level decomposed FLAIR volumes using a Haar wavelet, and describes the homogeneity of tissue on a microstructural level15. Lower A3 Mean values are associated with higher tissue integrity and structural organization as shown by correlations with MD and FA15.

Intensity biomarker

Computed as the median intensity value of masked regions.

Following the methods of Chan et al. 15 and Bahsoun et al. 14, textures were computed exclusively from the normal-appearing brain matter (NABM) volume to ensure that only non-lesioned, normal tissue was analyzed. NABM volumes were created by removing WML and CSF from each patient’s FLAIR image. Damage and integrity texture maps were computed on a slice-wise basis for each NABM volume, resulting in 3D texture volumes. The regions of interest were then masked from the texture volumes and voxel-wise averaged across slices to create 2D texture maps per region. The median of each 2D texture map was taken as the final biomarker value for damage and integrity. For intensity, regions were masked from the FLAIR NABM volumes and the median intensity value was computed as the intensity biomarker. For the A3 Mean biomarker, regions were masked from the wavelet-decomposed approximation volumes and the mean coefficient value was computed as the final biomarker.

From previous studies, negative correlations found between DTI MD values and A3 Mean and integrity in all regions15 identify these markers as “integrity” markers, which are lower for low levels of neurodegeneration. Similar “integrity” marker trends were found for intensity in only the BST regions15. Conversely, damage in all regions, and intensity in the WML penumbra and tract regions, are identified as “damage” markers in previous studies where higher values indicate more neurodegeneration.

Brain subtyping

Unsupervised K-means clustering was performed to identify patient subtypes using FLAIR biomarkers. K-means clustering involves iteratively assigning data points with similar features to a cluster by minimizing the total distance between data points and the subgroup centroid. The FLAIR biomarkers were z-score standardized to have a mean of 0 and variance of 1. Principal component analysis (PCA) was then employed to reduce the feature dimensionality to two principal components describing 95% of the variance in the data. The optimal number of clusters, k, was determined using the elbow method. The resulting clusters/subgroups describe homogeneous brain signatures.
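The standardization and elbow steps described above can be sketched with a small, dependency-free example. This is an illustration under stated assumptions rather than the study's implementation: the PCA step is omitted for brevity, the K-means variant uses deterministic farthest-point seeding so that the output is reproducible, and the feature matrix is a hypothetical toy data set with three well-separated groups. In practice a library such as scikit-learn would typically be used.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize every feature column to mean 0 and variance 1."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns (labels, inertia)."""
    # Deterministic farthest-point seeding (a simplification for this sketch).
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append([mean(dim) for dim in zip(*members)] if members else centroids[j])
        if updated == centroids:  # converged
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia


# Hypothetical two-feature data with three obvious groups (not study data).
features = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
            [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
            [9.0, 1.0], [9.1, 1.2], [8.9, 0.9]]
scaled = zscore_columns(features)
inertia_by_k = {k: kmeans(scaled, k)[1] for k in range(1, 5)}
labels_k3, _ = kmeans(scaled, 3)
for k, value in inertia_by_k.items():
    print(k, round(value, 3))
```

The elbow is the value of k after which the within-cluster sum of squares (inertia) stops dropping sharply. For this toy data the drop flattens after k = 3.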

FLAIR biomarker signatures

To examine the important biomarkers for differentiating subgroups found using K-means clustering, a multi-class Random Forest classifier (RFC) was trained using the biomarkers to classify subjects into each cluster. Evaluation was completed using all the FLAIR biomarkers with 5-fold cross validation. Feature importance was determined based on mean decrease in impurity, a common method for determining important features in decision tree-based models30. FLAIR biomarker signatures are then constructed for each subgroup using the 15 most important biomarkers contributing to the classification task. To analyze the global disease burden per subgroup, composite integrity and damage biomarkers are computed by taking the mean of all z-scored regional damage/integrity biomarkers per subgroup.
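The composite biomarker described above, the mean of all z-scored regional damage (or integrity) biomarkers within a subgroup, can be sketched as follows. The data layout, function name, biomarker names, and values are hypothetical, and pooling all subjects and regions before averaging is one plausible reading of the description.

```python
from statistics import mean


def composite_by_subgroup(zscores, labels, marker_names):
    """Average the chosen z-scored biomarkers over all subjects in each subgroup.

    zscores:      one dict per subject, mapping biomarker name -> z-score.
    labels:       subgroup label per subject, in the same order as zscores.
    marker_names: biomarkers pooled into one composite (e.g. all damage markers).
    """
    pooled = {}
    for subject, label in zip(zscores, labels):
        pooled.setdefault(label, []).extend(subject[name] for name in marker_names)
    return {label: mean(values) for label, values in pooled.items()}


# Hypothetical z-scores for four subjects and two regional damage markers.
zscores = [
    {"MCA_damage": -1.0, "PCA_damage": -0.5},
    {"MCA_damage": -0.5, "PCA_damage": 0.0},
    {"MCA_damage": 1.0, "PCA_damage": 0.5},
    {"MCA_damage": 0.5, "PCA_damage": 0.0},
]
labels = [1, 1, 4, 4]
composite_damage = composite_by_subgroup(zscores, labels, ["MCA_damage", "PCA_damage"])
print(composite_damage)  # {1: -0.5, 4: 0.5}
```

Because the inputs are z-scores, a composite of 0 corresponds to the cohort average, positive values indicate above-average burden for that marker type, and negative values indicate below-average burden.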

Subgroup characterization

To characterize the patient subgroups, demographic and clinical variables including age, MoCA score, WML volume, degree of left and right carotid stenosis, and left and right IPH volume were used. Further, CVD risk factors were included as categorical variables, with specific thresholds for conversion to binary categories. These CVD risk factors included: hypertension, diabetes, hyperlipidemia, smoking, high BMI (> 25)3, high waist circumference (> 88.9 cm for females, > 101.6 cm for males)3, coronary artery disease (CAD), peripheral vascular disease (PVD), myocardial infarction (MI), atrial fibrillation, high systolic blood pressure (> 140 mmHg)3, high diastolic blood pressure (> 90 mmHg)3, and the presence of prior infarcts. Each risk factor was converted into a binary categorical variable, where the presence of the condition or risk (e.g., BMI > 25, systolic blood pressure > 140 mmHg) was assigned a value of 1, and the absence was assigned a value of 0. Similarly, the occurrence of ischemic events (including TIAs and strokes) after baseline imaging was also treated as a binary variable (event = 1, no event = 0).

To create a compound CVD risk score, these CVD risk factors were aggregated into a single measure based on the prevalence of the risks in each subgroup. For each subgroup, the population prevalence of each risk factor was subtracted from the subgroup prevalence of that factor, and the differences were summed across risk factors. All factors were given equal weighting. As a result, the compound risk score quantifies the level of CVD risk in each subgroup relative to the entire population, where a risk score of 0 is the population risk. “Low-” and “high-risk” thresholds were defined as the mean minus and plus one standard deviation of the risk scores, respectively. The final characterization of subgroups considers: 1) neurodegenerative brain biomarker signatures, 2) significant clinical variables, 3) compound CVD risk, and 4) future ischemic events.
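The compound risk score can be sketched as below. The sketch assumes prevalence is expressed as a fraction between 0 and 1 (the text does not state the unit), and the function name, data layout, and toy cohort are hypothetical.

```python
def compound_risk_scores(risk_flags, labels):
    """Sum over risk factors of (subgroup prevalence - population prevalence).

    risk_flags: one list of 0/1 indicators per subject (one entry per factor).
    labels:     subgroup label per subject, in the same order.
    Every factor carries equal weight.
    """
    n_factors = len(risk_flags[0])
    n_subjects = len(risk_flags)
    population = [sum(s[f] for s in risk_flags) / n_subjects for f in range(n_factors)]
    scores = {}
    for group in sorted(set(labels)):
        members = [s for s, lab in zip(risk_flags, labels) if lab == group]
        prevalence = [sum(s[f] for s in members) / len(members) for f in range(n_factors)]
        scores[group] = sum(p - q for p, q in zip(prevalence, population))
    return scores


# Hypothetical cohort: 6 subjects, 2 binary risk factors, 2 subgroups.
risk_flags = [[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0]]
labels = ["A", "A", "A", "B", "B", "B"]
scores = compound_risk_scores(risk_flags, labels)
print({group: round(score, 3) for group, score in scores.items()})
# {'A': 0.667, 'B': -0.667}
```

A subgroup whose risk factor prevalences all equal the population prevalences scores exactly 0, which is why 0 represents the population risk.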

Statistical analysis

Pearson’s correlation tests were performed to investigate the associations between clinical variables and FLAIR biomarkers. ANOVA or Kruskal-Wallis and their respective post-hocs with multiple comparison corrections (Tukey’s HSD and Bonferroni, respectively) were performed to compare clinical variables and FLAIR biomarkers between the patient subgroups. Fisher’s Exact Tests were also performed to investigate associations between Subgroup and the prevalence of each vascular risk factor as well as future ischemic events. Fisher’s Exact Test is a statistical test used for the analysis of categorical variables and small sample sizes, and calculates the exact probability of obtaining the observed samples in the data. Significance in the Fisher’s test indicates there are differences in risk factor prevalence across the patient subgroups but does not identify specific pairwise differences between subgroups. Statistical significance was defined as p < 0.05 for all tests.
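To illustrate what "exact probability" means for Fisher's test, the sketch below computes the two-sided p-value for a single 2 × 2 table from the hypergeometric distribution. This is a deliberately simplified case: the subgroup-by-risk-factor tables in this study have more than two rows and would require the generalized (r × c) form of the test, which is not implemented here. The function name and example counts are hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))


# Hypothetical counts: rows = two groups, columns = risk factor present/absent.
p_value = fisher_exact_2x2(3, 1, 1, 3)
print(round(p_value, 4))  # 34/70 = 0.4857
```

Because the p-value is obtained by enumerating every table with the same margins, no large-sample approximation is needed, which is why the test suits small cell counts.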

Results

In total, 36 regional texture and intensity biomarkers were extracted from each of the 379 patients. The biomarkers were used to cluster the patients into homogeneous subgroups using unsupervised K-means clustering, and statistical tests were employed to investigate the clinical characteristics of each subgroup.

Brain subtyping and FLAIR biomarker signatures

Using the regional FLAIR biomarkers extracted for the dataset, the optimal number of clusters was determined to be K = 5 using the elbow method. The resulting cluster sizes ranged from 27 to 108 patients. The FLAIR biomarkers were then used to train a multi-class RFC to determine the 15 features to include in the biomarker profiles. The resulting classifier had mean classification accuracy, recall, precision, and F1 score of 0.842, 0.853, 0.842, and 0.84, respectively, indicating that the biomarkers robustly separate the subgroups. The 15 most important biomarkers identified from the classification and the results of ANOVA and Kruskal-Wallis tests are shown in Table 2, with significant differences (p < 0.001) across the five subgroups for every FLAIR biomarker. Post-hoc tests showed that the majority of the biomarkers were significantly different across all subgroups (Supplementary Fig. 2).

Table 2 Mean (SD) values of FLAIR biomarkers by subgroup, along with p-value results of ANOVA/Kruskal-Wallis tests for significant differences between subgroups

Subgroup biomarker profiles are shown in Fig. 2A and composite integrity and damage biomarkers are shown in Fig. 2B. Composite biomarkers are computed by taking the mean of all z-scored regional damage/integrity biomarkers by subgroup. Subgroup 4 exhibits the most neurodegeneration, represented by significantly higher intensity and damage biomarkers (p < 0.001) and significantly lower integrity biomarkers (p < 0.001) than all other subgroups, with the highest composite damage and lowest composite integrity. Subgroup 1 has the lowest composite damage with a slight positive composite integrity which indicates this group has better brain health. This is also confirmed on a regional level, where Subgroup 1 had significantly lower WML penumbra intensity (p < 0.001), lower BST damage (p < 0.001) and higher BST integrity (p < 0.001) than all subgroups except Subgroup 5. Subgroup 5 exhibits low composite damage and highest composite integrity. MCA and PCA damage are not significantly different from Subgroup 1 while the integrity biomarkers are significantly higher (p < 0.001) than all other groups. As shown by the composite damage marker, which is the mean of all z-scored regional damage biomarkers, Subgroup 2 exhibits slight positive composite damage which reflects the increased regional damage in the MCA and PCA regions only. Subgroup 3 has different trends, with second highest composite damage due to the significantly higher damage biomarkers across all regions in the regional biomarker profile. Subgroup 2 has the second lowest composite integrity due to the decreased integrity biomarkers across all regions, whereas Subgroup 3 only has a slight negative composite integrity due to decreased integrity biomarkers in the MCA region.

Fig. 2: Regional and composite FLAIR biomarker profiles for clustered subgroups.

A Z-scored FLAIR biomarker profiles for each subgroup of top 15 important biomarkers. Integrity biomarkers are shown in blue and Damage biomarkers are shown in red. B Composite Z-scored damage (right) and integrity (left) biomarkers by subgroup.

Clinical profiles

Summary statistics of clinical variables are shown in Table 3, and the z-scored subgroup profiles are visualized in Fig. 3A. Statistically significant differences (p < 0.05) were found for age, WML volume, and left IPH volume, while MoCA score did not reach statistical significance (p = 0.051), as shown in Table 3.

Table 3 Summary statistics of patient subgroups with p-values of ANOVA/Kruskal-Wallis tests. Bolded values indicate significant differences.

Fig. 3: Characterization of subgroups using z-scored demographic/clinical variables and CVD risk scores.

A Z-scored profiles of each subgroup. * indicates variables with significant differences (p < 0.05) between groups. B Compound CVD risk score shown by subgroup. High/low score thresholds are shown as dashed lines.

Post-hoc tests (Supplementary Fig. 3) revealed significant differences in age (p < 0.05) between most subgroups, except for Subgroups 2 and 3, and Subgroups 1 and 5, which were not significantly different. For WML volume, significant differences were observed between all subgroups except for Subgroups 2 and 3, and importantly, Subgroups 3 and 4, which exhibited similar WML volumes despite their other clinical differences. Regarding left IPH volume, Subgroup 4 was significantly lower than all other subgroups (p < 0.05).

Although MoCA score approached significance (p = 0.051), the difference between Subgroups 3 and 4 was not statistically significant. This is an important distinction: while Subgroups 3 and 4 have similar WML volumes and MoCA scores, they differ markedly in age and left IPH volume, both of which were significantly different between the two subgroups (p < 0.05).

Pearson’s correlations between clinical variables and the 15 important FLAIR biomarkers are shown in Fig. 4. Age and WML volume are significantly correlated with most biomarkers and regions, while MoCA shows a significant positive correlation with the wavelet biomarkers in the BST and WML penumbra regions. Right stenosis and right IPH volume are significantly correlated only with WM tract intensity, while left stenosis is significantly associated with both WM tract intensity and MAD in the PCA and MCA.

Fig. 4: Heatmap of Pearson correlation coefficients (R) between clinical variables and FLAIR biomarkers.

* indicates significant correlations (p < 0.05).

CVD risk profiles

Results of the Fisher’s Exact Test for each risk factor are shown in Supplementary Table 2, with hypertension (p < 0.01), diabetes (p < 0.05), hyperlipidemia (p < 0.05), smoking (p < 0.05), high BMI (p < 0.01), high sBP (p < 0.05), and baseline infarcts (p < 0.001) showing significant associations with Subgroup. Fisher’s Exact Test also demonstrated a significant association between Subgroup and the occurrence of future ischemic events (p < 0.05). The percentage of risk factors and events by subgroup is shown in Supplementary Fig. 4. Subgroup 3 had the largest prevalence of hypertension, hyperlipidemia, smoking, high BMI, high sBP, and high dBP. In comparison, Subgroup 4 had the lowest prevalence of CVD risk factors and future events.

The compound CVD risk scores computed using the vascular risk factors and events are shown in Fig. 3B. Subgroup 4 exhibits the lowest risk score of -1.05, followed by Subgroups 2, 1, and 5 with risk scores of 0.29, 0.79, and 1.14, respectively, and finally Subgroup 3 with the highest risk score of 2.28. Using the thresholds, Subgroup 4 falls below the low-risk threshold and is considered low CVD risk, Subgroups 1, 2, and 5 are moderate CVD risk, and Subgroup 3 is considered high CVD risk. Subgroup 3 also had the highest number of future strokes, followed by Subgroup 1 and Subgroup 5. The lowest numbers of future ischemic events were in Subgroups 2 and 4.
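The risk categories can be checked with a short calculation on the five reported scores. This sketch assumes that the thresholds are the mean plus and minus one standard deviation of these five subgroup scores and that the sample standard deviation is used. The text does not specify either detail, so the numerical cut-offs below are illustrative rather than the values plotted in Fig. 3B.

```python
from statistics import mean, stdev

# Compound CVD risk scores per subgroup, as quoted in the text.
reported = {4: -1.05, 2: 0.29, 1: 0.79, 5: 1.14, 3: 2.28}

values = list(reported.values())
centre = mean(values)
spread = stdev(values)  # sample standard deviation (an assumption)
low_cut, high_cut = centre - spread, centre + spread


def risk_level(score):
    """Classify a score against the mean -/+ one SD thresholds."""
    if score < low_cut:
        return "low"
    if score > high_cut:
        return "high"
    return "moderate"


levels = {group: risk_level(score) for group, score in reported.items()}
print(round(centre, 2), round(low_cut, 2), round(high_cut, 2))  # 0.69 -0.53 1.91
print(levels)
# {4: 'low', 2: 'moderate', 1: 'moderate', 5: 'moderate', 3: 'high'}
```

Under these assumptions the grouping matches the one described in the text, and using the population standard deviation instead moves the cut-offs slightly without changing any subgroup's category.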

Subgroup characterization

The subgroup characteristics are defined in Table 4 using the composite integrity and damage markers, MoCA, clinical variables that were significant in the statistical analyses, compound CVD risk scores, and future event prevalence. Subgroup differences in values are indicated by ↑ if the value was significantly higher than in other subgroups and the population mean, ↓ if the value was significantly lower, and -- if the value was not significantly different from the other subgroups and was similar to the population mean. While there were no statistically significant differences in MoCA scores across subgroups, we observed numerically large variations in these scores, which we present for context and to acknowledge potential trends.

Table 4 Subgroup characterization.

Subgroup 3 (second highest damage and lower integrity compared to the mean) shows the highest CVD risk score, with a higher number of future strokes, as well as significantly higher patient age and WML volume compared to the mean. While this group has a high CVD risk score and significant damage accumulation in the brain, the future event prevalence is similar to that of Subgroups 1 and 5, based on the risk percentage.

Subgroup 4 (highest damage and lowest integrity) is the oldest patient group, with the highest WML volume and significantly lower left IPH volume. Although this group has a numerically lower MoCA score compared to other subgroups, the difference is not statistically significant. Interestingly, Subgroup 4 has the lowest compound CVD risk score among all subgroups, despite its higher WML burden and lower biomarkers of integrity. This group also shows a relatively low number of future strokes, though its association with neurodegeneration remains unclear based on the available data.

Subgroups 1, 2, and 5 represent moderate CVD risk groups. Subgroup 2 (mean damage and second lowest integrity biomarkers) has significantly higher age and WML volume than Subgroups 1 and 5, yet its risk score is lower than both. Subgroup 2 also exhibits the lowest number of future strokes. Subgroup 1 (lowest damage and mean integrity biomarkers) is the youngest group with the lowest WML volume and a moderate CVD risk score, slightly lower than Subgroup 5. Despite having a large number of future ischemic events, its CVD risk is comparable to other subgroups. Subgroup 5 (second lowest damage and highest integrity biomarkers) has the second lowest WML volume (significantly higher than Subgroup 1) and is among the youngest groups. It has the highest CVD risk score of the moderate-risk groups and shows a large number of future strokes.

Discussion

This work uses FLAIR texture and intensity biomarkers to detect WM changes and offers novel stratification of subjects with different levels of CVD risk. Biomarker and clinical differences suggest different underlying processes across subgroups. Automated stratification offers a novel, non-invasive approach for early detection, which can be used to tailor prevention and treatment strategies to avert future ischemic events. Existing studies utilize clinical factors and clustering methods to stratify stroke patients into subgroups with and without subsequent events, for the prediction of vascular outcomes31,32,33. Sperber et al. 34 used clustering methods to stratify stroke patients with cerebral small vessel disease based on lesion type. However, these studies did not include non-stroke patients or imaging biomarkers. Other regression studies often focus on associations between imaging biomarkers, vascular risk factors, and events, to observe relationships between CVD risk and changes in tissue microstructure or event recurrence3,35. While promising, these frameworks do not provide comprehensive methodologies for risk stratification. As such, the novelty of our work is three-fold. First, we demonstrate that explainable FLAIR biomarkers can distinguish patients with varying CVD risk and clinical profiles within an atherosclerotic cohort. These findings are valuable due to the incorporation of imaging biomarkers, which could be used to monitor patients over time or for early disease detection. Second, the regional FLAIR biomarkers are correlated with various clinical variables, demonstrating that they quantify structural changes in the brain related to disease factors. Lastly, this work considers prior, existing, and future CVD-related factors for each patient in the cohort. This provides a more comprehensive profile of CVD burden and risk, addressing the gaps left by previous methodologies.

Among the subgroups, there was one group with low CVD risk (Subgroup 4), one group with high CVD risk (Subgroup 3) and three subgroups with moderate CVD risk (Subgroups 1, 2, 5). A key finding is Subgroup 4, with advanced neurodegeneration, numerically lower MoCA scores, the highest age and the largest WML loads, accompanied by minimal CVD risk factors. This may suggest neurodegeneration is being driven by processes related to accelerated aging or AD, rather than primarily CVD. While WML burden is often associated with CVD, several studies9,36 postulated that both vascular and AD processes contribute to WML development. This could perhaps explain the high WML load in Subgroup 4, given its lower CVD risk, although its WML volume is not significantly different from that of Subgroup 3. Uncovering this homogeneous group within a vascular disease cohort underscores the utility of the framework in potentially differentiating underlying disease mechanisms, which could help to choose optimal candidates for therapy or to learn more about disease37. Analyzing the relationship to AD further would require other variables such as blood biomarkers, PET imaging and spatial WML patterns38.

Subgroup 3 exhibits accumulated brain damage and is likely at an advanced stage of vascular disease and CVD5. Contributing factors to the severity of Subgroup 3 may include a markedly high prevalence of MI, though this did not reach statistical significance due to sample size. Witt et al. 39 observed a three-fold increase in stroke risk during the first 3 years after incident MI, suggesting that the FLAIR biomarker profiles, particularly high WML penumbra and tract intensity, may be capturing structural differences in patients with MI. These patients could be automatically identified for aggressive management of lifestyle and cardiovascular risk factors to avoid future negative outcomes such as stroke and death.

The proposed framework offers a nuanced understanding of brain health patterns in patients with moderate CVD risk, even though stratifying groups with less severe disease levels is more challenging10. The moderate CVD group with the lowest risk (Subgroup 2), characterized by low integrity and minimal damage, presents clinical features that do not clearly support accelerated aging. Rather, this subgroup’s low CVD burden and previous infarcts suggest that its patients may benefit from monitoring for signs of further CVD risk. Subgroup 2 also had the largest proportion of prior infarcts, and its patients may have received treatment after the stroke, which could have helped reduce the CVD burden in the brain. Zhang et al. 35 found that drug adherence after stroke was a significant factor in predicting stroke recurrence within a 3-year period. However, this would need to be confirmed in future work using a dataset with treatment information.

Subgroups 1 and 5 have less neurodegeneration and share similar biomarker characteristics and risk factors (i.e., large number of future events, lower age, low WML volume). However, they differ in their levels of damage and integrity, and Subgroup 5 had the highest MoCA scores in the cohort, suggesting cognitive resilience despite moderate CVD risk. Though these differences in MoCA scores were not statistically significant, we note them here as potential areas for further study. Subjects from these groups had a high number of future stroke events, making it important to identify them and apply any appropriate therapy early. Subjects in these subgroups are relatively young and could gain short- and long-term health benefits from lifestyle changes (i.e., physical activity, nutrition, weight management, avoidance of tobacco and management of cardiovascular health–related factors such as cholesterol, blood pressure, and glucose)40. The automated biomarker system can be used to identify these subjects for lifestyle intervention, to preserve brain health by minimizing modifiable risk factors41.

Interestingly, the moderate CVD risk Subgroups 1 and 5 display a notable prevalence of patients with AF and PVD, respectively, although these differences did not reach statistical significance due to the small number of patients with these risk factors. AF is associated with the presence of infarcts10, corresponding to the larger proportion of baseline infarcts in Subgroup 1, while PVD is associated with increased WML volume42, which is consistent with the higher WML volume in Subgroup 5. These differences, noted without statistical significance, suggest AF and PVD as potential contributors to the biomarker and clinical differences between these subgroups that warrant further investigation. Most notably, Subgroup 5 is the only group with more women (51.5%) than men, while the proportion of women in all other subgroups was between 28% and 35%. Bonberg et al. 4 found a larger impact of hypertension and smoking on WM damage in women, aligning with the high prevalence of these risk factors observed in Subgroup 5. The brain biomarkers for the two subgroups indicate minimal accumulated brain damage with moderate CVD risk.

Our study has a few limitations. While the study cohort inclusion criterion was ≥30% stenosis, the mean stenosis of the entire population was ~33%, indicating that the majority of the cohort had mild stenosis. Future work should include a larger sample of subjects with severe carotid artery disease, which may allow the effects of carotid atherosclerosis to be better distinguished between the subgroups. To further investigate sex differences in the clusters, the study population should also include more female subjects in the future. However, as CVD is more common in men (though the difference decreases with age), the cohort used in this work reflects a natural sampling. Further, the analysis did not consider the effects of treatments or medications, which should be considered as factors within each subgroup in any future analyses. Lastly, the analysis was cross-sectional, making it difficult to draw conclusions about the dynamic processes of CVD and AD-related pathology. Longitudinal experiments, as well as validation and refinement of subtypes using integrated diffusion or ASL biomarkers, could be performed in future work to further explore pathological mechanisms. Single-subject studies analyzing new subjects with respect to the subgroups should also be investigated in the future to optimize translation.

This study proposes a framework that uses regional FLAIR texture and intensity biomarkers to stratify patients with atherosclerosis into homogeneous disease subgroups. By leveraging FLAIR MRI, clinicians may be able to differentiate between various pathological mechanisms, disease stages, and risk factors associated with CVD and its interactions with neurodegeneration. The identified subgroups, namely Subgroup 4 characterized by non-vascular pathology, Subgroups 1, 2 and 5 representing moderate-risk CVD cohorts, and Subgroup 3 with high CVD risk, provide valuable insights into disease manifestations. These subgroups not only shed light on the differential effects of CVD on brain health but also highlight the potential for personalized treatment decisions and risk stratification.