Abstract
Aggressive variant and androgen receptor (AR)-independent castration resistant prostate cancers (CRPC) represent the most significant diagnostic and therapeutic challenges in prostate cancer. This study examined a case of simultaneous progression of both adenocarcinoma and squamous tumors from the same common origin. Using whole-genome and transcriptome sequencing from 17 samples collected over >6 years, we established the clonal relationship of all samples, defined shared complex structural variants, and demonstrated both divergent and convergent evolution at AR. Squamous CRPC-associated circulating tumor DNA was identified at clinical progression prior to biopsy detection of any squamous differentiation. Dynamic changes in the detection rate of histology-specific clones in circulation reflected histology-specific sensitivity to treatment. This dataset serves as an illustration of non-neuroendocrine transdifferentiation and highlights the importance of serial sampling at progression in CRPC for the detection of emergent non-adenocarcinoma histologies with implications for the treatment of lineage plasticity and transdifferentiation in metastatic CRPC.
Similar content being viewed by others
Introduction
Metastatic prostate cancer is known to be a highly heterogeneous disease, both in clinical behavior and molecular characteristics. Tumors vary significantly from one individual to another including relatively indolent to rapidly and widely progressive. Tumors can also vary and evolve within an individual1,2. The most fundamental example of tumor evolution in prostate cancer (PC) is the transition from hormone-sensitive (castration-sensitive) to hormone-refractory (castration-resistant) prostate cancer, and it is following the attainment of androgen indifference that the most fatal form of PC develops3. Along the spectrum of CRPC phenotypes is the development of neuroendocrine prostate cancer (NEPC), which shares histologic features of high grade, small cell carcinoma and is considered to be one of the most aggressive subtypes of PC. These tumors typically arise as following exposure to several lines of therapy, foundationally with androgen deprivation, as a mechanism of resistance, and are considered treatment-emergent NEPC (also referred to as t-NEPC or CRPC-NE in prior studies). These tumors are clinically characterized by resistance to standard therapies, rapid growth, relatively low serum PSA, and proclivity for spread to the viscera and central nervous system.
More recently, there have been several demonstrations that the transition from PSA-producing CRPC with adenocarcinoma histology (CRPC-Ad) to CRPC-NE may occur along a continuous spectrum with some tumors retaining adenocarcinoma features, simultaneously expressing the AR and NE markers (amphicrine) and those lacking both markers (double-negative)4. We and others have observed additional evolutionary destinations for prostate adenocarcinomas, similar to development of CRPC-NE, yet with distinct histologies. Here we present a unique comprehensive molecular history of an aggressive CRPC with divergent development of PSA-low squamous and PSA-positive adenocarcinoma histologies.
Results
Samples were collected as standard of care tumor resections or biopsies and at the time of death as part of our rapid autopsy program5. The samples span greater than 6 years with 12 tumor (2 primary, 10 metastatic, all castration resistant) and 5 cell-free DNA (cfDNA) samples. Three tumor and 4 cfDNA samples were obtained during the patient’s lifetime; 9 tumor and 1 cfDNA samples were collected at autopsy (Fig. 1a, Table 1).
a Plot of patient’s PSA (ng/mL) from time of initial systemic therapy until death. The small white circles indicate PSA samples, the large blue circles indicate tumor biopsies, and the red circles indicate cfDNA samples. The bottom track shows timing, duration, and type of therapies. b Histopathology of WCM63, primary and metastatic sites. A. Transurethral resection of prostate (TURP) in 2013 shows adenocarcinoma in the primary site. B. Biopsy of bone in 2016 shows metastatic carcinoma with squamous differentiation. C. Liver biopsy in 2017 shows metastatic prostatic adenocarcinoma. (Hematoxylin and eosin stain, 200x original magnification. Freeze artifact present in C). c Anatomical sites of disease colored by histology.
Clinical course summary
At age 55, the patient was diagnosed with de novo metastatic hormone-naive prostate cancer (Grade Group 4 - Gleason Score 4 + 4 = 8, sample not available for sequencing) with radiographic evidence of lymph node and bone metastases and pretreatment PSA of 86.15 ng/mL. Initial therapy was androgen deprivation therapy (ADT) with LHRH antagonist (degarelix). ADT was continued for the duration of the patient’s care, although later replaced with an LHRH agonist (leuprolide). Following a short-interval (7.5 months) after primary ADT, CRPC was established with a rising PSA and a first generation antiandrogen (bicalutamide) was added. After 10 months, PSA rose again prompting administration of autologous cellular therapy (sipuleucel-T) yet the patient had continued PSA rise, radiographic progression at the level of the bladder, and clinical progression manifested as worsening obstructive urinary symptoms. This prompted a diagnostic and therapeutic transurethral resection of the prostate (TURP), representing WCM63_A (Fig. 1b (top)). He then began the first of ten cycles of taxane chemotherapy (docetaxel) with robust clinical and PSA response. It was during this time he also received two doses of targeted radiotherapy therapy against prostate specific membrane antigen (PSMA-TRT) with a monoclonal antibody (J591) conjugated to beta emitter 177Lu on a phase I clinical trial at WCM (NCT03545166).
His next line of therapy was enzalutamide, a potent AR antagonist, and peripheral blood was collected for cfDNA (WCM63_1). He experienced a PSA response with decline of >90%, reflected in an estimated tumor fraction of 0.002, yet with rising PSA he was treated after 13 months with the addition of unconjugated anti-PSMA monoclonal antibody J591 on study (NCT025523947). With clinical progression (bone pain) and rising PSA, the bone-seeking alpha-emitting radiotherapy Radium-223 was administered with enzalutamide. Upon development of worsening back pain and immediate associated PSA rise, imaging revealed enlarging retroperitoneal nodes and a mid-thoracic spinal mass with soft tissue component. At this transition, a second plasma sample was collected for cfDNA analysis (WCM63_2), yielding a tumor fraction of 0.375, a 187-fold increase from the first plasma sample. Given extraosseous progression, systemic therapy was switched to the CYP17 inhibitor abiraterone acetate and prednisone (AAP). Biopsy of the soft tissue and bone spinal mass (WCM63_B) unexpectedly was found to be a carcinoma with squamous features (Fig. 1b (middle)). Suspicion was highest for a second primary of a squamous tissue origin, now metastatic to spine, although the possibility of this tumor being derived from a prostate primary was considered. A third cfDNA sample (WCM63_3) was collected during this interval prior to progression with a corresponding decrease in tumor fraction to 0.193. Following a brief PSA decline of nearly 50%, his PSA rose and radiographic progression was noted within 6 months of AAP initiation, including in viscera (liver and lungs), as well as prior sites of disease. A liver biopsy (WCM63_C) showed adenocarcinoma without variant histologic features and a fourth cfDNA sample was collected (WCM63_4)(Fig. 1b (bottom)), showing an increase in tumor fraction to 0.347. Abiraterone acetate was held and, given visceral disease and prior therapies, he was initiated on combination chemotherapy with cabazitaxel and carboplatin. PSA continued to rise with radiographic progression in liver, lungs, lymph nodes, and bones, prompting empiric initiation of chemotherapy with a small cell lung cancer-like regimen of cisplatin and etoposide, followed by progression in pulmonary metastases. The patient received carboplatin, paclitaxel, and trastuzumab, the last added given evidence of HER2 amplification on WCM63_C via FISH (ratio > 2.0). After three cycles, the patient developed hemoptysis requiring hospitalization and bronchoscopy. Following discharge a single cycle of vinorelbine was attempted. cfDNA was collected (WCM63_5) with the tumor fraction peaking at 0.704, and the patient expiring 3 days later. A rapid autopsy was completed, as part of the WCM Rapid Autopsy program, which corresponds to samples WCM63_D-L (prostate, lymph node, soft tissue, lung [3], and liver [3]). A diagram of all anatomical sites sampled as part of this study is shown in Fig. 1c.
Tumor driver genomic landscape
We performed tumor and germline whole-genome sequencing (median tumor coverage of 91.36, Supplementary Data 1). The patient’s germline genome revealed a likely benign mutation in RAD50 (p.H1041R) and a pathogenic frameshift mutation in RAD51D (G110fs), but no other suspected pathogenic alterations conferring heritable risk. Assessment of genomic ancestry revealed significant admixture, predominantly associated with approximately equal proportions of African and East Asian continental ancestries (Supplementary Fig. 1).
We characterized the entire set of tumor samples for common prostate cancer genomic drivers and observed only a single deletion of RB1 in WCM63_G among the genes commonly considered aggressive variant prostate cancer (AVPC; i.e., PTEN, RB1, and TP53)8, even in samples collected at the end of life from visceral organs (WCM63_E, F, I, and L)8(Fig. 2a, top). A homozygous deletion in APC was detected in all samples. AR showed variable copy number gains (range 2–81, median 3) in all samples with adenocarcinoma histology with the exception of the earliest sample generated from a transurethral resection of the prostate (TURP, WCM63_A); all squamous cell samples were without a copy number change at AR (CN = 1)(Fig. 2a). A mutation was present in a well-described hotspot within the pro-proliferative Wing2 ___domain of transcription factor FOXA1 (p.D249_M253del; Average VAF: 0.3611, Supplementary fig. 2a) in all samples except WCM63_L (due to a 4.5 megabase deletion in that region resulting in loss of heterozygosity, Supplementary Fig. 2b). Depending on its mutational status, FOXA1 defines a subset of prostate cancers, promotes AR resistance, and drives phenotypic differentiation9. All samples lacked the fusion TMPRSS2::ERG. Although considered generally mutually exclusive from prostate tumors containing a subset of mutations in FOXA1, the absence of a TMPRSS2 fusion as a common prostate cancer pathognomonic rearrangement, may have contributed to uncertainty at the time of sampling as to whether the squamous and adenocarcinoma samples shared a common origin in the prostate.
a Oncoprint summarizing the presence of common prostate driver alterations. Row groups show the most common (i) prostate cancer driver alterations, (ii) squamous drivers and (iii) differentially altered genes between squamous and adenocarcinoma histologies among consensus pan-cancer genes. Histology, vital status, and whether the patient had received chemotherapy at time of biopsy are indicated below. Gains and deletions are restricted to focal events no larger than 3 megabases in size. The asterisk indicates a 4.5 megabase deletion that resulted in loss of a truncal FOXA1 mutation in WCM63_L. b Summary of mutational signatures. The top row corresponds to the number of somatic SNV/indels, represented in log10 scale. The middle row represents the proportion of the most representative COSMIC SBS signatures. Histology, vital status, and whether the patient had received chemotherapy at time of biopsy are indicated below. c Phylogenetic clone tree constructed using single nucleotide variants (SNVs) and insertion/deletion variants (INDELs) supports a common cancer cell progenitor of the prostate. Each node represents a clone, and the number of variants unique to a given clone are overlaid. The circle graphs surrounding each node indicate the COSMIC SBS mutational signature proportions for the variants in that clone. Clone IDs are shown just outside each node, as well as any known oncogenes affected by variants in each clone. The text color of each gene name indicates whether it is affected by a moderate or high-impact mutation, as inferred by the Ensembl Variant Effect Predictor. d Stacked bar plots indicating clone-level cancer cell fractions (CCF) for each sample. Samples are grouped by histology, and the top track shows the Shannon entropy of the clone CCFs for each sample. e Schematic showing inferred tumor evolution over time for each sample, given the phylogenetic tree describing the evolutionary relationship between clones, and the clone CCFs at the time of sequencing. CRPC-Ad samples are shown in the top row, and CRPC-SCC are shown in the bottom row. A box is drawn around the two prostate samples.
Having evaluated the gene alterations commonly driving adenocarcinoma CRPC (CRPC-Ad), we next sought differences between CRPC-Ad and squamous samples (CRPC-SCC). First, using a panel of common drivers observed in squamous cell carcinomas, we observed no clear difference between CRPC-SCC and CRPC-Ad among these genes (Fig. 2a, middle).
Secondly, we evaluated a broad panel of consensus pan-cancer driver genes10 and tested for differentially altered genes between CRPC-SCC and CRPC-Ad. Among these tissue-agnostic drivers, we observed loss (or deletion) in CTNNA2, the gene encoding an actin binding protein, in all CRPC-Ad samples but only half of the CRPC-SCC samples (Fig. 2a, bottom).
Categorization of somatic mutational patterns and association with clinical exposures
Having established the profile of driver mutations in WCM63 without identification of a candidate driver of squamous differentiation, we next sought to describe the somatic mutational patterns across time and between histologies. Overall tumor mutational burden (TMB) was similar across samples (median count SNV/indel 2.94 mut/MB, IQR 2.02 mut/MB), with the exception of WCM63_L (22.97 mut/MB)(Fig. 2b, top track). When examining single base substitution (SBS) signatures, all tumors enriched in SBS31 and SBS35 were from samples collected after the patient received platinum-based chemotherapy, consistent with the established etiology for SBS31 and SBS35 (Fig. 2b). Pairwise comparison of each SBS signature between the two histologies did not reveal significant differences, but highlighted the presence of three signatures associated with deficiencies in DNA mismatch repair (SBS20, SBS21 and SBS26, as well as insertion deletion signature ID2 [data not shown]) in the hypermutated sample WCM63_L. Evaluation of genes involved in DNA mismatch repair revealed that this sample harbored a deleterious frameshift mutation disrupting the ERCC1 binding ___domain of ERCC4 (p.Met885AsnfsTer17, VAF = 0.2 with local copy number 4, purity 89%)11,12, an endonuclease critical for DNA repair13, and a missense mutation in the exonuclease ___domain of POLD1 (p.Cys284Tyr, VAF = 0.09 with local copy number 2), consistent with the etiology of SBS20. Both mutations are part of a hypermutator clone private to WCM63_L, with a CCF of 0.21, with 3 of 4 copies mutated in ERCC4, and 1 of 2 copies mutated in POLD1.
Evaluation of single nucleotide variants (SNVs) describes a complex phylogeny supportive of transdifferentiation and genesis of CRPC-SCC
With the aforementioned absence of a TMPRSS2::ERG fusion, we sought evidence of a common prostate origin of all samples in the study. To do so, we deconvolved the observed SNV/INDELs into 22 distinct clones, and determined their evolutionary relationship via a phylogenetic tree (Fig. 2c, Supplementary Data 2). The phylogeny supports a shared progenitor tumor cell for both squamous and adenocarcinoma histology and provides evidence for multiple squamous lineages.
Four clones, named C0 through C3, were determined to be present across all samples, and gave rise to all shared and private clones observed in the phylogenetic tree. C0, the most truncal clone in the tree, contained mutations in known driver genes ATM and FAT4. We observed only one additional clone per sample in the primary (sample from the prostate) tumors, which harbored less clonal diversity than metastatic samples as measured by the normalized Shannon entropy of the tree-constrained clone proportions (p = 0.008, Welch Two Sample t-test)14. Two clones (C5 and C12) were common to all metastatic samples, irrespective of histology.
Within the samples with squamous histology, we identified two distinct lineages. The first, corresponding to samples WCM63_B, WCM63_H, WCM63_I, and WCM63_L, were characterized by clone C15 (hereafter referred to as squamous lineage 1). This clone contained a mutation in tier 2 CGC (Cancer Gene Census) gene CNTNAP2. The second lineage (WCM63_J, WCM63_K) lacked clone C15, and contained clones descended from clone C8, a clone found only in the metastatic adenocarcinoma samples. Clone C13 was found at a high prevalence in both samples, as well as all metastatic adenocarcinoma samples. Notably, clone C13 was also present at a small proportion in WCM63_L.
Taken together, the presence and absence of clones within samples and their evolutionary relationship constrains the possible patterns of migration. The initial metastatic seeding event was likely polyclonal, given that four clones are shared across all samples, and they or their descendants are detected among the final clones of each metastatic sample (Fig. 2d, e). The absence of any additional shared clones in the primary samples implies some form of metastasis-to-metastasis seeding, wherein some metastatic tumor containing a mixture of clones C0, C1, C2, C3, C5, and C12 gave rise to all additional adenocarcinoma and squamous metastases. The similarity of the second squamous lineage (WCM63_J, WCM63_K) to the metastatic adenocarcinoma samples suggests a later divergence than the first squamous lineage and supports two distinct squamous evolutions from a transformation-susceptible progenitor.
Complex structural variants also support clonality of samples
Having established the somatic mutational landscape of WCM63, we comprehensively assessed the structural variants (SV) of each sample. Initial chromosome-arm level copy number profiling revealed commonality across all samples and histologies, including large deletions of 11q and chromothriptic rearrangements in 5q (Fig. 3a). Overall junction burden appeared lower in CRPC-Ad versus CRPC-SCC, yet this was reflective of lower overall junction burden in samples derived from the primary organ (prostate), of which neither were SCC, as compared to metastatic samples (Fig. 3b). Given similarities among samples in junction number and overall copy number changes, we sought support for clonality using SVs. Here, we define a junction as a somatic non-reference adjacency, also referred to as a breakpoint in other studies15,16, and an SV as a somatic event integrating copy number information and one or more junctions. Among all junctions detected (n = 970), 30% (n = 291) were restricted to CRPC-SCC samples and were entirely absent from every CRPC-Ad sample. Conversely, 40.1% (n = 389) of junctions were private to CRPC-Ad samples. When assessed strictly (i.e., mandating all junctions be present in each candidate shared SV), no single SV was observed to be present in all of one histology and completely absent from the other. When relaxing stringency (a threshold of 75% of all junctions in an event to be present in a candidate shared SV), still no single private event was fully conserved in each respective histology (Fig. 3c). Using these criteria, 19.2% (30/156) of SVs were shared across all samples, including a chromothripsis in chromosome 5 (Fig. 3d). Detailed analysis of read support for each junction in each sample confirms the shared nature of this complex SV. Variation in non-identical portions reflected gained complexity over time, perhaps indicative of instability driving further rearrangement. Types and burdens of both simple and complex SVs were similar across histologies (Fig. 3e).
a The landscape of simple and complex structural variants, summarized in 1 megabase windows across the genome. The top three tracks represent clonal (read support in all samples), adenocarcinoma-private (no read support in all squamous samples), and squamous-private (no read support in all adenocarcinoma samples) structural variant junctions. The bottom two tracks display the sample-level genomic footprint of structural variants as classified by JaBbA. Rows were clustered within histology via complete linkage clustering, using the Hamming distance metric. b Violin plots of junction burden summarized by histological subtype (L) and site of samples (R). c UpSet plots of sample presence and absence among histology-private junctions. The left-most bar plot represents the total histology-private junction burden of each sample. The top bar plot represents the magnitude of each set overlap, as indicated by the connected dots. d Example of a chromothripsis event shared across all adenocarcinoma and squamous tumors. The left panel is a genome-level view of the event for three representative samples. The gray bars represent genomic intervals, and the height of the bars indicate their copy number. The multicolored arcs indicate the somatic junctions comprising the event. These colors align to the heatmap on the right, which indicates read support for each junction/sample combination. Cells are colored by read support, and are marked with an “x” if that particular junction was not integrated into that sample’s JaBbA model. e Heatmap of complex and simple SV junction burdens. Columns represent each sample, and rows represent SV classifications. Simple and complex SVs were clustered separately via complete linkage clustering using Euclidean distance. Samples were clustered in the same manner, but cut into two groups as determined by the dendrogram.
A complex rearrangement in chromosome 4 was shared in all samples yet classified variably by the computational tool (JaBbA)17. This rearrangement was classified as a double minute (DM) in 5 of 6 samples (4 CRPC-Ad, 1 CRPC-SCC), a breakage fusion bridge (BFB) in another (CRPC-Ad, WCM63_C), and finally a chromothripsis in a CRPC-SCC sample (WCM63_K). No SV was called at this locus in a total of 5 samples. To reconcile the apparent shared presence of a DM in a subset of samples between disparate histologies and across separate branches of the phylogenetic tree, sample-level junctions were assessed. Inspection of the genomic “walks” of all samples revealed visually similar topology (Supplementary fig. 3a). Analysis of read support for each junction demonstrated that a core set of junctions (Group I) supported a shared early event that subsequently diverged over time (Supplementary fig. 3b). To wit, histology-private junctions shared across more than one sample were observed (Groups II and III) supporting divergence within the rearrangement that explains the inconsistent categorization of the event. Consistent with this, the earliest sample collected (WCM63_A, prostate) harbored the fewest sample-specific junctions and is likely representative of the ancestral rearrangement. All samples showed amplification of a series of consensus cancer genes (e.g., PDGFRA, KDR, KIT)(Supplementary fig. 3c).
Intrapatient variation in transcriptional program
Having comprehensively defined the DNA landscape of the biopsy-collected and rapid autopsy specimens without identifying a squamous-specific driver, we sought to use analysis of bulk RNA sequencing to implicate a transcriptional program driving histological transformation. After correcting for differences in sample preparation, unsupervised principal component analysis revealed tight grouping within histology for all CRPC-Ad and CRPC-SCC samples (Fig. 4a). Differential expression analysis was initially performed in a supervised manner, respecting known histological classification, and demonstrated clear separation of histologies (Fig. 4b). The most strongly differentially expressed genes in CRPC-Ad reflected androgen receptor (AR) transcriptional control (e.g., KLK2, KLK3, PRAC1, CLDN8, TMPRSS2) and AR itself. Eight of the top 10 genes enriched in CRPC-SCC were cytokeratin or keratin-associated, reflecting squamous histology (Supplementary Data 3). Unsupervised hierarchical clustering of the most variable genes showed a clear delineation into histological subgroups (Fig. 4c, Supplementary Data 4).
a First two principal components of a PCA of gene expression after correction for disparate processing. Axis labels indicate the percentage of explained variance for each principal component. Points are colored by histology. b Heatmap showing differentially expressed genes (DEG) between histologies (padj < 0.01). c Heatmap showing the 4889 most variable genes, where the samples group by histology after unsupervised hierarchical clustering. The 4889 most variable genes were selected on the basis of their standard deviation across all samples, to match the number of differentially expressed genes (DEGs). d Convergent evolution of adenocarcinoma copy number gain and resultant increase in expression of AR. The left panel is a genome-level view of the AR locus for four representative samples. The top three samples display different routes of AR copy number gain in CRPC-Ad samples, and the bottom sample is a representative CRPC-SCC with no copy number gain. The gray bars represent genomic intervals, and the height of the bars indicate their copy number. The multicolored arcs indicate the somatic junctions within 100 Kb of AR, and the gray arcs represent somatic junctions more than 100 Kb away from AR. These colors align to the heatmap on the right, which indicates read support for each junction/sample combination. Cells are colored by read support, and are marked with an “x” if that particular junction was not integrated into that sample’s JaBbA model. e Gene set enrichment analysis (GSEA) results using the Gene Ontology molecular function (top) and KEGG (bottom) reference databases. A positive normalized enrichment score (NES) corresponds to pathways significantly (p < 0.05) enriched by genes that are up-regulated in squamous samples and down-regulated in adenocarcinoma samples, while a negative NES corresponds to pathways that are up-regulated in adenocarcinoma samples and down-regulated in squamous samples.
We next used the composite of DNA and RNA sequencing to seek evidence of a mechanism of histological transformation. To do this, we evaluated the lists of most strongly differentially expressed genes with those with detectable DNA alterations across consensus gene lists10, grouped by alteration type: SNV/INDEL, CN gain, and CN loss. We defined as “fully conserved” any gene alteration exclusively present in all (6 of 6) samples of one histology. Partially conserved was defined as alterations exclusively present in 4 or 5 (of 6 samples) of one histology. Among SNVs/Indels, three genes with lower expression in CRPC-SCC v. CRPC-Ad were found to have pathogenic mutations in a majority of SCC samples (4 of 6, CNTNAP2, CRAT, TRPM4)(Table 2, Supplementary Data 5). None of the DEG had fully histology-conserved CN losses. Four of 6 CRPC-Ad samples had CN losses in 9 genes with lower expression in this group. Among genes with significantly lower expression in SCC samples, three were partially conserved (4 of 6) as CN loss (EIF2B1, ELK4, and TCTN2).
Among genes with higher expression in SCC, 5 displayed partially conserved (4 of 6) CN gains. Only one SCC-overexpressed gene, ROBO1, had fully conserved CN gains in SCC. In contrast, AR expression was consistently higher in CRPC-Ad than CRPC-SCC (log2-fold change −9.07, adj-p 2.94E-35)(Fig. 4d). Despite this, AR expression was not equal across Ad samples. Correspondingly, focal CN gains of AR relative to ploidy were only partially-conserved (4 of 6 Ad, 0 of 6 SCC)(Fig. 4d, Supplementary Data 5).
Focused evaluation of the DNA topology at AR and its proximal enhancers revealed variable rearrangement. The highest expression of AR was observed in WCM63_G, the liver metastasis which was phylogenetically-distinct from three additional liver metastases (WCM63_[C,E,F])(Fig. 2d, e). WCM63_G had a BFB (CN 27) incorporating AR associated with high AR expression. In contrast, the three liver metastases more closely related on the SNV-based phylogenetic tree via clone C17 all harbored a simple duplication at AR (Fig. 4d), supporting a match between SNV-based phylogeny and SVs.
Furthermore, the AR locus provides evidence of dynamic and continued evolution. As above, the four liver metastases reflect two separate rearrangement patterns. Within the prostate, two separate patterns were also observed. An arm-level copy number gain (CN = 3) was observed in the pre-platinum prostate (TURP) sample (WCM63_A) versus a separate complex, highly amplified rearrangement classified as a DM in the autopsy sample collected >4 years later (WCM63_D), reflecting continued complex evolution at AR even within the prostate gland18(Fig. 4d). The absence of any rearrangements including AR in all CRPC-SCC samples supports their AR-independent behavior and early transformation.
Pathway analyses were performed to define the types of transcriptional programs in each histology. These included enrichment in molecular function (Gene Ontology) pathways characteristic of squamous cells, including collagen binding and extracellular matrix (ECM)(Fig. 4e). Using the KEGG database, comparison of CRPC-SCC and CRPC-Ad supported enrichment in the SCC in ECM-receptor interaction, focal adhesion, and TGF-beta signaling, the last of which is well-characterized in prostate cancer cellular plasticity19,20. In contrast, CRPC-Ad samples were most enriched in metabolic and detoxification pathways, possibly reflecting the enrichment among CRPC-Ad in liver samples (4 of 6)(Fig. 4e).
Relative mutational burden in circulation reflects disease biology and response
Mutation-based phylogenetic analysis demonstrated a clonal relationship among all tumor samples, irrespective of histology. These data, as well as those for SVs such as those resulting in AR copy number gains, support a relatively early transdifferentiation of adenocarcinoma to squamous cell carcinoma. We next sought to evaluate whether the squamous tumor DNA could be non-invasively detected in the circulation prior to biopsy (Fig. 5a). To this goal, we queried serially collected plasma samples for presence of circulating tumor DNA (ctDNA), categorizing the tumor content using WGS to a median coverage of 77.21X (Supplementary Data 6). Included in the dataset were samples (WCM63_1 and WCM63_2) collected prior to clinical detection of CRPC-SCC, which was first detected with his spinal biopsy (WCM63_B). Using the set of SNV/INDELs partitioned into clones as part of the phylogenetic analysis, we used a tumor-informed approach to query for their presence in the cfDNA WGS data at each time point. Using plasma samples from unrelated patients, we empirically determined a lower limit of detection (LLOD) for each clone. We detected both Ad-private and SCC-private clones in circulation >19 months (clones C16 and C21, WCM63_1), and >1 month (clones C6, C9, C14-C19, C21, WCM63_2) prior to clinical detection and biopsy, underscoring the potential importance of liquid biopsies for patient surveillance as well as metastatic biopsies at the time of disease progression (Fig. 5a–c, Supplementary Data 7, Supplementary fig. 4a).
a Plasma detection rate of selected clones over time, juxtaposed with PSA levels and treatment regimen. The left y-axis corresponds to the plasma detection rate shown by each of the lines, and the right y-axis corresponds to the blue area plot of PSA levels as measured in ng/ml. The bottom track shows the timing, duration, and type of therapies. b Schematic of the phylogenetic tree, restricted to clones shared across more than one sample. Each panel represents a subset of the tree as it relates to specific sample lineages, labeled to the right of each tree. The size of each node indicates how many samples a clone is detected in. c Plasma detection rate of all non-private clones over time, corresponding to the clones in the top-most tree in the preceding (b). The shaded polygon and colored dashed lines represent the 95% bootstrap confidence intervals of the detection rate, and the black dashed line represents the clone-specific lower limit of detection. d Heatmap showing the similarity of plasma and tumor copy number profiles as measured by the cosine similarity of normalized read depth. Tumor samples are represented by each row, and cfDNA samples are measured on each column. Samples are grouped by histology. The color of each cell, and the annotated value indicate cosine similarity, The top track indicates which histology was more similar to the cfDNA at a given time point, as measured by a greater cosine similarity (P < 0.05, Welch Two Sample t-test).
We estimated the tumor fraction in the cfDNA samples two ways, using copy number signal (Supplementary Fig. 4b), and via read support for SNV/INDELs associated with the earliest clone C0 (Supplementary Data 6) (Methods). The estimates were largely concordant (r = 0.95), with a mean absolute difference of 0.054, and both methods were well-correlated with PSA levels (SNV/INDEL r = 0.90, CNV r = 0.83). At cfDNA time point WCM63_3, the difference was greatest between the two methods, with the SNV/INDEL approach showing a drop in tumor fraction from 0.375 to 0.193, and the copy number approach remaining relatively unchanged, dropping from 0.373 to 0.349. Overall SNV/INDEL-based tumor fraction, and detection rates of clone C8 and C13 decreased from WCM63_2 to WCM63_3 (48.5%, 57.3%, and 58.3% change, respectively), whereas clone C15 was comparatively unaffected (11.3% change), suggesting that squamous lineage 1-associated CNVs may be buffering the expected drop in copy number signal in the cfDNA.
To further explore the discrepancy between tumor fraction estimates at WCM63_3, we compared the genome-wide cosine similarity of read depth profiles between each tumor and plasma sample (Fig. 5d). At WCM63_1, the tumor fraction was low and both histologies did not resemble the observed read depth profile with mean cosine similarities of 0.098 for CRPC-SCC and 0.13 for CRPC-Ad. At WCM63_2, both histologies were highly similar to the plasma, with mean cosine similarities of 0.767 for CRPC-SCC and 0.753 for CRPC-Ad. At WCM63_3, we observed a reduction in similarity for CRPC-Ad (P = 0.008, Welch Two Sample t-test), while similarity to CRPC-SCC remained unchanged (P = 0.409, Welch Two Sample t-test). Consistent with the signal from SNV/INDELs in the cfDNA, similarity increases for both histologies at WCM63_4, though not significantly. At WCM63_5, the histologies again diverge, with CRPC-SCC similarity peaking, and CRPC-Ad returning to levels similar to WCM63_3. Notably, squamous lineage 1 (samples BHIL) had the highest cosine similarity to cfDNA of all samples at WCM63_3-5, mirroring the detection rate trajectory of clone C15.
Leveraging the phylogenetic relationship between clones detected in the tumor samples, and their relative abundance in cfDNA across time as measured by a read-level detection rate, provided insight into clonal dynamics in response to therapy. During the 75 day period between the collection of WCM63_2 and WCM63_3, the patient was started on abiraterone acetate, the pro-drug of abiraterone. This drug inhibits CYP17, thereby reducing production of highly-potent dihydrotestosterone (DHT) in tissues. Hormone-sensitive prostate adenocarcinomas are commonly sensitive to abiraterone acetate (AA), as are most AA-naïve CRPC tumors, yet considered only effective against AR-dependent tumors [16–18]. The reduction in overall tumor fraction and relatively small reduction in squamous lineage 1 clone C15 is consistent with differential sensitivity between CRPC-Ad (AR-driven) and CRPC-SCC (AR-independent) (Fig. 5a–c).
Discussion
With the benefit of retrospection, we sought to detail the genomic course of a metastatic prostate cancer to provide insight into the intrinsic and selective biology underlying the divergent progression of this patient’s disease which, in addition to common adenocarcinoma histomorphology, demonstrated an unusual squamous differentiation.
Accurate description of tumor evolution has importance in both clinical practice and research investigations. In clinical practice, the histological origin of a tumor can help to inform treatment approaches. When a male patient has a cancer of unknown primary, the presence of the TMPRSS2::ERG fusion is sought. When present, as it is in 30–50% of prostate cancers depending on ancestral background21,22, this rearrangement can be used as a pathognomonic for tumor origin from prostate. In our patient case, this fusion was not present. Adding further complexity, SPOP, which is frequently mutated in both localized and advanced prostate cancers23 and is considered mutually exclusive from TMPRSS2::ERG fusions, was also not detected. The absence of both TMPRSS2::ERG fusions and mutations in SPOP may be consistent with emerging data on the genomic characteristics of prostate cancers from people of African ancestry. Analysis of his genetic ancestry estimated approximately 50% ancestry from the African continent (Supplementary Fig. 1). This is in keeping with the relatively low incidence of TMPRSS2::ERG fusions in patients of African ancestry compared to patients of European ancestry22, even in the absence of SPOP mutation24, and also highlights the important impact of genomic ancestry in interpreting patient data.
With the benefit of whole-genome sequencing, we demonstrate clear evidence of clonality of all samples. This conclusion is supported by single nucleotide variant mutation data as well as both simple and complex rearrangements. In comprehensive analysis of the DNA sequencing, we did not identify a putative driver of squamous cell differentiation. The transcriptomic profile could have suggested a mechanism of differentiation. The pathways most active in the CRPC-Ad and CRPC-SCC samples, respectively, reflected their histology; adenocarcinomas were characterized by AR signaling and squamous cells were characterized by keratin and extracellular matrix pathways. Only in the increased signaling via TGFbeta, known to be a critical influence in epithelial to mesenchymal transition, did we see a clue as to how the cells made their transformation. We were unable to elucidate an active transcriptional program that allowed differentiation from adenocarcinoma to squamous histology. We hypothesize that the program that allowed differentiation was transient and thus undetectable by examining differences between the origin and destination states (i.e., adeno v. squamous). Recent investigations into cancer cellular plasticity have supported the existence of transitory transitional states between histologies in both prostate and lung cancers25, with ongoing pursuit of approaches to identify early transition cells and to revert cell identity through inhibiting candidate pathways of transition26. Despite the broad sampling from this patient and deep characterization of the acquired tissues, we found neither a mixed histology sample nor any specimen that had DNA or RNA components reflective of intermediacy between the two histologies. It also is noted that all the metastatic lesions in the liver were adenocarcinoma, raising the possibility of an anatomical site preference, particularly given that one of the liver samples was unlike the others, suggestive of two seedings. It is also possible that epigenetic regulation contributes to the transition or destination state of these tumors27,28.
Furthermore, identification of complex structural variants shared across all samples with a minority of SVs private to either histology supports two conclusions. The first is a clear demonstration of clonality and prostate origin, similar to what was observed in the analysis of SNVs. The second is that the absence of many SVs private to SCC samples, for example, suggests that complex SVs–canonical drivers of prostate cancer–were not a direct driver of histological transformation29,30.
Despite our inability to develop a generalizable framework for adeno-to-squamous differentiation in CRPC, we were able to make three key observations that may impact approach to and study of CRPC. The first was evidence of dynamic evolution at the AR locus. As previously shown, complex rearrangements and copy number changes at AR are common in CRPC31,32. Here we observe that continued evolution occurs in the primary tumor, significantly after dissemination has occurred. Despite no evidence of those clones giving rise to a pathologically- or clinically-detected metastatic tumor in this patient, this observation may support the rationale behind the rising interest in treating the primary tumor in patients with metastatic disease as observed in the STAMPEDE trial and as is currently formally being tested in other studies33. Secondly, with the benefit of tumor-informed detection, we observed squamous carcinoma-specific mutations in circulation many months before there was suspicion for a differentiated tumor. This supports, in part, the practice of routine metastatic biopsies at the time of progression, as well as raises an important question. Is SCC-transformed CRPC more common than currently thought and could it explain a subset of clinically-aggressive CRPC that lacks canonical markers of AVPC or NEPC? This leads into the third observation suggesting that intrapatient tumor heterogeneity is a potential driver of variability of tumor response. Although at the time it was not definitively known that the two distinct tumor histologies were both of prostate origin, examination of the cfDNA allowed for description of a modest and transient response in AR-driven, PSA-producing adenocarcinoma cells with stability or growth of AR-independent, PSA-negative prostate squamous carcinoma cells. Intrapatient variability has increasing relevance beyond AR pathway targeted therapies, where heterogeneity is well-documented and understood (e.g., NEPC, double-negative tumors)34. With the establishment of theranostics, including against PSMA and other emerging cell surface targets, understanding expression patterns and heterogeneity of clinically relevant biomarkers has therapeutic importance35,36. Although functional PSMA imaging was not included in this analysis, this patient was treated with two courses of investigational anti-PSMA treatment (177Lu-J591 and J591 naked antibody). The transcriptomic analysis supports a lack of FOLH1 expression (data not shown) in the tumor cells from CRPC-SCC and correspondingly would not have been detected by PSMA PET and unlikely to experience a clinical benefit of these treatments to those tumors37.
In conclusion, this novel molecular case report illustrates squamous transformation of a castration-resistant prostate adenocarcinoma. Acknowledging limitations to broad conclusions from a deep dataset on a single patient, these data support non-neuroendocrine transformed states which should be considered in patients with aggressive tumor behavior and/or discordance between clinical response, radiographic change, and PSA blood levels.
Methods
Sample collection
This study, including the use of archival tissue samples for research, was performed under an institutional review board-approved protocol. Briefly, when patients are referred to and enrolled in the Precision Medicine Program at Weill Cornell Medicine (WCM) (IRB Protocol No.1305013903), they have the option to consent to the Rapid Autopsy Program5. Hematoxylin and eosin (H&E)-stained slides and immunohistochemistry slides were reviewed by histopathologists with expertise in uropathology.
Tissue whole-genome sequencing
Whole-genome sequencing for all samples was performed at the New York Genome Center. Targeting 500 bp fragments, whole-genome sequencing libraries were prepared using the KAPA Hyper Library Preparation Kit (KAPABiosystems KK8502, KK8504) in accordance with the manufacturer’s instructions. Briefly, 200 ng of DNA was sheared using a Covaris LE220 sonicator (adaptive focused acoustics). DNA fragments were end-repaired, adenylated, ligated to Illumina sequencing adapters, underwent bead-based size selection and were amplified. Final libraries were quantified using the Qubit Fluorometer (Life Technologies) or Spectramax M2 (Molecular Devices) and Fragment Analyzer (Advanced Analytical) or Agilent 2100 BioAnalyzer. Libraries were sequenced on an Illumina Novaseq6000 sequencer to a target 80X tumor and 40X normal coverage using 2 ×150bp cycles.
Whole-genome sequencing data analysis
Analysis of tumor-normal whole-genome sequencing data was performed as previously described38,39.
Sequencing reads for the tumor and normal samples were aligned to the GRCh38 reference using BWA-MEM (v0.7.15)40. NYGC’s ShortAlignmentMarking (v2.1) was used to mark short reads as unaligned41. GATK (v4.1.0)42 FixMateInformation was run to verify and fix mate-pair information, followed by Novosort (v1.03.01) markDuplicates to merge individual lane BAM files into a single BAM file per sample. Duplicates were then sorted and marked, and GATK’s base quality score recalibration (BQSR) was performed. Sequencing and alignment metrics are listed in Supplementary Data 1.
The tumor and normal bam files were processed through NYGC’s variant calling pipeline38, which consists of MuTect2 (GATK v4.0.5.1)43, Strelka2 (v2.9.3)44 and Lancet (v1.0.7)45 for calling Single Nucleotide Variants (SNVs) and short Insertion-or-Deletion (Indels), SvABA (v0.2.1)15 for calling Indels and Structural variants (SVs), Manta (v1.4.0)46 and Lumpy (v0.2.13)47 for calling SVs, and BIC-Seq2 (v0.2.6)48 for to generate non-purity/ploidy adjusted copy-number variants (CNVs). Manta also outputs a candidate set of Indels which was provided as input to Strelka2. Lancet is only run on the exonic part of the genome. It is also run on the +/− 250nt regions around non-exonic variants that are called by only one of the other callers, to add confidence to such variants. Small SVs called by Manta are also used to add confidence to the indel calls.
Variant calls were merged by variant type (SNVs, Multi-Nucleotide Variants (MNVs), Indels and SVs). MuTect2 and Lancet call MNVs, however Strelka2 does not, and it also does not provide any phasing information. To merge such variants across callers, MNVs called by MuTect2 and Lancet were split to SNVs, and then merged the SNV callsets across the different callers. If the caller support for each SNV in a MNV was the same, they were merged back to MNVs. Otherwise those are represented as individual SNVs in the final callset. Lancet and Manta are the only tools that can call deletion-insertion events. Other tools may represent the same event as separate yet adjacent indel and/or SNV variants. Such events are relatively less frequent, and difficult to merge. Therefore, these calls were not merged with SNV and Indel calls from other callers. All SVs below 500 bp were excluded and the rest merged across callers using bedtools49 pairtopair (requiring slop of 300 bp, same strand orientation, and 50% reciprocal overlap).
SNVs and Indels were annotated with Ensembl as well as databases such as COSMIC (v86)50, 1000Genomes (Phase3)51, ClinVar (201706)52, PolyPhen (v2.2.2)53, SIFT (v5.2.2)54, FATHMM (v2.1)55, gnomAD (r2.0.1)56 and dbSNP (v150)57 using Variant Effect Predictor (v93.2)58.
All predicted SVs were annotated with germline variants by overlapping with known variants in 1000 Genomes and Database of Genomic Variants (DGV)59. Cancer-specific annotation included overlap with genes from Ensembl60 and Cancer Gene Census in COSMIC, and potential effect on gene structure (e.g., disruptive, intronic, intergenic). If a predicted SV disrupted two genes and strand orientations are compatible, it was annotated as a putative gene fusion candidate. Further annotations include sequence features within breakpoint flanking regions, e.g., mappability, simple repeat content, and segmental duplications.
For SNVs, Indels, and SVs, an in-house panel of normals (PON) was used to filter putative artifacts. Somatic SNVs and Indels were filtered out if they were found in more than two or more individuals in our PON. Somatic SVs were removed if they were detected in two or more individuals in our PON using bedtools pairtopair (requiring slop of 300 bp, same strand orientation, and 50% reciprocal overlap). In addition to PON filtering, SNVs and Indels that have minor allele frequency (MAF) of 1% or higher in either 1000 Genomes Phase 3 or gnomAD (r2.0.1), and SVs that overlap DGV, 1000Genomes Phase 3, or gnomAD SV61 were removed.
Variants were excluded if the variant allele frequency (VAF) in the tumor sample was below 0.0001, the VAF in the normal sample exceeded 0.2, or the sequencing depth at the position was less than 2 in either the tumor or normal sample. Additionally, variants were filtered out if the VAF in the normal sample was higher than that in the tumor sample.
For the final SNV and Indel callset, calls were retained that passed the above-mentioned filters, and were either called by two or more variant callers, or called by one caller and also seen in the Lancet validation calls or in the Manta SV calls. To increase sensitivity, a union of somatic SNVs and Indels across all the patient’s samples was generated. Pysam pileup (0.15.0)62 was then run on tumor and normal bam files to compute the read support for variants present in the union that were missing from each sample’s callset. Variants with allele frequency greater than 0 were then rescued.
The truncal FOXA1 variant was visualized with the lollipop package (v1.7.2)63, with the ___domain source set to Pfam.
For the final SV callset, calls were retained that passed the above-mentioned filters, and were either called by 2 or more variant callers, or called by Manta or Lumpy with either additional support from a nearby CNV changepoint, or split-read support from SplazerS64. An SV was considered supported by SplazerS if it found at least 3 split-reads in the tumor only. Nearby CNV changepoints were determined by overlapping BIC-Seq2 calls with the SV callset using bedtools closest. An SV was considered to be supported by a CNV changepoint if either breakend was within 1Kbp of a CNV changepoint.
For each sample, GC content and mappability-corrected read depth data was computed in 1 Kbp bins using fragCounter65. The read depth data was then corrected for systematic artifacts using dryclean66. Purity and ploidy were estimated for each sample by running AscatNGS67 and Sequenza68, and manually reviewing to select the most accurate estimate. Junction-balanced genome graphs with genomic interval and junction integer copy number were generated by running JaBbA17 with the SV callset, manually curated purity and ploidy estimates, dryclean-corrected tumor read depth data, and B-allele frequency data as input. gGnome69 was then used to call simple and complex structural variants, as well as compare read-level support for individual junctions.
Focal copy number variants (≤3MB) were determined relative to a sample’s copy-neutral state, as defined by ploidy. For samples with an intermediate average ploidy (fractional value between 0.4 and 0.6, e.g., 3.5), neutral copy state was set as the closest two integer values (e.g., for a ploidy of 3.5, neutral copy states would be 3 and 4). Otherwise, the neutral copy state was set as the rounded ploidy. Events above the neutral copy number were classified as gains, and those more than double ploidy were classified as amplifications. Conversely, events below the neutral copy number were classified as deletions. Events with a copy number of 0 were classified as losses.
Mutational signature fitting was performed using the deconstructSigs R package (v1.9)70. DeconstructSigs was run with ‘HighConfidence’ variants as input and COSMIC v3.2 signatures50 as a reference. The tool was run with arguments “contexts.needed=TRUE” and “signature.cutoff=0” for SBS, DBS, and ID signatures.
Somatic mutations were clustered into clones by integrating per-sample read support, allele-specific copy number, and purity, using PyClone-VI (v(0.1.1), a re-implementation of the PyClone algorithm optimized for whole genome sequencing data71. After reviewing the output, we noticed two clones with low assignment scores. The per-sample VAF distributions of these clones were close to zero for all samples except WCM63_L, suggesting that a low prevalence truncal clone was inadvertently merged with a WCM63_L-private clone. This behavior has been previously reported72,73. As these clones did not occupy informative positions on the downstream phylogenetic trees, and represented 4.8% of the total mutations detected in this patient, they were excluded from downstream analysis. The variants in each of the resulting clones were fed through deconstructSigs as described above to estimate a set of mutational signature proportions.
Phylogenetic trees describing the evolutionary relationship between clones were generated with PhyClone (v0.5.0)74, with 3000 burn-in iterations, 50,000 iterations of the MCMC sampler, and 16 parallel chains. Clonal diversity was calculated using the normalized Shannon entropy (Eq. 1), where p is the normalized clone proportions, and n is the total number of clones:
Whole-transcriptome sequencing
RNA was prepared for whole-transcriptome analysis (RNA-seq) in accordance with the standard Illumina mRNA sample preparation protocol, using Truseq Stranded mRNA with polyA selection (WCM63_B, WCM63_C, WCM63_E, WCM63_F, WCM63_G, WCM63_I), Truseq Stranded mRNA with ribodepletion (WCM63_D, WCM63_H, WCM63_J, WCM63_K, WCM63_L), and TruSeq RNA Sample Prep Kit v2 -Set B with polyA selection (WCM63_A). Paired-end RNA-seq at read lengths of 75 base pairs was performed with the NovaSeq 4000. Approximately 50 million pair-end reads were generated.
Whole-transcriptome data analysis
For alignment and fusion gene discovery, RNA-seq reads were processed STAR-Fusion v1.8.075 (which uses STAR v2.7.2b for alignment), except for samples WCM63_B and WCM63_C, which were processed with STAR-Fusion v1.3.1 (with STAR v2.5.0a). For gene expression estimation, htseq v0.976 was used with the annotation Gencode v1977.
Batch correction was implemented at the cohort level using ComBat-seq78 to mitigate potential confounding effects arising from variations in library preparation methods, specifically polyA selection or ribodepletion. ComBat-seq was executed using the raw count data of the prostate cancer WCM rapid autopsy cohort (44 samples from 9 patients) as input, alongside information indicating the library preparation method (24 polyA, 20 ribodepletion).
Differential expression analysis was performed using DESeq279 to compare gene expression between squamous and adeno samples, with squamous samples designated as the reference group. The batch-corrected count data served as input for this analysis.
Differential expression results were visualized using library-normalized, log-transformed count data with a false discovery rate (FDR) threshold of 0.01 to limit to only strongly significant genes. Library normalization of the batch-corrected count data was performed using DESeq2’s median-of-ratios normalization method79. The library-normalized, log-transformed data were also visualized by ranking genes by variance and visualizing the top variable genes, limiting the number of genes to the same number of significantly differentially expressed genes at the FDR threshold of 0.01.
Gene set enrichment analysis (GSEA) was implemented using WebGestalt80, utilizing both gene ontology and KEGG pathway functional databases. The ranked gene list, based on the computed Wald statistic from the differential expression analysis, was utilized as input. The false discovery rate was set to 0.05, while other options were left at their default settings.
cfDNA sequencing
10 ng of cfDNA were processed for library construction using the Accel-NGS@2S Plus DNA Library kit (cat # 21024) as per manufacturer instructions (Swift BioSciences, 58 Parkland Plaza, Ste. 100, Ann Arbor, MI 48103). After end repair of the cfDNA, full length-indexed Illumina adapter sequences were ligated to the 3’end, followed by ligation of Illumina adapter to the 5’ end. Libraries generated were amplified by the use of 6 PCR cycles. Libraries were sequenced on an Illumina NovaSeq 6000 in two runs, on an S2 flow cell and a S4 flow cell for 2 × 150 cycles to a mean coverage of 75X. Primary processing of sequencing images was done using Illumina’s Real Time Analysis software (RTA). CASAVA 1.8.2 software was used to perform image capture, base calling and demultiplexing.
cfDNA data analysis
Reads were aligned in the same manner as described in whole-genome sequencing preprocessing, with the addition of Skewer (v0.2.2)81 for adapter trimming. Sequencing and alignment metrics are listed in Supplementary Data 6. CNV profiles and tumor fraction were inferred using ichorCNA82 and the profile was compared with WGS from tissue using cosine similarity as implemented in the R lsa package (v0.73.3)83.
To determine the presence of tumor variants in cfDNA, Pysam pileup was used to query the union of tumor mutations as previously described in Somatic variant annotation and filtering.
Leveraging the clone assignments described in Clonal deconvolution and phylogenetic analysis, we computed a clone-level detection rate at each cfDNA time point by dividing the number of clone-specific variant-supporting reads by the total number of overlapping reads84. This read-based detection rate is arithmetically equivalent to a depth-weighted mean VAF. We determined clone-specific limits of detection by computing detection rates from an in-house set of unrelated cfDNA samples (6 samples across 4 patients). The mean and standard deviation of the clone-specific detection rates in the unrelated samples were used to normalize detection rates to Z-scores in the WCM63 cfDNA samples, and a clone was considered “detected” in a given time point if it had a Z-score >1.2. For one clone, the error rate was estimated as 0, so the error rate across all mutations was used.
To compute tumor fraction from SNV/INDELs, we only considered the detection rate for the most truncal clone that was present across all tumor samples (clone C0: 2591 variants). As we cannot distinguish between reference allele-supporting reads originating from tumors versus non-malignant cells, and we have temporally distant samples, we cannot rule out the possibility that a fraction of somatic variants detected in the later metastatic tumors have not yet arisen at the time of early cfDNA sample acquisition. Additionally, due to the large number of samples contributing to the patient’s compendium of somatic mutations, many variants are subclonal and unlikely to be detected in the cfDNA, diluting the overall tumor fraction estimate. Using variants that should be present across all tumors ensures a robust measurement of tumor fraction. The clone C0 detection rate was corrected for mean mutation multiplicity and total copy number, yielding tumor fractions that were in agreement with the CNV-derived ichorCNA estimates (Supplementary Data 6)[85, Supplementary note, Extended data 7b].
Data availability
The raw data (WGS and RNA-seq) will be shared via dbGap upon publication.
Code availability
Code for reproducing figures and custom analysis will be made available on Github at the time of publication.
References
Liu, W et al. Copy number analysis indicates monoclonal origin of lethal metastatic prostate cancer. Nat. Med. 15, 559–565 (2009).
Woodcock, DJ et al. Prostate cancer evolution from multilineage primary to single lineage metastases with implications for liquid biopsy. Nat. Commun. 11, 5070 (2020).
Davies, A et al. An androgen receptor switch underlies lineage infidelity in treatment-resistant prostate cancer. Nat. Cell Biol. 23, 1023–1034 (2021).
Labrecque, MP et al. Molecular profiling stratifies diverse phenotypes of treatment-refractory metastatic castration-resistant prostate cancer. J. Clin. Investig. 129, 4492–4505 (2019).
Pisapia, D. J. et al. Next-generation rapid autopsies enable tumor evolution tracking and generation of preclinical models. JCO Precis. Oncol. 1, 1–13 (2017).
Tagawa, ST et al. Phase 1/2 study of fractionated dose lutetium-177-labeled anti-prostate-specific membrane antigen monoclonal antibody J591 (177 Lu-J591) for metastatic castration-resistant prostate cancer. Cancer 125, 2561–2569 (2019).
Tagawa, ST et al. Anti-prostate-specific membrane antigen (PSMA) monoclonal antibody (mAb) J591 immunotherapy for prostate cancer. Ann. Oncol. 27, vi265 (2016).
Aparicio, AM et al. Combined tumor suppressor defects characterize clinically defined aggressive variant prostate cancers. Clin. Cancer Res. 22, 1520–1530 (2016).
Adams, EJ et al. FOXA1 mutations alter pioneering activity, differentiation and prostate cancer phenotypes. Nature 571, 408–412 (2019).
Sondka, Z et al. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nat. Rev. Cancer 18, 696–705 (2018).
de Laat, WL, Sijbers, AM, Odijk, H, Jaspers, NG & Hoeijmakers, JH Mapping of interaction domains between human repair proteins ERCC1 and XPF. Nucleic Acids Res. 26, 4146–4152 (1998).
Tripsianes, K et al. The structure of the human ERCC1/XPF interaction domains reveals a complementary role for the two proteins in nucleotide excision repair. Structure 13, 1849–1858 (2005).
Ghose, A., Moschetta, M., Pappas-Gogos, G., Sheriff, M. & Boussios, S. Genetic aberrations of DNA repair pathways in prostate cancer: translation to the clinic. Int. J. Mol. Sci. 22, 9783 (2021).
Kulman, E, Wintersinger, J & Morris, Q Reconstructing cancer phylogenies using Pairtree, a clone tree reconstruction algorithm. STAR Protoc. 3, 101706 (2022).
Wala, JA et al. SvABA: genome-wide detection of structural variants and indels by local assembly. Genome Res. 28, 581–591 (2018).
Cameron, DL et al. GRIDSS2: comprehensive characterisation of somatic structural variation using single breakend variants and structural variant phasing. Genome Biol. 22, 202 (2021).
Hadi, K et al. Distinct classes of complex structural variation uncovered across thousands of cancer genome graphs. Cell 183, 197–210.e32 (2020).
Zivanovic, A et al. Co-evolution of AR gene copy number and structural complexity in endocrine therapy resistant prostate cancer. NAR Cancer 5, zcad045 (2023).
Nauseef, JT & Henry, MD Epithelial-to-mesenchymal transition in prostate cancer: paradigm or puzzle?. Nat. Rev. Urol. 8, 428–439 (2011).
He, MX et al. Transcriptional mediators of treatment resistance in lethal prostate cancer. Nat. Med. 27, 426–433 (2021).
Stopsack, KH et al. Transcriptomes of prostate cancer with TMPRSS2:ERG and Other ETS fusions. Mol. Cancer Res. 21, 14–23 (2023).
Khani, F et al. Evidence for molecular differences in prostate cancer between African American and Caucasian men. Clin. Cancer Res. 20, 4925–4934 (2014).
Barbieri, CE et al. Exome sequencing identifies recurrent SPOP, FOXA1 and MED12 mutations in prostate cancer. Nat. Genet. 44, 685–689 (2012).
Huang, FW et al. Exome sequencing of African-American prostate cancer reveals loss-of-function ERF mutations. Cancer Discov. 7, 973–983 (2017).
Rubin, MA, Bristow, RG, Thienger, PD, Dive, C & Imielinski, M Impact of Lineage Plasticity to and from a neuroendocrine phenotype on progression and response in prostate and lung cancers. Mol. Cell 80, 562–577 (2020).
Chan, JM et al. Lineage plasticity in prostate cancer depends on JAK/STAT inflammatory signaling. Science 377, 1180–1191 (2022).
Beltran, H et al. Divergent clonal evolution of castration-resistant neuroendocrine prostate cancer. Nat. Med. 22, 298–305 (2016).
Zhao, SG et al. The DNA methylation landscape of advanced prostate cancer. Nat. Genet. 52, 778–789 (2020).
Zhou, M. et al. Patterns of structural variation define prostate cancer across disease states. JCI Insight 7, e161370 (2022).
Baca, SC et al. Punctuated evolution of prostate cancer genomes. Cell 153, 666–677 (2013).
Henzler, C et al. Truncation and constitutive activation of the androgen receptor by diverse genomic rearrangements in prostate cancer. Nat. Commun. 7, 13668 (2016).
Li, Y et al. Diverse AR gene rearrangements mediate resistance to androgen receptor inhibitors in metastatic prostate cancer. Clin. Cancer Res. 26, 1965–1976 (2020).
Phase III Randomized Trial of Standard Systemic Therapy (SST) versus Standard Systemic Therapy plus Definitive Treatment (Surgery or Radiation) of the Primary Tumor in Metastatic Prostate Cancer. SWOG, www.swog.org/clinical-trials/s1802.
Beltran, H et al. Circulating tumor DNA profile recognizes transformation to castration-resistant neuroendocrine prostate cancer. J. Clin. Investig. 130, 1653–1668 (2020).
Nauseef, JT, Bander, NH & Tagawa, ST Emerging prostate-specific membrane antigen-based therapeutics: small molecules, antibodies, and beyond. Eur. Urol. Focus 7, 254–257 (2021).
Denmeade, SR Resolute Progress down a long and winding road leads to the promised land of prostate-specific membrane antigen-based therapies for prostate cancer. J. Clin. Oncol. 42, 852–856 (2024).
Calais, J & Czernin, J PSMA expression assessed by PET imaging is a required biomarker for selecting patients for any PSMA-targeted therapy. J. Nucl. Med. 62, 1489–1491 (2021).
Arora, K et al. Deep whole-genome sequencing of 3 cancer cell lines on 2 sequencing platforms. Sci. Rep. 9, 19123 (2019).
Khani, F et al. Evolution of structural rearrangements in prostate cancer intracranial metastases. NPJ Precis Oncol. 7, 91 (2023).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv [q-bio.GN] (2013).
Nygc-Short-Alignment-Marking. (Github) https://github.com/nygenome/nygc-short-alignment-marking.
McKenna, A et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Cibulskis, K et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
Kim, S et al. Strelka2: fast and accurate calling of germline and somatic variants. Nat. Methods 15, 591–594 (2018).
Narzisi, G et al. Genome-wide somatic variant calling using localized colored de Bruijn graphs. Commun. Biol. 1, 20 (2018).
Chen, X et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222 (2016).
Layer, RM, Chiang, C, Quinlan, AR & Hall, IM LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 15, R84 (2014).
Xi, R, Lee, S, Xia, Y, Kim, T-M & Park, PJ Copy number analysis of whole-genome data using BIC-seq2 and its application to detection of cancer susceptibility variants. Nucleic Acids Res. 44, 6274–6286 (2016).
Quinlan, AR & Hall, IM BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Tate, JG et al. COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res. 47, D941–D947 (2019).
Byrska-Bishop, M et al. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell 185, 3426–3440.e19 (2022).
Landrum, MJ et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 44, D862–D868 (2016).
Adzhubei, I., Jordan, D. M. & Sunyaev, S. R. Predicting functional effect of human missense mutations using PolyPhen-2. Curr. Protoc. Hum. Genet. 76, 7–20 (2013).
Vaser, R, Adusumalli, S, Leng, SN, Sikic, M & Ng, PC SIFT missense predictions for genomes. Nat. Protoc. 11, 1–9 (2016).
Shihab, HA et al. Ranking non-synonymous single nucleotide polymorphisms based on disease concepts. Hum. Genom. 8, 11 (2014).
Lek, M et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Sherry, ST et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).
McLaren, W et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
MacDonald, JR, Ziman, R, Yuen, RKC, Feuk, L & Scherer, SW The Database of Genomic Variants: a curated collection of structural variation in the human genome. Nucleic Acids Res. 42, D986–D992 (2014).
Hubbard, T et al. The Ensembl genome database project. Nucleic Acids Res. 30, 38–41 (2002).
Collins, RL et al. A structural variation reference for medical and population genetics. Nature 581, 444–451 (2020).
Pysam: Pysam Is a Python Package for Reading, Manipulating, and Writing Genomics Data such as SAM/BAM/CRAM and VCF/BCF Files. It’s a Lightweight Wrapper of the HTSlib API, the Same One That Powers Samtools, Bcftools, and Tabix. (Github) https://github.com/pysam-developers/pysam.
Jay, JJ & Brouwer, C Lollipops in the clinic: information dense mutation plots for precision medicine. PLoS ONE 11, e0160519 (2016).
Emde, A-K et al. Detecting genomic indel variants with exact breakpoints in single- and paired-end sequencing data using SplazerS. Bioinformatics 28, 619–627 (2012).
fragCounter: GC and Mappability Corrected Fragment Coverage for Paired End Whole Genome Sequencing. (Github) https://github.com/mskilab-org/fragCounter.
Deshpande, A., Walradt, T., Hu, Y., Koren, A. & Imielinski, M. Robust foreground detection in somatic copy number data. bioRxiv 847681 https://doi.org/10.1101/847681 (2019).
Raine, KM et al. ascatNgs: identifying somatically acquired copy-number alterations from whole-genome sequencing Data. Curr. Protoc. Bioinform. 56, 15.9.1–15.9.17 (2016).
Favero, F et al. Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data. Ann. Oncol. 26, 64–70 (2015).
gGnome: R API for Browsing, Analyzing, and Manipulating Reference-Aligned Genome Graphs in a GenomicRanges Framework. (Github) https://github.com/mskilab-org/gGnome.
Rosenthal, R, McGranahan, N, Herrero, J, Taylor, BS & Swanton, C DeconstructSigs: delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Genome Biol. 17, 31 (2016).
Gillis, S & Roth, A PyClone-VI: scalable inference of clonal population structures using whole genome data. BMC Bioinform. 21, 571 (2020).
Myers, MA, Satas, G & Raphael, BJ CALDER: Inferring phylogenetic trees from longitudinal tumor samples. Cell Syst. 8, 514–522.e5 (2019).
Frankell, AM et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525–533 (2023).
Hurtado, E., Bouchard-Côté, A. & Roth, A. PhyClone: accurate Bayesian reconstruction of cancer phylogenies from bulk sequencing. bioRxiv https://doi.org/10.1101/2024.08.14.607069 (2024).
Haas, B.J., Dobin, A. & Li, B. et al. Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods. Genome Biol. 20, 213 (2019).
Putri, G.H. et al. Analysing high-throughput sequencing data in Python with HTSeq 2.0. Bioinformatics 38, 2943–2945 (2022).
Frankish, A et al. GENCODE: reference annotation for the human and mouse genomes in 2023. Nucleic Acids Res. 51, D942–D949 (2023).
Zhang, Y, Parmigiani, G & Johnson, WE ComBat-seq: batch effect adjustment for RNA-seq count data. NAR Genom. Bioinform. 2, lqaa078 (2020).
Anders, S & Huber, W Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010).
Wang, J, Vasaikar, S, Shi, Z, Greer, M & Zhang, B WebGestalt 2017: a more comprehensive, powerful, flexible and interactive gene set enrichment analysis toolkit. Nucleic Acids Res. 45, W130–W137 (2017).
Jiang, H, Lei, R, Ding, S-W & Zhu, S Skewer: a fast and accurate adapter trimmer for next-generation sequencing paired-end reads. BMC Bioinform. 15, 182 (2014).
Adalsteinsson, VA et al. Scalable whole-exome sequencing of cell-free DNA reveals high concordance with metastatic tumors. Nat. Commun. 8, 1324 (2017).
Wild F (2022). lsa: Latent Semantic Analysis. R package version 0.73.3. https://CRAN.R-project.org/package=lsa.
Zviran, A et al. Genome-wide cell-free DNA mutational integration enables ultra-sensitive cancer monitoring. Nat. Med. 26, 1114–1124 (2020).
Abbosh, C et al. Tracking early lung cancer metastatic dissemination in TRACERx using ctDNA. Nature 616, 553–562 (2023).
Acknowledgements
The authors highlight the selfless donation made by the patient and his loved ones, for which we are indebted. This work was supported by the Englander Institute for Precision Medicine. The authors thank project support provided by laboratory technicians (Joonghoon Auh, Troy Kane) at the Englander Institute for Precision Medicine, by research pathology fellows (Ahmed Elsaeed, Majd Al Assaad), and by the Center for Translational Pathology (Ruben Diaz, Leticia Dizon, Bing He) from the Department of Pathology and Laboratory Medicine at Weill Cornell Medicine. Whole-genome sequencing was performed at the New York Genome Center and RNA-seq was performed at Weill Cornell Medicine. J.M.M., J.T.N., N.R., M.S., J.M., B.D.R., D.M.N., O.E., and M.I. are supported by the Department of Defense Prostate Cancer Research Program (PCRP) Health Disparity Research Award (PC200267). B.D.R., J.M.M., and A.S. are also supported by the WCM NCI SPORE (P50CA211024).
Author information
Authors and Affiliations
Contributions
J.T.N., T.R.C., N.R., and J.M.M. conceived and designed the study. M.S., J.M., S.T.T., D.M.N., and J.M.M. enrolled patients and collected clinical data. B.D.R. and J.M.M. reviewed pathology. T.R.C., W.F.H., A.A., M.S., A.O., H.G., M.S., Z.R.G., Z.S., L.W., A.S., O.E., I.H., M.I., and N.R. performed computational analyses. All authors reviewed data for publication. J.T.N., T.R.C., W.F.H., N.R., and J.M.M. wrote the first draft of the manuscript; and all authors contributed to the writing and editing of the revised manuscript and approved the manuscript.
Corresponding authors
Ethics declarations
Competing interests
JTN reports consulting or advising relationships with Pfizer, Bayer, and AIQ. At the time of publication, he is an employee of Convergent Therapeutics.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Nauseef, J.T., Chu, T.R., Hooper, W.F. et al. A complex phylogeny of lineage plasticity in metastatic castration resistant prostate cancer. npj Precis. Onc. 9, 91 (2025). https://doi.org/10.1038/s41698-025-00854-4
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41698-025-00854-4