Main

Plants defend themselves with complex chemical arsenals that are an essential source of therapeutics1,2. Paclitaxel (Taxol), a potent microtubule-stabilizing agent discovered in yew (Taxus) plants as part of a National Cancer Institute screening campaign during the 1960s and 1970s, remains one of the most valuable chemotherapeutics used in the clinic6. After its approval by the US Food and Drug Administration (FDA) in 1992 for the treatment of ovarian cancer, this diterpenoid became a best-selling pharmaceutical and remains the active component of diverse formulations, derivatizations and biological conjugates7. This extensive use, combined with the chemical complexity and low natural abundance (0.001–0.050% dried weight in Taxus bark)7,8 of Taxol has made it one of the most sought-after molecules for synthesis. Although elegant synthetic routes have been developed9, none are economically viable; drug supply still relies on extracting late-stage intermediates, such as baccatin III (16), from yew tissue10. The promise of a biomanufacturing strategy has made the discovery of the complete Taxus enzyme set for heterologous Taxol biosynthesis a grand challenge for natural product chemistry.

The search for the complete Taxol biosynthetic gene set, which was originally proposed to involve 19 enzymes, including 14 enzymes to baccatin III (16), began in the late 1990s. By 2006, the Croteau laboratory and others had discovered 12 enzymes, including the scaffold-forming enzyme, taxadiene synthase (TDS), as well as several tailoring oxidases and acyltransferases (Fig. 1a and Supplementary Table 1). Progress mostly stalled for two decades, until recent reports identified a taxane oxetanase (TOT) that installs Taxol’s unique oxetane moiety, and several additional enzymes that are thought to act in the pathway11,12,13,14,15. However, several of the crucial functional groups on Taxol, such as the C-1β hydroxyl, still lack an assigned enzyme with direct biochemical evidence. In addition to missing pathway enzymes, heterologous reconstitution of the Taxol pathway has been stymied by the inefficiency of the first proposed oxidation. Despite extensive troubleshooting efforts in a variety of heterologous systems16, the first Taxol oxidase, taxadiene 5α-hydroxylase, (T5αH), mainly produces side products with rearranged carbon bonds instead of the proposed ‘on-pathway’ intermediate, taxadien-5α-ol (2) (refs. 17,18,19,20). These two challenges highlight the major gap in our understanding of the endogenous biochemistry, and remain hurdles to heterologous Taxol production (Fig. 1a).

Fig. 1: A platform combining multiplexed perturbation and snRNA-seq (mpXsn) to overcome the challenges of Taxol biosynthetic gene discovery.
figure 1

a, Proposed Taxol biosynthesis pathway, with gaps highlighted in orange. In addition to our incomplete knowledge of the biosynthetic gene set, inefficiencies (red arrows) of the first oxidase, T5αΗ, prevent reconstitution and discovery of the Taxol pathway. b, Prominent classes of taxane metabolites that have been isolated from Taxus species. The tailoring acyl groups on taxanes include acetyl, benzoyl, short-chain fatty acid residues and phenylisoserine derivatives. c, Number of enzymes in the Taxus chinensis genome belonging to secondary-metabolism-related families. P450, cytochrome P450; αβ-hydro, α/β-hydrolase; AcylT, acyltransferase; TPS, terpene synthase. d, Overview of the differences between conventional co-expression approaches and the mpXsn methodology described here. Dot networks are visualizations of the co-expression network, in which nodes are linked when mutual rank is lower than 20, using either bulk RNA-seq or our mpXsn data. For visual clarity, only P450s are shown. e, Experimental overview of mpXsn, with uniform manifold approximation and projection (UMAP) of single-nucleus transcriptomes. f, Rank of each known Taxol gene by PCC to TDS using either bulk (n = 79 samples) or mpXsn (n = 17,143 cells across 3 experiments) data.

The missing components of the Taxol pathway have probably eluded scientists because of the metabolic and genomic complexity of Taxus. Taxol is one of almost 600 taxanes that have been isolated from Taxus species7,21, including hundreds of 6/8/6-taxanes that differ from Taxol only by subtle tailoring modifications7 (Fig. 1b). Within the Taxus genome, Taxol pathway genes are a minute fraction of the hundreds of oxidases, acyltransferases and other enzymes that are involved in the primary and secondary metabolism5 (Fig. 1c). Consequently, despite extensive transcriptional profiling of Taxus tissues5,22 (see also the Yew Genomics Resource; http://langelabtools.wsu.edu/ygr/), conventional transcriptional co-expression approaches have not identified the complete Taxol biosynthetic gene set. We hypothesized that an updated approach, which captured a much larger diversity of transcriptional states, would be required to improve co-expression resolution and discriminate between different branches of metabolism (Fig. 1d). Such improvements in resolution would be especially important if the Taxol pathway uses unanticipated genes.

To improve gene-association resolution within the taxane metabolic network, we developed a single-nucleus approach to efficiently profile Taxus cell states across a vast set of cell types and perturbations. Differential transcriptional activation of Taxus’s diverse biosynthetic processes enabled the discovery of several Taxol transcriptional modules, from which we identified eight new genes in Taxol biosynthesis. Highlighting the importance of this approach, most identified genes do not belong to previously proposed Taxol gene families, and would not have been anticipated from the proposed biosynthetic model. None of the three oxidases identified here belong to the cytochrome P450 CYP725A subfamily that has been the focus of previous search efforts11,12,13,14,15. Three other enzymes catalyse the addition and removal of cryptic acetylations absent in Taxol, but their inclusion is essential for progression through the pathway, akin to the protection–deprotection strategy used by chemists. Finally, we identified a protein from the NTF2-like family, not previously implicated in plant metabolism, that is crucial for high yields of the reconstituted pathway by resolving the inefficiency during the first taxane oxidation17,18,19,20. With these 8 genes, together with 11 that have been described previously, we constructed a total biosynthesis for the direct Taxol precursors baccatin III and 3’-N-debenzoyl-2’-deoxypaclitaxel in N. benthamiana. Without further optimization, our system heterologously produces baccatin III, the industrial semi-synthesis precursor for taxane therapeutics, at levels comparable with the natural abundance in yew, showcasing its tremendous potential as a sustainable source.

Multiplexed perturbation improves resolution

First, we developed a single-nucleus RNA sequencing (snRNA-seq) protocol for Taxus (Methods) and assessed whether natural cell-type heterogeneity alone would be sufficient to identify new Taxol genes by co-expression. However, after profiling 6,077 cells from unelicited (‘naive’) mature tissues, we observed that several Taxol biosynthetic enzymes, including TDS, T5αH and 10-deacetylbaccatin III-10-O-acetyl transferase (DBAT), were not highly expressed in any cell (Extended Data Fig. 1a). This exemplifies one of the core challenges of using transcriptomes to find secondary-metabolism genes: tissues must be in a state of active biosynthesis to capture pathway-associated transcripts23, but the search for such a tissue state can be difficult, because it might require a specific developmental age24 or exposure to a specific biotic stressor23.

To mitigate the difficulty of identifying biosynthetic cell states by individually testing large panels of perturbations, we made use of the scale of single-cell transcriptomics to develop a method we term multiplexed perturbation × single nuclei (mpXsn), which simultaneously tests hundreds of perturbations. Although single-cell transcriptomics have been developed into parallelized screens in mammalian systems25,26,27, no comparable platform was available for plants. Unlike most large-scale perturbation technologies, we designed mpXsn to require no genetic tools, so that it would be generalizable to diverse, non-model species.

By pooling diverse tissues and conditions before a single snRNA-seq library synthesis step, individual sample processing is no longer limiting. This approach enabled us to affordably test a large number of samples (272), spanning conditions and time points, in a single experiment (Fig. 1e). To maximize the probability of activating biosynthetic states, we compiled a panel of hormones, microorganisms and other potential elicitors (Supplementary Table 2). We subjected both young and mature Taxus media needles to this panel for one to four days before pooling all tissues and time points for snRNA-seq library synthesis. Compared with the naive cell states we originally profiled, a subset of the elicited cell states now exhibited high expression of the early Taxol pathway (Extended Data Fig. 1a).

To determine whether these single-cell transcriptional data provide new information to identify Taxol enzymes, we directly compared co-expression analyses using either our mpXsn data or bulk RNA-seq data from six previous studies spanning tissues and elicitation conditions (Methods). Using either bulk RNA-seq datasets (79 samples) or the mpXsn data (2,901 pseudobulk cell states), we ranked each gene in the Taxus genome by Pearson correlation coefficient (PCC) to TDS (Fig. 1f). Of the 14 genes previously associated with Taxol biosynthesis (Supplementary Table 1), all but 2 ranked higher in the mpXsn analysis than in the bulk RNA-seq analysis (Fig. 1f and Extended Data Fig. 1b–d). This suggested that the Taxol pathway was better resolved in the mpXsn dataset than in compiled bulk RNA-seq datasets.

Identification of three Taxol biosynthetic modules

The Taxol pathway has been hypothesized to involve 19 transformations, and at least 13 enzymes have been characterized (Fig. 2a and Supplementary Table 1). Although the expression of the first enzyme in the pathway, TDS, does correlate with most Taxol genes (Fig. 1f and Extended Data Fig. 1c–e), some known Taxol genes show stronger co-expression relationships with one another, forming distinct subclusters of co-expression (Fig. 2b). These subclusters prompted us to analyse the mpXsn dataset with an untargeted approach to identify gene co-expression modules across the T.media transcriptome.

Fig. 2: Identification of taxane biosynthetic gene modules.
figure 2

a, Schematic of Taxol biosynthesis and previously hypothesized gene order. Blue, previously identified Taxol biosynthesis enzymes; red, hypothesized enzymes. b, PCC between known Taxol-related genes using mpXsn data. To identify substructures, genes were hierarchically clustered (SciPy fcluster, Euclidean distance) on both axes. c, Schematic for matrix factorization. mpXsn data were factorized using cNMF28. d, Heat map showing the rank of known Taxol biosynthetic genes in each of the modules produced by matrix factorization. e, As in d, but showing only the three modules enriched in Taxol genes (modules 1, 2 and 3). f, Heat map of Taxol modules, showing module rankings for the two isoprenoid pathways in the primary metabolism potentially upstream of the Taxol biosynthesis. Only the MEP pathway is co-expressed with the first Taxol module, supporting its role in synthesizing Taxol precursors. g, All gene modules ranked by the total number of 2-ODD, P450 and acetyltransferase (AcylT) genes in the top 100 genes of each module. h, Module usage of each cell, which is analogous to gene expression, plotted onto the single-nucleus transcriptomic UMAP. Taxol modules 1–3 are expressed in non-overlapping cell states, and were mainly identified in different experiments. i, Unfiltered lists of the top genes in each module, plotted as module rank and score. Blue, previously identified genes associated with Taxol biosynthesis; red, new biosynthetic genes identified in this study.

To systematically organize genes into co-expressed modules, we factored the gene-by-cell matrix from the mpXsn dataset using a consensus non-negative matrix factorization (cNMF) approach (Fig. 2c) that was previously developed for single-cell datasets28. The modules and corresponding gene scores produced by this approach can reveal patterns of corresponding gene expression that are not readily apparent from linear correlation analysis (Methods). We ran cNMF analysis with different numbers of total modules (range 50–400). In all runs with more than 125 modules, known Taxol genes consistently dominated not one, but three separate gene modules (subsequently called Taxol modules 1, 2 and 3) (Fig. 2d,e,h,i and Supplementary Fig. 1). We proceeded with an intermediate value (200 total modules) for subsequent analysis to avoid over- or under-clustering. The observation that different subsets of Taxol genes rank highly in separate modules, roughly segregating by proposed order of biosynthesis (Fig. 2e and Supplementary Data 1), suggests that Taxol biosynthesis consists of separately regulated transcriptional programs. Furthermore, Taxol module 1 is enriched in genes of the methylerythritol phosphate (MEP) pathway, but not the mevalonate pathway, highlighting a link between primary and secondary metabolism (Fig. 2f). This finding aligns with the current consensus that the MEP pathway supplies precursors for diterpenoids in gymnosperms29.

The enrichment of P450, 2-oxoglutarate dependent dioxygenase (2-ODD) and acetyltransferase genes among the top 100 genes of the 3 Taxol modules (Fig. 2g) suggests that these modules are involved in Taxus secondary metabolism. In addition, these three modules were expressed in different subsets of cells (Fig. 2h). Elicitation was crucial for activating module 1, consisting of the early Taxol pathway, because it was not expressed in the naive young and mature Taxus tissues we profiled (Fig. 2h). Indeed, metabolomic analysis suggests that elicitors such as chitosan and methyl jasmonate lead to increased accumulation of an early pathway intermediate in Taxus needles (Supplementary Fig. 2), in line with previous reports that these are elicitors of gymnosperm stress responses30 and taxane biosynthesis31. An unfiltered analysis of the top genes in Taxol module 1 revealed all the genes that we had previously used to reconstitute the early Taxol pathway18, including TDS, T5αH, taxadien-5α-ol-O-acetyltransferase (TAT), taxane 10β-hydroxylase (T10βH) and DBAT (Fig. 2i). We therefore began our search for the missing components of the Taxol pathway by examining the uncharacterized genes in this highly co-expressed gene module.

Discovery of FoTO1

The first oxidation in the Taxol pathway, by T5αH, is highly inefficient and yields a large set of closely related products, most of which do not seem to be productive pathway intermediates. Numerous studies, including our own, have investigated and attempted to optimize T5αH across various contexts16,17,18,19,20. Previously, we reported18 the de novo reconstitution of the six early steps in Taxol biosynthesis, consisting of TDS, T5αH, TAT, T10βΗ, DBAT and taxane 13α-hydroxylase (T13αΗ), in N. benthamiana; this resulted in the production of 5α,10β-diacetoxytaxadien-13α-ol (4) (Fig. 3a). This required extensive tuning of the expression of T5αΗ, which increased yields of the on-pathway intermediate taxadien-5α-ol (2) (ref. 32) relative to cyclotaxane (OCT; 2’a), iso-OCT (2’b) and other rearranged side products (2’c). Despite our optimization efforts18, these side products still accumulated as the dominant products of T5αH. Notably, OCT-derived products do not naturally accumulate to notable levels in Taxus plants.

Fig. 3: Characterization of FoTO1.
figure 3

a, Early Taxol biosynthetic pathway and the T5αH product divergence. Blue shading highlights the biosynthetic pathway towards Taxol. Diamond indicates the structure is supported by NMR. b, Rank and score of genes in Taxol module 1. c, Bar graph showing the FoTO1-induced fold change in end-products’ peak area of subpathways when transiently expressed in N. benthamiana leaves. Fold change is calculated by quantifying the GC–MS total ion chromatogram (TIC) peak area of compounds 24 and normalizing to the −FoTO1 condition. cytoTDS, cytosolic TDS. Data are mean ± s.d., n = 3 biological, independent leaf samples. Statistical analyses were performed using a two-sided, unpaired Welch’s t-test. d, GC–MS TIC of N. benthamiana leaves transiently expressing the indicated genes. e, Bar graph of total oxidized taxanes for the +T5αH and +T5αΗ+FoTO1 conditions. Data are mean ± s.d. n = 6 biological, independent leaf samples. One asterisk indicates previously characterized; two asterisks indicate characterized in this study. f, Phylogenetic tree of FoTO1 homologues identified by HMMER. The tree was produced with FastTree, rooted with red algae homologues. g, Structural model of FoTO1, generated by AlphaFold3 and aligned with the Arabidopsis thaliana orthologue with FoldSeek. h, Bar graphs showing the integrated peak area of 2 and 2’a when N- or C-terminally truncated FoTO1 is transiently expressed in N. benthamiana leaves together with TDS and T5αH. Data are mean ± s.d., n = 3 biological replicates. i, Quantification of binding between purified T5αΗ and FoTO1 or FoTO1(ΔC) using microscale thermophoresis. Data are mean ± s.d., n = 3 replicates. j, Quantification of binding between purified TDS2 and FoTO1 or FoTO1(ΔC), as in i (n = 3 replicates). N-terminal transmembrane domains of T5αΗ and TDS2 are removed for purification purposes. Data are mean ± s.d., n = 3. k, Immunoblot of the co-IP of T5αH–HA (prey) by V5–FoTO1 (bait) in N. benthamiana leaves expressing both proteins (Supplementary Fig. 7).

We hypothesized that Taxol module 1 contains unanticipated proteins that facilitate this initial oxidation and prevent the formation of side products. During our testing of approximately 77 gene candidates that were highly ranked in Taxol module 1 (Fig. 3b), we found that the addition of a single protein, a NTF2-like protein ranked 13 in module 1, increased the yield of our pathway (Supplementary Fig. 3). We hypothesized that this protein is involved in altering the product flux of the early oxidative steps. Expressing this gene, which we later named FoTO1 (facilitator of taxane oxidation), resulted in a 10–17-fold increase in yields from early Taxol pathways (Fig. 3c). To determine whether FoTO1 ameliorates side-product formation during the first oxidation, we compared gas chromatography–mass spectrometry (GC–MS) analyses of N. benthamiana leaf extracts expressing TDS and T5αΗ with and without FoTO1. Without FoTO1, T5αΗ yields mainly undesired side products, including 2’a–c and many uncharacterized compounds, and very little of the desired product, taxadien-5α-ol (2) (Fig. 3d,e). However, inclusion of FoTO1 markedly alters the product profile: side products 2’a–c are no longer produced, and taxadien-5α-ol (2) and an over-oxidized product, 4β,20-epoxy-taxadien-5α-ol (2’d, Supplementary Note 1), become the major products (Fig. 3d,e and Supplementary Fig. 4).

To gain insight into how FoTO1 modulates T5αH product formation, we further analysed its sequence and phylogeny. FoTO1 is a 195-amino-acid protein in the NTF2-like family, which has not previously been implicated in plant metabolism. Although some fungal NTF2 proteins have evolved catalytic activity33,34, plant and animal NTF2 proteins have been studied mainly for their capacity to mediate protein transport to the nucleus35. Using HMMER36, we identified 1,957 homologues in plant genomic databases. We found that FoTO1 homologues are widespread across Viridiplantae, and generally present as a single copy per genome, suggesting a conserved function (Fig. 3f). Gymnosperms, however, contain multiple paralogues that derive from both an ancient duplication (Fig. 3f) and recent duplications, including in the genus Taxus (Supplementary Fig. 5). These duplications might have allowed one paralogue to evolve alternative functions and contribute to taxane biosynthesis. Supporting this functional divergence, we found that neither the FoTO1 paralogue from T. media nor the homologue from Arabidopsis thaliana could produce the same metabolomic change for early Taxol pathway reconstitution as FoTO1 (Fig. 3g and Extended Data Fig. 2).

We anticipated that FoTO1 could be operating through several mechanisms, including (i) scaffolding or allosteric support of Taxol enzymes, (ii) transport or positioning of taxane intermediates or (iii) enzymatic resolution of an unstable intermediate. FoTO1 does not affect the production of taxadiene (1) and iso-taxadiene by TDS1 or TDS2 (Extended Data Fig. 3a), suggesting that it has no enzymatic activity on taxadiene (1). To further test for potential active sites that could be important for catalysis or substrate binding, we generated mutations of residues within the protein’s cavity on the basis of the AlphaFold-predicted structure of FoTO1 (Fig. 3g and Extended Data Fig. 3b–d). Although none of the tested amino acid substitutions caused FoTO1 to lose its capacity to suppress oxidation side products (Extended Data Fig. 3b–d), deletion of the C-terminal α-helix, but not the N-terminal helix, eliminated FoTO1’s in planta phenotype (Fig. 3g,h).

To determine whether the function of FoTO1 involved a direct interaction with Taxol enzymes and perhaps a scaffolding role, we purified FoTO1 (fused to N-terminal mTurquoise2) and the soluble portions of TDS2 and T5αΗ. Using microscale thermophoresis, we found that FoTO1 binds to both T5αΗ and TDS2 with high nanomolar dissociation constant (Kd) values (Fig. 3i,j). Deletion of the C-terminal helix, which disrupts FoTO1’s in planta metabolic phenotype (Fig. 3g), also eliminated binding affinity with both proteins (Fig. 3i,j), suggesting that this region is involved in protein–protein interactions. To determine whether this physical interaction was physiologically relevant, we conducted co-immunoprecipitation (co-IP) of epitope-tagged proteins, V5–FoTO1 and T5αΗ–HA, co-expressed in N. benthamiana leaves (Fig. 3k). Immunoprecipitation of V5–FoTO1 using a V5 antibody was able to capture T5αΗ–HA (Fig. 3k and Supplementary Figs. 6 and 7). Furthermore, the T5αΗ–HA co-IP signal increased fourfold when leaf lysates were incubated with 45 μM taxadiene (1), but not when they were incubated with a mock terpene (caryophyllene) (Fig. 3k and Supplementary Figs. 6 and 7).

Together, these data support a mechanism involving a direct interaction between FoTO1, T5αΗ and possibly TDS. The influence of taxadiene (1) on the efficiency of co-IP of T5αΗ and FoTO1 could indicate that the metabolite has a direct role in this interaction. Although TDS, T5αΗ and FoTO1 localize to different subcellular regions—plastid, endoplasmic reticulum (ER) membrane and cytoplasm, respectively (Supplementary Fig. 8)—direct contacts between the outer lamina of these membranes are known sites of lipid trafficking37 and the biosynthesis of other diterpenoids such as gibberellin38.

Independently evolved T9αHs for different pathways

The presence of FoTO1 and all biosynthetic enzymes for 4 in gene module 1 suggested a coordinated regulation of early Taxol biosynthetic enzymes, and an opportunity to discover missing Taxol enzymes from this module (Fig. 3b). The next oxidation after C-5α, C-10β and C-13α hydroxylation is proposed to be the C-9α hydroxylation. In N. benthamiana, we screened 2-ODD and P450 oxidases within the top 50 genes of module 1 by co-expressing candidate genes in batches together with the upstream pathway to 4, a taxane with three oxidation and two acylation modifications (3O2A; hereafter, the nomenclature nOmA describes the collection of taxane isomers bearing n oxy groups, for example, hydroxylation and epoxidation, and m acylations, for example, acetylation and benzoylation, on the taxadiene scaffold). We found that the 27th gene in module 1, a P450 in the CYP750C family (T9αH-750C), resulted in depletion of 3O2A (4) and concurrent production of a mass corresponding to a new 4O3A (5) intermediate with an additional oxidation and an acetylation on 4 (Fig. 4b). This additional acetylation was unexpected given that the acetyltransferases expressed, TAT and DBAT, were known mainly for acetylation activity on the C-5 and C-10 hydroxyls, respectively. To confirm the function of T9αH-750C and to provide support for the structural assignment of 5, we co-expressed the biosynthetic genes to 5 with TAX19, the previously characterized C-13α-O-acetyltransferase39, and isolated the resulting acetylated 4O4A products from N. benthamiana leaves. Through nuclear magnetic resonance (NMR) analysis and tandem mass spectrometry (MS/MS) comparison with a standard, we structurally characterized the products as taxusin (6) and its isomer 13β-taxusin (6’) (Fig. 4b, Supplementary Table 3 and Extended Data Fig. 4a,b). This structural analysis supports the role of T9αH-750C as a taxane 9α-hydroxylase (T9αH) and TAT as a bi-functional C-5α/9α-O-acetyltransferase (Extended Data Fig. 4c and Supplementary Note 2).

Fig. 4: Discovery and characterization of T9αH, T7AT and two T1βΗs.
figure 4

a, Heat map showing the ranks of new T9αH and T1βHs and other Taxol biosynthetic genes in the three modules. T9αH-725A is the T9αH independently reported by other groups12,13,14. Black, previously known Taxol genes. Orange, new Taxol pathway genes identified in this figure. Red, Taxus enzymes proposed to act in other taxane pathways. b, Proposed biosynthetic pathway from 4, the latest intermediate we reported recently18, to taxusin (6), and the corresponding extracted ion chromatograms (EICs) of products 46 when the indicated sets of genes were expressed in N. benthamiana leaves. Blue shading highlights the biosynthetic pathway towards Taxol, which only involves 4 and 5. c, Proposed biosynthetic pathway from 5 to baccatin VI (11-Bz), and the corresponding EICs of intermediates when the indicated sets of genes were expressed in N. benthamiana leaves. Blue shading highlights the biosynthetic pathway towards Taxol. Structures of 5, 7-Ac, 7-Bz, 8-Ac, 8-Bz, 9-Ac, 9-Bz, 10-Ac, 10-Bz and 11-Ac are proposed on the basis of the functions of enzymes previously characterized (TAT, TAX19, T2αH, TBT, T7βH and TOT) and described in this study (Τ9αH-750C, T7AT and T1βH). TAX19 is used to generate known 13-O-acetylated products, including taxusin (6) and baccatin VI (11-Bz), for structural analysis. Diamond indicates the structure is supported by NMR. d, EICs of expected products when the pathway to 6O4A is expressed with TOT, T7AT or both in N. benthamiana. In the absence of T7AT, no notable 6O4A depletion or product formation by TOT is observed. e, MS/MS fragmentation patterns of heterologously produced baccatin VI (11-Bz) in N. benthamiana compared with that of 11-Bz standard. MS/MS fragmentations were generated using [M+Na]+ (m/z = 737.2788) as the precursor ion and fragmented with a collision energy of 30 eV.

Several groups have independently reported that a CYP725A P450, with less than 20% identity at the protein level to T9αH-750C, is the T9αH in Taxol biosynthesis (referred to here as T9αΗ-725A, for distinction). This enzyme was identified by transcriptome-informed screening of CYP725A genes, a Taxus-specific enzyme family that includes all previously known Taxol P450s12,13,14. Although both T9αH-750C and T9αΗ-725A can act as T9αH to produce taxusin (6) when TAX19 is present, only T9αΗ-750C can deplete 3O2A to yield 4O3A (Fig. 4b, Supplementary Fig. 9 and Extended Data Fig. 4a,b). By contrast, T9αΗ-725A is unable to produce 4O3A and seems to require a substrate with a C-13α acetoxy group. Extremely low sequence conservation between these enzymes suggests the independent evolution of T9αH activities in two distinct P450 families (CYP725A and CYP750C), and their different substrate specificities suggest that T9αΗ-725A is involved in the biosynthesis of C-13α-acetoxyl taxanes, whereas T9αH-750C is involved in the biosynthesis of C-13α-hydroxyl taxanes. Because Taxol and its precursor baccatin III (16) lack the C-13α-acetoxy moiety that is required for T9αH-725A, we used T9αH-750C, instead of T9αH-725A, as the 9α-hydroxylase for all subsequent taxane pathway reconstitutions (Fig. 4c and Supplementary Fig. 10).

Discovery of T7AT and two T1βHs

The missing C-1β hydroxylation in Taxol is proposed to occur after functionalization by several known mid-pathway enzymes: taxoid 2α-hydroxylase (T2αH) and 7β-hydroxylase (T7βH)40,41, taxane 2α-O-benzoyltransferase (TBT)42 and taxane oxetanase (TOT)11,12,13. When we co-expressed T2αΗ and TBT with the pathway to 5, we detected a mass feature corresponding to the expected hydroxylated and benzoylated product (7-Bz) (Fig. 4c). Notably, we also detected a mass feature corresponding to the acetylated product (Fig. 4c), suggesting that TBT also catalyses acetylation. Testing of TBT with 1-hydroxybaccatin I and baccatin VI showed that TBT mediates the interconversion between C-2α-benzoyl and acetyl groups (Supplementary Fig. 11), consistent with a previous report43, and this might explain the prevalence of C-2α-acetoxy taxanes in nature7,21.

Expression of T7βH with the 5O4A gene set yielded the expected benzoylated and acetylated 6O4A products, but subsequent addition of TOT did not convert these 6O4A compounds to a hepta-oxidized (7O) product (Fig. 4d). Given that many of the highly oxygenated taxanes with an oxetane or epoxide moiety originating from TOT activity are also C-7β-Ο-acetylated21 (for example, baccatin I, baccatin IV and baccatin VI; Supplementary Fig. 12), we reasoned that the installation of a C-7β-acetoxy group might be a prerequisite for TOT function. Therefore, we screened acyltransferase candidates from the top 30 genes of module 2 and found that the 15th gene in module 2, which we named taxane C-7β-O-acyltransferase (T7AT), was capable of various C-7β-O-acylations, including acetylation (Supplementary Fig. 13). Expression of T7AT with the upstream 6O4A pathway resulted in the acetylated and benzoylated 6O5A products, 8-Ac and 8-Bz, respectively (Fig. 4c). In contrast to 6O4A products (7-Ac and 7-Bz), these 6O5A products are depleted after the addition of TOT, yielding dominant mass features that correspond to the hepta-oxidized products 9-Ac and 9-Bz (Fig. 4c,d). These major peaks (9-Ac and 9-Bz) are likely to correspond to the non-interchangeable epoxide and oxetane products generated by TOT (refs. 11,12,13). T7AT was independently reported in a recent publication, but its importance for TOT function has not been described13.

After reconstituting a pathway to 9-Ac and 9-Bz (7O5A), we screened 37 oxidases (2-ODDs and P450s) of module 2 to identify the missing C-1β hydroxylase. This revealed two 2-ODDs (2-ODD184 and 2-ODD686, protein sequence identity 72%), the 20th and 24th genes of module 2, that yielded multiple mono-oxidized products when expressed with various upstream pathways (Fig. 4c, Supplementary Fig. 14 and Extended Data Fig. 5a–d). Expression of the 7O5A pathway and TAX19 (C-13α-O-acetyltransferase) with either 2-ODD184 and 2-ODD686 resulted in the production of baccatin VI (11-Bz), as confirmed by MS/MS comparison with the standard (Fig. 4c,e) as well as several 1β-hydroxybaccatin I isomers (11-Ac peaks; Supplementary Fig. 15). This result suggests that either 2-ODD can function as the missing taxane 1β-hydroxylase (T1βH), because all other functional groups in baccatin VI (11-Bz) can be explained by other enzymes included in the reconstitution. Of note, both 2-ODDs resulted in two major products with the 2α-O-acetylated pathways but only one major product with the 2α-O-benzoylated pathways (Fig. 4c and Extended Data Fig. 5a–d). Among the two major products in the 2α-O-acetylated pathways, one is presumably the 1β-hydroxylated product, but the other remains unidentified. Therefore, we isolated the two products from 2-ODD184 co-expressed with the taxusin (6) pathway in N. benthamiana and structurally characterized them as 1β-hydroxytaxusin (6-Ο1) and its structural isomer, 15-hydroxy-11(15→1)abeo-taxusin (6-Ο2) (Supplementary Tables 46 and Extended Data Fig. 5). We propose that the non-classical 11(15→1)abeo-taxane scaffold arises from a radical rearrangement associated with 2-ODD-mediated 1β-hydroxylation (Extended Data Fig. 5e). These data support a role for these 2-ODDs in the C-1β hydroxylation, and thus we hereafter refer to them as T1βH-184 and T1βH-686.

Together, these results reveal the discovery of an independently evolved T9αΗ, a T7AT important for the function of TOT, and two T1βHs, which allow us to reconstitute the biosynthesis of highly oxygenated taxanes (10-Αc and 10-Bz) and their C-13α-acetoxy counterparts (11-Αc and 11-Bz). T1βH-686 results in significantly higher levels of the 2α-O-benzoylated product 10-Bz than does T1βH-184, which is desirable for Taxol production; thus, all subsequent pathway reconstitution was done with T1βH-686.

Deacetylases and T9ox enable baccatin III biosynthesis

The structure of baccatin III (16), the direct precursor to Taxol before side-chain installation, suggests that it requires nine oxidations (seven hydroxylations, one oxetane formation and one ketone formation) and three acylations (two acetylations and one benzoylation) on the taxadiene (1) scaffold. Our latest intermediate 10-Bz has two additional acetylations that are not found in baccatin III (16), and is lacking a C-9 ketone oxidation (Fig. 5a). The two additional acetylations come from (i) TAT that promiscuously O-acetylates the C-9-hydroxyl (Supplementary Note 2), and (ii) T7AT that O-acetylates the C-7-hydroxyl, which is crucial for the function of TOT (Fig. 4c,d). Although the role of these additional acetylations is unknown, we considered that they might serve as transient protecting groups during the biosynthesis—a strategy used in the biosynthesis of other plant terpenoids44—and might subsequently be removed by downstream deacetylases. To test for relevant deacetylases and identify the C-9 oxidase, we fed substrate baccatin VI (11-Bz) or 9-dihydro-13-acetylbaccatin III (9DHAB, 13), which structurally resemble our latest intermediate 10-Bz, to N. benthamiana leaves expressing top candidate genes to screen for desired activities.

Fig. 5: Total biosynthesis of baccatin III (16) and 3’-N-debenzoyl-2’-deoxypaclitaxel (17) in N. benthamiana.
figure 5

a, Simplified representation of biosynthetic transformations (acetylation and oxidation) from taxadiene (1) to baccatin III (16). b, Proposed biosynthetic pathway from 10-Bz to Taxol (18). c, EICs of intermediates when the indicated sets of genes were expressed in N. benthamiana leaves. d, MS/MS of heterologously produced 16 compared with that of 16 standard. e, Heat map showing ranks of PCL candidates in Taxol gene modules. f, EICs of 16 and 17 from feeding 16 to N. benthamiana leaves expressing PAM, PCL and BAPT or from expressing the complete gene set. g, Ranks of new Taxol biosynthetic genes discovered in this paper by PCC to TDS using either bulk (n = 79 samples) or mpXsn (n = 17,143 cells across 3 experiments) data. Orange indicates anticipated discovery; pink indicates unexpected discovery. h, Heat map showing ranks of updated Taxol biosynthetic genes in the modules. i, Baccatin III (16) yields in μg per g dried weight (DW), quantified with standards (Methods), of our 17-gene pathway with each single-gene dropout tested in N. benthamiana. Yields from replicating published gene sets12,14 and from T. media needles are shown for comparison. Data are mean ± s.d., n = 3 biological replicates (Supplementary Data 2). j, EIC showing the proposed taxadien-5α,13α-diol produced by T13αH and facilitated by FoTO1 (Extended Data Fig. 8). Over-oxidized derivative and TAX19-acetylated product are confirmed by NMR (diamond). k, Phylogenetic tree of Taxus P450s. See Extended Data Fig. 9. l, Single-cell expression of T5αΗ and T13αH. m, Taxol biosynthesis (blue shading) and the reconstituted 13α-acetoxy taxane biosynthesis. Structures assigned on the basis of high-resolution mass spectrometry (HRMS) (HRMS-predicted chemical formula), NMR (NMR analysis followed by purification) and/or HRMS/MS (MS/MS spectra comparison with authentic standards). A detailed biosynthetic pathway is shown in Extended Data Fig. 10.

Screening of 27 α/β-hydrolase candidates from modules 2 and 3 revealed two deacetylases (DeAc898 and DeAc1023) capable of the stepwise removal of two acetyls from baccatin VI (11-Bz) to yield a product matching a 9DHAB (13) standard (Supplementary Fig. 16), which lacks the C-7 and C-9 O-acetyl groups, as supported by MS/MS analysis (Supplementary Fig. 17a and Extended Data Fig. 6a). Expressing the full baccatin VI (11-Bz) pathway with the two deacetylases also resulted in the production of 9DHAB (Extended Data Fig. 6b). Screening of oxidase candidates identified a putative taxane C-9-oxidase (T9ox) in the 2-ODD family capable of oxidizing 9DHAB with the loss of two protons, presumably through ketone formation at C-9 (Supplementary Fig. 18). This gene has been independently reported by another group14. When we combined T9ox, DeAc898 and DeAc1023 with our 14-gene pathway to 10-Bz, an abundant and notable new taxane product formed (Fig. 5b,c, Supplementary Fig. 17b and Extended Data Fig. 7a), which was subsequently isolated and confirmed as baccatin III (16), on the basis of NMR and MS/MS analysis compared with a standard (Fig. 5d and Extended Data Fig. 7b). Stepwise assembly of the final steps of the pathway revealed that DeAc898 is a prerequisite for T9ox, suggesting that this enzyme hydrolyses the C-9 acetyl group (Fig. 5c). Consequently, we renamed DeAc898 and DeAc1023 as taxane 9α-O-deacetylase (T9dA) and taxane 7β-O-deacetylase (T7dA), respectively.

Discovery of PCL for side chain biosynthesis

The side-chain installation and maturation of Taxol from baccatin III (16) is proposed to involve phenylalanine aminomutase (PAM), β-phenylalanine-CoA ligase (PCL), baccatin III:3-amino-3-phenylpropanoyl transferase (BAPT), taxane 2’α-hydroxylase (T2’αΗ) and 3’-N-debenzoyl-2’-deoxypaclitaxel-N-benzoyl transferase (DBTNBT)22,45 (Fig. 5b and Supplementary Table 1). Although all five enzymes have been separately reported, we failed to observe Taxol when attempting to reconstitute complete biosynthesis by co-expressing these reported enzymes with our pathway in Nicotiana. Specifically, despite previous reports14,22,46 that two separate Taxus acyl-activating enzymes (AAEs) can act as PCL, neither resulted in the production of the expected 3’-N-debenzoyl-2’-deoxypaclitaxel (17) when expressed with PAM and BAPT with fed-in baccatin III (16) in our system. To identify the missing PCL, we examined the top AAEs in the Taxol expression modules (Fig. 5e), which revealed a single prominent candidate: AAE-867.5. Co-expression of AAE-867.5, PAM and BAPT with fed-in baccatin III (16) or our baccatin III pathway in planta resulted in a mass corresponding to 3’-N-debenzoyl-2’-deoxypaclitaxel (17) (Fig. 5f), indicating that AAE-867.5 is the functional PCL. This constitutes the first (to our knowledge) de novo biosynthetic production of the late-stage paclitaxel precursor 3’-N-debenzoyl-2’-deoxypaclitaxel (17).

Similarly, although T2’αΗ and DBTNBT have been previously reported45,47, we were not able to produce Taxol (18) when we added them to our pathway to 17. However, when DBTNBT was expressed with PCL, PAM and DBAT with fed-in baccatin III (16) we detected a mass corresponding to 2’-deoxypaclitaxel at very low levels (Supplementary Fig. 19). This result suggests that DBTNBT might be functional but requires upstream 2’α-hydroxylation by T2’αH to react efficiently, consistent with previous specificity measurements48.

Expression and essentiality of Taxol genes

Of the eight Taxol genes discovered in this work (Supplementary Tables 1 and 11), few were clear candidates for the pathway when performing co-expression analysis using bulk RNA-seq datasets (Fig. 5g). Correlation with TDS ranked these genes as 1,000th–10,000th priority when using bulk RNA-seq data, but within the top tens to hundreds when using mpXsn data (Fig. 5g), showing that our mpXsn strategy was crucial for efficient gene discovery. Furthermore, the transcriptional modules of these genes provide insights into the organization of the biosynthetic pathway. When ordered by presumed biosynthetic order, the early pathway forms an especially discrete cluster in Taxol module 1 (Fig. 5h), suggesting that the pathway up to 4O3A (5) is controlled by separate transcriptional regulation from the later pathway. The latter pathway is not as clustered by order, suggesting that either our presumed biosynthetic order does not reflect the real order, or the pathway is regulated at various post-transcriptional stages.

For our 17-gene reconstituted baccatin III (16) pathway, dropping most of the genes completely abolishes the production of baccatin III (16), with the exception of FoTO1, T5αΗ, DBAT and T9ox (Fig. 5i and Supplementary Figs. 20 and 21). Of these four genes, only T5αΗ can be left out without a substantial loss of baccatin III yield. Moreover, our 17-gene pathway yields levels of baccatin III (16) that are considerably higher than those obtained with two previously published gene sets12,14 (Fig. 5i and Supplementary Figs. 20 and 21), both of which did not produce baccatin III above the limit of detection in our system (Supplementary Fig. 20). The low yield of these enzyme sets is likely to be due to the absence of several biosynthetic genes identified here, the functions of which might be partially compensated for by multi-functional Taxus enzymes15 or endogenous host enzymes in other systems. Finally, exchanging the T9αH-750C discovered in this work for a recently reported12,13,14 alternative T9αH-725A also yielded no detectable baccatin III (16) in our system (Supplementary Fig. 21 and Extended Data Fig. 7c), as would be anticipated because of T9αH-750C’s specificity requirement of a C-13α acetoxy that is absent from Taxol and our pathway (Fig. 4b).

Notably, our dropout experiment revealed that T5αH is not essential in our reconstituted 17-gene baccatin III (16) biosynthesis (Fig. 5i and Supplementary Figs. 20 and 21). Despite the widely accepted model that T13αΗ is the second oxidase after T5αH, we found that T13αH can directly oxidize taxadiene (1), producing multiple oxidized products, some of which are the same as those formed by T5αH, including OCT (2’a), iso-OCT (2’b) and 2’c (Extended Data Fig. 8a). This prompted us to further investigate whether T13αH can compensate for T5αΗ. Similar to its effect on the product profile of T5αH, FoTO1 also streamlines T13αΗ’s products, eliminating 2’a–c, and selectively boosting the formation of taxadien-5α,13α-diol and its derivatives (Fig. 5j and Extended Data Fig. 8a–c). We structurally characterized an over-oxidized derivative, 4α,20-epoxy-5α-hydroxy-taxadien-13-one (Supplementary Note 3 and Supplementary Tables 8 and 9), and a TAX19 derivative, 5α,13α-diacetoxy-taxadiene (Extended Data Fig. 8d and Supplementary Table 10) by NMR, confirming that T13αΗ can compensate for T5αΗ by catalysing both 5α- and 13α-hydroxylation (Fig. 5j), consistent with the previously reported multifunctionality of T13αΗ (converting taxadiene (1) to taxadien-5α,10β,13α-triol)12. This exemplifies a repeated observation of enzyme functional overlap, including pairs of differentially expressed C-5-hydroxylases (T5αH and T13αΗ; Fig. 5k,l), TDSs, T9αΗs, T1βHs and C-5-O-acetyltransferases, during our dissection of the Taxol biosynthetic pathway (Extended Data Fig. 9).

Discussion

In this investigation, we have identified eight new genes in the Taxol biosynthetic pathway, and used them to build 17-gene and 20-gene pathways for the de novo biosynthesis of baccatin III (16) and 3’-N-debenzoyl-2’-deoxypaclitaxel (17), respectively (Fig. 5m and Extended Data Fig. 10), in N. benthamiana. This updates our long-held model of Taxol biosynthesis: two additional acetylations seem to be necessary for the functions of intermediate oxidases, and downstream deacetylations by two deacetylases furnish the baccatin III (16) end-product (Extended Data Fig. 10). Future engineering or discovery of the final oxidase, T2’αΗ, would enable de novo total biosynthesis of Taxol. Two of the new enzymes, T7AT and T9ox, have recently been independently reported by other groups13,14 (Supplementary Table 1). Without optimization, our reconstituted 17-gene pathway yields 10–30 μg g−1 baccatin III (16) in N. benthamiana leaves, equivalent to its natural abundance in T. media needles (Extended Data Fig. 7d). Because baccatin III (16) extracted from Taxus is the main precursor for industrial semi-synthesis, our work represents a major step towards the sustainable production of Taxol and other taxane-based therapeutics.

Our discovery of FoTO1 and its interaction with T5αH challenges a long-standing assumption that T5αH is the sole actor in the first oxidative steps of the Taxol pathway. Although most P450s require interaction with a cytochrome P450 reductase partner, few are known to associate with scaffold proteins. In Arabidopsis, membrane steroid-binding proteins have been identified as the scaffolds for three lignin biosynthetic P450s on the ER membrane49. More recently, a cellulose synthase-like protein, GAME15, was discovered to be an ER-localized scaffold for two P450s and one 2-ODD in the biosynthesis of steroids in Solanaceae50,51. Together with our work, this hints at a broader role for scaffolding proteins in plant secondary metabolism (Supplementary Note 4).

The identities of the new Taxol genes shed light on why they remained unknown for so long. Pathway reconstitution required unanticipated gene families (for example, NTF2-like proteins, 2-ODDs and deacetylases) and functionalizations (for example, acetylation and deacetylation). Moreover, ‘red herring’ enzymes further complicated pathway dissection, including: (i) enzymes that diverged biosynthesis towards other classes of taxanes (for example, abeo-taxanes and 13α-acetoxy taxanes; Extended Data Fig. 10) and (ii) homologues with relevant activity, but with substrate specificities incompatible with our Taxol pathway (for example, T9αHs and T1βHs). These challenges, exacerbated by the sheer quantity of specialized metabolism enzymes in Taxus, were overcome by developing a scalable transcriptomic strategy which prioritized candidates with improved specificity (Figs. 1f and 5g). Our mpXsn strategy uses scalable profiling of cell states with differentially perturbed biosynthetic processes, enabling better discrimination between their defining gene sets. In addition to gene discovery, mpXsn data provide biological context, including links with primary metabolism (Fig. 2f) and the partitioning of Taxol enzymes into modules that seem to be separately regulated.

Beyond Taxol biosynthesis, mpXsn will be useful for studying gene sets of interest in other non-model organisms. In mammalian systems, single-cell techniques with parallelized genetic25 or chemical26,27 perturbation experiments have enabled the de-orphaning of genes and dissection of gene networks. However, most organisms and biological systems lack genetic interrogation tools, and thus researchers rely heavily on observational experiments, such as transcriptomics and other ‘omics’. Eukaryotes, especially, pose major challenges for functional genomics and gene-guided discovery, because they generally lack the comprehensive gene clusters that are found in prokaryotes. The advent of methods such as mpXsn, which affordably capture precise gene covariance across hundreds of transcriptional states, might help to overcome this long-standing challenge in functional genomics.

Methods

Chemical and biological materials

Chemical standards were purchased from the following vendors (with catalogue number listed): taxusin (TargetMol; TN6763), 1-hydroxybaccatin I (LKT Labs; T0092), baccatin VI (Santa Cruz Biotechnology; sc-503244), 10-deacetylbaccatin III (Sigma-Aldrich; D3676), baccatin III (MedChemExpress; HY-N6985) and 9-dihydro-13-acetylbaccatin III (TargetMol; T5132). Taxadien-5α-ol was synthesized as previously described18. Taxus media var. hicksii was obtained from FastGrowingTrees.

Tissue preparation and single-nucleus sequencing

The cells of Taxus species, like those of many plants, are often two to three times larger than the 35-µm diameter limit for the standard 10x Genomics Chromium single-cell library devices. Consequently, single-cell isolation approaches (such as protoplasting) risked introducing a severe cell-type bias, and we instead adapted previously described nuclei isolation methods52 into a conifer-compatible snRNA-seq protocol. Taxus media var. hicksii aerial tissues (needles, stems and bud scales) were manually disrupted by razor blade and detergent treatment, followed by DNA staining, fluorescence-activated cell sorting (FACS) purification and library synthesis in the 10x Chromium platform. Nuclei extraction buffer (NIB) consisted of 5 mM MgCl2, 10 mM HEPES pH 7.6, 0.8 M sucrose, 0.1% Triton X-100 and (for density matching to prevent nuclei settling during flow sorting) 1% dextran T40 and 2% Ficoll. On the day of use, NIB was supplemented with 1 mM dithiothreitol. All nuclei-extraction steps were performed at 4 °C and wide-bore pipette tips were used when handling nuclei. Steps between tissue collection and loading into the Chromium device were completed within 90 min to avoid RNA loss. To isolate nuclei, approximately 1 g of T. media tissue was removed from the plant and immediately placed in a Petri dish with 10 ml NIB. Tissue was chopped by hand at around 200 rpm with a fresh razor blade for 5 min until most of the large tissue was broken down, and was then gently rocked at 4 °C for 15 min. To remove large debris, disrupted tissue was then passed through a pre-wet 100-μm cell strainer stacked on top of a 40-μm cell strainer. Nuclei were gently pelleted at 300g at 4 °C for 5 min and resuspended in 1 ml NIB with 5 ng μl−1 4,6-diamidino-2-phenylindole (DAPI, Thermo Fisher Scientific) and 5 ng μl−1 propidium iodide. Using a Sony SH800 cell sorter with a 70-μm chip, 140,000–200,000 nuclei were sorted into a tube containing 1 ml PBS+ (PBS, 0.1% bovine serum albumin and 20 U ml−1 Invitrogen ribonuclease inhibitor). The gating strategy is shown in Supplementary Fig. 22. Nuclei were centrifuged at 300g at 4 °C for 5 min, then gently resuspended in 40 μl PBS+. Nuclei were immediately loaded onto a 10x Genomics Chromium controller and libraries were generated using v3 chemistry. Libraries were sequenced on an Illumina NextSeq 3000.

Multiplexed tissue elicitation

For the multiplexed elicitation experiment, Taxus needles were subjected to perturbation in deep-well 96-well plates with 200 μl MS medium (7.5 g l−1 Murashige and Skoog macronutrients (Fisher), 3 g l−1 sucrose, pH 5.7) supplemented with elicitor. Two needles (biological replicates), each from two developmental stages (young and mature), were treated with each elicitation condition (17 conditions listed in Supplementary Table 2) for each time point (1, 2, 3 and 4 days), resulting in 272 tissue samples (2 replicates of 136 perturbations). To minimize contamination, needles were washed thoroughly in sterile water before moving to MS plates, which were sealed with breathable rayon film (VWR) and placed under 18-h light cycles. Tissue elicitation was started at staggered times so that all tissues could be collected simultaneously. To extract nuclei from elicited tissues, all tissues were combined in a wire mesh, washed with water and subjected to the above nuclei-extraction protocol.

Analysis of single-cell data

Reads were cleaned with Trimmomatic53 and mapped to the genomes of T. chinensis5 with STARsolo (v.2.7.10b)54 (STAR…–runThreadN 32–alignIntronMax 10000–soloUMIlen 12–soloCellFilter EmptyDrops_CR–soloFeatures GeneFull–soloMultiMappers EM–soloType CB_UMI_Simple). Ambient RNA was removed with CellBender (v.0.3.0)55. Using the doubletdetection (v.4.2) library56, doublets were removed, as well as cells with outlier numbers of reads or in which most reads were the most expressed genes (pct_counts_in_top_20_genes < 25). Genes were removed from analysis if expressed in fewer than 50 cells. For integrated UMAP plots, scVI was used to integrate cells from multiple single-cell experiments57. Scanpy (v.1.10.1)58 was used for processing and plotting post-filtered nuclear transcriptomes. For co-expression analysis and gene–gene correlation calculations, scVI-normalized57 transcriptomes (8,039 elicited transcriptomes, 3,027 naive transcriptomes from young tissues and 6,077 naive transcriptomes from mature tissues) were clustered into 2,901 cell states (around 10 cells per state) by Leiden clustering58, and then raw reads from each cluster were pooled to yield pseudobulk transcriptomes. These pseudobulk transcriptomes were used to calculate gene–gene correlations. For module analysis, raw reads were analysed by a cNMF package28 run with default parameters, except ‘total modules’) to yield gene modules and their usage across cells. Factorization approximates the observed dataset as the product of two smaller, meaningful matrices: (i) a gene–module matrix (a weight value for each gene in each module); and (ii) a cell–module matrix (expression values of each module in each cell) (Fig. 2c). The weight values of the gene–module matrix can be used as scores that identify the genes that dominate each module; top-scoring genes from the same module have coordinated expression patterns and are likely to be part of the same molecular processes. This approach adapts to the rich but noisy data inherent in single-cell analysis, and reveals patterns of coordinated gene expression that might not be apparent from linear correlation analysis. For example, it allows for genes to be in multiple, overlapping modules, which is likely to better represent how genes in a highly branched metabolism may be expressed. The ‘total modules’ parameter was scanned from k = 50 to k = 400 to determine the sensitivity of the results on this parameter (Supplementary Fig. 1).

Bulk RNA-seq analysis

Raw fastq files from six previous studies5,59,60,61,62,63 were downloaded from NCBI (PRJNA493167, PRJNA251671, PRJNA733140, PRJNA427840, PRJNA497542, PRJNA499080 and PRJNA864083), cleaned with Trimmomatic53 and aligned to the T. chinensis genome5 (STAR map64). Gene–gene correlation was calculated with numpy. Mutual rank (mr), used to calculate the gene linkage maps (Fig. 1d), is defined as:

$${{\rm{mr}}}_{ij}=\sqrt{{{\rm{rank}}}_{ij}\times {{\rm{rank}}}_{ji}},$$

where rankij indicates the Pearson correlation rank of gene i to gene j.

Cloning of Taxus genes

The cloning of cytosolic diterpenoid boost genes (tHMGR and GGPPS), cytosolic TDS1 and TDS2, T5αH, TAT, T10βΗ, DBAT, T13αΗ and TAX19 genes has been described previously18,65. Candidate genes were amplified from T. media gDNA or cDNA (generated with SuperScript IV, Thermo Fisher Scientific) by PCR (PrimeStar, Takara Bio R045B, primers in Supplementary Table 13), and the PCR products were ligated with AgeI- and XhoI- (New England Biolabs) linearized pEAQ-HT vector66 using HiFi DNA assembly mix (New England Biolabs). Gene annotations used for cloning were taken from the T. chinensis genome5 by default, but were BLAST-searched against the T. media genome (NCBI PRJNA1136025) to determine whether alternative gene models were available. Constructs were transformed into 10-beta competent E. coli cells (New England Biolabs). Plasmid DNA was isolated using the QIAprep Spin Miniprep kit (QIAGEN) and the sequence was verified by whole-plasmid sequencing (Plasmidsaurus).

Transient expression of Taxus genes in N. benthamiana by Agrobacterium-mediated infiltration

pEAQ-HT plasmids containing the Taxus gene were transformed into Agrobacterium tumefaciens (strain GV3101) cells using the freeze–thaw method. Transformed cells were grown on bacteria screening medium 523-agar (Phytotech Labs) plates containing kanamycin and gentamicin (50 μg ml−1 and 30 μg ml−1, respectively; same for the 523 medium below), at 30 °C for two days. Single colonies were then picked and grown overnight at 30 °C in 523-kanamycin–gentamicin liquid medium. The overnight cultures were used to make dimethyl sulfoxide (DMSO) stocks (7% DMSO) for long-term storage in the −80 °C fridge. For routine N. benthamiana infiltration experiments, individual Agrobacterium DMSO stocks were streaked out on 523-agar containing kanamycin and gentamicin and grown for around one to two days at 30 °C. Patches of cells were scraped off from individual plates using 10-μl inoculation loops and resuspended in around 1–2 ml of Agrobacterium induction buffer (10 mM MES pH 5.6, 10 mM MgCl2 and 150 μM acetosyringone; Acros Organics) in individual 2-ml safe-lock tubes (Eppendorf). The suspensions were briefly vortexed to homogeneity and incubated at room temperature for 2 h. The optical density at 600 nm (OD600 nm) of the individual Agrobacterium suspensions was measured, and the final infiltration solution, in which the OD600 nm was 0.2 for each strain (except for TDS, T7AT and T7dA; OD600 nm of 0.6, 0.4 and 0.1, respectively), was prepared by mixing individual strains and diluting with the induction buffer. Leaves of four-week-old N. benthamiana were infiltrated using needleless 1-ml syringes from the abaxial side. Each experiment was tested on leaf 6, 7 and 8 (numbered by counting from the bottom) of the same N. benthamiana plant, as three biological replicates.

For the reconstitution of pathways that involve TBT, the following modifications were made to the procedure above to increase the production of the desired benzoylated products: N. benthamiana plants were watered with 2 mM benzoic acid in water (buffered to pH 5.6) a day before Agrobacterium infiltration, 1 mM benzoic acid was added to the induction buffer and the pH was adjusted to 5.6 before being used for the resuspension of Agrobacterium and preparation of the final infiltration solution.

Phylogenomic analysis

FoTO1 homologues were identified by scanning the Thousand Plant Transcriptome (1KP)67, RefSeq plants and Uniprot Viridiplantae databases with jackhmmer36 (command: jackhmmer -o tempout.txt -E 1e-5 -N 4). Hits with greater than 40% sequence gaps to the original query were discarded. A phylogenetic tree was generated with the remaining protein sequences with FastTree68.

Metabolite extraction of N. benthamiana leaves

Five days after Agrobacterium infiltration, N. benthamiana leaf tissue was collected using a leaf disc cutter 1 cm in diameter and placed inside a 2-ml safe-lock tube (Eppendorf). Each biological replicate consisted of four leaf discs from the same leaf (approximately 40 mg fresh weight). The leaf discs were flash-frozen and lyophilized overnight. Analyses of the more hydrophobic metabolites (for example, compounds 16) were done by GC–MS, and analyses of the more hydrophilic metabolites (for example, compounds 418) were done by liquid chromatography–mass spectrometry (LC–MS). To extract metabolites, ethyl acetate (ACS reagent grade; J.T. Baker) or 75% acetonitrile (high-performance liquid chromatography (HPLC) grade; Fisher Chemical) in 500 μl water was added to each sample along with one 5-mm stainless steel bead for GC–MS or LC–MS analysis, respectively. The samples were homogenized in a ball mill (Retsch MM 400) at 25 Hz for 2 min. After homogenization, the samples were centrifuged at 18,200g for 10 min. For GC–MS samples, the supernatants were transferred to 50-μl glass inserts, placed in 2 ml vials and subjected to analysed by the GC–MS instrument. For LC–MS samples, the supernatants were filtered using 96-well hydrophilic PTFE filters with a pore size of 0.45 μm (Millipore) and analysed by the LC–MS instrument.

GC–MS analysis

GC–MS samples were analysed using an Agilent 7820A gas chromatography system coupled to an Agilent 5977B single quadrupole mass spectrometer. Data were collected with Agilent Enhanced MassHunter and analysed by MassHunter Qualitative Analysis B.07.00. Separation was done using an Agilent VF-5HT column (30 m × 0.25 mm × 0.1 μm) with a constant flow rate of helium of 1 ml per min. The inlet was set at 280 °C in split mode with a 10:1 split ratio. The injection volume was 1 μl. Oven conditions were as follows: start and hold at 130 °C for 2 min, ramp to 250 °C at 8 °C per min, ramp to 310 °C at 10 °C per min and hold at 310 °C for 5 min. The post-run condition was set to 320 °C for 3 min. MS data were collected with a mass range 50–550 m/z and a scan speed of 1,562 u s−1 after a 4-min solvent delay. The MSD transfer line was set to 250 °C, the MS source was set to 230 °C and the MS Quad was set to 150 °C.

LC–MS analysis

LC–MS samples were analysed on either or both of our two instruments: (1) an Agilent 1260 HPLC system coupled to an Agilent 6520 Q-TOF mass spectrometer or (2) an Agilent 1290 HPLC system coupled to an Agilent 6546 Q-TOF mass spectrometer. Typically, the 6520 system shows better sensitivity for the more hydrophobic metabolites, such as 46, whereas the 6546 system works better for the more hydrophilic, highly modified taxanes. Data were collected with Agilent MassHunter Workstation Data Acquisition and analysed by MassHunter Qualitative Analysis 10.0. Separation was done using a Gemini 5-μm NX-C18 110-Å column (2 × 100 mm; Phenomenex) with a mixture of 0.1% formic acid in water (A) and 0.1% formic acid in acetonitrile (B) at a constant flow rate of 400 μl per min at room temperature. The injection volume was 2 μl or 1 μl for the 6520 or the 6546 system, respectively. The following gradient of solvent B was used: 3% 0–1 min, 3%–50% 1–2 min, 50%–97% 2–12 min, 97% 12–14 min, 97%–3% 14–14.5 min and 3% 14.5–21 min (6520 system) and 3% 0–1 min, 3%–50% 1–5 min, 50%–97% 5–10 min, 97% 10–12 min, 97%–3% 12–12.5 min and 3% 12.5–15 min (6546 system). MS data were collected using electrospray ionization (ESI) in positive mode with a mass range of 50–1,200 m/z and a rate of one spectrum per second (6520 system), or Dual AJS ESI in positive mode with a mass range of 100–1,700 m/z and a rate of one spectrum per second (6546 system). The ionization source was set as follows: 325 °C gas temperature, 10 l min−1 drying gas, 35 psi nebulizer, 3,500 V VCap, 150 V fragmentor, 65 V skimmer and 750 V octupole 1 RF Vpp (6520 system), or 325 °C gas temperature, 10 l min−1 drying gas, 20 psi nebulizer, 3,500 V VCap, 150 V fragmentor, 65 V skimmer and 750 V octupole 1 RF Vpp (6546 system). MS/MS fragmentations were generated using [M+Na]+ as the precursor ion and fragmented with a collision energy of 30 eV unless otherwise stated.

Quantification of baccatin III (16)

The samples in Fig. 5i were analysed by an Agilent 1290 HPLC system coupled to an Agilent 6470 triple quadrupole (QQQ) mass spectrometer to accurately quantify the concentration of baccatin III. Data were collected with Agilent MassHunter Workstation Data Acquisition and analysed by MassHunter Quantitative Analysis 10.1 and Microsoft Excel. Separation was done using a ZORBAX RRHD Eclipse Plus C18 Column (2.1 × 50 mm, 1.8 µm; Agilent) with a mixture of 0.1% formic acid in water (A) and 0.1% formic acid in acetonitrile (B) at a constant flow rate of 600 μl per min at 30 °C. The injection volume was 0.5 μl. The following gradient of solvent B was used: 30% 0–1 min, 30%–100% 1–5 min, 100% 5–6.5 min, 100%–30% 6.5–7 min and 30% 7–8 min. MS data were collected using AJS ESI in positive mode. Multiple reaction monitoring was used to monitor the 609.2 to 549.2 ion transition at a collision energy of 24 eV as the quantifier, and the 609.2 to 427.1 ion transition at a collision energy of 32 eV as the qualifier. The ionization source was set as follows: 250 °C gas temperature, 12 l min−1 drying gas, 25 psi nebulizer, 300 °C sheath gas temperature, 12 l min−1 sheath gas flow, 3,500 V VCap, 0 V nozzle voltage.

Extraction and purification of taxanes from N. benthamiana

Nicotiana benthamiana plants were infiltrated with the combinations of biosynthetic genes shown in Supplementary Table 12 for the purification of taxusin (6), taxusin (6’), 1β-hydroxytaxusin (6-O1) and 15-hydroxy-11(15→1)abeo-taxusin (6-O2). Lyophilized N. benthamiana materials were cut into small pieces and extracted with 1 l ethyl acetate (ACS reagent grade; J.T. Baker) in a 2-l flask for 48 h at room temperature with constant stirring. Extracts were filtered using vacuum filtration and dried using rotary evaporation. Two rounds of chromatography were used to isolate compounds of interest. The chromatography conditions for each compound are summarized in Supplementary Table 12. In brief, the first chromatography was performed using a 7-cm-diameter column loaded with P60 silica gel (SiliCycle) and using hexane (HPLC grade; VWR) and ethyl acetate as the mobile phases. The second chromatography was performed on an automated Biotage Selekt system with a Biotage Sfär C18 Duo 6-g column using Milli-Q water and acetonitrile as the mobile phases. Fractions were analysed by LC–MS to identify those containing the compound of interest. Desired fractions were pooled and dried using rotary evaporation (first round) or lyophilization (second round). Purified products were analysed by NMR.

NMR analysis of purified compound

CDCl3 (Acros Organics) was used as the solvent for all NMR samples. 1H, 13C and 2D-NMR spectra were acquired on a Varian Inova 600-MHz or a Bruker NEO 500-MHz spectrometer at room temperature using VNMRJ 4.2, and the data were processed and visualized on MestReNova v.14.3.1-31739. Chemical shifts were reported in ppm downfield from Me4Si by using the residual solvent (CDCl3) peak as an internal standard (7.26 ppm for the 1H and 77.16 ppm for the 13C chemical shift). Spectra were analysed and processed using MestReNova v.14.3.1-31739.

Taxane feeding experiments

Taxus genes were expressed in N. benthamiana leaves using the Agrobacterium-mediated infiltration method described above. Three days after Agrobacterium infiltration, taxanes (purified 3O2A (4), taxusin (6), 10-deacetylbaccatin III or 9-dihydro-13-acetylbaccatin III (13); unless otherwise specified, a 100-μM solution after diluting with 10 mM DMSO stock was used) were fed into the leaves. Approximately 150 μl of solution was used per leaf to yield a circle with a diameter around 3 cm, which was marked for reference. After 18–24 h, four leaf discs were collected within the marked area with a 1-cm diameter cutter, and LC–MS samples were prepared following the methods described above.

Construction of phylogenetic trees

Sequences from the T. chinensis genome were selected using Pfam to identify 672 P450s (PF00067), 218 2-ODDs (PF03171) and 195 acyltransferases (PF02458). P450s were further filtered to those longer than 300 amino acids (467 P450s). Multiple sequence alignment for each family was performed using Clustal Omega, and the phylogenetic trees were constructed using the neighbour-joining method in Geneious Prime (v.2024.0.4) with 100 bootstrap replicates for initial analysis. Arabidopsis thaliana cinnamate 4-hydroxylase (AtC4H, accession NP_180607.1), A. thaliana gibberellin 20-oxidase1 (AtGA20ox1, accession NP_194272.1), and Hordeum vulgare agmatine coumaroyltransferase (HvACT, accession AAO73071.1) were used as outgroups for the P450, 2-ODD and acyltransferase families, respectively. All analyses were performed with default settings unless otherwise specified. Representative genes from major clades of the initial analyses and the Taxol biosynthetic genes were then selected to construct the final phylogenetic trees (Extended Data Fig. 9) using the neighbour-joining method with 1,000 bootstrap replicates.

Purification of proteins and binding assays

All proteins were purified from standard pET28a vectors expressed in BL21DE3 cells (New England Biolabs, C2527H). FoTO1 and FoTO1(ΔCterm) were purified as C-terminal fusions: His6-3×Flag-TEV-mTurq2-GSG-FoTO1. T5αH and TDS were purified with N-terminal purification tags (His6-3×Flag-TEV-enzyme) with N-terminal signal peptides removed (T5αH, 47 amino acids removed; TDS2, 60 amino acids removed). Proteins were purified as previously described69, with post-lysis steps done at 4 °C. In brief, 1 l of cells were grown to an OD600 nm of 0.4–0.5, induced with 0.3 mM IPTG and expressed for 16 h overnight at 18 °C. Cell pellets were lysed in lysis buffer (0.5 M NaCl, 20 mM HEPES pH 8.0, 0.1% Triton X-100, 1 mg ml−1 lysozyme, HALT protease cocktail (Thermo Fisher Scientific) and 1 μl ml−1 DNAse I (New England Biolabs)) by sonication, clarified by centrifugation for 1 h at 8,000g. Proteins were purified on pre-equilibrated Ni-NTA beads (New England Biolabs) and exchanged into a protein storage buffer (10 mM HEPES-KOH pH 8.0, 50 mM KCL, 10% glycerol, 1 mM DTT and 1 mM EDTA). Purified proteins were quantified by Bradford assay, and SDS–PAGE gels were used to verify protein size and correct protein concentration.

For each multiscale thermophoresis experiment, one protein was first labelled with the NanoTemper His-Tag labelling kit (RED-tris-NTA v2, MO-L018) for 30 min at room temperature according to reagent protocols. MST experiments were performed in PBS with 0.05% Tween-20 with labelled query protein (T5αH- or TDS-labelled) at 100 nM and a titration series of target protein.

Co-IP

Nicotiana benthamiana leaves were harvested four days after infiltration. Leaf tissue was homogenized in liquid nitrogen and resuspended in extraction buffer (50 mM Tris pH 7.5, 150 mM NaCl, 0.6% NP-40, 0.6% CHAPS and 1 mM β-mercaptoethanol)70. Lysates were kept on ice and centrifuged at 20,000g for 10 min at 4 °C. The protein content of the clarified extract was determined by Bradford assay (Abcam, 119216). Ten microlitres of protein-G-coated magnetic beads (Invitrogen, 10003d) were washed twice in binding buffer (50 mM Na2HPO4, 25 mM citric acid, pH 5.0) before a 1-h incubation at room temperature under agitation with 1 μl anti-V5 antibody. Lysates were incubated with the indicated compounds for 15 min under agitation. Antibody-bound beads were then washed twice in extraction buffer and incubated for 15 min under agitation at room temperature with lysate corresponding to 100 μg of total protein content (approximately 40 μl). After incubation, bead complexes were washed three times in extraction buffer and mixed with LDS sample buffer (Invitrogen, NP0007) for subsequent analysis by immunoblotting.

Immunoblotting

Lysates were separated for 1.5 h at 80 V on a NuPAGE gel (Invitrogen, NP0321) before transfer onto a PVDF membrane using a Bio-Rad Trans-Blot Turbo Transfer System (Bio-Rad, 1704150). Immunoblots were incubated with the indicated antibodies (anti-V5 at 1:1,000 and anti-HA-HRP at 1:2,500) for 3 h at room temperature under agitation. Blots were subsequently washed and incubated with HRP–protein G (Genscript, M00090) for 1 h, then imaged on the iBright FL1500 Imaging System (Invitrogen, a44241). The extraction buffer was adapted from a previously published procedure70.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.