Introduction

Atmospheric nitrogen is inaccessible to plants and must be converted into compounds such as ammonia. Plants have formed symbioses with nitrogen-fixing members of three major bacterial clades: cyanobacteria, Frankia, and diverse proteobacteria collectively termed “rhizobia.” In the latter two symbioses, flowering plants form nodules to house bacteria, most commonly on roots (root nodule symbiosis = RNS; nodulation). Nodulation of the important crop family, Leguminosae (Fabaceae; legumes, e.g., soybean and pea) by rhizobia has been studied intensively1, not only as a fascinating symbiosis, but also with the goal of engineering nitrogen fixation in crops such as wheat and rice, with enormous potential benefits to agriculture and the environment2. The feasibility of this goal has been linked to the evolvability of this trait and hence to whether nodulation has evolved once or many times. A landmark study by Kates et al.3 has added the latest chapter to this ongoing debate.

The first plant molecular phylogeny generated two competing origin hypotheses

Legumes and nine other flowering plant families with species engaging in RNS were widely scattered in pre-phylogenetic angiosperm higher-level classifications, so the default model was that nodulation had originated independently on numerous occasions from diverse genetic backgrounds. Surprisingly, however, the first large-scale chloroplast DNA-based phylogenies placed all nodulating species in a “nitrogen-fixing clade” within the eudicot rosid radiation4, a placement supported in subsequent phylogenies3,5. Following Griesmann et al., we will refer to this as the “nitrogen-fixing nodulation clade” (NFNC); it has also been called “FaFaCuRo” after the two pairs of sister orders (Fagales and Fabales, which includes legumes; Cucurbitales and Rosales)6,7 that together comprise the recently named fabid clade8.

But although nodulating lineages belong to a single clade, they are interspersed among many non-nodulating species (Fig. 1). To account for this complex distribution, Soltis et al. invoked a “predisposition” originating in the most recent common ancestor (MRCA) of the NFNC, and proposed two competing hypotheses4:

Fig. 1: Multiple origins and losses of nodulation hypothesized by Kates et al.3 in the four orders of the NFNC.

a, b The two sister subfamilies of Fabaceae/Fabales, c Rosales, d Fagales, e Cucurbitales, with illustrations of floral and nodule diversity across nodulating lineages (note that nodule photos are not available for some taxa). Red numbered circles at nodes of phylogenetic trees indicate nodulation origins, and blue numbered X’s mark losses; numbers correspond to those in Kates et al.3. Gain 3 in Fig. 1a is gray because Dussia has recently been identified as a non-nodulator91. Photo credits. Lathyrus: Modified photo taken from Wikimedia Commons user Krzysztof Ziarnek (Kenraiz) from Openverse.org https://openverse.org/image/7758c4f4-c297-439b-bc26-bd569c062470?q=Lathyrus+japonicus&p=20. CC BY-SA 3.0; Chorizema: Modified photo taken from Wikimedia Commons user jmcgross from Openverse.org https://openverse.org/image/74b0733e-8968-4a1f-9cfb-0d63ad654958?q=Chorizema&p=3. CC BY 2.0; Baptisia: Modified photo taken from Flickr user scadwell from Openverse.org https://openverse.org/image/7b0c918e-f653-4380-bf8b-cec5a7268bfd?q=Baptisia&p=10. CC BY-NC 2.0; Aeschynomene: Modified photo taken from Wikimedia Commons user Dinesh Valke (Thane, India) from Openverse.org https://openverse.org/image/d2183990-6a4e-4f80-acb8-7828ab4c8606?q=Aeschynomene&p=20. CC BY-SA 2.0; Templetonia: Modified photo taken from Flickr user jeans_Photos from Openverse.org https://openverse.org/image/e38a4a70-191b-4797-80e2-430369178688. CC BY 2.0; Swartzia: Modified photo taken from Wikimedia Commons user Vojtěch Zavadil from Openverse.org https://openverse.org/image/4148c309-97d6-4397-8731-0d12db1af28d?q=Swartzia&p=1. CC BY-SA 3.0; Mimosa: Modified photo taken from Flickr user João de Deus Medeiros from Openverse.org https://openverse.org/image/ad2bd259-bb73-4c53-b9f3-27afedec978e?q=Mimosa&p=52. CC BY 2.0; Dimorphandra and Melanoxylon: Permission by Domingos Cardoso; Chamaecrista: Modified photo taken from Flickr user bob in swamp from Openverse.org https://openverse.org/image/89e3db92-edf3-4ef9-9cb4-afbfcff0ada0?q=Chamaecrista&p=4. CC BY 2.0; Parasponia: Permission by Luuk Rutten; Ceanothus: Modified photo taken from Flickr user John Rusk from Openverse.org https://openverse.org/image/28ed838d-c3b0-4625-b219-05fec907bb10?q=Ceanothus&p=2. CC BY 2.0; Discaria: Modified photo taken from Wikimedia Commons user Dick Culbert (Gibsons, B.C., Canada) from Openverse.org https://openverse.org/image/3e7ee66a-e7d4-403d-a23d-ee0e7b1771de?q=Discaria&p=8. CC BY 2.0; Hippophae: Permission by Jan Thomas Johansson; Dryas: Modified photo taken from Wikimedia Commons user Krzysztof Ziarnek (Kenraiz) from Openverse.org https://openverse.org/image/2557c2d5-3a38-4324-a353-60faaa310c1d?q=Dryas+drummondii&p=1. CC BY-SA 4.0; Alnus: Modified photo taken from Flickr user John Rusk from Openverse.org https://openverse.org/image/6ccfe034-cedf-4f24-8c22-d1130aa317f8?q=Alnus&p=56. CC BY 2.0; Casuarina: Modified photo taken from Wikimedia Commons user Kevin Thiele (Perth, Australia) from Openverse.org https://openverse.org/image/ce9ef39d-4bb3-48ee-aa20-b70b252b7a89?q=Casuarina&p=41. CC BY 2.0; Myrica: Modified photo taken from Wikimedia Commons user Hajotthu from Openverse.org https://openverse.org/image/75c68290-81d0-4942-85f4-a5d8ce5d0b6f?q=Myrica+gale&p=11. CC BY-SA 3.0; Datisca: Modified photo taken from Wikimedia Commons user H. Zell from Openverse.org https://openverse.org/image/250d3f16-3764-4e40-9a50-dbde7a9c0730?q=Datisca&p=1. CC BY 3.0; Coriaria: Modified photo taken from Flickr user In Memoriam: Ecuador Megadiverso from Openverse.org https://openverse.org/image/e090111d-6a75-4a0d-98da-b3222d8a95db?q=Coriaria&p=3. CC BY-NC-SA 2.0. Nodule photo credits. Lathyrus and Dimorphandra: Permission by Euan James; Chorizema and Templetonia: Permission by Julie Ardley; Baptisia: Permission by Jeff Doyle; Aeschynomene and Mimosa: Permission by Hukam Singh Gehlot; Swartzia: Permission by Domingos Cardoso; Chamaecrista: Permission by Eduardo Gross; Melanoxylon: Permission by Sergio de Faria; Parasponia: Permission by Luuk Rutten and Rene Geurts; Ceanothus: Permission by O’Dell et al.92; Discaria and Coriaria: Permission by Luis G. Wall; Dryas: Permission by Dagmar Hann and Jessica Folgmann; Datisca, Casuarina, Alnus, Hippophae and Myrica: Permission by Nadia Binte Obaid and Katharina Pawlowski. Links to licenses for reuse restrictions: CC BY-SA 3.0: https://creativecommons.org/licenses/by-sa/3.0/. CC BY 2.0: https://creativecommons.org/licenses/by/2.0/. CC BY-NC 2.0: https://creativecommons.org/licenses/by-nc/2.0/. CC BY-SA 2.0: https://creativecommons.org/licenses/by-sa/2.0/. CC BY-SA 4.0: https://creativecommons.org/licenses/by-sa/4.0/. CC BY 3.0: https://creativecommons.org/licenses/by/3.0/. CC BY-NC-SA 2.0: https://creativecommons.org/licenses/by-nc-sa/2.0/.

“…if the underlying predisposition to engage in symbiotic nitrogen fixation stems from a single, common origin, members of this clade that do not share this symbiosis must have lost the ability to form such associations. Alternatively, the ancestor of the nitrogen-fixing clade may have evolved the genetic components that would ultimately permit the evolution of symbiotic nodular nitrogen fixation. Following the establishment of these conditions, the necessary genetic background was present to allow parallel, recurrent evolution of symbiotic nitrogen fixation in the subsequent diversification of this clade. Both hypotheses call for mutations that established the symbiotic association with nitrogen-fixing bacteria, or at least the basis for this symbiosis, in the ancestor of the nitrogen-fixing clade [italics and bold face added].”

Clearly, for Soltis et al.4, “predisposition” is a feature shared by two competing hypotheses, in both cases providing “the necessary genetic background” for nodulation to evolve in the NFNC. Subsequently, however, “predisposition” came to refer exclusively to an unknown trait that was present in the NFNC ancestor but only much later enabled “parallel, recurrent evolution of symbiotic nitrogen fixation”4. We will follow Werner et al.10 in referring to this unknown trait as the “precursor trait,” reserving “predisposition” for its original sense as a feature of both hypotheses. We will refer to the hypothesis of multiple origins of a nodulation syndrome as “MUL” and to the alternative hypothesis, in which the full nodulation symbiosis synnovation9 originated in a “single common origin”4, as “SNG” (Fig. 1). That Soltis et al.4 did not favor either hypothesis is seen in their conclusion that “nodulating and non-nodulating members of this nitrogen-fixing clade should be examined to ascertain whether recurrent losses or recurrent gains of nitrogen-fixing ability have occurred.”

For nearly a quarter of a century, as increasing numbers of species were phenotyped for nodulation (Box 1), the single origin hypothesis was ignored, with debate centering on how many independent origins had occurred. This period saw the advent of phylogenomics, including more sophisticated approaches to estimating the number of origins of nodulation10 that presaged the Kates et al. study (Box 2). Much attention focused on the unsuccessful search for the predisposition/precursor, but some attention was paid to the question of how to determine whether nodules in different groups were homologous—i.e., derived from the same origin11.

The unexpected “resurrection” of the single origin hypothesis

Distinguishing convergence from homology in nodulation is challenging11. In a short essay focusing on legumes, Doyle12 suggested that, rather than concentrating exclusively on testing homology of nodules, the problem of the number of independent origins might be addressed by studying non-nodulating species, distinguishing primary absence of nodulation from secondary loss. Species that had once been able to nodulate might preserve “fossil” evidence in their genomes. A similar approach had been used by Delaux et al.13 to detect unidentified genes involved in mycorrhizal symbiosis by comparing genomes of mycorrhizal plants with those of non-mycorrhizal Brassicales.

In 2018, two groups (one co-led by Pierre-Marc Delaux) reported that in nearly all non-nodulating NFNC taxa with sequenced genomes, a few key nodulation genes found in all nodulating species were either absent or were pseudogenes, and interpreted this as evidence for SNG6,14. The “resurrected” single gain/massive parallel loss hypothesis as expounded by van Velzen et al.15 proposes that a nodulation synnovation originated in the NFNC ancestor and has been maintained in all nodulating lineages (Fig. 1; SNG)6,14. The predisposition that is unique to the NFNC is part of the nodulation synnovation itself4, so all features of nodulation would only need to be assembled once rather than many times as in MUL, regardless of whether they evolved de novo or, more likely, were recruited from pre-existing functions.

A critical and under-appreciated feature of SNG is that although it is commonly described as the “single gain/massive parallel loss” hypothesis, it does not require any “loss” of nodulation to account for the phylogenetically scattered distribution of lineages that once nodulated but no longer do so. That is because nodulation is an optional, inducible, plastic phenotype, and whenever the cost of nodulation exceeds the benefit of symbiotically obtained nitrogen (as in a soybean field treated with exogenous fertilizer) plants do not invest in making nodules16,17,18,19. In the millions of years since the common ancestor of the NFNC, photosynthetic carbon has become increasingly expensive as atmospheric CO2 declined15,20,21. This had major effects on the evolution of plant, animal, and bacterial lineages and the ecosystems they inhabited22,23,24,25; critically, microsymbionts might no longer be present in the soil owing to radical environmental change (e.g., flooding, drought, soil pH and/or biochemical shifts)26. Although Li et al. had proposed that decrease in atmospheric CO2 could drive multiple origins of nodulation5, this would require the precursor trait to evolve at a period of relatively low CO2 cost, with individual taxa “waiting” millions of years before evolving nodulation when the C/N cost/benefit ratio was much less favorable for nodulation5. It seems much more likely that the decreasing benefit of nodulation would lead to its cessation.

In SNG, cessation of nodulation would be inevitable, not merely likely, whenever carbon cost exceeded nitrogen benefit, but that point would vary for each individual plant depending on the interactions between its genotype and that of its bacterial partner, and of both with the environment (Gp × Gb × E)27,28. Cessation is not loss, as is demonstrated by the plasticity of nodulation in extant species. However, similar to the decay of vision-related characters in another massively parallel phenomenon, blindness in cave-dwelling organisms29, unused key nodulation components would no longer be preserved by purifying (negative) selection, and would accumulate mutations stochastically that over time would inactivate them. Nodulation genes not under selection for other functions would become pseudogenes and could be lost entirely; regulatory elements, domains, or specific nucleotides involved in N-fixing symbiosis could be modified or lost from genes retained due to their non-nodulation functions. Lineages could preserve variable amounts of evidence of having once nodulated, as observed in the genomes of non-nodulating NFNC taxa6,14. Loss of nodulation-specific features could possibly be driven by selection30. Many members of the nodulation community were authors on either Griesmann et al.6 or van Velzen et al.14, so it is not surprising that SNG became the prevalent working model for nodulation evolution, though alternatives continued to be proposed31.

Resurgence of the multiple origins hypothesis

Kates et al.3 have now generated a robust, updated multiple origins hypothesis. This new formulation of MUL hearkens back to an earlier modeling study (Box 2) but with a bespoke phylogenomic analysis of ~13,000 mostly NFNC species based on 100 low-copy nuclear genes; an augmented, comprehensive, expertly curated database of nodulation; and an updated version of the hidden rates model (HRM) approach32,33. Their best model identifies 16 independent origins of nodulation across the NFNC, with only 10 losses, all occurring within the legume family. As Kates et al.3 note, “The overall pattern of gain and loss resembles that recovered previously10, with more losses and gains reported here due in large part to our improved sampling of clades with high rates of RNS transition” (Fig. 1). Kates et al. counter expected criticism by noting that convergence is rampant in evolutionary history; they point in particular to C4/CAM photosynthesis, which has similarities to, but also key differences from, nodulation (Box 3).

MUL sensu Kates et al. involves several different transitions between both observed and hidden states. First, an unidentified precursor trait, necessary but not sufficient for nodulation, evolves as in Soltis et al.4, but deeper in the rosid clade, earlier than the NFNC MRCA. This precursor state can either be lost (revert to the non-precursor state, e.g., in rosid lineages outside the NFNC), eliminating any chance of nodulation evolving, or, more rarely, can evolve to an intermediary hidden state, increasing tenfold the probability of nodulation evolving. The intermediary state itself can evolve either to (in descending order of probability) the non-precursor state, a transient nodulation symbiosis state (R1), or the precursor state. R1 can evolve to a stable nodulation symbiosis (R3) or to a second transient nodulation state (R2), which almost always reverts to the precursor state but also has a low probability of evolving a stable R3 nodulation symbiosis. These unknown transient states need not be homologous and may have nothing to do with nodulation (Box 2; e.g., R1 in one lineage could be an ecological shift, whereas R1 in another lineage could be a gene duplication). Kates et al.3 concluded that although genetic components leading to nodulation are easily lost, nodulation once gained is highly stable.
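The transitions described in the paragraph above can be written down as a small directed graph. The following is a minimal sketch, not the hidden rates model of Kates et al.: the state labels are shorthand chosen here, no rates are attached (the text gives only their relative ordering), and the edge from the non-precursor state to the precursor state is inferred from the statement that the precursor "evolves." Transitions out of stable nodulation (R3) are not described in this paragraph, so none are encoded.

```python
from collections import deque

# Allowed transitions between states, as read from the paragraph above.
# Labels are this sketch's own shorthand (assumption); rates are omitted.
TRANSITIONS = {
    "non_precursor": ["precursor"],                      # precursor trait evolves
    "precursor": ["non_precursor", "intermediary"],      # lost, or (rarely) advances
    "intermediary": ["non_precursor", "R1", "precursor"],
    "R1": ["R3", "R2"],                                  # transient nodulation state
    "R2": ["precursor", "R3"],                           # second transient state
    "R3": [],                                            # stable nodulation; exits not described here
}


def shortest_path(start, goal):
    """Breadth-first search for the fewest transitions from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in TRANSITIONS[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # goal is unreachable from start


route = shortest_path("non_precursor", "R3")
print(" -> ".join(route))
```

Under these assumptions, the shortest route from the non-precursor state to stable nodulation passes through the precursor, intermediary, and R1 states, i.e., four transitions, each of which would carry its own rate in a fitted model.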

Among the most compelling features of the Kates et al.3 study is the rigorous testing of models with parameters strongly favoring SNG. Despite allowing the probability of loss to exceed that of gain by over 60-fold, a single origin of nodulation was always overwhelmingly rejected. It should be recalled, however, that loss of nodulation is not an essential part of SNG; what is important is cessation, which is inevitable and presumably incurs no cost at all. “Loss” is an inference; the observation is absence, and dealing with absence characters in phylogenetics is notoriously difficult34,35. We doubt that it is possible to parameterize an optional trait that ceases automatically when no longer beneficial, and/or when the microsymbiont disappears or becomes parasitic. Thus, we do not agree with the contention of Kates et al.3 that their results “definitively” identify the ancestor of the NFNC as non-nodulating.

Despite our disagreement, however, we believe that Kates et al.3 should be welcomed by the nodulation community as a major improvement over intuitive, parsimony-based mapping of nodulation on small phylogenies. Although sampling is still not exhaustive (e.g., some key legume genera are missing and there are mistakes in identification of terminal legume taxa), it is a great improvement over previous modeling based on flawed phylogenies10. It thus constitutes the best available MUL hypothesis for rigorous testing, and can guide innovative exploration of RNS evolution, such as that conducted by Battenberg et al.36, who found signatures of positive selection associated with nodulation in several lineages. Kates et al.3 acknowledge that details of MUL will almost certainly change as additional taxa are sampled. Throughout their paper, they highlight the need for more information on the molecular and developmental biology of nodulation, particularly in under-studied lineages (most of the NFNC except for a few model, mostly legume, taxa), and for a more sophisticated definition of nodulation itself.

Deconstructing “nodulation”: testing for single vs. multiple assembly of homologous parts

It has long been accepted that the evolution of the complex association of integrated characters that comprise nodulation involved the co-option, rather than the invention, of individual genes, and the recruitment or rewiring of existing gene regulatory networks (GRNs)3,4,11,12,37,38,39,40,41,42. Homology of individual nodulation modules in all nodulating lineages is a given in SNG, so non-homology of genes or modules would support MUL. For example, if genes recruited for similar RNS functions in two nodulating taxa are non-homologous, or paralogous rather than orthologous, then the assembly was more likely independent, making nodulation non-homologous11.

Independent origins of nodulation likely would involve convergent or parallel recruitment of the same genes or GRNs, with the result that nodulation would be non-homologous, but some or all of its components could be homologous3,10,11,12,43. Such “deep homology”44 confounds simple definitions of homology for complex traits because it can be invoked to discount any amount of similarity45,46. This has consequences for inferring common RNS origin based on greater than random transcriptomic similarity in nodules of taxa from different NFNC orders14,36. For example, Libourel et al.47 reconstructed a core ancestral nodulation transcriptome of over 750 orthogroups shared across three orders of nodulating NFNC species, which was much higher than random expectation, and concluded that “RNS in the most recent common ancestor looked very similar to RNS in extant species.” But deep homology stems from the non-random recruitment of homologous GRNs, in parallel, for use in functionally similar syndromes, blunting such conclusions.

The eponymous organs of NFNC members are all called “nodules” because they meet the criterion of similarity in form and position that is the basis for the most common evolutionary/phylogenetic definition of homology as “similarity due to common descent.” However, the structural and biochemical diversity of nodules across the NFNC48 has favored non-homology and thus MUL. Under SNG, this diversity is explained by >100 MY of independent evolution since the divergence of modern lineages from the NFNC ancestor. We note that the diversity of nodules is matched, if not surpassed, by the flowers of these same NFNC species (Fig. 1), yet it has not been suggested that flowers are non-homologous in the clade. And although Kates et al.3 cite Liu et al.48 for “limited homology” of nodulation across the NFNC, much of that paper details differences among the homologous nodules of papilionoid legumes, which arguably differ phenotypically more from one another than some of them do from putatively non-homologous (under MUL) nodules of caesalpinioid legumes (Fig. 1)49. Moreover, looks can be deceiving. Shen et al.50 constructed an actinorhizal plant nodule fate map based on Parasponia andersonii and Alnus glutinosa, which revealed that the developmental processes of actinorhizal plant nodules have greater similarity to those of legume nodules than previously appreciated. Then, by mutating a single transcription factor (NOOT) in Medicago, they produced a homeotic change from legume-type to actinorhizal-type nodule anatomy.

What is truly novel about nodulation is the recruitment and assembly of a set of modular components into a functional nitrogen-fixing symbiosis with bacteria housed in plant-produced structures (Fig. 2a). In exploring how RNS might be engineered in crop plants, Jhu and Oldroyd51 provide an excellent description of these components and discuss how they might have been assembled at the origin of nodulation. They emphasize the longstanding paradigm that much of RNS has been recruited from the older mycorrhizal symbiosis, itself “appropriated” from plant defense responses52. Both symbioses begin with the Common Symbiosis Signaling Pathway (CSSP), but beyond that, there has been massive transcriptomic reprogramming leading to nodulation31, key to which is a single transcription factor (TF), Nodule Inception (NIN). The many roles of this master regulator of nodulation53,54,55,56 afford an opportunity for breaking “nodulation” into its constituent parts and asking when and where these were brought together—once or multiple times (Fig. 3)—in rosid phylogeny. As Geurts and Huisman57 note, understanding how NIN has been adapted to function in nodulation is critical for understanding the evolution of RNS.

Fig. 2: Stepwise recruitment of features of nodulation.

a The nodulation symbiosis is assembled by recruitment of modules from pre-existing complex functional traits (A–D; e.g., symbiosis signaling, lateral root development, etc.), each comprising several components (a1, a2, … d3), beginning at t0 with the recruitment of component “a2” of the “A” function for a new role (designated by italic font). Various other functions are recruited over a period of time (delta t) until at tn a complete nodulation symbiosis evolves with the addition of component “d2.” Functions a-d are shown as being recruited into a pre-nodulation trait with intermediate correlated states (ab and abc); gene or GRN duplication could produce components that could diverge from ancestral functions without such association, but this would not be “recruitment,” and would collapse delta t to 0. b Only the timeline of recruitment of a2-d2 differs between SNG and MUL, as illustrated for Medicago. For both SNG and MUL, t0 could be the same, but whereas under SNG the full nodulation symbiosis has evolved by the time of the most recent common ancestor (MRCA) of the NFNC (tn >100 MYA), under MUL tn does not occur until after the origin of legumes (~70 MYA) but before the origin of crown papilionoid legumes. For MUL, the precursor evolves prior to the NFNC MRCA, but all other components of the full RNS could evolve as quickly as in SNG, in this case, 60 and 70 MYA.

Fig. 3: Relationships between processes shared between nodulation and other functions under the two competing hypotheses for legumes (Medicago) and Betulaceae (Alnus).

a, b Single origin (SNG). a Nodulation originates in the NFNC ancestor (MRCA) by recruitment of modules from various processes (e.g., mycorrhizal signaling, lateral root development) as in Fig. 2a. Later, lineages leading to legumes and Betulaceae diverge, as shown by the slightly different colors (for convenience, the legume lineage is shown as retaining the ancestral darker coloration). Thus, recruited features have the potential to differentiate (e.g., specialize for nodulation vs. their previous role) prior to the divergence of legumes and Betulaceae. b Using recruitment of features from mycorrhizal signaling during the origin of nodulation in the NFNC MRCA as an example, there could be greater similarities between some features of mycorrhizal (brown box “M”) and nodulation symbiosis (green box “N”) in nodulating species of each lineage (many legumes; currently only Alnus in Betulaceae) than between either mycorrhizal or nodulation symbioses across species. c, d Multiple origins (MUL). c Divergence of taxa pre-dates independent origins of nodulation, therefore the individual processes have already diverged in the two taxa prior to their independent recruitment for nodulation, as shown by the slightly different colors. d Using recruitment of mycorrhizal signaling during the independent origin of nodulation in Medicago (legume) and Alnus (Betulaceae) as an example, divergence of the legume and Betulaceae lineages pre-dates convergent origins of nodulation, allowing time for details of mycorrhizal signaling to have diverged in the two lineages. This could be reflected in greater similarities between some features of nodulation and mycorrhizal symbiosis within each species than between either mycorrhizal or nodulation symbioses across species.

A brief history of NIN

The NIN-like protein (NLP) transcription factor gene family is plant-specific, with members belonging to three major groups in seed plants, playing key roles in the plant nitrate response58,59. Group 3 is duplicated in dicots, with one paralog comprising the NIN clade, within which there has been further amplification in various taxa (e.g., three lineages in Brassicaceae9). The three Arabidopsis paralogues (NLP1-3) play similar roles in nitrate sensing, a role they share with at least one member of the other Group 3 subclade, NLP7 (ref. 60), making it likely that this is the ancestral role of both subclades.

It has been suggested that the loss of this role, through deletions in an otherwise conserved nitrate-responsive domain (NRD), was critical for the evolution of RNS53; however, NIN proteins of nodulating species outside papilionoid legumes retain much or all of the NRD9,61. Regardless of whether the NRD is functional in these NIN orthologues, the loss or inactivation of NIN in many non-nodulating NFNC lineages suggests that it is no longer essential for nitrate-sensing.

The co-option of mycorrhizal signaling through NIN (and Nod-factor receptor genes) required changes not only to the NIN protein but also to the NIN promoter region57. The CYCLOPS-responsive element (CYC-RE) links NIN to the most downstream CSSP TF, shared by RNS and mycorrhizal symbiosis62. Though not essential, it is important for infection thread formation in legumes63. The CYC-binding motif is part of a larger Predisposition-Associated Cis-regulatory Element (PACE) that Cathebras et al.64 identify as the nodulation predisposition/precursor trait, being found only in the NFNC, both in nodulating species and in non-nodulators retaining NIN. They also showed that PACE drives NIN expression during cortical infection thread development in Lotus and that, despite sequence differences, PACEs from the other three NFNC orders, including both nodulating and non-nodulating species, could rescue a Lotus nin mutant64. Moreover, the addition of PACE to a tomato NIN gene promoter also reactivated RNS in the Lotus mutant64.

Cathebras et al.64 identified intact NIN open reading frames in 10 of 28 non-nodulating NFNC species, but PACE was absent from five of these; from this, along with the presence of PACE in NIN of all 31 nodulating species surveyed, they concluded that PACE was functional in the NFNC MRCA, and was generally lost when nodulation ceased. However, although PACE was not reported in poplar, the CYC-RE occurs in a poplar Nod-factor-inducible NLP gene that, like NIN (ref. 54) and other NLPs58, is involved in lateral root development65. This “cast doubt upon the theory that the essential innovation in the evolution of nodulation was the gain of regulation of NIN by the common symbiosis pathway”65.
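As a rough illustration of how strongly the counts quoted above associate PACE presence with nodulation, the sketch below computes a one-sided Fisher exact probability from the 2 × 2 table implied by the text (31 nodulating species, all with PACE; 10 non-nodulating species with intact NIN, five with PACE and five without). This calculation is not from Cathebras et al.; it treats species as independent observations, which ignores shared phylogenetic history, so it indicates the size of the raw contrast rather than providing a valid test.

```python
from math import comb

# Counts taken from the paragraph above.
nodulating_with_pace = 31
nodulating_without_pace = 0
nonnod_with_pace = 5      # non-nodulators with intact NIN that retain PACE
nonnod_without_pace = 5   # non-nodulators with intact NIN that lack PACE

total = (nodulating_with_pace + nodulating_without_pace
         + nonnod_with_pace + nonnod_without_pace)
n_nonnod = nonnod_with_pace + nonnod_without_pace
n_without_pace = nodulating_without_pace + nonnod_without_pace


def hypergeom_pmf(k, group_size, draws, population):
    """P(exactly k of `draws` items fall in a group of `group_size`)."""
    return (comb(group_size, k)
            * comb(population - group_size, draws - k)
            / comb(population, draws))


# One-sided p-value: probability that at least as many PACE-lacking species
# as observed fall among the non-nodulators if PACE loss were unrelated to
# nodulation status (and species were independent, which they are not).
p_value = sum(
    hypergeom_pmf(k, n_nonnod, n_without_pace, total)
    for k in range(nonnod_without_pace, min(n_nonnod, n_without_pace) + 1)
)
print(f"one-sided p = {p_value:.2e}")
```

A phylogenetically informed comparative method would be needed to draw any real inference from these counts.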

According to Shen and Feng56, “NIN controls all aspects of symbiotic nodulation in legumes: rhizobial infection, nodule organogenesis, transition to nitrogen fixation, and regulation of nodule number in legumes and actinorhizal plants.” This sweeping statement is based almost exclusively on model papilionoid legumes. Although NIN is known to be essential for nodulation in Casuarina66,67 and Parasponia68, and is implicated in RNS of Ceanothus, Datisca, Alnus, and Purshia9,36,69, the role of NIN is completely unknown in the remaining nine Kates et al.3 RNS origins.

In legumes, NIN interacts directly with many genes belonging to the various modules recruited in the origin or subsequent evolution of RNS56,70. Any of these genes that did not previously interact with NIN would need to acquire NIN-binding sites in their promoters either once (SNG) or convergently (MUL), as appears to have occurred in the origins of C4 photosynthesis in grasses (Box 3). Much remains to be learned about promoter structure in NIN-responsive genes; an example is the complex dual promoter in Lotus leghemoglobin genes that binds NIN and NLP2, which belong to different NLP Group 3 subclades71. Yu et al. describe NIN-binding sites in the promoters of an NFNC-specific gene, Nodules with Activated Defense (NAD1)72. The gene and these “nodulation-associated cis-regulatory elements” (NACEs) are found in both nodulating and non-nodulating NFNC members. NACEs have sequence similarity to NIN-binding sites in known NIN targets41,54,71,73,74,75,76,77. Nodulation in Lotus mutants lacking Nad1 NACE sites was rescued by introducing NACE sequences from Alnus, Parasponia, Casuarina, and Datisca, demonstrating the conserved nature of these sites72; functionality was not tested for NACEs from two non-nodulating NFNC species.

Questions regarding nodulation, emphasizing the role of NIN

SNG is founded on the thesis that NIN was lost or pseudogenized in many non-nodulators because its role was solely associated with RNS; the retention of functional NIN in some lineages is consistent with neutral expectations (loss is random but not universal) or with the acquisition of functions not occurring in lineages where NIN is dispensable. In contrast, Kates et al.3 suggest that “essential RNS-related genes could be recruited without an RNS phenotype and then become dispensable and thus fragmented or lost in lineages that do not also eventually gain RNS”; they point to the presence of intact NIN in non-nodulators such as Ziziphus (Rhamnaceae) as support for MUL. Understanding the occurrence of NIN and characterizing its function in phylogenetically diverse nodulating and non-nodulating taxa is essential to testing the competing hypotheses.

What is the distribution of intact, functional, and pseudogenized NIN across the NFNC?

Plant gene families are constantly evolving by duplication, producing novel functions and expression patterns but more commonly leading to pseudogenization and loss78. Polyploidy duplicates all genes simultaneously and has been implicated in the evolution of nodulation (Box 4). Soybean has retained all four homoeologous NIN copies from the two most recent rounds of polyploidy that characterize its genome, and these show a complex pattern of expression at the tissue and single-cell levels79,80. This degree of differentiation among paralogous copies suggests that generalizations concerning “NIN” evolution and expression, particularly those based on transcriptomic studies of bulk tissues, may need to be re-evaluated. The condition of NIN is unknown for most NFNC taxa, including key legume lineages. Comprehensive sampling of NIN could enable application of an approach similar to that of Kates et al.3 to estimate the number of losses and the correlation of NIN presence with nodulation. Minimum ages of NIN inactivation could be estimated in lineages retaining NIN pseudogenes by identifying pseudogenizing mutations shared among species. The complexity of the NLP family, and of NIN, makes synteny information critical for the correct identification of NIN; synteny would also enable the detection of partially decayed pseudogenes.
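The logic of dating NIN inactivation from shared pseudogenizing mutations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tree, node ages, species names, and mutation labels are all hypothetical, and a shared mutation is assumed to be inherited from a common ancestor rather than acquired in parallel. Under that assumption, the mutation is at least as old as the most recent common ancestor (MRCA) of the species carrying it.

```python
# Hypothetical rooted tree stored as child -> parent (not real taxa).
PARENT = {"sp1": "A", "sp2": "A", "A": "B", "sp3": "B", "B": "root", "sp4": "root"}
# Hypothetical ages of internal nodes, in millions of years.
AGE = {"A": 10.0, "B": 25.0, "root": 60.0}


def ancestors(node):
    """Return the ancestors of a node, ordered from its parent up to the root."""
    path = []
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def mrca(species):
    """Return the most recent common ancestor of two or more tips."""
    species = list(species)
    shared = set(ancestors(species[0]))
    for sp in species[1:]:
        shared &= set(ancestors(sp))
    # The first shared node on the path toward the root is the MRCA.
    for node in ancestors(species[0]):
        if node in shared:
            return node
    raise ValueError("species do not share an ancestor in this tree")


def min_inactivation_ages(mutations):
    """Map each mutation carried by >= 2 species to (MRCA node, minimum age)."""
    carriers = {}
    for sp, muts in mutations.items():
        for m in muts:
            carriers.setdefault(m, set()).add(sp)
    result = {}
    for m, sps in carriers.items():
        if len(sps) >= 2:
            node = mrca(sorted(sps))
            result[m] = (node, AGE[node])
    return result


# Hypothetical disabling mutations scored in NIN pseudogenes.
example = {
    "sp1": {"stop120", "del45"},
    "sp2": {"stop120"},
    "sp3": {"fs300"},
    "sp4": set(),
}
shared_ages = min_inactivation_ages(example)
```

Mutations found in only one species are ignored here because they give no information about when inactivation occurred relative to speciation. A real analysis would also need to rule out parallel mutations and would depend on a dated phylogeny.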

How does NIN function in nodulation across the NFNC?

Is NIN constitutively localized to the nucleus in species other than the few papilionoid legume species in which this has been shown53? How many of the modular functions of nodulation controlled by NIN in model papilionoids are shared and regulated by NIN in other lineages, including crown papilionoids, where nodulation is homologous? For shared modules, are the same GRNs involved? Liu et al.70 identified 145 Medicago genes with which NIN interacts directly; how many of these genes interact with NIN in other nodulators, both within papilionoid legumes and in other putatively independent nodulating lineages? Are genes that interact directly with NIN in different nodulating lineages paralogous, which would support MUL11? Identification of genes interacting with NIN, for example, by reciprocal DAP-seq81 experiments involving multiple Kates et al.3 MUL lineages, would be informative.

What is the structure of the regulatory regions of genes targeted by NIN across the NFNC?

The “cis-regulatory code” is poorly understood82, but tools are being developed to “crack” it, even in plants83. For nodulation, the critical evolutionary question is how NIN recruited and brought together new targets and itself came under the regulation of the CSSP41; regulatory sequences are key to understanding this “rewiring”52,84. Because nearly all genes involved in nodulation presumably have non-nodulation functions, genes recruited to function under the control of NIN possess binding sites not only for NIN but also for transcription factors (TFs) involved in their original, conserved functions6,9,14,57. TF binding sites are short, conserved motifs often occurring within cis-regulatory modules in which the order of binding sites for different TFs may be conserved85. If promoters containing NIN-binding sites could be identified and related to target genes by reciprocal binding experiments involving numerous MUL lineages, these could be compared phylogenetically across lineages for evidence of potential homology of NIN-binding sites (Fig. 4).

Fig. 4: Homologous and non-homologous gene recruitment inferred from TF binding sites.

a An ancestral gene, Xa (gray box, transcription start site shown by arrow), has a 5’ promoter region (narrow line) with two cis-regulatory elements (CREs) associated with two different functions (red and blue boxes). b In a descendant, the gene is recruited for nodulation by acquiring a NIN-binding CRE (green box) and is observed in nodulating species 1. c–e Nodulating species 2, 3, and 4 use gene X in nodulation symbiosis, also under the control of NIN; all three have the same conserved NIN-binding motif. However, only in gene Xn-2 is the NIN-binding motif in the same position relative to motifs for the original functions as in gene Xn-1. This can be taken as evidence for a single recruitment event (SNG) in the ancestor of the taxa with genes Xn-1 and Xn-2, but independent recruitment events in the taxa with genes Xn-3 and Xn-4 (MUL).
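The comparison illustrated in Fig. 4 can be expressed as a small Python sketch. It assumes that each promoter has already been reduced to an ordered list of CRE labels running from the 5’ end toward the transcription start site; the specific motif orders given for Xn-3 and Xn-4 are hypothetical, since the caption states only that they differ from Xn-1. Genes whose NIN-binding site has the same neighbors upstream and downstream are grouped as candidates for a single recruitment event.

```python
from collections import defaultdict


def nin_context(promoter, target="NIN"):
    """Return (CREs upstream of the target, CREs downstream), or None if absent."""
    if target not in promoter:
        return None
    i = promoter.index(target)
    return (tuple(promoter[:i]), tuple(promoter[i + 1:]))


def group_by_context(promoters):
    """Group genes whose NIN-binding CRE sits in the same position relative to other CREs."""
    groups = defaultdict(list)
    for gene, cres in promoters.items():
        ctx = nin_context(cres)
        if ctx is not None:  # genes without a NIN-binding CRE are not recruited
            groups[ctx].append(gene)
    return sorted(sorted(genes) for genes in groups.values())


# Ordered CREs per promoter, 5' to 3'; layouts for Xn-3 and Xn-4 are assumptions.
promoters = {
    "Xa":   ["red", "blue"],          # ancestral gene, no NIN-binding CRE
    "Xn-1": ["red", "NIN", "blue"],
    "Xn-2": ["red", "NIN", "blue"],
    "Xn-3": ["NIN", "red", "blue"],
    "Xn-4": ["red", "blue", "NIN"],
}
recruitment_groups = group_by_context(promoters)
```

Each resulting group is a hypothesis rather than a conclusion: a shared position is consistent with a single recruitment event, whereas different positions suggest independent acquisitions of the motif. Real promoters would also require alignment and tolerance for motif turnover, which this sketch ignores.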

Pereira et al.86 and Zhang et al.9 used genome-wide alignments of NFNC species to identify thousands of conserved noncoding sequences/elements (CNS/CNE), many associated with RNS, with some validation by Pereira et al.86. These provide a rich source of candidates for further study. The MUL model of Kates et al.3 provides guidance for assessing the presence/absence of CNEs in the context of RNS evolution. The work of Yu et al.72 on NIN-binding sites, combining phylogenetic and functional studies, provides a template for investigations of promoters of other genes targeted by NIN, such as NFY and RPG73,75,87. Recent studies of accessible chromatin in the evolution of C4 photosynthesis in grasses88 provide a template for studying recruitment in building complex phenotypes.

What roles does NIN have apart from nodulation?

The Kates et al.3 contention that “essential RNS-related genes” such as NIN could be “recruited without an RNS phenotype” requires that NIN have some role other than RNS and that it retain functionality for millions of years before either becoming a key player in RNS independently in 16 NFNC lineages or being lost in many lineages that did not evolve RNS. There is no known role for NFNC orthologues of NIN other than RNS, but it is possible that NIN could conserve the ancestral nitrate-sensing function of Arabidopsis NIN-clade orthologues in NFNC taxa whose NIN retains most or all of the NRD. This includes not only non-legumes (e.g., Quillaja89, sister to Fabaceae in Fabales), but also sampled caesalpinioid legumes9,61. This should be tested, particularly in non-nodulators.

What is the fate of nodulation genes when nodulation ceases?

SNG and MUL sensu Kates et al.3 agree that 10 formerly nodulating lineages no longer engage in RNS. In addition to these, because Dryas drummondii is the only nodulating Dryas species, there is an additional recent “loss” of nodulation in the genus90 not mentioned by Kates et al.3. How have promoters of direct NIN target genes evolved in these taxa, and how have NIN and its promoter region evolved? This information could provide a baseline for distinguishing the primary absence of RNS (MUL) from its cessation/loss (SNG).

Are there intermediate states of RNS?

The assembly of a functional RNS from ancestrally independent modules implies a stepwise process regardless of whether it occurred once (SNG) or multiple times (MUL), but MUL stretches the completion of the assembly long enough for lineages that initiated the process to have diverged from one another, such that the final developments of RNS are convergent (Fig. 2b). Under MUL, incipient nodulators could possess intermediate conditions with some but not all modules found in nodulating species. Note that this is not necessarily the same as intermediate hidden states3, which need not be involved in nodulation. Under SNG, partially assembled RNS would represent decaying symbioses. In such intermediate states, NIN might retain interactions with genes from a subset of modules.

Conclusions

How novel processes and structures originate is a key question in evolutionary biology, and RNS and the nodule organ are an excellent plant system for addressing it. Kates et al.3 have produced a strong formulation of the multiple origins hypothesis that clarifies the convergence theory and can guide efforts to test it. Notably, given the importance of legumes, MUL sensu Kates et al.3 identifies six origins and ten losses in the papilionoid + caesalpinioid clade. The rich diversity of RNS in crown papilionoids alone, homologous under either hypothesis, defies simple characterization, and much can be learned about how RNS evolves without straying from that single clade.

In our opinion, SNG15 remains a viable alternative to MUL, one that should be considered when evaluating RNS traits across diverse nodulating lineages. The fact that cessation is not loss blunts the conclusion that massive loss is highly improbable, and until non-RNS roles are identified for NIN and other genes lost or pseudogenized in non-nodulating NFNC species, the explanation provided by Kates et al.3 will remain unsatisfying.

For those pursuing the goal of engineering nitrogen fixation in crops outside the NFNC, a research program aimed at testing the competing hypotheses will be of great value simply by filling in the many gaps in our understanding of the diversity of nodulation symbioses. Whether achieving this goal is more or less likely under one or the other hypothesis is arguable, as Kates et al.3 correctly point out. As noted by Zhang et al.9, under either hypothesis, common features of nodulation in different lineages represent a likely minimum or core set of functions either retained (SNG) or independently assembled (MUL) for nodulation, with variable features being lineage-specific alternative innovations (MUL) or optimizations (SNG). In 1995, Soltis et al.4 charted the path forward to achieve both the fundamental and applied goals: elucidate the biology of nodulating as well as non-nodulating NFNC species. On this, both Kates et al.3 and we agree.