Abstract
Impatiens spp. are well-known ornamental and medicinal plants that are widely distributed in the highlands and mountains of southwestern China. This area is one of the hotspots for the distribution of Impatiens species, with typical karst landforms and abundant wild resources. Many of these species are endemic to narrow distribution areas, but their classification and relationships remain unclear because of insufficient field investigations, diverse morphological characteristics and a lack of molecular information. In this study, the chloroplast genomes of 13 species (including 2 synonyms) from karst habitats were analyzed to study their characteristics and phylogenetic relationships. The results revealed that these chloroplast genomes all had a typical circular quadripartite structure, ranged in length from 151,284 bp to 152,421 bp, and contained 113 genes: 80 protein-coding genes, 29 transfer RNA genes, and 4 ribosomal RNA genes. SSRs consisted mainly of A/T and AT/AT repeats, while scattered repeat sequences (INEs) consisted mainly of forward and palindromic repeats. The frequency of codon usage was essentially the same across species, with a total of 31 high-frequency codons detected, the vast majority ending in A/U. Five mutation hotspots were detected: rps16-trnQ-UUG, ndhF, ccsA-ndhD, ycf1, and trnN-GUU, among which ycf1 had the highest Pi value and the greatest potential as a DNA barcode marker. Our phylogenetic tree showed that all 13 species belong to sect. Impatiens and supported the treatment of I. reptans and I. rhombifolia as synonyms (BS = 100/PP = 1.00). This study comprehensively analyzed the cp genomes of different taxa, sheds light on the taxonomic intricacies of Impatiens species, and provides valuable information on the phylogeny and taxonomy of the genus.
Introduction
The genus Impatiens L. was established by the Swedish botanist Linnaeus in the 1750s and, together with Hydrocera Bl., constitutes the family Balsaminaceae1. There are approximately 1200 recorded species of this genus worldwide, which are distributed mainly in the tropical and subtropical regions of the Old World2. Members of this genus are distributed in five diverse hotspots, namely, Madagascar, the tropical African continent, southern India and Sri Lanka, the eastern Himalayas, southwestern China and the broader Southeast Asia region3,4. There are over 350 known species of Impatiens in China, accounting for approximately one quarter of the total number of species worldwide5. The Yunnan–Guizhou Plateau has the highest species richness of Impatiens in China, and these species are endemic mostly to a narrow region. The floristic elements in this region have ancient origins and have always been important hotspots for the distribution of Impatiens species6.
Owing to the rich diversity and high-level convergent evolution of Impatiens species, as well as the fragility of their flower parts, specimens are often incomplete or folded, making separation and restoration difficult7,8. Moreover, there is significant diversity and continuity in the morphology of flowers, leaves, fruits, and seeds. The magnitude and pattern of the morphological variation are unclear, leading to complexity in taxonomic research and blurred interspecies boundaries in this genus, which has always been difficult to classify.
In recent years, with the continuous increase in field investigations, many new taxa have been discovered and reported, especially in the southwestern region of China, such as I. gongchengensis Z.C. Lu, B. Pan & Yan Liu, I. rapiformis Y.Y. Cong & Y.X. Song, I. bijieensis X.X. Bai & L.Y. Ren, I. cavaleriei X.X. Bai & R.X. Huang, I. liupanshuiensis X.X. Bai & T.H. Yuan, and I. beipanjiangensis Jian Xu & H.F. Hu9,10,11,12,13,14,15,16. With the discovery of new taxa, the classification of species in Impatiens needs to be revised. Scholars worldwide have merged synonyms or published new combinations in Impatiens. For example, Huang et al. reduced five species of the genus Impatiens, including I. rhombifolia and I. reptans, to synonyms of I. procumbens17. Singh et al. proposed a new combination of Indian Impatiens species and compiled 39 synonyms18. These treatments are based almost solely on subjective macroscopic morphological features, leading to controversy over the relationships among taxa. Gogoi et al. took the opposite view to Singh et al., pointing out that the evidence presented does not readily support their taxonomic classification19. Some previously merged scientific names have also been restored, such as I. namchabarwensis, which is usually regarded as a synonym of I. arguta, but scholars have subsequently argued that these are actually two separate taxa20. I. hookeriana was once considered synonymous with I. grandis but was later recognized by some scholars as a different species, restoring its taxonomic status21. The above research was based mainly on macroscopic morphological features, but Impatiens species generally show complex interspecific and intraspecific trait variation that is sensitive to changes in habitat. Some taxa may be overly subdivided, and taxonomic problems such as synonyms and illegitimate names are common.
The application of new methods in taxonomic research is a skill that taxonomists must master. On the basis of previous research, Yu et al. combined morphological and molecular systematics methods to construct a phylogenetic tree by integrating the nuclear ITS region and the chloroplast atpB-rbcL fragment from 150 species22. This work established a general classification framework for Impatiens and has been widely recognized by researchers. However, the treatment of species-level classification remains controversial, and species-level relationships are still relatively disorganized. Therefore, it is necessary to carry out species-level taxonomic research.
The chloroplast genome has a self-replication mechanism, relatively independent evolution, small genome size, and low mutation rate and can thus provide more accurate information for systematic evolutionary relationships23,24,25. With the development of sequencing technology and the reduction in sequencing costs, chloroplast genomes have become easy to sequence and assemble, so far more data are available for them than for nuclear and mitochondrial genomes. They have been widely applied in the classification of families, genera, and related species, overcoming many challenges in taxonomic research26,27,28,29. At present, most phylogenetic studies on Impatiens species use DNA fragments (such as nrDNA ITS, cpDNA atpB-rbcL, and trnL-F). However, owing to the limited amount of information in these fragments, the results of some phylogenetic analyses conflict significantly with those of morphological studies, and species-level taxonomic classification remains challenging. The chloroplast genome has been applied in the taxonomic research of Impatiens because of its rich informative loci, moderate evolutionary rate, and other advantages. For example, in 2023, Qin et al. reconstructed the phylogenetic framework of Impatiens species in the Northern Hemisphere via chloroplast genomes, dividing Impatiens into seven major evolutionary branches (BS = 100)6. Some scholars have described the chloroplast genomes of some Impatiens species30,31,32,33. However, for a genus as large as Impatiens, there are few relevant reports, and most of them analyze and compare different groups of species within or between genera, with almost no research on intraspecific and interspecific relationships.
This study analyzes the chloroplast genomes of 13 species of I. subg. Impatiens sect. Impatiens, including 6 chloroplast genomes newly sequenced from samples collected at their type localities in Guizhou Province (I. bijieensis, I. labordei, I. liupanshuiensis, I. lasiophyton, I. cavaleriei, I. sigmoidea). On the basis of complete chloroplast genome data: (1) the structure and characteristics of the chloroplast genomes of Impatiens spp., including repeat sequence content and distribution, expansion and contraction of IR regions, codon preference and chloroplast genome differences, were studied; (2) nucleotide differences among the chloroplast genomes of the 13 Impatiens species were compared to screen highly variable regions that could serve as potential molecular markers; and (3) a phylogenetic tree based on 31 chloroplast genomes of Balsaminaceae was constructed, providing important information for systematic and taxonomic research on the genus Impatiens.
Results
The structure of chloroplast genomes
The chloroplast genomes of the 13 species were composed of four connected regions forming a circular molecule, with a typical quadripartite structure (Fig. 1). The total sequence length ranged from 151,284 bp (I. fargesii) to 152,421 bp (I. lasiophyton), including a large single-copy (LSC) region (82,228–83,358 bp) and a small single-copy (SSC) region (17,243–17,652 bp), separated by a pair of IR regions (51,332–51,746 bp combined). Each chloroplast genome contained 113 genes (Table 1), which could be classified into three categories: (1) transcription and RNA genes, including 4 transcription-related genes (rpoA, rpoB, rpoC1, and rpoC2), 21 ribosomal protein-encoding genes, 4 ribosomal RNA genes (rrn4.5, rrn5, rrn16, and rrn23), and 29 transfer RNA genes; (2) 47 photosynthesis-related genes, including genes encoding Rubisco, ATP synthase, photosystem I, the cytochrome b/f complex, photosystem II, cytochrome c synthesis, and NADPH dehydrogenase; and (3) 8 other genes.
The GC contents of the chloroplast genomes of the 13 species were very similar (Table 2), with total GC contents ranging from 36.78 to 36.88%. The GC content was 34.47–34.61% in the LSC region and 29.2–29.54% in the SSC region, and the IR region had the highest GC content, ranging from 43.06 to 43.13%. The total length of the protein-coding sequence (CDS) ranged from 73,341 bp (I. lucorum) to 86,870 bp (I. bodinieri). In addition, the GC content differed among codon positions, increasing from the third position (28.31–29.61%) through the second position (37.35–37.62%) to the first position (44.17–45.02%).
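The position-specific GC values reported above can be illustrated with a minimal sketch. The function name `gc_by_codon_position` and the toy sequence are hypothetical and are not taken from the study; the sketch assumes an in-frame coding sequence and simply ignores a trailing partial codon.

```python
def gc_by_codon_position(cds: str) -> dict:
    """Return the GC percentage at codon positions 1, 2 and 3 of one CDS.

    Assumes the sequence starts in frame; a trailing partial codon is ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop incomplete final codon, if any
    result = {}
    for pos in range(3):
        bases = seq[pos:usable:3]  # every third base starting at this position
        gc = sum(1 for base in bases if base in "GC")
        result[f"GC{pos + 1}"] = 100.0 * gc / len(bases) if bases else 0.0
    return result


# Toy in-frame sequence (hypothetical, for illustration only)
toy_cds = "ATGGCTAAATTTGGATAA"
toy_gc = gc_by_codon_position(toy_cds)
```

For a whole genome, the same counts would be accumulated over all annotated CDSs before the percentages are taken.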
Repeat sequence analysis
In this study, a total of 81 (I. sigmoidea) to 98 (I. undulata) SSRs of 6 types were detected per species (Fig. 2), including 62–81 mononucleotide (Mono), 6–12 dinucleotide (Di), 2–4 trinucleotide (Tri), 3–7 tetranucleotide (Tetra), 1–2 pentanucleotide (Penta), and 0–1 hexanucleotide (Hexa) repeats, with repeat units composed mainly of A or T. A majority of the repeats were mononucleotide repeats, so these repeats may play a more important role in genetic variation than other types of repeat sequences. Among the mononucleotide SSRs, the most common were A/T repeats, with very few C/G repeats; I. rhombifolia and I. reptans had no C/G repeats at all. Among dinucleotide repeat sequences, AT was the only repeat type. Pentanucleotide repeats were present in I. lucorum, I. rhombifolia, I. reptans, I. bijieensis, and I. liupanshuiensis, and only I. bijieensis had a hexanucleotide repeat (AAAAAT).
The analysis of scattered repeat sequences via REPuter identified 27–28 scattered repeat sequences (INEs), and no complementary (C) repeats were found. The most common were forward (F) and palindromic (P) repeat sequences, which were much more abundant than reverse (R) repeat sequences (Fig. 2). Reverse repeat sequences appeared only in I. fargesii, I. lucorum, I. labordei, and I. lasiophyton. The species with the most repeats was I. sigmoidea, which had 10 forward repeat sequences and 11 palindromic repeat sequences. The species with the fewest repeat sequences were I. lucorum and I. labordei, both of which had 7 forward repeat sequences, 7 palindromic repeat sequences, and 1 reverse repeat sequence. Interestingly, repeat sequences with lengths greater than 59 bp all had a length of 78 bp, and there were no repeat sequences of this length in I. bijieensis, I. sigmoidea, or I. lasiophyton. Overall, the length of the repeat sequences ranged from 30 to 78 bp, with most ranging from 30 to 39 bp and only a small portion ranging from 40 to 78 bp.
Shrinkage and expansion of IR boundaries
Although the size of the IR region is highly conserved compared with that of other regions, the contraction and expansion of IR boundaries are considered to play important roles in genome size34. To elucidate the expansion and contraction of the IR boundaries, this study compared the 13 Impatiens species. The results revealed the gene arrangement at the four junctions between the four genome regions (Fig. 3).
The IRB-LSC junction (JLB) was located in the rps19 gene, and the lengths of rps19 in the IRB and LSC regions of the 13 samples were the same. The distance between rpl22 located in the LSC region and IRb varied from 137 to 151 bp, whereas in I. bodinieri, I. fargesii, I. lucorum and I. labordei, rpl22 was 143 bp away from IRb; the distance in I. rhombifolia and I. reptans was 138 bp; and the distance in I. cavaleriei and I. liupanshuiensis was 145 bp.
The IRB-SSC junction (JSB) was near ycf1 and ndhF. The ndhF genes of I. bodinieri, I. undulata, I. notolopha, I. cavaleriei, I. labordei, I. liupanshuiensis, I. lasiophyton, and I. sigmoidea extended into the IR region. The extended fragments in I. undulata, I. notolopha, and I. cavaleriei were 19 bp in length, and those in I. liupanshuiensis and I. lasiophyton were 10 bp in length. In I. fargesii, I. lucorum, I. rhombifolia, I. reptans, and I. bijieensis, the ndhF gene did not extend into the IR region. Among these, I. rhombifolia and I. reptans are both synonyms of I. procumbens, and the distance between the ndhF gene and JSB was 17 bp in both.
The IRA-SSC junction (JSA) was located in another ycf1 gene segment, which spans the IRA and SSC regions. The length of this gene in the SSC region ranged from 4,125 bp to 4,450 bp. In I. bijieensis, I. liupanshuiensis, and I. lasiophyton, the ycf1 gene had a consistent length of 4,382 bp in the SSC region. In I. undulata and I. cavaleriei, the ycf1 gene had the same length in the SSC region, i.e., 4,450 bp, whereas in I. notolopha, the length was only 4,423 bp. The distributions of the ycf1 genes in I. rhombifolia and I. reptans remained consistent. The IRA-LSC junctions (JLA) were all located at the starting point of trnH, and no significant expansion or contraction of the IR region was found.
Comparative analysis of chloroplast genomes
To identify regions with sequence differences, the entire chloroplast genomes were compared with mVISTA, with Impatiens fanjingshanica and I. macrovexilla var. yaoshanensis selected as references for multiple sequence alignment (Fig. 4). The four regions of the chloroplast genomes of the 13 Impatiens species were roughly the same, with relatively small differences. The number and sequence of genes in the IR region were relatively conserved, and the differences in the number and sequence of genes in the LSC and SSC regions were relatively small. The sequence variation in the noncoding regions was greater than that in the coding regions. However, in the SSC region, the coding regions of genes such as ndhF, ndhD, ndhA, and ycf1 showed a greater degree of variation.
Sequence identity plots of the 13 Impatiens chloroplast genomes. With the annotation of I. fanjingshanica as a reference, the x-axis represents the genome coordinates, and the y-axis represents the percent identity (50–100%). The arrows indicate the direction of each gene. UTR = untranslated region; CNS = conserved noncoding sequence.
To further quantify the level of nucleotide polymorphism in the chloroplast genomes, MAUVE was used for nucleotide diversity (Pi) analysis (Fig. 5). The results revealed that the Pi values of the 13 Impatiens species ranged between 0 and 0.04362, with an average value of 0.00843. Five highly variable regions were identified on the basis of a Pi value greater than 0.025. The highly variable region identified in the LSC region was rps16-trnQ-UUG, and those in the SSC region were ndhF, ccsA-ndhD, and ycf1. The fifth, trnN-GUU, lay at the junction of the SSC and IR regions. ycf1 had the highest Pi value. In addition, the Pi values in the IR region were significantly lower than those in the SSC and LSC regions, and all mutation hotspots presented lower Pi values (< 0.005), indicating that the changes in the LSC and SSC regions were much greater than those in the IR region.
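As a rough illustration of how a Pi profile of this kind is obtained, the sketch below computes nucleotide diversity as the mean proportion of pairwise differences per site and scans an alignment in sliding windows. The function names, the window and step sizes, and the rule of skipping any column that contains a gap or ambiguous base are assumptions made for this example; they are not settings reported in the study.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site (Pi) for equal-length aligned sequences.

    Columns containing anything other than A, C, G or T are skipped
    (an assumption of this sketch).
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two aligned sequences are required")
    columns = [col for col in zip(*(s.upper() for s in seqs))
               if all(base in "ACGT" for base in col)]
    if not columns:
        return 0.0
    n_pairs = n * (n - 1) / 2
    diffs = sum(1 for col in columns
                for a, b in combinations(col, 2) if a != b)
    return diffs / (n_pairs * len(columns))


def sliding_pi(seqs, window=600, step=200):
    """Return (window midpoint, Pi) pairs along the alignment.

    The default window and step are illustrative values only.
    """
    length = len(seqs[0])
    profile = []
    for start in range(0, max(length - window, 0) + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        profile.append((start + window // 2, nucleotide_diversity(chunk)))
    return profile


# Toy alignment (hypothetical): three sequences of eight sites
toy = ["ACGTACGT", "ACGTACGA", "ACCTACGT"]
toy_pi = nucleotide_diversity(toy)
toy_profile = sliding_pi(toy, window=4, step=4)
```

Windows whose Pi exceeds a chosen cutoff (0.025 in the text above) would then be reported as candidate hotspots.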
Codon preference
Relative synonymous codon usage (RSCU), the ratio between the observed usage of a codon and the usage expected if all synonymous codons for the same amino acid were used equally, was calculated and visualized for the cp genes of the 13 Impatiens species (Fig. 6), and a total of 64 RSCU values were obtained for each species (Table S1). Except for the termination codons UAA, UAG, and UGA, the remaining 61 codons encoded 20 amino acids. Notably, methionine (Met) and tryptophan (Trp) are each encoded by a single codon, whereas leucine (Leu), arginine (Arg), and serine (Ser) each have six synonymous codons. An RSCU < 1 indicates that the codon is used less often than expected and is a low-frequency codon. An RSCU > 1 indicates that the codon is used more often than expected and is a high-frequency codon. An RSCU of 1 indicates a lack of codon preference. According to the statistics, the RSCU values of 31 codons were > 1.00, with all these codons ending in A/U except for UUG (leucine). No preference was observed for AUG (methionine) or UGG (tryptophan).
Phylogenetic analysis
On the basis of the complete chloroplast genomes, we analyzed the relationships among the 13 species via phylogenetic trees. The chloroplast genomes used were derived from two subgenera and four sections of Impatiens, with most belonging to I. sect. Impatiens, and Hydrocera triflora, the sister group of Impatiens, was used as the outgroup. The best-fit model for the ML method was GTR + F + I + I + R5, and that for BI was GTR + F + I + G4. The phylogenetic trees constructed via the two methods (ML and BI) presented similar topological structures, and each branch presented a high support rate (BS = 100, PP = 1.00) (Fig. 7, Figure S1). All 13 species belong to sect. Impatiens. I. liupanshuiensis, I. sigmoidea, I. bijieensis, and I. lasiophyton clustered into the same branch (Clade 1), and I. notolopha, I. undulata, and I. cavaleriei clustered into Clade 2. In Clade 3, I. lucorum, I. fargesii, and I. labordei were all members of the same branch. In Clade 4, I. rhombifolia and I. reptans were located together, and the phylogenetic tree fully supported their grouping into the same branch, supporting their merging by Huang et al. in 202317. The species within the above branches had relatively close genetic distances and were closely related to each other. I. bodinieri and I. forrestii clustered together, which was fully supported by the phylogenetic tree (BS = 100, PP = 1.00), indicating a relatively distant relationship with the other species studied herein.
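The RSCU definition above can be made concrete with a short sketch: for each amino acid, the count of a codon is divided by the mean count of all its synonymous codons. The function and variable names are hypothetical, the codon table is the standard genetic code written in DNA letters, and stop codons are treated as one synonymous family so that 64 values are returned; none of this is presented as the workflow of the CodonW program used in the study.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order; '*' marks stop codons
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return a dict codon -> RSCU computed over a list of in-frame CDSs."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper().replace("U", "T")
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    # Group synonymous codons by the amino acid they encode
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            # observed count divided by the mean count of the family
            values[c] = counts[c] * len(codons) / total if total else 0.0
    return values


# Toy CDS (hypothetical): ATG TTA TTA TTG CTT TGG TAA
toy_rscu = rscu(["ATGTTATTATTGCTTTGGTAA"])
```

In this toy case leucine is encoded four times by three of its six codons, so TTA (used twice) gets an RSCU of 3.0, while ATG and TGG, which have no synonyms, can only ever score 1.0.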
Discussion
The chloroplast genomes of the 13 Impatiens species studied herein were similar to those of most angiosperms, ranging from 151,284 bp to 152,421 bp in length. Each was a circular DNA molecule consisting of four parts: an LSC region, an SSC region, and two IR regions. A total of 113 functional genes were annotated in the 13 chloroplast genomes, which differs from the results of Luo et al. and Qiu et al.31,33, who reported 114 functional genes in Impatiens species. A comparison of the chloroplast genome maps in those articles with the annotation files uploaded to the NCBI database for the same species showed that the files contained duplicated gene annotations (trnP-UGG and trnP-GGG). Therefore, the chloroplast gene structure studied herein is generally consistent with previous research results, and the selection of different reference genomes can lead to differences in the annotation results. The GC content in the IR region of the 13 chloroplast genomes was much greater than that in the LSC and SSC regions. The IR boundaries were relatively conserved, with only slight differences, and were generally consistent with the phylogenetic relationships. In some species, the ndhF gene extends into the IR region. Notably, the IR boundaries of I. rhombifolia and I. reptans, both of which are synonymous with I. procumbens, were highly consistent, with only slight differences in gene length.
In this study, 64 codons were detected in the chloroplast genomes of the 13 Impatiens species. Methionine and tryptophan were each encoded by only one codon, with no codon preference detected, which is consistent with the results of Luo et al. from 2021. There were also 31 high-frequency codons, all of which ended in A/U except for the codon UUG encoding leucine. This bias might have contributed to molecular evolution35 and is consistent with the hypothesis that codon preference in higher plants may be related to the A/T content of the cp genome36,37. Generally, the codon usage preferences of the same species or closely related species are similar38,39, and the phylogenetic relationships provide good evidence for this. In previous studies, different codon preferences have been detected in different species of Impatiens31,32,33, indicating differences in codon usage at the species level. However, no significant differences were observed among the 13 species that are the focus of this study. Codon preference may therefore not be a key criterion for distinguishing closely related species of Impatiens.
The majority of SSRs identified in this study were mononucleotide adenine/thymine (A/T) repeats, with very few cytosine/guanine (C/G) repeats observed. In particular, this study revealed for the first time a species of I. sect. Impatiens with a hexanucleotide repeat, namely, I. bijieensis. In the chloroplast genomes of the 13 Impatiens species, there were only three types of scattered repeat sequences, namely, forward, palindromic, and reverse sequences, and there were significant differences in quantity and type. This difference may be related to the frequent expansion and contraction of the chloroplast genome. Therefore, the differences in the quantity and type of scattered repeat sequences could provide a scientific basis for identifying different groups of genomes. In addition, the detected SSRs and scattered repeat sequences could also provide relevant data for the development of molecular markers in population lineages or germplasms of Impatiens species. As in most angiosperms, the majority of sequence variation occurred in noncoding regions40, but in the SSC region, the coding regions of genes such as ndhF, ycf1, and ndhH exhibited greater variability30,31. An analysis of polymorphisms revealed five highly variable regions, among which ycf1 in the SSC region had the highest Pi value and the greatest potential as a DNA barcode marker for the genus Impatiens.
The phylogenetic tree constructed on the basis of 31 chloroplast genome sequences in this study was consistent with the findings of Yu et al.22 and Qin et al.6. There were close relationships among the 13 species of Impatiens; usually, species with similar morphologies tended to have closer phylogenetic relationships. Notably, I. cavaleriei was considered a sect. Uniflorae species when it was published, but the present results indicate that it actually belongs to sect. Impatiens. I. cavaleriei clustered with I. notolopha and I. undulata on the same evolutionary branch (BS = 100/PP = 1.00); all three have ovate or ovate-elliptic leaves, yellow flowers and a cymbiform lower sepal, exhibiting similar morphologies15. I. rhombifolia and I. reptans were merged as synonyms of I. procumbens in 202313. The phylogenetic tree revealed that these two taxa shared similar relationships and genetic distances, which strongly supports the conclusion that they are synonymous with each other. I. lucorum, I. fargesii and I. labordei clustered into the same branch, whereas I. bodinieri, which had a similar morphology, shared a branch with I. forrestii, indicating a relatively distant genetic relationship with the former three. This may be due to geographical isolation, as the sample of I. bodinieri was collected from Guangxi Province, which is far from the type locality13. Species with similar morphologies should be studied further via analyses of biogeography, micromorphology and other aspects to explore their interspecific relationships.
Conclusion
In this study, the chloroplast genomes of 13 species in I. sect. Impatiens were annotated and analyzed. These chloroplast genomes presented typical quadripartite structures, ranging in size from 151,284 bp to 152,421 bp, with highly similar GC contents, gene sequences, and functions. There were significant differences in the number and types of scattered repeat sequences, providing a scientific basis for identifying different groups of genomes, and an Impatiens species with a hexanucleotide repeat was identified. Five highly variable regions (rps16-trnQ-UUG, ndhF, ccsA-ndhD, ycf1, trnN-GUU) were detected. These regions could provide genetic information for establishing potential molecular markers and studying genetic diversity, with ycf1 having the greatest potential as a molecular marker. The frequency of codon usage was similar among the 13 species, so codon preference may not be a key criterion for distinguishing closely related species of Impatiens. In the phylogenetic tree constructed on the basis of the complete chloroplast genomes, species with similar morphologies often exhibited closer phylogenetic relationships, and some taxa may actually be synonymous. Further verification is needed by combining research from disciplines such as morphology and biogeography. These results add information on the chloroplast genomes of Impatiens in karst areas and provide important information for the comprehensive exploration of chloroplast phylogenetic relationships and the resolution of the complex taxonomy of Impatiens.
Materials and methods
Plant materials and DNA extraction
Six samples were collected from the type locality (or its surrounding areas). Fresh and healthy leaves were collected, quickly dried and stored in silica gel. Total DNA was extracted via the CTAB method. Agarose gel electrophoresis (1%) was used to detect the quality of the DNA, and a fluorometer for nucleic acid quantification (Qubit®) was used to measure the purity of the DNA. Seven chloroplast genome datasets were downloaded from the NCBI database. The voucher samples were stored in the Herbarium of Guizhou University, China (GZAC) (Table 3).
The above 13 species all belong to sect. Impatiens, and their common characteristic is the production of small yellow flowers. Their type localities are all in southwestern China. Among the plants sampled, I. reptans and I. rhombifolia are both synonymous with I. procumbens.
DNA sequencing, assembly, and annotation
High-quality DNA was sequenced by Biotechnology Co., Ltd., on the Illumina high-throughput sequencing platform. The raw sequencing data were trimmed and filtered via FastQ41 and then mapped to the published reference sequences for Impatiens species. SOAPdenovo 242 and NOVOPlasty43 were used to assemble the filtered data de novo, resulting in circular cp genome maps. The chloroplast genomes of 7 Impatiens species were downloaded from the NCBI database and reannotated, giving a total of 13 annotated species. I. macrovexilla var. yaoshanensis (NCBI number: OK310516) and I. fanjingshanica (NCBI number: MW411294.1) were used as reference sequences, and the online program GeSeq44 (https://chlorobox.mpimp-golm.mpg.de/geseq.html) was used to annotate the chloroplast genomes of the 13 species, which were realigned and manually adjusted against the reference sequences via Geneious R9.0.245. Finally, the online program OGDRAW v1.3.146 (http://ogdraw.mpimp-golm.mpg.de/) was used to construct the chloroplast genome map.
Repetitive sequence analysis
MISA v2.147 was used to detect SSRs in the chloroplast genomes. The minimum numbers of repeats for mono-, di-, tri-, tetra-, penta-, and hexanucleotide motifs were set to 10, 5, 4, 3, 3, and 3, respectively. In addition, the online tool REPuter48 was used to identify four types of repetitive sequences, namely, forward, reverse, complementary, and palindromic sequences, with the Hamming distance set to 3 and a minimum repeat size of 30 bp.
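The repeat thresholds listed above can be illustrated with a simplified regular-expression scan. This is only a sketch of perfect-SSR detection under those thresholds and does not reproduce the MISA program itself (for example, compound SSRs are not handled, and a run rejected for one unit length is not rescanned from every offset); the names `MIN_REPEATS`, `find_ssrs` and `is_multiple_of_shorter_unit` are hypothetical.

```python
import re

# Minimum repeat counts per motif length, as stated in the text
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_multiple_of_shorter_unit(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. ATAT = AT x 2)."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (1-based start, motif, repeat count) tuples for perfect SSRs."""
    seq = sequence.upper()
    hits = []
    for unit_len, min_count in min_repeats.items():
        # one motif copy followed by at least (min_count - 1) further copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_count - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_multiple_of_shorter_unit(motif):
                continue  # already counted under the shorter motif length
            hits.append((match.start() + 1, motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence (hypothetical) with one mono-, one di- and one hexanucleotide SSR
toy_seq = "GC" + "A" * 12 + "GCGC" + "AT" * 6 + "GG" + "AAAAAT" * 3 + "C"
toy_hits = find_ssrs(toy_seq)
```

Counting the hits by motif length gives the kind of per-species Mono/Di/Tri/Tetra/Penta/Hexa tallies reported in the Results.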
Comparative analysis of differences in chloroplast genome sequences
To investigate the contraction or expansion of the inverted repeat (IR) regions in the chloroplast genomes, CPJSdraw49 was used to compare and visualize the IR boundaries (LSC/IRB, IRB/SSC, SSC/IRA, and IRA/LSC). With mVISTA50 (https://genome.lbl.gov/vista/mvista/submit.shtml) and the chloroplast genome of Hydrocera triflora as a reference, the chloroplast genomes of the 13 species of Impatiens were compared. After alignment with MAFFT v.7.5.1.151 (https://MAFFT.cbrc.jp/alignment/server/), the level of nucleotide polymorphism (Pi) was calculated in DnaSP v.6.12.0352 to explore highly variable regions in the chloroplast genomes of the 13 Impatiens spp. To determine preferences for certain synonymous codons, we screened CDSs larger than 300 bp and performed RSCU analysis via CodonW 1.4.253.
Phylogenetic analysis
The chloroplast genome sequences of Impatiens species were downloaded from the NCBI database, with Hydrocera triflora as the outgroup, and a phylogenetic tree was constructed from a total of 31 sequences (Table 3, Table S2). After the sequences were aligned in MAFFT v.7.5.1.151 (https://MAFFT.cbrc.jp/alignment/server/), the ModelFinder module54 in PhyloSuite v.1.2.355 was used to search for the optimal base substitution model. The maximum likelihood (ML) method and Bayesian inference (BI) were used to construct phylogenetic trees. ML analysis was conducted in IQ-TREE56, with 1000 bootstrap (BS) replicates used to evaluate the reliability of branches. BI was conducted in MrBayes57, running 10 million generations in two independent analyses. The reliability of each branch was evaluated by calculating its posterior probability (PP), and finally, iTOL v658 (https://itol.embl.de/) was used to visualize and output the phylogenetic trees.
Data availability
The chloroplast genomes of six Impatiens assembled in this study have been deposited in GenBank of NCBI at https://www.ncbi.nlm.nih.gov. GenBank accession numbers: PQ156316 (I. bijieensis, https://www.ncbi.nlm.nih.gov/nuccore/PQ156316), PQ156317 (I. labordei, https://www.ncbi.nlm.nih.gov/nuccore/PQ156317), PQ156318 (I. liupanshuiensis, https://www.ncbi.nlm.nih.gov/nuccore/PQ156318), PQ156319 (I. lasiophyton, https://www.ncbi.nlm.nih.gov/nuccore/PQ156319), PQ156320 (I. cavaleriei, https://www.ncbi.nlm.nih.gov/nuccore/PQ156320), PQ156321 (I. sigmoidea, https://www.ncbi.nlm.nih.gov/nuccore/PQ156321). The other cp genomes used in this study were downloaded from the NCBI. The accession number can be found in Table 3 and Table S2. The datasets generated during and analysed during the current study are available from the corresponding author on reasonable request.
Change history
19 March 2025
A Correction to this paper has been published: https://doi.org/10.1038/s41598-025-94281-x
References
Linnaeus, C. Species Plantarum 2. 561–1200 (Laurentius Salvius, 1753).
Grey-Wilson, C. Hybridization in African Impatiens: studies in Balsaminaceae: II. Kew. Bull. 34 (4), 689–722 (1980).
Song, Y., Yuan, Y. M. & Küpfer, P. Chromosomal evolution in Balsaminaceae, with cytological observations on 45 species from Southeast Asia. Caryologia 56 (4), 463–481 (2003).
Yuan, Y. M. et al. Phylogeny and biogeography of Balsaminaceae inferred from ITS sequences. Taxon 53 (2), 391–404 (2004).
Chen, Y. et al. Species diversity and geographical distribution patterns of Balsaminaceae in China. Diversity 15, 1012 (2023).
Qin, F. et al. Past climate cooling and orogenesis of the Hengduan Mountains have influenced the evolution of Impatiens sect. Impatiens (Balsaminaceae) in the Northern Hemisphere. BMC Plant. Biol. 23, 600 (2023).
Ruchisansakun, S. et al. Phylogenetic analyses of molecular data and reconstruction of morphological character evolution in Asian Impatiens section Semeiocardium (Balsaminaceae). Syst. Bot. 40, 1063–1074 (2015).
Rahelivololona, E. M., Fischer, E., Janssens, S. B. & Razafimandimbison, S. G. Phylogeny, infrageneric classification and species delimitation in the Malagasy Impatiens (Balsaminaceae). PhytoKeys 110, 51–67 (2018).
Zeng, L., Liu, Y. N., Gogoi, R., Zhang, L. J. & Yu, S. X. Impatiens tianlinensis (Balsaminaceae), a new species from Guangxi, China. Phytotaxa 227 (3), 253–260 (2015).
Lu, Z. C., Pan, B., Huang, F. Z. & Liu, Y. Impatiens gongchengensis (Balsaminaceae), a new species from Guangxi, Southern China. Taiwania 65 (1), 1–4 (2020).
Peng, S. et al. Impatiens bullatisepala (Balsaminaceae), a new species from Guizhou, China. Phytotaxa 500 (3), 217–224 (2021).
Song, Y. X., Peng, S., Cong, Y. Y. & Zheng, Y. M. Impatiens rapiformis, a new species of Impatiens with root tuber from Yunnan, China. Nord. J. Bot. 39(5), e03151 (2021).
Ren, L. Y. et al. Impatiens bijieensis (Balsaminaceae), a new species from karst plateau in Guizhou, China. PhytoKeys 192, 1–10 (2022).
Yuan, T. H. et al. Impatiens liupanshuiensis (Balsaminaceae), a new species from Guizhou, China. PhytoKeys 192, 37–44 (2022).
Huang, R. X., He, B. Q., Chen, Y., Li, M. J. & Bai, X. Impatiens cavaleriei (Balsaminaceae), a new species from the Miaoling Mountains in Guizhou Province, China. Taiwania 68 (1), 85–89 (2023).
Hu, H. F., Xu, J., An, M. T., Guo, Y. & Yang, J. W. Impatiens beipanjiangensis (Balsaminaceae), a new species from Guizhou, China. PhytoKeys 241, 201–213 (2024).
Huang, R. X., Yuan, T. H., Chen, Y., Li, M. J. & Bai, X. X. Five new synonyms for Impatiens procumbens (Balsaminaceae) in China. PhytoKeys 222, 179–191 (2023).
Singh, R. K., Borah, D. & Taram, M. Typifications, new combinations and new synonyms in Indian Impatiens (Balsaminaceae). Biodivers. Res. Conserv. 61 (2), 1–27 (2021).
Gogoi, R. et al. Misinterpretations and plagiarism in a publication about Himalayan Impatiens: Polemics with the paper of Singh R.K. Biodivers. Res. Conserv. 63 (1), 1–30 (2021).
Abrahamczyk, S. & Steudel, B. Impatiens namchabarwensis is distinct from I. arguta. Nord. J. Bot. 2023 (4), e03900 (2023).
Ramasubbu, R. & Sreekala, A. K. How distinct is Impatiens hookeriana Arnott from I. grandis Heyne (Balsaminaceae)? Feddes Repertorium. 131 (4), 225–232 (2020).
Yu, S. X. et al. Phylogeny of Impatiens (Balsaminaceae): integrating molecular and morphological evidence into a new classification. Cladistics 32, 179–197 (2016).
Ren, T., Yang, Y., Zhou, T. & Liu, Z. L. Comparative plastid genomes of Primula species: sequence divergence and phylogenetic relationships. Int. J. Mol. Sci. 19 (4), 1050 (2018).
Tang, D. et al. Analysis of chloroplast differences in leaves of rice isonuclear alloplasmic lines. Protoplasma 255 (3), 863–871 (2018).
Yao, J. J. et al. Complete chloroplast genome sequencing and phylogenetic analysis of two Dracocephalum plants. Biomed. Res. Int. 2020, 4374801 (2020).
Nock, C. J. et al. Chloroplast genome sequences from total DNA for plant identification. Plant Biotechnol. J. 9 (3), 328–333 (2011).
Shaw, J. et al. Chloroplast DNA sequence utility for the lowest phylogenetic and phylogeographic inferences in angiosperms: the tortoise and the hare IV. Am. J. Bot. 101, 1987–2004 (2014).
Yang, Z., Zhao, T., Ma, Q., Liang, L. & Wang, G. Comparative genomics and phylogenetic analysis revealed the chloroplast genome variation and interspecific relationships of Corylus (Betulaceae) species. Front. Plant Sci. 9, 927 (2018).
Schneider, J. V. et al. Resolving recalcitrant clades in the pantropical Ochnaceae: insights from comparative phylogenomics of plastome and nuclear genomic data derived from targeted sequencing. Front. Plant Sci. 12, 638650 (2021).
Luo, C. et al. Comparative chloroplast genome analysis of Impatiens species (Balsaminaceae) in the karst area of China: insights into genome evolution and phylogenomic implications. BMC Genom. 22, 571 (2021).
Luo, C. et al. Complete chloroplast genomes of Impatiens cyanantha and Impatiens conticola: insights into genome structures, mutational hotspots, comparative and phylogenetic analysis with its congeneric species. PLoS ONE 16(4), e0248182 (2021).
Luo, C. et al. Complete chloroplast genomes and comparative analyses of three ornamental Impatiens species. Front. Genet. 13, 816123 (2022).
Qiu, H. et al. Plastome evolution and phylogenomics of Impatiens (Balsaminaceae). Planta 257 (2), 45 (2023).
Gu, C., Tembrock, L. R., Zhang, D. & Wu, Z. Characterize the complete chloroplast genome of Lagerstroemia floribunda (Lythraceae), a narrow endemic crape myrtle native to Southeast Asia. Conserv. Genet. Resour. 9, 91–94 (2017).
Sharp, P. M. & Li, W. H. Codon usage in regulatory genes in Escherichia coli does not reflect selection for ‘rare’ codons. Nucleic Acids Res. 14 (19), 7737–7749 (1986).
Campbell, W. H. & Gowri, G. Codon usage in higher plants, green algae, and cyanobacteria. Plant Physiol. 92 (1), 1–11 (1990).
Eguiluz, M., Rodrigues, N. F., Guzman, F., Yuyama, P. & Margis, R. The chloroplast genome sequence from Eugenia uniflora, a Myrtaceae from Neotropics. Plant. Syst. Evol. 303, 1199–1212 (2017).
Li, J. F. et al. Analysis on codon usage bias of chloroplast genome in Catalpa fargesii. Genomics Appl. Biol. 41 (4), 843–853 (2022).
Yu, C. H. et al. Codon usage influences the local rate of translation elongation to regulate co-translational protein folding. Mol. Cell. 59 (5), 744–754 (2015).
Clegg, M. T., Gaut, B. S., Learn, G. H. Jr & Morton, B. R. Rates and patterns of chloroplast DNA evolution. Proc. Natl. Acad. Sci. U.S.A. 91 (15), 6795–6801 (1994).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. Fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34 (17), i884–i890 (2018).
Luo, R. B. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1 (1), 18 (2012).
Dierckxsens, N., Mardulyn, P. & Smits, G. NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Res. 45(4), e18 (2017).
Tillich, M. et al. GeSeq–versatile and accurate annotation of organelle genomes. Nucleic Acids Res. 45 (W1), W6–W11 (2017).
Kearse, M. et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28 (12), 1647–1649 (2012).
Lohse, M., Drechsel, O., Kahlau, S. & Bock, R. OrganellarGenomeDRAW-a suite of tools for generating physical maps of plastid and mitochondrial genomes and visualizing expression data sets. Nucleic Acids Res. 41, W575–W581 (2013).
Beier, S., Thiel, T., Münch, T., Scholz, U. & Mascher, M. MISA-web: a web server for microsatellite prediction. Bioinformatics 33, 2583–2585 (2017).
Kurtz, S. et al. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 29, 4633–4642 (2001).
Li, H. et al. CPJSdraw: analysis and visualization of junction sites of chloroplast genomes. PeerJ 11, e15326 (2023).
Frazer, K. A., Pachter, L., Poliakov, A., Rubin, E. M. & Dubchak, I. VISTA: computational tools for comparative genomics. Nucleic Acids Res. 32, W273–W279 (2004).
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
Rozas, J. et al. DnaSP 6: DNA sequence polymorphism analysis of large data sets. Mol. Biol. Evol. 34, 3299–3302 (2017).
Shields, D. C. & Sharp, P. M. Synonymous codon usage in Bacillus subtilis reflects both translational selection and mutational biases. Nucleic Acids Res. 15 (19), 8023–8040 (1987).
Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods. 14, 587–589 (2017).
Zhang, D. et al. PhyloSuite: an integrated and scalable desktop platform for streamlined molecular sequence data management and evolutionary phylogenetics studies. Mol. Ecol. Resour. 20 (1), 348–355 (2020).
Minh, B. Q. et al. IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37 (5), 1530–1534 (2020).
Ronquist, F. et al. MrBayes 3.2: efficient bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61, 539–542 (2012).
Letunic, I. & Bork, P. Interactive tree of life (iTOL) v6: recent updates to the phylogenetic tree display and annotation tool. Nucleic Acids Res. 52, W78–W82. https://doi.org/10.1093/nar/gkae268 (2024).
Funding
Guizhou Provincial Science and Technology Projects: Taxonomic studies of wild Impatiens L. in Guizhou Province (Qiankehe Foundation-ZK [2023] General 102); The special fund for innovation capacity construction of Guizhou research institution (Qiankehefuqi [2024.013]); Guizhou University Talent Introduction Research Project: Systematic Evolution and Genetic Diversity of Wild Impatiens in Guizhou ((2022-36)); Guizhou University Cultivation Project: Molecular Mechanism of Phylogenetic Development of Wild Impatiens in Karst Regions ([2023]25); National Natural Science Foundation of China: Leaf photophobic drooping is a unique mechanism to avoid photooxidation against sunflecks in karst-dwelling Impatiens hainanensis (No. 32201282); National Natural Science Foundation of Hainan: Physiological and ecological mechanisms adapt to different water conditions in karst-dwelling of Impatiens hainanensis (No. 322QN249).
Author information
Contributions
Q.Q.Y.: methodology, data curation, writing (original draft preparation), and editing; X.X.B.: sample collection and identification, and experiments; Z.L., C.L. and J.L.Z.: technical guidance and language revision; M.J.L.: supervision and project administration. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this Article was revised: The Funding section in the original version of this Article was omitted. Full information regarding the correction made can be found in the correction for this Article.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yong, Q., Li, M., Li, Z. et al. Complete chloroplast genomes of 13 species of the Impatiens genus for genomic features and phylogenetic relationships studies. Sci Rep 15, 4258 (2025). https://doi.org/10.1038/s41598-025-87254-7
DOI: https://doi.org/10.1038/s41598-025-87254-7