Background & Summary

The southern islands vascular flora dataset (SIVFLORA)

The Southern Islands Vascular Flora dataset (SIVFLORA)1 compiles essential biodiversity data on vascular plants from islands across the Southern Ocean. While numerous botanical expeditions since the mid-19th century have explored these islands, producing a wealth of botanical treatises and scientific publications2,3, a comprehensive global dataset on their vascular flora diversity has remained absent. This gap is notable given the islands’ significance as natural laboratories for studying evolutionary and biogeographic processes4. In contrast, Arctic flora has been more systematically studied and cataloged5,6, highlighting the need for a similarly detailed and unified dataset for the Southern Ocean islands.

SIVFLORA1 is an open-access resource designed to address the lack of unified and interoperable biodiversity data for the islands of the Southern Ocean. This repository compiles comprehensive pan-floristic data on species presence and plant diversity, drawing from diverse sources, including historical exploration literature, taxonomic monographs, ecological research, checklists, and specimen databases. The dataset provides an up-to-date catalog of vascular plant diversity, emphasizing native, endemic, and exotic species.

The sub-Antarctic region has been defined in several ways7, but a “core” region of sub-Antarctic islands and archipelagos includes South Georgia Island, Marion Island, Prince Edward Island, the Crozet Archipelago, the Kerguelen Archipelago, Heard Island, McDonald Island, and Macquarie Island (Fig. 1). These islands are characterized by an absence of woody plants and terrestrial vertebrates, and they experience cool, oceanic climates with reduced seasonality compared to Antarctica7. Other islands, such as Snares Archipelago and Campbell Island, as well as Tierra del Fuego Archipelago and the islands south of the Beagle Channel, are also described as sub-Antarctic, due to some of these islands experiencing colder conditions throughout the year7. Additionally, there is biological connectivity, especially in marine life, with islands like the Falklands/Malvinas Archipelago, Gough Island, Amsterdam Island, Saint Paul Island, and the subtropical Tristan da Cunha Archipelago, which have many terrestrial species absent from the core region. The Southern Ocean islands, some of the most isolated on the planet, provide a unique system to study native flora and natural dispersal processes in environments characterized by extreme climatic conditions. However, historical land use, including grazing, whaling settlements, and the introduction of exotic species, has significantly altered plant communities, as evidenced by the presence of numerous invasive species that have formed distinct vegetation assemblages8. This degradation highlights the need for a comprehensive floristic approach to understand species dynamics within a historical context and in relation to current human activities.

Fig. 1

Spatial distribution of Southern Ocean islands included in SIVFLORA. Map generated in QGIS v.3.2.8 (https://www.qgis.org) with a raster base map from Natural Earth (https://www.naturalearthdata.com/).

In response to these challenges, significant efforts have been made to compile exhaustive and up-to-date floristic catalogs4,9,10,11,12,13. SIVFLORA1, with its extensive geographical and botanical scope, is a valuable tool for conducting meta-analyses of plant diversity patterns and for understanding the processes driving the southward expansion of angiosperms as growing conditions become increasingly favorable14,15,16. This dataset complements other major initiatives and integrates seamlessly with related datasets, including alien species of Antarctica and some Southern Ocean islands17, the global floras and traits9, and the checklist of Antarctic terrestrial and freshwater biodiversity18. Together, these resources help identify research gaps and guide future botanical surveys and environmental monitoring efforts.

This study aimed to compile comprehensive and accurate data on the flora of islands located between the northern limit of the Subtropical Front and 60°S, the boundary established by the Antarctic Treaty. The dataset includes georeferenced species records, taxonomic revisions with accompanying comments, phytogeographic classifications, and biological collection details. These data were taxonomically validated, harmonized, and integrated into a standardized and interoperable format using a global taxonomic standard. Capturing the diversity of plant species across 22 islands and archipelagos, this dataset was developed through a workflow encompassing study area delineation, data compilation, taxonomic verification, attribute structuring, and the creation of an open-access spreadsheet file. This ensures the dataset’s usability and reproducibility for diverse research applications.

Methods

The process of SIVFLORA1 dataset generation involved five major steps (Fig. 2), which included: delimitation of the study area (Step 1), dataset generation (Step 2), taxonomic validation and harmonization (Step 3), dataset field information structuring (Step 4), and creation of the SIVFLORA1 open access file format (Step 5).

Fig. 2

Overview of the workflow for compiling, validating, and generating the Southern Islands Vascular Flora database (SIVFLORA). The process includes: (1) delimitation of the study area, (2) dataset generation, (3) taxonomic validation, (4) dataset structuring, and (5) defining the SIVFLORA file format and ensuring access.

The northern boundary includes islands located south of the Subtropical Front7,19. Tristan da Cunha Archipelago, Amsterdam Island (37°S), and Saint Paul Island (38°S) were included as they are frequently classified as sub-Antarctic7,19. To ensure the presence of vascular plants, islands larger than 5 km² were prioritized, with exceptions made for Diego Ramírez Archipelago, McDonald Island, and Snares Archipelago, which have areas under 3 km². Islands near the southern tip of South America were required to be at least 10 km from the mainland to minimize the influence of the fjord system, which includes hundreds of unexplored islands.
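The size and mainland-distance rules above can be sketched as a simple filter. This is a simplified illustration, not the authors' selection procedure: it ignores the Subtropical Front and latitude criteria, and the class name, field names, and candidate islands are hypothetical. Only the 5 km² and 10 km thresholds and the three named small-island exceptions come from the text.

```python
from dataclasses import dataclass
from typing import Optional

MIN_AREA_KM2 = 5.0                # islands larger than this were prioritized
MIN_MAINLAND_DISTANCE_KM = 10.0   # applies only to islands near South America
SMALL_ISLAND_EXCEPTIONS = frozenset({
    "Diego Ramírez Archipelago",
    "McDonald Island",
    "Snares Archipelago",
})


@dataclass(frozen=True)
class IslandCandidate:
    """Hypothetical container for the attributes needed by the filter."""
    name: str
    area_km2: float
    # Distance to the South American mainland; None when the island is not near it.
    mainland_distance_km: Optional[float] = None


def meets_size_and_distance_criteria(island: IslandCandidate) -> bool:
    """Apply the area rule (with named exceptions) and the mainland-distance rule."""
    large_enough = (
        island.area_km2 > MIN_AREA_KM2 or island.name in SMALL_ISLAND_EXCEPTIONS
    )
    far_enough = (
        island.mainland_distance_km is None
        or island.mainland_distance_km >= MIN_MAINLAND_DISTANCE_KM
    )
    return large_enough and far_enough


# Hypothetical candidates with made-up areas and distances.
candidates = [
    IslandCandidate("Hypothetical A", area_km2=12.0),
    IslandCandidate("Hypothetical B", area_km2=2.0),
    IslandCandidate("Hypothetical C", area_km2=40.0, mainland_distance_km=4.0),
    IslandCandidate("Snares Archipelago", area_km2=2.5),  # illustrative area, under 3 km²
]
selected = [c.name for c in candidates if meets_size_and_distance_criteria(c)]
print(selected)
```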

The study’s southern limit at 60°S aligns with the Antarctic Treaty, which defines this boundary for the Antarctic region. Signed in 1959, the treaty governs activities south of this latitude. The Antarctic native vascular flora south of 60°S comprises only two species, Colobanthus quitensis (Caryophyllaceae) and Deschampsia antarctica (Poaceae), both widely distributed across sub-Antarctic islands20. Additionally, Poa annua, while geographically restricted, is recorded as an invasive species in this Antarctic region21. Given that the distribution of native and invasive vascular plants on islands south of 60°S is well-documented17, their inclusion in SIVFLORA1 was unnecessary. These criteria ensure the selected islands represent distinct geographic zones across the Southern Ocean.

SIVFLORA1 was compiled through an exhaustive, systematic review of specialized sources (February 2022–April 2024). This methodology combined broad bibliographic exploration of both contemporary and historical works with targeted keyword queries and cross-referencing. Island-specific searches were conducted using standardized search strings incorporating the name of each island or archipelago (e.g., “Campbell Island,” “South Orkney Archipelago”) across specialized academic databases (e.g., JSTOR; https://www.jstor.org/). When initial results exceeded 200 references, queries were refined iteratively by adding terms such as “vascular plants.” For instance, an initial search for “Campbell Island” on JSTOR returned 876 results, which were narrowed to 92 upon including “vascular plants.” This approach ensured references were both comprehensive and specifically focused on vascular plant literature for each island. To further expand coverage, all relevant literature was examined for botanical citations, floras, specimen records, and monographs. Reference trails were followed meticulously to locate historical literature not readily captured by standard database queries. Information on the phytogeographic origin or “establishment_means” of species on islands was obtained from the reviewed literature. Among the resources detailed below used to document island flora, the Global Biodiversity Information Facility (GBIF; https://www.gbif.org) records were not utilized for this purpose, as they provide species-specific geographic context but do not indicate whether a species is alien, endemic, or native.

Digitized specimen records available online serve as invaluable resources for biodiversity research. We used online databases such as The Herbarium Catalogue Data Portal of the Royal Botanic Gardens, Kew22, the British Antarctic Survey Antarctic Plant Database23, the Vascular Plants Database hosted by the Muséum National d’Histoire Naturelle24, the Atlas of Living Australia25, the Chilean Digital Herbarium26 (Herbario Digital), and the Instituto de Botánica Darwinion27. Additionally, we incorporated species lists and checklists from the New Zealand Plant Conservation Network28, the Australian Antarctic Program29, and the Flora of Staten Island30. Together, these platforms provide access to high-quality, digitized records.

The taxonomic revision for validating records adheres to a structured hierarchical approach. Primary reliance is placed on authoritative taxonomic platforms, including Plants of the World Online31,32, the International Plant Names Index33, the Instituto de Botánica Darwinion27, the Catalogue of Life Checklist34, World Plants35, World Ferns36 and the World Flora Online37. This multi-source approach enables robust validation of nomenclature and taxonomy, ensuring accuracy and consistency across datasets. The dataset supports comparative analysis across various taxonomic levels. Records with incomplete taxonomic identification (sp., spp., aff., or cf. rank), doubtful identifications, and horticultural crop species were excluded from this checklist. To determine the accepted binomial names, the criteria outlined in the International Code of Nomenclature for algae, fungi, and plants (Shenzhen Code)38 are followed.

For broader interoperability, the GBIF backbone taxonomy39 is adopted for taxonomic data harmonization40, facilitating consistent searches, browsing, and reporting across occurrence datasets, species pages, and external resources such as GenBank®41 and the IUCN Red List of Threatened Species (https://www.iucnredlist.org/). The GBIF taxonomic backbone is regularly updated; it draws on sources such as the Catalogue of Life Checklist34 and is further enriched by merging data from other authoritative sources. It also incorporates identifiers for Operational Taxonomic Units (OTUs) derived from barcoding initiatives, such as the International Barcode of Life (https://ibol.org/), supporting the integration of molecular data with traditional taxonomy.

Independent taxonomic revisions, such as those conducted in this dataset construction, may occasionally reveal discrepancies with global catalogues like the GBIF Backbone Taxonomy42. While most species names in the SIVFLORA1 dataset, listed in the “reviewed_name” column, match the GBIF checklist as indicated by matchType = “EXACT” in the “gbif_matchType” column, occasional mismatches occur between the “reviewed_name” and “gbif_name” columns. These discrepancies likely arise from unresolved taxonomic issues within the GBIF taxonomic backbone. For instance, in the case of Grammitis poeppigiana, the GBIF Backbone Taxonomy returned “Notogrammitis Parris” as the “gbif_name,” providing only the genus name and author due to a “basionym relation derived” issue. This occurs when GBIF infers a relationship to an original name (basionym) through name and authorship comparison, even though this relationship is not explicitly present in the source data. In contrast, our taxonomic revision provides more robust and transparent results by citing all sources used. We encourage users to report any potential errors or concerns to the corresponding author to further improve the dataset.
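A minimal sketch of how a user might flag such discrepancies for manual review, using the three column names quoted above. It assumes that "gbif_name" may carry an authorship suffix after the reviewed name; the sample rows and their match types are illustrative and are not copied from the dataset.

```python
def needs_review(record: dict) -> bool:
    """Return True when a record's GBIF match should be checked by hand."""
    reviewed = record["reviewed_name"].strip()
    gbif = record["gbif_name"].strip()
    is_exact = record["gbif_matchType"].strip().upper() == "EXACT"
    # Assumption: gbif_name equals the reviewed name, optionally followed by authorship.
    names_agree = gbif == reviewed or gbif.startswith(reviewed + " ")
    return not (is_exact and names_agree)


# Illustrative rows only; the match-type values are assumptions.
sample = [
    {"reviewed_name": "Poa annua", "gbif_name": "Poa annua L.",
     "gbif_matchType": "EXACT"},
    {"reviewed_name": "Grammitis poeppigiana", "gbif_name": "Notogrammitis Parris",
     "gbif_matchType": "HIGHERRANK"},
]
flagged = [row["reviewed_name"] for row in sample if needs_review(row)]
print(flagged)  # ['Grammitis poeppigiana']
```

The check is deliberately conservative: a record passes only when the match type is exact and the two names agree, so every other record is surfaced for inspection.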

Data Records

SIVFLORA1 is provided as a flat spreadsheet file, available for download from the Zenodo repository. It is openly accessible and adheres to the FAIR principles43, ensuring it is findable, accessible, interoperable, and reusable. The dataset is also open for revision by interested parties, promoting collaborative updates and improvements.

The dataset encompasses 22 islands and 62 localities across diverse geographic and climatic zones (Table 1). Dataset fields are organized into seven categories (Record, Location, Taxonomy, Phytogeographic Information, Collection Information, Determination Information, and GBIF Taxonomy Comparison), ensuring comprehensive data coverage for ecological, taxonomic, and biogeographic analyses (Table 2). Some islands are notable for their small size, such as Diego Ramírez Archipelago, McDonald Island, and Snares Archipelago, all of which are under 3 km². In contrast, large archipelagos such as the Kerguelen Archipelago (7,215 km²) and the Falkland/Malvinas Archipelago as a whole (more than 12,000 km²) represent substantial landmasses. Amsterdam Island, Campbell Island, and the Crozet Archipelago are uninhabited, with only non-permanent scientific and conservation personnel allowed. These islands fall under extensive conservation frameworks, such as Antarctic French National Parks or UNESCO World Heritage designations. The Auckland Archipelago and Macquarie Island are also part of the New Zealand and Australian sub-Antarctic World Heritage Sites, respectively, and are protected for their rich biodiversity. In contrast, the Falklands/Malvinas and Tristan da Cunha archipelagos have small permanent settlements alongside established conservation efforts, including National Nature Reserves and Marine Protected Areas.

Table 1 Overview of islands and archipelagos included in the SIVFLORA database.
Table 2 Dataset fields included in SIVFLORA, grouped across seven categories: Record, Location, Taxonomy, Phytogeographic Information, Collection Information, Determination information, and GBIF Taxonomy Comparison. The fields include unique record identifiers, geographic coordinates, climatic and administrative data, taxonomic classifications (verbatim and reviewed), species establishment status, specimen collection details, and GBIF taxonomic harmonization.

SIVFLORA1 incorporates data from over 300 references, resulting in an initial compilation of 15,393 records. Of these, 795 were discarded due to incomplete taxonomic identification or dubious citations, leaving 14,598 valid records distributed across 42 orders, 95 families, 323 genera, and 886 species and infrageneric ranks. The species composition of the islands reflects differing phytogeographic origins. Native species dominate, comprising 58.62% of the total flora (537 species), highlighting the islands’ natural biodiversity. Alien species make up 31.77% (291 species), reflecting the significant impact of introduced species on these ecosystems. Endemic species, unique to specific islands, represent 9.61% (88 species), underscoring the islands’ role as centers of endemism.
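The figures in this paragraph can be checked with a few lines of arithmetic. The sketch below reproduces the published percentages by taking the sum of the three category counts (916) as the denominator; that choice is an assumption about how the shares were derived, and the variable names are illustrative.

```python
initial_records = 15_393
discarded_records = 795
valid_records = initial_records - discarded_records

# Species counts per phytogeographic category, as stated in the text.
category_counts = {"native": 537, "alien": 291, "endemic": 88}

# Assumption: shares are computed against the sum of the three categories.
denominator = sum(category_counts.values())
shares = {
    category: round(100 * count / denominator, 2)
    for category, count in category_counts.items()
}

print(valid_records)  # 14598
print(denominator)    # 916
print(shares)         # {'native': 58.62, 'alien': 31.77, 'endemic': 9.61}
```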

The most diverse families in SIVFLORA1 are Poaceae (138 taxa, 15.49%; Fig. 3, Fig. 4), Asteraceae (117 taxa, 13.13%), and Cyperaceae (60 taxa, 6.73%), followed by Caryophyllaceae and Brassicaceae (31 taxa each, 3.48%) and Fabaceae (29 taxa, 3.25%). Other rich families include Plantaginaceae (27 taxa, 3.03%), Juncaceae and Hymenophyllaceae (26 taxa each, 2.92%), and Apiaceae and Ranunculaceae (24 taxa each, 2.69%). A notable record in SIVFLORA1 is the diversity of orchids (22 taxa), particularly in the Auckland Archipelago, which hosts nine distinct species. The genus Corybas is the most represented in the dataset, with six species across islands such as the Antipodes Archipelago, Auckland Archipelago, Campbell Island, and Macquarie Island. The genus Gavilea also shows significant diversity, with five species present in the Falkland/Malvinas Archipelago and Staten Island. Additionally, Codonorchis lessonii has a broad distribution, occurring on Staten Island, the Cape Horn Archipelago, and the Falkland/Malvinas Archipelago. In terms of distribution breadth, Callitriche antarctica and Poa annua are the most widely distributed, each found on 18 islands (81.82% of the islands included in the dataset). Austroblechnum penna-marina and Stellaria media occur on 17 islands, while Crassula moschata and Ranunculus biternatus are found on 14 islands, demonstrating their prevalence in sub-Antarctic environments. Additionally, Acaena magellanica, Montia fontana, and Poa pratensis each occupy 13 islands, illustrating their persistence across multiple island ecosystems.
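Distribution breadth as used here is the number of distinct islands on which a species is recorded, expressed against the 22 islands in the dataset. A minimal sketch follows, assuming records are available as (island, species) pairs; the function name and the placeholder island labels are hypothetical.

```python
from collections import defaultdict

TOTAL_ISLANDS = 22  # islands and archipelagos covered by the dataset


def distribution_breadth(records):
    """Map each species to (number of islands, percentage of all islands)."""
    islands_by_species = defaultdict(set)
    for island, species in records:
        islands_by_species[species].add(island)  # sets ignore repeated records
    return {
        species: (len(islands), round(100 * len(islands) / TOTAL_ISLANDS, 2))
        for species, islands in islands_by_species.items()
    }


# Placeholder island labels; one record is repeated on purpose.
records = [(f"island_{i}", "Poa annua") for i in range(18)]
records.append(("island_0", "Poa annua"))
breadth = distribution_breadth(records)
print(breadth["Poa annua"])  # (18, 81.82)
```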

Fig. 3

Plant diversity across regions: The pie-donut charts show plant diversity across different regions, with plant orders represented in the inner pie and families in the outer donut, labeled with species counts. Percentages indicate the relative contribution of orders and families, with small or overlapping values omitted for clarity. (a) Subtropical Islands Flora. (b) Magellanic sub-Antarctic Islands Flora. (c) Sub-Antarctic Islands Flora. (d) Maritime Antarctic Islands Flora.

Fig. 4

Treemap of the proportion of plant families, with each rectangle’s size proportional to the number of species relative to the total species inhabiting Southern Ocean islands.

SIVFLORA1 documents the highest taxonomic richness in the Falkland/Malvinas Archipelago, which includes 380 species spanning 67 families (Fig. 5). The Auckland Archipelago follows, with 258 species and 51 families, while Campbell Island features slightly fewer taxa, with 222 species across 48 families. Staten Island and the Cape Horn Archipelago exhibit comparable family diversity (54 and 49 families, respectively) but differ in species richness, with Staten Island hosting 193 species and the Cape Horn Archipelago containing 164 species. Islands such as the Tristan da Cunha Archipelago, the Kerguelen Archipelago, and the Crozet Archipelago display moderate levels of diversity, with species counts ranging from 162 to 120 and family numbers from 40 to 31. In contrast, Amsterdam Island and the Antipodes Archipelago, the most geographically isolated islands, are characterized by lower diversity, with 112 and 95 species, respectively, and smaller family representation (31 and 29 families). The dataset also highlights the islands with the lowest taxonomic richness. The South Sandwich Archipelago is represented by only 4 species distributed across 3 families (Poaceae, Polypodiaceae, and Rubiaceae), while the South Orkney Archipelago is documented with just 3 species from 2 families (Poaceae and Caryophyllaceae).
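Per-island richness of this kind is a count of distinct species and distinct families among the records for each island. The sketch below assumes each record is a dictionary with "island", "family", and "reviewed_name" keys; only "reviewed_name" is a column name given in the text, and the other two keys and the toy island are hypothetical.

```python
from collections import defaultdict


def richness_by_island(records):
    """Count distinct species and families recorded on each island."""
    species = defaultdict(set)
    families = defaultdict(set)
    for record in records:
        species[record["island"]].add(record["reviewed_name"])
        families[record["island"]].add(record["family"])
    return {
        island: {"species": len(species[island]), "families": len(families[island])}
        for island in species
    }


# Toy records for a made-up island; the first record is duplicated on purpose.
toy_records = [
    {"island": "Island X", "family": "Poaceae", "reviewed_name": "Deschampsia antarctica"},
    {"island": "Island X", "family": "Poaceae", "reviewed_name": "Deschampsia antarctica"},
    {"island": "Island X", "family": "Poaceae", "reviewed_name": "Poa annua"},
    {"island": "Island X", "family": "Caryophyllaceae", "reviewed_name": "Colobanthus quitensis"},
]
richness = richness_by_island(toy_records)
print(richness)  # {'Island X': {'species': 3, 'families': 2}}
```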

Fig. 5

Taxonomic richness across different Southern Ocean islands. Each slice represents the number of unique species inhabiting each island, with the size of the slice proportional to the species count. The color gradient indicates species richness, with darker shades representing higher species counts.

Technical Validation

SIVFLORA1 was developed through rigorous validation to ensure scientific accuracy and comprehensive coverage of Southern Ocean island floras. Each record is sourced from peer-reviewed literature, specialized taxonomic references, or curated botanical collections. Data harmonization processes were applied to standardize nomenclature and taxonomy, correct errors, consolidate synonyms, and remove duplicate records. Accepted and verbatim names are included to support ongoing validation and integration.

Usage Notes

SIVFLORA1 is organized to ensure its usability and integration into diverse research efforts, with data fields structured to provide comprehensive and accessible information. The dataset includes multiple data types grouped into key categories to enhance its functionality and support a wide range of applications. The “Record” category includes unique identifiers for each taxonomic record, detailed bibliographic citations, publication types, and references to related resources, enabling traceability and compatibility with other datasets. It also includes GBIF-specific information, such as download DOIs and identifiers, to facilitate integration with global biodiversity databases. The “Location information” category captures geographic and administrative details, including latitude, longitude, country, and regional climate zones, which are essential for spatial and environmental analyses. Attributes such as island names, surface areas, and verbatim geographic data are included to ensure precise location references. The “Taxonomy” category provides detailed classifications at multiple hierarchical levels, from family to subspecies, including authorship and synonyms. This section incorporates reviewed names and references, ensuring taxonomic accuracy and consistency across the dataset. The “Phytogeography information” category documents the life forms, plant habits, and establishment means for species, distinguishing between native, endemic, introduced, and cosmopolitan species. Additional fields capture the degree of establishment and historical use, providing context for species’ ecological and cultural significance. The “Collection information” category encompasses details about collection events, including collectors’ names, collection dates, herbarium material references, and institutional codes. It also includes type statuses and associated media, offering valuable insights for specimen-based studies. The “Determination information” category records expert identifications and annotations, along with details of additional studies conducted on specimens. Lastly, the “GBIF taxonomy comparison” category provides a harmonization layer by matching dataset records with the GBIF Backbone Taxonomy. It includes taxonomic ranks, match confidence, synonyms, and classifications across multiple hierarchical levels, promoting interoperability and validation.
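As a usage illustration, the sketch below tallies unique species per establishment category from a CSV export of the spreadsheet. It assumes the export has "reviewed_name" and "establishment_means" columns (both names appear in the text, but the file layout and the category labels in the sample are assumptions) and reads an in-memory string so that it runs without the real file.

```python
import csv
import io
from collections import Counter


def species_per_establishment(csv_text: str) -> Counter:
    """Count unique reviewed names within each establishment category."""
    seen = set()
    tally = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["reviewed_name"].strip(), row["establishment_means"].strip().lower())
        if key not in seen:  # count each species once per category
            seen.add(key)
            tally[key[1]] += 1
    return tally


# Toy rows standing in for a CSV export; the labels are illustrative.
sample_csv = (
    "reviewed_name,establishment_means\n"
    "Poa annua,alien\n"
    "Poa annua,alien\n"
    "Colobanthus quitensis,native\n"
)
tally = species_per_establishment(sample_csv)
print(dict(tally))  # {'alien': 1, 'native': 1}
```

To apply the same function to a downloaded file, the text of that file can be read with `open(path, encoding="utf-8").read()` and passed in, where `path` is wherever the export was saved.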

All species occurrence data were sourced from refs. 1,2,18,22,23,26–342.