Introduction

Cyprus is the third largest island in the Mediterranean Sea and has been at the crossroad of many civilizations over the centuries. Its prestigious strategic ___location within the southeast corner of the Mediterranean basin, bridging the Middle East and Europe, always attracted foreign rulers that sought to control the area and thereby extend the trade pathways. The island of Cyprus was first inhabited during the aceramic neolithic period by people from the Middle East and Asia Minor, more than 12,000 years ago, according to archeological evidence and genetic phylogeographic investigations1. Over the years it was influenced, or was occupied by, several rulers (Table 1), the most recent occupations of which were the Ottoman Empire between the years of 1571–1878, after which it was taken over by the British, 1878–1960. After a liberation struggle, 1955–1959, the Republic of Cyprus was first established as an independent country in 1960, at which time it joined membership to the United Nations. In 2004, Cyprus joined the European Union and in 2008 adopted the Euro currency. The island’s recent and mainly older tumultuous history, including major and minor migration waves, has had its contribution to the Cypriot gene pool, as evidenced by several genetic studies performed over the past 30 years2.

Table 1 Brief report of visitors-conquerors and key events in the history of Cyprus: The Cypriot population adopted the Greek culture, language and religion since the mycenaean greeks reached the island, nearly 3,500 years ago. Several migration waves shaped the population gene pool. Three minorities were formed, armenians and maronites who arrived in several waves after the 6th and 7th century AD, respectively, while the latins represent the remnants of the catholic francs during the reign of the Lusignan, 12th -15th century (see introduction and ref2).

In the past 50 years, the Cypriot population has experienced major economic and lifestyle changes, due to major historical events and the economic crisis that has impacted especially the Eastern European countries in the past 10 years. Since the Turkish invasion in 1974, the population of Cyprus in the European non-occupied part of the island has increased from 648,585 to 923,272 individuals3, not necessarily of Greek or Turkish origin. According to a 2021 census, the population of Cyprus altogether was estimated at 74% Greek Cypriots, including three small religious minorities of the Maronites, Armenians, and Latins, as well as Turkish Cypriots, altogether comprising about 1% of the population3. The population living in the northern part of the island under Turkish control, is estimated to be about 230,000. Despite the recent COVID-19 pandemic, life expectancy has only slightly changed between 2015 and 2021 (81.7 vs. 81.2 years on average, respectively).

Chronic non-communicable diseases (NCDs), such as cardiovascular disease (CVD), diabetes, thyroid disorders, and cancer, pose a substantial health burden, significantly impacting morbidity and mortality rates. According to a 2018 report by the World Health Organization (WHO), NCDs accounted for over 90% of morbidity in Cyprus4. This prevalence of NCDs can be attributed to factors including lifestyle, dietary habits, and tobacco use. Also, genetic factors and gene-environment interactions contribute significantly to NCD development. Regional variation in NCDs has also been reported, suggesting the involvement of non-genetic factors that have yet to be identified. However, the presence of potential limitations in sample collection, such as the lack of representative samples and under-representation of specific regions may contribute to the fragmented understanding of NCD landscape in Cyprus4. From available information, cardiovascular disease is the primary cause of death amongst Cypriots, while the incidence of renal replacement therapy is 284 per million population, the highest in Europe5.

Although most Cypriots declare a healthy status, two out of five suffer from a chronic disease, with CVD, cancer, and diabetes being the most prevalent conditions6. While CVD rates have been steadily decreasing since 2015, cancer rates remain at the same levels, albeit lower compared to the rest of the EU. On the contrary, tobacco use rates in Cyprus are amongst the highest in the EU, and together with unhealthy dietary habits, they comprise two of the most frequent mortality risk factors in Cyprus (19% and 14%, respectively)6. Additionally, while obesity rates in Cyprus are comparable to the average rates of other EU countries, childhood obesity is rapidly increasing and poses a significant public health issue for Cyprus.

To gain insights into the etiology and risk factors associated with NCDs and address limitations including the sampling of data from a diverse and representative population, the establishment of a Cypriot Biobank was imperative. While some causes of these diseases are already known based on studies conducted in other populations and regions7,8 current knowledge is insufficient to explain the substantial geographic differences in disease rates observed globally and within Cyprus itself9. The development of a Cypriot Biobank will enable large-scale studies of both genetic and environmental contributions, as well as their interactions, in the development of chronic diseases. Furthermore, it will address the limitations of previous studies, including inadequate sample sizes, lack of blood and/or urine samples, inconsistencies in data collection and processing, limited data collection on risk exposures (e.g. lifestyle, diet, disease comorbidities) and outcomes, and restricted study populations.

The biobank.cy Center of Excellence (CoE) in Biobanking and Biomedical Research was established in 2019 at the University of Cyprus (UCY), after acquiring funding through the Horizon 2020 Widening Participation/Teaming program (Grant Agreement no. 857122). The program is a partnership of the UCY with two Advanced Partners, the European Biobanking consortium BBMRI-ERIC10 and the Medical University of Graz11, and a local Partner, the medium-sized enterprise TALOS Rtd LTD12. Through this partnership, biobank.cy has successfully received funding to advance the country’s medical research infrastructure to match European standards, with two major objectives: (a) Upgrade the first Biobank of the country and complete the Cyprus Human Genome Project; (b) build the capacity and upgrade its research infrastructure with the aim to prepare Cyprus for the next generation of biomedical research. For the history, biobank.cy is the result of upgrading the Molecular Medicine Research Center (MMRC) which was first established in 2010 with another strategic funding of the Regional Development Fund of the European Union and the Republic of Cyprus through the Cyprus Research Promotion Foundation, as part of the Smart Specialization Strategy of the country. In its current status, the Biobank is one of five Pillars of the biobank.cy CoE, the other four being, the MMRC as the research arm of the CoE, and the Molecular Diagnostics, the Education and Innovation pillars.

Towards the public health benefit and to support the development of the Cyprus Human Genome Project, the Government of Cyprus, and the University of Cyprus co-funded biobank.cy. As a first milestone, up to 16,500 permanent Cypriot residents will be invited to partake in the biobank cohort, voluntarily, until 2027. Data collection and storage includes baseline clinical, biochemical, and behavioral traits as well as blood and urine samples.

We herein provide comprehensive information on the operation of the first Biobank in Cyprus, sharing data on the demographics and health archives of participants.

Methods

Biobank setting

The biobank.cy, as an EU-funded CoE in Research and Innovation, hosts and maintains the Cyprus Biobank, which is not based in a hospital setting but enrolls volunteers who respond to open invitations while also collaborating with public and private clinics and ad hoc volunteer enrolment centers around the island. Operation of the Biobank in Cyprus is not regulated by a specialized law, although a legislation proposal regulating biobanks creation and operation is under consideration. There are national and EU laws regarding regulating the use of biological samples for research13, the law providing for the protection of natural persons with regard to the processing of personal data and for the free movement of such data14, the law for the establishment of the Cyprus National Bioethics Committee (CNBC)15, and the EU GDPR 2016/679 law for the General Data Protection Regulation, which plays a significant role in emphasizing data protection principles, as the Biobank is in full compliance to its provisions16. These laws collectively ensure data protection, ethical standards, and informed consent in Biobank operations.

Approval for the operation of this first Biobank was first obtained by the CNBC in November 2011 (approval code: EEBK/ΕΠ/2011/04). The approved consent form allows for broad use of the data and material in Cyprus and abroad provided that any future research project secures its own separate ethics approval (English version of consent form is Supplementary Material_1).

Data collection and storage

The Biobank gathers information through a structured and comprehensive questionnaire, clinical phenotyping, and biological samples collected from Cypriot residents of the general population, aged ≥ 18 years old. Population data of the pilot study are collected and stored using REDCap®17,18 and OpenSpecimen®19 tools respectively, to enable data security and tracking as well as to eliminate collection bias. Specifically, REDCap® (Research Electronic Data Capture) is a secure, web-based application designed for data collection and management in research studies, used to amass the following measurements: personal and family history as well as clinical (complete blood count, biochemical) and biometric (blood pressure, height, weight) (Supplementary Material_2). OpenSpecimen® is used as the Biobank’s Information Management System, by organizing information related to the collection, processing, storage, and distribution of samples.

Participant/volunteer enrolment

Since the very first enrollment on the 1st of November 2019, participant awareness and engagement in the project were achieved through media and the press (television, radio, social media and newspapers, leaflets respectively) as well as social events which are organized by the Biobank personnel. Additionally, friends and family referrals were frequent throughout the 3-year period. Interested individuals book an appointment either through the biobank.cy website20 or directly through a phone call. At baseline, participants are invited to the central collection point of the Biobank facility at the Shacolas Educational Center for Clinical Medicine, or elsewhere during outreach activities, where they undergo a 3-stage interview (25 min on average) and physical and clinical measurements (15 min on average) (Fig. 1; Table 2). The Biobank participation process commences with trained research nurses providing an initial briefing to the participants. During this session, participants are fully informed about the research project’s purpose, their rights, and the procedures for participation and withdrawal, while they may ask questions about the project, prior to signing a broad consent form. The consent form is approved by the Cyprus National Bioethics Committee (approval code: EEBK/ΕΠ/2020/04), ensuring compliance with ethical requirements. A copy of the signed consent form is given to the participant.

Upon completion of the informed consent process, participants are requested to fill out a questionnaire providing essential information about their personal and family medical history. This information is securely collected in dedicated REDCap® projects.

Health and lifestyle questionnaire administered by a trained research nurse

In addition to data of personal character, the health and lifestyle section of the questionnaire includes questions on socio-demographic factors, including the origin of both parents, medical conditions along with medication used, family history and quality-of-life (QoL) questions, including smoking behavior, dietary habits, alcohol consumption, intensity of physical activity21 and sleeping patterns22, education level and employment status. As the questionnaire is part of the registration and clinic visit, a Biobank-trained research nurse is always available to answer any questions of the participant.

Importantly, questions on dietary assessment are directly linked to the Mediterranean score (MedDietScore)23 which indicates the level of the participants’ adherence to the traditional Mediterranean diet. This specific type of diet has been widely used for the past ~ 20 years as it has been proven to significantly reduce total mortality24, thus having a great impact on clinical and public health. More specifically, a higher MedDiet score showed strong evidence against incidents of cardiovascular disease, diabetes, hypertension, obesity, and Alzheimer’s disease among others25. Participants are specifically asked to report the consumption frequency of whole grain food (bread, pasta, rice, etc.), potatoes, fruit and juice, vegetables and salads, legumes, fish and soup, red meat and its products/derivatives, poultry, full-fat dairy, and olive oil.

Fig. 1
figure 1

Baseline visit summary.

Table 2 Instruments used at the Biobank for data measurements.

Physical-clinical measurements and biological material collected

At baseline, anthropometric and physiological traits were measured for each Biobank participant along with blood and urine sample collection. Anthropometric and physiological traits comprised body weight/height and diastolic/systolic blood pressure respectively. Blood pressure was measured once, in a seated position. A blood pressure measurement was repeated if the first measurement returned a value of > 160mmHg for systolic blood pressure or < 80 mmHg for diastolic blood pressure.

At the baseline enrolment visit, the participants provided peripheral blood for the isolation of DNA, plasma, and serum, which are stored at -80°C. Urine was also collected and aliquots were stored likewise. Specifically, we collected: 1 × 10 ml EDTA for DNA extraction, 1 × 4 ml EDTA for FBC, 1 × 4 ml EDTA for plasma, 1 × 8.5 ml serum, 1 × 3.5 ml serum (prompt test), and 1 × 100 ml urine. Aliquoted and stored are 4 aliquots of 50 µl DNA at 50ng/µl, plus a concentrated DNA stock, 4 aliquots of 2 ml serum, 2 aliquots of 1 ml plasma, and 5 aliquots of 2 ml urine.

Fifty-eight clinical biomarkers are measured by a private diagnostic provider using a proportion of the stored blood (Table 3). DNA concentration and quality are tested by spectrophotometric measurements and calculation of the ratio at OD 260/280 and OD at 260/230. Urine is stored after a 5-minute centrifugation at 1000 g at room temperature. The quality of frozen serum samples is assured in random retesting of selected biochemical parameters.The Biobank employs a data curator who constantly and systematically tests for quality assurance and consistency of data input to minimize errors and avoid bias. A comprehensive timeline of biobanking and data/sample retrieval, with brief information on Standard Operating Procedures (SOPs) and quality assurance measures applied throughout the procedure is provided in Fig. 2. The detailed standard operating procedures used at the Biobank are available on request.

Table 3 Clinical biomarkers measured at biobank.cy (N = 58).
Fig. 2
figure 2

Timeline of biobanking and data/sample retrieval, with brief information on SOPs and quality assurance measures applied throughout the procedure. The purple dots represent the quality assurance checkpoints and the blue dots represent the activities based on monitored SOPs.

Results

Sample size

We performed a power calculation using the “pwr” package in R software to estimate the required sample size for the Cypriot Biobank, based on a population of 923,372 individuals. The calculation was conducted using a 5% significance level and we adjusted for an expected response rate of 80%, which resulted in a required sample size of 480 individuals. In determining the sample size, we combined the power calculation with practical considerations. We accounted for factors such as the number of research nurses that would be hired, the expected number of participants per day after discussion with local clinicians and healthcare providers and the capacity of our facilities. Furthermore, we reviewed the approaches used by established biobanks to determine appropriate sample sizes. Burton et al.26, emphasize that biobanks intended for human genome epidemiology studies should consider using larger sample sizes to address potential heterogeneity in disease risk, enhance the precision of estimates, and improve sensitivity and specificity in diagnostic testing. Using this integrated approach of the theoretical sample size calculation, educated guesses and practical considerations, we estimated that a sample size of 16,500 individuals would be sufficient for the Biobank to be scientifically robust and operationally feasible. The pilot study enrolled 1348 participants not included in any disease-specific cohort of the Biobank, with a geographical distribution based on the father’s origin (Fig. 3).

Socio-demographic characteristics

The participants of the Biobank pilot study comprised 579 men (43%) and 769 women (57%) spanning an age range from 18 to 85 years, with a median age of 48 years (SD = 13.9 years) (Fig. 4). More than half of men (n = 323 (55.8%)) and women (n = 420 (54.6%)) were educated to a university degree level and above, while the most frequent employment status for both men and women was paid employee (n = 358 (62%) and n = 486 (63%) respectively) (Table 4). The most common status of the enrolled individuals was either single or married (n = 282 (20.9%) and n = 981 (72.8%) respectively).

Fig. 3
figure 3

Geographical distribution of the 1348 participants of the pilot study, based on the origin of the father. The circle diameter reflects the density of the donors’ enrolment in the corresponding area.

Fig. 4
figure 4

Age distribution of men and women enrolled in the Biobank pilot study.

Table 4 Socio-demographic characteristics of the enrolled participants in the Biobank pilot study (November 2019 – October 2022). P-values for the comparison between sex groups were calculated using Chi-squared and Fisher’s exact tests.

Medical history

Based on the NHS and Heart UK guidelines (https://www.nhs.uk/conditions/high-cholesterol/cholesterol-levels/), nearly half of the participants had high total (n = 681 (50.5%)) and LDL (n = 630 (46.7%)) cholesterol levels. Additionally, 15.8% (n = 213) had low HDL levels and 19% (n= 257) had high triglyceride levels. Most of these participants did not fast prior to the blood sample analysis. Among those with abnormal levels, only 151 out of 681 (22.2%) with high total cholesterol, 150 out of 630 (23.8%) with high LDL cholesterol, 31 out of 213 (14.6%) with low HDL cholesterol, and 34 out of 257 (13.2%) with high triglyceride levels fasted before the tests. The results of our pilot study are in line with recent findings regarding the high prevalence of elevated blood lipid levels in the European population27,28,29.

Even though most participants self-reported as healthy, a varying percentage had underlying past or current conditions, with the most prevalent being hypertension (17.2%), osteoporosis (6.9%) and diabetes (6.0%) (Fig. 5). These findings are in line with OECD’s research on the most common chronic conditions in Europe for 2022 (https://health.ec.europa.eu/system/files/2022-12/2022_healthatglance_rep_en_0.pdf). As useful as they are, they will prompt public policy decisions and enhance existing prevention programs. Additional conditions recorded include cardiovascular disease (myocardial infarction and stroke), rheumatoid arthritis, osteoarthritis, asthma, nephropathies, cancer and depression (Fig. 4). Most individuals with a previously diagnosed condition were under medication use (Table 5).

Fig. 5
figure 5

Diseases recorded among the participants of the pilot study.

Table 5 Medical history and medication use (n = 1348). ǂThe disease is self-reported according to the doctor diagnosis.

Collaborations and perspectives

The availability of the Biobank infrastructure has fostered collaborations with groups at national and international Universities and Research Centers, including Europe and the USA. Sharing of data and material is based on bipartite partnerships, either serving as custodians to data/sample archiving and storage or more intimately involved in common research projects and grant writing. Signed Material Transfer Agreement/Data Transfer Agreement (MTA/DTA) are in place, usually using our own approved documents. State-of-the-art equipment, software and expertise in several fields such as biostatistics, epidemiology, bioinformatics and genetics/genomics, have resulted in several important collaborations for massive genetic analyses (Whole Exome Sequencing) of diverse disease cohorts (e.g. inherited kidney diseases, inherited cardiomyopathies, traumatic brain injury). Also, the CYPROME data, based on 1000 Cypriot whole exomes, that is now used to serve as the Cypriot reference genome, has already started being instrumental in interpreting results from research projects and diagnostic services. Further infrastructure development includes purchasing robotic systems for substantially increasing the results output and turnover, while another larger server is pending for storing the Big Data generated in-house or through outsourcing and collaborations. Finally, to maximize the use of samples and data, the Biobank personnel is also closely engaged in research performing tasks.

Discussion

The Cyprus-Biobank

The establishment of the first Biobank in Cyprus, in 2011, and its enhancement in 2019 as part of the biobank.cy Center of Excellence, provides an opportunity for ground-breaking research both nationally and internationally, through a rich collection of high-quality phenotypic data that will unveil the causes of common and rare diseases affecting the Cypriot population. The data are retrieved from organized questionnaires, the collection of blood and urine samples, and detailed clinical measurements. The Biobank aims to reach a total sample size of 16,500 individuals by 2027 and intends to perform follow-up studies in a sub-group of volunteers in the long-term to record disease progression, based on criteria to be determined. After informed consent from the participants, the data is stored for research purposes with the goal to canvas the architecture of the Cyprus human genome. biobank.cy is equipped with clinic, laboratory and liquid sample storage facilities and offices for non-clinical research staff.

The Biobank was founded and launched as a national research infrastructure in 2011 with national and EU funding and was significantly enhanced in 2019 through a Horizon 2020/Widening/Teaming Program, just before the COVID-19 pandemic started. Despite the challenges and the relatively short period since its establishment, it already has a high impact on the health of the Cypriot population. During the pandemic, in addition to the original scope, biobank.cy collected samples from COVID-19 cases to study the production of circulating immunoglobulin class G (IgG) antibodies against SARS-CoV-2 in the serum of convalescent individuals30. Individuals who participated in the project received the results from their sample collection two weeks later, thus gaining the opportunity to discuss potential abnormal ranges with their personal doctors. The whole exome sequencing (WES) of 1500 convalescent participants in the project is pending. Finally, notwithstanding the challenge of not being hosted at a hospital, we managed to turn it into an opportunity by organizing outreach excursions to the periphery or at the premises of large enterprises and organizations, for enrolling volunteers who would not easily reach us in Nicosia. It is of note that owing to the small geography of the island, samples reach the Biobank within a maximum of two hours after completion of the enrolments, by 2:00 pm the latest. This unrestricted enrolment strategy will enable the design of epidemiological and other public health studies including the establishment of longitudinal cohorts based on the data collected.

According to the data collected in this pilot study, volunteers diagnosed with hypertension represent 17.2% of the general population, diabetics represent 6%, patients with osteoporosis represent 6.9%, and patients with asthma represent 4.8%. With the understanding that these were not targeted patient groups, we anticipate that with the systematic projected increase in participant recruitment, the Cypriot population will soon be represented in greater depth which will validate and enhance our conclusions, with its separate disease cohorts available for further research.

Strategy and vision

Cyprus human genome project

biobank.cy was formed with the vision of being the island’s central node for archiving high-quality biomaterial and associated clinical records, with state-of-art research infrastructure facilities utilized for precision medicine. Towards this goal, biobank.cy has launched the largest research study to ever run in Cyprus, with the precise aim of understanding and treating diseases by characterizing the Cypriot genome. Specifically, biobank.cy has initiated the Cyprus Human Genome Project (CHGP). This is the first in its kind of scale and prospect study in Cyprus, with the aim of implementing Whole Exome Sequencing (WES) of 1000 Cypriots at phase-1 and up to 5000 in the next 3 years. Also, Whole Genome Sequencing is a strategic objective. This data will constitute the blueprint for the first Cypriot reference genome (CYPROME). Concurrently, additional volunteers of the general population, including chronic disease patients, are selected for WES based on additional ongoing separately funded research projects.

The establishment of the Biobank was a landmark development that enabled the materialization of the CHGP as a significant milestone and a turning point for the health of the Cypriot population and precision medicine. The CHGP will provide unprecedented insights in terms of the pathogenic variant load and disease prevalence in the Cypriot population, while it will be of use to targeted epidemiological studies across all disciplines and non-communicable disease investigations. Furthermore, the CHGP will provide the backbone to address specific diseases on the island by documenting their prevalence and revealing putative environmental factors that affect them. Moreover, biobank.cy with its increasing potential as a central medical research hub, will provide an unsurpassable opportunity for Cypriot researchers and medical professionals to expand their collaborations nationally and internationally while it will offer the opportunity to Cypriot patients to have access to clinical trials and treatments not currently available in Cyprus.

Over the past 30 + years, genetic studies in the Cypriot population unraveled many interesting findings concerning mainly rare diseases, including kidney conditions, neurological conditions, endocrinological conditions and others2. Founder mutations were discovered for several diseases while a Cypriot consensus whole genome sequence (including the non-coding part) is still lacking. The CHGP is promising to identify more DNA variants of pathogenic potential and enable comparative studies in relation to the minor allele frequencies among the neighboring populations, thus leading to a better understanding of our genetic architecture.

Imminent objectives

With regards to banking itself, a major effort is to increase the number of participants to cover the population to a greater depth, thereby empowering relevant epidemiological and genetics studies on all pathologies and enabling twin and population screening studies. To achieve our strategic objective of enrolling 16,500 participants by 2026 (now extended to 2027), as outlined in our EU-funded proposal, the Center of Excellence (CoE) has dedicated significant resources to raise awareness. Our Communication Office systematically organizes successful awareness campaigns using all available means, including social media, in full transparency. Additionally, the Biobank in collaboration with the Ethics, Legal, and Social Implications (ELSI) and Education Management, actively engages with local and regional authorities and the Human Resources departments of major enterprises. The Biobank organizes lectures, presentations, and exhibitions for young people at schools and other venues including our premises to promote biobanking and the research culture. Also, the campaigns proved effective in increasing the visibility of our website and online registration. These efforts, combined with providing engaging activities and educational material, aim to attract new volunteers to support our cause. An added value of these actions is achieving and strengthening public engagement and trust to the Biobank, preparing the next generation of citizens-scientists.

biobank.cy, since its inception, actively engages in Cyprus’s eHealth ecosystem with a focus on becoming a fully operational eHealth Biobank. Aligned with Cyprus’s 2019 eHealth law, it emphasizes citizen-centricity and interoperability. The core infrastructure includes a Biobank Information Management System (BIMS) serving as the nerve center for diverse health data, integrating clinical, sample, laboratory, genomic, and imaging data. The BIMS is designed for integration with the Cyprus’s national EHR system, using the HL7 Fast Healthcare Interoperability Resources (FHIR) standard aiming to actively contribute to primary and secondary health data use. With a vision of becoming a key player in the Cyprus National and European Health Data Space, biobank.cy designs its infrastructure to become an early adopter of initiatives like International and European Patient Summary while it already achieved integration with the BBMRI-ERIC Locator system using HL7 FHIR.

In conclusion, as with any other Biobank, the Biobank of Cyprus is creating the prospects for next-generation biomedical research. It is empowering investigations that include epidemiological and genetic determinants with the aim to identify and evaluate the contribution of factors involved in major public health burdens caused by primarily not communicable diseases.