Background & Summary

Studies from the early 21st century have revealed a significant bias in social science research, with most studies conducted in WEIRD (Western, Educated, Industrialized, Rich, and Democratic) countries1,2,3. Research on close relationships is no exception. To illustrate, Klein et al.4 analyzed five high-impact journals dedicated to sexuality and found that a substantial majority of studies (ranging from 68% to 88%) drew samples from WEIRD populations. Bode and Kowal5 reported similar results in their review of the biological underpinnings of passionate love: Only 11 out of 42 (26%) studies were conducted outside of WEIRD countries.

Fortunately, the situation is gradually improving6,7, with more researchers emphasizing the need to ‘go beyond’ WEIRD samples. Indeed, the number of studies in non-WEIRD countries published in high-impact journals is on the rise6, as is the number of cross-cultural studies covering multiple countries from various continents8. To contribute to this growing body of research and to improve the generalizability of research on close relationships, we conducted a large-scale cross-cultural study involving 404 researchers from 175 countries (see Fig. 1), focusing on pair bonds, their dynamics, and cultural and environmental factors that may relate to such relationships.

Fig. 1 A schematic overview of the study flow.

Pair bonds are commonly defined as dyadic attachments between two reproductive partners which last more than one reproductive cycle9. Although pair bonding is considered one of the most crucial aspects of reproductive behavior for some species, it is also exceptionally rare across some taxa. While 90% of bird species exhibit features of monogamous pair bonding, monogamy is relatively rare in mammals (3% to 9%)10,11. Among great apes, only humans engage in pair-bonding12. Consequently, human pair-bonding has received scholarly attention across a variety of theoretical perspectives. For instance, evolutionary approaches highlight the importance of finding a suitable mate likely to invest in offspring13. Developmental psychology posits that establishing an intimate romantic bond with a long-term partner is one of the key stages in human development14. Sociocultural theory underscores the societal importance of this phenomenon, suggesting that pair bonding is influenced by cultural norms that can shape the initiation, progression, and dynamics of romantic relationships15.

Given the profound impact of romantic relationships on individuals’ lives, extensive research efforts have been dedicated to exploring the antecedents of forming committed partnerships. Many empirical studies have focused on mate preferences, identifying the traits that make individuals highly attractive in the mating market. The underlying logic is that certain characteristics—such as physical attractiveness, intelligence, honesty, health, and kindness—may enhance one’s success in attracting potential partners13,16. Some have even suggested that a person’s desirability on the mating market may be estimated mathematically17. Such statistical models might be tested on large datasets, for which the present dataset could be well suited.

Yet, possessing desirable traits does not guarantee that a person will be chosen as a life partner. The solution to this enigma might lie in ‘a matter of heart’18 or, expressed in more scientific terms, in romantic love. Mate preferences and sexual drive toward particular individuals might stem from or lead to feelings of romantic love19. According to the commitment device hypothesis, romantic love evolved to foster commitment between partners, thereby enhancing their reproductive success20. Previous research has provided evidence that although romantic love is a nearly universal human experience21,22,23, there is substantial cultural and environmental variation in love experiences that reflect, for example, evolutionary legacy, modernization, collectivism, and average annual temperatures24 (for a review, see25). Such hypotheses might be further explored with the present cross-cultural dataset containing information on love experiences with measures that have already been cross-culturally validated22,26. Furthermore, once a romantic relationship is formed, numerous other affective phenomena emerge, including feelings of jealousy, relationship satisfaction, and commitment—all of which were also measured in the present dataset. In past research, these phenomena have been examined in isolation and primarily studied in a limited number of countries (for reviews, see27,28).

Here, to broaden the scope of close relationship research, we address the existing gap and offer a large-scale cross-cultural dataset. It consists of a comprehensive collection of variables on demographics, intimacy, passion, and commitment within romantic relationships, physical attractiveness enhancing behaviors, gender equality endorsement, collectivistic attitudes, personal history of pathogenic diseases, relationship quality, jealousy, infidelity, relational mobility, mate preferences, current mate ratings, self-ratings, and acceptance of sugar relationships. Furthermore, because our survey was translated into 45 languages, it provides a basis for validating various linguistic versions of the scales used (e.g.,22,26).

Methods

In preparing the following section, we used articles presenting social sciences data published in Scientific Data as blueprints29,30,31.

Ethical considerations

The protocol for this study was approved by the Institutional Review Board (IRB) at the Institute of Psychology, University of Wrocław (number IPE0022). Data collection was conducted by team members in accordance with the ethical guidelines established by their respective IRBs, following either the Principal Investigator’s IRB approval or the ethical clearances obtained from their local IRBs. Furthermore, all participants provided informed consent prior to their involvement in the study. Specifically, they confirmed that they were over 18 years old and acknowledged that their data, anonymized and stripped of any identifiable information, would be analyzed and disseminated in scientific reports and papers. Participants were guaranteed anonymity and confidentiality and were informed that their participation was voluntary.

Survey

The English version of the survey, along with all 45 translated linguistic versions, can be accessed on the Open Science Framework32. It contains the following sections (for a visual overview of the survey’s content, see Fig. 2):

  1. Demographics: Gender, sex at birth, age, country of birth and residence (if different), time spent in country of birth, relationship status, number of children, religious affiliation, sexual orientation, employment status, average daily time spent on social media, on TV, and time spent on leisure activities.

  2. Romantic love: For partnered individuals: the Triangular Love Scale (TLS-1522,33), relationship length. For all individuals: Kephart’s18 question on the importance of romantic love20, being in love with anyone and the strength of these love feelings.

  3. Gender equality: A subscale of the Gender Equitable Men Scale34.

  4. Collectivistic attitudes: A subscale of the Collectivism Scale35.

  5. Personal Pathogen History: Pathogen Prevalence Index36.

  6. Physical Attractiveness-Enhancing Behaviors Scale: Importance and time spent on eight types of physical-attractiveness enhancing behaviors26.

  7. Long-term relationship preferences: Six items adapted from the MPQ1517.

  8. Short-term relationship preferences: Six items adapted from the MPQ1517.

  9. Preference Importance Measure: 30 points allocated across six traits, including health, kindness, physical attractiveness, religiousness, financial prospects, and correct age, inspired by the budget allocation method used in Li et al.37 and Conroy-Beam et al.17.

  10. Self-ratings: Six items adapted from the MPQ1517.

  11. Mate ratings: Six items adapted from the MPQ1517.

  12. Relationship satisfaction: Six items from the 18-item Perceived Relationship Quality Components (PRQC38).

  13. Jealousy Scale: Two items adapted from Buss et al.39.

  14. Sociosexual Attitudes Scale: Three items from the revised Sociosexual Orientation Inventory (SOI-R40).

  15. Infidelity Scale: Two items assessing the perceived morality of sexual and emotional infidelity, developed based on findings from Carpenter’s41 meta-analysis.

  16. Acceptance of Sugar Relationships: Developed for younger companion providers (ASR-YWMS) and older resource providers (ASR-OMWS)42.

Fig. 2 A visual overview of the study’s content (for a detailed list of all variables, see Codebook).

A detailed list of all items with their response ranges and codes can be found in the Codebook file titled “Codebook.xlsx” on the Open Science Framework, OSF32. The survey also included the Dance Perceptions Scale. However, this portion of the data is not included in the current dataset, as it is reserved for a forthcoming publication within the scope of a larger, long-term project on dance perceptions. These data will be made available upon the release of the final paper.

Participants

In total, we collected 119,781 responses. We excluded responses from individuals who did not consent to participate in the study (n = 639); who previewed (n = 4) or tested (n = 549) the survey but did not complete it; who mistakenly submitted the survey twice (n = 21); who were recruited by a team member who collected data before obtaining approval from the local IRB, in violation of that IRB’s rules, and who was therefore asked to withdraw the data (n = 244); or who failed the attention check (n = 1,031). These exclusions resulted in a final dataset of 117,293 participants from 175 countries. Of these, 71,361 participants (61%) from 158 countries completed the whole survey, whereas 86,966 (74%) from 165 countries completed at least half of it.
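The exclusion arithmetic reported above can be checked with a few lines of Python. This is only a bookkeeping sketch: the counts are taken from the text, whereas the dictionary keys and the helper name `share` are illustrative labels and do not appear in the dataset.

```python
# Counts as reported in the Participants paragraph.
total_responses = 119_781
exclusions = {
    "no_consent": 639,
    "previewed_only": 4,
    "tested_only": 549,
    "duplicate_submission": 21,
    "withdrawn_after_irb_issue": 244,
    "failed_attention_check": 1_031,
}

# Final sample = all responses minus every excluded group.
final_n = total_responses - sum(exclusions.values())


def share(count, base):
    """Return count as a percentage of base, rounded to a whole percent."""
    return round(100 * count / base)


completed_all = 71_361
completed_half = 86_966

print(final_n)                         # 117293
print(share(completed_all, final_n))   # 61
print(share(completed_half, final_n))  # 74
```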

Basic demographic characteristics of participants from the final dataset are presented in Table 1, whereas demographic characteristics across countries with at least 30 participants (k = 97) are given in the “Demographics across countries.xlsx” file on the OSF32. Figure 3 shows where the data were collected, colored according to the sample size.

Table 1 Basic demographic characteristics of the final sample (N = 117,293).
Fig. 3 A world map visualizing the number of participants across countries, with the color scale representing the sample size (the darker, the larger). Note. Gray areas represent countries not covered by the data.

Translations

The English version of the survey, embedded in HTML code, was pasted into 45 separate Google Spreadsheet files. Each of the 45 translation teams, consisting of bilingual collaborators, received a separate Google Spreadsheet file that consisted of four sheets. The first sheet contained instructions on how to perform the forward-back translation43. The second sheet was intended for the forward-translation of the survey from English into a local language. The third sheet was intended for back-translation from the same local language into English. The final sheet was intended for preparing the final version of the survey in the local language. One or more native speakers performed the forward-translation, then other(s) performed the back-translation, and, finally, both forward- and back-translation teams discussed the differences, agreed on their resolution, and prepared the final linguistic version of the survey. Detailed instructions, along with a short video explaining the translation task, are presented in the “Instructions for translating teams.docx” file on the OSF32. All linguistic versions of the survey can be accessed in the “Translated survey - all languages.xlsx” file and “Translation Farsi.docx” file on the OSF32.

Procedure

After the translation process was completed for a given language, the study in that language was launched, starting with English and Polish on April 8, 2021, to validate the Qualtrics survey and data protocols. Data collection was conducted over a span of five months, from April to August 2021. Data were collected primarily online via the Qualtrics website or related online platforms, except for two countries (i.e., Algeria and Morocco), where potential participants could not access the Qualtrics website for technical reasons. Therefore, the team members collected data from these two countries in person using a paper-and-pencil method. Moreover, due to difficulties accessing the Qualtrics website in Iran, we recreated the survey and collected data using Google Forms. Lastly, one Russian collaborator collected data via the Toloka website (a crowdsourcing platform popular in Russia, similar to Prolific or Mechanical Turk). Collaborators strived to collect data from as diverse samples as possible, including inviting participants of various ages and genders, from various regions (including rural and urban areas), from community and university samples, and so forth. While answering the survey, participants were also encouraged to share the link to the survey on their social media platforms with an already prepared invitation text (see the “Invitation text.docx” file on the OSF32). Approximately 6% of the data were collected using crowdsourcing platforms (e.g., Prolific, MTurk, Toloka).

Data cleaning

Most collaborators collected data via personalized Qualtrics links to the general survey (with a few exceptions, described in the Procedure paragraph). Moreover, due to certain collaborators’ requests, connected with, for instance, adding personalized information within the survey, redirections to other websites, the need for better monitoring of data inflow, or collecting more information to grant course credits to students who helped with data collection, seven labs recruited participants through separate Qualtrics branches. All these datasets were merged into the final, ready-to-use dataset.

Data coming from outside of the general survey link were prone to minor coding mistakes. For instance, manually prepared datasets from Algeria and Morocco contained typos, such as doubled scores (e.g., a response of “44” on a 1–5 scale range). All these errors are addressed in the final dataset; the corrections are documented in the R script available on the OSF (see below).
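As an illustration of the kind of check involved, the following Python sketch flags responses that fall outside a scale's range and collapses doubled keystrokes such as “44”. It is not the authors' R script: the repair rule is one plausible assumption, and the column name `item_1` and both function names are hypothetical.

```python
def find_out_of_range(rows, column, low, high):
    """Return (row_index, value) pairs whose value falls outside [low, high]."""
    flagged = []
    for i, row in enumerate(rows):
        value = row.get(column)
        if value is None:
            continue  # a missing answer is not a coding error
        if not (low <= value <= high):
            flagged.append((i, value))
    return flagged


def collapse_doubled_digit(value, low, high):
    """Map a doubled keystroke such as 44 back to 4 (assumed rule).

    Returns the value unchanged if it is already in range, and None if it
    cannot be recovered and needs manual inspection.
    """
    if low <= value <= high:
        return value
    text = str(value)
    if len(text) == 2 and text[0] == text[1] and low <= int(text[0]) <= high:
        return int(text[0])
    return None


# Hypothetical hand-entered records on a 1-5 scale.
rows = [{"item_1": 4}, {"item_1": 44}, {"item_1": None}, {"item_1": 17}]
flagged = find_out_of_range(rows, "item_1", 1, 5)
repaired = [collapse_doubled_digit(value, 1, 5) for _, value in flagged]
print(flagged)   # [(1, 44), (3, 17)]
print(repaired)  # [4, None]
```

Values that cannot be repaired by such a rule (here, 17) are returned as `None` so that they can be checked against the paper forms rather than guessed.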

Data Records

All materials associated with this large-scale, cross-cultural project can be found in the project’s repository (comprising four folders) on the OSF32. The folder entitled “Dataset” contains the final, ready-to-use dataset (named “Final_dataset.csv”) and the Codebook of all variables with their response ranges (named “Codebook.xlsx”). The folder “R code” consists of a .txt file (named “R script.txt”) that was run to merge and clean raw datasets. Raw datasets are not shared because they may contain personal information about participants and collaborators (e.g., email addresses, student identification numbers, and detailed names of university groups and courses). The full anonymization code is available in the file “R script.txt”. The folder “Survey” contains the .docx files “Instructions for translating teams.docx” and “Invitation text.docx”; the latter was used to encourage participants to share the link to the survey with their friends, families, and on social media. The Survey folder also contains the .pdf file with the English version of the survey (named “Large_scale_project_English_with_codes.pdf”), the “README.txt” file, which reminds users about any incongruence between the coding in the .pdf version of the survey and the final coding in the dataset (as explained in the “Usage Notes” section), as well as .xlsx and .docx files with all translated versions of the survey (named “Translated survey - all languages.xlsx” and “Translation Farsi.docx”). The folder “R code” contains a .txt file (named “R code large-scale study.txt”) with R code that was used to prepare the final dataset. Finally, the “Data description” folder contains the .xlsx file (named “Demographics across countries.xlsx”) with demographic characteristics of participants for countries with at least 30 participants. It also contains the “Reliabilities across countries.xlsx” file with information on the reliability of the multi-item scales across these countries, and the “Means across countries.xlsx” file with the means of the scales across these countries.

Technical Validation

For technical validation, we examined the data quality (e.g., Cronbach’s alpha scores, correlations) from 97 countries with at least 30 participants. Overall, 70 of these 97 countries (72%) had more than 200 participants, whereas 32 of them had more than 1,000 participants. The average age in this sample was similar to that of the whole sample, that is, 30.35 (SD = 12.54), but varied across the countries, ranging from 21.48 (SD = 5.06) in Thailand to 47.32 (SD = 16.86) in Argentina. The proportion of women in this sample was 66.30%, and again, this varied across countries, ranging from 5.4% in Ghana to 84% in Greece. The proportion of individuals who attained a tertiary level of education (i.e., Bachelor’s degree or higher) was 58%, and also differed across countries, ranging from 6.1% in Ghana to 81.4% in Kenya. Although we cannot determine the representativeness of the included country populations, we believe that the data are still valuable in examining important research questions across a range of cultures varying widely in their norms surrounding relationships, sexual behavior, sexuality, and mate selection.

Despite variation in demographic variables across countries, the internal consistency of the scales ranged from acceptable to excellent. For instance, Cronbach’s alpha values were as follows: TLS-1522 = 0.94, Intimacy = 0.90, Passion = 0.87, Commitment = 0.89; gender equality34 = 0.85; collectivism35 = 0.76; Perceived Relationship Quality Components (PRQC38) = 0.93, relationship satisfaction = 0.94, relationship commitment = 0.89; Acceptance of Sugar Relationships42 = 0.95, receiving subscale = 0.93, giving subscale = 0.93. The “Reliabilities across countries.xlsx” file (accessible on the OSF32) presents Cronbach’s alphas for the scales across countries with at least 30 participants answering the given scale. Basic descriptive statistics for these scales, including means and standard deviations, are presented in the “Means across countries.xlsx” file (accessible on the OSF32).
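For readers who wish to recompute reliabilities for a subset of the data, Cronbach's alpha for a k-item scale is k/(k − 1) × (1 − Σ item variances / variance of the total score). The Python sketch below implements this standard formula with sample variances. The function name and the toy responses are illustrative, and complete cases are assumed; how the authors handled missing responses is not stated here.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for complete cases.

    responses: list of per-participant lists, one numeric score per item.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item across participants.
    item_variances = [variance([r[i] for r in responses]) for i in range(k)]
    # Sample variance of each participant's summed score.
    total_variance = variance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Toy two-item scale answered by five hypothetical participants.
toy = [[1, 2], [2, 1], [3, 3], [4, 5], [5, 4]]
alpha = cronbach_alpha(toy)
print(round(alpha, 2))  # 0.89
```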

For further validation of the data, we investigated correlations across variables that should correlate and those for which there are no expected associations. For example, we predicted high positive Pearson correlations between age and relationship length and between age and the number of children40. Indeed, that is what we observed (r = 0.74 and r = 0.62, respectively). Similarly, we expected high positive correlations between subscales of the Triangular Love Scale (TLS-1522) and relationship satisfaction38: Intimacy r = 0.70, Passion r = 0.63, Commitment r = 0.63. In line with previous research44, we also observed a high correlation between one’s own and partner’s age (r = 0.87). Conversely, we did not expect to see any meaningful links between subscales of love and time spent on social media, and, indeed, such correlations were negligible in effect size (Intimacy r = 0.04, Passion r = 0.07, Commitment r = 0.002).
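Such checks can be reproduced with a plain Pearson correlation computed over pairwise-complete observations. The Python sketch below is a minimal version; the function name and the toy values are illustrative, and pairwise deletion of missing values is an assumption rather than a documented detail of the authors' analysis.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation over pairs in which both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    ss_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    ss_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / sqrt(ss_x * ss_y)


# Toy data: the third participant has no relationship length recorded.
age = [20, 30, 25, 40]
relationship_length = [1, 6, None, 11]
r = pearson_r(age, relationship_length)
```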

Usage Notes

The data are freely available, cleaned, and ready for analyses. We recommend that interested researchers first consult the codebook (“Codebook.xlsx”) before using the final version of the dataset (available on OSF32). The codebook presents all variables in the final dataset along with a brief explanation of both their scoring and what they represent. This is especially important because some of the variables described in the .pdf version of the survey were recoded to be more intuitive. For example, there is a demographic question regarding the participant’s social class in the .pdf version of the survey, originally coded so that higher values represent lower social classes. In the final dataset, however, this item was reversed so that higher values represent higher social classes. In general, the naming of the variables follows the underlying logic so that higher scores represent “more” of the given psychological construct. Therefore, for instance, the “Gender_equality” items denote the participant’s agreement with more gender-equal views. Variables can be used individually or with the calculated average scores. To identify individuals from a specific country, the variables “Country_live” or “Country_raised” should be used, depending on whether researchers wish to use participants’ country of residence, the country in which they were raised, or both.
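A minimal Python sketch of country-based selection follows. The column names “Country_live” and “Country_raised” come from the text above; everything else is an assumption for illustration, including the function names, the in-memory sample, and the use of country names as values. The codebook should be checked for how countries are actually coded.

```python
import csv
import io
from collections import Counter


def select_country(rows, country, column="Country_live"):
    """Keep rows whose country column equals the requested value."""
    return [row for row in rows if row.get(column) == country]


def countries_with_min_n(rows, min_n=30, column="Country_live"):
    """Return the set of countries with at least min_n participants."""
    counts = Counter(row[column] for row in rows if row.get(column))
    return {country for country, n in counts.items() if n >= min_n}


# Hypothetical three-row sample standing in for the real file. To read the
# real dataset, one would open "Final_dataset.csv" and pass the file object
# to csv.DictReader in the same way.
sample = "Country_live,Country_raised\nPoland,Poland\nPoland,Germany\nKenya,Kenya\n"
rows = list(csv.DictReader(io.StringIO(sample)))

living_in_poland = select_country(rows, "Poland")
raised_in_germany = select_country(rows, "Germany", column="Country_raised")
large_samples = countries_with_min_n(rows, min_n=2)
```

Passing `column="Country_raised"` switches the selection from country of residence to the country in which participants were raised.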

Additionally, interested researchers should be cautious about using the mate budget allocation task, in which participants had to distribute 30 points across six potential traits (health, kindness, physical attractiveness, religiousness, financial prospects, correct age) in a romantic partner. When collecting data with the paper-and-pencil method in Algeria and Morocco, there was no validation of the total sum of allocated points. Consequently, the sum of allocated points exceeds 30 for almost 100 Algerian participants.
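One way to handle this is to screen allocations before analysis. The Python sketch below separates rows whose six allocations sum to exactly 30 from those that do not. The trait column names are hypothetical (the real ones are listed in the codebook), and excluding non-conforming rows is only one option, not a recommendation made by the authors.

```python
BUDGET = 30
# Hypothetical column names for the six traits; see Codebook.xlsx for the real ones.
TRAITS = ["health", "kindness", "attractiveness", "religiousness", "finances", "age"]


def budget_total(row, traits=TRAITS):
    """Sum the allocated points, or return None if any trait is unanswered."""
    values = [row.get(trait) for trait in traits]
    if any(value is None for value in values):
        return None
    return sum(values)


def split_by_budget(rows, budget=BUDGET):
    """Split complete rows into those that sum to the budget and those that do not."""
    valid, invalid = [], []
    for row in rows:
        total = budget_total(row)
        if total is None:
            continue  # incomplete allocations are left out of both groups
        (valid if total == budget else invalid).append(row)
    return valid, invalid


# Toy rows: one correct allocation, one over budget, one incomplete.
rows = [
    dict(zip(TRAITS, [5, 5, 5, 5, 5, 5])),
    dict(zip(TRAITS, [10, 10, 5, 5, 5, 5])),
    dict(zip(TRAITS, [5, 5, 5, 5, 5, None])),
]
valid, invalid = split_by_budget(rows)
```

An alternative to exclusion would be to rescale each participant's allocations to proportions of their own total, which retains the over-budget rows at the cost of assuming that relative weights are still meaningful.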