Introduction

Animal models are used throughout biomedical research to enrich our understanding of human disease and the biological mechanisms involved1,2. Minipigs have been increasingly used as animal model due to their possession of several advantageous features, including a compact body size and anatomical structures that are similar to those of humans3,4,5,6. Consequently, various minipig breeds have been developed (i.e., Minnesota, Yucatan, Göttingen, Bama, and Korean minipigs) and these are used as animal models in a wide range of biological studies, including xenotransplantation6. Relative to other pig breeds, minipigs are most notably different in that they have a smaller body and consequently have smaller organs. Body and organ size variation is known to influence biological processes, especially homeostatic processes that are essential for maintaining the internal stability of living organisms7,8. For example, larger species or individuals generally exhibit higher metabolic rates and oxygen consumption9. In addition, body size is closely related to the primary functions of each organ. For instance, previous studies have suggested that larger-bodied species and individuals generally have slower heart rates and lower kidney filtration rates10,11,12,13. Therefore, understanding the molecular basis of the distinctive biological characteristics of minipigs that arise from their small body and organ size, is essential for using minipigs as a model system for biomedical research. Furthermore, since various minipig tissues are used for different purposes in biological research, understanding tissue-specific transcriptomic characteristics is also crucial. However, few studies have used transcriptomic analyses to investigate minipig-specific phenotypes and traits on the tissue level.

Long non-coding RNAs (lncRNAs) are non-coding transcripts longer than 200 nucleotides that affect various biological processes, including metabolism, cell differentiation, and development14. Previous studies have found that lncRNAs can regulate gene expression on multiple levels, such as chromatin regulation and transcription regulation15. In addition, lncRNAs are also known to show tissue-specific expression patterns and functions16. Therefore, studying lncRNA-linked gene regulatory mechanisms on the tissue level, as well as the specific protein-coding genes and biological processes influenced by lncRNAs, may help to develop a deeper transcriptomic understanding of living organisms and their component tissues. Moreover, as recognition of the importance of the regulatory roles played by lncRNAs have increased, studies of lncRNAs in animal models such as mice and pigs have become more common17,18,19,20. However, relative to other animal models, comprehensive studies of lncRNAs in minipigs remain lacking.

In this study, we therefore investigated the regulatory roles of lncRNAs in different minipig tissues by identifying lncRNAs and conducting comparative transcriptomic analyses using RNA-seq data from five tissues (heart, kidney, liver, lung, and spleen) sourced from different pig breeds (Korean minipig, Bama minipig, as well as Duroc and Landrace pigs) with differences in size. Korean (https://www.apures.com/) and Bama21 minipigs are Asian minipig breeds that have been generated via crossbreeding with Asian pigs. Moreover, the Korean minipig is the only minipig breed registered with the United Nations and Agricultural Food Organization (FAO) and is further subclassified into different types based on body size22,23. In this study, we used ET (Extra tiny; 18–26 kg) and L (Large; 37–85.6 kg)-type minipigs. The Bama minipig is a genetically stable minipig breed that has been used in biomedical research for over 30 years24. Recently, active research exists for both minipig breeds, including the generation of chromosome-level genome assemblies22,25 and multi-omics data analyses23,26. Using comparative transcriptomic analyses, we aimed to identify lncRNAs associated with biological processes linked to traits shared by the two minipigs but not by the pig breeds used in this study. In addition, we uncovered lncRNA-linked gene regulatory mechanisms specific to each tissue in the three minipig breeds and used this to identify the specific biological pathways influenced by these mechanisms.

Results

Comprehensive analysis of long non-coding RNAs in minipigs

To study the regulatory roles played by minipig lncRNAs, we first identified a set of lncRNAs by using the pig reference lncRNA annotation combined with RNA-seq data from five tissues taken from 13 minipig samples (five ET-type and five L-type Korean minipigs, along with three Bama minipigs) and 14 pig samples (four Duroc and ten Landrace samples) (Supplementary Table S1). As a result, we identified a total of 5,288 lncRNAs, including 170 novel lncRNAs. In this lncRNA set, the average lengths of known and novel lncRNAs were 23,807 nt and 15,318 nt, respectively (p ≈ 0.02; one-sided t-test) (Supplementary Table S2 and Supplementary Fig. S1). Furthermore, a majority of lncRNAs (93.17%; 4,927 lncRNAs) were intergenic, with intronic, sense, and antisense lncRNAs accounting for 4.41% (233 lncRNAs), 1.21% (64 lncRNAs), and 1.21% (64 lncRNAs), respectively (Supplementary Fig. S1). Furthermore, we calculated the expression levels of all identified lncRNAs and normalized these values for subsequent analyses.

Next, we performed principal component analysis (PCA) on the normalized lncRNA expression profiles to identify overall patterns of lncRNA expression in different minipig and pig breeds (Fig. 1a). The first three principal components (PCs) were combined to explain 39.32% of the observed variance in lncRNA expression profiles and resolved clusters that could be used to distinguish between different breeds and tissues. In detail, we found that different breeds formed distinct clusters in the PC1-PC2 plot, while the five tissue types were separated along the PC3 axis in the PC1-PC3 plot. As expected, ET- and L-type Korean minipig samples were indistinguishable and formed a single cluster. Furthermore, we measured the transcriptomic distances between samples using normalized lncRNA expression profiles and clustered entire samples using sample-to-sample distances (Fig. 1b). As for the PCA, samples from the same breeds formed large clusters, whereas samples from the two types of Korean minipigs clustered together without distinguishing between types. In addition, we observed that the tissue-specific clusters were distinguishable within each breed cluster. Overall, these lncRNA expression results were markedly different than those for protein-coding gene expression profiles (Supplementary Fig. S2-S3). While breed variability explained the largest proportion of variance in lncRNA expression profiles and served as the primary factor causing their cluster formation, analyses of protein-coding gene expression profiles revealed that tissue variability was the most important factor used to distinguish among samples (Fig. 1 and Supplementary Fig. S2-S3).

Fig. 1
figure 1

Expression patterns of long non-coding RNAs in five tissues from minipig and pig breeds. (a) Principal component analysis (PCA) results of long non-coding RNA (lncRNA) expression profiles in five tissues from minipig and pig breeds. (b) Cluster analysis of sample-to-sample distances calculated using lncRNA expression profiles of all minipig and pig samples.

Tissue-common regulatory effects of long non-coding RNAs in minipig

To determine the potential regulatory impact of lncRNAs on five different minipig tissue samples, we first identified differentially expressed lncRNAs (DELs) between each minipig-pig pair for each tissue (Fig. 2a and Supplementary Table S3; an average number: 438). Of the five tissues, the number of DELs found in lung tissue samples was higher than in any of the others, showing an average number of DELs of 721. In addition, the numbers of DELs identified between Korean minipig-pig pairs were significantly higher than those identified between the Bama minipig-pig pairs (\(p \approx 0.00001\); one-sided t-test). Next, we identified differentially expressed protein-coding genes (DEGs; Supplementary Table S4) and found that the average number of DEGs between pairs of samples was 4,424. For each tissue, we then selected DELs and DEGs that showed differences among all three minipig types relative to pigs; the results of this comparison could then be used to discover novel lncRNAs and protein-coding genes that are potentially related to how the traits of minipigs differ from those of pig breeds (Methods; Supplementary Fig. S4-S8). As a result, we obtained an average of 164 DELs and 2,729 DEGs in all five tissues (Supplementary Tables S3-S4).

Fig. 2
figure 2

Correlation-based associations between long non-coding RNAs and protein-coding genes across five tissues in minipigs. (a) Bar chart showing the number of differentially expressed long non-coding RNAs (DELs) in five tissues across minipig-pig pairs. (b) Functional analysis of tissue-common differentially expressed protein-coding genes (tcDEGs) regulated by tcDELs. (c) Correlation network for tcDEL-tcDEG pairs associated with SMAD protein and the TGF-beta signaling pathway. The rectangles and ovals within the network represent tcDEGs and tcDELs, respectively. Oval color indicates the average fold change value of each tcDEL across all minipig tissues; red indicates higher expression in minipigs, and blue indicates lower expression. (d) Box plots showing normalized expression values of six target protein-coding genes in all five tissues of minipigs and pigs.

To study tissue-common effects of DELs in minipigs, we analyzed all DELs and DEGs that were observed in all five minipig tissues. We identified 18 and 464 tissue-common DELs (tcDELs) and DEGs (tcDEGs), respectively, and found 775 candidate tcDEL-tcDEG pairs with putative functional associations (Supplementary Table S5). To identify specific biological mechanisms associated with tcDEL-tcDEG pairs, we performed subsequent functional analyses and identified that they were related to metabolic processes, RNA metabolism, developmental processes, and signaling pathway, especially SMAD protein and steroid hormone receptor signaling (Fig. 2b and Supplementary Table S6). In addition, pathway analyses also identified specific KEGG pathways related to tcDEL-tcDEG pairs, including those related to ribosomal proteins and TGF-beta signaling (Supplementary Table S6 and Supplementary Fig. S9). In general, SMAD protein and TGF-beta signaling pathways are known to be related to organ size27,28,29; we further examined their correlation-based relationships, which may reflect putative functional associations, including the protein-coding genes AMH (ENSSSCG00000035812), BMPR2 (ENSSSCG00000016113), CSNK2B (ENSSSCG00000001414), NCOR1 (ENSSSCG00000018039), RHOA (ENSSSCG00000022080), and SH2B1 (ENSSSCG00000007804). Next, we constructed a network based on expression correlation patterns observed among tcDEL-tcDEG pairs (Fig. 2c). Our data revealed two distinct clusters, with several tcDELs correlated with multiple target protein-coding genes simultaneously. One of the two clusters included five tcDEGs (AMH, CSNK2B, NCOR1, RHOA, and SH2B1), which were mostly upregulated in minipig samples, negatively correlated with four tcDELs (Fig. 2c-2d). Among these four tcDELs, MINILNC_0553, MINILNC_0562, and MINILNC_2980 (ENSSSCT00000077462) were associated with all five tcDEGs in the cluster. Furthermore, another cluster consisted of a single tcDEG, BMPR2, which showed lower expression levels in minipig samples than in pig samples, as well as two tcDELs, MINILNC_0364 and MINILNC_2417 (Fig. 2d).

Given these findings, we further examined whether these six tcDELs—initially identified in the context of organ size regulation—are correlated with a broader set of protein-coding genes across multiple tissues in minipigs (Fig. 2c). Through tcDEL-tcDEG network analysis, we found that the tcDELs were related to a substantial number of additional tcDEGs beyond those directly associated with organ size and were particularly connected to the SMAD protein and TGF-beta signaling pathways. Specifically, we identified 209 tcDEGs that were correlated with these six tcDELs and were also related to tissue-common phenotypes in minipigs (Supplementary Table S7). Further functional analysis of these 209 tcDEGs revealed significant associations with RNA metabolism, RNA processing, and gene expression regulation (GO enrichment analysis; \(FDR<0.05\)), suggesting a broader transcriptomic association of these tcDELs across multiple tissues (Supplementary Table S8).

Tissue-specific regulatory effects of long non-coding RNAs in minipig

Previous studies have shown that lncRNAs often have tissue-specific effects30,31. To investigate potential tissue-specific regulatory effects of lncRNAs in minipigs, we first identified tissue-specific DELs (tsDELs), defined as those differentially expressed only in a particular minipig tissue type (Fig. 3a and Supplementary Table S9). Across the five tissue types, we found an average of 55 tsDELs and 159 corresponding tissue-specific DEGs (tsDEGs) per tissue (Supplementary Table S9). Interestingly, spleen samples exhibited the highest number of tsDELs among all tissues. To assess the biological relevance of these lncRNAs, we performed GO and pathway enrichment analyses for tsDEGs that showed expression correlations with tsDELs from each tissue (Fig. 3b, Supplementary Fig. S10, and Supplementary Table S10). The functional analyses revealed that these tsDEGs were generally involved in tissue-specific homeostatic processes. For example, tsDEGs linked to heart-specific DELs were enriched in biological processes related to cardiac function and cardiac muscle contraction. Similarly, kidney-specific DEL-DEG pairs were related to kidney development, transport, and various metabolic processes, while the GO terms and KEGG pathways of liver tissue samples were related to lipid and cholesterol metabolism. In the lung and spleen, tsDEGs associated with their respective tsDELs were involved in various immune responses. These tissues were also linked to KEGG pathways related to motor proteins and smooth muscle contraction. Notably, genes involved in the cGMP-PKG signaling pathway were enriched among the tsDEG that showed expression correlation with heart- and spleen-specific tsDELs (Fig. 3b and Supplementary Fig. S11-S12).

Fig. 3
figure 3

Correlation-based associations between long non-coding RNAs and protein-coding genes in specific tissues of minipigs. (a) A Venn diagram representing the overlap of differentially expressed long non-coding RNAs (DELs) observed in five different tissues. (b) Functional analyses of tissue-specific differentially expressed protein-coding genes (tsDEGs) that showed expression correlation with tsDELs from five minipig tissues. (c, d) Correlation networks of tsDEL-tsDEG pairs for minipig (c) heart and (d) spleen tissue samples. Rectangles and ovals represent tsDEGs and tsDELs, respectively. Oval color indicates average fold change value of each tsDEL across all minipig tissues; red indicates higher expression in minipigs, and blue indicates lower expression. (e, f) Box plots showing the normalized expression values of target protein-coding genes in target tissues (heart and spleen) and other samples.

Next, to further explore the involvement of tsDELs in the cGMP-PKG signaling pathway, we first searched for tsDEL-tsDEG pairs associated with this pathway in minipig heart samples. We identified 16 tsDEL-tsDEG pairs, including the protein-coding genes ADCY7 (ENSSSCG00000039408), CACNA1C (ENSSSCG00000041764), IRS4 (ENSSSCG00000031027), NPPB (ENSSSCG00000003431), and SLC8A2 (ENSSSCG00000029567). Using these tsDEL-tsDEG pairs, we constructed a correlation network for heart tissue samples (Fig. 3c). In this network, we identified a large cluster encompassing most of the identified pairs. We also found that three protein-coding genes—CACNA1C, NPPB, and SLC8A2—were co-associated with MINILNC_0669 (ENSSSCT00000077252). The expression levels of these protein-coding genes in minipig heart tissue samples were distinguishable from the levels found in other tissues or breed samples (Fig. 3e). In particular, the expression level of NPPB gene, which was correlated with nine different tsDELs, was elevated in minipig heart samples (Fig. 3c-3e).

In minipig spleen samples, we identified nine spleen-specific tsDEL-tsDEG pairs related to the cGMP-PKG signaling pathway. Based on expression correlations, we constructed a network that included the protein-coding genes CACNA1D (ENSSSCG00000011455), KCNMB1 (ENSSSCG00000017005), NFATC2 (ENSSSCG00000007477), NPPC (ENSSSCG00000026852), and ROCK2 (ENSSSCG00000008629) (Fig. 3d). Within this network, we identified three different tsDELs—MINILNC_0633 (ENSSSCT00000080123), MINILNC_3868 (ENSSSCT00000084314), and MINILNC_5017 (ENSSSCT00000076369)—that were found to be correlated with NFATC2. We also identified several spleen-specific DELs that showed correlation with multiple tsDEGs within the network. For example, MINILNC_1345 and MINILNC_4899, associated with KCNMB1, were also linked to two other protein-coding genes, CACNA1D and NPPC. Finally, the expression levels of these tsDEGs in minipig spleen samples were different from those observed in other samples (Fig. 3f).

Next, we selected several KEGG pathways associated with tissue-specific functions and constructed tsDEL-tsDEG correlation networks for the other tissues. For example, the aldosterone-regulated sodium reabsorption pathway was selected as a target pathway for kidney tissue samples (Supplementary Fig. S13), while the PPAR signaling and WNT signaling pathways were examined in the liver and lung, respectively (Supplementary Fig. S14-S15). Further analysis of the tsDEL-tsDEG networks related to these pathways for kidney, liver, and lung tissues revealed that the tsDELs involved in these pathways regulated multiple tsDEGs at once and formed complex network structures (Supplementary Fig. S16). In addition, the expression levels of both tsDELs and tsDEGs showed greater differences among minipig tissues relative to pig breeds or tissues (Supplementary Fig. S16-S17).

Discussion

In this study, we investigated how lncRNAs may contribute to phenotypic and transcriptomic variation in minipigs by performing a comparative transcriptomic analysis across five tissues from various minipig and pig breeds (Supplementary Table S1). Based on 5,288 identified lncRNAs (Supplementary Table S2), we analyzed expression patterns across breeds and samples, and found that lncRNAs showed more pronounced breed-level variation than protein-coding genes (Fig. 1 and Supplementary Fig. S2-S3). PCA and clustering analyses revealed that minipig samples were first grouped by breed, with tissue-level distinctions emerging within each breed. These findings are consistent with known differences in evolutionary constraints: protein-coding genes, which perform essential cellular functions, tend to be highly conserved and maintain similar functional roles across different breeds or species32. In contrast, lncRNAs, which primarily function as regulatory elements, are subject to weaker functional constraints and exhibit rapid turnover during evolution33. Thus, our findings suggest that lncRNAs and protein-coding genes in minipigs and pigs are under distinct evolutionary constraints, with lncRNAs being shaped by both breed-dependent genetic divergence and tissue-specific regulatory programs.

To explore whether certain lncRNAs might be involved in shaping shared transcriptomic features across minipig tissues, we focused on DELs and DEGs that were consistently detected in all five tissues from minipig-pig comparisons (Fig. 2a, Supplementary Tables S3-S4, and Supplementary Fig. S4-S8). Expression correlation analysis revealed 775 tcDEL–tcDEG pairs, many of which were associated with biological processes relevant to organ size and metabolic regulation (Fig. 2b, Supplementary Table S5, and Supplementary Fig. S9). Among these, the correlation network analysis on tcDEL–tcDEG pairs related to SMAD protein and TGF-beta signaling pathways directly implicated in organ size and developmental process regulation27,28,29 revealed a complex structure involving several key protein-coding genes and lncRNAs co-associated with multiple targets (Fig. 2c, d). These findings suggest that tcDELs are connected to a broad range of biological processes, both directly and indirectly linked to phenotypic traits in minipigs. It implies that lncRNAs may play roles in establishing shared transcriptomic features across multiple tissues in minipigs, potentially contributing to tissue-common phenotypes through broad expression coordination with protein-coding genes.

To investigate how lncRNAs might contribute to tissue-specific features in minipigs, we examined tsDELs and their correlated tsDEGs (Fig. 3a and Supplementary Table S9). Functional analyses revealed that tsDEGs were closely associated with core physiological processes characteristic of each tissue34,35,36,37 (Fig. 3b, Supplementary Table S10, and Supplementary Fig. 10). These results suggest that lncRNAs may be involved in shaping tissue-specific traits in minipigs by modulating gene expression programs linked to key biological functions. Notably, protein-coding genes involved in the cGMP–PKG signaling pathway—known to play critical roles in various essential biological processes38,39,40—were notably enriched in both heart and spleen tsDEG sets (Fig. 3b and Supplementary Fig. S11-S12). We constructed tissue-specific networks based on tsDEL–tsDEG correlations within this pathway (Fig. 3c-3d). In the heart tissue, NPPB, which encodes a cardiac hormone and regulates cardiovascular homeostasis41, showed markedly higher expression in minipigs and was associated with nine distinct tsDELs (Fig. 3e). In the spleen tissue, NFATC2, a protein-coding gene involved in T cell homeostasis and spleen size regulation42, was co-expressed with multiple tsDELs (Fig. 3f). Taken together, these findings suggest that even within the same biological process, lncRNAs may be associated with distinct regulatory mechanisms and elements depending on the target tissue, consistent with previous studies30,31. This highlights the potential role of lncRNAs in helping define diverse tissue-specific phenotypes in minipigs.

In conclusion, our study highlights the potential regulatory role of lncRNAs in shaping both tissue-common and tissue-specific traits in minipigs. By analyzing transcriptomic data across multiple tissues and breeds, we found that lncRNAs are associated with diverse physiological processes that distinguish minipigs from pig breeds. These findings provide a valuable transcriptomic resource and offer insights into how lncRNAs may contribute to the unique biology of minipigs, reinforcing their utility as animal models. While our study provides novel insights, several limitations suggest the need for cautious interpretation of the results. First, the associations between lncRNAs and protein-coding genes were inferred solely from expression correlations, which may not fully capture the complexity of underlying regulatory mechanisms, including the directionality of regulation. Second, experimental validation, such as RT-qPCR, is needed to confirm the RNA-seq data-based analyses conducted in this study. Third, our study focused on only three Asian minipig breeds due to limited data availability, and thus the findings may reflect characteristics specific to Asian minipigs. In the future, we aim to confirm the results of this study by integrating multi-omics data, functional validation experiments, and broader sampling across minipig breeds to further elucidate the roles of lncRNAs in minipigs.

Methods

RNA-seq data collection and preprocessing

Five tissue samples, taken from the heart, kidney, liver, lung, and spleen, were collected from a total of 10 Korean minipigs (i.e., five each of the extra tiny (ET)-type and the large (L)-type). The minipigs were euthanized with an anesthetic injection given into the ear vein with an overdose of Alfaxan. All animal experiments were approved by the National Institute of Animal Science (NIAS) and were conducted following the relevant guidelines and regulations. All procedures are reported in accordance with the ARRIVE guidelines. Total RNA was then extracted from all samples using TRIzol reagent. Sequencing libraries were then prepared using a TruSeq Stranded mRNA LT Sample Prep Kit (Illumina, San Diego, CA, USA) before being sequenced on an Illumina platform. RNA-seq data for the Bama minipig, Duroc pig, and Landrace pig were retrieved from the NCBI database using accession numbers PRJNA663759, PRJNA392949, and PRJEB1213, respectively (Supplementary Table S1).

To remove adaptor and low-quality sequences from RNA-seq data, raw reads were first trimmed using Trimmomatic (v0.36)43 using the following options: “LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36.” Next, we used SortMeRNA (v4.2.0)44 to filter rRNA sequences from RNA-seq data, with all options set to the default. Next, filtered RNA-seq data were evaluated using FastQC (v0.11.7)45, then aligned to the pig reference genome (Sscrofa 11.1) using STAR (v2.7.1)46 with default parameters.

Identification of long non-coding RNAs in minipig and pig breeds

To identify lncRNAs in minipig and pig samples, we generated transcript assemblies using RNA-seq data and filtered the final lncRNA set according to a previous study47 (Supplementary Fig. S1). Briefly, a transcript assembly for each sample was first generated using StringTie (v2.1.4)48 using the “-f 0.1 -c 2.5” options, then the transcript assemblies of all samples were merged using the “-merge -F 0.5” options. Next, we selected transcripts longer than 200 nucleotides from the merged transcript assembly, compared them to the pig reference gene annotation (Ensembl release 100), and annotated the identified transcripts using GffCompare (v0.12.1)49. All single-exon transcripts were discarded using the “-M” option, while those with the “i”, “o”, “u”, and “s” class codes or those that matched non-coding genes present in the reference gene annotation were selected for subsequent analyses. Coding potential scores for the selected transcripts were then calculated using three different methods: CPC2 (v2.0)50, PLEK (v1.2)51, and CPAT-S (v3.0.0)52. For CPC2 and PLEK, the default cutoff values (0.5 and 0, respectively) were used to distinguish between coding and non-coding transcripts. For CPAT-S, since the tool does not provide a predefined cutoff value, a tenfold cross-validation was conducted using pig reference transcripts, after which a cutoff value of 0.4 was chosen (Supplementary Fig. S2). Then, transcripts identified as non-coding transcripts in all three methods were selected. Especially, to validate the newly identified lncRNAs, RNA-seq data from all samples were mapped to the lncRNA transcripts using STAR (v2.7.1)46. For each lncRNA, the number of supporting breeds, samples, and RNA-seq reads were calculated using MAPQ > 20 properly mapped read pairs. Finally, novel lncRNAs averaging ≥ 10 mapped reads per sample and present in ≥ 5 samples were included in the final lncRNA set. According to their respective genomic origins, the lncRNAs identified were classified as intergenic, intronic, sense, and antisense lncRNAs53.

Principal component and cluster analyses

We performed principal component analysis (PCA) and cluster analysis on the lncRNA expression profiles for all minipig and pig samples calculated using RSEM (v1.3.0)54 using the “–star” option. The resulting lncRNA expression profiles for all samples were then normalized using the “vst” function of the DESeq2 R package (v.1.22.2)55. PCA for all minipig and pig samples was conducted using normalized lncRNA expression values by following the “PCA” method of the scikit-learn Python module (v0.21.3)56. For the cluster analysis, the sample-to-sample distances were computed using the “dist” function of the DESeq2 R package. Clusters were then identified and visualized using sample-to-sample distances using the pheatmap R package (v1.0.12)57. Finally, we compared the features of lncRNA and protein-coding gene expression patterns using normalized expression profiles of protein-coding genes; these were also generated using RSEM and DESeq2 (as described above) and were used in subsequent cluster analyses.

Differentially expressed long non-coding RNA and protein-coding gene analyses

To identify differentially expressed lncRNAs (DELs) and protein-coding genes (DEGs) between minipig and pig tissue samples, we compared the lncRNA and protein-coding gene expression profiles of all minipig-pig pairs for each tissue using the DESeq2 R package (v.1.22.2)55. In these analyses, we used batch factors for the model to control for batch differences and to balance samples across experimental batches, as per a previously described method58. Protein-coding genes and lncRNAs that showed both \(\left| {\log_{2} FC} \right| \ge 1\) and \(adjusted p < 0.05\) were selected as DEGs and DELs, respectively.

To identify candidate DELs and DEGs that may affect Korean and Bama minipig phenotypes, we identified lncRNAs and protein-coding genes that were differentially expressed in all three minipig breeds (ET-type, L-type, and Bama) relative to Duroc or Landrace pigs; these were selected for each tissue then used for subsequent analyses. Using these DEL and DEG sets, we defined potential tissue-common DELs (tcDELs) and DEGs (tcDEGs), for each of the five tissues used in this study. In contrast, DELs and DEGs identified in only one tissue were defined as tissue-specific DELs (tsDELs) and DEGs (tsDEGs).

Functional analysis of protein-coding genes associated with long non-coding RNAs

To investigate the biological functions associated with DELs in minipig, we identified DEGs that showed expression correlations with DELs. To do so, we calculated Pearson’s correlation coefficient (PCC) values using normalized lncRNA and protein-coding gene expression profiles to identify putative DEL-DEG regulatory relationships using the “pearsonr” method as implemented in the scikit-learn Python module (v0.21.3)56. Similar to the previous study59, we selected DEL-DEG pairs with \(\left| {PCC} \right| > 0.7\) and \(adjusted p < 0.05\) as candidate pairs.

Next, we investigated the biological functions of the DEGs regulated by DELs by performing Gene Ontology (GO) and pathway enrichment analyses. To do so, we used g:Profiler60 for GO enrichment analyses using the GO biological process terms and used ShinyGO (v0.81)61 to perform KEGG62,63,64 pathway enrichment analyses and visualize pathways enriched with target protein-coding genes. All annotated protein-coding genes were used as background genes for both GO and KEGG pathway enrichment analyses, and enrichment significance was assessed using the false discovery rate (FDR).