Abstract
Predicting community evolution in dynamic social networks is crucial for relevant authorities to understand trends and implement safety measures in advance. Most existing algorithms for predicting community evolution rely on extracting community state features to forecast evolutionary events. However, in highly interactive social networks, such as corporate collaboration networks in financial markets, extracting high-quality community state features is extremely challenging. This study proposes a community evolution prediction method based on feature change patterns: it explores the features that change during community evolution and designs an algorithm to learn the rules governing these changes, thereby obtaining the community's feature change pattern. Compared to traditional methods that rely on static state features, our proposed approach captures richer dynamic information and more accurately reflects community evolution trends. Additionally, we have designed a parallel learning strategy with parameter sharing, based on the consistency of community environments. Experimental results show that our method, based on feature change patterns, improves predictive performance by up to approximately 25% on the AS, DBLP, and Facebook datasets compared to baseline methods (TNSEP, GNAN, and MF-PSF). Moreover, the parallel learning mechanism reduces training time by nearly half.
Introduction
Social networks consist of community member nodes with stable connections1,2. Individual behavior changes cause communities to evolve, making social networks dynamic entities3,4. These networks contain rich information and diverse patterns5, making knowledge mining increasingly popular, especially for predicting critical evolution events6. This belongs to the research category of social network analysis7,8. Communities will experience various events during the evolution process9, including formation, dissolution, growth, shrinkage, survival, merger, and split, as shown in Fig. 1. This is significant across fields: analyzing social networks aids in predicting crime10; tracking infected communities helps anticipate disease outbreaks11; analyzing rumor dissemination patterns maintains social stability12; and detecting malicious content in social media networks13.
Community evolution prediction involves forecasting future evolution events based on historical features with time-series relationships in the community evolution process. Currently, the framework for community evolution prediction is studied in four steps: dividing the time windows of dynamic networks14,15, detecting the community structure within networks under different time windows16,17, tracking the evolutionary sequences of different communities and identifying critical events18, and predicting the next event based on the tracked information19. As the final step, designing an effective method to predict events is crucial. The challenge lies in extracting characteristic information and designing a reasonable classification model for prediction.
Mainstream prediction methods rely on historical community state features to predict future events, as shown in Fig. 2. Researchers have introduced various methods to describe the state of communities, including direct calculation of structural features and discovering latent structural features. When the network state undergoes drastic changes, capturing the complete evolution process becomes challenging, affecting the quality of community state feature descriptions. Unlike historical states, community feature change rules tend to be more stable. This study comprehensively considers how changing feature patterns influence community evolution direction.
We hypothesize that specific feature change patterns correspond to each type of evolutionary event, and we exploit this changed-feature information (Fig. 2). In our prediction method, we design an approach to obtain a complete sequence of change features. However, this field mainly relies on classical classifiers, which fail to fully capture the relationship between changing feature sequences and event sequences. This study therefore draws inspiration from recurrent neural networks (RNNs)20,21 and designs a parallel Long Short-Term Memory (LSTM) model to learn these relationships. The parallel learning mechanism, which includes multi-process concurrency and parameter sharing, aims to reduce time consumption. This model captures the impact of changes in community feature patterns on critical events and fully utilizes time series information.
The prediction method based on feature change patterns fully mines the knowledge contained in the evolutionary sequences. The significance of our work can be summarized as follows:
-
To reveal the feature change patterns within community evolution, this paper proposes a prediction method. It uses more advantageous differential features in time series to characterize these patterns and considers their influence on the evolution direction.
-
To absorb knowledge from feature change patterns, we developed a parallel LSTM model. Recognizing that each community is influenced by the same network environment, we introduced a parallel mechanism into the existing LSTM model to enhance operational efficiency.
-
Compared to mainstream methods for predicting critical events, our method achieves better prediction accuracy in real-world networks when the same backbone (the first three framework steps) is used.
-
Compared with other deep learning models, the time cost of our proposed model with a parallel mechanism is lower.
Related work
Several approaches have tackled the challenge of predicting dynamic social network evolution. Dakiche et al.22 partitioned timeframes based on network activity distribution. Li et al.23 proposed a multi-objective optimization-based community detection algorithm. Berahmand et al.24 proposed an efficient attributed graph clustering/community detection algorithm named WSNMF, an innovative extension of SNMF; it introduces node attribute similarity to compute a weight matrix, effectively bridging the gap for attributed graph clustering. SDAC-DA25 transforms the attributed network into a dual-view network and applies a semi-supervised autoencoder layering approach to each view. The resulting representation layer contains highly clustering-friendly embeddings, which are optimized through a unified end-to-end clustering process to effectively identify clusters. Bródka et al.26 introduced the GED algorithm for tracking critical events. Ilhan et al.27 developed a method for community similarity and evolution tracking. Li et al.28 utilized resistance distance for community evolutionary chain tracking. These studies enhanced network partitioning, community discovery, or tracking methods, all contributing to the final step of predicting future events.
To enhance predictive accuracy, numerous studies have proposed innovative approaches to community state feature extraction. For instance, Gliwa et al.29 introduced group features, while Bródka et al.30 described community states using micro-node and meso-level characteristics. Pavlopoulou et al.31 added temporal features to structural features. Dakiche et al.32 proposed a feature set combining community structure and member influence for event prediction based on feature change rates. Tajeuna et al.33 developed a feature set to assist Cox regression, and Mohammadmosaferi et al.34 introduced AFIF for identifying critical structural features. Ding et al.35 introduced a 48-dimensional feature set for enhanced prediction accuracy.
Some methods employ graph mining techniques to extract community state features. Wang et al.36 developed algorithms for hypergraph construction to mine inter-community features. Revelle et al.37 used a graph neural network (GNN) with an attention mechanism to obtain community representations. Chen et al.38 integrated static features with dynamic information, exploring community structural features through DeepWalk and spectral propagation. Feature selection among a large number of community state features may be a good strategy. For example, Sheikhpour et al.39 proposed a feature selection method formulated in the trace ratio form, integrating hypergraph Laplacian-based semi-supervised discriminant analysis (SDA) and the mixed convex and non-convex \(\ell _{2, p}\)-norm \((0<p\le 1)\) regularization.
Choosing an appropriate classification model is crucial for leveraging feature effectiveness. Saganowski et al.40 found that the random forest classifier, ranked second among 15 classifiers, can classify all events and is widely used in existing research. Dakiche et al.41 validated the effectiveness of customized timeframe partitioning using four traditional machine learning methods. Rajita et al.42 integrated multiple classifiers to improve prediction accuracy over individual models. They also applied GANs43 to stabilize data distribution. These models still rely on historical state characteristics and underutilize temporal information in samples.
The above methods focus entirely on state feature information. In contrast, this article characterizes a change pattern by learning the relationship between the sequence of differential features and the sequence of events, and proposes a prediction method that obtains complete sequence information. A prediction method based on feature change patterns requires a classification model that can learn variation patterns and time series information. Therefore, we chose the LSTM model and enhanced it with parallel mechanisms and learning strategies for event prediction.
Problem description
In a dynamic social network, a timestamp \(t\) typically denotes a specific time point. Each timestamp corresponds to a static social network graph, denoted as \(\{V, E\}\), where \(V\) represents the set of all nodes and \(E\) represents the set of all edges. We utilize a series of time-ordered graphs to record the evolution of a dynamic social network \(G\). \(m\) is the number of timestamps.
A framework for community evolution prediction processes the data in the following steps, as shown in Fig. 3a:
-
The timeframe partitioning strategy segments the dynamic network into a series of timeframes \(\{T_1, T_2,\dots , T_\tau \}\). Each timeframe represents a time window comprising several consecutive timestamps, describing the evolution of the social network over that period. Here, \(\tau\) denotes the number of timeframes. The dynamic network is represented as \(\bigcup \limits _{i = 1}^{\tau } g_{T_i}\).
-
The community discovery algorithm is applied to identify the community structure within \(g_{T_i}\), resulting in the community set \(\bigcup \limits _{j}C_{T_i}^{j}\).
-
The community tracking algorithm tracks a series of evolutionary sequences and critical events with time series relationships. For instance, an evolutionary sequence with a length \(L\) of 3 can be represented as \(\{C_{T_{L-2}}^{a}, C_{T_{L-1}}^{b}, C_{T_{L}}^{c}\}\). Historical critical events can be represented as \(\{\textrm{event}_{T_{L-2}}^{a'}, \textrm{event}_{T_{L-1}}^{b'}\}\).
-
The features of the community evolutionary sequence are calculated, and a classification model is subsequently trained to predict future critical events \(\{\textrm{event}_{T_{L}}^{c'}\}\).
The framework aims to predict the critical events that might occur in the \(T_{\tau +1}\) timeframe of the community. Our main focus is on the fourth step of the framework. Accurately capturing community evolution, as well as utilizing classifiers that can effectively extract this information, is crucial for enhancing the accuracy of event predictions.
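To make the objects handled by the fourth step concrete, the sketch below shows one possible in-memory layout of an evolutionary sequence and its events. This is a hypothetical illustration: the class and field names are ours and are not part of the original framework.

from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Community:
    timeframe: int                                      # index i of the timeframe T_i
    members: Set[int] = field(default_factory=set)      # node ids of C_{T_i}

@dataclass
class EvolutionSequence:
    communities: List[Community]                        # [C_{T_1}, ..., C_{T_L}] for one tracked community
    events: List[str] = field(default_factory=list)     # historical critical events, length L-1

def predict_next_event(seq: EvolutionSequence) -> str:
    """Fourth framework step: predict event_{T_L}, i.e. the critical event
    occurring as C_{T_L} evolves into C_{T_{L+1}} (to be implemented)."""
    raise NotImplementedError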
Methods
Datasets
Our method was applied to five real datasets of different categories and sizes: the communication network Autonomous System (AS) dataset44, the AS-Caida dataset44, the co-authorship network DBLP dataset45, the post network Facebook dataset46, and the question-and-answer network Sx-askubuntu-c2q dataset47. The basic parameters of the networks are shown in Table 1. The fifth column gives the time span, the sixth column the number of timestamps, the seventh column the average node count, and the eighth column the average edge count.
The proposed prediction algorithm based on feature change patterns
As the final step of the evolutionary prediction problem, the approach is outlined in Fig. 3b. After screening and normalizing the community features, an appropriate machine learning method is selected for classification prediction. However, general methods only focus on the role of past community state information on events, as shown in Fig. 3c. The state features represent the static features of a community at a specific moment, while evolution events represent the process of change in the community from one moment to the next. Associating state features with evolution events clearly overlooks the dynamic nature of these events. Using differential features can solve this problem.
We propose a novel prediction algorithm based on feature change patterns, outlined in four steps as depicted in Fig. 3d: First, calculate community features in differential form. Second, design a regression model and training strategies to predict future feature changes. Third, use a linear layer that extracts high-order features to describe feature change patterns. Fourth, utilize a parallel LSTM model to learn these patterns and predict future events. These patterns encompass both long-term trends and short-term variations. An LSTM model trained with community differential features can simultaneously capture both aspects. The differential features at the current moment represent short-term variations, while historical differential features imply long-term trends. This enables the LSTM model to learn the feature change patterns based on the differential features effectively. In the following sections, we detail the prediction algorithm.
To facilitate subsequent discussions, we succinctly represent an evolutionary sequence of length \(L\) as \(\bigcup _{i=1}^{L} C_{T_i}\). The representation of historical critical event sequence is \(\bigcup _{i=1}^{L-1} event_{T_i}\). The goal is to predict the critical event \(event_{T_L}\) when \(C_{T_L}\) evolves into \(C_{T_{L+1}}\).
Community feature extraction
For the community state features (\(SF\)), we adopt the 48-dimensional features proposed by Ding35, as shown in Table 2. Ding’s research indicates that these features effectively characterize community attributes and contribute to successful evolutionary analysis. The calculation formulas for the 48-dimensional features are detailed in Appendix A.
Fig. 3c shows that traditional prediction strategies concatenate historical features as input and use \(event_{T_L}\) as the label, without adequately addressing the influence of changes in community features on evolutionary events. We examine the changing patterns of these features and establish their relationship with events. To achieve this, we apply a differencing operation to the community’s 48-dimensional feature vectors across adjacent time frames, as detailed in Eq. (3). This process results in a sequence of changing features (\(CF\)), denoted as \(\bigcup _{i=1}^{L-1} CF_{T_i}\).
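As a minimal sketch, assuming Eq. (3) is a first-order difference of the 48-dimensional state vectors of adjacent timeframes, the changing features can be computed as follows; the function name and array layout are ours.

import numpy as np

def changing_features(sf: np.ndarray) -> np.ndarray:
    # sf: shape (L, 48), row i holds SF_{T_{i+1}} for one evolutionary sequence
    # returns CF of shape (L-1, 48), row i = SF_{T_{i+2}} - SF_{T_{i+1}}
    return np.diff(sf, axis=0)

sf = np.random.rand(3, 48)       # a length-3 evolutionary sequence
cf = changing_features(sf)       # two changing-feature vectors, matching the L-1 historical events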
LSTM regression network
The objective is to predict \(event_{T_L}\), but computing the differential feature corresponding to \(event_{T_L}\) is unfeasible, because the features of \(C_{T_{L+1}}\) are unknown. Therefore, we devise an LSTM regression network to forecast the changing features \(CF_{T_L}\) for each evolutionary sequence. The LSTM regression network is chosen as the learning model for its capability to predict 48-dimensional features simultaneously and its efficient memory management for sequential tasks.
The LSTM regression model consists of an LSTM layer with a hidden dimension of 64 and a linear layer with an output dimension of 48. The training strategy of the model is as follows.
Step 1: The sequence of changing features is divided into an input set and a label set for the regression model. The differential feature of the current timeframe is used as the label for the preceding timeframe (Eq. (4)). Then, the regression model is trained. The mean square error (MSE) is used to calculate the loss during model training (Eq. (5)). In this equation, \(CF_{T_i}\) represents the actual value, \(\hat{CF_{T_i}}\) is the predicted value, and \(n\) denotes the number of evolutionary sequences. To minimize the loss function value, the gradient descent method is employed.
Step 2: All changing feature sequences are fed into the trained model to predict the future feature values (Eq. (6)).
Step 3: Combine historical differential feature sequences to obtain the complete sequence of feature changes, denoted as \(\bigcup _{i=1}^{L} CF_{T_i}\).
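The following PyTorch sketch illustrates Steps 1-3 under our assumptions: only the hidden size 64, the 48-dimensional output, and the MSE loss come from the description above, while the optimizer, learning rate, and batching are not specified in the text and are chosen here for illustration.

import torch
import torch.nn as nn

class CFRegressor(nn.Module):
    """One LSTM layer (hidden size 64) followed by a linear layer with output size 48."""
    def __init__(self, in_dim=48, hidden=64, out_dim=48):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, out_dim)

    def forward(self, cf_seq):                       # cf_seq: (batch, steps, 48)
        out, _ = self.lstm(cf_seq)
        return self.fc(out)                          # one 48-dimensional prediction per step

model, loss_fn = CFRegressor(), nn.MSELoss()         # MSE loss as in Eq. (5)
opt = torch.optim.SGD(model.parameters(), lr=0.01)   # gradient descent (learning rate assumed)

def train_step(inputs, labels):
    """Step 1: each differential feature is the label of the preceding timeframe (Eq. (4))."""
    opt.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()
    opt.step()
    return loss.item()

def complete_sequence(history):                      # history: (1, L-1, 48), i.e. CF_{T_1..L-1}
    """Steps 2-3: predict CF_{T_L} (Eq. (6)) and append it to the historical sequence."""
    with torch.no_grad():
        cf_next = model(history)[:, -1:, :]
    return torch.cat([history, cf_next], dim=1)      # complete sequence CF_{T_1..L}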
Extracting high-order features
The 48-dimensional features cover micro, meso, and macro aspects of the network, exhibiting high dimensionality and diverse types. Each community’s differential features evolving from \(T_{i-1}\) to \(T_{i}\) correspond to a specific event. Capturing higher-order features (\(HF\)) from the differentials aids the model in learning these associations in subsequent steps. High-order feature sequences describe more intricate patterns of feature changes, offering greater accuracy than using differential features directly. Extracting high-order features via a linear layer can also reduce their dimensionality.
A linear layer performs a linear transformation on the input (Eq. (7)).
Where \(n\) is the number of samples, \(s\) is the number of input neurons (the number of sample features), \(o\) is the number of output neurons (the number of high-order features of the sample), \(W\) is the parameter to be learned by the module, and \(b\) is an \(o\)-dimensional vector bias. The output is a linear combination of input features. In this experiment, the linear layer halves the feature dimension (Eq. (8)). After processing through this module, the high-order feature sequence \(\bigcup _{i=1}^{L} HF_{T_i}\) is obtained, which is used to describe the feature changing patterns during community evolution.
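A one-line sketch of the high-order feature extractor, assuming Eq. (7) is the standard affine map \(y = xW^{\top } + b\) and Eq. (8) halves the 48 differential features to 24:

import torch.nn as nn

high_order = nn.Linear(in_features=48, out_features=24)   # W is 24x48, b is a 24-dimensional bias
# hf_seq = high_order(cf_seq)   # cf_seq: (n, L, 48) -> hf_seq: (n, L, 24)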
Parallel LSTM prediction model
We utilize the proposed parallel LSTM prediction model to learn the patterns of feature changes. The reasons for choosing this model are as follows: Firstly, leveraging an RNN to learn the correspondence between two sets of sequences aligns with the sequence-to-sequence pattern of RNNs. To mitigate the problem of gradient vanishing in RNNs, the prediction model is designed based on the LSTM model. Secondly, as the scale of social networks continues to expand, processing large-scale data faces the challenge of high computational complexity. Parallel learning models can reduce time consumption, but independent training will increase the number of model parameters. Since different communities are influenced by the same overall network environment during the evolution process, we propose a parallel LSTM prediction model based on parameter sharing.
All feature change sequences \(\bigcup _{i=1}^{L} HF_{T_i}\) constitute the feature set (Feature_Seqs), while all critical event sequences \(\bigcup _{i=1}^{L} event_{T_i}\) constitute the event set (Event_Seqs). Algorithm 1 outlines the flow of the parallel LSTM model to predict critical events.
Each LSTM model consists of an LSTM layer with a hidden dimension of 64 and a linear layer, which transforms the output vectors of the LSTM layer into label vectors (line 5). The learning process of a neural network is essentially the process of adjusting the weight matrix. After each LSTM model is trained, a set of weight parameters, labeled \(W_{all}\), is obtained.
The prediction model adopts two mechanisms, weight sharing and multi-process concurrency, to reduce training time. The training set is evenly divided into five parts. One part is used to train an LSTM model, and the learned parameters \(W_{\textrm{all }\_1}\) are shared with the remaining four LSTM models. These four LSTM models are then trained concurrently using multi-processing, initialized with the shared parameters, on the remaining four parts. For the final result on the test set, a voting method is employed: each sample is predicted by all five LSTM models, and the result with the highest number of votes is selected as the final result.
When the number of samples is less than 1000, using the parallel mechanism results in extremely imbalanced categories within each small sample set, which is not conducive to model learning. In such cases, we introduce a supplementary learning mechanism using a single LSTM model, as indicated by the purple arrow in Fig. 4.
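A condensed sketch of this flow is given below. The 24-dimensional high-order inputs, the seven event classes (Fig. 1), and the Adam optimizer are our assumptions, and true multi-process dispatch is replaced by a sequential loop for brevity; this is an illustration, not the authors' implementation of Algorithm 1.

import copy
from collections import Counter
import torch
import torch.nn as nn

class EventLSTM(nn.Module):
    """One member model: LSTM layer (hidden size 64) plus a linear label layer."""
    def __init__(self, in_dim=24, hidden=64, n_events=7):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_events)

    def forward(self, hf_seq):                       # hf_seq: (batch, L, 24)
        out, _ = self.lstm(hf_seq)
        return self.fc(out[:, -1])                   # logits for the next critical event

def train(model, loader, epochs=10):
    opt = torch.optim.Adam(model.parameters())       # optimizer and epoch count assumed
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            ce(model(x), y).backward()
            opt.step()

def fit_parallel(splits):
    """splits: five DataLoaders over the evenly divided training set."""
    if sum(len(s.dataset) for s in splits) < 1000:   # fallback: a single LSTM model
        single = EventLSTM()
        for s in splits:
            train(single, s)
        return [single]
    first = EventLSTM()
    train(first, splits[0])                          # learn W_all_1 on the first part
    rest = [copy.deepcopy(first) for _ in splits[1:]]  # share W_all_1 with four more models
    for m, s in zip(rest, splits[1:]):               # these would run as concurrent processes
        train(m, s)
    return [first] + rest

def vote(models, x):                                  # x: (1, L, 24) test sample
    with torch.no_grad():
        preds = [m(x).argmax(dim=1).item() for m in models]
    return Counter(preds).most_common(1)[0][0]        # majority vote as the final result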
Setting parameters
The parameter dimensions are shown in Table 3. \(W_{ih}\) and \(b_{ih}\) are the weight matrix and bias between the input layer and the hidden layer, respectively. \(W_{hh}\) and \(b_{hh}\) are the weight matrix and bias within the hidden layer, respectively. \(W_{o}\) and \(b_{o}\) are the weight matrix and bias between the hidden layer and the output layer, respectively. In the subsequent experiments, the number of hidden layers is set to 1 and the number of hidden units is set to 64.
The reasons for setting parameters in this way and more detailed parameter settings can be found in Appendix B.
Experimental results and analysis
This paper proposes a prediction method to characterize the pattern of community feature changes, learn their impact on evolutionary events, and forecast critical events in the next timeframe. In this section, we will analyze the effectiveness of the prediction method from two aspects: the accuracy of the prediction algorithm and the time complexity of the model.
Baselines
To verify the effectiveness of our proposed prediction method, we compared it with seven other recent mainstream community evolution prediction algorithms. Bródka's algorithm, Dakiche's algorithm, Tajeuna's algorithm, and the TNSEP, MF-PSF, and SATPM algorithms use machine-learning-based event prediction methods such as random forests, logistic regression, and probability-based models. The GNAN algorithm uses a GNN-based prediction method. In addition, we also validated our proposed prediction algorithm on a recently proposed community tracking algorithm (ECDR).
-
Bródka’s algorithm30: This algorithm is based on the results of the GED community tracking algorithm. It uses a classifier to learn the historical state features.
-
Dakiche’s algorithm32: Features are proposed from two aspects: community structure and influential members. The rate of change in these features is used to predict events.
-
Tajeuna’s algorithm33: For this algorithm, eleven-dimensional state features are proposed to describe the community state, and the Cox model is used to predict critical events.
-
Dakiche’s algorithm (TNSEP)22: A new framework is used to identify the timeframe partitioning. The community status is described using ten-dimensional features.
-
Revelle’s algorithm (GNAN)37: This algorithm builds a GNN model, which learns the characteristics of nodes and groups.
-
Li’s algorithm (ECDR)28: In this algorithm, the resistance distance model is used to track the community structure and discover events.
-
Chen’s algorithm (MF-PSF)38: The algorithm combines multivariate feature sets and potential structural features to describe communities.
-
Ding’s algorithm (SATPM)35: The algorithm adaptively divides the timeframe size and proposes 48-dimensional features to describe the community.
Evaluation metrics
The \(\mathrm {F_{measure}}\) is used for each category as a metric, combining precision and recall in a harmonic mean form, where an \(\mathrm {F_{measure}}\) of 1 indicates optimal performance and 0 indicates poor performance.
The next three metrics work together to allow for a more comprehensive evaluation of the overall performance of a method. Accuracy refers to the ratio of the number of samples correctly classified by the classifier to the total number of samples. Macro average accuracy (Macro avg) calculates the arithmetic mean of the \(\mathrm {F_{measure}}\) values for each category, ensuring equal consideration of all categories during evaluation. Weighted average accuracy (Weighted avg) weights the \(\mathrm {F_{measure}}\) values based on the sample size of each category, giving greater influence to categories with more samples. The specific calculation formulas are as follows:
The number of event types is denoted as \(NT\). \(N\) represents the total number of event, and \(N(i)\) represents the number of events in a specific category.
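Based on these definitions, the standard forms of the metrics are

\(\mathrm {Accuracy} = \frac{N_{\mathrm {correct}}}{N}\), \(\mathrm {Macro\ avg} = \frac{1}{NT}\sum _{i=1}^{NT}\mathrm {F_{measure}}(i)\), \(\mathrm {Weighted\ avg} = \sum _{i=1}^{NT}\frac{N(i)}{N}\,\mathrm {F_{measure}}(i)\),

where \(N_{\mathrm {correct}}\) denotes the number of correctly classified samples and \(\mathrm {F_{measure}}(i) = \frac{2\cdot \mathrm {Precision}(i)\cdot \mathrm {Recall}(i)}{\mathrm {Precision}(i)+\mathrm {Recall}(i)}\) is the per-category harmonic mean of precision and recall.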
Cross-validation is primarily used to evaluate the performance of the model. We employed the five-fold cross-validation method to calculate the average accuracy (\(\mathrm {\overline{Acc}}\)). If the average performance of these five models is good, it indicates that the model has a certain level of generalization ability. The variance reflects the stability of the five outcomes in five-fold cross-validation. \(Acc_i\) represents each individual result. The formula for variance (\(\sigma ^2\)) can be expressed as:
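\(\sigma ^{2} = \frac{1}{5}\sum _{i=1}^{5}\left( Acc_{i} - \overline{\mathrm {Acc}}\right) ^{2}\)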
To more clearly and intuitively demonstrate the effective improvements of our proposed method, we opt to use the magnitude of improvement as the standard for presentation. Performance Improvement (PI) quantifies the degree of enhancement in predictive performance of our proposed algorithm (Prop) compared to the baseline algorithm (Base). We use standard deviation (\(\sigma\)) to reflect the volatility of the predicted results by the proposed method. This degree of improvement can be quantified by the following formula:
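Assuming PI is the simple difference between the metric values of the proposed and baseline algorithms, it can be written as \(PI = \mathrm {Metric}^{\mathrm {Prop}} - \mathrm {Metric}^{\mathrm {Base}}\), with the standard deviation \(\sigma = \sqrt{\sigma ^{2}}\) taken over the five folds.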
We validated the time efficiency advantage of the model by comparing the actual time spent in the prediction process. Our experiment measured the training time, testing time, and total time for one fold of five-fold cross-validation in seconds.
Community evolution prediction results and analysis
To evaluate the effect of our proposed feature change pattern-based prediction algorithm on the results, we conducted two sets of experiments for validation: First, we evaluated the effectiveness of the characterized feature change patterns. This was verified by comparing the conventional community state feature-based prediction algorithm against the one relying on feature change patterns for event predictions. Second, the performance of the proposed prediction method was determined by contrasting the feature change pattern-based algorithm against other prediction algorithms. Additionally, the efficacy of the parallel mechanism within the parallel LSTM prediction model was also verified.
As we proposed a new method in the fourth step of the social network evolution predictive framework, the algorithm used in the first three steps remained consistent with the comparison algorithms, with changes made only in the fourth step.
Validity analysis of characterized patterns
In the fourth step, the traditional prediction method learned the relationship between the historical state features and critical events (SF_E), with inputs and labels shown in Eq. (17). This relationship was improved by learning the relationship between the change patterns of community features and critical events (CF_E), with inputs and labels shown in Eq. (18).
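A plausible reading of these input-label pairs, consistent with the notation of the preceding sections, is: SF_E uses input \(\bigcup _{i=1}^{L} SF_{T_i}\) with label \(event_{T_L}\), whereas CF_E uses input \(\bigcup _{i=1}^{L} CF_{T_i}\) (with \(CF_{T_L}\) supplied by the regression network) with label \(event_{T_L}\).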
The Pearson correlation coefficient is first used to verify the effectiveness of the change features compared to the state features. The Pearson correlation coefficient is an index for measuring the linear correlation between two variables.
For two vectors \(X=[x_1,...,x_n]\), \(Y=[y_1,...,y_n]\), their Pearson correlation coefficient is
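In its standard form,

\(\rho _{X,Y} = \frac{\sum _{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})}{\sqrt{\sum _{i=1}^{n}(x_{i}-\bar{x})^{2}}\sqrt{\sum _{i=1}^{n}(y_{i}-\bar{y})^{2}}}\)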
where \(\bar{x}\) is the mean of X and \(\bar{y}\) is the mean of Y. Among multiple vectors, the average correlation coefficient of each pair of vectors is regarded as the correlation coefficient of these vectors.
The community disappears after the dissolution event, so the change features cannot be calculated. Therefore, the correlation coefficient of the change features of the dissolution event cannot be obtained.
Table 4 shows the Pearson correlation coefficients of the 48-dimensional change features and state features in the five datasets for various events, as well as the Pearson correlation coefficients across all events. For the same event, the Pearson correlation coefficient of the change features is larger than that of the state features most of the time. Within the same dataset, the Pearson correlation coefficient of the change features or state features across all events is smaller than the Pearson correlation coefficient within the same event. This indicates that both the change features and the state features can characterize events, but the proposed change features characterize them better than the state features.
In the following, the community features of the comparison algorithm and the 48-dimensional community features are respectively marked as \(\mathrm {SF_{T_i}^{Alg}}\) and \(\mathrm {SF_{T_i}^{48}}\). Then, the differential features calculated respectively are expressed as \(\mathrm {CF_{T_i}^{Alg}}\) and \(\mathrm {CF_{T_i}^{48}}\). To further validate the effectiveness of the feature change patterns, we compared the results of the model using \(\mathrm {CF_{T_i}^{Alg}}\) and \(\mathrm {SF_{T_i}^{Alg}}\) respectively, and the results of the model using \(\mathrm {CF_{T_i}^{48}}\) and \(\mathrm {SF_{T_i}^{48}}\) respectively. Detailed parameter settings for the model are provided in the supplementary Appendix B.
In the GNAN algorithm, we averaged the node features of the same group to construct a state feature vector suitable for our method. Since the ECDR method was designed based on resistance distance, its calculation time increases as the size of the social network increases. Therefore, its application was confined to the two comparatively smaller datasets: Facebook and Sx-askubuntu-c2q. Because the ECDR algorithm traces events without a prediction step, its community feature extraction aligns with the SATPM method.
The first set of comparisons illustrates the improvement that feature change patterns bring over traditional methods when utilizing each algorithm’s proposed features to describe community state. The comparison results are shown in Fig. 5. As shown in Fig. 5a, in the AS dataset, the prediction results based on feature change patterns are slightly superior to those relying solely on state features. This can be attributed to the significant imbalance in event categories. Notably, in the remaining four datasets with relatively balanced event categories, predictions relying on feature change patterns typically outperform those based on state features. Due to the flexible time frame selection allowed by the TNSEP method, its variance is calculated as the average of the prediction results from five-fold cross-validation across three different methods, leading to significant fluctuations.
In the second set of comparisons, all compared algorithms utilize 48-dimensional features to describe communities. The improvements of predictions based on feature change patterns over those based on state features are illustrated in Fig. 6. When applying the ECDR algorithm to the Facebook dataset, predictions based on state features perform better. This is attributed to the relatively few critical events tracked by the ECDR algorithm, resulting in shorter evolutionary sequences and inadequately captured community change information, thereby affecting pattern learning. Apart from this case, despite variations in the quality of the evolution sequences tracked by different algorithms, predictions based on feature change patterns generally excel over those based on state features. More details on the events tracked by the algorithms are in supplementary Appendix C.
We also compared the results of the model using \(\mathrm {CF_{T_i}^{48}}\) and \(\mathrm {CF_{T_i}^{Alg}}\) respectively. The results are shown in Fig. 7. It is evident from the figure that utilizing the 48-dimensional features facilitates more effective mining of feature change patterns, resulting in higher prediction accuracy.
In summary, our proposed prediction method, based on feature change patterns, effectively reveals the rules of evolution, playing an active role in predicting future evolutionary events. For detailed results of the three sets of comparative experiments in this section, refer to Appendix D.
The performance of the proposed prediction method
The entire community evolution prediction algorithm presented in this paper comprises: using the SATPM method for timeframe division, utilizing the CPM algorithm for community detection, applying the GED algorithm for community tracking, and implementing the prediction method based on feature change patterns. Initially, the effectiveness of the proposed prediction method was validated, followed by the validation of the entire community evolution prediction algorithm’s effectiveness. Additionally, the efficacy of the parallel mechanism was confirmed.
Table 5 displays the predictive results with and without the prediction method based on feature change patterns (P). Fig. 8 demonstrates the improvement in prediction results before and after this change. Since the ECDR algorithm lacks a prediction step, its before/after prediction algorithms align with the SATPM method and the method proposed in this article, respectively.
In the Facebook dataset, the ECDR and MF-PSF methods showed poor predictive results when using our proposed prediction method. This is attributed to the small number of communities discovered by the ECDR method, as well as the MF-PSF method dividing the dynamic network into a small number of time windows, leading to shorter evolutionary sequences with limited change information. Consequently, the change patterns were challenging for the parallel LSTM model to learn. However, as depicted in Fig. 8d, the proposed prediction method effectively improves accuracy for the Bródka and Dakiche algorithms because of their richer evolutionary sequence information. The timeframe division results of the algorithms are in Appendix C.
In the AS, AS-Caida, DBLP, and Sx-askubuntu-c2q datasets, the parallel LSTM model learned the patterns and improved results on all four metrics compared to the comparison algorithms. As shown in Fig. 8a-c and e, the parallel LSTM model effectively captured the changing patterns and improved the prediction of future evolutionary events.
In conclusion, our prediction method achieved high accuracy in most datasets, but some results showed lower accuracy. This was primarily due to insufficient information describing changes during evolution. Therefore, optimizing community features is essential to precisely predict future evolution. For prediction results of various evolutionary events, refer to Appendix E.
As shown in Table 5, using our proposed method in the final step achieves generally better results compared to using traditional machine learning algorithms such as Random Forest and Logistic Regression, as well as evolutionary event prediction algorithms based on probability or GNN.
Next, we investigated the effectiveness of the parallel mechanism. Given that the quantity of samples accepted is influenced by the method of time window division, we assessed the efficacy of the parallel mechanism under three timeframe partitioning methods: Disjoint, Overlapping and SATPM.
Table 6 presents the prediction results with and without parallelism. The disjoint timeframe partition method generally yields fewer critical events and evolutionary sequences, so applying the parallel mechanism reduces predictive performance: each LSTM receives insufficient samples with imbalanced categories, which limits its ability to learn patterns. Conversely, the parallel mechanism enhances performance when the overlapping and SATPM partition methods are used. Through the parallel mechanism, each model can learn the data pattern more accurately, mitigating the fitting difficulties caused by an excess of samples. Furthermore, our voting mechanism integrates the results of the five models to yield a final prediction. Accordingly, the parallel LSTM model checks the number of samples and temporarily suspends the parallel mechanism when the sample count is fewer than 1000.
Complexity analysis
The number of parameters, also known as the capacity of the model, is a measure to evaluate the space complexity of the model. The computational complexity of each parameter is defined as \(O(1)\). During the training process of a single LSTM in the parallel LSTM model, the complexity of each time step can be measured by \(O(\mathrm {N_{LSTM}})\). Among the comparison algorithms, the GNAN method uses a deep learning model, and the symbols used in the calculation process of the model parameters are consistent with those in its paper37. The number of weight and bias parameters for both models can be calculated as in Eqs. (20) and (21).
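As a check on the order of magnitude, the sketch below counts the parameters of one LSTM classifier in the parallel model using the weights and biases listed in Table 3; the 24-dimensional input (the halved high-order features) and the seven event classes are our assumptions, and the exact totals in Eqs. (20) and (21) may differ.

def lstm_param_count(d_in=24, hidden=64, n_out=7):
    gates = 4 * (d_in * hidden + hidden * hidden + 2 * hidden)  # W_ih, W_hh, b_ih, b_hh for the four gates
    output = hidden * n_out + n_out                             # W_o, b_o of the label layer
    return gates + output

print(lstm_param_count())   # 23495 with the assumed dimensions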
The LSTM model has significantly more parameters than the GNAN model because the three gates introduce numerous weights and biases. This enables the parallel LSTM model to better extract correlations between the two sequences and learn change patterns, positively impacting future event prediction.
The time complexity of multi-process concurrency cannot be expressed by a formula, so runtime analysis was conducted. First, to validate the effectiveness of the parallel mechanism, we measured the time consumption of the parallel LSTM prediction model with and without parallel measures. Second, to verify the proposed model’s runtime advantage over other deep learning models, we conducted time statistics on prediction algorithms based on deep learning models.
Time complexity analysis of the model under parallel mechanism
We modified the final step of each comparison algorithm to the prediction method proposed in this paper and compared the time consumption of the parallel LSTM prediction model with and without the parallel mechanism. The time consumed by the model is proportional to the number of evolutionary sequences traced. As shown in Figs. 9b-d, to make the histograms clearer, the ranges of the ordinates on the two sides of the pink dotted line differ, but both are in seconds.
As shown in Figs. 9a-9e, the parallel mechanism reduced the training time. Since the test time was at the level of milliseconds and microseconds, the parallel mechanism also demonstrated an advantage in the total time consumption of the model. From this, we can conclude that the parallel mechanism effectively reduces time consumption, and the parallel LSTM model has certain advantages in terms of time efficiency.
Time complexity analysis of deep learning models
In general, deep learning models consume more time than traditional classifier models, because traditional classifier models do not require extensive training processes. Therefore, we analyze the time consumption of the deep learning models applied to the social network evolution problem. The prediction times of the parallel LSTM model, the comparison algorithm GNAN, and the ordinary LSTM model are compared. The experimental settings are as follows: the first three steps of the prediction framework are consistent with the algorithm used by GNAN, while the fourth step uses the respective prediction methods. The ordinary LSTM model is based on the relationship between community state characteristics and critical events. Its structure and parameters are consistent with a single LSTM model in the parallel LSTM model.
As seen from Fig. 10a, the parallel LSTM model achieved the shortest training time. This indicates that the parallel mechanism played a role in reducing time consumption. However, as shown in Fig. 10b, the parallel LSTM model did not have the shortest time in the test phase. This is because the voting mechanism adopted increased the testing time slightly. Nevertheless, the time consumption in the test phase was in milliseconds. Fig. 10c shows the total prediction time, where the parallel LSTM model had the least total time consumption across the five datasets. This demonstrates that the parallel LSTM model has a great advantage in time consumption compared with other deep learning models.
Conclusion and future work
In this study, we proposed a predictive method that focuses on the changing patterns of features in evolutionary sequences. Experiments showed that the prediction algorithm based on feature change patterns effectively characterized the evolution rules, reducing misjudgments when predicting future critical events. The parallel measures not only improved prediction performance compared to other deep models but also effectively reduced time consumption.
We described the changing characteristics of the evolutionary sequence based on the 48-dimensional features. Other information about the differences of communities in adjacent timeframes was not fully mined. In the future, we hope to develop a new perspective on the evolutionary sequence to extract changing information, helping us predict the direction of evolution.
Data availability
You can access these datasets by following the links below.
AS dataset: http://snap.stanford.edu/data/as.html. AS-Caida dataset: http://www.caida.org/data/active/as-relationships/. DBLP dataset: http://arnetminer.org/citation. Facebook dataset: http://socialnetworks.mpi-sws.org. Sx-askubuntu-c2q dataset: http://snap.stanford.edu/data/sx-askubuntu.html.
References
Backstrom, L., Huttenlocher, D., Kleinberg, J. & Lan, X. Group formation in large social networks: membership, growth, and evolution. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, 44–54 (2006).
Shirzadian, P., Antony, B., Gattani, A. G., Tasnina, N. & Heath, L. S. A time evolving online social network generation algorithm. Scientific Reports 13, 2395 (2023).
Kong, Y.-X., Wu, R.-J., Zhang, Y.-C. & Shi, G.-Y. Utilizing statistical physics and machine learning to discover collective behavior on temporal social networks. Information Processing & Management 60, 103190 (2023).
Jeong, D.-H. et al. An empirical study on social network analysis for small residential communities in gangwon state, south korea. Scientific Reports 14, 11648 (2024).
Yu, S., Liu, M., Dou, W., Liu, X. & Zhou, S. Networking for big data: A survey. IEEE Communications Surveys & Tutorials 19, 531–549 (2016).
Dakiche, N., Tayeb, F.B.-S., Slimani, Y. & Benatchba, K. Tracking community evolution in social networks: A survey. Information Processing & Management 56, 1084–1102 (2019).
Singh, S. S. et al. Social network analysis: A survey on process, tools, and application. ACM Computing Surveys 56, 1–39 (2024).
Singh, S. S. et al. Social network analysis: a survey on measure, structure, language information analysis, privacy, and applications. ACM Transactions on Asian and Low-Resource Language Information Processing 22, 1–47 (2023).
Bródka, P., Saganowski, S. & Kazienko, P. Ged: the method for group evolution discovery in social networks. Social Network Analysis and Mining 3, 1–14 (2013).
Troncoso, F. & Weber, R. A novel approach to detect associations in criminal networks. Decision Support Systems 128, 113159 (2020).
Luke, D. A. & Harris, J. K. Network analysis in public health: history, methods, and applications. Annu. Rev. Public Health 28, 69–93 (2007).
Luo, Y., Ma, J. & Yeo, C. K. Bcmm: A novel post-based augmentation representation for early rumour detection on social media. Pattern Recognition 113, 107818 (2021).
Khan, Z., Khan, Z., Lee, B.-G., Kim, H. K. & Jeon, M. Graph neural networks based framework to analyze social media platforms for malicious user detection. Applied Soft Computing 155, 111416 (2024).
Dakiche, N., Tayeb, F. B.-S., Slimani, Y. & Benatchba, K. Sensitive analysis of timeframe type and size impact on community evolution prediction. In 2018 IEEE international conference on fuzzy systems (FUZZ-IEEE), 1–8 (IEEE, 2018).
Caceres, R. S., Fish, B. & States, M. L. L. L. U. A supervised approach to windowing detection on dynamic networks. Tech. Rep., Technical report, MIT Lincoln Laboratory Lexington United States (2017).
Fani, H. et al. User community detection via embedding of social network structure and temporal content. Information Processing & Management 57, 102056 (2020).
Ramesh, A. & Srivatsun, G. Evolutionary algorithm for overlapping community detection using a merged maximal cliques representation scheme. Applied Soft Computing 112, 107746 (2021).
Greene, D., Doyle, D. & Cunningham, P. Tracking the evolution of communities in dynamic social networks. In 2010 international conference on advances in social networks analysis and mining, 176–183 (IEEE, 2010).
Tajeuna, E. G., Bouguessa, M. & Wang, S. Tracking the evolution of community structures in time-evolving social networks. In 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), 1–10 (IEEE, 2015).
Hammou, B. A., Lahcen, A. A. & Mouline, S. Towards a real-time processing framework based on improved distributed recurrent neural network variants with fasttext for social big data analytics. Information Processing & Management 57, 102122 (2020).
Li, X., Du, Y. & Wang, Y. Time-enhanced neighbor-aware network on irregular time series for sentiment prediction in social networks. Information Processing & Management 60, 103500 (2023).
Dakiche, N., Benbouzid-Si Tayeb, F., Benatchba, K. & Slimani, Y. Tailored network splitting for community evolution prediction in dynamic social networks. New Generation Computing 39, 303–340 (2021).
Li, W. et al. Multi-objective optimization algorithm based on characteristics fusion of dynamic social networks for community discovery. Information Fusion 79, 110–123 (2022).
Berahmand, K., Mohammadi, M., Sheikhpour, R., Li, Y. & Xu, Y. Wsnmf: Weighted symmetric nonnegative matrix factorization for attributed graph clustering. Neurocomputing 566, 127041 (2024).
Berahmand, K., Bahadori, S., Abadeh, M. N., Li, Y. & Xu, Y. Sdac-da: Semi-supervised deep attributed clustering using dual autoencoder. IEEE Transactions on Knowledge and Data Engineering (2024).
Bródka, P., Kazienko, P. & Kołoszczyk, B. Predicting group evolution in the social network. In Social Informatics: 4th International Conference, SocInfo 2012, Lausanne, Switzerland, December 5-7, 2012. Proceedings 4, 54–67 (Springer, 2012).
İlhan, N. & Öğüdücü, Ş. G. Predicting community evolution based on time series modeling. In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, 1509–1516 (2015).
Li, W. et al. Evolutionary community discovery in dynamic social networks via resistance distance. Expert Systems with Applications 171, 114536 (2021).
Gliwa, B. et al. Different approaches to community evolution prediction in blogosphere. In Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 1291–1298 (2013).
Saganowski, S. et al. Predicting community evolution in social networks. Entropy 17, 3053–3096 (2015).
Pavlopoulou, M. E. G., Tzortzis, G., Vogiatzis, D. & Paliouras, G. Predicting the evolution of communities in social networks using structural and temporal features. In 2017 12th International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP), 40–45 (IEEE, 2017).
Dakiche, N., Tayeb, F. B.-S., Slimani, Y. & Benatchba, K. Community evolution prediction in dynamic social networks using community features’ change rates. In Proceedings of the 34th ACM/SIGAPP symposium on applied computing, 2078–2085 (2019).
Tajeuna, E. G., Bouguessa, M. & Wang, S. Modeling and predicting community structure changes in time-evolving social networks. IEEE Transactions on Knowledge and Data Engineering 31, 1166–1180 (2018).
Mohammadmosaferi, K. K. & Naderi, H. Afif: Automatically finding important features in community evolution prediction for dynamic social networks. Computer Communications 176, 66–80 (2021).
Ding, J. et al. Community evolution prediction based on a self-adaptive timeframe in social networks. Knowledge-Based Systems 275, 110687 (2023).
Wang, Z., Xu, Q. & Li, W. Multi-layer feature fusion-based community evolution prediction. Future Internet 14, 113 (2022).
Revelle, M., Domeniconi, C. & Gelman, B. Group-node attention for community evolution prediction. In Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 176–183 (2021).
Chen, J. et al. Community evolution prediction based on multivariate feature sets and potential structural features. Mathematics 10, 3802 (2022).
Sheikhpour, R., Berahmand, K., Mohammadi, M. & Khosravi, H. Sparse feature selection using hypergraph laplacian-based semi-supervised discriminant analysis. Pattern Recognition 157, 110882 (2025).
Saganowski, S., Bródka, P., Koziarski, M. & Kazienko, P. Analysis of group evolution prediction in complex networks. PloS one 14, e0224194 (2019).
Dakiche, N., Benatchba, K., Tayeb, F. B.-S. & Slimani, Y. Impact of tailored network splitting and community features’ change rates on prediction accuracy in dynamic social networks. In WEBIST, 95–102 (2021).
Rajita, B., Ranjan, Y., Umesh, C. T. & Panda, S. Spark-based parallel method for prediction of events. Arabian Journal for Science and Engineering 45, 3437–3453 (2020).
Rajita, B., Halani, V., Shah, D. & Panda, S. Gan-c: A generative adversarial network with a classifier for effective event prediction. Computational Intelligence 38, 1922–1955 (2022).
Leskovec, J., Kleinberg, J. & Faloutsos, C. Graphs over time: densification laws, shrinking diameters and possible explanations. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, 177–187 (2005).
Tang, J. et al. Arnetminer: extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, 990–998 (2008).
Viswanath, B., Mislove, A., Cha, M. & Gummadi, K. P. On the evolution of user interaction in facebook. In Proceedings of the 2nd ACM workshop on Online social networks, 37–42 (2009).
Paranjape, A., Benson, A. R. & Leskovec, J. Motifs in temporal networks. In Proceedings of the tenth ACM international conference on web search and data mining, 601–610 (2017).
Acknowledgements
The work was supported in part by the Fundamental Research Funds for the Central Universities under Grant ZYTS23058, in part by the National Natural Science Foundation of China under Grant 61902294, in part by the China Postdoctoral Science Foundation Funded Project under Grant 2018M643586. Thank you to the High Performance Computing Center of Xidian University for providing the resource support.
Author information
Contributions
J.D.: Conceptualization, methodology, resources, and drafting the manuscript. G.S.: Software, validation, investigation, and drafting the manuscript. T.W.: Software, validation, investigation, and drafting the manuscript. L.J.: Reviewing, editing, and supervision. J.D.: Reviewing and editing. J.W.: Reviewing and editing. H.W.: Visualization. R.C.: Visualization.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ding, J., Sun, G., Wang, T. et al. Community evolution prediction based on feature change patterns in social networks. Sci Rep 15, 14608 (2025). https://doi.org/10.1038/s41598-025-91766-7