Introduction

Badminton, celebrated for its fast-paced and competitive nature, enjoys global popularity1. However, the complexity of factors influencing match outcomes presents a significant challenge for researchers, coaches, and athletes2. Existing studies have explored these determinants from various perspectives, such as athletes’ physical conditions, technical skills, psychological states, and physiological tests3,4,5,6. Despite these efforts, accurately predicting match outcomes remains a daunting task due to inherent uncertainties in gameplay dynamics7.

Recent research has highlighted the crucial role of mutual counteraction between distinct tactical styles in determining match outcomes8. Players employing specific tactical styles may inherently possess advantages or disadvantages against opponents with contrasting styles5. Understanding this dynamic is pivotal for devising effective strategies. The frequency of technical actions has emerged as a critical indicator of tactical approaches, shedding light on the complexities of gaining a strategic edge in badminton9.

Significant advancements in machine learning have further expanded the scope of badminton technical and tactical analysis. Previous studies have explored challenges such as motion recognition, tactical pattern mining, and match outcome prediction10. For example, research on An Se-young successfully established a scoring and losing prediction model using an improved data classification method, achieving a prediction accuracy of 87.5% with support vector machines11. Another study proposed a deep reinforcement learning-based approach that evaluates tactical behaviors using Q-functions, offering more detailed insights into players’ actions12. Furthermore, the development of a 3D visualization system, TIVEE, has provided a multi-level exploration tool for analyzing spatial dynamics of tactics13, while drone-based datasets have facilitated team coordination analysis in doubles matches through control area probability distributions14. These advancements underscore the potential of machine learning techniques in uncovering valuable insights into technical and tactical aspects of badminton, yet further exploration is needed to enhance their predictive precision and practical application.

Building on these findings, our study delves deeper into the potential of machine learning for match outcome prediction, posing the core question: can a high-precision prediction model be constructed using technical action frequency features? To address this question, we employ the Random Forest algorithm to analyze the differences in technical action frequencies between servers and receivers, quantifying tactical style disparities and exploring tactical patterns under different playing styles. Unlike our previous studies5, this research leverages a larger dataset and uses technical action frequencies as the core observational unit, thereby enhancing the depth and reliability of tactical analysis. Additionally, we developed a web application to visualize the research findings, aiding in training plan formulation and strategy optimization, and further improving the practical applicability of the results.

Methods

Data collection

In this study, we focused on the top ten male and female singles players as ranked by the Badminton World Federation (BWF), selecting videos of their matches from Super750-level and higher competitions between 2019 and 2023 as observational data. A total of 303 matches were analyzed, including 153 men’s singles matches encompassing 358 games, and 150 women’s singles matches encompassing 344 games. The selected matches cover major international badminton tournaments, such as the 2020 Tokyo Olympics, the World Championships, the All England Open, the Sudirman Cup, the Thomas Cup, and the Uber Cup. The match videos were primarily sourced from national team archives, Bilibili, CCTV broadcasts, and the official website of the BWF.

The study employed video observation methods to statistically analyze technical actions and match outcomes5,15. Four badminton players holding a second-level athlete classification from the research group, each with over six months of experience, were selected as data collectors. These collectors uploaded the recorded videos to the “Match Analysis” application, a software for analyzing badminton skills and tactics developed in collaboration with Shanghai Danzhu Sports Technology and Culture Co., Ltd. Subsequently, the four members independently observed and recorded statistics from the videos. To ensure the accuracy and reliability of the data, both inter-rater and intra-rater consistency tests were conducted. The data collected during the initial session by the four individuals were evaluated using the Intraclass Correlation Coefficient (ICC), with an ICC value of 0.94 (p < 0.01), indicating high consistency among the raters. For the intra-rater reliability test, Cohen’s Kappa coefficient was utilized, which is commonly used for assessing agreement between two raters for categorical variables16. The Kappa coefficient was 0.96 (p < 0.01), demonstrating a high degree of consistency in the data17.
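As an illustration of the agreement statistic mentioned above, the following is a minimal sketch of Cohen's Kappa for two coding passes over the same strokes. The function name, the stroke labels, and the two passes are hypothetical and are not the study's data or tooling; the sketch only shows how observed agreement is corrected for chance agreement.

```python
from collections import Counter


def cohen_kappa(first_pass, second_pass):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(first_pass) != len(second_pass) or not first_pass:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(first_pass)
    # Proportion of items on which the two passes agree.
    observed = sum(a == b for a, b in zip(first_pass, second_pass)) / n
    # Agreement expected by chance, from the marginal label frequencies.
    c1, c2 = Counter(first_pass), Counter(second_pass)
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)
    if expected == 1.0:
        return 1.0  # both passes use one identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up labels for eight strokes, coded twice (assumption, not study data).
pass_1 = ["Smash", "Lift", "Push", "Smash", "Lift", "Push", "Smash", "Lift"]
pass_2 = ["Smash", "Lift", "Push", "Smash", "Lift", "Push", "Smash", "Push"]

kappa = cohen_kappa(pass_1, pass_2)
print(round(kappa, 3))  # 0.814
```

In this toy case the two passes disagree on one stroke out of eight, so raw agreement is 0.875 while Kappa is lower because part of that agreement is expected by chance.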

Preprocessing

After collecting the data, we first conducted a statistical analysis of the types of technical actions in all matches, identifying a total of 23 initial features (High, Smash, Dribble, Push, Slice/Drop, Lift, Block Smash, Net Front, Clear, Net Drop from Lift, Lift from Slice, Pull, Hook, Slice Lift, Block, Lift Smash, Block Hook, Hook from Lift, Drive, Net Drop from Slice Drive, Lift from Slice Drive, Flat High, and Block from Slice Drive). Then, we calculated the frequency of technical actions for both the serving side and the receiving side in each game to construct vectors. A feature vector was formed by subtracting the vector of the receiving side from that of the serving side. Furthermore, we defined the labels based on the outcome of each game: if the serving side won the game, the label was set to 1; if the receiving side won, the label was set to 0.
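The construction of one sample can be sketched as follows. This is a minimal illustration that assumes raw per-game counts are used as the frequencies; the five-action subset, the helper names, and the stroke logs are hypothetical and do not come from the study's dataset.

```python
from collections import Counter

# Hypothetical subset of the 23 action names, used only to keep the example short.
ACTIONS = ["Net Front", "Slice/Drop", "Flat High", "High", "Push"]


def difference_vector(server_strokes, receiver_strokes, actions=ACTIONS):
    """Return server-minus-receiver action counts in a fixed feature order."""
    server = Counter(server_strokes)
    receiver = Counter(receiver_strokes)
    # Counter returns 0 for actions a player never used.
    return [server[a] - receiver[a] for a in actions]


def game_label(server_won):
    """1 if the serving side won the game, 0 if the receiving side won."""
    return 1 if server_won else 0


# Made-up stroke logs for a single game (assumption, not study data).
server_log = ["Net Front", "Net Front", "Push", "High", "Slice/Drop"]
receiver_log = ["Push", "Push", "Slice/Drop", "Slice/Drop", "Flat High"]

x = difference_vector(server_log, receiver_log)
y = game_label(server_won=True)
print(x, y)  # [2, -1, -1, 1, -1] 1
```

A positive entry means the serving side used that action more often than the receiving side in the game, and a negative entry means the opposite.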

Model description

We implemented a random forest classifier using the scikit-learn package in Python 3.97 (https://www.python.org/). To generate the training (70%) and test (30%) datasets, we split the dataset using the train_test_split function, which randomly partitions a dataset into training and test subsets, with the parameter test_size = 0.3. We applied this function to each sex separately. To identify the technical actions critical for predicting match outcomes, we first constructed a random forest classifier with default parameters on the entire training dataset18. We extracted the feature importances from this classifier and sorted the features accordingly. Subsequently, the features were incrementally added to the default random forest classifier in order of ranked importance. This process was validated using 5-fold cross-validation19 to determine the optimal number of technical actions to use as model features. To build a random forest classifier with the best hyperparameters, we performed an exhaustive grid search by applying the GridSearchCV function to the training dataset with 5-fold cross-validation. A total of 10,000 random forest classifier models were evaluated with different combinations of hyperparameters: max_features = "auto"; n_estimators ranging from 100 to 1,000 in steps of 100; max_depth ranging from 2 to 20 in steps of 2; min_samples_leaf ranging from 2 to 20 in steps of 2; and min_samples_split ranging from 2 to 20 in steps of 2. As a result, for the male model we selected n_estimators = 600, max_depth = 8, min_samples_leaf = 2 and min_samples_split = 2, which gave the highest average AUC of 0.9656. The female model was built with n_estimators = 1000, max_depth = 12, min_samples_leaf = 2 and min_samples_split = 8, which gave the highest average AUC of 0.8950.
After identifying the optimal hyperparameters, SHAP (Shapley Additive Explanations) values were used to evaluate the contribution and direction of each feature20,21. Finally, sensitivity, specificity, accuracy, positive predictive value (PPV) and negative predictive value (NPV) were used to evaluate the performance of the finalized models22,23.
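The split and grid search described in this section can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' script: the data are synthetic stand-ins with the same shape as the men's dataset, random_state values are arbitrary, and a reduced grid is searched so the example runs quickly. Note that max_features = "auto" was removed in scikit-learn 1.3; "sqrt" is the equivalent setting for classifiers and is used here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, ParameterGrid, train_test_split

# Grid as described in the text, with "sqrt" standing in for the removed "auto".
full_grid = {
    "max_features": ["sqrt"],
    "n_estimators": list(range(100, 1001, 100)),
    "max_depth": list(range(2, 21, 2)),
    "min_samples_leaf": list(range(2, 21, 2)),
    "min_samples_split": list(range(2, 21, 2)),
}
n_combinations = len(ParameterGrid(full_grid))  # 10 * 10 * 10 * 10

# Synthetic stand-in data (assumption): 358 games x 23 difference features.
rng = np.random.default_rng(0)
X = rng.integers(-10, 11, size=(358, 23)).astype(float)
y = (X[:, 0] - X[:, 1] + rng.normal(0, 3, size=358) > 0).astype(int)

# 70/30 split via test_size=0.3, as in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Reduced grid so the sketch runs in seconds; swap in full_grid to mirror the text.
small_grid = {
    "max_features": ["sqrt"],
    "n_estimators": [50],
    "max_depth": [4, 8],
    "min_samples_leaf": [2],
    "min_samples_split": [2, 8],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    small_grid,
    scoring="roc_auc",  # select by mean cross-validated AUC
    cv=5,
)
search.fit(X_train, y_train)

print(n_combinations, X_train.shape, X_test.shape)
print(search.best_params_, round(search.best_score_, 3))
```

With 358 samples, test_size = 0.3 yields 250 training and 108 test samples, which matches the men's split reported in the Results.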

Website construction and availability

In the development of our interactive web platform, we employed the Shiny framework from the R programming environment24,25. Shiny enables the creation of interactive web applications, allowing for the integration of analytical computations with visualization capabilities15,26. Designed for intuitive use, the platform facilitates player comparisons and predictive analytics in badminton, with Shiny modules enabling data processing and visualization. The statistical models and analytical methods are grounded in R, ensuring data analysis rigor and result reliability. This platform demonstrates the effective use of R and Shiny in sports analytics.

Website availability: https://dachengzi.shinyapps.io/prediction_model_on_babminton/.

Results

Identification of critical technical movements contributing to success in competitions

In this study, we collected data from international badminton competitions from 2019 to 2023, aiming to statistically analyze the frequency of each technical action in each rally of the competitions. The technical actions considered include High, Smash, and Dribble, among others, for a total of 23 (see Methods for details). The frequencies of these technical actions were used as feature vectors to analyze the winning patterns of both the serving and receiving sides, with the further aim of building a model capable of predicting competition outcomes.

The data, divided by sex, years, and competition outcome, was split into a training set and a test set with a ratio of 7:3. Specifically, the training set for men’s competitions contained 250 samples, while the test set contained 108 samples; the women’s competitions training set included 240 samples, and the test set contained 104 samples. This data division method ensured sufficient model training and reliable test results (Fig. 1).

Fig. 1. Overview of study.

We first applied the Random Forest algorithm to the training set to build a model and evaluate the importance of each technical action feature. We then ranked the features by importance and used forward stepwise selection combined with 5-fold cross-validation to identify the key features that most affect model performance. Using the AUC (area under the curve) as the performance metric, we found that for male athletes’ data the model performed best with the five most important features (Net Front, Slice/Drop, Flat High, High, Push), reaching an AUC of 0.9726. In contrast, for female athletes’ data, the model’s AUC reached 0.8730 after introducing the top 22 most important features (Net Front, Slice/Drop, Flat High, High, Push, Smash, Block Smash, Drive, Block, Hook, Lift Smash, Lift, Dribble, Net Drop from Lift, Pull, Block Hook, Hook from Lift, Slice Lift, Lift from Slice, Lift from Slice Drive, Clear, Net Drop from Slice Drive) (Fig. 2A, B). This result suggests that, to a certain extent, the serving and receiving tactical styles of female players are more complex than those of their male counterparts. Finally, we built the final model after determining the optimal hyperparameters based on 5-fold cross-validation.
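The ranking and incremental feature addition can be sketched as below. This is a minimal illustration on synthetic data (an assumption: 250 samples and 8 features, of which only the first two carry signal), not a reproduction of the study's analysis; the variable names and random seeds are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a training set (assumption, not study data).
rng = np.random.default_rng(1)
X_train = rng.normal(size=(250, 8))
y_train = (X_train[:, 0] - X_train[:, 1] + rng.normal(0, 0.5, 250) > 0).astype(int)

# Step 1: rank features with a default random forest fitted on all of them.
base = RandomForestClassifier(random_state=0).fit(X_train, y_train)
ranking = np.argsort(base.feature_importances_)[::-1]

# Step 2: add features one at a time in ranked order and score each subset
# by mean 5-fold cross-validated AUC.
auc_by_k = []
for k in range(1, len(ranking) + 1):
    cols = ranking[:k]
    scores = cross_val_score(
        RandomForestClassifier(random_state=0),
        X_train[:, cols],
        y_train,
        cv=5,
        scoring="roc_auc",
    )
    auc_by_k.append(scores.mean())

# Step 3: keep the subset size with the highest mean AUC.
best_k = int(np.argmax(auc_by_k)) + 1
print("ranked feature indices:", ranking.tolist())
print("best number of features:", best_k)
```

Plotting auc_by_k against the number of features gives a curve of the kind shown in Fig. 2A, B, where the peak marks the chosen feature count.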

Fig. 2. Feature selection and model evaluation. (A,B) Effect of incremental feature addition on male/female model AUC, respectively. (C,D) SHAP values of the top 5 important features in the male/female model, respectively. (E) Confusion matrix for the models. (F) Model evaluation performance illustrated by sensitivity, specificity, accuracy, PPV and NPV.

Quantitative analysis and model performance validation of the impact of key technical actions on badminton competition outcomes

After the model was constructed, we calculated the SHAP values for each feature to further understand its specific impact on competition outcomes. The analysis showed that among the top five key features, the technical actions ‘Net Front’, ‘Slice/Drop’, and ‘Push’ had a significant impact on the outcomes of both men’s and women’s competitions, presenting a consistent pattern. For ‘Net Front’ in particular, the SHAP values increased with the feature value, indicating that the serving side is more likely to win when it uses this technical action more frequently than the receiving side. Conversely, ‘Slice/Drop’ and ‘Push’ showed the opposite pattern (Fig. 2C,D).

When the models were evaluated on the held-out test set, the results were as follows. For males: sensitivity 0.93, specificity 0.80, accuracy 0.87, PPV 0.84, and NPV 0.91. For females: sensitivity 0.92, specificity 0.68, accuracy 0.83, PPV 0.82, and NPV 0.85. These results indicate that the model captures the technical action features of male players more accurately than those of female players, consistent with the findings from the feature selection phase, and both models significantly exceeded the baseline of 0.5 (Fig. 2E,F).
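For reference, the five reported metrics follow from the four cells of a binary confusion matrix, with "server wins" (label 1) as the positive class. The sketch below uses made-up counts chosen only for illustration; they are not the confusion matrices in Fig. 2E, and the function name is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics computed from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),  # precision for predicted server wins
        "npv": tn / (tn + fn),  # precision for predicted receiver wins
    }


# Made-up counts (assumption, not study data).
m = classification_metrics(tp=45, fp=10, tn=40, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

A model can have high sensitivity and noticeably lower specificity, as in the female model, when it predicts server wins more readily than receiver wins.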

Deciphering victory factors: an interactive web application for analyzing badminton matches

To make the study’s findings more accessible, we developed an interactive web application. Users can input the frequencies of the top 5 contributing features identified by our SHAP value analysis for the Server and the Receiver separately to explore their contributions in the match-up. Take the first game of the 2022 World Championships men’s singles final as an example, with LEE Z.J. serving and SHI Y.Q. receiving. As shown in Fig. 3, four of the five actions negatively affected LEE Z.J.’s predicted outcome; only the Push action had a slight positive effect. This analysis aligns with SHI Y.Q.’s victory in the game. Furthermore, the greater the impact of the Net Front action on the outcome, the more important it is for LEE Z.J., as the server, to increase its usage frequency relative to other technical actions to secure victory in the match.

Fig. 3. Interactive web application display: comparative analysis of technical action frequencies, Lee Zii Jia vs. Shi Yuqi.

Discussion

This study takes the technical actions of the serving and receiving players in each game as its starting point, identifies the key technical action frequency factors for both sides, and constructs a predictive model on that basis for researchers and coaches to study and reference. By analyzing the frequency of 23 distinct technical actions in international badminton competitions with a Random Forest algorithm, we identified the technical actions that most strongly affect match outcomes and quantified their influence using SHAP values. This approach revealed that technical actions such as ‘Net Front’, ‘Slice/Drop’, and ‘Push’ play a crucial role in determining victory. The prediction models were evaluated by sensitivity, specificity, accuracy, PPV and NPV, and could be directly applied by future researchers and athletes. These three actions showed similar importance in the models for both male and female players, ranking among the top five key factors influencing match results. This finding deepens our understanding of the tactical nuances in badminton and provides guidance for coaches and athletes in formulating targeted training and competition strategies.
Furthermore, our results align closely with findings in the existing literature, which commonly acknowledges the critical role of ‘Net Front’ techniques for short net play, ‘Slice/Drop’ for executing drop shots, and ‘Push’ for driving shots in winning badminton matches27,28. Specifically, ‘Net Front’ plays a decisive role in controlling the pace of the game and forcing opponents to make mistakes, while ‘Slice/Drop’ and ‘Push’ are primarily utilized to disrupt the opponent’s defensive setup and create offensive opportunities. This concurrence further validates the effectiveness of our study’s outcomes, demonstrating the potential of data-driven approaches in revealing the impact of core technical actions in high-level badminton competitions.

In this study, models constructed for both male and female players demonstrated good performance, highlighting the importance of technical action frequency in capturing the tactical styles of badminton athletes. Notably, the model based on male player data exhibited superior performance across various metrics, suggesting fundamental differences in tactical styles between male and female players in competitions9. Male matches tend to emphasize power and speed4, resulting in a higher importance and discriminability of key technical action frequencies in the model, whereas female matches are distinguished more by technical finesse and complex tactical setups29. This indicates that capturing the tactical styles in female matches requires considering a broader range of factors to improve prediction accuracy, including aspects beyond technical actions such as match tempo, tactical adjustments between players, and coping with psychological pressure during matches. Additionally, these gender differences provide guidance for coaches and athletes in developing targeted training and competition strategies, emphasizing the importance of considering gender characteristics when planning training content and competition tactics30. In summary, this study not only confirms the value of technical action frequency in predicting the outcomes of badminton matches but also reveals the impact of gender differences on tactical styles, offering guidance and insights for future sports performance research and practical application.

The web interface developed in this study offers an innovative approach to support tactical adjustments in badminton matches by inputting characteristics of players and their opponents, in conjunction with SHAP value analysis. The application of SHAP values quantifies the influence of various features on match outcomes, guiding players to identify personal strengths and weaknesses relative to their opponents. Furthermore, integrating SHAP plots provides a visual reference for strategic adjustments. This not only enhances the scientific rigor and specificity of match preparation but also highlights the practical application value of machine learning technology in sports competitive decision-support systems. It offers a data analysis-based tactical strategy analysis tool for badminton and a wider range of sports projects, paving new research directions and practical cases in the field of sports data analysis for tactical planning.

There are some limitations in our research. Notably, the dataset is primarily constrained to matches of the Super750 level or higher from 2019 to 2023, ensuring high-quality data but potentially affecting the model’s generalization ability across a broad range of badminton competitions. Moreover, the model focuses on analyzing singles matches, overlooking the complexity and tactical diversity of doubles matches, as well as not adequately considering the impact of individual athlete differences on match outcomes. To address these limitations, future research should expand the data collection scope, explore advanced machine learning algorithms such as deep learning31, and more finely analyze athlete characteristics and the competition environment to enhance the model’s predictive accuracy and applicability. Such improvements not only have the potential to overcome current limitations but will also bring deeper insights into the field of sports data analysis, furthering the scientific training and strategic planning of badminton and other sports disciplines.

Conclusion

This study successfully developed well-performing badminton match outcome prediction models based on the interplay and counteraction among different tactical styles, and through a user-oriented web application, achieved visualization and interaction of the model, highlighting the significant impact of key technical actions—Net Front, Slice/Drop, and Push—on match outcomes and providing a scientific basis for badminton training and match strategies.