Abstract
Existing computer vision-based surface defect detection techniques for metal materials typically encounter issues with defect overlap, significant differences within classes, and similarity between defect samples. These issues compromise feature extraction accuracy and result in missed and false detections. This study proposes a feature optimization-guided high-precision and real-time metal surface defect detection network (FOHR Net) to improve defect feature expressiveness. Firstly, the network introduces a multi-layer feature alignment module that fuses shallow and deep features to enhance the feature information relevant to the target defect. Secondly, the slice features are reorganized using a dual-branch feature recombination module, and channel-level soft attention is applied to produce a channel-optimized feature map. The output features of the dual-branch transformation stage are adaptively merged, which can effectively lower feature information loss, improve feature expressiveness, and allow the model to collect useful feature information. Finally, we carried out thorough tests on the NEU-DET, GC10-DET, and APDDD datasets. The mean average precision (mAP) of FOHR Net is 78.3%, 70.5%, and 65.9%, respectively, which is superior to other widely used defect detection techniques. We further illustrate the efficacy of our approach using several ablation experiments and visualized detection results.
Introduction
Metal materials are widely used in industrial production to manufacture steel structures, bridges, ships, and vehicles. However, due to factors such as imperfect manufacturing processes and production environments, metal surface defects are inevitable during the production process. With the development of deep learning technology1,2,3,4, effective metal surface defect detection methods are increasingly favored by the industry5,6,7,8,9,10,11,12,13. However, it still faces the following challenges in the actual production process: (1) Different types of defects are similar in grayscale value and texture, resulting in reduced characterization and discrimination of different types of defects, as shown in Fig. 1a,b; (2) The scales and shapes of defects of the same type vary greatly and may cause defect overlap, resulting in insufficient characterization of the defect area and even information loss, as shown in Fig. 1c,f; (3) The clarity of the defect image is low, resulting in blurred defect contours, further increasing the difficulty of defect detection, as shown in Fig. 1d,e. Traditional defect detection methods rely on artificially designed features, which are time-consuming and labor-intensive processes and have weak generalization capabilities. In contrast, network models based on deep learning can largely overcome the shortcomings of traditional methods and make significant progress in metal surface defect detection. For example: He et al.14 proposed an end-to-end detection method by fusing multi-level features, which achieved good performance in terms of accuracy and speed. Hao et al.15 proposed a network model based on deep learning, which enriches feature information by adding a spatial pyramid module for multi-scale detection, and further improves the accuracy of detection by adding an additional output layer for detection. 
Yu et al.16 added an attention mechanism to focus on the features of effective areas during detection and suppress the influence of invalid areas. At the same time, dilated convolution is used to increase the receptive field of the model without increasing its computational load, further improving hot-rolled steel surface defect detection. Compared with traditional manual features, the convolutional neural network model based on deep learning17,18 does not require repeated and tedious feature design, and significantly improves defect detection accuracy through powerful feature learning capabilities.
Although defect detection methods based on deep learning perform well, some problems still limit the effectiveness of the model in complex defect detection tasks. On the one hand, when shallow and deep features are fused, information is transmitted only indirectly between the feature layers extracted by the model’s backbone network.
Fig. 1. Challenges of metal surface defect detection: (a) and (b) similar defects between classes; (c) and (f) overlapping defects with large differences within classes; (d) and (e) low-resolution defects. The red box marks the metal surface defect, and the orange box shows the defect magnified for visualization.
The deep features cannot directly obtain the shallow feature information, but need to indirectly obtain these features through layer-by-layer transmission. This kind of information transmission between layers will increase the complexity and time cost of information transmission, weaken the integrity and accuracy of feature information, and directly affect the detection effect of the model when dealing with complex defect detection tasks. On the other hand, during the convolution process of the image in CNN, due to the local perception and downsampling of the convolution operation, the feature information of some defects will be lost, resulting in insufficient utilization of effective features19,20, which in turn affects the accuracy of the extracted features and reduces the detection accuracy.
This paper offers a feature optimization-guided high-precision and real-time metal surface defect detection network (FOHR Net) to tackle the aforementioned issues. In particular, the multi-layer feature alignment module (MFAM) and the dual-branch feature reorganization module (DFRM) are proposed. MFAM uses feature alignment to comprehensively extract the contextual information of defect features in multi-level features from feature maps of different scales, and combines the channel attention mechanism and residual aggregation to optimize the complementary information between feature levels and effectively enrich feature information; DFRM selectively enhances features, reorganizes feature slices, optimizes feature maps with soft attention, and adaptively merges the output features of the dual-branch transformation stage, effectively compensating for the loss of detail feature information, enabling the model to obtain effective feature information and obtain more accurate detection results. In summary, the main contributions of this paper are summarized as follows:
- This paper proposes a feature optimization-guided high-precision real-time metal surface defect detection network FOHR Net, which enhances the representation ability of defect features and improves the performance of the model in complex metal surface defect detection tasks.
- The MFAM is proposed, which extracts the context information of defect features in multi-level features by optimizing the fusion method between feature levels, and enriches defect features by combining channel attention mechanism and residual aggregation.
- The DFRM is introduced to enhance feature selectivity. By combining the feature slice reorganization method with soft attention to optimize the feature map, the output features of the dual-branch transformation stage are adaptively merged to effectively compensate for the loss of feature information.
- FOHR Net performs well on the NEU-DET, GC10-DET and APDDD datasets, which fully proves the effectiveness of the proposed method in dealing with complex defect detection tasks.
These parts make up the rest of this paper. The related research on metal surface defect detection based on deep learning, feature fusion strategy, and dual-branch network structure is presented in the “Related work” section. A detailed description of the multi-layer feature alignment module and dual-branch feature reorganization module is in the section “Methodology”. The “Experiments and analysis” section contains the experimental results and visual analysis of outcomes. The approach proposed in this study is evaluated in the “Conclusion” section. Finally, the “Discussion” section discusses the limitations of the work and future research directions.
Related work
Metal surface defect detection based on deep learning
Metal surface defect detection based on deep learning has been widely used in industrial production, providing a new benchmark and idea for faster and more accurate defect detection. It greatly improves detection efficiency and reduces the potential economic losses caused by defects. Typical examples of one-stage detection algorithms include the YOLO series21,22,23,24,25,26,27,28, SSD29, and RetinaNet30. Two-stage detection algorithms such as SPPNet31 and Faster R-CNN32 are also widely adopted. Many studies have continuously improved the existing classic models to improve defect detection performance. To alleviate the problems of insufficient feature extraction and insufficient feature fusion in steel strip surface defect recognition, Zhang et al.33 used a multi-scale feature fusion strategy to improve feature expression capabilities and designed a residual module to enhance the extraction of fine-grained features, improve the model’s perception of details and global features, and improve the defect detection effect. However, the real-time performance is insufficient. Wu et al.34 improved YOLOX by using a multi-scale feature fusion attention module and a lightweight feature extraction network to improve the detection performance of the model. Cheng et al.11 combined differential channel attention and adaptive spatial feature fusion to improve RetinaNet, improving defect location and precision. However, its real-time performance is not yet optimal and could be considerably improved. Ren et al.35 improved the Faster R-CNN model to detect surface defects of strip steel. A depthwise separable convolution replaced the convolution layer used for feature extraction, and the center loss was added to the original loss function, increasing the network operation speed and improving the ability to distinguish different defects. Although it basically meets the real-time requirements for detection, its real-time capability remains suboptimal.
It demonstrates excellent accuracy; however, its real-time performance is relatively poor, which is a significant limitation. Zhong et al.36 proposed STFE-Net, which captures better texture features by stacking pooling attention residual modules, embeds shallow feature weighted branches to achieve cross-layer shallow feature enhancement, adds feature layers to extract rich semantic features, and uses the ECIoU loss to optimize model performance. Although the one-stage model is more suitable for scenarios that require high-speed and real-time processing, and the two-stage model is more suitable for situations with higher requirements for detection accuracy, further research is still necessary to meet the requirements of both detection accuracy and detection speed in metal surface defect detection.
Feature fusion strategy
Feature pyramid is a classic method for effectively utilizing multi-scale features in object detection. In previous studies, many object detection models have fused image features extracted by the backbone network and used features of different scales to detect objects of different sizes in the image, improving the model’s performance. FPN37 designed a top-down path to fuse features of different scales so that shallow features can obtain more semantic information while deep features have richer scale information. PANet38 introduced a top-down and bottom-up path to aggregate multi-scale features to comprehensively consider the advantages of low-level features in obtaining more semantic information and deep features in having richer scale information. Two-way FPN39 proposed a simultaneous top-down and bottom-up feature fusion method. EfficientDet40 proposed a repeatable BiFPN for multiple iterations of feature fusion to further improve the detection performance. NAS-FPN41 and Auto-FPN42 use optimization methods such as reinforcement learning and evolutionary algorithms to find the best feature pyramid network structure to discover and design a better feature pyramid network automatically. To better fuse defect features, many models designed for detecting metal surface defects have improved the feature pyramid network part. Chen et al.43 proposed AP-Net to enhance the utilization of multi-layer feature information and improve the detection accuracy of the model by introducing a lightweight, adaptive attention module and an improved feature pyramid module. The model has a large parameter scale and a high demand for computing resources. To alleviate the interference of the environment in industrial production, Wang et al.44 proposed the YOLO-MSAPF strategy of merging multi-scale feature fusion and parallel feature screening to further enrich features and improve detection accuracy. The model has large parameter scale and poor real-time performance. 
MSC-DNet45 uses a parallel architecture detection network with dilated convolutions with different dilation rates to capture multi-scale defect context information and uses feature enhancement and selection modules to enhance single-scale features and select multi-scale features to reduce confusing details. The model achieves high accuracy, but its real-time performance is suboptimal. These methods have achieved significant performance improvements in the field of defect detection and provide a reference for the design of model feature fusion architecture.
Dual-branch network structure
Since there are usually fewer samples and less difference between different defects in defect detection tasks, defect detection methods based on deep learning still face challenges. To meet the challenges, researchers have proposed various improvement methods, among which the dual-branch feature recombination method has attracted much attention. It enhances the representation ability of the model by integrating features extracted by multiple branch networks. Zuo et al.46 proposed dual-branch feature fusion to solve the challenge of low defect contrast and used a lightweight attention mechanism to fuse multi-scale features to highlight defect areas and suppress irrelevant areas with good defect detection capabilities. Liu et al.47 proposed CGTD-Net using a dual-branch network to pay more attention to spatial information and channel information to assist the backbone network in extracting deep features, enhancing the edge detection ability of strip surface defects and improving the detection ability of small objects. An et al.48 proposed STDPNet to solve the difference in defect areas, image blur and complex background interference, using dual paths as the target detection network architecture and improving the model performance by introducing a transfer learning strategy between paths.
Methodology
This section first introduces the overall architecture of the feature optimization-guided high-precision real-time metal surface defect detection network (FOHR Net), and describes the multi-layer feature alignment module and the dual-branch feature recombination module in detail.
FOHR Net network overview
The structure of the proposed FOHR Net is shown in Fig. 2. To ensure the performance of the model, CSPDarkNet-53 is used as the backbone network for feature extraction. By introducing a multi-layer feature alignment module, the features extracted by multiple branch networks can produce richer and more discriminative defect representations. Then, PANet further combines the deep features with the shallow features. Finally, the features are sliced, reorganized, and fused through a dual-branch feature reorganization module to further improve the expressiveness of the features.
Multi-layer feature alignment module
The network adopts a traditional pyramid structure containing multiple branches for multi-scale feature fusion. However, it can only completely fuse the features of adjacent layers and can only indirectly obtain information from other layers. Information loss may occur during transmission, and the effectiveness of feature fusion may be limited. To alleviate this problem, this paper constructs a novel multi-layer feature alignment module to fuse multi-scale feature information, enhance the feature fusion capability of the neck, and make full use of the features extracted by the backbone network. As shown in Fig. 3, the backbone network extracts the feature layers \(P_{2}\), \(P_{3}\), \(P_{4}\), and \(P_{5}\), where \(P_{i} \in \mathbb {R}^{N \times C_{P_{i}}\times R_{P_{i}}}\), N is the batch size, C is the number of channels, and \(R=H\times W\) is the spatial dimension. Moreover, the dimensions of \(P_{2}\), \(P_{3}\), \(P_{4}\), and \(P_{5}\) are \(\frac{1}{4}R\), \(\frac{1}{8}R\), \(\frac{1}{16}R\), and \(\frac{1}{32}R\), and the input size R is \(640\times 640\). Specifically, with feature layer \(P_{4}\) as the benchmark, feature layers \(P_{2}\) and \(P_{3}\) are resized by avgpool, \(P_{5}\) is upsampled, and the resulting feature maps are concatenated to obtain the feature \(F_{Cat}\). \(F_{Cat}\) is then passed through a depthwise separable convolution and the ECA attention mechanism. Finally, \(F_{Conv}\) is concatenated with \(F_{P_{5}}\) to obtain \(F_{Align}\), further enhancing the feature information. The formula is as follows:
where, F is feature map, \(Cat(\cdot )\) represents concatenation, \(Avgpool(\cdot )\) represents avgpool, \((\cdot )\uparrow _{2\times }\) represents upsample, \(DW(\cdot )\) represents depthwise separable convolution, \(ECA(\cdot )\) is attention mechanism, and \(Conv(\cdot )\) represents convolution.
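To make the alignment step concrete, the following minimal Python sketch computes the spatial sizes of \(P_{2}\) to \(P_{5}\) for a \(640\times 640\) input and the resampling each layer would need to match the \(P_{4}\) benchmark. It assumes that the fractions 1/4, 1/8, 1/16, and 1/32 are per-side downsampling strides, as is usual for CSPDarkNet-style backbones; the function and variable names are illustrative and are not taken from the paper.

```python
# Sketch only: spatial bookkeeping for aligning P2..P5 to the P4 resolution.
# Assumption: 1/4, 1/8, 1/16, 1/32 are per-side strides of the backbone.

INPUT_SIZE = 640
STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}


def feature_size(layer: str, input_size: int = INPUT_SIZE) -> int:
    """Side length of a square feature map at the given layer."""
    return input_size // STRIDES[layer]


def alignment_plan(benchmark: str = "P4") -> dict:
    """For each layer, the operation and scale factor needed to reach the benchmark size."""
    target = STRIDES[benchmark]
    plan = {}
    for layer, stride in STRIDES.items():
        if stride < target:
            # Higher-resolution (shallower) layer: shrink with average pooling.
            plan[layer] = ("avgpool", target // stride)
        elif stride > target:
            # Lower-resolution (deeper) layer: enlarge with upsampling.
            plan[layer] = ("upsample", stride // target)
        else:
            plan[layer] = ("identity", 1)
    return plan


sizes = {layer: feature_size(layer) for layer in STRIDES}
plan = alignment_plan()
print(sizes)
print(plan)
```

Under this assumption, \(P_{2}\) (160 per side) is pooled by a factor of 4, \(P_{3}\) (80 per side) by a factor of 2, and \(P_{5}\) (20 per side) is upsampled by a factor of 2, so all four maps are 40 per side before concatenation.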
Dual-branch feature reorganization module
In order to further enhance the feature information, this paper introduces a dual-branch feature recombination module, as shown in Fig. 4. It uses the separation and recombination operation to distinguish feature maps with rich information from those with less information. It uses channel-level soft attention to obtain channel-optimized feature maps. It adaptively fuses the output features of the dual-branch transformation stage to produce a more comprehensive and discriminative feature representation, thereby improving the model’s performance.
Firstly, given an intermediate feature map, the scaling factors of the GN layer are used to evaluate the information content of the different feature maps. The resulting weights are mapped to the range (0, 1) by the sigmoid function and multiplied element-wise with the features evaluated by the GN layer to obtain the feature \(X_{1}\). The formula can be expressed as:
where, \(E_{m}\left( \cdot \right)\) represents element-by-element multiplication and \(GN\left( \cdot \right)\) represents group normalization layer.
Secondly, using a split operation with a factor of 0.5, the feature map is divided along the channel dimension into two groups of feature maps, and \(1 \times 1\) convolutions keep the number of feature map channels consistent, yielding the feature maps \(X_{2}\) and \(X_{3}\). The whole process can be expressed as:
where, \(Split\left( \cdot \right)\) means that the channel is divided into two parts.
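As a rough illustration of the split step, the sketch below divides a list of channels into two groups using a factor of 0.5. It operates on plain Python lists rather than tensors, and the helper name `split_channels` is hypothetical; the \(1 \times 1\) convolutions that follow the split are not modelled.

```python
# Sketch only: channel split with a ratio alpha (0.5 in the text).
# `split_channels` is an illustrative helper, not the paper's code.

def split_channels(channels: list, alpha: float = 0.5) -> tuple:
    """Split a list of per-channel feature maps into two groups.

    The first group receives int(alpha * C) channels and the second the rest.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    cut = int(len(channels) * alpha)
    return channels[:cut], channels[cut:]


# Eight dummy channels, each a 2x2 map filled with its channel index.
feature_map = [[[c, c], [c, c]] for c in range(8)]
upper, lower = split_channels(feature_map, alpha=0.5)
print(len(upper), len(lower))
```

With eight input channels and a factor of 0.5, each branch receives four channels.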
Finally, after performing the separation transformation, the output feature maps from the upper and lower transformation stages are adaptively merged, and global avgpool and maxpool are applied, respectively. The channel soft attention operation is used to extract rich discriminative features, which are multiplied with the feature maps \(X_{2}\) and \(X_{3}\) to get \(X_{2}^{W}\) and \(X_{3}^{W}\), and then \(X_{2}^{W}\) and \(X_{3}^{W}\) are element-wise multiplied. The formula is as follows:
where \(X_{2}^{W}\) and \(X_{3}^{W}\) are the optimized feature maps, \(E_{m}\left( \cdot \right)\) represents element-wise multiplication, \(\sigma \left( \cdot \right)\) represents the sigmoid function, \(E_{s}\left( \cdot \right)\) represents element-wise addition, and \(X^{R}\) represents the feature after dual-branch feature reorganization.
Experiment and analysis
Dataset
The proposed method is evaluated on three public metal surface defect detection datasets. For experiments, the datasets are randomly divided into train and test sets with a ratio of 7:3, as shown in Table 1.
NEU-DET49: The dataset contains six types of surface defects of hot-rolled steel strips, including crazing (Cr), inclusion (In), patches (Pa), pitted surface (Ps), rolled-in scale (Rs), and scratches (Sc). Each type of defect contains 300 images. The defect samples are shown in Fig. 5.
GC10-DET50: The dataset is a hot-rolled steel surface defect dataset collected in industrial production. It contains 2298 images and 10 defect types, namely, punching (Pu), welding line (Wl), crescent gap (Cg), water spot (Ws), oil spot (Os), silk spot (Ss), inclusion (In), rolled pit (Rp), crease (Cr), and waist fold (Wf). The defect sample is shown in the Fig. 6.
APDDD: The dataset contains 1885 images of aluminum surface defects, including 10 types of defects: Depression (De), Non-conductivity (Nc), Scratches (Sc), Orange peel (Op), Bottom leakage (Bl), Bruises (Br), Pits (Pi), Convex powder (Cp), Coating cracks (Cc), and Dirty spots (Ds). The defect is shown in the Fig. 7.
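The 7:3 random split described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the seed, the integer rounding, and the file names are hypothetical, and the resulting counts come from the arithmetic alone rather than from Table 1.

```python
import random


def split_dataset(items: list, train_parts: int = 7, test_parts: int = 3, seed: int = 0) -> tuple:
    """Randomly split items into train and test lists with ratio train_parts:test_parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:n_train], shuffled[n_train:]


# NEU-DET: 6 defect classes x 300 images each = 1800 images (file names are hypothetical).
classes = ["Cr", "In", "Pa", "Ps", "Rs", "Sc"]
images = [f"{name}_{i}.jpg" for name in classes for i in range(1, 301)]
train, test = split_dataset(images)
print(len(train), len(test))
```

For the 1800 NEU-DET images this yields 1260 training and 540 test images. The split here is drawn over the whole dataset; a per-class (stratified) split is an equally plausible reading of the text.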
Evaluation metrics
This paper evaluates the detection results using mAP and FPS. The average precision (AP) of a category is calculated as the area under its precision-recall curve, and mAP is the average AP over all categories, measuring overall detection accuracy. Frames per second (FPS) assesses the real-time performance of an object detection model by indicating the number of images it can process in one second.
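In their standard form (assumed here, since the equations themselves do not appear in this text), AP and mAP are defined as follows, with \(AP_{i}\) denoting the AP of the i-th class:

\[
AP = \int_{0}^{1} P(R)\, dR, \qquad mAP = \frac{1}{n} \sum_{i=1}^{n} AP_{i}
\]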
where P is the precision, R denotes the recall, and n represents the number of classes.
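The following Python sketch shows one common way to compute these quantities from precision-recall points. It uses all-point interpolation of the precision-recall curve, a convention the text does not specify, so it should be read as an illustration rather than the evaluation code used in the experiments. The toy values are not results from the paper.

```python
# Sketch only: AP as the area under a precision-recall curve, and mAP as the
# mean over classes. All-point interpolation is an assumed convention.

def average_precision(recalls: list, precisions: list) -> float:
    """Area under the PR curve; recalls must be sorted in increasing order."""
    # Pad the curve so it starts at recall 0 and ends at recall 1.
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Precision envelope: each value becomes the maximum precision at any higher recall.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas wherever recall increases.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_per_class: list) -> float:
    """mAP = (1/n) * sum of per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


# Toy PR points for two hypothetical classes.
ap_a = average_precision([0.5, 1.0], [1.0, 0.5])
ap_b = average_precision([1.0], [1.0])
map_value = mean_average_precision([ap_a, ap_b])
print(ap_a, ap_b, map_value)
```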
Experimental settings
To verify the effectiveness of FOHR Net, experiments were conducted on a computer with an Intel Xeon Gold 6130 CPU, an NVIDIA GeForce RTX 3090 24GB GPU and the Windows 10 operating system. The deep learning framework is PyTorch 2.0.1 with CUDA 11.8 and cuDNN 8.6.0. The model is optimized using the SGD optimizer with a momentum of 0.937. The batch size is set to 8, the initial learning rate is set to 0.01, the number of training epochs is 500, and the input image size is 640 × 640. To prevent overfitting, we used early stopping with a value of 50 epochs. To ensure the fairness and reliability of the comparative experiments, all models are trained without pre-trained weights.
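The settings above can be collected into a small configuration, together with a sketch of early stopping. The sketch assumes that the value of 50 epochs is a patience, that is, training stops after 50 consecutive epochs without improvement in the monitored score; the text does not state this explicitly, and the class and key names are illustrative.

```python
# Sketch only: settings from the text plus a patience-based early stop.
# Assumption: "50 epochs" means 50 consecutive epochs without improvement.

TRAIN_CONFIG = {
    "optimizer": "SGD",
    "momentum": 0.937,
    "batch_size": 8,
    "initial_lr": 0.01,
    "epochs": 500,
    "image_size": 640,
    "pretrained": False,
    "patience": 50,  # assumed interpretation of the early-stopping value
}


class EarlyStopping:
    """Signals a stop when the monitored score has not improved for `patience` epochs."""

    def __init__(self, patience: int):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_epoch = 0

    def should_stop(self, epoch: int, score: float) -> bool:
        if score > self.best_score:
            self.best_score = score
            self.best_epoch = epoch
        return epoch - self.best_epoch >= self.patience


def run(scores: list, config: dict = TRAIN_CONFIG) -> int:
    """Return the epoch index at which training stops for a sequence of validation scores."""
    stopper = EarlyStopping(config["patience"])
    last_epoch = 0
    for epoch, score in enumerate(scores[: config["epochs"]]):
        last_epoch = epoch
        if stopper.should_stop(epoch, score):
            break
    return last_epoch


# Hypothetical validation scores that peak at epoch 10 and then plateau lower.
scores = [0.01 * e for e in range(11)] + [0.05] * 489
stopped_at = run(scores)
print(stopped_at)
```

In this toy run the best score occurs at epoch 10, so training stops at epoch 60 instead of running all 500 epochs.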
Performance comparison
This paper compares the performance of FOHR Net with other mainstream one-stage and two-stage detection models from recent years. FOHR Net is first evaluated on the NEU-DET dataset, and the experimental results are shown in Table 2. It achieves excellent performance on most indicators. The mAP of FOHR Net reaches 78.3%, the best detection accuracy, which is 2.2% higher than the second-ranked YOLOv8s. To further verify the robustness and generalization ability of the model, FOHR Net was also compared with the other models on the GC10-DET dataset. The per-class accuracy and the mAP of each model are shown in Table 3. FOHR Net reaches 70.5% mAP, the highest detection accuracy, which is 4.3% higher than the second-ranked YOLOv8s, and it performs well on most defect types. In addition, the comparison results on APDDD are shown in Table 4. FOHR Net achieves the best mAP of 65.9%, which is 1.6% and 2.0% higher than the second- and third-ranked models, respectively, and it performs well on most defect types in the table.
Fig. 8. Comparison of inference speed of the top three models in detection accuracy on the NEU-DET, GC10-DET, and APDDD datasets. For each coordinate point (a, b), a is the model detection accuracy on the vertical axis, and b is the model inference time on the horizontal axis.
To further evaluate the inference speed of FOHR Net on the NEU-DET, GC10-DET and APDDD datasets, this paper selects the top three models in terms of detection accuracy on the datasets and visualizes their inference speed and detection accuracy for comparison, as shown in Fig. 8. The results demonstrate that FOHR Net detects quickly and achieves excellent detection accuracy on a variety of datasets, indicating its potential benefits in practical applications. Moreover, we compared mAP and FPS with other models on the NEU-DET and GC10-DET datasets. According to references7,58, speeds above 30 FPS are generally considered the benchmark for industrial real-time detection. Our method produces excellent results in both mAP and FPS, as Table 5 shows.
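A minimal sketch of how FPS relates to per-image inference time, and how it compares with the 30 FPS benchmark mentioned above, is given below. The helper names are hypothetical and the timing loop uses a stand-in function instead of a real detector; it is not the measurement procedure used for Table 5.

```python
import time


def measure_fps(infer, images: list) -> float:
    """Frames per second: number of images divided by total wall-clock inference time."""
    start = time.perf_counter()
    for image in images:
        infer(image)
    elapsed = time.perf_counter() - start
    if elapsed == 0:
        return float("inf")  # guard against a timer too coarse to register the loop
    return len(images) / elapsed


def fps_from_latency_ms(latency_ms: float) -> float:
    """Convert a per-image latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


def is_real_time(fps: float, threshold: float = 30.0) -> bool:
    """Compare against the 30 FPS benchmark cited in the text."""
    return fps > threshold


# A stand-in "model" that does nothing, used only to exercise the timing loop.
dummy_fps = measure_fps(lambda image: None, images=[None] * 100)
print(dummy_fps > 0)
print(fps_from_latency_ms(10.0), is_real_time(fps_from_latency_ms(10.0)))
print(fps_from_latency_ms(50.0), is_real_time(fps_from_latency_ms(50.0)))
```

A model that needs 10 ms per image runs at 100 FPS and clears the benchmark, whereas one that needs 50 ms per image runs at 20 FPS and does not. When timing a GPU model, warm-up iterations and device synchronization before reading the clock are usually needed for a fair measurement.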
Ablation experiments
To verify the influence and effectiveness of the multi-layer feature alignment module and the dual-branch feature reorganization module proposed in FOHR Net, this paper conducts ablation experiments on the NEU-DET and GC10-DET datasets, as shown in Table 6. FOHR Net without the multi-layer feature alignment module and the dual-branch feature reorganization module is called the baseline.
The impact of multi-layer feature alignment module
As shown in Table 6, introducing the multi-layer feature alignment module improves the average accuracy by 1.2% over the baseline network model on NEU-DET and by 5.3% on the GC10-DET dataset. This ablation study shows that the multi-layer feature alignment module can effectively fuse the shallow and deep features extracted by the backbone network CSPDarkNet53, enrich the information related to the target defect, and improve the detection accuracy of the model.
The impact of dual-branch feature reorganization module
As shown in Table 6, after the introduction of the dual-branch feature recombination module, the detection effect on the NEU-DET and GC10-DET datasets was improved by 1.3% and 4.7%, respectively, compared with the baseline network model. The dual-branch feature recombination module enhances the expressiveness of features and reduces the loss of feature information. The model can obtain more effective feature information and improve the detection effect of the model.
Visual analysis of results
To further demonstrate the effectiveness of the proposed method, this paper visualizes the defect detection results of the top three models in mAP, including FOHR Net, on the NEU-DET, GC10-DET and APDDD datasets. These visualization results intuitively demonstrate the performance of the model, as shown in Figs. 9, 10 and 11. Compared with the other top-ranked models in mAP, FOHR Net has higher prediction accuracy and more accurate defect location.
Discussion
Limitation and future work
Even though the proposed model performs well, some issues still need to be solved. Changes to environmental parameters and defect imaging settings in steel production environments might impact our model’s performance, because defect samples need to be adjusted according to actual conditions. In real industrial production environments, many factors may interfere with detection performance, such as lighting conditions and the surface textures of defects. To address these limitations and enhance the model’s effectiveness, we intend to broaden the dataset to cover a wider range of environmental situations, and we will conduct more extensive data collection and experiments in subsequent research to ensure that the model performs well in a wider range of practical applications. In future research, we will also incorporate user feedback from real operational environments to optimize and adjust the model to better meet the requirements of practical applications.
In addition, the paper uses cross-validation to evaluate the model’s performance. By dividing the dataset into different proportions and repeatedly training and validating the model, the stability and generalization ability of the model are ensured, avoiding overfitting and improving the accuracy and reliability of the final model. The NEU-DET and GC10-DET datasets are divided into training, validation, and test sets with a ratio of 7:1:2. Table 7 shows the cross-validation results on the NEU-DET and GC10-DET datasets, where STFE-Net achieves the highest mAP of 82.3% on NEU-DET and FOHR Net achieves 81.5% mAP, showing strong competitiveness. On the GC10-DET dataset, FOHR Net achieves the second-best performance. FOHR Net performs well in the detection of multiple defect types, showing robustness and reliability, which indicates that it has potential application prospects in metal surface defect detection.
Conclusion
FOHR Net for metal surface defect detection is proposed in this study. To efficiently combine shallow and deep features and enhance the information pertaining to the target defect, the approach first employs a multi-layer feature alignment module. To enhance feature expression and minimize feature information loss, it then introduces a dual-branch feature recombination module. Experiments on three datasets, NEU-DET, GC10-DET, and APDDD, show that FOHR Net outperforms other advanced methods, demonstrating the efficacy of the proposed methodology. However, in actual industrial production, the production environment varies greatly; for example, light, smoke, and vibration may affect the performance of the model. Therefore, we plan to collect defects with various complex backgrounds in the future to improve the generalization ability of the model in different detection environments.
Data availability
This study did not report any data. The method was evaluated on three publicly available steel surface defect detection datasets that are widely used in the field of defect detection: NEU-DET (https://aistudio.baidu.com/datasetdetail/18746), GC10-DET (https://www.kaggle.com/datasets/alex000kim/gc10det) and APDDD (https://tianchi.aliyun.com/dataset/148297).
References
Pang, D., Guan, Z., Luo, T., Su, W. & Dou, R. Real-time detection of road manhole covers with a deep learning model. Sci. Rep. 13, 16479 (2023).
Shen, M. et al. Defect detection of printed circuit board assembly based on yolov5. Sci. Rep. 14, 19287 (2024).
Lu, J., Yu, M. & Liu, J. Lightweight strip steel defect detection algorithm based on improved yolov7. Sci. Rep. 14, 13267 (2024).
Xiao, G., Hou, S. & Zhou, H. Pcb defect detection algorithm based on cdi-yolo. Sci. Rep. 14, 7351 (2024).
Su, J. et al. Pmsa-dytr: Prior-modulated and semantic-aligned dynamic transformer for strip steel defect detection. IEEE Trans. Ind. Inform. 20, 6684–6695 (2024).
Chen, H., Du, Y., Fu, Y., Zhu, J. & Zeng, H. Dcam-net: A rapid detection network for strip steel surface defects based on deformable convolution and attention mechanism. IEEE Trans. Instrum. Meas. 72, 1–12 (2023).
Wang, W. et al. A real-time steel surface defect detection approach with high accuracy. IEEE Trans. Instrum. Meas. 71, 1–10 (2022).
Wang, X. & Zhuang, K. An improved yolox method for surface defect detection of steel strips. In 2023 IEEE 3rd International Conference on Power, Electronics and Computer Applications (ICPECA), 152–157 (IEEE, 2023).
Zhang, H. et al. An efficient and real-time steel surface defect detection method based on single-stage detection algorithm. Multimed. Tools Appl. https://doi.org/10.1007/s11042-024-19565-2 (2024).
Zhong, H. et al. Lifso-net: A lightweight feature screening optimization network for complex-scale flat metal defect detection. Knowl.-Based Syst. 304, 112520 (2024).
Cheng, X. & Yu, J. Retinanet with difference channel attention and adaptively spatial feature fusion for steel surface defect detection. IEEE Trans. Instrum. Meas. 70, 1–11 (2020).
Wang, Y., Wang, H. & Xin, Z. Efficient detection model of steel strip surface defects based on yolo-v7. IEEE Access 10, 133936–133944 (2022).
Cheng, Z., Gao, L., Wang, Y., Deng, Z. & Tao, Y. Ec-yolo: Effectual detection model for steel strip surface defects based on yolo-v5. IEEE Access 12, 62765–62778 (2024).
He, Y., Song, K., Meng, Q. & Yan, Y. An end-to-end steel surface defect detection approach via fusing multiple hierarchical features. IEEE Trans. Instrum. Meas. 69, 1493–1504 (2019).
Hao, X., Dong, T. & Zhang, D. A highly efficient surface defect detection approach for hot rolled strip steel based on deep learning. In 2021 6th International Conference on Robotics and Automation Engineering (ICRAE), 318–322 (IEEE, 2021).
Yu, Y. et al. Surface defect detection of hot rolled steel based on attention mechanism and dilated convolution for industrial robots. Electronics 12, 1856 (2023).
Zeiler, M. D. & Fergus, R. Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part I 13, 818–833 (Springer, 2014).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 770–778 (2016).
Kamilaris, A. & Prenafeta-Boldú, F. X. A review of the use of convolutional neural networks in agriculture. J. Agric. Sci. 156, 312–322 (2018).
Krichen, M. Convolutional neural networks: A survey. Computers 12, 151 (2023).
Redmon, J., Divvala, S., Girshick, R. & Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, 779–788 (2016).
Redmon, J. & Farhadi, A. Yolo9000: better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition, 7263–7271 (2017).
Redmon, J. & Farhadi, A. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018).
Bochkovskiy, A., Wang, C.-Y. & Liao, H.-Y. M. Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020).
Li, C. et al. Yolov6: A single-stage object detection framework for industrial applications. arXiv preprint arXiv:2209.02976 (2022).
Wang, C.-Y., Bochkovskiy, A. & Liao, H.-Y. M. Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7464–7475 (2023).
Wang, C.-Y., Yeh, I.-H. & Liao, H.-Y. M. Yolov9: Learning what you want to learn using programmable gradient information. arXiv preprint arXiv:2402.13616 (2024).
Wang, A. et al. Yolov10: Real-time end-to-end object detection. arXiv preprint arXiv:2405.14458 (2024).
Liu, W. et al. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, 21–37 (Springer, 2016).
Lin, T.-Y., Goyal, P., Girshick, R., He, K. & Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, 2980–2988 (2017).
He, K., Zhang, X., Ren, S. & Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37, 1904–1916 (2015).
Ren, S., He, K., Girshick, R. & Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems 28 (2015).
Zhang, H. et al. Surface defect detection of hot rolled steel based on multi-scale feature fusion and attention mechanism residual block. Sci. Rep. 14, 7671 (2024).
Wu, R. et al. Enhanced you only look once x for surface defect detection of strip steel. Front. Neurorobot. 16, 1042780 (2022).
Ren, Q., Geng, J. & Li, J. Slighter faster r-cnn for real-time detection of steel strip surface defects. In 2018 Chinese Automation Congress (CAC), 2173–2178 (IEEE, 2018).
Zhong, H. et al. Stfe-net: A multi-stage approach to enhance statistical texture feature for defect detection on metal surfaces. Adv. Eng. Inform. 61, 102437 (2024).
Lin, T.-Y. et al. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2117–2125 (2017).
Liu, S., Qi, L., Qin, H., Shi, J. & Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 8759–8768 (2018).
Chandio, A. et al. Precise single-stage detector. arXiv preprint arXiv:2210.04252 (2022).
Tan, M., Pang, R. & Le, Q. V. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision And Pattern Recognition, 10781–10790 (2020).
Ghiasi, G., Lin, T.-Y. & Le, Q. V. Nas-fpn: Learning scalable feature pyramid architecture for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7036–7045 (2019).
Xu, H., Yao, L., Zhang, W., Liang, X. & Li, Z. Auto-fpn: Automatic network architecture adaptation for object detection beyond classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 6649–6658 (2019).
Chen, F., Deng, M., Gao, H., Yang, X. & Zhang, D. Ap-net: A metallic surface defect detection approach with lightweight adaptive attention and enhanced feature pyramid. Clust. Comput. 27, 3837–3851 (2024).
Wang, G.-Q. et al. Yolo-msapf: Multiscale alignment fusion with parallel feature filtering model for high accuracy weld defect detection. IEEE Trans. Instrum. Meas. 72, 1–14 (2023).
Liu, R., Huang, M., Gao, Z., Cao, Z. & Cao, P. Msc-dnet: An efficient detector with multi-scale context for defect detection on strip steel surface. Measurement 209, 112467 (2023).
Zuo, L. et al. A new dual-branch network with global information for the surface defect detection on solar PV wafer. IEEE Sens. J. 24, 9197–9207 (2024).
Liu, H. et al. Cgtd-net: Channel-wise global transformer based dual-branch network for industrial strip steel surface defect detection. IEEE Sens. J. 24, 4863–4873 (2024).
An, D. et al. Stdpnet: A dual-path surface defect detection neural network based on shearlet transform. Vis. Comput. 40, 1–16 (2023).
Song, K. & Yan, Y. A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects. Appl. Surf. Sci. 285, 858–864 (2013).
Lv, X., Duan, F., Jiang, J.-J., Fu, X. & Gan, L. Deep metallic surface defect detection: The new benchmark and detection network. Sensors 20, 1562 (2020).
Cai, Z. & Vasconcelos, N. Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6154–6162 (2018).
Tian, Z., Shen, C., Chen, H. & He, T. Fcos: A simple and strong anchor-free object detector. IEEE Trans. Pattern Anal. Mach. Intell. 44, 1922–1933 (2020).
Zhang, S., Chi, C., Yao, Y., Lei, Z. & Li, S. Z. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9759–9768 (2020).
Duan, K. et al. Centernet: Keypoint triplets for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 6569–6578 (2019).
Pebrianto, W., Mudjirahardjo, P., Pramono, S. H., Setyawan, R. A. et al. Yolov3 with Spatial Pyramid Pooling for Object Detection with Unmanned Aerial Vehicles. arXiv preprint arXiv:2305.12344 (2023).
Ge, Z. et al. Yolox: Exceeding yolo series in 2021. arXiv preprint arXiv:2107.08430 (2021).
Zhao, Y. et al. Detrs beat yolos on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 16965–16974 (2024).
Li, J., Su, Z., Geng, J. & Yin, Y. Real-time detection of steel strip surface defects based on improved yolo detection network. IFAC-PapersOnLine 51, 76–81 (2018).
Li, G., Shao, R., Wan, H., Zhou, M. & Li, M. A model for surface defect detection of industrial products based on attention augmentation. Comput. Intell. Neurosci. 2022, 9577096 (2022).
Wang, X., Zhang, Q. & Chen, C. Dual-branch information extraction and local attention anchor-free network for defect detection. Sci. Rep. 14, 10886 (2024).
Chen, F., Deng, M., Gao, H., Yang, X. & Zhang, D. Aca-net: An adaptive convolution and anchor network for metallic surface defect detection. Appl. Sci. 12, 8070 (2022).
Yasir, S. M. & Ahn, H. Faster metallic surface defect detection using deep learning with channel shuffling. arXiv preprint arXiv:2406.14582 (2024).
Zhang, L., Chen, J., Chen, J., Wen, Z. & Zhou, X. Ldd-net: Lightweight printed circuit board defect detection network fusing multi-scale features. Eng. Appl. Artif. Intell. 129, 107628 (2024).
Liu, J., Li, H., Zuo, F., Zhao, Z. & Lu, S. Kd-lightnet: A lightweight network based on knowledge distillation for industrial defect detection. IEEE Trans. Instrum. Meas. 72, 1–13 (2023).
Yu, X. et al. Progressive refined redistribution pyramid network for defect detection in complex scenarios. Knowl.-Based Syst. 260, 110176 (2023).
Shao, R., Zhou, M., Li, M., Han, D. & Li, G. Td-net: tiny defect detection network for industrial products. Complex Intell. Syst. 10, 1–12 (2024).
Liu, Q., Liu, M., Jonathan, Q. & Shen, W. A real-time anchor-free defect detector with global and local feature enhancement for surface defect detection. Expert Syst. Appl. 246, 123199 (2024).
Wang, Z., Zhu, H., Jia, X., Bao, Y. & Wang, C. Surface defect detection with modified real-time detector yolov3. J. Sens. 2022, 8668149 (2022).
Zhang, L. et al. Multiscale local and global feature fusion for the detection of steel surface defects. Electronics 12, 3090 (2023).
Zhang, L. et al. Tafenet: A two-stage attention-based feature-enhancement network for strip steel surface defect detection. Electronics 13, 3721 (2024).
Acknowledgements
This work is partially supported by the Zhejiang Provincial Natural Science Foundation of China (Nos. LY23F020023, LZ23F020001) and the National Natural Science Foundation of China (Grant Nos. 61906168, 62272267, 62176237).
Author information
Contributions
All authors reviewed the manuscript. Conceptualization: S.C., H.Z., S.L., F.H. and X.Z.; Investigation: S.C., S.L., J.M., F.H. and X.Z.; Software and validation: S.C., S.L., F.H. and X.Z.; Writing and original draft preparation: S.C., S.L., F.H. and X.Z.; Formal analysis: H.Z., S.L., J.M., F.H., X.Z. and S.C.; Figure preparation: H.Z., S.L., J.M., F.H., X.Z. and S.C.; Funding acquisition: J.M., X.Z. and S.C.; Interpretation of data: H.Z., S.L., J.M. and F.H.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Chan, S., Li, S., Zhang, H. et al. Feature optimization-guided high-precision and real-time metal surface defect detection network. Sci Rep 14, 31941 (2024). https://doi.org/10.1038/s41598-024-83430-3