Abstract
This study aims to improve the detection of dental burs, which are difficult to detect due to their minuscule size, slender profile, and substantial manufacturing output. The present study introduces You Only Look Once-Dental bur (YOLO-DB), an innovative deep learning-driven methodology for the accurate detection and counting of dental burs. A Lightweight Asymmetric Dual Convolution module (LADC) was devised to diminish the detrimental effects of extraneous features on the model’s precision, thereby enhancing the feature extraction network. Moreover, to augment the efficiency of feature integration and diminish computational demands, a novel fusion network combining SlimNeck with BiFPN-Concat was introduced, effectively merging superficial spatial details with profound semantic features. A specialized platform was developed for the detection and counting of dental burs, and rigorous experimental assessments were performed. YOLO-DB yielded a mean average precision (mAP@0.5) of 99.3% on the dental bur dataset, with a notable 3.2% increase in mAP@0.5:0.95 and a sustained detection speed of 128 frames per second. The model also achieved a 14.4% reduction in parameter volume and a 17.9% decrease in computational expenditure, while achieving a counting accuracy of 100%. Our approach outperforms current detection algorithms in terms of detection capability and efficiency, presenting a new method for the precise detection and counting of elongated objects such as dental burs.
Introduction
Dental burs are specialized cutting instruments used in dental practice; they are affixed to dental handpieces for the rotary abrasion and shaping of teeth or dental materials. Comprising a working tip and a shank, these burs are disposable devices in oral surgery. Considering their substantial annual output, diverse types, and small size, the process of inspecting and quantifying dental burs currently relies heavily on manual labor. This approach is not only arduous but also inefficient, limiting the industry’s progression toward automation and smart manufacturing1. Two techniques are currently applied in counting dental burs. The photoelectric counting method operates by emitting a light beam from a sensor; each obstruction of this beam by a passing object is registered, thereby tallying the quantity2. However, this technique does not provide accurate counts when objects intersect or overlap, and involves a relatively slow detection pace. The alternative weighing method ascertains the quantity of objects in bulk by measuring the aggregate weight and calculating the count from the known weight of a single unit. However, this approach is susceptible to cumulative errors due to manufacturing variances, leading to inaccuracies, and remains a cumbersome and slow process. Therefore, both methodologies have low efficiency and accuracy, failing to meet the requirements for precision and expedience in industrial applications3.
Sophisticated computer vision technologies have been developed based on deep learning, and have given rise to a plethora of object detection techniques, including the two-stage architecture of Faster-RCNN4, the spatial pyramid pooling network SPP-Net5, and the single-stage detectors YOLO6 and SSD7. Despite remarkable advancements in natural image detection, these conventional methods often utilize horizontal bounding boxes to delineate objects. This approach offers limited efficacy when detecting elongated objects such as dental burs, especially when densely arranged and randomly oriented, due to the inclusion of excessive background and redundant data, diminishing the accuracy of detection8,9,10.
To improve the precision and detection speed, this paper adopts the YOLOv8n-obb11 single-stage rotating object detection algorithm introduced by Ultralytics in 2024. Despite its robust performance, this algorithm still exhibits a significant number of missed and false detections when dealing with dense objects featuring large aspect ratios. This paper refines the YOLOv8n-obb algorithm to mitigate these issues. Considering the high similarity among dental burs and the tendency of detection models to extract multiple redundant features, thereby increasing model complexity and affecting classification and localization decisions12,13, the feature extraction and fusion network of the YOLOv8n-obb algorithm were enhanced. Reducing the redundant features effectively improved the model’s detection speed and accuracy, meeting the industrial requirements for high-precision and high-speed detection of dental burs. The main contributions of this paper are as follows:
(a) A comprehensive dental bur dataset encompassing 480 high-resolution images enriched with extensive instance annotations was constructed, providing a robust foundation for the detection and counting tasks.
(b) A detection and counting method for dental burs based on YOLO-DB is proposed, which overcomes the limitation of photoelectric counting when dental burs intersect, achieving efficient and high-precision detection and counting.
(c) To enhance the feature extraction process and curtail the influence of superfluous features, the Lightweight Asymmetric Dual Convolution module (LADC) was introduced, effectively refactoring the YOLOv8n-obb network to distill more refined features.
(d) The SlimNeck14 and BiFPN-Concat15 architectures were incorporated into YOLOv8n-obb, and the integration of shallow and deep features was optimized, resulting in a more nuanced and comprehensive feature fusion that significantly boosts the model’s performance.
Related work
Small object detection
The detection of small objects in imagery is impeded by their limited pixel coverage, which results in substantial feature information loss as resolution diminishes during feature extraction. This scarcity of effective features degrades model detection accuracy. Chen et al.16 have operationally defined small objects as those within the same category with a median ratio of the bounding box area to the image area falling between 0.08% and 0.58%. In recent years, multiple studies have focused on the detection of small objects.
In the realm of remote sensing detection, Fan et al.17 investigated the real-time identification of maritime vessels, introducing the CSDP-YOLO algorithm. This method leverages deep large-core convolution to expand the receptive field of shallow features and incorporates the MPDIoU loss function to counteract interferences such as clouds, islands, and fishing nets around small objects. Zhang et al.18 developed an innovative large-scale neural network that synergizes contextual semantic information with fine details, thereby optimizing detection tasks in remote sensing imagery. Furthermore, Pham et al.19 reported an enhanced single-stage detection network, YOLO-fine, with refined network architecture and training methodologies to bolster the detection precision of minuscule objects and mitigate class imbalance.
Within the ___domain of infrared object detection, Wang et al.20 proposed a non-local sparse self-attention encoding-decoding network, enhancing the identification of small objects by accentuating their intrinsic features and dampening environmental noise. Tong et al.21 developed an asymmetric attention mechanism network that employs intra-layer feature exchange and cross-layer feature integration to refine feature representation. Moreover, Liu et al.22 devised an image enhancement-based detection algorithm that amplifies object salience through detail enhancement and pixel upscaling, thereby augmenting the detection of small objects.
Han et al.23 investigated defect identification in insulators, reconstructing the YOLOv4 framework with GhostNet modules and integrating the ECA-Net24 attention mechanism to enhance the model’s precision in small object recognition. Huang et al.25 addressed the challenge of detecting complex surface defects in automotive part manufacturing with an enhanced YOLOv726 algorithm, which refines the backbone and feature fusion networks and employs the Alpha-IoU loss function to elevate detection accuracy. Mei et al.27 researched fabric defect detection and improved the YOLOv8n model structure with selective attention mechanisms and AFPN, while introducing the WIoUv3 loss function to regulate gradient gain, achieving a remarkable detection accuracy rate of 97.4%.
Slender object detection
Slender objects are characterized by their extreme aspect ratios. Wan et al.28 provide a mathematical definition of slenderness using Eq. (1):
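The equation itself is not reproduced in the text. Based on the variable definitions and category thresholds that follow, a reasonable reconstruction of Eq. (1), taking slenderness as the ratio of the shorter to the longer side of the minimum-area enclosing rectangle, is:

\[
s = \frac{\min(w,\, h)}{\max(w,\, h)} \quad (1)
\]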
In this equation, w and h denote the width and height of the rectangle with the smallest area that encloses the object, respectively. The slenderness ratio s is classified into three categories: extra slender (XS) for s < 1/5, slender (S) for 1/5 < s < 1/3, and regular (R) for s > 1/3. Huang et al.29,30 explored the application of deep learning to identify slender objects in optical satellite imagery, utilizing shadow compensation techniques to retrieve details obscured during image acquisition. In addition, Ma et al.31 introduced an innovative automatic and intelligent system for the detection of road cracks, leveraging adversarial networks to overcome the scarcity of imagery and enhancing the detection capability with a refined version of YOLOv3.
Xiang et al.32 further advanced the YOLOv5 detection algorithm by incorporating a Transformer module, which significantly improved the detection of cracks. Moreover, Wang et al.33 developed a method for the detection of surface texture defects that effectively removes background textures and accentuates the defects at the full image resolution.
Proposed method
This paper presents a detailed exposition of our proposed approach. The YOLOv8n-obb model was used as the baseline for dental bur rotation detection in this research, constituting the core methodology for the detection of dental burs. Subsequently, the novel Lightweight Asymmetric Dual Convolution module (LADC) was seamlessly integrated into the convolutional Backbone to augment the model’s capacity for feature extraction. Thereafter, a pioneering feature fusion network that combines SlimNeck and BiFPN-Concat was introduced, thereby refining the mechanism of feature amalgamation.
The overall structure of YOLO-DB
Considering the similarity among dental burs, which display only minor variations in morphology, overly intricate models often capture a plethora of non-essential features, resulting in diminished object localization and classification accuracy. Therefore, the model architecture was streamlined while bolstering detection performance. This study introduces a novel lightweight, single-stage detection model termed YOLO-DB, as illustrated in Fig. 1. The model leverages the Lightweight Asymmetric Dual Convolution (LADC) module, convolutional layers (Conv), and Spatial Pyramid Pooling-Fast (SPPF) as the foundational components of its convolutional Backbone. For the feature fusion network, it adopts the composite structure of the Feature Pyramid Network (FPN) and Path Aggregation Network (PAN). Within this framework, the VOV-GSCSP and GSConv modules are integrated to meticulously enhance feature refinement. Moreover, the model employs an adaptive feature fusion technique, BiFPN-Concat, which facilitates a nuanced, weighted integration of features. This approach considers the differential significance of both shallow and deep features, thereby enriching the model’s capacity to expressively capture and utilize feature information.
Lightweight asymmetric dual convolution module
In the realm of rotating object detection, the diverse object sizes, the predominance of elongated shapes, and arbitrary orientation often result in conventional square convolutions capturing an excess of background noise. To address this, the present study introduces two asymmetric convolutions that refine the feature extraction network from horizontal and vertical perspectives in conjunction with DualConv34, establishing the LADC module, as illustrated in Fig. 2. The input feature map is subjected to identity mapping via a convolutional layer (Conv) to maintain the integrity of spatial dimensions and is subsequently bisected along the channel dimension. Each segment undergoes horizontal feature extraction via a 1 × 3 asymmetric convolution, followed by vertical feature extraction through a 3 × 1 asymmetric convolution. In parallel, the DualConv architecture, detailed in Fig. 3, processes the input data, streamlining the information processing sequence, and augmenting the efficiency of feature extraction while ensuring the network’s capacity for rich and deep feature representation. This structure adeptly leverages both 3 × 3 and 1 × 1 convolutions to handle the input feature map channels, combining the strengths of each to refine the feature extraction process with a minimized computational and parameter footprint. To bolster the model’s gradient flow, the DualConv output is integrated with an intermediary feature map \({f_{\text{1}}}\) through residual connections and concatenated with another feature map \({f_{\text{2}}}\) along the channel dimension. A 1 × 1 convolutional kernel, termed Convolution-Batch Normalization-SiLU (CBS), is then applied to seize channel feature correlations, resulting in the refined output feature map.
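Since Fig. 2 is not reproduced here, the following PyTorch sketch illustrates one plausible realization of the data flow described above (identity-style 1 × 1 mapping, channel split, 1 × 3 and 3 × 1 asymmetric convolutions, a parallel DualConv path, residual fusion, and a final 1 × 1 CBS). The exact kernel sizes, grouping, and channel bookkeeping are assumptions, not the authors’ implementation.

```python
import torch
import torch.nn as nn

class DualConv(nn.Module):
    """Simplified stand-in for DualConv34: parallel grouped 3x3 and plain 1x1 convolutions, summed."""
    def __init__(self, in_ch, out_ch, groups=2):
        super().__init__()
        self.conv3 = nn.Conv2d(in_ch, out_ch, 3, padding=1, groups=groups, bias=False)
        self.conv1 = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x):
        return self.conv3(x) + self.conv1(x)

class LADC(nn.Module):
    """Sketch of the Lightweight Asymmetric Dual Convolution block (channels assumed divisible by 4)."""
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.stem = nn.Conv2d(channels, channels, 1, bias=False)          # identity-style mapping
        self.asym1 = nn.Sequential(                                        # 1x3 then 3x1 on one half
            nn.Conv2d(half, half, (1, 3), padding=(0, 1), bias=False),
            nn.Conv2d(half, half, (3, 1), padding=(1, 0), bias=False))
        self.asym2 = nn.Sequential(                                        # same on the other half
            nn.Conv2d(half, half, (1, 3), padding=(0, 1), bias=False),
            nn.Conv2d(half, half, (3, 1), padding=(1, 0), bias=False))
        self.dual = DualConv(channels, half)                               # parallel DualConv path
        self.cbs = nn.Sequential(                                          # 1x1 Conv-BN-SiLU head
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels), nn.SiLU())

    def forward(self, x):
        f1, f2 = torch.chunk(self.stem(x), 2, dim=1)    # bisect along the channel dimension
        f1, f2 = self.asym1(f1), self.asym2(f2)         # horizontal then vertical feature extraction
        d = self.dual(x) + f1                           # residual fusion with the DualConv output
        return self.cbs(torch.cat((d, f2), dim=1))      # concatenate with f2 and mix channels

# Example: LADC(64)(torch.randn(1, 64, 80, 80)).shape -> torch.Size([1, 64, 80, 80])
```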
SlimNeck
In industrial object detection, the detection speed and precision are of equal significance. To achieve an efficient integration of the shallow and deep features extracted by the backbone network, this study presents a novel reconstruction of the Feature Pyramid Network integrated with Path Aggregation (FPN + PAN), employing Grouped Spatial Convolution (GSConv) and an advanced cross-stage partial block (VOV-GSCSP). The GSConv structure, as illustrated in Fig. 4, skillfully combines regular convolutions, depthwise separable convolutions, and a Shuffle operation. This module adeptly blends feature information generated by standard convolutions, utilizing a Shuffle strategy to integrate it with the output from depthwise separable convolutions. This innovative approach closely preserves the output of traditional convolutions in terms of computational results while substantially reducing the computational expense. The integration of depthwise separable convolutional layers and Shuffle layers significantly amplifies the nonlinear expressive capacity of the feature information, rendering GSConv an optimal solution for lightweight model detection systems. The computational complexity for standard convolution is displayed in Eq. (2), that of the depthwise separable convolution is shown in Eq. (3), and GSConv in Eq. (4).
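Equations (2)–(4) are not reproduced in the text. Written with the symbols defined in the next paragraph, and following the complexity analysis of the Slim-Neck paper14 (the depthwise separable form is given in its usual depthwise-plus-pointwise decomposition), they read approximately:

\[
\begin{aligned}
\text{FLOPs}_{SC} &= W \cdot H \cdot K^{2} \cdot C_{\text{in}} \cdot C_{\text{out}} && (2) \\
\text{FLOPs}_{DSC} &= W \cdot H \cdot K^{2} \cdot C_{\text{in}} + W \cdot H \cdot C_{\text{in}} \cdot C_{\text{out}} && (3) \\
\text{FLOPs}_{GSConv} &= W \cdot H \cdot K^{2} \cdot \tfrac{C_{\text{out}}}{2}\,(C_{\text{in}} + 1) && (4)
\end{aligned}
\]

Under this reading of Eq. (4), the ratio of GSConv to standard convolution is \((C_{\text{in}}+1)/(2C_{\text{in}})\), which tends toward one half as the input channel count grows, consistent with the statement below.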
In the given formula, W and H represent the width and height of the input feature map, respectively; K indicates the convolutional kernel size, and \({C_{{\text{in}}}}\) and \({C_{{\text{out}}}}\) represent the channel counts for the input and output feature maps, respectively. Equation (4) illustrates that as the number of input channels increases, the computational load of GSConv approaches roughly half that of traditional convolution, while retaining comparable feature extraction capability.
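For concreteness, here is a minimal PyTorch sketch of a GSConv layer as described (a standard convolution producing half the output channels, a depthwise convolution applied to that result, concatenation, and a channel shuffle); the depthwise kernel size and the shuffle implementation are assumptions.

```python
import torch
import torch.nn as nn

class GSConv(nn.Module):
    """Sketch of GSConv: standard conv + depthwise conv, concatenated and channel-shuffled."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        c_half = c_out // 2
        self.conv = nn.Sequential(                       # standard convolution branch
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.dwconv = nn.Sequential(                     # depthwise branch applied to the first output
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x):
        y1 = self.conv(x)
        y2 = self.dwconv(y1)
        y = torch.cat((y1, y2), dim=1)
        # Shuffle: interleave channels from the two branches so their information mixes
        b, c, h, w = y.shape
        return y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)
```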
To expedite the model’s inference velocity and augment its capacity for feature extraction, this study introduces a lightweight bottleneck structure termed the GSBottleneck module, as shown in Fig. 5. This module features a similar concept to a residual network and is engineered to minimize the loss of information and gradient flow by sequentially employing two GSConv operations and concatenating the input and output feature maps. The VOV-GSCSP module establishes an inter-stage partial network through a unified aggregation approach; combined with GSConv, a comprehensive feature fusion network is formed.
BiFPN-Concat
In the feature extraction process of dental bur imagery, a progressive reduction in image resolution occurs, while the channel dimension expands. Feature maps with shallower layers boast a smaller receptive field, providing a wealth of fine-grained details such as color, texture, and positional information. In contrast, deeper-layer feature maps possess a more extensive receptive field, enriched with higher-level semantic information, but have a lower capacity for detail perception. Consequently, the YOLOv8-obb architecture employs a Concatenation technique to merge these features directly to leverage the advantages of features of different depths. However, owing to the varying contribution levels of feature information across different channels, a straightforward Concatenation is grossly inefficient. Therefore, to preserve vital feature information while enhancing the fusion efficiency, this study introduces the BiFPN-Concat strategy as an advanced alternative to the conventional Concatenation approach. The underlying principle of BiFPN-Concat is detailed in Eq. (5).
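Equation (5) is not reproduced in the text; assuming it follows the fast normalized fusion rule of BiFPN15, which matches the explanation given below, the weighted fusion of input feature maps \(I_i\) with learnable weights \(w_i\) is:

\[
O = \sum_{i} \frac{w_{i}}{\varepsilon + \sum_{j} w_{j}}\, I_{i} \quad (5)
\]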
Within the equation, W signifies the adaptive weights, I refers to the input feature maps, and \(\varepsilon\) is a small constant that safeguards numerical stability. Throughout the model training regimen, the magnitude of W is dynamically learned in accordance with the distinct feature maps, employing adaptive weights to modulate the input from varying feature maps. This modulation effectively distinguishes between pivotal and ancillary features. Numerical instability is avoided by fixing the value of \(\varepsilon\) at 0.0001. Figure 6 presents a schematic of the BiFPN architecture, wherein P3 to P7 correspond to feature maps of diverse scales obtained from the feature extraction network. The intermediate nodes harness adaptive weights to finely tune the respective contributions of each feature map, thereby facilitating their efficient combination.
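A minimal PyTorch sketch of such a learnable weighted concatenation is given below; the ReLU clamp and the normalization follow BiFPN’s fast-fusion convention and are assumptions about the exact implementation rather than the authors’ code.

```python
import torch
import torch.nn as nn

class BiFPNConcat(nn.Module):
    """Learnable weighted concatenation of feature maps (sketch, BiFPN-style fast fusion)."""
    def __init__(self, n_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))  # one adaptive weight per input feature map
        self.eps = eps                               # small constant for numerical stability

    def forward(self, feats):
        w = torch.relu(self.w)                       # keep the weights non-negative
        w = w / (w.sum() + self.eps)                 # normalise so the weights sum to ~1
        return torch.cat([wi * f for wi, f in zip(w, feats)], dim=1)

# Example: fuse a shallow and a deep feature map of the same spatial size
# out = BiFPNConcat(2)([p3_shallow, p3_deep])
```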
Experiments and analysis
Experimental environment and parameter settings
The experiments within this study were executed on the Ubuntu operating system, employing PyTorch 2.1.1 as the framework for deep learning. The hardware setup comprised an NVIDIA A800 graphics processing unit with 80GB of dedicated memory, a 16-core central processing unit, and a substantial 64GB of random-access memory. The development environment was established with Python 3.11.5. All the experimental procedures were conducted with a uniform parameter configuration, as shown in Table 1.
Dental bur dataset
In this study, a Samsung camera (13 megapixels) was employed to capture a dataset of dental burs, as depicted in Fig. 7. To conform to the image acquisition method of dental burs in industrial settings, the burs were randomly arranged within a tray, and the acquisition process was carried out under both natural and artificial lighting, with the artificial lighting being utilized to mitigate the shadows cast by the dental burs under natural light. Given that dental burs are commonly produced in large quantities of a single type on industrial assembly lines, different models of dental burs are not intermingled. The study categorized the dental burs into six classes based on their distinct morphological characteristics, resulting in a total of 480 images being collected, with the number of dental burs in each image randomly ranging between 100 and 300. Subsequently, all images were uniformly resized to 2000 × 2000 pixels and the dataset was divided into training, validation, and testing sets in an 8:1:1 ratio, with some samples of the dataset illustrated in Fig. 8. The dental burs’ bounding boxes were annotated using the RolabelImg software, with the six categories being designated by the letters A to F. During the annotation process, each bounding box’s size and angle had to be manually adjusted to ensure that each oriented rectangular box precisely encompassed each dental bur, thereby minimizing the influence of background areas and adjacent targets. Any inaccuracies in the size and angle of the bounding boxes could significantly impact the model’s detection efficacy in areas where dental burs are densely packed.
As illustrated in the example of Fig. 9, the ratios of length (l) to maximum diameter (d) for six types of dental burs were measured in this study, and the aspect ratios were calculated by determining the ratio of l to d, resulting in six different aspect ratios. Figure 10 presents the distribution of the number of dental bur instances, aspect ratios, and pixel area proportions. It can be observed from the figure that the aspect ratios of the target instances range between 4.94 and 19.35, which are in accordance with the definition of elongated targets. Additionally, the pixel area proportions of the six categories of dental burs all fall within the range of 0.08–0.36%, which aligns with the definition of small targets. Subsequently, a detection platform was assembled by combining a camera, a stand, a fill light, and a laptop, as shown in Fig. 11.
Evaluation metrics
This study utilized a suite of six evaluation metrics to quantitatively assess performance, including precision, recall, mean average precision at an intersection over union (IoU) threshold of 0.5 (mAP@0.5), model parameters, floating point operations (FLOPs), and frames per second (FPS). A detailed explanation of each metric is provided below.
(1) Mean average precision (mAP) is a widely recognized metric for measuring the performance of object detection algorithms. It is calculated using the following formula:
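The formulas are not shown in the text; in their standard form, which matches the variable definitions in the next two paragraphs (with R denoting recall), they read:

\[
P = \frac{TP}{TP + FP}, \qquad
AP = \int_{0}^{1} P(R)\, dR, \qquad
mAP = \frac{1}{n} \sum_{i=1}^{n} AP_{i}
\]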
Within the equation, n signifies the total number of object classes identified; i corresponds to each individual detection event; AP is the average precision for an individual class, calculated by averaging the precision scores at each recall level within the class; mAP represents the mean average precision, which is the arithmetic mean of the AP values for all classes, offering a single performance metric that aggregates the detection accuracy across the entire range of classes.
The equation employs the term P, which stands for precision, a measure of exactness in classifying the instances that are truly positive. TP indicates true positives and denotes the count of instances that have been accurately identified as belonging to the positive class. Conversely, FP indicates false positives and represents the count of instances that have been incorrectly categorized as positive.
(2) FPS stands for frames per second and evaluates the speed of the object detection process. This value indicates how many frames can be processed in one second and is calculated using Eq. (9):
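Reconstructed from the variable names explained below, Eq. (9) is presumably:

\[
FPS = \frac{FrameNum}{ElapsedTime} \quad (9)
\]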
Within the equation, Framenum represents the cumulative count of images that have undergone detection; ElapsedTime denotes the total duration, measured in seconds, that the detection process has required.
(3) The term Params represents the count of parameters within the model, measured in millions (M). Specifically for convolutional layers, the parameter count is determined according to Eq. (10):
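Based on the variables defined below, Eq. (10) is presumably the standard parameter count of a convolutional layer (bias terms omitted):

\[
Params = C_{\text{in}} \times K_{w} \times K_{h} \times C_{\text{out}} \quad (10)
\]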
In the given formula, \({{\text{C}}_{{\text{in}}}}\) and \({{\text{C}}_{{\text{out}}}}\) signify the number of channels for the input and output feature maps, respectively. Additionally, \({{\text{K}}_w}\) and \({{\text{K}}_h}\) correspond to the dimensions of the convolutional kernel in terms of width and height.
(4) FLOPs, which stands for floating point operations, is a metric representing the computational intensity of the model, expressed here in GFLOPs (G). FLOPs are calculated by Eq. (11):
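Using the symbols defined in the following paragraph, Eq. (11) is presumably the usual per-layer operation count (some conventions multiply by 2 to count multiplications and additions separately):

\[
FLOPs = H \times W \times K^{2} \times C_{\text{in}} \times C_{\text{out}} \quad (11)
\]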
Within the equation, K signifies the dimensions of the convolutional kernel, while H and W correspond to the height and width of the input feature map, respectively. This notation is standard in the computation of the model’s floating-point operations, which is essential for understanding the computational complexity of convolutional operations within deep learning architectures.
Ablation experiments
To further investigate the influence of the LADC module on both the Backbone and Neck components of the network, this study conducted four comparative experiments, integrating the LADC module into various network configurations. The outcomes are presented in Table 2. The results revealed that the integration of the LADC module into the Backbone yielded the highest improvement in mAP@0.5:0.95, peaking at 76.2%, corresponding to a 1.0% enhancement. This integration also led to a notable reduction in parameters and computational requirements. Conversely, embedding the LADC in the Neck resulted in a marginal increase of 0.2% in mAP@0.5:0.95. Interestingly, when the LADC was incorporated into both the Backbone and Neck, a slight decrease of 0.8% was observed in mAP@0.5:0.95. Collectively, these findings suggest that the LADC module effectively eliminates redundant features and augments the model’s capacity for feature extraction pertaining to dental burs while reducing the parameter count and computational load.
To explore the efficacy of the three enhancements proposed within this study, a series of six experiments were meticulously designed, ensuring that each experiment was operated under identical environmental conditions and parameter configurations. The tabulated outcomes of these experiments are delineated in Table 3, offering a comprehensive view of the performance impact of the implemented improvements.
The original model YOLOv8n-obb’s mAP@0.5 rate reaches an impressive 99.0%. Enhancements through LADC, SlimNeck, and BiFPN-Concat collectively improved the mAP@0.5:0.95 metric, achieving an increment within the range of 1.0–2.3%. Specifically, the integration of LADC alone yielded a 1.0% augmentation in mAP@0.5:0.95, coupled with a diminution in parameter count and computational demand. The incorporation of SlimNeck precipitated a 1.5% enhancement in the precision of detection. Notably, the integration of BiFPN-Concat produced the most pronounced improvement in mAP@0.5:0.95, demonstrating the superior efficiency of adaptive weights in the combination of extracted features. The concomitant introduction of the LADC module and SlimNeck further amplified the model’s detection precision and reduced its complexity. In the final set of experiments, the integration of all three enhancements into the original model yielded a peak elevation of 3.2% in mAP@0.5:0.95, a 14.4% reduction in parameter volume, a 17.9% alleviation in computational load, and an increase in detection speed of 32 frames per second. These outcomes confirm the efficacy of the three proposed enhancements.
Comparative experiments
This study aimed to evaluate the improvements in mAP@0.5:0.95 achieved by the refined dental bur detection algorithm for each class of dental bur. Therefore, a comparative experimental analysis was performed to evaluate the model’s performance before and after the refinements, as displayed in Fig. 12. The results indicate that the algorithm delivers robust detection accuracy for a variety of dental bur classes. The mAP@0.5:0.95 improvements for classes A through F are 5.0%, 3.2%, 5.8%, 2.7%, 2.1%, and 0.6%, respectively. While class F burs, owing to the complexity of their color features, showed only a moderate gain in accuracy, considerable enhancements in the 2.1–5.8% range were observed for classes A through E.
Comparison of the mAP@0.5:0.95 improvement effect of the six classes.
Recall, an indicator of a model’s comprehensiveness, is inversely related to the rate of undetected instances. In contrast, precision measures the model’s exactness and is inversely related to the rate of false positives. The comparative analysis of these metrics before and after the model’s enhancement supports the viability of the updated approach. Figure 13 illustrates the comparative assessment of precision and recall for the dental bur detection algorithm across various classes. The results demonstrate a notable enhancement in both precision and recall in the refined model. In particular, class A exhibited the largest improvement in recall, at 3.4%, and class C showed the most significant improvement in precision, increasing by 5.7%. These outcomes confirm the efficacy of the algorithm introduced in this study.
To substantiate the enhanced detection capabilities of the refined algorithm, a series of detection experiments and statistical comparisons were performed on a dental bur dataset under uniform experimental conditions and parameter configurations. A selection of industry-standard models was evaluated, comprising YOLOv3-obb, YOLOv5n-obb, YOLOv5s-obb, and YOLOv6n-obb. Furthermore, to benchmark the improved approach, three enhancements were integrated into the larger baseline models YOLOv8s-obb and YOLOv8m-obb, yielding the results for the Ours-s and Ours-m groups, as presented in Table 4.
The findings detailed in Table 4 reveal that the proposed method attained an mAP@0.5 of 99.4% and an mAP@0.5:0.95 of 78.4%, surpassing the detection precision of comparable models such as YOLOv5n-obb and YOLOv6n-obb. Compared with the strong rotating object detection methods proposed in references35,36,37, YOLO-DB offers a clear advantage in both detection accuracy and speed. Specifically, the Ours-s model achieved an mAP@0.5:0.95 of 81.5%, marking a 1.1% enhancement in accuracy and a 13 FPS improvement in detection rate over the YOLOv5s-obb model of equivalent scale. The Ours-m model excelled with an mAP@0.5:0.95 of 83.6%, outperforming the detection precision of YOLOv3-obb and demonstrating a marked reduction in parameter volume and computational demand, as well as a 27 FPS faster detection speed compared to YOLOv3-obb. Meanwhile, the Ours model exhibits a significant speed advantage over Ours-s and Ours-m and provides the best overall balance. The comparative experiments demonstrate that our method achieves a better balance between detection accuracy and speed at a lower computational cost.
Generalization experiments
To evaluate the generalization performance of the model, this paper validates YOLO-DB on two public remote sensing aerial datasets, with the experimental results shown in Table 5. DOTA v1.0 is a dataset for rotating objects, and NWPU VHR-10 is a dataset for horizontal objects. YOLO-DB(H) is the horizontal object detection method obtained by combining three improvements of this paper with YOLOv8n. The experimental results indicate that on both public datasets, the improved method of this paper can effectively enhance the detection accuracy and speed of both horizontal and rotating objects, showing consistent improvements.
Visualization of test results
In this study, six sets of dental bur images were randomly selected to conduct a qualitative comparative analysis of the detection efficacy between the original YOLOv8n-obb model and the refined YOLO-DB model, employing a confidence threshold of 0.4. The comparative experimental outcomes are depicted in Fig. 14. The color boxes in the figure represent the model’s object localization, with annotations specifying the detected category and the associated confidence level. Specifically, instances where the YOLOv8n-obb model exhibited suboptimal detection have been magnified. In the first example of Fig. 14a, an erroneous green bounding box appeared, and in other examples, there was a common occurrence of missed detections. Figure 14b demonstrates the detection effects of the YOLO-DB model, notably mitigating the problems of missed and false detections, which enables more accurate detection and counting.
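As a concrete illustration of how detection results translate into a count, the sketch below runs an oriented-bounding-box model at the same 0.4 confidence threshold and tallies detections per class; the weights path and image name are hypothetical, and the standard Ultralytics Python API is assumed.

```python
from collections import Counter
from ultralytics import YOLO

# Hypothetical path to trained YOLO-DB weights
model = YOLO("runs/obb/yolo_db/weights/best.pt")

# Oriented-bounding-box inference at the confidence threshold used in Fig. 14
results = model.predict("bur_tray.jpg", conf=0.4)   # hypothetical test image

obb = results[0].obb                      # oriented boxes for the first image
class_ids = obb.cls.int().tolist()        # predicted class index per detection
names = results[0].names                  # class-index -> name mapping (A–F)

per_class = Counter(names[i] for i in class_ids)
print(f"total burs: {len(class_ids)}")    # the bur count is simply the number of boxes
print(dict(per_class))                    # e.g. {'A': 120, 'B': 35, ...}
```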
To validate that YOLO-DB can maintain excellent detection performance even when faced with intersecting dental burs, this study manually arranged a large number of dental burs in an intersecting manner to simulate the special conditions that may randomly occur in practical applications. Figure 15 demonstrates the detection effects of the original YOLOv8n-obb and the YOLO-DB model. It can be observed that, compared to the detection effects of the original YOLOv8n-obb, YOLO-DB is capable of accurately regressing the boundaries of the dental burs without any missed detections, confirming that YOLO-DB can address the detection and counting issues of dental burs in intersecting situations.
Failure case analysis
Although YOLO-DB achieves good detection performance on the dental bur dataset, some detection failures remain. To analyze these failure cases, Fig. 16 shows several missed detection results. When dental burs are densely distributed, the model struggles to accurately detect the boundaries of each dental bur. In such cases, the annotation boxes of the dental burs need to be further optimized to include as little background area as possible, and the model’s ability to recognize individual bur features needs to be enhanced. In Fig. 16b, the missed dental burs appear noticeably darker than the others, which may be an important factor contributing to the detection failure. For this situation, the direction of the light source should be adjusted, additional light sources added, and illumination applied from different angles to increase the reflectivity of the dental burs. In addition, more images should be collected under low-light conditions to improve the model’s ability to recognize dental burs of varying brightness.
Discussion
In this study, a dental bur detection and counting method based on YOLO-DB was proposed, which innovatively introduced the LADC module. This module effectively extracts the features of dental burs through asymmetric convolution, reducing the impact of redundant features. Furthermore, the introduction of SlimNeck and BiFPN-Concat has further improved the efficiency of feature fusion and the detection performance of the model, significantly reducing the model’s false positive and false negative rates. Compared to the original detection methods, even when dental burs are densely arranged, this method can not only accurately detect the targets but also precisely regress the target bounding boxes. In scenarios where dental burs are placed in an overlapping manner, with the lower burs being obscured by the upper ones, the bounding boxes detected by YOLO-DB can still penetrate the occluded areas to accurately capture the complete boundaries of the dental burs. This indicates that this method can not only effectively capture the correlations between local features but also capture long-range dependencies, demonstrating stronger robustness. Future work will involve collecting dental bur data under dark conditions and training YOLO-DB, as well as exploring model pruning and quantization to enhance the model’s robustness and efficiency.
Conclusion
This paper presents a detection and counting model for small and slender dental burs, and constructs a hardware platform for image acquisition and detection counting of dental burs. Through system performance testing and comparison, the model can efficiently and accurately achieve online detection and quantity counting of burs, meeting the expected outcomes.
In the feature extraction network, to address the high inter-class similarity among dental burs, this paper proposes the LADC module and reconstructs the Backbone network, optimizing the model’s ability to extract similar features. In the feature fusion network, this paper introduces the SlimNeck and BiFPN-Concat techniques. SlimNeck lightens the dental bur detection model, while BiFPN-Concat further enhances the model’s ability to distinguish between key and non-key features through an adaptive weighting mechanism, strengthening detection performance. The YOLO-DB dental bur detection and counting model proposed in this paper has been experimentally verified to accurately detect dense and intersecting dental burs, overcoming the limitations of traditional photoelectric counting methods and showing a clear performance advantage over other mainstream rotating object detection algorithms. Future research is expected to promote the development of dental bur detection technology in industrial automation.
Data availability
The data that support the findings of this study are available from the corresponding author, HHL, e-mail: [email protected], upon reasonable request.
References
Wang, B. et al. Smart manufacturing and intelligent manufacturing: A comparative review. Engineering 7(6), 738–757 (2021).
Xu, M., Wang, X. & Liu, X. Detection of heavy metal ions by ratiometric photoelectric sensor. J. Agric. Food Chem. 70(37), 11468–11480 (2022).
Peres, R. et al. Industrial artificial intelligence in industry 4.0-systematic review, challenges and outlook. IEEE Access 8, 220121–220139 (2020).
Ren, S., He, K., Girshick, R. & Sun, J. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017).
He, K. et al. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 1904–1916. https://doi.org/10.1007/978-3-319-10578-9_23 (2014).
Redmon, J. et al. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 779–788 (2016).
Liu, W. et al. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14 21–37 (Springer International Publishing, 2016).
Zhou, J. et al. TS4Net: Two-stage sample selective strategy for rotating object detection. Neurocomputing 501, 753–764 (2022).
Xu, Y. et al. End-to-end subtitle detection and recognition for videos in East Asian languages via CNN ensemble. Signal. Process. Image Commun. 60, 131–143 (2018).
Qiu, Z. et al. A deep learning approach for segmentation, classification, and visualization of 3-D high-frequency ultrasound images of mouse embryos. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 68(7), 2460–2471 (2021).
Wang, G. et al. UAV-YOLOv8: A small-object-detection model based on improved YOLOv8 for UAV aerial photography scenarios. Sensors 23(16), 7190 (2023).
Bakkouri, I. & Bakkouri, S. 2MGAS-Net: Multi-level multi-scale gated attentional squeezed network for polyp segmentation. Signal. Image Video Process., 1–10 (2024).
Bakkouri, I. & Afdel, K. DermoNet: A computer-aided diagnosis system for dermoscopic disease recognition. In Image and Signal Processing: 9th International Conference, ICISP 2020, Marrakesh, Morocco, June 4–6, 2020, Proceedings 9 170–177 (Springer International Publishing, 2020).
Li, H. et al. Slim-neck by GSConv: a lightweight-design for real-time detector architectures. J. Real-Time Image Proc. 21(3), 62 (2024).
Tan, M., Pang, R. & Le, Q. V. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020).
Chen, C. et al. R-CNN for small object detection. In Computer Vision–ACCV: 13th Asian Conference on Computer Vision, Taipei, Taiwan, November 20–24, 2016, Revised Selected Papers, Part V 13 214–230 (Springer International Publishing, 2017).
Fan, X. et al. A small ship object detection method for satellite remote sensing data. IEEE J. Sel. Top. Appl. Earth Observations Remote Sens. 17.
Zhang, G., Lu, S. & Zhang, W. CAD-Net: A context-aware detection network for objects in remote sensing imagery. https://doi.org/10.1109/TGRS.2019.2930982 (2019).
Pham, M. et al. YOLO-Fine: one-stage detector of small objects under various backgrounds in remote sensing images. Remote Sens. 12(15), 2501 (2020).
Wang, K. et al. AFE-Net: Attention-guided feature enhancement network for infrared small target detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. (2024).
Tong et al. EAAU-Net: Enhanced asymmetric attention U-Net for infrared small target detection. Remote Sens. 13, 3200 (2021).
Liu, S., Chen, P. & Woźniak, M. Image enhancement-based detection with small infrared targets. Remote Sens. 14, 3232 (2022).
Han, G. et al. A lightweight algorithm for insulator target detection and defect identification. Sensors 23(3), 1216 (2023).
Wang, Q. et al. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020).
Huang, H. & Zhu, K. Automotive parts defect detection based on YOLOv7. Electronics 13(10), 1817 (2024).
Wang, C. Y., Bochkovskiy, A. & Liao, H. Y. M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023).
Mei, S., Shi, Y., Gao, H. & Tang, L. Research on fabric defect detection algorithm based on improved YOLOv8n algorithm. Electronics 13(11), 2009 (2024).
Wan, Z. et al. Slender object detection: diagnoses and improvements. https://doi.org/10.48550/arXiv.2011.08529 (2020).
Huang, Z., Wang, F., You, H. & Hu, Y. Shadow information-based slender targets detection method in optical satellite images. IEEE Geosci. Remote Sens. Lett. 19, 1–5 (2021).
Huang, Z., Wang, F., You, H. & Hu, Y. STC-Det: A slender target detector combining shadow and target information in optical satellite images. Remote Sens. 13(20), 4183 (2021).
Ma, D. et al. Automatic detection and counting system for pavement cracks based on PCGAN and YOLO-MF. IEEE Trans. Intell. Transp. Syst. 23(11), 22166–22178 (2022).
Xiang, X., Wang, Z. & Qiao, Y. An improved YOLOv5 crack detection method combined with transformer. IEEE Sens. J. 22(14), 14328–14335 (2022).
Wang, S. et al. Automatic detection and classification of steel surface defect using deep convolutional neural networks. Metals 11(3), 388 (2021).
Pan, J. et al. Learning dual convolutional neural networks for low-level vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3070–3079 (2018).
Qing, Y. et al. Improved Yolo network for free-angle remote sensing target detection. Remote Sens. 13(11), 2171 (2021).
Shin, Y. et al. DCEF2-YOLO: Aerial detection YOLO with deformable convolution–efficient feature fusion for small target detection. Remote Sens. 16(6), 1071 (2024).
Shen, L., Lang, B. & Song, Z. DS-YOLOv8-Based object detection method for remote sensing images. IEEE Access 11, 125122–125137 (2023).
Funding
This work was supported by the Shaanxi Provincial Natural Science Basic Research Project (2023-JC-YB-452) and the Innovation and Entrepreneurship Project (202410720011).
Author information
Contributions
Hou planned the research content. Yang conducted experiments and wrote papers. Li made a dataset. Zhu processed experimental data. Zhao and Wu revised research content and the paper. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.