Introduction

Nanoparticles, with their extremely small size and unique electrical, optical, magnetic and thermal properties1, have great potential for applications in agricultural production, biomedicine, new energy sources and aerospace2, as well as providing the material science basis for many advanced technologies. Given the intricate nature and nanoscale dimensions of nanoparticles, investigating them demands the deployment of highly accurate instruments, such as optical microscopes, atomic force microscopes and electron microscopes3. Extracting nanoparticle data from large-scale equipment and distilling valuable information is labor-intensive and inefficient. Moreover, the complex information inherent in nanoparticles is challenging to extract using manual methods, leading to a slow pace in the research and application of nanoparticles4.

In recent years, deep learning technology has been successfully applied to many scientific fields5,6,7,8,9,10,11,12. At the same time, deep learning technology has been well applied in fields such as new material prediction and the determination of material structure and scale13. Recently, many methods have emerged for the nanoparticle segmentation problem. Zhijian Sun et al. proposed a general deep learning based framework and used a lightweight network (NSNet) to achieve segmentation, shape extraction, and statistical analysis of nanoparticles, with an accuracy of 86.2% while processing 11 SEM/TEM images per second on an embedded processor14. Khuram Faraz et al. used deep learning and multi-target tracking to implement a procedure that automatically tracks, scans and observes nanoparticles, with MOTA reaching 99.5%, MOTP reaching 94.9%, 0 false positives, and 9 false negatives, indicating high accuracy and precision15. Zelin Wang et al. designed a Transformer Enhanced Segmentation Network (TESN) with a hybrid CNN-Transformer architecture based on the Transformer13 and Mask R-CNN16 to accurately segment and measure nanoparticles, with TESN's error ranging from 0.38% to 3.52%17. Leonid Mill et al. addressed the shortage of nanoparticle data and annotations by synthesizing images that can train state-of-the-art deep neural networks: for SiO2 nanoparticles, the F1 score of U-Net_real is 0.950 and that of U-Net_sim is 0.930; for TiO2 nanoparticles, the F1 score of U-Net_real is 0.943 and that of U-Net_sim is 0.92318. Current methods for segmenting nanoparticles rely primarily on data statistics, machine learning, and a limited number of deep learning techniques. Given the significant variability in nanoparticle shapes, with many tending to aggregate, applying more traditional network models such as U-Net19, SegNet20, and ResNet21 to nanoparticle segmentation lacks universality and cutting-edge advances, which is detrimental to the accurate segmentation of these particles.

In this work, we propose the HRU2-Net model, based on the U2-Net22 and U-HRNet23 models, which introduces a new U-shaped multi-level semantic information fusion module and can be used to analyse titanium dioxide nanoparticle image data obtained from scanning electron microscopy. Our main contributions are summarized as follows:

  • A new U-shaped multi-level semantic information fusion module is proposed, which is able to produce precise segmentation results.

  • A novel deep learning model is applied to nanoparticle segmentation, providing a new solution for nanomaterial research.

  • Results from experiments on the nanoparticle dataset show that the model integrates feature maps of different resolutions, capturing global context while preserving high-resolution details, making the model's segmentation of fine details more refined.

Methods

HRU2-Net architecture

The HRU2-Net model combines the idea of multi-level semantic information fusion with improved feature-map resolution, building on the U2-Net model. A multi-level information fusion module is added to the top-level encoder and decoder of the nested U-shaped structure, which improves the feature-map resolution and the segmentation speed. The overall structure is shown in Fig. 1. En_1 and De_1 are the designed multi-level semantic information fusion modules (MSIF). The MSIF module matches the U2-Net model in overall depth, and it can also be applied to the En_2, En_3, De_2 and De_3 parts as needed.

Fig. 1. HRU2-Net model structure.

The MSIF module can bridge the semantic information between feature maps of varying resolutions, facilitating better circulation of semantic information across different scales. This enhances the model’s image segmentation capabilities, reduces the computational load, and improves the resolution of the feature maps. Additionally, the MSIF module is incorporated only in the encoder part of the high-resolution structural design, which results in a more compact model size and consequently improved operational efficiency. The structure of the module is illustrated in Fig. 2.

Fig. 2. The En_1 and De_1 structures: (a) the En_1, De_1 structure of the U2-Net model; (b) our U-shaped multi-level semantic information fusion module (UMSIF).

Figure 2a displays the original En_1, De_1 structure of the U2-Net model. Figure 2b shows the U-shaped MSIF module designed in this paper, which is applied to the En_1 and De_1 parts of the HRU2-Net model. The U-shaped MSIF module is consistent with the U2-Net model in the overall number of layers, and its structure is designed to fuse semantic information between upper and lower levels. Moreover, this enhancement can be applied to the En_2, En_3, De_2 and De_3 sections as required.
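To make the design concrete, the following is a minimal Python sketch (in PaddlePaddle, the framework used for the experiments below) of a U-shaped block that fuses decoder features with encoder features at matching resolutions, in the spirit of the UMSIF module. The class names, channel widths and depth are illustrative assumptions, not the authors' exact implementation.

```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class ConvBNReLU(nn.Layer):
    """3x3 convolution followed by batch norm and ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2D(in_ch, out_ch, 3, padding=1)
        self.bn = nn.BatchNorm2D(out_ch)

    def forward(self, x):
        return F.relu(self.bn(self.conv(x)))

class UMSIFBlock(nn.Layer):
    """U-shaped block: downsample to gather context, then upsample while
    fusing each decoder feature with the encoder feature of the same
    resolution, so semantic information flows across scales."""
    def __init__(self, in_ch, mid_ch, out_ch, depth=4):
        super().__init__()
        self.inp = ConvBNReLU(in_ch, mid_ch)
        self.enc = nn.LayerList([ConvBNReLU(mid_ch, mid_ch) for _ in range(depth)])
        # Each decoder stage consumes the concatenation of the upsampled
        # feature and the encoder skip connection at the same resolution.
        self.dec = nn.LayerList([ConvBNReLU(mid_ch * 2, mid_ch) for _ in range(depth)])
        self.out = ConvBNReLU(mid_ch, out_ch)
        self.pool = nn.MaxPool2D(kernel_size=2, stride=2)

    def forward(self, x):
        x = self.inp(x)
        skips = []
        for enc in self.enc:              # encoder path: halve resolution per stage
            x = enc(x)
            skips.append(x)
            x = self.pool(x)
        for dec, skip in zip(self.dec, reversed(skips)):
            x = F.interpolate(x, size=skip.shape[2:], mode='bilinear')
            x = dec(paddle.concat([x, skip], axis=1))   # multi-level fusion
        return self.out(x)

feat = paddle.randn([1, 3, 512, 512])
print(UMSIFBlock(3, 16, 64)(feat).shape)   # [1, 64, 512, 512]
```

The key design point illustrated here is that the output stays at the input resolution: context is gathered at coarse scales, but high-resolution detail is restored through the per-scale fusion on the decoder path.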

Loss function

Based on the size and class composition of the dataset, this paper uses a cross-entropy loss function24 to represent the degree of difference between the predicted and actual data, expressed as:

$$Loss = - \frac{1}{N}\sum_{i} \sum_{c = 0}^{M} y_{ic} \log (p_{ic})$$
(1)

In the formula, p_ic is the predicted probability that sample i belongs to category c; y_ic is an indicator that takes the value 1 if the true category of sample i is c and 0 otherwise; M is the number of label classes; and N is the total number of pixels.
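As a quick illustration of Eq. (1), the following is a small numpy sketch of the pixel-wise cross-entropy; the array shapes and the epsilon guard against log(0) are illustrative assumptions.

```python
import numpy as np

def cross_entropy(y, p, eps=1e-12):
    """y: one-hot labels of shape (N, classes); p: predicted probabilities of
    the same shape, rows summing to 1. Returns the mean loss over N pixels."""
    return -np.mean(np.sum(y * np.log(p + eps), axis=1))

y = np.array([[1, 0], [0, 1], [1, 0]])              # 3 pixels, 2 classes
p = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])  # predicted probabilities
print(cross_entropy(y, p))                           # ~0.28
```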

Training setup

During training, the nanomaterial images were resized to 512 × 512 pixels. ResNet, HRNet25 and STDC26 were used as backbone networks for some of the semantic segmentation models. When pre-training the network model, we observed that the loss function converged after around 50 iterations, so we set the number of training iterations to 100, as depicted in Fig. 3. We used the cross-entropy loss function to represent the degree of difference between the predicted and actual data, set the initial learning rate to 0.01, and decayed it gradually. The optimization method is SGD26 with a batch size of 16 and random initialization, and the whole experiment takes about 400 h. Table 1 shows the HRU2-Net network structure configuration. The experiments ran on a 64-bit Windows 11 operating system, and PaddleSeg27 was used to build the training and testing network. The detailed configuration is as follows: Anaconda3, PaddlePaddle 2.4.0, PaddleSeg 2.7.0, OpenCV 4.6.0, CUDA 11.2 and cuDNN 8.2.
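The following is a hedged sketch of this training configuration in PaddlePaddle. The polynomial decay schedule and the stand-in model are assumptions, since the text only states an initial learning rate of 0.01 with gradual decay over 100 iterations.

```python
import paddle

model = paddle.nn.Conv2D(3, 2, 3, padding=1)   # stand-in for the full HRU2-Net
scheduler = paddle.optimizer.lr.PolynomialDecay(
    learning_rate=0.01,   # initial learning rate from the paper
    decay_steps=100,      # training iterations from the paper
    end_lr=0.0)           # decay schedule itself is an assumption
optimizer = paddle.optimizer.SGD(
    learning_rate=scheduler,
    parameters=model.parameters())
loss_fn = paddle.nn.CrossEntropyLoss(axis=1)   # pixel-wise cross entropy (NCHW)
```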

Fig. 3. Loss function curve.

Table 1 HRU2-Net network structure configuration.

Results

Datasets

The nanoparticle dataset was obtained from the open-source project of Bastian Rühle et al.28, whose image data were generated and annotated with the aid of a GAN network and can be used to train convolutional neural networks. The titanium dioxide nanoparticles were labeled white and the background black, as shown in Fig. 4. The image data were cropped into 200 images of 512 × 512 pixels and expanded as required using a data expansion and enhancement tool29. Finally, the processed image data were randomly divided into training and test sets in a ratio of 8:2, giving 2,000 training images and 500 test images.
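A minimal sketch of the 8:2 random split described above might look as follows; the directory layout and file format are assumptions.

```python
import random
from pathlib import Path

# Assumed layout: all augmented 512x512 images in one folder.
images = sorted(Path("nanoparticle_data/images").glob("*.png"))
random.seed(0)
random.shuffle(images)
split = int(0.8 * len(images))                 # 8:2 train/test ratio
train_set, test_set = images[:split], images[split:]
print(len(train_set), len(test_set))           # e.g., 2000 and 500 for 2,500 images
```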

Fig. 4. Images of titanium dioxide nanoparticles: (a) original data image; (b) the annotated image of (a).

The performance of HRU2-Net on nanoparticles

Nanoparticles, due to their minuscule size and diverse, intricate structures, pose numerous challenges in research and preparation. Meanwhile, the physical and chemical properties of different nanoparticles vary considerably, and some nanoparticles even have toxic side effects. It is therefore crucial to select a reliable semantic segmentation model for the nanoparticle image segmentation task. A network model intended for practical application should be judged not only on its accuracy but also on its robustness, scalability and resource requirements. To comprehensively evaluate the network models, five evaluation metrics were used in this paper30,31,32: (1) mean intersection over union (MIoU), (2) accuracy, (3) Kappa coefficient (Kappa), (4) Dice coefficient (Dice), and (5) intersection over union (IoU).

(1) MIoU30 is a semantic segmentation metric that averages, over all classes, the ratio of the intersection to the union between the prediction and the ground truth. Its formula is as follows.

$$MIoU = \frac{1}{N_{cls}}\sum_{x = 1}^{N_{cls}} \frac{N_{xx}}{\sum_{y = 1}^{N_{cls}} N_{xy} + \sum_{y = 1}^{N_{cls}} N_{yx} - N_{xx}}$$
(2)

In terms of the confusion matrix, N_cls denotes the total number of categories, N_xx denotes true positives, N_xy denotes false positives, N_yx denotes false negatives, and N_yy denotes true negatives.
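As an illustration of Eq. (2), the following numpy sketch computes MIoU from a confusion matrix C, where C[x, y] counts pixels of true class x predicted as class y; the example matrix is made up.

```python
import numpy as np

def miou(C):
    tp = np.diag(C).astype(float)    # N_xx: true positives per class
    fp = C.sum(axis=0) - tp          # predicted as class x but actually another class
    fn = C.sum(axis=1) - tp          # actually class x but predicted otherwise
    return np.mean(tp / (tp + fp + fn))

C = np.array([[50, 5],
              [3, 42]])              # 2-class confusion matrix
print(miou(C))                       # ~0.85
```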

(2) Accuracy30 is a metric used to evaluate classification models, i.e., the proportion of all predictions that the model makes correctly, with the following formula.

$$Accuracy = \frac{TP + TN}{{TP + TN + FP + FN}}$$
(3)

According to the confusion matrix, where TP is true positive, TN is true negative, FP is false positive and FN is false negative.

(3) The Kappa coefficient32 is a consistency measure that indicates how well the model's predictions agree with the actual classifications, and it can be used to assess classification quality. Its formula is as follows.

$$Kappa = \frac{p_{o} - p_{e}}{1 - p_{e}}$$
(4)

where p_o is the number of correctly classified samples across all categories divided by the total number of samples, i.e., the overall classification accuracy, and p_e is the sum over all categories of the product of the actual and predicted counts for that category, divided by the square of the total number of samples.
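A small numpy sketch of Eq. (4), following the confusion-matrix description above; the example matrix is illustrative.

```python
import numpy as np

def kappa(C):
    n = C.sum()
    p_o = np.trace(C) / n                               # overall accuracy
    p_e = np.sum(C.sum(axis=0) * C.sum(axis=1)) / n**2  # chance agreement
    return (p_o - p_e) / (1 - p_e)

C = np.array([[50, 5],
              [3, 42]])
print(kappa(C))                                         # ~0.84
```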

(4) Dice31 is a set-similarity measure used to calculate the similarity between two samples and is often used to evaluate the quality of segmentation algorithms. Its formula is as follows.

$$Dice = \frac{2|A \cap B|}{{|A| + |B|}}$$
(5)

where |A ∩ B| is the number of elements in the intersection of A and B, and |A| and |B| denote the numbers of elements in A and B, respectively. The factor of 2 in the numerator accounts for the double counting of the common elements of A and B in the denominator.

(5) IoU31 is the overlap between two regions divided by their union, and is one of the standard semantic segmentation metrics. The formula is expressed as follows.

$$IoU = J(A,B) = \frac{|A \cap B|}{{|A \cup B|}}$$
(6)

It is defined as the area of the intersection between the predicted segmented image A and the truth image B, divided by the area of the union between the two images, with a value ranging from 0 to 1.
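For binary masks such as the particle/background labels used here, Eqs. (5) and (6) reduce to a few lines of numpy; the toy masks below are illustrative.

```python
import numpy as np

def dice(a, b):
    inter = np.logical_and(a, b).sum()
    return 2 * inter / (a.sum() + b.sum())

def iou(a, b):
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union

A = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)   # predicted particle mask
B = np.array([[1, 1, 0], [0, 0, 1]], dtype=bool)   # ground-truth mask
print(dice(A, B), iou(A, B))                        # 0.667, 0.5
```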

UMSIF is a deep learning module that captures image features across multiple spatial scales, endowing the model with the ability to understand both the local details and the global context of an image. Its core advantage lies in traversing features across different resolutions, capturing rich semantic information from a local perspective, reducing reliance on a single-scale segmentation result, and thereby lowering the risk of bias. Through its U-shaped structure, UMSIF integrates and aggregates multi-scale semantic information, effectively handling complex spatial relationships and extracting features from both global and local contexts. This combination of global and local features provides the model with a more comprehensive image representation, significantly enhancing its accuracy and robustness in segmentation tasks. Through the detailed data and visual presentation in Table 2 and Fig. 5, we can clearly observe the effect of UMSIF in capturing and fusing multi-level semantic information, and these results further confirm its significant effectiveness in enhancing model performance.

Fig. 5. Nanoparticle segmentation results.

Table 2 lists the evaluation results of the deep learning based semantic segmentation methods on the nanoparticle dataset. The results show that the MIoU of all models is above 83%, the accuracy is around 97%, the Kappa coefficient is above 83%, and the Dice coefficient is above 91%, indicating that deep learning-based semantic segmentation models deliver strong segmentation performance and accuracy. The MIoU of the classical segmentation models, such as U-Net and PSPNet33, is around 86%. This is because they employ deeper convolutional neural networks, which enhances accuracy to some extent; however, their larger model size hampers deployment and brings efficiency drawbacks. U2-Net, DDRNet34, BiseNetV235, PP-LiteSeg, EfficientFormerV236 and RTFormer37 are lightweight network models that are easy to deploy across varied segmentation environments, with MIoUs in the range of 83-87%. HRU-Net uses HRNet as the backbone network; its model is large and runs slowly, with an MIoU of 87.21%. Our improved HRU2-Net model is characterized by its compact size and rapid segmentation. With an MIoU of 87.37%, it outperforms the other models, demonstrating superior overall performance and strong competitiveness.

Table 2 Prediction results for nanoparticle data on each model.

As shown in Fig. 5, the actual segmentation results for each model are displayed, where the yellow boxes compare segmentation details and the pink boxes highlight tiny regions that were not segmented. The figure shows that the HRU2-Net model accurately segments the tiny nanoparticles within the image without segmentation errors, and it achieves fine segmentation for nanoparticles of varying morphologies. Among the compared models, U2-Net surpasses DDRNet, BiseNetV2, PP-LiteSeg38, and RTFormer in segmentation details. Regarding boundary clarity, DDRNet and PP-LiteSeg segment poorly, while BiseNetV2, RTFormer, SwinUnet, and EfficientFormerV2 suffer from a significant loss of detail. U2-Net leaves some areas not fully segmented, DDRNet and PP-LiteSeg display a larger number of unsegmented regions, and BiseNetV2 and RTFormer leave major parts unsegmented. In terms of segmentation failures, U2-Net experiences minor issues in the image boundary regions, while DDRNet, BiseNetV2 and PP-LiteSeg demonstrate fewer segmentation failures. These findings indicate that our improved HRU2-Net model exhibits superior adaptability and segmentation capability for nanoparticles, which is significant for broadening the morphological analysis of nanoparticles and achieving a more efficient workflow.

Conclusions

In the nanoparticle segmentation experiments, we evaluated a variety of state-of-the-art semantic segmentation models as well as our improved HRU2-Net model designed around nanoparticle characteristics. To validate deep learning-based semantic segmentation on nanoparticle image data, we selected titanium dioxide nanoparticle images captured under a scanning electron microscope for the segmentation experiments. These models segmented the nanoparticle images well, with an overall MIoU of around 85%. The HRU2-Net model designed in this paper performed best, with an MIoU of 87.37%, exhibiting superior segmentation effects and robust segmentation capability. Compared to traditional methods, the deep learning-based semantic segmentation model offers greater efficiency and time savings, and can accurately and meticulously segment nanoparticles observed under an electron microscope.

In conclusion, the deep learning-based semantic segmentation model represents a novel tool for the rapid identification and segmentation of a variety of nanoparticles. This development paves the way for new approaches to analyze the morphology of nanoparticles and investigate their properties. Deep learning technology has achieved unprecedented breakthroughs, and the integration of deep learning-based semantic segmentation technology into the field of nanoparticle segmentation has demonstrated significant practical value. Moving forward, we plan to collect a larger dataset of high-quality nanoparticle images, design more advanced semantic segmentation models, and introduce more cutting-edge deep learning techniques.