Introduction

Self-adhesive label printed materials are widely used in daily life, for example in daily chemicals, medicine, and electronics. Their production process is susceptible to various factors that can cause printing defects, such as environmental interference, material variations, and mechanical faults. Real-time detection of defective products during production is crucial for guaranteeing the quality of printed materials. Given the inherent difficulty of observing printing defects, manual inspection has proven inefficient. Consequently, automated rapid optical quality inspection of printed materials holds immense promise. With the rapid development of industrial automation, machine vision and image processing technology have been widely applied to surface defect detection1,2,3,4,5,6, including printing defects7.

Among traditional methods for printing defect detection, template matching is a commonly used approach. It directly assesses printing quality based on the similarity between the test image and the template image8,9, making it simple, intuitive, and easy to implement. Luo et al.10 set up multiple templates with different rotation deviations to match the target image and assess printing quality. Although this approach reduced the false detection rate caused by image rotation, the multiple template matching process was time-consuming and hurt real-time performance. Ma et al.11 separated the foreground and background of printed materials with simple text content and then detected defects in the foreground using template matching. Although image matching in this method was fast, precisely separating foreground and background for printed materials with complex graphics remained challenging. Liu et al.12 created a template by fusing edge features of multiple images and identified defects via feature similarity matching based on Euclidean distance, but this method was only applied to edge defects and is unsuitable for objects with complex printed graphics. Liu et al.13 aligned the test image using adaptive template matching and used the Structural Similarity (SSIM) index to measure the differences between the test image and the template image. Overall, template matching focuses mainly on global pixel correlations between images and demands highly stable lighting, making small defects difficult to detect. These limitations render template matching insufficient for actual industrial requirements.

Another popular approach is to compute the difference between a test image and a reference image after image alignment. Li et al.14 employed an image difference method to extract cigarette label printing defects and applied the minimum bounding rectangle to analyze defect shapes. However, the simple image difference method introduced pseudo defects, which reduced detection accuracy. Guan et al.15 generated a grayscale threshold image using the Gaussian distribution principle, then applied a second difference operation between it and the difference image to eliminate pseudo defects. Wang et al.16 applied grayscale differencing twice to segment non-edge defects and used gradient differencing to segment edge defects. Although this method eliminated pseudo defects by fusing the binary images of the two defect types, it had difficulty detecting defects with low gray levels and small sizes. Li et al.17 proposed an RGB subimage block sliding method to address pseudo defects caused by local deformation of printed materials, then performed two rounds of gradient matching on the defect candidate regions using an improved cosine similarity measure. Although existing methods have achieved decent results in eliminating pseudo defects, they are particularly sensitive to lighting and of limited effectiveness for objects with complex printed graphics. Zhang et al.18 constructed a bright image and a dark image as reference templates and detected defects through difference operations between the test image and the two templates, which mitigated the issue of brightness changes to a certain extent. However, its detection accuracy remained limited for printed materials with complex graphics.

In recent years, numerous studies have focused on detection through deep learning methods6,19,20. Unlike traditional methods, deep learning-based ones primarily use convolutional neural networks (CNN) to extract image defect features21, leveraging their unique advantages in mining defect information and demonstrating more robust recognition performance for surface defects22,23,24,25,26,27. Among them, You Only Look Once (YOLO)28, a mainstream object detection framework, is often considered for defect detection tasks. Liu et al.29 achieved accurate detection of printing defects by inserting a Coordinate Attention mechanism into YOLOv5s. Li et al.30 enhanced the model's capability to detect defects in printed paper by incorporating diverse attention mechanisms into YOLOv7. However, the impressive detection results of supervised deep learning methods rely heavily on a large number of training samples and on class balance among those samples31. Li et al.32 addressed imbalanced defect samples by augmenting the small number of existing defect sample types, thereby improving the model's classification ability. For printed materials, printing defects include incomplete text, white specks, ink splatters, stains, misprints, and missed prints, as shown in Fig. 1. Because the same defect type can vary widely in shape, building a large annotated defect dataset is a significant challenge: manually collecting and annotating data is both expensive and time-consuming. Although some unsupervised zero-shot anomaly detection methods exist33,34, they still cannot avoid time-consuming network training, which is unacceptable in practical self-adhesive printing production.

Fig. 1
figure 1

Representative printing defect samples. Standard images (top); defect images (bottom).

During image acquisition of printed materials, irregular local brightness fluctuations may occur due to factors such as mechanical vibration and inconsistent positioning and orientation of the printed materials. In addition, as shown in Fig. 2, the complex graphic regions in images of printed materials often exhibit textures of diverse shapes and sizes, making printing defect detection more difficult. Consequently, mitigating the impact of brightness fluctuations and texture complexity is a key concern. This paper addresses the high false detection rate in printing defect detection caused by uneven brightness, complex textures, and the pseudo defects introduced by the image difference method. It does so through two core components: texture filtering with brightness correction, and two-stage defect discrimination.

Fig. 2
figure 2

Examples of image texture.

In summary, the main contributions of this paper are as follows:

  1. (1)

    To alleviate the interference caused by complex textures and uneven illumination, a joint bilateral filter that incorporates a nonlinear brightness correction factor is introduced to preprocess the images of self-adhesive printed materials.

  2. (2)

    An improved image difference method based on a dual difference template group is proposed for the rapid localization of printing defects, serving as the first stage of detection.

  3. (3)

    In the second stage of detection, a discriminative method utilizing multi-feature similarity is employed to filter out small-scale pseudo-defective areas to minimize the false detection rate.

The structure of this paper is as follows. Section “Related work” briefly reviews the relevant work, including brightness correction, image filtering, detection model based on difference templates, and image feature description algorithms. Section “Methodology” provides an in-depth analysis of the proposed method. To verify its effectiveness, Section “Experiments and analysis” provides the results of relevant experiments and visual analysis, and Section “Conclusions” concludes this article.

Related work

Image brightness correction algorithm

At present, in the field of image brightness or color calibration, most methods focus on enhancing or balancing the brightness of low-light or unevenly illuminated images35,36,37,38,39. However, these methods cannot ensure similar brightness or color distributions between two images. Niu et al.40,41 used a reference image to calibrate the color of the target image and align its color distribution. However, this calibration tended to completely eliminate defect areas, making it unsuitable for surface defect detection.

Image filtering algorithm

In the field of image processing, a range of filters is used to smooth noise or texture. Early filters such as the mean filter42, median filter43, and Gaussian filter44 were commonly used and smoothed scattered noise satisfactorily. However, their performance was limited for textures occurring in area blocks. Subsequently, the bilateral filter45, the guided filter46, and the joint bilateral filter47, an extension of the bilateral filter, were proposed.

Peak Signal-to-Noise Ratio (PSNR) is a widely used metric that quantifies the pixel-level error between an original image and a processed image. It is a prevalent indicator for evaluating the effect of image texture smoothing; a higher PSNR indicates a smaller quality difference between the two images. As shown in Fig. 3, the PSNR between the image after joint bilateral filtering and the texture-free image is the largest, meaning it is closest to the texture-free state. Evidently, the joint bilateral filter exhibits the best texture-smoothing effect.
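For reference, PSNR is derived from the mean squared error (MSE) between the two images; for 8-bit images the standard definition is:

$$ \mathrm{PSNR} = 10\log_{10}\frac{255^{2}}{\mathrm{MSE}}, \qquad \mathrm{MSE} = \frac{1}{HW}\sum_{x,y}\left(I_{1}(x,y) - I_{2}(x,y)\right)^{2} $$

where H and W denote the image height and width.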

Fig. 3
figure 3

The comparative effects of image filtering.

Difference template detection algorithm

The defect detection model based on difference template18 computed the mean and standard deviation from multiple grayscale defect-free images, then constructed the bright and dark templates by combining the mean and deviation images with a predefined threshold. Pixels in the test image were deemed abnormal if their values exceeded those of the bright template or fell below those of the dark template.

The mean image and deviation image are calculated as follows:

$$ I_{r} \left( {x,y} \right) = \frac{1}{n}\sum\limits_{i = 1}^{n} {I_{i} \left( {x,y} \right)} $$
(1)
$$ I_{\sigma } \left( {x,y} \right) = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {I_{i} \left( {x,y} \right) - I_{r} \left( {x,y} \right)} \right)^{2} } } $$
(2)

where Ii (i = 1, 2, …, n) represents the defect-free samples, Ir and Iσ represent the mean image and the deviation image, respectively, and n denotes the number of samples.

The bright and dark templates are calculated as follows:

$$ \begin{cases} I_{b}(x,y) = I_{r}(x,y) + \max\left\{ a_{1},\; b_{1} \times I_{\sigma}(x,y) \right\} \\ I_{d}(x,y) = I_{r}(x,y) - \max\left\{ a_{2},\; b_{2} \times I_{\sigma}(x,y) \right\} \end{cases} $$
(3)

where a1, b1 and a2, b2 are the absolute and relative thresholds for constructing the bright and dark templates, respectively; all four are empirical values determined experimentally, usually with a1 = a2 and b1 = b2.
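As a minimal NumPy sketch of this template construction (function and variable names are ours, not from reference18; threshold defaults are placeholders):

```python
import numpy as np

def build_difference_templates(samples, a1=10.0, b1=3.0, a2=10.0, b2=3.0):
    """Build bright/dark templates from defect-free grayscale samples (Eqs. 1-3).

    samples: array of shape (n, H, W), defect-free grayscale images.
    a1, b1, a2, b2: absolute/relative thresholds (empirical values).
    """
    stack = np.asarray(samples, dtype=np.float64)
    I_r = stack.mean(axis=0)                      # mean image, Eq. (1)
    I_sigma = stack.std(axis=0)                   # deviation image (1/n form), Eq. (2)
    I_b = I_r + np.maximum(a1, b1 * I_sigma)      # bright template, Eq. (3)
    I_d = I_r - np.maximum(a2, b2 * I_sigma)      # dark template, Eq. (3)
    return I_r, I_sigma, I_b, I_d
```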

Image feature description algorithm

Image features are data or symbolic representations used to describe structural information in an image. Common image features include texture and shape descriptors such as Histogram of Oriented Gradient (HOG)48,49,50 and Local Binary Pattern (LBP)49,50, pixel frequency distribution represented by Fast Fourier Transform (FFT)51,52, and geometric structure characterized by Hu moments48,53,54. Assessing the similarity of image features enables the evaluation of the matching between two objects55.

Methodology

Overview

The process of our proposed method is divided into three parts: image filtering with brightness correction, dual difference template group detection model as the first stage, and multi-feature fusion and comparison as the second stage, as shown in Fig. 4.

Fig. 4
figure 4

Overall framework of the proposed method.

All input images are registered and cropped before image rectification and detection to ensure consistency in position, rotation angle, and size. In the brightness correction and filtering part, a nonlinear brightness correction function is introduced to adjust local brightness appropriately and eliminate complex textures, so that the brightness distribution of the rectified test image is closer to that of the standard reference image. In the first stage, a dual difference template group performs an image difference operation with the test image to locate defects; the relatively small pixel blocks obtained are treated as defect candidates. In the second stage, subimages are extracted from the templates and the test image at the regions of these small defect candidates. The HOG and FFT feature distances between the subimages are calculated and then compared after adaptive dynamic weighting, accurately identifying the actual small defects.

Texture filtering with brightness correction

As shown in Fig. 5, the spatial distance between a neighborhood pixel and the central pixel determines the corresponding spatial-___domain weight; the weight decreases as the distance increases. In the reference image used to guide filtering, pixel-___domain weights are determined by the difference between the pixel value at the central position and those of the neighboring pixels; the weights decrease as these differences increase. Similarly, the brightness correction weights are determined by the pixel value difference between the original image and the reference image at the same position. The reference image is the mean image computed according to formula (1). To reduce computational overhead, weight lists are generated in advance, allowing rapid access to the brightness correction weights and filtering weights through indexing. The weight lists are established as follows:

$$ List\left( m \right) = {\text{e}}^{{ - \frac{{m^{2} }}{{2 \cdot \sigma^{2} }}}} $$
(4)

where List is the weight table, m (m = 0, 1, …, N) denotes the pixel value difference or the spatial distance of a pixel along the x or y direction, and N is the maximum length of the List. The maximum length of the weight list in the pixel and correction domains is the maximum pixel value of the image, while the maximum length in the spatial ___domain is the filter kernel size. σ serves as the weight adjustment parameter.
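A sketch of the lookup-table construction (names are ours; the kernel radius below is an assumed example, and the σ values follow the paper's final choices in the experiments):

```python
import numpy as np

def gaussian_weight_list(max_len, sigma):
    """Precompute Gaussian weights List(m) = exp(-m^2 / (2*sigma^2)), Eq. (4)."""
    m = np.arange(max_len + 1, dtype=np.float64)
    return np.exp(-(m ** 2) / (2.0 * sigma ** 2))

# Example lists for the pixel, correction, and spatial domains
list_r = gaussian_weight_list(255, sigma=20)  # pixel ___domain (max pixel value 255)
list_b = gaussian_weight_list(255, sigma=20)  # brightness correction ___domain
list_s = gaussian_weight_list(5, sigma=3)     # spatial ___domain (kernel radius 5, assumed)
```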

Fig. 5
figure 5

The principle of joint bilateral filtering with brightness correction. Listb, Listr, and Lists represent the weight lists of the brightness correction, pixel, and spatial domains, where the numeric codes denote indices into the lists.

The following formulas describe filtering with a brightness correction factor:

$$ s_{k} = s_{k,x} \times s_{k,y} $$
(5)
$$ C_{k} = 1 - b_{k} \times \ln \frac{{I_{k} + \varepsilon }}{{J_{k} + \varepsilon }} $$
(6)
$$ I^{\prime}_{p} = \frac{1}{{\sum {(r_{k} \times s_{k} )} }}\sum\limits_{{k \in \omega_{p} }} {(r_{k} \times s_{k} \times I_{k} \times C_{k} )} $$
(7)

where p and k represent the central pixel position and a neighborhood pixel position, respectively. I and J represent the pixels of the original image and the reference image, respectively. bk is the brightness correction weight, ε is a small constant used to prevent numerical errors, and Ck is the brightness correction factor. sk,x and sk,y represent the components of the spatial-___domain weight in the x and y directions, respectively. rk denotes the pixel-___domain weight, and I′p is the rectified pixel value.

By incorporating the ratio of pixel values between the original and reference images, the original image pixels undergo a nonlinear brightness transform during filtering. As previously mentioned, σ serves as the weight adjustment parameter; the adjustment parameter for brightness correction is denoted σb, while those for the filtering weights are σr and σs. Adjusting these parameters changes the extent of brightness correction and image smoothing.
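A direct, unoptimized sketch of Eqs. (5)–(7) under our reading of the method (images assumed to be float arrays in [0, 255]; the weight lists come from Eq. (4); production code would vectorize the loops):

```python
import numpy as np

def corrected_joint_bilateral(I, J, list_s, list_r, list_b, radius=5, eps=1e-6):
    """Joint bilateral filtering with a brightness correction factor (Eqs. 5-7).

    I: original grayscale image; J: reference (mean) image, same shape.
    list_s / list_r / list_b: precomputed weight lists (Eq. 4).
    """
    H, W = I.shape
    out = np.zeros_like(I, dtype=np.float64)
    for y in range(H):
        for x in range(W):
            num, den = 0.0, 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ky, kx = y + dy, x + dx
                    if not (0 <= ky < H and 0 <= kx < W):
                        continue  # skip pixels outside the image
                    s_k = list_s[abs(dx)] * list_s[abs(dy)]        # spatial weight, Eq. (5)
                    r_k = list_r[int(abs(J[ky, kx] - J[y, x]))]    # pixel-___domain weight (guide image)
                    b_k = list_b[int(abs(I[ky, kx] - J[ky, kx]))]  # brightness correction weight
                    C_k = 1.0 - b_k * np.log((I[ky, kx] + eps) / (J[ky, kx] + eps))  # Eq. (6)
                    num += r_k * s_k * I[ky, kx] * C_k             # numerator of Eq. (7)
                    den += r_k * s_k
            out[y, x] = num / den
    return out
```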

First stage detection with dual difference template group

Although the difference template model18 achieved satisfactory performance on printed materials with simple graphical content, using the same deviation image for both the bright and dark templates overstates the brightness variation.

As shown in Fig. 6a, the bright and dark templates are constructed by statistically analyzing the maximum and minimum pixel values between the defect-free image samples in the R, G, and B channels, respectively. The computation method can be represented as follows:

$$ \begin{cases} I_{b}(x,y) = \mathrm{Maximum}\left( I_{i}(x,y) \right) \\ I_{d}(x,y) = \mathrm{Minimum}\left( I_{i}(x,y) \right) \end{cases} $$
(8)

where Ib and Id denote the bright template and dark template, respectively.

Fig. 6
figure 6

Detailed illustration of the dual difference template group detection model. (a) The construction process of the template group. (b) The defects detection process, including abnormal pixels extraction, defect segmentation and localization.

The templates represent the extreme values of brightness fluctuation in the defect-free samples and thus describe, to a certain extent, the illumination changes in the shooting environment. Referring to formula (2), the bright deviation template is calculated from the bright template and the mean image; similarly, the dark deviation template is obtained from the dark template and the mean image. A template and its deviation template together form a difference template group.
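A minimal sketch of the template group construction per channel (computing each deviation template as the difference between the extreme template and the mean image is our reading of the description above, not a formula stated in the text):

```python
import numpy as np

def build_dual_template_group(samples):
    """Construct a dual difference template group (Eq. 8 plus deviation templates).

    samples: array (n, H, W) of defect-free images for one channel.
    """
    stack = np.asarray(samples, dtype=np.float64)
    I_r = stack.mean(axis=0)   # mean image, Eq. (1)
    I_b = stack.max(axis=0)    # bright template, Eq. (8)
    I_d = stack.min(axis=0)    # dark template, Eq. (8)
    V_b = I_b - I_r            # bright deviation template (assumed, cf. Eq. 2)
    V_d = I_r - I_d            # dark deviation template (assumed)
    return I_b, I_d, V_b, V_d
```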

As shown in Fig. 6b, based on the characteristics of a single-channel image, any defect can be classified as a bright or dark defect. After subtracting the bright difference template group from the test image, pixels exceeding the bright threshold are classified as bright abnormal pixels. Similarly, pixels exceeding the dark threshold after subtracting the test image from the dark difference template group are classified as dark abnormal pixels. Abnormal pixels are distinguished by:

$$ \begin{cases} I(x,y) - I_{b}(x,y) - V_{b}(x,y) > T_{b} \\ I_{d}(x,y) - V_{d}(x,y) - I(x,y) > T_{d} \end{cases} $$
(9)

where I(x,y) represents the pixel value of the image to be detected. Vb and Vd are the bright and dark deviation templates. Tb and Td denote the discrimination thresholds of bright and dark defects, respectively, usually Tb = Td.

The abnormal pixel mark maps from the R, G, and B channels are combined into a binarized defect segmentation map. In this map, a connected abnormal region whose area exceeds 100 pixels is deemed a defect, while small regions with areas of 20–100 pixels are treated as defect candidates.
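A sketch of this first-stage decision for a single channel (names and threshold defaults are ours; in the full pipeline the R, G, and B masks are OR-combined before the connected-component step):

```python
import cv2
import numpy as np

def first_stage_detect(I, I_b, I_d, V_b, V_d, T_b=10.0, T_d=10.0):
    """First-stage detection: abnormal pixels (Eq. 9) and area-based grouping.

    I: single-channel test image (float); templates/deviations from the group.
    Returns masks of confirmed defects (>100 px) and candidates (20-100 px).
    """
    bright = (I - I_b - V_b) > T_b            # bright abnormal pixels
    dark = (I_d - V_d - I) > T_d              # dark abnormal pixels
    mask = (bright | dark).astype(np.uint8)   # binarized segmentation map
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    defects = np.zeros_like(mask)
    candidates = np.zeros_like(mask)
    for i in range(1, n):                     # label 0 is background
        area = stats[i, cv2.CC_STAT_AREA]
        if area > 100:
            defects[labels == i] = 1          # confirmed defect region
        elif 20 <= area <= 100:
            candidates[labels == i] = 1       # small candidate for stage two
    return defects, candidates
```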

Second stage detection with local feature fusion

In the first stage of detection, small false abnormal areas may arise from pixel values near the critical threshold, leading to false detections. We therefore perform feature comparison on the defect candidate regions for further judgment.

Because defects differ significantly from non-defective regions in contour and shape, HOG features, which excel at describing target contours and shapes, can effectively discriminate genuine defects from false ones. However, HOG features may misclassify complex image regions lacking distinct contours and gradient direction information as defect areas. On the other hand, since similar graphical regions share similar pixel distribution structures, FFT features, with their strength in describing overall image structure and pixel frequency information, can partially compensate for the shortcomings of HOG features. Therefore, second stage detection is performed on the candidate regions using both HOG and FFT features.

The flowchart of feature extraction and distance fusion is shown in Fig. 7. Subimages are extracted from the templates and the test image according to the candidate regions, then converted to grayscale. These grayscale subimages are divided into cells, which are grouped into blocks that may overlap. The gradient directions and magnitudes of all pixels within each cell are accumulated into a histogram whose horizontal axis is the gradient direction and whose vertical axis is the cumulative magnitude for each direction interval. Typically, the gradient direction ranges from 0° to 180° and is divided into nine equal bins. The feature vectors of the cells in each block are concatenated to obtain the block descriptor, and the final HOG feature of the subimage is obtained by concatenating the feature vectors of all blocks. The FFT feature is computed over the entire subimage, with the feature vector obtained by flattening the transform result, including both magnitude and phase information.

Fig. 7
figure 7

The detail of the features extraction and distance fusion.
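As an illustration, the extraction step might be sketched as follows (a minimal sketch using scikit-image's hog and NumPy's FFT; fixed cell and block sizes are used here for brevity, whereas the paper sizes them dynamically as described next):

```python
import numpy as np
from skimage.feature import hog

def extract_features(subimage_gray, cell=(8, 8), block=(2, 2)):
    """Extract HOG and FFT feature vectors from a grayscale subimage."""
    # 9 unsigned orientation bins over 0-180 degrees, as described above
    hog_vec = hog(subimage_gray, orientations=9,
                  pixels_per_cell=cell, cells_per_block=block)
    spectrum = np.fft.fft2(subimage_gray)
    # Flatten magnitude and phase into a single feature vector
    fft_vec = np.concatenate([np.abs(spectrum).ravel(),
                              np.angle(spectrum).ravel()])
    return hog_vec, fft_vec
```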

Since the acquired candidate regions vary in size, the dimensions of the cells and blocks for HOG feature extraction are not fixed but dynamically adjusted. The specific division process is shown in Fig. 8. The default initial size of cells and blocks is 2 × 2. To limit the dispersion of the feature data, which affects descriptive effectiveness, the number of cells is capped at 30. When this requirement is not met, the cell size is incrementally enlarged along the width and height axes of the subimage. In addition, if the cells cannot form a 2 × 2 block, the block size is reduced to 1 × 1.

Fig. 8
figure 8

The size setting process of cell and block.
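A sketch of this sizing rule as we read it from the description and Fig. 8 (the alternating width/height growth order is our assumption):

```python
def choose_cell_block(width, height, max_cells=30):
    """Dynamically size HOG cells and blocks for a candidate subimage.

    Start from 2x2 cells, enlarge the cell alternately along width and height
    until at most max_cells cells remain; fall back to 1x1 blocks when a
    2x2 block cannot be formed.
    """
    cell_w, cell_h = 2, 2
    grow_w = True
    while (width // cell_w) * (height // cell_h) > max_cells:
        if grow_w:
            cell_w += 1
        else:
            cell_h += 1
        grow_w = not grow_w
    cells_x, cells_y = width // cell_w, height // cell_h
    block = (2, 2) if cells_x >= 2 and cells_y >= 2 else (1, 1)
    return (cell_w, cell_h), block
```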

In feature-based defect detection methods, image similarity is usually evaluated by calculating the feature distance between the test image and reference image to distinguish defects. However, the importance of information described by different features varies for different regions. Appropriate weights should be assigned to different types of features so that defects can be reliably detected based on the weighted distance.

The chi-square distance is used to calculate the HOG feature distance between the subimages of the defect candidate region in the test image, the bright template, and the dark template. Similarly, the Euclidean distance is used to calculate the FFT feature distance.

$$ D_{hog} \left( {H_{1} ,H_{2} } \right) = 2\sum\limits_{I} {\frac{{\left( {H_{1} \left( I \right) - H_{2} \left( I \right)} \right)^{2} }}{{H_{1} \left( I \right) + H_{2} \left( I \right)}}} $$
(10)
$$ D_{fft} \left( {F_{1} ,F_{2} } \right) = \sqrt {\sum\limits_{i} {\left( {F_{1} \left( i \right) - F_{2} \left( i \right)} \right)^{2} } } $$
(11)

where H and F represent the HOG feature vectors and FFT feature vectors, respectively. Dhog and Dfft denote the feature distance between two subimages. I and i refer to the indices of the columns in the respective feature row vectors.
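The two distances translate directly into NumPy (the small eps stabilizer in the chi-square denominator is our addition, not part of Eq. (10)):

```python
import numpy as np

def hog_distance(H1, H2, eps=1e-12):
    """Chi-square distance between HOG feature vectors, Eq. (10)."""
    return 2.0 * np.sum((H1 - H2) ** 2 / (H1 + H2 + eps))

def fft_distance(F1, F2):
    """Euclidean distance between FFT feature vectors, Eq. (11)."""
    return np.sqrt(np.sum((F1 - F2) ** 2))
```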

Feature distances are computed for three pairs of subimages: bright template and dark template, bright template and test image, and dark template and test image, yielding six distance values across the two feature categories.

$$ X = \begin{pmatrix} x_{11} & \cdots & x_{1m} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{nm} \end{pmatrix} = \begin{pmatrix} D_{hog,bd} & D_{hog,bt} & D_{hog,dt} \\ D_{fft,bd} & D_{fft,bt} & D_{fft,dt} \end{pmatrix} $$
(12)

From the preceding distance data, the weights for the HOG and FFT feature distances are derived using the coefficient of variation method, with the calculation formulas as follows:

$$ V_{j} = \frac{{S_{j} }}{{A_{j} }} $$
(13)
$$ w_{j} = \frac{{V_{j} }}{{\sum\nolimits_{j = 1}^{n} {V_{j} } }} $$
(14)

where Aj and Sj are the mean and the standard deviation of the distance data of each feature category, respectively. Vj denotes the coefficient of variation of the data, and wj is the weight corresponding to the feature distance of each category.

Since the coefficient of variation method assigns greater weights to categories with larger internal data discrepancies, while our method prioritizes the feature distances with smaller differences, the derived weights are inverted (via the negative exponential in formula (15)) before being applied to the corresponding distances.

$$ \begin{pmatrix} D_{fusion,bd} & D_{fusion,bt} & D_{fusion,dt} \end{pmatrix} = \begin{pmatrix} e^{-w_{1}} \\ e^{-w_{2}} \end{pmatrix}^{T} \begin{pmatrix} D_{hog,bd} & D_{hog,bt} & D_{hog,dt} \\ D_{fft,bd} & D_{fft,bt} & D_{fft,dt} \end{pmatrix} $$
(15)

where Dfusion,bd, Dfusion,bt, and Dfusion,dt represent the weighted, fused feature distances between the bright and dark templates, the bright template and the test image, and the dark template and the test image, respectively, after nonlinear weight enhancement.

If both Dfusion,bt and Dfusion,dt are greater than Dfusion,bd, the current subimage (It) is deemed abnormal, indicating a defect in the corresponding region of the original image. The discrimination is as follows:

$$ I_{t} = \begin{cases} \text{Abnormal}, & D_{fusion,bt} > D_{fusion,bd} \text{ and } D_{fusion,dt} > D_{fusion,bd}, \\ \text{Normal}, & \text{otherwise}. \end{cases} $$
(16)
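Putting formulas (12)–(16) together, the second-stage decision for one candidate region might be sketched as follows (names are ours; a minimal illustration under the definitions above, not the authors' exact implementation):

```python
import numpy as np

def classify_candidate(d_hog, d_fft):
    """Second-stage discrimination for one candidate region (Eqs. 12-16).

    d_hog, d_fft: length-3 arrays of distances for the subimage pairs
    (bright-dark, bright-test, dark-test). Returns True for a genuine defect.
    """
    X = np.vstack([d_hog, d_fft])      # 2x3 distance matrix, Eq. (12)
    A = X.mean(axis=1)                 # per-category means
    S = X.std(axis=1)                  # per-category standard deviations
    V = S / A                          # coefficients of variation, Eq. (13)
    w = V / V.sum()                    # normalized weights, Eq. (14)
    fused = np.exp(-w) @ X             # inversely weighted fusion, Eq. (15)
    d_bd, d_bt, d_dt = fused
    return d_bt > d_bd and d_dt > d_bd  # discrimination rule, Eq. (16)
```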

Experiments and analysis

Experimental preparation

Dataset construction: As shown in Fig. 9, we use an array camera to sample images of self-adhesive printed materials with five different patterns. The dataset consists of 1772 RGB images with a resolution of 2448 × 2048 pixels; the sample counts for patterns 1 to 5 are 371, 474, 446, 249, and 232, including 45, 56, 54, 21, and 19 defect samples, respectively. For each pattern, 20 defect-free images are collected for constructing the difference templates, and the remaining images are used for testing.

Fig. 9
figure 9

Exemplary samples and image sampling platform.

Evaluation metrics: We use three standard quantitative indicators to evaluate the defect detection results: precision (P), false detection rate (FDR), and missed rate (MR).
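The paper does not restate the formulas here; assuming the standard definitions (an assumption on our part), with TP, FP, and FN denoting true positives, false positives, and false negatives:

$$ P = \frac{TP}{TP + FP}, \qquad FDR = \frac{FP}{TP + FP}, \qquad MR = \frac{FN}{TP + FN} $$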

Brightness correction and filtering parameters setting

As shown in Fig. 10, we conduct separate experiments and analyses on the settings of the brightness correction and filtering parameters. Images of 600 × 200 pixels, divided into three intervals, are used to verify the experimental effects. Each interval is assigned a fixed pixel value distribution range to simulate the brightness and noise distribution of the images.

Fig. 10
figure 10

The brightness correction and filtering effects under different parameters. (a) The pixel value distribution ranges of the three intervals for the original image and reference image are ([58,62][70,75][200,205]) and ([70,75][70,75][170,175]), respectively. (b) The pixel value distribution ranges of the three intervals for the images are ([55,65][75,85][120,170]). The horizontal axis in the graphs represents the pixel coordinates of the image, while the vertical axis represents the pixel value. The red profile line indicates the pixel value variation of the 100th row of pixels in the image.

The brightness correction effects can be observed through the profile lines in Fig. 10a. With σb held fixed, increasing σs does not affect the brightness correction effect. Conversely, with σs unchanged, a larger σb produces a more pronounced brightness correction. Moreover, the corrected image can reach the same brightness range as the reference image only when σb exceeds the brightness difference between the two images.

Fig. 10b illustrates the image smoothing effects under different filtering parameters. The smoothness of the profile lines, their slope at the boundaries of the three intervals, and the clarity of the image boundaries provide intuitive indications of the filtering effect. The profile lines of the third interval reveal that a good smoothing effect is achieved only when σr exceeds the range of noise variation. The image comparison in the third row shows that larger σr values blur the edges more. Additionally, σs contributes little to the smoothing effect, and larger values of this parameter actually blur the image edges further.

Based on the above analysis, together with the brightness variation range and texture pixel fluctuation range of the experimental samples, the optimal parameters are selected as σb = 20, σs = 3, and σr = 20, which preserve edge details in images of printed materials while avoiding the elimination of defect regions during brightness correction.

Brightness correction and filtering experiment

Figure 11 illustrates the effects of texture smoothing and brightness correction. The histograms in Fig. 11a show that after brightness correction, the brightness distribution of the rectified image is close to that of the reference image. Meanwhile, Fig. 11b–e plot the pixel values along image profile lines in the second column, with Fig. 11e demonstrating that our filtering with brightness correction achieves the best smoothing effect. As shown in the third-column figures, the texture in the smoothed image has been effectively eliminated.

Fig. 11
figure 11

The effects of brightness correction and filtering. (a) Gray histograms of the V-channel images in the HSV color space corresponding to the original image, reference image, filtered image, and rectified image. (b)–(e) The first column shows locally enlarged subimages corresponding to the above images; the second column shows the 250th-row profile lines of the first-column subimages; the third column shows the local texture regions of the first-column subimages.

First stage defect detection experiment

We employed both the detection model outlined in reference18 and our improved version to conduct defect localization experiments on the samples of the five printing patterns. The results are presented in Tables 1 and 2, respectively. It is apparent that the reference method struggles to reduce the false detection rate (FDR) and missed rate (MR) simultaneously. Furthermore, the improved detection model requires the configuration of relatively fewer detection parameters.

Table 1 Experimental results of the difference template detection model18.
Table 2 Experimental results of the dual difference template group detection model.

Second stage defect detection experiment

Feature distance fusion experiment: First, we independently validate the effect of HOG features and FFT features in representing images, with detailed comparisons illustrated in the first and second rows of Fig. 12. We selected 12 typical normal and 12 typical abnormal region subimages to illustrate how well the HOG and FFT features represent them. Referring to discrimination formula (16), a subimage is deemed abnormal if both the "image-bright template" and "image-dark template" distances are greater than the "bright-dark template" distance (both the orange triangle and the green rectangle lie above the blue circle); otherwise it is deemed normal. The distribution plots in Fig. 12a show that FFT features discriminate normal subimages well, whereas the HOG feature distances incorrectly represent some normal subimages as abnormal. Conversely, the HOG features in Fig. 12b exhibit strong representational capability for abnormal subimages, yet the FFT feature distances struggle to identify all abnormal subimages accurately.

Fig. 12
figure 12

Feature distance data distribution diagram of subimages. (a) Normal region subimages. (b) Abnormal region subimages.

Based on the above analysis, the HOG and FFT feature distances computed according to formulas (10)–(12) are adaptively weighted via formulas (13)–(15), yielding a comprehensive distance with superior representational capability. It is evident from the graph that the fused feature distance adeptly distinguishes between normal and abnormal subimages.

Defect detection experiment: We used the above feature fusion distance comparison to perform the second stage detection on the five pattern samples and compared the outcomes with those of the first stage, as presented in Table 3. The comparison reveals that the pseudo-defective areas were effectively filtered out by the image feature discrimination, significantly reducing the false detection rate. Fig. 13 shows the final detection results for different defect types.

Table 3 Comparison of first and second stage detection results.
Fig. 13
figure 13

Results of printing defect detection. Binary mark maps of defect segmentation (top); defect localization images (bottom).

Ablation study of brightness correction: To demonstrate the necessity of image brightness correction for defect detection, we compared the detection results under optimal detection thresholds with and without brightness correction, as shown in Table 4. Fig. 14 shows that the brightness unevenness of the pattern 4 and 5 samples is relatively minor, so the differences in FDR are inconspicuous. However, even when the threshold is adjusted to achieve a lower FDR without correction, a higher MR may still occur.

Table 4 Comparison of detection results without and with brightness correction.
Fig. 14
figure 14

Analysis results of brightness unevenness in the samples of each pattern. The variance represents the degree of brightness variation.

Comparison experiment and time complexity

Table 5 shows the performance comparison with recent baselines. The proposed method yields the highest precision and the lowest FDR and MR, indicating a significant detection advantage; compared with Liu et al.13 in particular, the average indicators improve by 34.3%, 34.5%, and 29.2%, respectively. Their method performed adaptive-threshold binarization before difference detection, which is significantly affected by lighting and therefore struggles to achieve satisfactory results. Li et al.17 showed good detection precision on the pattern 1 and 3 samples and a generally lower MR than Liu et al.13. However, their method was less compatible with the other printing patterns, and the sliding subimage block process was extremely time-consuming.

Table 5 Comparison with existing baselines.

Defect detection experiment on additional datasets

To validate that our method remains effective on practical industrial datasets, we conducted further experiments using a text dataset56 and the leather dataset from MVTec57, with detailed detection results shown in Table 6. The proposed method yields a higher MR on these datasets. The primary reason is likely that the constructed detection templates have an overly broad tolerance range, a consequence of the considerable overall brightness variation introduced by the artificial data augmentation applied to the original samples in the text dataset. Additionally, the rough and irregular surface textures of the leather samples require a larger smoothing parameter to eliminate texture interference, which significantly diminishes the prominence of many defect features. Nevertheless, our method still exhibits substantial detection effectiveness, as illustrated in Fig. 15.

Table 6 Experimental results on additional datasets.
Fig. 15
figure 15

Qualitative defect detection effects on additional datasets. Binary mark maps of defect segmentation (top); defect localization images (bottom).

Conclusions

Few existing methods for printing defect detection address self-adhesive label printed materials with complex textures under uneven lighting, and the common image difference method often generates pseudo-defective regions. To solve these problems, we propose a two-stage printing defect detection method with brightness correction and texture filtering. The following conclusions can be drawn:

  1. (1)

    A brightness correction factor is incorporated into the joint bilateral filter, which facilitates local brightness adjustments to a certain extent and texture smoothing in images of printed materials, initially reducing the interference of uneven brightness and complex textures in defect detection.

  2. (2)

    Compared with the traditional printing defect detection method, the proposed difference method based on bright–dark difference template groups accurately describes the actual brightness variations of the image through templates, enabling excellent extraction and localization of defects despite the interference of slight brightness fluctuations.

  3. (3)

    To overcome the challenge of pseudo defects generated by the image difference method, we conduct a second-stage detection on small defect candidate regions by leveraging the advantages of HOG in describing image contours and FFT features in describing image pixel frequency distributions, which further reduces the false detection rate.

The experimental results indicate that the proposed method achieves excellent performance, with false detection rate and missed rate both below 0.5%.