Introduction

Archaeological ceramic artifacts, with their long history and rich cultural significance, embody the skill and ingenuity of ancient craftspeople. They hold considerable collectible value and are important objects of cultural heritage preservation and scholarly study. Research on archaeological ceramics helps trace the evolution of ancient technology and provides crucial insights into historical trade networks and cultural exchanges.

Over the past few decades, extensive research has been conducted on archaeological ceramic identification, which can be broadly classified into four categories: (1) authenticity identification, (2) provenance classification, (3) production period identification, and (4) market valuation. Manual identification for these tasks requires extensive prior knowledge, and even well-trained experts may reach inconsistent judgments on the same ancient ceramic. Consequently, quantitative research techniques have developed rapidly. Ancient ceramics are characterized by features such as vessel type, size, glaze characteristics, and color. Commonly studied properties include the microscopic structure of archaeological ceramics1,2,3, firing temperature, raw material treatment4,5,6, differences in elemental composition7,8, and color characteristics. These can be obtained by 3D scanning, microscopic observation, X-ray diffraction (XRD), scanning electron microscopy (SEM), spectral analysis, and chromatic aberration measurement. These techniques can recover the size, shape, and proportions of a ceramic. They can also reveal the composition, thickness, and firing temperature of the glaze and its interaction with the body, and quantify color characteristics. From this information, the production process, functional use, origin, historical period, and cultural background of the ceramic can be inferred. However, most of these techniques require some degree of material destruction, expensive equipment, and sample preparation, making them costly and time-consuming.

The compositional discrimination method identifies the kiln origins of archaeological ceramics by analyzing differences in the materials used for the body and glaze of ceramics from different kilns9,10,11. Because archaeological ceramic samples are scarce and precious, chemical composition data must be collected non-destructively. Qi et al. used energy dispersive X-ray fluorescence (ED-XRF), a non-destructive technique, to obtain the chemical composition of archaeological ceramic samples. They then applied fully convolutional networks to classify black-glazed wares from adjacent kilns, achieving an accuracy of 92.76%. Ruschioni G et al.12 applied ten different supervised machine learning algorithms to classify ceramic samples based on chemical element concentrations, with accuracies ranging from 0.85 to 0.95. However, because body and glaze compositions vary considerably among kilns, obtaining representative samples for standardization is challenging.

Diffuse reflectance spectroscopy eliminates the need to take samples directly from archaeological ceramics, thereby avoiding potential damage to the artifacts, and the equipment required is relatively simple. In the visible range, diffuse reflectance spectra reflect the color characteristics of archaeological ceramics. In the ultraviolet and near-infrared ranges, the spectra result from the interaction of light with molecules inside the ceramic and carry rich information about its structure and material properties13,14,15. The diffuse reflectance signals of archaeological ceramics are nonlinear and non-stationary. For this reason, the main difficulty of this method lies in extracting spectral features from different bands and assigning appropriate weights to the corresponding feature vectors.

Ultrasonic characterization of the ceramic material has also been used for the provenance classification of archaeological ceramics. Salaza et al. combined semi-supervised active learning with ultrasonic features to classify ceramic artifacts non-destructively16. These features included time-___domain, frequency-___domain, and statistical variables defined by a ceramic material reflectivity model. Although this method achieved a high success rate, its drawback is that the ultrasonic features must be designed manually.

Jin et al. analyzed the color characteristics of celadon from different kiln sites and dynasties, establishing a correspondence between sensory descriptions of celadon color and quantitative descriptions in color space17. This approach enabled the creation of visualized color space maps for various celadon types, achieving standardization of celadon color.

Cardarelli L proposed a deep variational convolutional autoencoder for unsupervised feature extraction from ceramic contours to analyze ceramic morphology quantitatively18. Sun et al. extracted and analyzed the shape parameters of ceramic vessels from the Tang through the Song and Yuan dynasties19. They found that the morphological characteristics of spouts and handles, as well as the position of the center of gravity, are key parameters for dating ceramics. These variations in shape reflect the culture, production processes, and lifestyles of different dynasties. Navarro P et al. used a deep learning architecture to extract ceramic contour features automatically, without prior knowledge or engineered features20. They applied transfer learning by retraining a residual neural network on a database of binary images of Iberian wheel-thrown pottery contours and classified vessel types with an average accuracy of 0.96.

Microscopic image analysis of ceramic fragments can serve as an effective basis for sourcing and dating ceramics. The textures, pores, and bubble structures visible in microscopic images can be used to characterize ceramics from different kilns. Li et al. analyzed the microstructure of celadon and observed that, compared with typical celadon, the rice-colored glaze was simpler and more uniform and contained fewer bubbles. They summarized the differences in microscopic images between rice-colored celadon and conventional Yue celadon from different periods21. Zhao et al. compared the microscopic features of celadon from different periods at the Yaozhou kiln using stereomicroscopy and polarizing microscopy22. Wang et al. compared the microscopic features of the body and glaze regions of ancient Chinese ceramics and summarized their similarities and differences23. In ceramic microscopic image processing, Li et al. proposed an automatic mosaicking algorithm for ceramic microscopic images that improves matching quality and efficiency24. Carreira D et al. proposed a convolutional neural network that fuses features from multimodal ceramic images to classify eight ceramic classes with high accuracy25. Bachar A et al. examined how different firing processes affect the microstructure and color of enamel glazes26. They found that enamel surfaces fired in traditional kilns were more heterogeneous and more cracked, whereas those fired with infrared heating were more homogeneous and smoother. Classification and prediction based on microscopic images nevertheless face many challenges. These arise from the complex and varied nature of microscopic imagery and from the inherent difficulty of the classification task27.
These challenges include image quality problems such as uneven grayscale distribution, imbalanced illumination, and background noise. They also include diverse target objects with varying shapes and many categories, and data volume and annotation problems such as small sample sizes and inconsistent annotation quality28.

Many of the aforementioned methods rely on machine learning algorithms, such as decision trees, support vector machines, and random forests, to analyze the collected features and make predictions. With the rapid development of deep learning, deep-learning-based image recognition has also begun to be applied to the classification of ancient ceramics. Mu et al. studied AI-assisted identification of ancient ceramics by converting three primary visual features into machine vision features29. Shape was represented by object contours, ornamentation by image texture, and inscriptions by handwritten Chinese characters. Wan et al. introduced a deep learning approach that integrates an attention mechanism with texture features to classify ceramic artifacts30. Among methods that apply computer vision to ceramic identification, macroscopic features such as shape and decoration are prone to damage and imitation. In contrast, microscopic features, such as the pores formed on the ceramic surface during firing, are more stable and difficult to imitate.

This paper introduces a novel method for the provenance classification of archaeological ceramics based on microscopic features and an ensemble deep learning model. Although numerous studies have examined the microscopic features of ceramics, little research has used these features to classify ancient ceramics. Microscopic features are linked not only to chemical composition but also to production processes. We therefore expect them to carry more information than chemical composition alone. Selecting an appropriate deep learning model for microscopic image classification is challenging, because models differ in predictive performance, computational resource requirements, and training time. The hyperparameters of deep learning models (such as the learning rate and batch size) also strongly affect performance. Optimizing them is a complex process that requires repeated experimentation and adjustment31. Our ensemble combines three deep learning architectures, VGG-16, Inception-v3, and GoogLeNet, each of which automatically extracts different features of ceramic microstructures. The final classification is obtained through an ensemble algorithm. Compared with ultrasonic and multi-wavelength diffuse reflectance spectroscopy methods, this model does not require intricate manual feature extraction.

Materials and methods

Ceramics dataset

Ancient Chinese ceramic firing techniques reached their peak during the Ming and Qing dynasties. The Longquan, Yue, and Gong kilns are all famous ancient kilns in China. Celadon is a light green or turquoise porcelain that usually has a soft, jade-like texture and is prized for its distinctive color and texture. All celadon samples were obtained from the Ancient Ceramic Research Center of Jingdezhen Ceramic University, China. The samples underwent rigorous authentication, their surface dust was removed, and pieces with minimal scratches were selected. In total, 90 celadon samples are used in this study: 19 from the Longquan kiln, 12 from the Yue kiln, 19 from the Gong kiln, 21 from the GanXianXiaPaiQian (GXXPQ) kiln, and 19 from the QuanNanXianShanDong (QNXSD) kiln. All samples have been verified as produced during the Ming Dynasty (AD 1368–1644). Figure 1 shows the locations of the kilns. The Longquan kiln is located in Longquan City, Lishui Prefecture, Zhejiang Province, China32. The glaze of Longquan kiln porcelain is mostly light green or bluish-green. This is one of its most prominent characteristics and gives it a fresh, elegant appearance. The Yue kiln sites are located in Shaoxing, Shangyu, Yuyao, Cixi, Ningbo, and other areas of Zhejiang Province, China33. This area belonged to Yue Prefecture in ancient times, hence the name Yue kiln. The glaze of Yue kiln celadon is best represented by turquoise and lake-green hues, with a slightly yellowish turquoise that complements the green color of tea liquor. The Gong kiln is located in Gongyi City, Henan Province, China34. The glaze of Gong kiln porcelain is primarily green and transparent, with a strong vitreous quality.
Both the interior and exterior of the vessels are glazed, although the exterior glaze often does not extend to the base. The GXXPQ kiln is located in Gan District, Ganzhou City, Jiangxi Province, China. The glazes of GXXPQ celadon are diverse, including blue, bluish-white, and other glazes with bright and varied hues. The glaze surface is smooth and fine, with a thick glaze layer. Some surfaces show ice-crackle patterns that add to the artistic appeal of the porcelain. The QNXSD kiln is located in Quannan County, Ganzhou City, Jiangxi Province, China. Its celadon glaze is predominantly blue, with a lush, vibrant hue that is sometimes tinged with yellow or greenish-yellow, giving an overall fresh and elegant tone. The glaze surface is smooth and fine. The glaze layer is uniform and transparent or semi-transparent, giving a warm, jade-like texture.

Fig. 1

The locations of the celadon kilns and photographs of representative vessels from each kiln.

By studying the microstructure of porcelain from southern and northern China, Wang et al. found that the anorthite crystals at the interface between the body and glaze of ancient ceramics show clear regional characteristics. These crystals can be divided into Yue kiln, Longquan kiln, Ding kiln, and Changsha kiln types23. Li et al. compared the microstructure of celadon excavated from the Housi’ao kiln site (Late Tang and Five Dynasties). They found that common Yue celadon and rice-colored (beige) porcelain differ in the form, size, and quantity of microstructural features in the body and glaze21. In this study, we used only the microstructure of the glaze. Combining the microstructures of the glaze and the body would likely further improve classification performance, and this is a direction for our future work. The samples were imaged and stored using a super depth-of-field three-dimensional digital microscope. Ten images were captured per celadon sample under the same experimental conditions, with the 10 imaging areas randomly distributed over the entire glaze surface. The VHX-5000, manufactured by Keyence Corporation (Japan), is a digital microscope capable of magnification up to 2000×, which allows observation of fine surface structures and particles. Images were acquired with coaxial lighting and LED illumination at 500× magnification. The extended depth-of-field imaging of the VHX-5000 keeps the entire sample surface in focus by automatically adjusting the depth of focus and the angle of the light source, so the focal length need not be adjusted manually. We obtained a total of 900 usable microscopic images. All images have the same dimensions of 1600 × 1200 pixels and are of high quality, preserving ample detail even when divided into smaller segments.

Image preprocessing

Image preprocessing is a crucial step in image recognition. It reduces the influence of irrelevant factors on the model and thereby improves its accuracy. Because ceramic surfaces are smooth, the collected images may suffer from reflections and low contrast, which can impair model training. We therefore applied image denoising and contrast enhancement, and subdivided the images to augment the dataset. All images were first denoised with a mean filter35. They were then contrast-enhanced with gamma correction36 and Contrast Limited Adaptive Histogram Equalization (CLAHE)37 to improve the clarity of the ancient ceramic microscopic images. The mean filter replaces the value of each pixel with the mean value of the pixels in its neighborhood. Figure 2 compares an image before and after denoising. After noise removal the image is clearer and the grainy appearance is eliminated.
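The mean filter described above can be sketched in a few lines of dependency-free Python. This is a minimal illustration, not the authors' code. The 3 × 3 window and the edge-replication border handling are assumptions, since the kernel size used in the study is not reported. In practice a library routine such as OpenCV's `cv2.blur(img, (3, 3))` performs box (mean) filtering far faster, with a different default border mode.

```python
def mean_filter(image, k=3):
    """Replace each pixel by the mean of its k x k neighbourhood.

    `image` is a list of rows of grey levels. Border pixels are handled by
    replicating the nearest edge value (an assumption, see text).
    """
    if k % 2 == 0:
        raise ValueError("kernel size k must be odd")
    r = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    # clamp coordinates so positions outside the image reuse the edge value
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += image[yy][xx]
            out[y][x] = total / (k * k)
    return out


# A single bright "grain" in a flat 3 x 3 patch is averaged away.
noisy = [[10, 10, 10],
         [10, 100, 10],
         [10, 10, 10]]
smoothed = mean_filter(noisy)
print(smoothed[1][1])  # 20.0
```

A larger window removes more graininess but also blurs fine structures such as small bubbles, so the window size trades noise suppression against detail.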

Fig. 2

The contrast images before and after denoising. (a) Original image with pronounced graininess, (b) Microscopic image processed with the mean filtering algorithm.

Gamma correction applies a power-law mapping, \(I_{out}=I_{in}^{\gamma}\), to normalized pixel intensities. This redistributes the light and dark portions of the image, and a suitable γ can increase the contrast between them and improve the visual appearance of the image. Figure 3 shows microscopic images of ceramic fragments under different gamma correction coefficients. When γ = 0.5, the image becomes markedly brighter, but the microscopic components are less distinct and the contrast between them decreases. When γ = 1.2, the image becomes darker, but the contrast between the microscopic components of the ceramic fragments improves. This makes it easier to distinguish bubbles from crystalline structures in the glaze. When γ = 2, the image becomes much darker and the microscopic components are again less distinct.
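The sketch below implements the power-law mapping. It assumes 8-bit grey levels and the convention that intensities normalized to [0, 1] are raised to the power γ (the coefficient compared in Fig. 3). Under this convention γ < 1 brightens and γ > 1 darkens, which matches the observations above. A 256-entry lookup table is the usual way to apply the mapping. The function names are illustrative.

```python
def gamma_lut(gamma, max_value=255):
    """Lookup table: v -> max_value * (v / max_value) ** gamma, rounded to an integer."""
    return [round(max_value * (v / max_value) ** gamma) for v in range(max_value + 1)]


def gamma_correct(image, gamma):
    """Apply gamma correction to a list-of-rows 8-bit image via a lookup table."""
    lut = gamma_lut(gamma)
    return [[lut[p] for p in row] for row in image]


row = [[0, 64, 128, 255]]
for g in (0.5, 1.2, 2.0):
    # black (0) and white (255) stay fixed; mid-tones move up for g < 1, down for g > 1
    print(g, gamma_correct(row, g)[0])
```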

Fig. 3

Microscopic images of ceramic fragments under different gamma correction coefficients.

CLAHE divides an image into small tiles and equalizes the histogram of each tile, limiting how strongly any intensity level can be amplified. It then interpolates between neighboring tiles. As a result, it enhances contrast in regions that are overly bright or dark without over-amplifying noise, producing a more natural result. As shown in Fig. 4, five microscopic images from different ceramic categories were processed with gamma correction followed by CLAHE. Both gamma correction and CLAHE improve image contrast and make the microscopic components more distinct. Because CLAHE also corrects local contrast, bubbles and crystal structures are revealed more clearly in the CLAHE-processed images, which benefits the subsequent recognition and classification tasks.
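The core of CLAHE is contrast-limited histogram equalization of each tile. The sketch below implements only that per-tile step, in plain Python. A full CLAHE also splits the image into a grid of tiles and bilinearly interpolates the per-tile mappings, as OpenCV's `cv2.createCLAHE(clipLimit=..., tileGridSize=...)` does. Here the clip limit is an absolute bin count, and the clipped excess is redistributed in a single pass. Both are simplifying assumptions, and neither the clip limit nor the tile size used in the study is reported.

```python
def clip_limited_equalize(tile, clip_limit, levels=256):
    """Contrast-limited histogram equalisation of one tile (the per-tile step of CLAHE)."""
    pixels = [p for row in tile for p in row]
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # clip every bin at clip_limit and collect the excess counts
    excess = 0
    for i in range(levels):
        if hist[i] > clip_limit:
            excess += hist[i] - clip_limit
            hist[i] = clip_limit
    # redistribute the excess uniformly (remainder goes to the lowest bins)
    share, rem = divmod(excess, levels)
    for i in range(levels):
        hist[i] += share + (1 if i < rem else 0)
    # the cumulative histogram (which still sums to n) defines the intensity mapping
    lut, running = [], 0
    for count in hist:
        running += count
        lut.append(round((levels - 1) * running / n))
    return [[lut[p] for p in row] for row in tile]


tile = [[100, 101, 100, 101] for _ in range(4)]         # low-contrast tile, two grey levels
unclipped = clip_limited_equalize(tile, clip_limit=16)  # behaves like plain equalisation
clipped = clip_limited_equalize(tile, clip_limit=2)     # stretch is limited
print(unclipped[0][:2], clipped[0][:2])
```

Lowering the clip limit shrinks the contrast stretch between the two grey levels. This limited stretching is what keeps CLAHE from amplifying noise in flat regions of the glaze.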

Fig. 4

Processed ceramic microscopic images.

To augment the dataset for training and testing without degrading image quality, each image was divided into four equal quarters of 800 × 600 pixels. This strategy has been applied effectively in the classification of microscopic Gram stain images38.
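The quartering step can be written as a small helper. This is an illustrative sketch with a hypothetical function name, and it assumes images are stored as lists of rows. Applied to a 1600 × 1200 image (1200 rows of 1600 pixels), it returns four 800 × 600 tiles, so the 900 photographs yield 3600 tiles.

```python
def split_into_quarters(image):
    """Split an H x W image (list of rows) into four non-overlapping H/2 x W/2 tiles."""
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must both be even")
    hh, hw = h // 2, w // 2
    return [
        [row[:hw] for row in image[:hh]],  # top-left
        [row[hw:] for row in image[:hh]],  # top-right
        [row[:hw] for row in image[hh:]],  # bottom-left
        [row[hw:] for row in image[hh:]],  # bottom-right
    ]


photo = [[0] * 1600 for _ in range(1200)]  # placeholder for one 1600 x 1200 micrograph
tiles = split_into_quarters(photo)
print(len(tiles), len(tiles[0][0]), len(tiles[0]))  # 4 800 600
```

The tiles are plain slices, so no pixel is resampled. This is why the split preserves image quality.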

This yields a total of 3600 microscopic ceramic images, as shown in Table 1. To make full use of the samples, extract the features of ancient ceramic images effectively, and improve the generalization ability of our model, the images are randomly divided into a training set and a test set at a ratio of 9:1. This gives 3240 training images and 360 test images. The training set was used to train and tune the ensemble deep learning model, and the test set was used only for the final evaluation of model performance. Holding out a test set in this way guards against overestimating performance because of overfitting. It also indicates how well the results transfer to similar images of the same type. Our custom data are available at https://github.com/wangq1007/cmic.
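A reproducible random 9:1 split can be sketched as follows. The seed and function name are hypothetical, since the paper does not state how its split was randomized. The indices 0–3599 stand in for the 3600 image tiles.

```python
import random


def train_test_split(items, test_fraction=0.1, seed=0):
    """Shuffle with a fixed seed, then hold out the first test_fraction of items."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # private RNG: reproducible, no global state
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(range(3600))
print(len(train), len(test))  # 3240 360
```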

Table 1 The provenance of the archaeological Chinese ceramics and the number of microscopic images from each kiln.

Problem formulation

The core of our work is to learn a mapping from a set of ancient ceramic microscopic images \(D=\{d_i \mid 1\le i\le N_d\}\) to a set of labels \(L=\{l_j \mid l_j\in\{0,1\},\ 1\le j\le N_l\}\) that represent the provenance of the archaeological ceramics. Here \(N_d\) is the number of ancient ceramic microscopic images and \(N_l\) is the number of labels. For the multi-class provenance task, we set \(l_j=1\) if the ceramic belongs to the j-th provenance class and \(l_j=0\) otherwise.

Overview of our model

The system architecture of our model is shown in Fig. 5. It consists of four modules: image preprocessing, feature extraction, prediction, and fusion of the prediction results. In the feature extraction module, a dual-channel network based on three deep learning architectures (VGG-16, Inception-v3, and GoogLeNet) is constructed to learn various features of ancient ceramic microscopic images.

Fig. 5

Framework for classifying the provenance of archaeological ceramics using microscopic imaging and ensemble deep learning.

Convolutional Neural Networks (CNNs) have shown high recognition accuracy and strong generalization in image recognition tasks. This performance stems from properties such as local receptive fields, weight sharing, hierarchical multi-level feature extraction, and end-to-end parameter optimization. As the mainstream model in image recognition, CNNs have achieved excellent performance in many applications39,40,41,42.

A typical CNN consists of five main parts: the input layer, convolutional layers, pooling layers, activation layers, and fully connected layers. The input image passes through a series of convolutional and pooling layers. Each layer takes the output of the preceding layer as its input and extracts increasingly distinctive features of the image. The final fully connected layer maps these features to class scores for multi-class image classification.

VGG-1643, Inception-V344, and GoogLeNet45 are influential deep convolutional neural network models. They differ in network architecture and feature extraction strategy, but all exhibit outstanding performance and generalization capability.

VGG-16: The VGG network, developed by the Visual Geometry Group at the University of Oxford, was among the top-performing models in the image classification task of ILSVRC 2014 (ImageNet Large Scale Visual Recognition Challenge). Its key innovation is the use of stacks of small 3 × 3 convolutional kernels in place of larger kernels (e.g., 5 × 5 or 7 × 7). VGG-16 is characterized by substantial depth: it has 16 weight layers (13 convolutional and 3 fully connected), and all of its convolutions use 3 × 3 kernels.
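A short count makes the advantage of small kernels concrete. Assume C input and C output channels per layer, stride 1, and ignore biases. Under these assumptions, stacked 3 × 3 layers cover the same receptive field as one larger kernel with fewer weights, and they also insert a nonlinearity between layers:

$$
\begin{aligned}
\text{two stacked } 3\times3 \text{ layers (receptive field } 5\times5\text{)}:\quad & 2\,(3^2C^2)=18C^2 \text{ weights}\\
\text{one } 5\times5 \text{ layer}:\quad & 5^2C^2=25C^2 \text{ weights}\\
\text{three stacked } 3\times3 \text{ layers (receptive field } 7\times7\text{)}:\quad & 3\,(3^2C^2)=27C^2 \text{ weights}\\
\text{one } 7\times7 \text{ layer}:\quad & 7^2C^2=49C^2 \text{ weights}
\end{aligned}
$$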

Inception-V3: Inception-V3 introduces new design strategies in its feature extraction stage. A key enhancement is a redesigned Inception module that factorizes large convolutions into smaller ones. For example, a 5 × 5 convolution is replaced by two 3 × 3 convolutions, and an n × n convolution by a 1 × n convolution followed by an n × 1 convolution. Inception-V3 is also trained with the RMSprop optimizer and uses batch normalization to improve convergence speed and accuracy. To improve robustness, regularization techniques such as random perturbation and dropout are incorporated into training.

GoogLeNet: GoogLeNet is designed to combine multi-scale feature extraction with computational efficiency. Its central component is the Inception module, which applies convolutions of several kernel sizes in parallel. GoogLeNet reduces computational complexity by using 1 × 1 convolutions for dimensionality reduction before the larger convolutions. These design choices give GoogLeNet strong nonlinear feature extraction capability at a low computational cost. GoogLeNet has performed outstandingly in image recognition tasks. Its auxiliary classifiers, attached to intermediate layers during training, help alleviate the vanishing gradient problem.
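The saving from 1 × 1 reductions can be illustrated with hypothetical dimensions that are not taken from this study. Consider a 28 × 28 × 192 input mapped to 32 channels by a 5 × 5 convolution (stride 1, "same" padding), with and without a 16-channel 1 × 1 bottleneck. Counting multiplications gives:

$$
\begin{aligned}
\text{direct } 5\times5:\quad & 28\cdot28\cdot32\cdot(5\cdot5\cdot192)\approx 120.4\times10^{6}\\
1\times1 \text{ reduction to 16 channels}:\quad & 28\cdot28\cdot16\cdot192\approx 2.4\times10^{6}\\
5\times5 \text{ on the reduced map}:\quad & 28\cdot28\cdot32\cdot(5\cdot5\cdot16)\approx 10.0\times10^{6}\\
\text{total with bottleneck}:\quad & \approx 12.4\times10^{6}
\end{aligned}
$$

In this example the bottleneck cuts the cost by roughly a factor of ten.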

This study aims to predict five different provenances of archaeological ceramics. Softmax is used in the output layers of M1, M2, and M3 for this multi-class classification problem. For a given input vector \(O=(o_1,o_2,\cdots,o_5)\), the softmax function is defined as:

$$\sigma(O)_i=\frac{e^{o_i}}{\sum_{j=1}^{5}e^{o_j}}$$
(1)

Here, i is the index of the output vector. The exponentiation ensures that all scores are positive. The normalization ensures that the resulting values lie between 0 and 1 and sum to 1, so they can be interpreted as class probabilities.
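Eq. (1) translates directly into code. The sketch below uses plain Python with illustrative names. It subtracts the largest score before exponentiating, a standard trick that avoids overflow and leaves the result unchanged because the common factor cancels in the ratio.

```python
import math


def softmax(scores):
    """Eq. (1): exponentiate each score and normalise so the outputs sum to 1."""
    m = max(scores)  # shift by the max: prevents overflow, result is mathematically identical
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# five logits, one per kiln class
probs = softmax([2.0, 1.0, 0.1, -1.0, 0.5])
print([round(p, 3) for p in probs])
```

Without the shift, a logit of 1000 would make `math.exp` raise `OverflowError`. With the shift, such inputs are handled safely.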

Cross-entropy loss is commonly used to measure the difference between predicted probabilities and the actual class labels. Minimizing it guides the optimization algorithm in adjusting the model parameters so that the model produces more accurate predictions on the training data. In this study, a weighted cross-entropy loss is used. Its weight factors scale the contribution of each class, directing the model toward the correct classification of specific classes46. This is particularly useful when classes are imbalanced or differ in importance. The weighted cross-entropy loss is defined as:

$$\text{Loss}=-\sum_{i=1}^{5}\omega_i\,y_i\log\left(\hat{y}_i\right)$$
(2)

where \(\omega_i\) is the weight assigned to class i, \(\hat{y}_i\) is the predicted probability of class i (the i-th component of the softmax output), and \(y_i\) is the i-th component of the one-hot label vector. Because \(\log(\hat{y}_i)\le 0\), the leading negative sign makes the loss non-negative. The loss decreases as the predicted probability of the true class approaches 1.
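A sketch of Eq. (2) for a single sample follows. The paper does not state how the class weights were chosen. As an assumption, the example uses inverse-frequency weights \(\omega_c = N/(K\,n_c)\), where N is the total number of images, K the number of classes, and \(n_c\) the number of images in class c. The per-class counts are derived from the dataset description: 19, 12, 19, 21, and 19 samples, each contributing 10 photographs × 4 tiles = 40 images.

```python
import math


def inverse_frequency_weights(counts):
    """Hypothetical weighting w_c = N / (K * n_c): rarer classes receive larger weights."""
    total, k = sum(counts), len(counts)
    return [total / (k * n) for n in counts]


def weighted_cross_entropy(probs, one_hot, weights, eps=1e-12):
    """Eq. (2): -sum_i w_i * y_i * log(p_i); eps guards against log(0)."""
    return -sum(w * y * math.log(max(p, eps))
                for p, y, w in zip(probs, one_hot, weights))


# images per kiln: Longquan, Yue, Gong, GXXPQ, QNXSD (40 images per sample)
counts = [19 * 40, 12 * 40, 19 * 40, 21 * 40, 19 * 40]
weights = inverse_frequency_weights(counts)  # Yue, the smallest class, gets weight 1.5
probs = [0.1, 0.5, 0.2, 0.1, 0.1]            # softmax output for one image
one_hot = [0, 1, 0, 0, 0]                    # true class: Yue kiln
loss = weighted_cross_entropy(probs, one_hot, weights)
print(round(loss, 4))
```

Because only the true-class term survives the one-hot product, the weight simply rescales the usual cross-entropy of that image. Errors on the under-represented Yue kiln therefore cost 1.5 times as much as they would unweighted.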

Model ensembling can improve overall performance, reduce the risk of overfitting, and increase robustness. It is suitable for combining multiple independently trained models, or predictions obtained from different data sources or feature extraction methods. In this paper, majority voting is used to aggregate the predictions of the three models into the final classification. In our ensemble, let \(y_{ij}\) denote the class predicted by the j-th model for the i-th sample, and let k index the candidate categories. The voting rule can then be expressed as:

$$y_i=\arg\max_{k}\sum_{j=1}^{3}\mathbb{1}\left(y_{ij}=k\right)$$
(3)
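Eq. (3) is plain majority voting. With three models and five classes, all three predictions can differ, and the equation does not say how such ties are resolved. The sketch below assumes, hypothetically, that a tie is broken in favor of the earliest model's prediction (M1 first).

```python
from collections import Counter


def majority_vote(predictions):
    """Eq. (3): return the class label predicted by the most models.

    `predictions` lists one class label per model, in model order (M1, M2, M3).
    Ties are broken by model order, an assumption not specified in the paper.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:  # the first model whose label has the top count wins
        if counts[label] == best:
            return label


print(majority_vote([2, 2, 4]))  # 2: two of three models agree
print(majority_vote([0, 3, 1]))  # 0: three-way tie, M1's prediction is kept
```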

Evaluation metrics

To evaluate the effectiveness of the model, we use accuracy (Acc), precision (Pre), recall (Rec), and F1 score.

$$\text{Accuracy}=1-\frac{\sum_{i=1}^{5}N^{-}(i)}{N_d}$$
(4)

where \(N(i)\) denotes the set of samples whose true class is i, \(N^{-}(i)\) is the number of samples in \(N(i)\) that are incorrectly predicted as another class, and \(N_d\) is the number of ancient ceramic microscopic images in the evaluated dataset.

$$\text{Precision}=\frac{\sum_{i=1}^{5}N_{+}^{+}(i)}{\sum_{i=1}^{5}N_{+}^{+}(i)+\sum_{i=1}^{5}N_{-}^{+}(i)}$$
(5)

where \(N_{+}^{+}(i)\) is the number of samples predicted as class i that truly belong to class i, and \(N_{-}^{+}(i)\) is the number of samples predicted as class i that do not truly belong to class i.

$$\text{Recall}=\frac{\sum_{i=1}^{5}N_{+}^{+}(i)}{\sum_{i=1}^{5}N_{+}^{+}(i)+\sum_{i=1}^{5}N_{+}^{-}(i)}$$
(6)

where \(N_{+}^{-}(i)\) is the number of samples that truly belong to class i but are predicted as another class.

$$F1\ \text{Score}=\frac{2\times\text{Precision}\times\text{Recall}}{\text{Precision}+\text{Recall}}$$
(7)

The choice of metrics depends on the nature of the task and the aspects of performance to be emphasized. For imbalanced datasets, precision, recall, and F1 score may be more informative than accuracy alone.
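The counts in Eqs. (4)–(7) can be accumulated from paired true and predicted labels, as in the sketch below (plain Python, illustrative names). Pooling the counts over classes, as Eqs. (5) and (6) do, has one consequence worth noting. In single-label classification every misclassified image is simultaneously one false positive (for the predicted class) and one false negative (for the true class). These micro-averaged precision and recall therefore both equal accuracy. The sketch also reports macro averages, which average the per-class values over the five classes and are a common alternative when per-class behavior matters.

```python
def classification_metrics(y_true, y_pred, n_classes=5):
    """Accuracy and micro-averaged precision/recall/F1 per Eqs. (4)-(7), plus macro averages."""
    tp = [0] * n_classes  # N_+^+(i): predicted i, truly i
    fp = [0] * n_classes  # N_-^+(i): predicted i, truly another class
    fn = [0] * n_classes  # N_+^-(i), equal to N^-(i): truly i, predicted another class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1

    def ratio(a, b):
        return a / b if b else 0.0

    accuracy = 1 - sum(fn) / len(y_true)                    # Eq. (4)
    precision = ratio(sum(tp), sum(tp) + sum(fp))           # Eq. (5)
    recall = ratio(sum(tp), sum(tp) + sum(fn))              # Eq. (6)
    f1 = ratio(2 * precision * recall, precision + recall)  # Eq. (7)
    macro_p = sum(ratio(tp[i], tp[i] + fp[i]) for i in range(n_classes)) / n_classes
    macro_r = sum(ratio(tp[i], tp[i] + fn[i]) for i in range(n_classes)) / n_classes
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "macro_precision": macro_p, "macro_recall": macro_r}


y_true = [0, 0, 1, 1, 2, 3, 4, 4]
y_pred = [0, 1, 1, 1, 2, 3, 4, 0]
m = classification_metrics(y_true, y_pred)
print(m)
```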

Results and discussion

In this study, we propose an artificial intelligence-based Ceramic Microscopic Image Classification (CMIC) model. We first use the VGG-16 (M1), Inception-V3 (M2), and GoogLeNet (M3) models individually to classify ceramic microscopic images from the Longquan kiln (Lqk), Yue kiln (Yk), Gong kiln (Gk), Ganxian Xiapaiqian kiln (Gxxpk), and Quannanxianshandong kiln (Qnxsdk). We then combine the models to improve classification performance. We divide the benchmark dataset into a training set and an independent test set and analyze them separately. During training, 5-fold cross-validation and Stochastic Gradient Descent (SGD) are used to optimize the parameters. SGD is an iterative optimization algorithm that minimizes the cost or loss function of a model. It updates the model parameters in the direction opposite to the gradient of the loss with respect to those parameters. In 5-fold cross-validation, the training set is partitioned into five equally sized subsets. One subset is held out for validation while the remaining four are used for training. This cycle is repeated five times so that each subset serves as the validation set exactly once.
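The 5-fold procedure above can be sketched as an index generator. This is illustrative only, and the shuffling seed is hypothetical. Each model is trained on the four training folds and evaluated on the held-out fold, and the five validation scores are typically averaged. With the 3240 training images, every fold contains 648 images.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; every index is used for validation exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # spread any remainder over the first folds so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, val
        start += size


folds = list(k_fold_indices(3240, k=5))
print([len(val) for _, val in folds])  # [648, 648, 648, 648, 648]
```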

Microscopic characteristics of ancient ceramic glaze

The glaze of ancient ceramics typically exhibits microscopic features such as spots, white filamentous substances, bubbles, cracks, crystals, etc., as shown in Fig. 6.

Fig. 6

The microscopic characteristics of ceramic glazes: (a) spots, (b) crystals, (c) bubbles, (d) white filamentous substances, (e) cracks.

Spots of different colors on ancient ceramic glazes may result from the uneven distribution of various elements or impurities in the glaze material during firing. Such unevenness can arise from variations in glaze composition, different oxidation-reduction conditions, fluctuations in firing temperature, or other factors. The different crystalline structures on ancient ceramic glazes may be attributed to factors such as the composition of the glaze material, firing temperature, cooling rate, and chemical reactions during firing. These crystals can take various shapes, including but not limited to needle-like, granular, plate-like, blocky, and spherical crystals. Bubbles in the glaze arise from several factors that interact in complex ways, including chemical reactions, moisture evaporation, incomplete degassing, and thermal decomposition. The specific firing conditions, including temperature, atmosphere, and duration, influence the size, distribution, and abundance of bubbles on the glaze surface. Some glazes contain a small number of cracks that vary in size and density. In some samples the cracks are of uniform thickness and yellow in color, while others contain both coarse black cracks and fine yellow cracks. In most samples the cracks are finer and lighter, appearing very pale yellow or colorless under the microscope. The white filamentous substances in ceramic micrographs are typically attributed to the quality of the raw materials and the production conditions.

The analysis and summary of the microscopic structural characteristics of ancient Chinese ceramics from different kilns are presented in Table 2.

Table 2 Microscopic characteristics of glaze at different kilns.

To characterize the microstructure of celadon from different kilns, the cracks in each image were measured and counted with analySIS software to obtain statistics on crack length and the number of crack segments. The bubbles in each image were likewise measured and counted to obtain statistics on bubble diameter and bubble density (the number of bubbles per unit area).

The bubbles in the celadon glazes from the different kilns range from 10 μm to 300 μm in diameter, mostly between 25 and 170 μm, and their density ranges from about 60 to 500 bubbles per square centimeter. In terms of bubble size and density, there is no obvious difference among Gxxpqk, Gk, and Qnxsdk: the average bubble diameter is 50–80 μm, and the density is about 60–100 bubbles per square centimeter.

The glazes from Gk, Qnxsdk, and Yk contain cracks with lengths between 200 and 40,000 μm. Within a unit area, the fewer the cracks, the longer their average length; the more the cracks, the shorter their average length. Compared with the glaze cracks of Gk and Qnxsdk, those of Yk are more numerous on average and shorter: their length is about 200–10,000 μm, and the average number of cracks is about 30.

Parameter setting

The steps of the SGD algorithm are as follows: 1) shuffle the training dataset; 2) divide it into small subsets (mini-batches); 3) for each mini-batch, compute the gradient of the loss function with respect to the model parameters; 4) update the model parameters in the direction opposite to the gradient, scaled by the learning rate; 5) repeat the process until convergence or for a specified number of iterations.
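
The five steps above can be sketched in Python with NumPy as mini-batch SGD on a simple least-squares problem. This is an illustrative assumption, not the training code for the CNNs: the linear model, the function name `minibatch_sgd`, the synthetic data, and the fixed epoch count standing in for a convergence test are all hypothetical.

```python
import numpy as np


def minibatch_sgd(X, y, lr=0.05, batch_size=4, epochs=200, seed=0):
    """Fit y ~ X @ w + b by mini-batch SGD on the mean-squared-error loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):                        # 5) repeat for a fixed number of epochs
        order = rng.permutation(n)                 # 1) shuffle the training set
        for start in range(0, n, batch_size):      # 2) split it into mini-batches
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            residual = Xb @ w + b - yb
            grad_w = 2.0 * Xb.T @ residual / len(idx)  # 3) gradient of the batch loss
            grad_b = 2.0 * residual.mean()
            w -= lr * grad_w                       # 4) step against the gradient
            b -= lr * grad_b
    return w, b


# Synthetic, noise-free data: y = 3*x1 - 2*x2 + 1.
rng = np.random.default_rng(42)
X = rng.normal(size=(64, 2))
y = X @ np.array([3.0, -2.0]) + 1.0
w, b = minibatch_sgd(X, y)
print(w, b)
```

With noise-free data the estimates should approach w = (3, -2) and b = 1; deep networks follow the same loop, only with gradients computed by backpropagation.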

Four hyperparameters have a strong influence on the performance of the Inception-V3, GoogLeNet, and VGG-16 deep learning models: the learning rate ξ, the batch size ε, the freeze-learning flag η, and the number of freeze epochs δ. The learning rate ξ is crucial for the stability and convergence of the algorithm: too small a learning rate may result in slow convergence, while too large a learning rate may cause the algorithm to overshoot the minimum. The batch size ε is the number of samples processed in each forward/backward pass during training. η is a Boolean value that controls whether the feature-extraction layers of the pre-trained model are frozen during transfer learning; it affects how information is transferred and how parameters are updated between the feature-extraction layers and the subsequent layers. When η is true, δ specifies how many epochs are trained before the feature layers are unfrozen. Figure 7 indicates that the best performance was achieved with η = true, ε = 4, ξ = 0.001, and δ = 20. Freezing the feature-extraction layers during the initial phase of training proved an effective way to enhance the predictive capability of the model: as shown in Fig. 7(d), freezing these layers improves performance by approximately 5%. The effect of batch size is illustrated in Fig. 7(b), where the model performs best with a batch size of 4.
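
The sketch below illustrates two of these hyperparameters under stated assumptions. `TrainConfig` and `backbone_trainable` are hypothetical names that encode the η/δ freeze schedule with the values reported as optimal (ξ = 0.001, ε = 4, η = true, δ = 20); in a PyTorch implementation such a flag would typically be applied by setting `requires_grad` on the backbone parameters, although the source does not say how freezing was implemented. `gd_on_parabola` is a toy one-dimensional example of the learning-rate trade-off: on f(x) = x², each gradient step multiplies x by (1 - 2ξ), so a very small ξ converges slowly and ξ > 1 makes the iterates grow instead of shrink.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    learning_rate: float = 0.001  # xi
    batch_size: int = 4           # epsilon
    freeze: bool = True           # eta: freeze the pre-trained feature layers at first?
    freeze_epochs: int = 20       # delta: epochs to train before unfreezing


def backbone_trainable(epoch: int, cfg: TrainConfig) -> bool:
    """Whether the pre-trained feature-extraction layers are updated in a 0-indexed epoch."""
    if not cfg.freeze:
        return True                    # eta = false: fine-tune everything from the start
    return epoch >= cfg.freeze_epochs  # eta = true: only the new head trains for delta epochs


def gd_on_parabola(lr: float, steps: int = 50, x0: float = 1.0) -> float:
    """Gradient descent on f(x) = x**2; each step multiplies x by (1 - 2 * lr)."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return x


cfg = TrainConfig()
schedule = [backbone_trainable(e, cfg) for e in range(25)]
print(schedule.index(True))   # prints 20: first epoch with the backbone unfrozen
for lr in (0.001, 0.4, 1.1):  # too small, reasonable, too large for this toy problem
    print(lr, gd_on_parabola(lr))
```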

Fig. 7

Model performance with different learning rates, batch sizes, and freeze epochs.

The Loss and Accuracy curves of the different models on the training set follow similar trends. Taking VGG-16 as an example (Fig. 8), Accuracy remains relatively stable after 40 training epochs, while Loss stabilizes at around 110 training epochs. The training loss (train_loss) decreases rapidly and then stabilizes at a final value of around 0.05, indicating that the VGG-16 model converges well on the training set.
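
One simple way to make "stabilizes after N epochs" concrete is to find the first epoch from which every later value stays within a tolerance of the final value. The function `first_stable_epoch`, the tolerance, and the loss values below are assumptions for illustration; the source does not state the criterion used to read Fig. 8.

```python
def first_stable_epoch(values, tol):
    """Return the first 1-indexed epoch from which every later value stays within tol of the final one."""
    if not values:
        raise ValueError("empty curve")
    final = values[-1]
    epoch = len(values)
    for i in range(len(values) - 1, -1, -1):  # walk backwards from the last epoch
        if abs(values[i] - final) > tol:
            break
        epoch = i + 1
    return epoch


# Illustrative loss values, not the curves in Fig. 8.
loss = [1.0, 0.5, 0.2, 0.06, 0.05, 0.05]
print(first_stable_epoch(loss, tol=0.02))  # prints 4
```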

Fig. 8

Loss/accuracy curves for the training set.

Analysis of the CMIC model classification results

First, the confusion matrices of VGG-16, Inception-V3, GoogLeNet, and the ensemble deep learning model were obtained using five-fold cross-validation. The evaluation metrics derived from these confusion matrices, namely precision, recall, F1 score, and accuracy, were then compared.
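
For reference, these metrics can be derived from a confusion matrix as in the NumPy sketch below, with rows as actual classes and columns as predicted classes. Macro averaging (an unweighted mean over classes) is an assumption here, since the averaging scheme is not stated in this section, and `metrics_from_confusion` is a hypothetical helper.

```python
import numpy as np


def metrics_from_confusion(cm):
    """Macro-averaged precision, recall, F1 and overall accuracy.

    Rows of cm are actual classes and columns are predicted classes.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # column sums: samples predicted as each class
    actual = cm.sum(axis=1)     # row sums: samples that truly belong to each class
    zeros = np.zeros_like(tp)
    precision = np.divide(tp, predicted, out=zeros.copy(), where=predicted > 0)
    recall = np.divide(tp, actual, out=zeros.copy(), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)
    return {
        "precision": float(precision.mean()),
        "recall": float(recall.mean()),
        "f1": float(f1.mean()),
        "accuracy": float(tp.sum() / cm.sum()),
    }


# Made-up fold-level matrices for a two-class toy problem.
fold_matrices = [np.array([[4, 1], [0, 5]]), np.array([[4, 1], [1, 4]])]
pooled = sum(fold_matrices)
print(metrics_from_confusion(pooled))
```

Pooling the fold-level matrices before computing metrics, as in the usage lines, is one common convention; averaging per-fold metrics is another, and the source does not specify which was used.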

Table 3 Performance comparison of the different models on the training data (five-fold cross-validation) and on the test data (independent test).

As shown in Table 3, the precision of all classification algorithms exceeds 92%, with the ensemble deep learning model achieving the highest precision of 95.48%, followed by Inception-V3 (94.77%), VGG-16 (92.68%), and GoogLeNet (92.13%). In terms of accuracy, the maximum of 95.54% belongs to the ensemble deep learning model and the minimum of 92.15% to GoogLeNet. F1 scores ranging from 92.23% to 95.60% indicate that the predictions closely match the actual provenance of the archaeological ceramics. Integrating the three models (M1, M2, M3) yielded the best classification results, with superior precision, recall, F1 score, and accuracy. Combining the models can provide a more comprehensive and diverse feature representation, reducing the impact of noise and thereby improving the model's performance and overall generalization ability.
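
This section does not spell out how the three models' predictions are fused. A common choice, assumed here purely for illustration, is soft voting: averaging each model's class probabilities and taking the most probable class. The sketch contrasts it with hard (majority) voting on made-up softmax outputs; `soft_vote`, `hard_vote`, and the probabilities are hypothetical.

```python
import numpy as np


def soft_vote(prob_list, weights=None):
    """Weighted average of per-model class probabilities, then argmax per sample."""
    probs = np.stack([np.asarray(p, dtype=float) for p in prob_list])  # (models, samples, classes)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    avg = np.tensordot(weights, probs, axes=1)  # contracts the model axis -> (samples, classes)
    return avg, avg.argmax(axis=1)


def hard_vote(prob_list):
    """Majority vote over each model's argmax label; ties go to the lowest class index."""
    votes = np.stack([np.asarray(p).argmax(axis=1) for p in prob_list])  # (models, samples)
    n_classes = np.asarray(prob_list[0]).shape[1]
    counts = np.apply_along_axis(np.bincount, 0, votes, minlength=n_classes)  # (classes, samples)
    return counts.argmax(axis=0)


# Hypothetical softmax outputs of three models (M1, M2, M3) for two samples and three classes.
m1 = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
m2 = [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]
m3 = [[0.5, 0.4, 0.1], [0.2, 0.3, 0.5]]
avg, soft_labels = soft_vote([m1, m2, m3])
print(soft_labels, hard_vote([m1, m2, m3]))  # prints [1 2] [0 2]
```

In the first sample, two of the three models favor class 0, but the averaged probabilities favor class 1, which shows how soft voting lets one confident model outweigh two less confident ones.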

VGG-16, Inception-V3, and GoogLeNet differ in architecture, parameter configuration, and computational methodology, which enables them to capture feature information at different levels and scales. All three models can effectively classify ancient ceramics from their microscopic images, which also indicates that these images reflect the differences in the materials and processes used to produce ceramics at different kiln sites.

An independent test provides a more reliable assessment of the robustness of the prediction models. The four models were therefore also evaluated on an independent test set, and the results are presented in Table 3. The ensemble deep learning model achieves a precision, recall, F1 score, and accuracy of 0.9601, 0.9615, 0.9607, and 0.9583, respectively, close to its cross-validation results. As Table 3 shows, its precision on the independent test set is considerably higher than that of the other three models, as are the other three metrics.

To visualize classification performance, confusion matrices were used, with rows and columns corresponding to the actual and predicted categories, respectively. Fig. 9 presents the confusion matrices of the four models' predictions of ancient ceramic provenance, which show the details of the misclassified samples and allow us to explore the relationships between the production processes of different kilns. First, misclassified Gxxpqk samples are mainly assigned to Qnxsdk by the M1, M2, and M3 models. This may be because the two kilns used similar raw materials and manufacturing processes, so the microscopic images of the misclassified samples resemble those of Qnxsdk, which makes them difficult for a single model to distinguish. Second, misclassified Qnxsdk samples are mainly assigned to Lqk, Gk, and Gxxpqk, likely because Qnxsdk samples are few in number yet cover a wide variety of products, which makes their characteristics difficult to generalize. The ensemble deep learning model achieved prediction accuracies of 95.24%, 96.05%, 97.37%, 92.11%, and 100% for Gxxpqk, Gk, Lqk, Qnxsdk, and Yk, respectively, outperforming the individual models across all five kilns. The confusion matrices indicate that the ensemble model performs better on the Gxxpqk, Gk, and Yk datasets and shows smaller deviations on the Lqk and Qnxsdk datasets.
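
Per-class accuracies like those quoted above are usually computed as the diagonal of a row-normalized confusion matrix (rows are actual classes). The NumPy sketch below computes that value for each class and reports which other class absorbs most of its errors, mirroring the kind of analysis done for Gxxpqk and Qnxsdk. The helper `per_class_summary` and the counts for the hypothetical classes A, B, and C are made up for illustration and are not the data in Fig. 9.

```python
import numpy as np


def per_class_summary(cm, labels):
    """Per-class accuracy (recall) and the class each one is most often confused with."""
    cm = np.asarray(cm, dtype=float)
    summary = {}
    for i, name in enumerate(labels):
        row = cm[i]                          # actual class i, spread over predicted classes
        total = row.sum()
        accuracy = row[i] / total if total > 0 else 0.0
        off_diag = row.copy()
        off_diag[i] = -1.0                   # ignore correct predictions
        j = int(off_diag.argmax())
        confused_with = labels[j] if j != i and row[j] > 0 else None
        summary[name] = (accuracy, confused_with)
    return summary


# Made-up counts for three hypothetical classes A, B, C (not the data in Fig. 9).
cm = [[18, 2, 0],
      [1, 17, 2],
      [0, 0, 20]]
for name, (acc, other) in per_class_summary(cm, ["A", "B", "C"]).items():
    print(f"{name}: accuracy={acc:.2f}, most often confused with {other}")
```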

Fig. 9

Confusion matrices for the classification results of the five provenances of archaeological ceramics.

Conclusion

This paper introduces a method for identifying and classifying the provenance of archaeological ceramics based on ceramic microscopic features and ensemble deep learning. The method uses ceramic microscopic images obtained from the Ancient Ceramics Research Center of Jingdezhen Ceramic University as samples. A dataset containing microscopic images and labels from multiple kilns is constructed, and data augmentation is used to expand the sample size and diversity, mitigating the problems of insufficient samples and overfitting. We compare the performance of four deep learning models, VGG-16, Inception-V3, GoogLeNet, and their ensemble, in predicting the origin of ancient Chinese ceramics, and find that the ensemble, which combines the predictions of the individual models, performs best. All four models can effectively classify ancient ceramics from their microscopic images, indicating that these images reflect the differences in the materials and processes used to produce ceramics at different kilns. The method can help ancient-ceramics enthusiasts and professionals accurately distinguish ceramic products from different kilns, which carry significant cultural, artistic, and historical value. In the future, it could be extended to the classification and analysis of other cultural heritage artifacts, providing technical support for intelligent archaeology and offering innovations in cultural heritage preservation and digital archiving. Microscopic analysis is equally important in fields such as medicine, biology, and nanomaterial research. In materials science, microscopic image analysis can be used to study the microstructure and interface characteristics of materials, guiding the optimal design of materials.
Additionally, microscopic image analysis technology can be combined with other technologies, such as spectroscopic analysis and mass spectrometry, to achieve more comprehensive sample analysis and detection.