Introduction

The incidence of eyelid malignancies is about 5.1–15.7 cases per 100,000 individuals annually1, and they account for about 5–10% of all skin malignancies2. Eyelid malignancies impair the functional integrity essential for globe protection while carrying mortality risks due to metastatic dissemination to regional lymph nodes and distant organ systems1,3. Squamous cell carcinoma (SC) and sebaceous gland carcinoma (SGC) of the eyelids account for 3.4–12.6% and 0.6–10.2% of eyelid malignancies, respectively; they are the second and third most common malignant tumors of the eyelid, surpassed only by basal cell carcinoma1.

The treatment modalities and prognoses of SGC and SC differ, which influences patient management1,2. For example, SGC, owing to its pagetoid spread and multifocal nature, often requires map biopsies to define tumor margins and guide surgical excision4. This technique is not typically required for SC, which tends to exhibit more localized growth and can usually be managed by wide local excision with clear margins5. Additionally, definitive radiotherapy is generally less effective for SGC than for SC, which shows better radiosensitivity in clinical practice6,7. Therefore, distinguishing SC from SGC at the onset of medical care is extremely important to avoid adverse outcomes caused by diagnostic delay. However, SGC typically masquerades as chalazion, chronic blepharoconjunctivitis, and blepharitis, which leads to clinical misdiagnosis7,8,9. Around 40–75% of SGC cases are initially misdiagnosed by inexperienced pathologists, and the most common histopathological misdiagnosis of SGC is SC9. Therefore, innovative tools are needed to reduce pathologists' workload and improve diagnostic efficiency.

Whole slide imaging (WSI) was developed in 1999 and has since served as a valuable tool in digital pathology10. WSI allows a slide to be examined and annotated on a computer screen11. Recently, artificial intelligence (AI)-based computational pathology has been used to analyze digital pathological slides automatically and to assist the traditional pathological examination of slides11. In the past few years, deep learning (DL)-based AI techniques have advanced considerably in medical fields, including ophthalmology12,13. Therefore, this study established DL models to automatically differentiate eyelid SGC and SC based on WSIs.

Results

Development of the diagnostic framework

The AI diagnostic framework consists of five modules (Fig. 1): a preprocessing module, a meaningless tile filter module, a style transfer module, a tile identification module, and a results merging module. The preprocessing module divides the WSIs (40×) into 512 × 512 tiles. The meaningless tile filter module then filters out the background and meaningless tiles, including white, black, and black-and-white tiles; blurred tiles; tiles containing cells broken by improper storage; tiles with a cell ratio <60%14; tiles with non-cellular structures; and contaminated tiles (Supplementary Fig. 1). Three ResNet101 models were trained for this module on a dataset comprising all cut tiles annotated by three pathologists (Q. Zhang, H. Zhang, and Piao). Owing to differences in staining methods, storage methods, storage durations, and other factors, heterogeneity among WSIs from different healthcare centers is unavoidable, and this heterogeneity degrades the performance of each module when the models are given data from external testing datasets. To address this problem, a cycle generative adversarial network (CycleGAN)15 was applied to generate tiles with the appearance of the training data (training and validation datasets) and a style similar to that of the unseen tiles (from the external testing dataset), preventing performance degradation on the external testing dataset. This auxiliary module requires no additional annotation and can therefore be quickly integrated into the tile identification module. The remaining tiles were subsequently classified as normal (including stroma), SC, or SGC tiles by the tile identification module, for which five candidate convolutional neural networks (CNNs) were selected: ResNet10116, ResNeXt10117, Wide ResNet10117,18, DenseNet12119, and EfficientNetb520, all with consistent hyperparameter settings (e.g., learning rate, epochs). The results merging module merges the identification results of all useful tiles in two manners: the final diagnosis is based either on the average predicted probability of each tile type (comparing the mean probability of SGC with that of SC) or on the tile counts (comparing the number of tiles identified as SGC with the number identified as SC). All models were trained from weights pre-trained on the ImageNet dataset21.
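
For concreteness, the two merging manners can be expressed as in the following sketch (not the authors' released code); it assumes each surviving tile has already received a softmax probability vector over the three classes from the tile identification module.

```python
import numpy as np

# Class order assumed by this sketch: (normal, SC, SGC).
CLASSES = ("normal", "SC", "SGC")

def merge_tile_results(tile_probs: np.ndarray) -> dict:
    """Merge per-tile softmax outputs into a slide-level diagnosis.

    tile_probs: array of shape (n_tiles, 3) with probabilities for
    (normal, SC, SGC). Returns the diagnosis under both merging manners;
    ties default to SC in this sketch.
    """
    # Manner 1: compare the average predicted probability of SGC vs. SC.
    mean_probs = tile_probs.mean(axis=0)
    by_probability = "SGC" if mean_probs[2] > mean_probs[1] else "SC"

    # Manner 2: compare the number of tiles whose argmax label is SGC vs. SC.
    labels = tile_probs.argmax(axis=1)
    n_sc, n_sgc = np.sum(labels == 1), np.sum(labels == 2)
    by_tile_count = "SGC" if n_sgc > n_sc else "SC"

    return {"by_probability": by_probability, "by_tile_count": by_tile_count}

# Example: 4 tiles, mostly SGC-leaning.
probs = np.array([[0.1, 0.2, 0.7],
                  [0.6, 0.2, 0.2],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.1, 0.7]])
print(merge_tile_results(probs))  # {'by_probability': 'SGC', 'by_tile_count': 'SGC'}
```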

Fig. 1: The diagnostic framework used for classifying SGC and SC.
figure 1

Training phase: First, the slides from Beijing Tongren Hospital (BTH) were cut into tiles (preprocessing module) to train the three meaningless tile filter models (meaningless tile filter module), whose weights were then fixed. The slides from the Third Medical Center of Chinese PLA General Hospital (TMCPLA) were then also cut into tiles, and all tiles were sifted by the three filter models. Next, CycleGAN was used to produce tiles with an appearance similar to the tiles from BTH and a style similar to the tiles from TMCPLA (style transfer module). Subsequently, all tiles that passed the three filter models, together with the tiles produced by CycleGAN, were used to train the models in the tile identification module. Finally, the identification results for all tiles were merged to obtain the result for one slide (results merging module). Testing phase: A slide was cut into tiles, and all tiles were sifted through the three filter models. The tiles that passed were identified by the model in the tile identification module. Finally, the identification results for all tiles were merged to obtain the result for the slide.

For the meaningless tile filter module, the models were trained and validated via fivefold cross-validation at the tile level, and the best-performing model was selected for use in the framework. For the tile identification module, however, the models were trained in a subject-independent fivefold cross-validation manner22,23; i.e., tiles from one subject were never divided between a training dataset and a testing dataset simultaneously. Because imbalanced classes were common in this study, class weights were also adopted to offset the effect of imbalanced datasets. The hyperparameters of all models in the diagnostic framework are shown in Supplementary Table 1.
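
A subject-independent split with class weighting can be sketched as follows; GroupKFold uses patient IDs as groups so that tiles from one subject never appear on both sides of a fold. The labels and patient IDs below are placeholders, not the study data.

```python
import numpy as np
import torch
from sklearn.model_selection import GroupKFold
from sklearn.utils.class_weight import compute_class_weight

# Placeholder tile-level data: one label (0=normal, 1=SC, 2=SGC) and one
# patient ID per tile; in the real pipeline these come from the annotations.
labels = np.random.randint(0, 3, size=1000)
patient_ids = np.random.randint(0, 30, size=1000)

# Subject-independent fivefold split: tiles sharing a patient ID are always
# assigned to the same side of the split.
gkf = GroupKFold(n_splits=5)
for fold, (train_idx, val_idx) in enumerate(gkf.split(labels, labels, groups=patient_ids)):
    assert not set(patient_ids[train_idx]) & set(patient_ids[val_idx])

    # Inverse-frequency class weights to counter class imbalance,
    # passed to the loss so rare classes are not drowned out.
    weights = compute_class_weight("balanced", classes=np.array([0, 1, 2]),
                                   y=labels[train_idx])
    criterion = torch.nn.CrossEntropyLoss(
        weight=torch.tensor(weights, dtype=torch.float32))
    print(f"fold {fold}: class weights = {weights.round(2)}")
```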

Participants

This study obtained WSIs from 198 patients who underwent SGC or SC resection surgery (without biopsy samples) and pathological examination at Beijing Tongren Hospital and the Third Medical Center of Chinese PLA General Hospital. The baseline characteristics of the patients in the training set, the internal testing set, and the external testing set are described in Supplementary Tables 2 and 3.

Dataset composition

The training dataset consisted of 47 WSIs (from 21 patients) and was cut into 124,360 tiles, with 26 SGC WSIs (from 19 patients) and 21 SC WSIs (from 12 patients). The internal testing dataset consisted of 235 WSIs (from 154 patients), which were cut into 2680 tiles (after filtering), with 202 SGC WSIs (from 129 patients) and 33 SC WSIs (from 25 patients). The external testing dataset was obtained from the Third Medical Center of Chinese PLA General Hospital. The external dataset included 36 WSIs from 13 patients, comprising 27 SGC WSIs from 10 patients and nine SC WSIs from three patients.

Performance of quality control for tiles

Supplementary Tables 4 and 5 and Supplementary Fig. 2 present the performance of the meaningless tile filter module in internal cross-validation. Because the black, white, and black-and-white tiles were easy to identify, the accuracy of the first and second filter models surpassed 0.99. The mean accuracy of the third filter model surpassed 0.83 across all six classes (including "others"). Because the third filter model handled many more classes than the first and second filter models, its results did not reach their performance. Nevertheless, misclassification among the five meaningless classes (blurred tiles, tiles containing cells broken by improper storage, tiles with a cell ratio <60%, tiles with non-cellular structures, and contaminated tiles) did not disturb the performance of the diagnosis module, because the accuracy for "others" (qualified tiles) was almost 0.8. Notably, one SGC WSI in the internal testing dataset was discarded because all of its tiles were evaluated as unqualified by the meaningless tile filter module. Therefore, 234 WSIs (201 SGC and 33 SC) were ultimately used as the internal testing set.
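
The sequential sifting through the three filter models can be sketched as below; the class indices that mark a tile as qualified are hypothetical, since the exact class ordering of each filter model is not specified in the text.

```python
import torch

# Hypothetical "qualified" class index for each of the three filter models;
# the real indices depend on how each model's classes were ordered.
QUALIFIED_IDX = (0, 0, 5)

@torch.no_grad()
def sift_tiles(tiles, filter_models, device="cuda"):
    """Sequentially sift tiles through the three meaningless tile filter
    models; a tile is kept only if every model deems it qualified.

    tiles: iterable of (3, 512, 512) tensors; filter_models: the three
    trained ResNet101 classifiers in eval mode.
    """
    kept = []
    for tile in tiles:
        x = tile.unsqueeze(0).to(device)
        if all(m(x).argmax(dim=1).item() == q
               for m, q in zip(filter_models, QUALIFIED_IDX)):
            kept.append(tile)
    return kept
```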

Performance for differential classification at the tile level

The confusion matrices of all five CNNs in the tile identification module are shown in Fig. 2, and detailed metrics for overall performance and for each class are given in Supplementary Tables 6 and 7. The receiver operating characteristic and precision-recall curves for each class in internal cross-validation are shown in Supplementary Fig. 3. ResNeXt performed the best, with a mean accuracy of 0.9054, and the mean accuracy of all five models was >0.83. The area under the receiver operating characteristic curve (AUROC) values of all five models for all three classes were >0.97.

Fig. 2: The confusion matrices of the tile identification module in cross-validation.
figure 2

Each confusion matrix represents the performance of one CNN architecture (ResNet101, ResNeXt101_32X8d, Wide ResNet101, DenseNet121, EfficientNetb5) in distinguishing the types of tiles. SC squamous cell carcinoma, SGC sebaceous gland carcinoma.

Performance for differential classification at the WSI level

The heatmaps for the internal testing dataset at the WSI level are shown in Fig. 3, and the accuracy for the whole internal and external testing datasets is shown in Table 1, where the two accuracy values per class correspond to the two merging manners of the results merging module. Except for DenseNet, which performed the worst, the differences among the models were less prominent in the internal testing dataset than in the external testing dataset. Under the two merging manners, the accuracy of DenseNet was 30/33 and 30/33 for SC and 90/201 and 87/201 for SGC in the internal testing dataset, and 9/9 and 9/9 for SC and 0/27 and 0/27 for SGC in the external testing dataset. Wide ResNet performed the best at the WSI level (bold values in Table 1), especially for the external testing dataset.

Fig. 3: The heatmaps for the Wide ResNet model.
figure 3

The top two rows and the bottom two rows are the heatmaps of SC samples and SGC samples, respectively. They were merged from the classification results of the Wide ResNet model (tile identification module).

Table 1 The performance of the five types of CNNs in the tile identification module for the internal and external testing datasets

Comparison between pathologists without and with AI assistance

As shown in Fig. 4, the accuracy of the pathologists' assessments improved with the assistance of the AI framework. The accuracy of each participating pathologist was statistically analyzed. With AI assistance, the diagnostic accuracy of the five pathologists for the 25 WSIs increased from 60.0 ± 9.8% to 76.8 ± 9.6% (mean ± standard deviation), a statistically significant difference (P = 0.002, t = 7.203, degrees of freedom = 4). Across all 25 WSIs, the overall diagnostic accuracy of resident pathologist 1, attending pathologist 2, attending pathologist 3, associate chief pathologist 4, and associate chief pathologist 5 improved by 12.0%, 24.0%, 12.0%, 20.0%, and 16.0%, respectively. For resident pathologist 1, the diagnostic accuracy for SGC and SC improved by 6.7% and 20% (from 66.7% to 73.3% and from 30% to 50%), respectively. For attending pathologist 2, the diagnostic accuracy for SGC and SC improved by 26.7% and 20% (from 60% to 86.7% and from 30% to 50%), respectively. For attending pathologist 3, the diagnostic accuracy for SGC and SC improved by 13.3% and 10% (from 80% to 93.3% and from 40% to 50%), respectively. For associate chief pathologist 4, the diagnostic accuracy for both SGC and SC improved by 20.0% (from 73.3% to 93.3% and from 50.0% to 70.0%, respectively). For associate chief pathologist 5, the diagnostic accuracy for SGC improved by 26.7% (from 66.7% to 93.3%), whereas the diagnostic accuracy for SC did not change (remaining 80.0%). The changes in diagnostic accuracy for the five pathologists with AI assistance are outlined in Table 2. As shown in Fig. 5, a significant reduction in average diagnostic time was observed only for associate chief pathologist 2, with a decrease of 15.6 s (from 32.24 s to 16.64 s) (P < 0.0001). The time required for diagnosis with or without AI did not differ significantly among the other pathologists.
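
As a check, the reported t statistic can be reproduced from the per-pathologist accuracies; the values below are reconstructed from the per-class figures quoted above (15 SGC and 10 SC slides per reader), and the paired-samples t-test is run with SciPy.

```python
from scipy import stats

# Overall accuracies (%) on 25 WSIs, reconstructed from the per-class
# figures in the text (15 SGC + 10 SC slides per pathologist).
without_ai = [52, 48, 64, 64, 72]   # mean 60.0, SD 9.8
with_ai    = [64, 72, 76, 84, 88]   # mean 76.8, SD 9.6

t, p = stats.ttest_rel(with_ai, without_ai)
print(f"t = {t:.3f}, p = {p:.3f}")  # t ≈ 7.203, p ≈ 0.002 (df = 4)
```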

Fig. 4: The accuracy of five pathologists in diagnosing SGC and SC.
figure 4

Diagnostic evaluations were performed twice on 25 WSIs by five pathologists (two associate chief pathologists, two attending pathologists, and one resident pathologist) via a digital platform, with a 45-day washout period between readings.

Fig. 5: The time cost of slide reading.
figure 5

No statistically significant differences in diagnostic duration were observed for four pathologists with versus without AI assistance, while associate chief pathologist 2 demonstrated reduced time consumption in AI-supported evaluations.

Table 2 Accuracy of differential diagnosis between eyelid sebaceous gland carcinoma and squamous cell carcinoma by five DL model-assisted pathologists

Discussion

This was the first study to develop a DL-based pathological classification system for eyelid SGC and SC. The DL framework achieved stable, rapid differential diagnosis at both the tile level and the WSI level, with sensitivity and specificity comparable to those of experienced pathologists.

Computer-aided pathological diagnosis systems have been shown to help in the subtype classification of lung cancer24 and in assessment for metastatic disease in lymph nodes25. They may be combined with genomic data to establish a multimodal recurrence-scoring prediction system for renal cell carcinoma26. These studies have shown that AI models may assist pathologists or potentially extract additional features that may provide additional prognostic information not immediately visible on standard histopathologic examination.

In 2020, Wang et al.27 first applied computer-aided pathological diagnosis technology in ophthalmology (a DL-based system for eyelid malignant melanoma detection). The methodological parallels between our approach and the previous study illustrate the potential of AI-driven techniques for pathological applications in ophthalmology.

The framework described herein showed good discriminative ability at all levels. Misdiagnosis rates of 40–75% have been reported in the histopathological assessment of SGC when it is initially evaluated by general pathologists without subspecialty training in ophthalmic pathology7,9. In contrast, the algorithm automatically determines the diagnostic result for each tile and merges the results for the whole slide, which can be completed in seconds. This study included multiple slides from the same patient in the training, validation, and external testing sets to test the framework's comprehensive diagnostic ability.

This DL-based framework can serve as a reliable tool to assist pathologists in improving the efficiency and accuracy of diagnosis with WSIs of eyelid SC and SGC. The tumor heatmap generated by the framework provides a spatially resolved probability landscape across gigapixel WSIs and can visually reveal the ___location of tumors. The automatic diagnostic framework can repeatedly diagnose a slide within seconds, reducing the heavy workload of pathologists to some extent. This system could also promote medical equity and enable remote medical care in areas with inadequate medical resources.

The framework also exhibited excellent generalizability. The diversity of slide color and quality arising from different slide preparation methods, scanners, and digital pathology storage formats has hindered the adoption of AI-assisted pathology diagnosis models. The multicenter dataset included both paraffin and frozen sections, which were scanned by two different scanners and stored in two formats, to train the model and validate its generalizability.

Additionally, this model has a quality control function and can automatically discard unqualified regions. Roughly obtained, poorly fixed, or improperly processed slides can interfere with pathologists' morphological assessment28. Therefore, quality control, including in digital pathology, is crucial for pathological diagnosis29. Thus, the model was designed to remove unqualified images as a first step, preventing the framework from judging the nature of tumors on the basis of uninterpretable regions.

The WSIs misjudged by the DL framework in the internal testing set were analyzed. Notably, three of these WSIs (one SGC slide and two SC slides) also confused pathologists and could not be evaluated accurately at the time. These poorly differentiated pathological sections, lacking classical histological features, also confused chief pathologists and were confirmed only through multicenter expert consultations. The advantage of AI is that it does not require the years needed to train an experienced pathologist, and it introduces no subjectivity into tissue pathology analysis21. However, AI models for pathology are subject to certain limitations: their training is constrained by the quality and quantity of the training set, as well as the heterogeneity of the tumors21.

This study further examined the responses of each pathologist in the comparison of diagnoses with and without AI assistance. Four out of five pathologists marked one slide as "undecidable" after AI assistance. This occurred because the model identified all remaining regions as cancerous when they were actually normal, while the true cancerous regions had been removed as low-quality tiles. This indicates a need to improve both classification accuracy and quality control in our model.

Our study had several limitations. First, the sample size was small, and no publicly available dataset containing digital pathological slides of these two types of malignant eyelid tumors was available for training or validation. However, our sample size was larger than that used for the previous pathological differentiation system for eyelid tumors27,30 and comparable to those of similar published studies31,32. Second, since the study employed five preexisting CNNs, the internal process is a "black box" and not explainable21. Third, the quality control system could not achieve 100% specificity, implying that a small portion of high-quality tiles were also screened out. For example, in the WSI discarded by the meaningless tile filter module, a large portion (about 30%) of the area actually met our quality control requirements and was meaningful for pathological diagnosis; such discarding may interfere with the pathologist's judgment to some extent. Nevertheless, our conclusion is reliable, and this deep learning model can improve the diagnostic accuracy of pathologists. Fourth, this study prospectively collected slides from only one external validation hospital; although the preparation methods and scanners differed, the population was also from China, and the performance of the model on slides from other ethnic groups needs to be validated. Fifth, this study performed only a three-class task, addressing the most difficult pathological differentiation between two types of malignant eyelid tumors; our research group aims to expand the pathological classification to other benign and malignant eyelid tumors on the basis of this research. Finally, biopsies were unfortunately not included, although the performance of the AI diagnostic framework on intraoperative frozen sections and FFPE surgical resection samples was satisfactory.

The findings of our study suggest that deep learning models have the potential to be applied in the differential diagnosis of malignant eyelid tumors that may show some overlapping histopathological features. In future studies, this methodology will be adapted to develop a multiclass system for diverse eyelid tumors to assist in AI-driven classification in ophthalmic pathology.

Materials and methods

Study population and data preparation

This study involved human participants and was approved by the Ethics Committee of Beijing Tongren Hospital, Capital Medical University (TRECKY2018-056-GZ(2022)-07), and the Third Medical Center of Chinese PLA General Hospital. Participants gave informed consent before taking part. This study collected pathological slides (including formalin-fixed paraffin-embedded (FFPE) sections and intraoperative frozen sections) and medical records of patients who were diagnosed with SGC or SC and underwent surgery at Beijing Tongren Hospital (BTH) from June 2006 to November 2022. The gold standard for diagnosis was histopathological confirmation by two independent subspecialty-trained ophthalmic pathologists (each with >20 years of experience in ocular pathology), with the final diagnostic consensus reached through multidisciplinary consultation including the treating ophthalmologists. Immunohistochemical markers (EMA, AR, and Ki-67) were systematically applied in diagnostically challenging cases, followed by molecular testing (P53 mutation analysis) when required. In total, 228 H&E-stained sections from 148 eyelid SGC patients and 54 H&E-stained sections from 37 eyelid SC patients were included, and all selected slides were scanned by separate technicians into digital pathological slides using digital slide scanners (NANO ZOOMER XR, Hamamatsu, Japan, and Pannoramic DESK P250, 3DHISTECH, Hungary). The schematic flowchart of data collection is shown in Supplementary Fig. 4. All WSIs were stored in the MRXS format (193 slides) or the NDPI format (89 slides). Forty-seven WSIs (26 SGC and 21 SC) were divided into training and validation sets in a subject-independent manner (slides from one patient were never divided between the training and validation sets). For the training set, two pathological slide processing programs (SlideViewer, v.2.4.0, 3DHISTECH, and QuPath, v.0.3.0, https://qupath.github.io) were used to delineate and label the regions of different tissue types in three classes: normal tissue (including stroma), SGC tumor regions, and SC tumor regions. Two experienced pathologists (Q. Zhang and H. Zhang, each with >10 years of experience) labeled the WSIs and reviewed each other's assessments. In case of a dispute between the two pathologists, the final decision was made by the chief physician of the department of pathology (Piao).

Patient data with complete medical records were obtained from the Third Medical Center of Chinese PLA General Hospital (TMCPLA) for the period May 2017 to May 2023 to construct an independent external testing set for evaluating the extrapolation ability of our DL model. Finally, 36 slides (including FFPE pathological slides and intraoperative frozen sections) were scanned (27 SGC WSIs from 12 patients and nine SC WSIs from four patients).

Statistical analysis of the baseline characteristics of patients was performed using SPSS software (version 25.0, IBM Corp., Armonk, NY, USA). Categorical variables were described as frequency (percentage), and continuous variables with normal and skewed distributions were presented as mean ± standard deviation and median (interquartile range), respectively.

Training

All models were trained on a server with two NVIDIA RTX 3090 GPUs, an Intel(R) Xeon(R) Gold 5218R CPU @ 2.10 GHz, and 256 GB of memory. Stochastic gradient descent (SGD)33 was used as the optimization method.
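
A minimal sketch of this training setup is shown below (ImageNet-pretrained ResNet101 backbone adapted to the three tile classes, SGD optimizer, two GPUs); the learning rate and momentum are placeholders, since the actual values are listed in Supplementary Table 1.

```python
import torch
import torchvision

# ResNet101 backbone with ImageNet-pretrained weights, adapted to the
# three-class tile identification task (normal, SC, SGC).
model = torchvision.models.resnet101(
    weights=torchvision.models.ResNet101_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Linear(model.fc.in_features, 3)
model = torch.nn.DataParallel(model).cuda()  # spread across the two GPUs

# SGD optimizer; lr and momentum here are placeholders, not the values
# from Supplementary Table 1.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
```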

Assisting pathologists in reading slides

To assess the clinical utility of our DL framework in aiding diagnosis, five pathologists with varying degrees of clinical experience (two associate chief pathologists with >15 years of experience, two attending pathologists with >10 years of experience, and one resident pathologist with >5 years of experience) were recruited to diagnose 25 challenging digital pathological slides (15 SGC slides from 12 patients and 10 SC slides from nine patients) from the internal testing dataset. Patient information was anonymized, and the pathologists were instructed to perform binary classification. First, the pathologists classified the slides independently; after a 45-day washout period, they classified the slides again using whole-slide heat maps and the classified tiles provided by the DL-based framework as clues. Specifically, in the second round, the pathologists were provided with the tile classification results automatically generated by the framework.

Performance metrics

In cross-validation, for the meaningless tile filter module, row-normalized confusion matrices, accuracy, sensitivity, specificity, and AUROC were used to evaluate performance. For the tile identification module, row-normalized confusion matrices were used for overall evaluation, and accuracy, sensitivity, specificity, and AUROC were used to evaluate the performance for each class34,35. The corresponding confidence intervals were calculated from the point estimates36. For the internal and external testing datasets, the diagnostic accuracies for SC and SGC slides were summarized. For the time cost of reading slides, the Wilcoxon matched-pairs test was used to evaluate the difference between readings without and with AI assistance37, and a paired-samples t-test was used to compare the pathologists' accuracy with versus without AI assistance. A p value <0.05 was considered statistically significant. All statistical analyses were performed using MATLAB R2016a (https://www.mathworks.com/), GraphPad Prism 5 (https://www.graphpad.com/), and SPSS software (version 25.0, IBM Corp., Armonk, NY, USA).
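
The per-class metrics can be computed in a one-vs-rest fashion, as in the following scikit-learn sketch; y_true and y_score are placeholders standing in for the tile labels and softmax outputs.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.random.randint(0, 3, 500)           # placeholder tile labels
y_score = np.random.dirichlet(np.ones(3), 500)  # placeholder softmax outputs
y_pred = y_score.argmax(axis=1)

cm = confusion_matrix(y_true, y_pred)
cm_row_norm = cm / cm.sum(axis=1, keepdims=True)  # row-normalized confusion matrix

for k, name in enumerate(("normal", "SC", "SGC")):
    tp = cm[k, k]
    fn = cm[k].sum() - tp
    fp = cm[:, k].sum() - tp
    tn = cm.sum() - tp - fn - fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    auroc = roc_auc_score((y_true == k).astype(int), y_score[:, k])  # one-vs-rest
    print(f"{name}: sens {sensitivity:.3f}, spec {specificity:.3f}, AUROC {auroc:.3f}")
print(f"overall accuracy: {(y_true == y_pred).mean():.3f}")
```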
