Fig. 2: Results for skin image selection step of the STAR-ED framework.
From: Skin Tone Analysis for Representation in Educational Materials (STAR-ED) using machine learning

Once the images are extracted from the materials, the selection step aims to identify skin images and discard non-skin images (e.g., pathology images). To this end, we extracted a set of features: Histogram of Oriented Gradients (HoG) (23) and the mean and standard deviation of the image channels in the CIELAB (24) color space. A Principal Component Analysis (PCA) visualizations of skin and non-skin images in the two datasets (DermEducation and Medical Textbooks) used to validate the selection step. Legend: Red dot – Non-skin; Green dot – Skin. B Performance in identifying skin images in DermEducation using Support Vector Machine (SVM) (18) and Extreme Gradient Boosting (XGBoost, XGB) (9) classifiers in a five-fold stratified cross-validation setting; both classifiers show encouraging results. Legend: Red bar – SVM; Green bar – XGB. C Comparative performance of the two classifiers when applied to four dermatology textbooks as an external test. Overall, the results confirm the benefit of machine learning approaches for identifying skin images. The SVM and XGB classifiers perform competitively; XGB has a slight advantage and is therefore used in STAR-ED.
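To illustrate the color part of the feature set described in the caption, the following is a minimal sketch of computing the mean and standard deviation of the CIELAB channels for an image. It is not the authors' implementation: the function names (`srgb_to_linear`, `rgb_to_lab`, `lab_color_features`), the assumption of 8-bit sRGB input, the D65 reference white, and the use of the population standard deviation are all choices made here for illustration. The HoG features and the classifiers are not shown.

```python
from statistics import mean, pstdev

# Assumption: D65 reference white, the usual choice for sRGB images.
WHITE = (0.95047, 1.0, 1.08883)


def srgb_to_linear(c):
    """Undo the sRGB transfer curve for a channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def rgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIELAB (L*, a*, b*)."""
    rl, gl, bl = (srgb_to_linear(v / 255.0) for v in (r, g, b))

    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        # Piecewise cube-root function from the CIELAB definition.
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / WHITE[0]), f(y / WHITE[1]), f(z / WHITE[2])
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def lab_color_features(pixels):
    """Return [mean L, mean a, mean b, std L, std a, std b].

    `pixels` is a flat list of (R, G, B) tuples with values in 0..255.
    """
    lab = [rgb_to_lab(*p) for p in pixels]
    channels = list(zip(*lab))  # three tuples: all L, all a, all b
    return [mean(c) for c in channels] + [pstdev(c) for c in channels]


if __name__ == "__main__":
    # Hypothetical four-pixel "image" used only to exercise the function.
    toy_image = [(224, 172, 105), (198, 134, 66), (141, 85, 36), (255, 219, 172)]
    print(lab_color_features(toy_image))
```

In practice these six values would be concatenated with the HoG descriptor to form the feature vector passed to a classifier. A library routine such as scikit-image's `rgb2lab` could replace the hand-written conversion.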