Scale-invariant feature-enhanced deep learning framework for oral mucosal lesion segmentation

Rui ZHANG; Lu JIN; Qianming CHEN; Tingting DING; Qiyue ZHANG; Yaowu CHEN; Xiang TIAN; Yuqi CAO; Xiaoyan CHEN; Fudong ZHU

VernacularTitle: 尺度不变特征增强深度学习在口腔黏膜病损分割中应用的研究
Author: Rui ZHANG ¹; Lu JIN; Qianming CHEN; Tingting DING; Qiyue ZHANG; Yaowu CHEN; Xiang TIAN; Yuqi CAO; Xiaoyan CHEN; Fudong ZHU
Author Information

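1. Information Center, Stomatology Hospital, Zhejiang University School of Medicine; School of Stomatology, Zhejiang University; Zhejiang Provincial Clinical Research Center for Oral Diseases; Key Laboratory of Oral Biomedical Research of Zhejiang Province; Cancer Center of Zhejiang University; Engineering Research Center of Oral Biomaterials and Devices of Zhejiang Province, Hangzhou 310005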
Publication Type: Journal Article
Keywords: Oral mucosa; Lesion detection; Deep learning; PixelSIFT-UNet model; Scale invariant feature transform algorithm
From: Chinese Journal of Stomatology 2025;60(3):239-247
Country: China
Language: Chinese
Abstract: Objective: To develop PixelSIFT-UNet, a novel semantic segmentation model that integrates deep learning with the scale-invariant feature transform (SIFT) algorithm to improve the segmentation accuracy of oral mucosal lesions. Methods: A total of 838 standard clinical white-light images of oral mucosal diseases, acquired from January 2020 to December 2022 at the Stomatology Hospital, Zhejiang University School of Medicine, were used. The images were assigned to subsets at random: Python's random.seed function fixed the random state, and the random.sample function was then used to draw each subset. The dataset was divided into three subsets at a 6:2:2 ratio: training (n=506), validation (n=166), and testing (n=166). Lesion boundaries were annotated using Labelme software, and a PixelSIFT-UNet-based deep learning model was developed with VGG-16 and ResNet-50 backbone networks. Model parameters were optimized on the validation set, and performance metrics [Dice coefficient, mean intersection over union (mIoU), mean pixel accuracy (mPA), and Precision] were assessed on the test set. The model's performance was benchmarked against conventional semantic segmentation frameworks (U-Net and PSPNet). Results: The PixelSIFT-UNet model segmented three common oral mucosal lesions: oral lichen planus, oral leukoplakia, and oral submucous fibrosis. With VGG-16 as the backbone network, the model achieved Dice coefficient, mIoU, mPA, and Precision values of 0.642, 0.699, 0.836, and 0.792, respectively.
With the ResNet-50 backbone network, the Dice coefficient, mIoU, mPA, and Precision were 0.668, 0.733, 0.872, and 0.817, respectively. These values exceed those of the conventional U-Net model (0.662, 0.717, 0.861, and 0.809) on all four metrics and those of the PSPNet model (0.671, 0.721, 0.858, and 0.813) on mIoU, mPA, and Precision, though not on the Dice coefficient. Conclusions: The proposed PixelSIFT-UNet architecture performs better than conventional semantic segmentation models in oral mucosal lesion segmentation, with quantitative gains in segmentation accuracy.
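
The Methods in the abstract describe a seeded random split and overlap-based metrics without giving code, so the following is only a minimal sketch of how such steps might look, not the authors' implementation. The subset sizes (506, 166, 166) come from the abstract. The seed value, the function names `split_dataset` and `overlap_metrics`, and the single-class 0/1 mask convention are assumptions made for illustration.

```python
import random

# Subset sizes as reported in the abstract (506 + 166 + 166 = 838).
SPLIT_SIZES = {"train": 506, "val": 166, "test": 166}


def split_dataset(image_ids, sizes=SPLIT_SIZES, seed=0):
    """Shuffle image_ids reproducibly and cut them into named subsets.

    seed=0 is a placeholder: the abstract does not report the seed used.
    """
    image_ids = list(image_ids)
    if sum(sizes.values()) != len(image_ids):
        raise ValueError("subset sizes must add up to the number of images")
    random.seed(seed)  # fix the random state so the split is repeatable
    shuffled = random.sample(image_ids, len(image_ids))  # random permutation
    subsets, start = {}, 0
    for name, size in sizes.items():
        subsets[name] = shuffled[start:start + size]
        start += size
    return subsets


def overlap_metrics(pred, truth):
    """Dice, IoU and precision for one class from flat 0/1 pixel labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    union = tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    dice = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    iou = tp / union if union else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"dice": dice, "iou": iou, "precision": precision}


if __name__ == "__main__":
    subsets = split_dataset(range(838))
    print({name: len(ids) for name, ids in subsets.items()})
    print(overlap_metrics([1, 1, 0, 0], [1, 0, 1, 0]))
```

Under these assumptions, mIoU would be the mean of the per-class IoU values and mPA the mean of the per-class pixel accuracies. The abstract does not state which classes are averaged or whether background is included, so that step is left out of the sketch.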