Accuracy of a multi-task network based on a vision Transformer in three-dimensional upper airway analysis
10.3760/cma.j.cn112144-20240514-00205
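- VernacularTitle: Accuracy of a multi-task model based on a vision deep self-attention network (vision Transformer) for analyzing the three-dimensional upper airway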
- Author: Suhan JIN¹; Haojie HAN; Fang CHEN; Xiaoyan GUAN; Fang HUA; Hong HE
Author Information
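1. School & Hospital of Stomatology, Wuhan University; State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration; Key Laboratory of Oral Biomedicine, Ministry of Education; Hubei Key Laboratory of Stomatology, Wuhan 430079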
- Keywords: Artificial intelligence; Cone-beam computed tomography; Image processing, computer-assisted; Deep learning; Upper airway
- From: Chinese Journal of Stomatology 2024;59(9):911-918
- Country: China
- Language: Chinese
- Abstract:
Objective: To explore the accuracy of a multi-task model based on a vision Transformer in analyzing the three-dimensional (3D) upper airway and its subregions, and to evaluate its clinical applicability.
Methods: According to the inclusion and exclusion criteria, cone-beam CT (CBCT) data of 10 patients [4 males and 6 females, aged (20.8±2.7) years] who made their first visit to the Department of Orthodontics, Hospital of Stomatology, Wuhan University, between January 2012 and January 2020 were retrospectively selected. 3D Slicer software was used to segment the upper airway and the pharyngeal airway and to measure their volumes as the gold standard. Dolphin 3D software was used to segment the pharyngeal airway and its subregions and to measure their volumes as the gold standard. A multi-task model based on a vision Transformer, developed by the research team, was used for automatic segmentation and volume measurement of the upper airway and its subregions. All measurements were performed by the same attending physician. Bland-Altman analysis and the intraclass correlation coefficient (ICC) were used to evaluate the agreement between the multi-task model and the gold standard in upper airway segmentation and volume measurement, and the paired t test was used to compare the differences between the multi-task model and the gold standard.
Results: The mean volume deviation between the upper airway segmented by the multi-task model and by 3D Slicer was -979.6 mm³, and the ICC was 0.97. The mean volume deviations between the multi-task model and Dolphin 3D for the pharyngeal airway, nasopharynx, velopharynx, glossopharynx and hypopharynx were 2069.5, -950.1, -823.6, -813.9 and 4003.4 mm³, respectively. The corresponding ICCs for the pharyngeal airway, nasopharynx, velopharynx, glossopharynx and hypopharynx were 0.97, 0.94, 0.96, 0.96 and 0.69, respectively.
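The agreement statistics named in the Methods (Bland-Altman bias with limits of agreement, the ICC, and the paired t test) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' analysis code. The abstract does not say which ICC form was used, so `icc_2_1` assumes the two-way random-effects, absolute-agreement, single-measurement form (ICC(2,1) in Shrout and Fleiss notation). The 95% limits of agreement use the conventional bias ± 1.96 SD. Deviations are assumed to be computed as model minus gold standard. The function names and sample volumes are hypothetical.

```python
import math
import statistics
from typing import Dict, Sequence


def bland_altman(model: Sequence[float], reference: Sequence[float]) -> Dict[str, float]:
    """Bland-Altman bias (mean of model - reference) and 95% limits of agreement."""
    diffs = [m - r for m, r in zip(model, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return {"bias": bias, "loa_low": bias - 1.96 * sd, "loa_high": bias + 1.96 * sd}


def paired_t_statistic(model: Sequence[float], reference: Sequence[float]) -> float:
    """Paired t statistic with n - 1 degrees of freedom.

    A p-value additionally needs the t distribution (e.g. scipy.stats.ttest_rel).
    """
    diffs = [m - r for m, r in zip(model, reference)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def icc_2_1(model: Sequence[float], reference: Sequence[float]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement (assumed form)."""
    rows = [(m, r) for m, r in zip(model, reference)]  # one row per patient, one column per method
    n, k = len(rows), 2
    grand = sum(sum(row) for row in rows) / (n * k)
    row_means = [sum(row) / k for row in rows]
    col_means = [sum(row[j] for row in rows) / n for j in range(k)]
    ss_rows = k * sum((mu - grand) ** 2 for mu in row_means)  # between patients
    ss_cols = n * sum((mu - grand) ** 2 for mu in col_means)  # between methods
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_err = ss_total - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


if __name__ == "__main__":
    # Hypothetical volumes in mm^3, NOT data from the study.
    model_vol = [21500.0, 18300.0, 25100.0, 19800.0, 23000.0]
    gold_vol = [22400.0, 19100.0, 26300.0, 20500.0, 23900.0]
    print(bland_altman(model_vol, gold_vol))
    print(paired_t_statistic(model_vol, gold_vol))
    print(icc_2_1(model_vol, gold_vol))
```

With identical inputs the ICC is 1. A constant offset between the two methods gives zero-width limits of agreement around a nonzero bias but still lowers the absolute-agreement ICC, which is one reason to report the Bland-Altman bias alongside the ICC.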
Conclusions: The multi-task model based on a vision Transformer showed varying degrees of error in segmenting the 3D upper airway and its subregions. Segmentation of the nasopharynx, velopharynx and glossopharynx agreed well with the gold standard, whereas segmentation of the hypopharynx agreed poorly, suggesting that the robustness and generalizability of the model need further improvement.