Preliminary Study on Detecting Vocal Disorders Using Deep Learning in Laryngology
- DOI: 10.22469/jkslp.2025.36.1.5
- Author: Kwang Hyeon KIM 1; Jae-Keun CHO
- Author Information: 1. Clinical Research Support Center, Inje University Ilsan Paik Hospital, Goyang, Korea
- Publication Type: Original Article
- From: Journal of the Korean Society of Laryngology Phoniatrics and Logopedics 2025;36(1):5-11
- Country: Republic of Korea
- Language: Korean
- Abstract:
Background and Objectives: Voice disorders can significantly impact quality of life. This study evaluates the feasibility of using deep learning models to detect voice disorders using an open-source dataset.
Materials and Method: We utilized the Saarbrücken Voice Database, which contains 1231 voice recordings of various pathologies. The recordings were split into training (n=1036) and validation (n=195) sets. Key vocal parameters, including fundamental frequency (F0), formants (F1, F2), harmonics-to-noise ratio, jitter, and shimmer, were analyzed. A convolutional neural network (CNN) was designed to classify voice recordings into normal, vox senilis, and laryngocele. Performance was assessed using precision, recall, F1-score, and accuracy.
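As a rough illustration of two of the vocal parameters named above, the sketch below computes local jitter and local shimmer using the common definition (mean absolute difference between consecutive cycles, divided by the mean). The abstract does not say how the authors computed these values, so the function name `local_perturbation` and the per-cycle measurements are hypothetical and are not taken from the Saarbrücken Voice Database.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive cycles, divided by the mean.

    Applied to glottal periods this gives local jitter; applied to
    per-cycle peak amplitudes it gives local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


# Hypothetical per-cycle measurements (illustrative only, not study data).
periods_ms = [5.0, 5.1, 4.9, 5.0]          # glottal cycle lengths in ms
peak_amplitudes = [0.80, 0.82, 0.78, 0.80]  # per-cycle peak amplitude

jitter_local = local_perturbation(periods_ms)
shimmer_local = local_perturbation(peak_amplitudes)
mean_f0_hz = 1000.0 / mean(periods_ms)      # mean F0 from the mean period

print(f"jitter (local):  {jitter_local:.2%}")
print(f"shimmer (local): {shimmer_local:.2%}")
print(f"mean F0:         {mean_f0_hz:.1f} Hz")
```

In practice the per-cycle periods and amplitudes would come from a pitch-tracking step on the recording, which this sketch leaves out.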
Results: The CNN model demonstrated high classification performance, with precision, recall, and F1-scores of 1.00 for normal and 0.99 for vox senilis and laryngocele. Accuracy reached 1.00 after 50 epochs and remained stable through 100 epochs. Time-frequency analysis supported the model’s ability to differentiate between classes.
Conclusion: This study highlights the potential of deep learning for voice disorder detection, achieving high accuracy and precision. Future research should address dataset diversity and real-world integration for broader clinical adoption.
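The evaluation metrics named in the abstract (precision, recall, F1-score, and accuracy) can be derived from a three-class confusion matrix, as in the minimal sketch below. The class labels follow the abstract, but the counts are a made-up toy example and do not reproduce the study's results. The helper name `per_class_metrics` is likewise hypothetical.

```python
def per_class_metrics(confusion, labels):
    """Return ({label: (precision, recall, f1)}, accuracy).

    confusion[i][j] is the number of samples whose true class is
    labels[i] and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(labels)))
    report = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        predicted = sum(row[i] for row in confusion)  # column sum
        actual = sum(confusion[i])                    # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1)
    return report, correct / total


labels = ["normal", "vox senilis", "laryngocele"]
# Toy counts for illustration only (rows: true class, columns: predicted).
confusion = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]

report, accuracy = per_class_metrics(confusion, labels)
for label, (p, r, f1) in report.items():
    print(f"{label:12s} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Reporting the metrics per class, as the abstract does, shows whether a high overall accuracy hides weak performance on one class.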