Preliminary Study on Detecting Vocal Disorders Using Deep Learning in Laryngology
- DOI:10.22469/jkslp.2025.36.1.5
- Author:Kwang Hyeon KIM(1); Jae-Keun CHO
  1. Clinical Research Support Center, Inje University Ilsan Paik Hospital, Goyang, Korea
        		- Publication Type:Original Article
- From:Journal of the Korean Society of Laryngology Phoniatrics and Logopedics 2025;36(1):5-11
- Country:Republic of Korea
- Language:Korean
- Abstract:
Background and Objectives: Voice disorders can significantly impact quality of life. This study evaluates the feasibility of using deep learning models to detect voice disorders using an open-source dataset.

Materials and Methods: We utilized the Saarbrücken Voice Database, which contains 1231 voice recordings of various pathologies; the recordings were split into training (n=1036) and validation (n=195) sets. Key vocal parameters, including fundamental frequency (F0), formants (F1, F2), harmonics-to-noise ratio (HNR), jitter, and shimmer, were analyzed. A convolutional neural network (CNN) was designed to classify voice recordings as normal, vox senilis, or laryngocele. Performance was assessed using precision, recall, F1-score, and accuracy.
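The abstract names the measured parameters but not the extraction tool. As a minimal sketch, Praat-style measures could be computed with the praat-parselmouth Python library; the function name and pitch-range settings below are illustrative assumptions, not the authors' code.

```python
# Hypothetical feature-extraction sketch using praat-parselmouth
# (pip install praat-parselmouth); all settings are illustrative.
import parselmouth
from parselmouth.praat import call

def extract_vocal_parameters(wav_path):
    snd = parselmouth.Sound(wav_path)
    pitch = snd.to_pitch()
    f0 = call(pitch, "Get mean", 0, 0, "Hertz")               # fundamental frequency (F0)
    formant = snd.to_formant_burg()
    t = snd.duration / 2                                       # sample formants at the midpoint
    f1 = formant.get_value_at_time(1, t)                       # first formant (F1)
    f2 = formant.get_value_at_time(2, t)                       # second formant (F2)
    hnr = call(snd.to_harmonicity_cc(), "Get mean", 0, 0)      # harmonics-to-noise ratio
    pp = call(snd, "To PointProcess (periodic, cc)", 75, 600)  # assumed 75-600 Hz pitch range
    jitter = call(pp, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
    shimmer = call([snd, pp], "Get shimmer (local)", 0, 0, 0.0001, 0.02, 1.3, 1.6)
    return [f0, f1, f2, hnr, jitter, shimmer]
```

The abstract likewise gives no CNN architecture or input representation. Since the Results mention time-frequency analysis, one plausible reading is a small 2D CNN over spectrogram patches; the layer sizes and the (128, 128, 1) input shape below are assumptions for illustration, not the paper's design.

```python
# Minimal three-class CNN sketch in Keras (architecture assumed,
# not taken from the paper); input is a spectrogram patch.
from tensorflow.keras import layers, models

def build_cnn(input_shape=(128, 128, 1), n_classes=3):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),  # normal / vox senilis / laryngocele
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```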
				        
Results: The CNN model demonstrated high classification performance, with precision, recall, and F1-scores of 1.00 for the normal class and 0.99 for vox senilis and laryngocele. Accuracy reached 1.00 after 50 epochs and remained stable through 100 epochs. Time-frequency analysis supported the model’s ability to differentiate between classes.
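The reported per-class precision, recall, and F1-scores, together with overall accuracy, can be computed from validation predictions with scikit-learn; the label encoding and placeholder arrays below are illustrative, not the study's data.

```python
# Evaluation sketch mirroring the metrics reported in the abstract.
from sklearn.metrics import accuracy_score, classification_report

# Placeholder labels for illustration only; in practice y_true / y_pred
# would come from the 195-recording validation split.
y_true = [0, 0, 1, 1, 2, 2]   # 0=normal, 1=vox senilis, 2=laryngocele
y_pred = [0, 0, 1, 1, 2, 2]

print(classification_report(y_true, y_pred,
                            target_names=["normal", "vox senilis", "laryngocele"]))
print("accuracy:", accuracy_score(y_true, y_pred))
```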
				        
Conclusion: This study highlights the potential of deep learning for voice disorder detection, achieving high accuracy and precision. Future research should address dataset diversity and real-world integration for broader clinical adoption.