Emotional time-based detection of patients with bipolar disorder based on deep learning speech analysis

Zhiying LI; Jun JI; Shuzhe ZHOU; Jiaqi LI; Xinhui LI; Chaonan FENG; Lili GUAN; Zaohui MA; Yantao MA



Vernacular title (Chinese): Detection of mood phases in patients with bipolar disorder based on deep-learning speech analysis
Author Information

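1. Clinical Psychiatry Research Laboratory, Peking University Sixth Hospital; Peking University Institute of Mental Health; NHC Key Laboratory of Mental Health (Peking University); National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Beijing 100191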
Publication Type: Journal Article
Keywords: Bipolar disorder; Voice; Mood states; Deep learning; LIGHT-SERNET-based
From: Chinese Journal of Psychiatry 2024;57(4):207-212
Country: China
Language: Chinese
Abstract: Objective: To use a speech-based deep learning approach to distinguish between depressive and manic mood states in patients with bipolar disorder (BD). Methods: Sixty-one patients with BD who attended the psychiatric outpatient department of Peking University Sixth Hospital between June 2018 and March 2022 were recruited. The Quick Inventory of Depressive Symptomatology, the Mood Disorder Questionnaire and the Young Mania Rating Scale were used to determine each patient's mood state. The patients' voices were recorded, yielding 190 samples for each of the remission, depressive and manic periods. A total of 136 features, including Mel-frequency cepstral coefficients (MFCCs) and zero-crossing rates, were extracted from the voice samples with a Python speech analysis library. A LIGHT-SERNET-based network was then trained as the emotion (mood-state) classification model. Overall model performance was evaluated by accuracy, and the predictions for each of the three mood states were evaluated with sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and the receiver operating characteristic (ROC) curve. Kruskal-Wallis H tests or χ² tests were used to compare demographic characteristics among the three groups. Results: There were statistically significant differences among the three groups in age (H=25.83, P<0.001), years of education (H=25.25, P<0.001) and marital status (χ²=23.81, P<0.001), but not in gender (χ²=4.63, P=0.099). The accuracy of the model in detecting the three mood states was 0.84. For remission, sensitivity and specificity were 0.88 and 0.93, and PPV and NPV were 0.87 and 0.94, respectively.
For depressive episodes, sensitivity and specificity were 0.82 and 0.92, and PPV and NPV were 0.84 and 0.92, respectively. For manic episodes, sensitivity and specificity were 0.82 and 0.91, and PPV and NPV were 0.83 and 0.91, respectively. The areas under the ROC curve (AUCs) for the three mood states were similar, all exceeding 0.90. Conclusion: The LIGHT-SERNET-based deep learning model showed good ability to discriminate between depressive and manic mood states based on speech analysis.
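The abstract lists the zero-crossing rate among the 136 acoustic features but does not say how frame-level values were turned into per-recording features. The following is a minimal, dependency-free sketch of one common approach, assuming fixed-length framing followed by mean and standard-deviation pooling over frames. The function names, frame length (400 samples) and hop (160 samples, i.e. 25 ms and 10 ms at a hypothetical 16 kHz sampling rate) are illustrative assumptions, not the study's settings. MFCCs would normally come from a library routine such as `librosa.feature.mfcc`; they are omitted here to keep the example self-contained.

```python
import math
from typing import List, Sequence, Tuple


def frame_signal(x: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split a 1-D signal into overlapping frames; a trailing partial frame is dropped."""
    if frame_len < 2 or hop < 1:
        raise ValueError("frame_len must be >= 2 and hop >= 1")
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs (0 is treated as non-negative)."""
    crossings = sum(1 for a, b in zip(frame[:-1], frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def zcr_summary(x: Sequence[float], frame_len: int = 400, hop: int = 160) -> Tuple[float, float]:
    """Pool frame-level ZCR into an utterance-level (mean, population standard deviation)."""
    zcrs = [zero_crossing_rate(f) for f in frame_signal(x, frame_len, hop)]
    if not zcrs:
        raise ValueError("signal is shorter than one frame")
    mean = sum(zcrs) / len(zcrs)
    std = math.sqrt(sum((z - mean) ** 2 for z in zcrs) / len(zcrs))
    return mean, std


if __name__ == "__main__":
    sr = 16000  # hypothetical sampling rate
    tone = [math.sin(2 * math.pi * 200 * n / sr + 0.1) for n in range(sr)]  # 1 s of a 200 Hz tone
    print(zcr_summary(tone))  # mean should be near 2 * 200 / sr = 0.025
```

Normalisation conventions differ between tools (some divide by the frame length rather than the number of sample pairs), so absolute ZCR values from this sketch need not match a given library's output.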
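The per-state sensitivity, specificity, PPV and NPV reported in the abstract are one-vs-rest quantities: each mood state is treated in turn as the "positive" class and the other two states as "negative". A minimal sketch of that computation from a 3×3 confusion matrix follows, assuming rows are true states and columns are predicted states. The function names are hypothetical, and the matrix in the usage example is invented for illustration and is not the study's data.

```python
from typing import Dict, List, Sequence


def one_vs_rest_metrics(cm: List[List[int]], labels: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Sensitivity, specificity, PPV and NPV for each class of a square confusion matrix.

    cm[i][j] counts samples whose true class is labels[i] and predicted class is labels[j].
    """
    n = len(labels)
    if len(cm) != n or any(len(row) != n for row in cm):
        raise ValueError("confusion matrix must be len(labels) x len(labels)")
    total = sum(sum(row) for row in cm)
    out: Dict[str, Dict[str, float]] = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # class k predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp  # other classes predicted as k
        tn = total - tp - fn - fp                  # samples involving class k in neither role
        out[name] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "ppv": tp / (tp + fp),
            "npv": tn / (tn + fn),
        }
    return out


def accuracy(cm: List[List[int]]) -> float:
    """Overall accuracy: correctly classified samples divided by all samples."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)


if __name__ == "__main__":
    states = ["remission", "depressive", "manic"]
    hypothetical_cm = [[8, 1, 1], [1, 8, 1], [0, 2, 8]]  # invented counts
    print(accuracy(hypothetical_cm))
    print(one_vs_rest_metrics(hypothetical_cm, states)["manic"])
```

A class that is never predicted makes its PPV undefined (division by zero); a fuller implementation would report such cases explicitly rather than raise.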
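The abstract reports an ROC curve and its area for each of the three mood states but does not say how the per-state curves were built. A common approach, assumed here, is one-vs-rest: each sample is scored by the model's output (for example, a softmax probability) for the state in question, and samples truly in that state are the positives. The area under the ROC curve then equals the Mann-Whitney probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as one half. The sketch below computes that probability directly by pairwise comparison. The function names are hypothetical.

```python
from typing import Sequence


def auc_mann_whitney(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative), ties counting 1/2."""
    if len(scores) != len(is_positive):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:  # compare every positive with every negative
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(probs: Sequence[Sequence[float]], true_idx: Sequence[int], k: int) -> float:
    """AUC for class k: score = predicted probability of class k, positive = true class is k."""
    return auc_mann_whitney([row[k] for row in probs], [t == k for t in true_idx])
```

The pairwise loop costs O(n_pos × n_neg) comparisons, which is negligible for a few hundred recordings; a rank-based formula gives the same value in O(n log n) for larger datasets.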
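The demographic comparisons in the abstract use Kruskal-Wallis H tests for continuous variables and χ² tests for categorical ones. With three groups, H is referred to a χ² distribution with 2 degrees of freedom, as is the Pearson statistic for a 2×3 table such as gender by mood state. For exactly 2 degrees of freedom the upper-tail P value has the closed form exp(−x/2). The stdlib-only sketch below implements these formulas, with the usual tie correction for H and no continuity correction for the table. The function names are hypothetical; in practice `scipy.stats.kruskal` and `scipy.stats.chi2_contingency` provide the same tests.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups: Sequence[float]) -> float:
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted, start = 0.0, 0
    for g in groups:  # sum of R_i^2 / n_i over groups
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied; H is undefined")
    return h / correction


def chi2_independence(table: List[List[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table (no continuity correction)."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(row_tot) - 1) * (len(col_tot) - 1)


def chi2_sf_df2(x: float) -> float:
    """Upper-tail P value of a chi-square statistic with exactly 2 degrees of freedom."""
    return math.exp(-x / 2)
```

As a consistency check, `chi2_sf_df2(4.63)` evaluates exp(−2.315) ≈ 0.099, matching the P value the abstract reports for gender (assuming a 2×3 table). The closed form does not apply to marital status if it has more than two categories, because the degrees of freedom then exceed 2.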