2.Neural network for auditory speech enhancement featuring feedback-driven attention and lateral inhibition.
Yudong CAI ; Xue LIU ; Xiang LIAO ; Yi ZHOU
Journal of Biomedical Engineering 2025;42(1):82-89
The processing mechanism of the human brain for speech information is a significant source of inspiration for speech enhancement research. Attention and lateral inhibition are key mechanisms in auditory information processing that can selectively enhance specific information. Building on this, the study introduces a dual-branch U-Net that integrates lateral inhibition and feedback-driven attention mechanisms. Noisy speech signals fed into the first U-Net branch triggered selective feedback of high-confidence time-frequency units, and the resulting activation-layer gradients, combined with the lateral inhibition mechanism, were used to compute attention maps. These maps were then concatenated into the second U-Net branch, directing the network's focus and achieving selective enhancement of auditory speech signals. The speech enhancement effect was evaluated with five metrics, including the perceptual evaluation of speech quality (PESQ), and the proposed method was compared against five others: Wiener filtering, SEGAN, PHASEN, Demucs, and GRN. Experimental results demonstrated that the proposed method improved speech enhancement in various noise scenarios by 18% to 21% over the baseline network across multiple performance metrics, with a particularly pronounced advantage over the other methods under low signal-to-noise ratio conditions. The speech enhancement technique based on lateral inhibition and feedback-driven attention mechanisms therefore holds significant potential for auditory speech enhancement and is well suited to clinical applications involving cochlear implants and hearing aids.
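To make the feedback-driven attention idea concrete, the following is a minimal PyTorch sketch of the pipeline the abstract describes: a first branch estimates a mask, gradients are back-propagated from its high-confidence time-frequency units, a Grad-CAM-style attention map is sharpened by a lateral-inhibition step, and the map is concatenated to the second branch's input. The layer sizes, the confidence rule, and the inhibition kernel are illustrative assumptions, not the authors' published configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUNet(nn.Module):
    """Toy encoder-decoder standing in for one U-Net branch."""
    def __init__(self, in_ch):
        super().__init__()
        self.enc = nn.Conv2d(in_ch, 16, 3, stride=2, padding=1)
        self.dec = nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1)

    def forward(self, x):
        h = F.relu(self.enc(x))              # activation layer used for feedback
        return torch.sigmoid(self.dec(h)), h

def lateral_inhibition(att, k=5, strength=0.5):
    """Suppress each unit by a fraction of its neighbourhood mean (assumed kernel)."""
    neigh = F.avg_pool2d(att, k, stride=1, padding=k // 2)
    return F.relu(att - strength * neigh)

spec = torch.rand(1, 1, 64, 64)              # toy noisy magnitude spectrogram
branch1, branch2 = TinyUNet(1), TinyUNet(2)

# Branch 1: estimate a mask, then feed back only high-confidence units
# (here simply those above the mean -- an assumed confidence rule).
mask, act = branch1(spec)
act.retain_grad()
score = (mask * (mask > mask.mean()).float()).sum()
score.backward()

# Grad-CAM-style attention from activation-layer gradients, sharpened by
# lateral inhibition, then concatenated into the second branch's input.
cam = F.relu((act.grad * act).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=spec.shape[-2:], mode="bilinear", align_corners=False)
att = lateral_inhibition(cam / (cam.max() + 1e-8))

enhanced_mask, _ = branch2(torch.cat([spec, att], dim=1))
print(enhanced_mask.shape)                   # torch.Size([1, 1, 64, 64])
```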
Humans
;
Attention/physiology*
;
Speech Perception/physiology*
;
Neural Networks, Computer
;
Speech
;
Noise
;
Feedback
3.Study on speech imagery electroencephalography decoding of Chinese words based on the CAM-Net model.
Xiaolong LIU ; Banghua YANG ; An'an GAN ; Jie ZHANG
Journal of Biomedical Engineering 2025;42(3):473-479
Speech imagery is an emerging brain-computer interface (BCI) paradigm with the potential to provide effective communication for individuals with speech impairments. This study designed a Chinese speech imagery paradigm using three clinically relevant phrases ("Help me", "Sit up", and "Turn over") and collected electroencephalography (EEG) data from 15 healthy subjects. Based on these data, a Channel Attention Multi-Scale Convolutional Neural Network (CAM-Net) decoding algorithm was proposed, which combined multi-scale temporal convolutions with asymmetric spatial convolutions to extract multidimensional EEG features, and incorporated a channel attention mechanism along with a bidirectional long short-term memory network to perform channel weighting and capture temporal dependencies. Experimental results showed that CAM-Net achieved a classification accuracy of 48.54% in the three-class task, outperforming baseline models such as EEGNet and Deep ConvNet, and reached a peak accuracy of 64.17% in the binary classification between "Sit up" and "Turn over". This work provides a promising approach for future Chinese speech imagery BCI research and applications.
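As a rough illustration of the ingredients named in the abstract (multi-scale temporal convolutions, an asymmetric spatial convolution across electrodes, squeeze-and-excitation-style channel attention, and a bidirectional LSTM), the following PyTorch sketch wires them together for a three-class task. The electrode count, window length, kernel sizes, and channel widths are assumptions for illustration, not the published CAM-Net hyperparameters.

```python
import torch
import torch.nn as nn

class CAMNetSketch(nn.Module):
    def __init__(self, n_elec=32, n_classes=3):
        super().__init__()
        # Multi-scale temporal convolutions along the time axis
        self.temporal = nn.ModuleList(
            nn.Conv2d(1, 8, (1, k), padding=(0, k // 2)) for k in (15, 31, 63)
        )
        # Asymmetric (electrodes x 1) spatial convolution collapsing the electrode axis
        self.spatial = nn.Conv2d(24, 24, (n_elec, 1), groups=24)
        # Squeeze-and-excitation channel attention
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(24, 8), nn.ReLU(), nn.Linear(8, 24), nn.Sigmoid()
        )
        self.lstm = nn.LSTM(24, 32, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                       # x: (batch, 1, electrodes, time)
        h = torch.cat([t(x) for t in self.temporal], dim=1)  # (B, 24, E, T)
        h = self.spatial(h)                     # (B, 24, 1, T)
        w = self.se(h).view(-1, 24, 1, 1)       # channel attention weights
        h = (h * w).squeeze(2).transpose(1, 2)  # (B, T, 24)
        out, _ = self.lstm(h)                   # bidirectional temporal modeling
        return self.fc(out[:, -1])              # class logits

eeg = torch.randn(4, 1, 32, 500)  # 4 trials, 32 electrodes, 500 time samples
print(CAMNetSketch()(eeg).shape)  # torch.Size([4, 3])
```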
Humans
;
Electroencephalography/methods*
;
Brain-Computer Interfaces
;
Neural Networks, Computer
;
Speech/physiology*
;
Algorithms
;
Male
;
Adult
;
Imagination
4.Research on bimodal emotion recognition algorithm based on multi-branch bidirectional multi-scale time perception.
Peiyun XUE ; Sibin WANG ; Jing BAI ; Yan QIANG
Journal of Biomedical Engineering 2025;42(3):528-536
Emotion reflects the psychological and physiological health of human beings, and human emotion is expressed mainly through voice and facial expression. Extracting and effectively integrating the emotional information carried by these two modalities is one of the main challenges in emotion recognition. This paper proposes a multi-branch bidirectional multi-scale time perception model that processes speech Mel-frequency cepstral coefficients in both the forward and reverse time directions. The model uses causal convolutions to capture temporal correlations among features at different scales and assigns attention maps according to that information, producing a multi-scale fusion of speech emotion features. The paper also proposes a dynamic bimodal feature fusion algorithm that draws on the design of AlexNet, using overlapping max pooling layers to obtain richer fused features from the spliced feature matrices of the two modalities. Experimental results show that the proposed bimodal emotion recognition model reaches accuracies of 97.67% and 90.14% on two public audio-visual emotion datasets, respectively, outperforming other common methods and indicating that it effectively captures emotional feature information and improves recognition accuracy.
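The two ideas are straightforward to prototype. The PyTorch sketch below runs multi-scale causal convolutions over an MFCC sequence in both forward and reversed time and gates the pooled features with a learned attention weighting, then fuses the speech vector with a stand-in facial-expression vector via AlexNet-style overlapping max pooling. Feature dimensions, dilation rates, and pooling parameters are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def causal_conv(x, conv, dilation):
    """Left-pad so each output frame sees only current and past frames."""
    pad = (conv.kernel_size[0] - 1) * dilation
    return conv(F.pad(x, (pad, 0)))

class BiMultiScale(nn.Module):
    def __init__(self, n_mfcc=40):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(n_mfcc, 16, 3, dilation=d) for d in (1, 2, 4)
        )
        self.attn = nn.Linear(96, 96)   # attention over the fused multi-scale features

    def forward(self, x):               # x: (batch, n_mfcc, frames)
        feats = []
        for d, conv in zip((1, 2, 4), self.convs):
            feats.append(causal_conv(x, conv, d))                     # forward time
            feats.append(causal_conv(x.flip(-1), conv, d).flip(-1))   # reversed time
        h = torch.cat(feats, dim=1).mean(dim=-1)                      # (batch, 96)
        return h * torch.sigmoid(self.attn(h))                        # attention gating

audio = BiMultiScale()(torch.randn(2, 40, 300))  # speech branch on toy MFCCs
visual = torch.randn(2, 96)     # stand-in for a facial-expression embedding

# Fusion: splice the two modal vectors into a matrix and apply overlapping
# max pooling (kernel 3, stride 2, as in AlexNet) to obtain the fused feature.
mosaic = torch.stack([audio, visual], dim=1)     # (batch, 2, 96)
fused = F.max_pool1d(mosaic, kernel_size=3, stride=2)
print(fused.shape)                               # torch.Size([2, 2, 47])
```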
Humans
;
Emotions
;
Algorithms
;
Facial Expression
;
Time Perception
;
Neural Networks, Computer
;
Speech
5.Effects of speech duration and voice volume on the respiratory aerosol particle concentration.
Tomoki TAKANO ; Yiming XIANG ; Masayuki OGATA ; Yoshihide YAMAMOTO ; Satoshi HORI ; Shin-Ichi TANABE
Environmental Health and Preventive Medicine 2025;30():14-14
BACKGROUND:
SARS-CoV-2 (COVID-19) is transmitted via infectious respiratory particles, which are released when an infected person breathes, coughs, or speaks. Several studies have measured respiratory particle concentrations by focusing on activities such as breathing, coughing, and short speech. However, few studies have investigated the effect of speech duration.
METHODS:
This study aimed to clarify the effects of speech duration and volume on the respiratory particle concentration. Study participants were requested to speak at three voice volumes across five speech durations, generating 15 speech patterns. Participants spoke inside a clean booth where particle concentrations and voice volumes were measured and analyzed during speech.
RESULTS:
Our findings suggest that as speech duration increased, the aerosol number concentration also increased. Focusing on individual differences, we found evidence of possible super-emitters who release more aerosol particles than the average person. Two participants were identified as statistical outliers (aerosol number concentration, n = 1; mass concentration, n = 1).
CONCLUSIONS:
Considering speech duration may improve our understanding of respiratory particle concentration dynamics. Two participants were identified as potential super-emitters.
Humans
;
Male
;
Speech/physiology*
;
Adult
;
Female
;
COVID-19/transmission*
;
Respiratory Aerosols and Droplets
;
Voice
;
SARS-CoV-2
;
Time Factors
;
Young Adult
;
Aerosols/analysis*
6.Perception of Mandarin aspirated/unaspirated consonants in children with cochlear implants.
Yani LI ; Qun LI ; Jian WEN ; Lin LI ; Yun ZHENG
Journal of Clinical Otorhinolaryngology Head and Neck Surgery 2025;39(4):312-318
Objective:This study aims to investigate the perception of Mandarin aspirated and unaspirated consonants by children with cochlear implants (CIs) under quiet and noisy conditions. It also examines factors that may affect their acquisition, such as auditory conditions, place of articulation, manner of articulation, chronological age, age at implantation, and non-verbal intelligence. Methods:Twenty-eight CI children aged 3 to 5 years who received implantation from 2018 to 2023 were recruited. Additionally, 88 peers with normal hearing (NH) were recruited as controls. Both groups participated in a perception test for aspirated/unaspirated consonants under quiet and noisy conditions, along with tests for speech recognition, speech production, and non-verbal intelligence. The study analyzed the effects of group (CI vs. NH), auditory condition, and consonant characteristics on children's perception of aspirated/unaspirated consonants in Mandarin, as well as the factors contributing to CI children's acquisition of these consonants. Results:①CI children's ability to perceive aspirated/unaspirated consonants was significantly poorer than that of their NH peers (χ²=14.16, P<0.01), and their perception accuracy was influenced by the acoustic features of the consonants (P<0.01); ②CI children's consonant perception was adversely affected by noise (P<0.01), with accuracy in noisy conditions particularly influenced by the manner of articulation (P<0.05); ③Age at implantation significantly affected CI children's ability to perceive aspirated/unaspirated consonants (β=-0.223, P=0.012), with earlier implantation associated with better performance. Conclusion:CI children need time to acquire Mandarin aspirated/unaspirated consonants, and early implantation confers clear advantages, especially for the perception of fine speech features.
Humans
;
Cochlear Implants
;
Child, Preschool
;
Speech Perception
;
Cochlear Implantation
;
Male
;
Female
;
Language
7.Analysis of cochlear reimplantation surgery and factors influencing postoperative auditory and speech function.
Qingling BI ; Zhongyan CHEN ; Yong LYU ; Wenjing YANG ; Xiaoyu XU ; Yan LI ; Yuan LI
Journal of Clinical Otorhinolaryngology Head and Neck Surgery 2025;39(5):419-424
Objective:The aim of this study was to present an institution's experience with cochlear reimplantation(CRI), to assess surgical challenges and postoperative outcomes, and to increase the success rate of CRI. Methods:We retrospectively evaluated data from 76 reimplantation cases treated in a tertiary center between 2001 and 2022. Clinical features, including causes of CRI, type of failure, surgical issues, and auditory-speech performance, were analyzed. Categorical Auditory Performance (CAP) and Speech Intelligibility Rating (SIR) scores were used to evaluate pre- and post-CRI outcomes. Results:Seven patients came from our center's consecutive cohort of 1 126 cochlear implant recipients, while 69 patients were referred from other cochlear implant centers. Device failure was the most common cause of CRI(68/76), with the remaining cases comprising flap complications(3/76), magnet displacement(3/76), secondary meningitis(1/76), and foreign bodies around the implant(1/76). Postoperative auditory and speech outcomes improved in 31.6%(24/76) of patients, remained unchanged in 63.2%(48/76), and declined, with lower CAP and SIR scores, in 5.2%(4/76). Postoperatively, the seven patients with cochlear ossification and fibrosis scored lower on overall CAP and SIR than patients without ossification, indicating that ossification is a significant factor in surgical success rates and auditory-speech outcomes. Conclusion:CRI surgery is a challenging but relatively safe procedure, and most reimplanted patients experience favorable postoperative outcomes. Medical complications and intracochlear damage are the main causes of poor postoperative results. Therefore, minimally invasive cochlear implantation can reduce the difficulty of CRI surgery and improve implant performance.
Humans
;
Cochlear Implantation/methods*
;
Retrospective Studies
;
Cochlear Implants
;
Male
;
Female
;
Postoperative Period
;
Treatment Outcome
;
Adult
;
Speech
;
Middle Aged
;
Postoperative Complications
;
Replantation
;
Cochlea/surgery*
8.Comparison and study of multiple scales results in children with cochlear reimplantation, mainly the speech, spatial, and other qualities of hearing scale for parents.
Tian NI ; Jinyuan SI ; Haotian LIU ; Xinyi YAO ; Xiangling ZHANG ; Huilin YIN ; Lin ZHANG ; Xiuyong DING ; Yu ZHAO
Journal of Clinical Otorhinolaryngology Head and Neck Surgery 2025;39(5):433-442
Objective:To compare the outcomes of multiple scales, primarily the speech, spatial, and other qualities of hearing scale for parents(SSQ-P), in children with ipsilateral vs. contralateral cochlear reimplantation(CRI). Methods:A total of 69 children who received cochlear reimplantation surgery from April 1999 to June 2024 were included. Patients were divided into two groups based on whether the reimplantation was performed on the same side as the initial implantation. General information such as gender, age, and age at initial implantation and reimplantation was collected. The primary caregivers of the children were followed up by telephone using the categories of auditory performance(CAP), speech intelligibility rating(SIR), and SSQ-P questionnaires. Statistical methods including stepwise regression, linear regression, and permutation tests were employed to investigate whether there were statistically significant differences between the ipsilateral and contralateral reimplantation groups in the scores of CAP, SIR, SSQ-P total, SSQ-P speech perception, SSQ-P spatial hearing, and SSQ-P auditory quality dimensions. Results:Of the 69 children included, 62 were in the ipsilateral reimplantation group with a mean age of 11.1 years, and 7 were in the contralateral reimplantation group with a mean age of 11.7 years. Statistical analysis showed that patients in the contralateral reimplantation group had significantly lower SSQ-P total scores (P<0.05) and spatial hearing dimension scores (P<0.05) than those in the ipsilateral reimplantation group after controlling for the corresponding confounders. Conclusion:The effect of ipsilateral reimplantation of cochlear implants is superior to that of contralateral reimplantation in terms of overall auditory function and spatial hearing in daily life for children, but the mechanisms require further investigation.
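Given the strong group-size imbalance (62 vs. 7), the permutation test mentioned in the methods is a natural choice for comparing group means. Below is a minimal NumPy sketch of a two-sided permutation test on a difference in mean scores; the score values are simulated for illustration and are not study data.

```python
import numpy as np

rng = np.random.default_rng(0)
ipsi = rng.normal(7.5, 1.0, 62)    # simulated SSQ-P totals, ipsilateral group
contra = rng.normal(6.5, 1.0, 7)   # simulated SSQ-P totals, contralateral group

observed = ipsi.mean() - contra.mean()
pooled = np.concatenate([ipsi, contra])

# Repeatedly shuffle group labels and recompute the mean difference.
n_perm, count = 10_000, 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)
    diff = perm[:62].mean() - perm[62:].mean()
    if abs(diff) >= abs(observed):  # two-sided test
        count += 1

print(f"observed diff = {observed:.2f}, p = {(count + 1) / (n_perm + 1):.4f}")
```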
Humans
;
Cochlear Implantation
;
Child
;
Parents
;
Speech Perception
;
Male
;
Cochlear Implants
;
Female
;
Hearing
;
Surveys and Questionnaires
;
Speech
;
Child, Preschool
9.Analyzing the factors influencing speech recognition ability in patients with age-related hearing loss.
Journal of Clinical Otorhinolaryngology Head and Neck Surgery 2025;39(7):657-666
Objective:To explore various factors influencing speech recognition ability in patients with age-related hearing loss(ARHL) and to investigate the correlation between speech recognition ability and cognitive function. Methods:This case-control study enrolled 150 ARHL patients(experimental group) and 132 normal-hearing controls. Participants underwent assessments of auditory function, cognitive function, and tinnitus severity, and the results were evaluated with a range of statistical analyses. Results:①The maximum speech recognition score (PBmax) and Montreal Cognitive Assessment (MoCA) score were significantly lower in the ARHL group than in the control group(P<0.05). ②PBmax in the ARHL group was significantly influenced by multiple factors(P<0.05). ③Negative correlations were observed between PBmax in the ARHL group and age, degree of hearing loss, duration of the disease, duration of the worst hearing loss, smoking status, and tinnitus severity(P<0.05), while positive correlations were found between PBmax and education level, occupation type, frequency of verbal communication, and cognitive function level(P<0.05). ④Higher education level, frequent verbal communication, and high cognitive function were protective factors for PBmax in ARHL patients(P<0.05), whereas the other factors were independent risk factors(P<0.05). ⑤A significant correlation was found between PBmax and MoCA scores in the ARHL group, and this correlation between cognitive function and speech recognition ability remained significant across different degrees of hearing loss(P<0.05). Conclusion:Speech recognition ability in ARHL patients is influenced by multiple factors. Cognitive function demonstrates a robust, bidirectional association with speech recognition ability, even after adjusting for hearing loss severity.
Humans
;
Case-Control Studies
;
Middle Aged
;
Male
;
Female
;
Aged
;
Speech Perception
;
Cognition
;
Presbycusis/physiopathology*
;
Adult
;
Hearing Loss
10.Tagalog sentence repetition test: Content validation and pilot testing with Metro Manila speakers aged 7-21
Hannah Maria D. Albert ; Ellyn Cassey K. Chua
Philippine Journal of Health Research and Development 2024;28(1):18-24
Background:
Speech sound disorders (SSD) refer to difficulties in perceiving, mentally representing, and/or articulating speech sounds. In 2018, the Tagalog Sentence Repetition Test (SRT) was developed due to the lack of a commercially available local assessment tool for children with suspected SSDs. The SRT had not been validated or piloted yet.
Objectives:
This study aimed to determine the SRT’s content validity (comprehensiveness, relevance, comprehensibility), ability to successfully elicit the target sounds, and logistical feasibility and flaws.
Methodology:
All procedures were conducted online. Three linguists evaluated the comprehensiveness of the sounds covered, while 31 Manila Tagalog-speaking children (7 to 21 years old) participated in pilot testing. Post-testing, the children answered a questionnaire to evaluate their familiarity with the sentences' words (relevance) and the comprehensibility of the test instructions. Content validity was assessed by computing the Content Validity Index (CVI). To assess how well the test elicits the target sounds, the number of participants who produced each sound was computed.
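For readers unfamiliar with the CVI: under the common convention, each expert rates each item on a 4-point relevance scale, ratings of 3 or 4 count as relevant, and the item-level CVI is the proportion of experts rating the item relevant. A minimal Python sketch with made-up ratings from three raters (matching the study's three linguists, but not their actual ratings):

```python
ratings = {                 # hypothetical expert ratings per test sentence
    "sentence_1": [4, 4, 3],
    "sentence_2": [4, 3, 4],
    "sentence_3": [3, 4, 4],
}

for item, scores in ratings.items():
    i_cvi = sum(s >= 3 for s in scores) / len(scores)  # fraction rating "relevant"
    print(item, i_cvi)      # 1.0 for every item here, i.e. a scale-level CVI of 1.0
```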
Results:
A CVI of 1.0 was obtained for all aspects of content validity. All targets were produced by almost all the participants, except for the final glottal stop (18/31, 58%). The test administration seemed feasible as participants from all age groups successfully executed the task.
Conclusion:
Although the SRT exhibited good content validity, some sentences need to be revised to address the sound production issues noted during the pilot. The new version should then be re-piloted with 7- to 11-year-olds, both in person and via teleconferencing. A manual should also be created to facilitate administration.
Speech Disorders
;
Speech Production Measurement

