1.Neural network for auditory speech enhancement featuring feedback-driven attention and lateral inhibition.
Yudong CAI ; Xue LIU ; Xiang LIAO ; Yi ZHOU
Journal of Biomedical Engineering 2025;42(1):82-89
The processing mechanisms of the human brain for speech information are a significant source of inspiration for speech enhancement research. Attention and lateral inhibition are key mechanisms in auditory information processing that can selectively enhance specific information. Building on this, the study introduces a dual-branch U-Net that integrates lateral inhibition and feedback-driven attention mechanisms. Noisy speech signals were fed into the first U-Net branch, and time-frequency units with high confidence were selectively fed back. The resulting activation-layer gradients, combined with the lateral inhibition mechanism, were used to compute attention maps, which were then concatenated into the second U-Net branch, directing the network's focus and achieving selective enhancement of the speech signal. Enhancement quality was evaluated with five metrics, including the perceptual evaluation of speech quality (PESQ), and the method was benchmarked against five others: Wiener filtering, SEGAN, PHASEN, Demucs and GRN. The experimental results demonstrated that the proposed method improved speech enhancement in various noise scenarios by 18% to 21% over the baseline network across multiple performance metrics, with a particularly notable advantage over the other methods under low signal-to-noise ratio conditions. Speech enhancement based on lateral inhibition and feedback-driven attention therefore holds significant potential for auditory speech enhancement and is applicable to clinical practice involving cochlear implants and hearing aids.
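The attention computation described above resembles a Grad-CAM-style feedback loop (our reading, not the paper's terminology). Below is a minimal PyTorch sketch of one plausible interpretation, not the authors' exact design: the first branch's high-confidence time-frequency units are backpropagated to an activation layer, and a difference-of-Gaussians kernel stands in for lateral inhibition. All function names, the confidence threshold, and the DoG operator are assumptions.

```python
import torch
import torch.nn.functional as F

def dog_kernel(size=7, sigma_e=1.0, sigma_i=2.0):
    """Difference-of-Gaussians kernel as a stand-in for lateral inhibition
    (center excitation minus surround suppression); the paper's exact
    inhibition operator is not specified in the abstract."""
    ax = torch.arange(size) - size // 2
    yy, xx = torch.meshgrid(ax, ax, indexing="ij")
    d2 = (xx**2 + yy**2).float()
    g = lambda s: torch.exp(-d2 / (2 * s**2)) / (2 * torch.pi * s**2)
    k = g(sigma_e) - g(sigma_i)
    return (k / k.abs().sum()).view(1, 1, size, size)

def feedback_attention(branch1, act_layer, noisy_spec, conf=0.9):
    """Grad-CAM-style feedback attention (assumed design). `branch1` maps a
    [B,1,F,T] spectrogram to a [B,1,F,T] mask; `act_layer` is one of its
    internal modules whose activation gradients drive the attention map."""
    feats = {}
    h = act_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    mask = branch1(noisy_spec)
    h.remove()
    score = (mask * (mask > conf)).sum()        # feed back confident T-F units
    g, = torch.autograd.grad(score, feats["a"]) # activation-layer gradients
    w = g.mean(dim=(2, 3), keepdim=True)        # per-channel weights
    cam = F.relu((w * feats["a"]).sum(1, keepdim=True))
    cam = F.interpolate(cam, noisy_spec.shape[-2:], mode="bilinear")
    cam = F.relu(F.conv2d(cam, dog_kernel().to(cam), padding=3))
    return cam / (cam.amax(dim=(2, 3), keepdim=True) + 1e-8)
```

The second branch would then receive `torch.cat([noisy_spec, cam], dim=1)` as input, matching the concatenation step the abstract describes.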
Humans
;
Attention/physiology*
;
Speech Perception/physiology*
;
Neural Networks, Computer
;
Speech
;
Noise
;
Feedback
2.Study on speech imagery electroencephalography decoding of Chinese words based on the CAM-Net model.
Xiaolong LIU ; Banghua YANG ; An'an GAN ; Jie ZHANG
Journal of Biomedical Engineering 2025;42(3):473-479
Speech imagery is an emerging brain-computer interface (BCI) paradigm with the potential to provide effective communication for individuals with speech impairments. This study designed a Chinese speech imagery paradigm using three clinically relevant words ("Help me", "Sit up" and "Turn over") and collected electroencephalography (EEG) data from 15 healthy subjects. Based on these data, a Channel Attention Multi-Scale Convolutional Neural Network (CAM-Net) decoding algorithm was proposed, which combined multi-scale temporal convolutions with asymmetric spatial convolutions to extract multidimensional EEG features, and incorporated a channel attention mechanism along with a bidirectional long short-term memory network to perform channel weighting and capture temporal dependencies. Experimental results showed that CAM-Net achieved a classification accuracy of 48.54% in the three-class task, outperforming baseline models such as EEGNet and Deep ConvNet, and reached a best accuracy of 64.17% in the binary classification between "Sit up" and "Turn over". This work provides a promising approach for future Chinese speech imagery BCI research and applications.
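For concreteness, the following PyTorch sketch assembles the components the abstract names: multi-scale temporal convolutions, an asymmetric spatial convolution across EEG channels, squeeze-and-excitation channel attention, and a BiLSTM. Layer sizes, kernel lengths, and pooling are assumptions; the paper's actual CAM-Net configuration is not reproduced here.

```python
import torch
import torch.nn as nn

class CAMNetSketch(nn.Module):
    """Minimal sketch in the spirit of CAM-Net (hyperparameters assumed).
    Input: raw EEG [batch, 1, n_ch, n_samples]; output: 3 class logits."""
    def __init__(self, n_ch=32, n_classes=3, f=8):
        super().__init__()
        # Multi-scale temporal convolutions (parallel kernel lengths).
        self.temporal = nn.ModuleList([
            nn.Conv2d(1, f, (1, k), padding=(0, k // 2)) for k in (15, 31, 63)
        ])
        # Asymmetric spatial convolution spanning all EEG channels at once.
        self.spatial = nn.Conv2d(3 * f, 2 * f, (n_ch, 1))
        # Squeeze-and-excitation channel attention.
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(2 * f, f), nn.ReLU(),
            nn.Linear(f, 2 * f), nn.Sigmoid(),
        )
        self.pool = nn.AvgPool2d((1, 8))
        self.lstm = nn.LSTM(2 * f, f, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * f, n_classes)

    def forward(self, x):                      # x: [B, 1, n_ch, T]
        x = torch.cat([conv(x) for conv in self.temporal], dim=1)
        x = torch.relu(self.spatial(x))        # [B, 2f, 1, T]
        w = self.se(x).view(x.size(0), -1, 1, 1)
        x = self.pool(x * w).squeeze(2)        # channel-weighted, downsampled
        x, _ = self.lstm(x.transpose(1, 2))    # temporal dependencies
        return self.head(x[:, -1])             # logits for the three words
```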
Humans
;
Electroencephalography/methods*
;
Brain-Computer Interfaces
;
Neural Networks, Computer
;
Speech/physiology*
;
Algorithms
;
Male
;
Adult
;
Imagination
3.Effects of speech duration and voice volume on the respiratory aerosol particle concentration.
Tomoki TAKANO ; Yiming XIANG ; Masayuki OGATA ; Yoshihide YAMAMOTO ; Satoshi HORI ; Shin-Ichi TANABE
Environmental Health and Preventive Medicine 2025;30():14-14
BACKGROUND:
SARS-CoV-2 (COVID-19) is transmitted via infectious respiratory particles, which are released when an infected person breathes, coughs, or speaks. Several studies have measured respiratory particle concentrations, focusing on activities such as breathing, coughing, and short speech. However, few studies have investigated the effect of speech duration.
METHODS:
This study aimed to clarify the effects of speech duration and volume on the respiratory particle concentration. Study participants were requested to speak at three voice volumes across five speech durations, generating 15 speech patterns. Participants spoke inside a clean booth where particle concentrations and voice volumes were measured and analyzed during speech.
RESULTS:
Our findings suggest that as speech duration increased, the aerosol number concentration also increased. Examining individual differences, we found evidence of possible super-emitters who release more aerosol particles than the average person. Two participants were identified as statistical outliers (aerosol number concentration, n = 1; mass concentration, n = 1).
CONCLUSIONS:
Considering speech duration may improve our understanding of respiratory particle concentration dynamics. Two participants were identified as potential super-emitters.
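The abstract does not name the statistical test used to flag the super-emitters. As an illustration only, the sketch below applies Tukey's 1.5 × IQR fence, one standard outlier rule, to hypothetical per-participant concentrations; both the rule and the numbers are assumptions.

```python
import numpy as np

def tukey_outliers(values):
    """Flag indices outside Tukey's 1.5 x IQR fences; one common choice
    of outlier rule, not necessarily the one used in the study."""
    q1, q3 = np.percentile(values, [25, 75])
    lo, hi = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    return [i for i, v in enumerate(values) if v < lo or v > hi]

# Hypothetical per-participant mean aerosol number concentrations (#/cm^3):
conc = [1.2, 0.9, 1.1, 1.4, 1.0, 0.8, 1.3, 6.7]  # last value: a super-emitter
print(tukey_outliers(conc))                      # -> [7]
```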
Humans
;
Male
;
Speech/physiology*
;
Adult
;
Female
;
COVID-19/transmission*
;
Respiratory Aerosols and Droplets
;
Voice
;
SARS-CoV-2
;
Time Factors
;
Young Adult
;
Aerosols/analysis*
4.Analysis of the influence of vowel and sound intensity on voice acoustic formant detection.
Bing XIE ; Zhe LI ; Hongxing WANG ; Xuyuan KUANG ; Wei NI ; Runqi ZHONG ; Yan LI
Journal of Clinical Otorhinolaryngology Head and Neck Surgery 2024;38(12):1149-1153
Objective: To explore the influence of vowels and sound intensity on formants, and thereby provide a reference for selecting sound samples and vocal methods in acoustic detection. Methods: Thirty-eight healthy subjects, 19 male and 19 female, aged 19-24 years, were recruited. The formants of different vowels (/a/, //, /i/ and /u/) and different sound intensities (lowest voice, comfortable voice, highest true voice and highest falsetto) were analyzed, and pairwise comparisons were made between groups with significant differences. Results: ① For the first formant, the vowels /a/ and // were larger than /i/ and /u/; for the second formant, /i/ was the largest. The minimum of the first formant occurred at the lowest voice of /i/ and the maximum at the highest voice of /a/. ② In the first formant, values in the chest-voice region increased with sound intensity, while the second formant decreased significantly on entering the highest falsetto. Conclusion: Different vowels and sound intensities yield different formant distributions; that is, vowel and sound intensity influence the formants to different degrees. The extreme values of the first formant can be used to establish a preliminary maximum normal range, which helps improve acoustic detection.
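Formant detection of the kind evaluated here is classically done by linear-predictive-coding (LPC) root-finding. The sketch below shows that textbook method for a single voiced frame; it illustrates how the first and second formants (F1, F2) discussed in the study are typically measured and is not the authors' exact pipeline (pre-emphasis coefficient, window, and LPC order are conventional assumptions).

```python
import numpy as np
from scipy.signal import lfilter

def lpc_formants(frame, sr, order=None):
    """Estimate formant frequencies (Hz) from one voiced frame via LPC
    root-finding: pre-emphasize, window, fit an all-pole model by the
    autocorrelation method, then convert pole angles to frequencies."""
    frame = lfilter([1.0, -0.97], [1.0], frame)       # pre-emphasis
    frame = frame * np.hamming(len(frame))
    order = order or int(sr / 1000) + 2               # rule-of-thumb order
    r = np.correlate(frame, frame, "full")[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])            # LPC coefficients
    roots = np.roots(np.concatenate(([1.0], -a)))     # poles of A(z)
    roots = roots[np.imag(roots) > 0]                 # keep upper half-plane
    freqs = np.sort(np.angle(roots) * sr / (2 * np.pi))
    return [f for f in freqs if 90 < f < sr / 2 - 50] # plausible formants
```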
Humans
;
Male
;
Female
;
Young Adult
;
Speech Acoustics
;
Voice Quality
;
Phonetics
;
Voice/physiology*
;
Adult
5.Compensation or Preservation? Different Roles of Functional Lateralization in Speech Perception of Older Non-musicians and Musicians.
Xinhu JIN ; Lei ZHANG ; Guowei WU ; Xiuyi WANG ; Yi DU
Neuroscience Bulletin 2024;40(12):1843-1857
Musical training can counteract age-related decline in speech perception in noisy environments. However, it remains unclear whether older non-musicians and musicians rely on functional compensation or functional preservation to counteract the adverse effects of aging. This study utilized resting-state functional connectivity (FC) to investigate functional lateralization, a fundamental organizational feature, in older musicians (OM), older non-musicians (ONM), and young non-musicians (YNM). Results showed that OM outperformed ONM and achieved performance comparable to YNM in speech-in-noise and speech-in-speech tasks. ONM exhibited reduced lateralization relative to YNM in the lateralization index (LI) of intrahemispheric FC (LI_intra) in the cingulo-opercular network (CON) and the LI of interhemispheric heterotopic FC (LI_he) in the language network (LAN). Conversely, OM showed higher neural alignment to YNM (i.e., a more similar lateralization pattern) than ONM in CON, LAN, the frontoparietal network (FPN), the dorsal attention network (DAN), and the default mode network (DMN), indicating preservation of youth-like lateralization patterns due to musical experience. Furthermore, in ONM, stronger left-lateralized and lower alignment-to-young LI_intra in the somatomotor network (SMN) and DAN and LI_he in DMN correlated with better speech performance, indicating a functional compensation mechanism. In contrast, stronger right-lateralized LI_intra in FPN and DAN and higher alignment-to-young LI_he in LAN correlated with better performance in OM, suggesting a functional preservation mechanism. These findings highlight the differential roles of functional preservation and compensation of lateralization in speech perception in noise among elderly individuals with and without musical expertise, offering insights into theories of successful aging through the lens of functional lateralization and speech perception.
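Lateralization indices of this kind conventionally take the form LI = (L - R) / (L + R), where L and R summarize homologous left- and right-hemisphere measures. A minimal sketch follows; using mean FC strength as the summary is an assumption, since the abstract does not give the exact LI_intra/LI_he computations.

```python
import numpy as np

def lateralization_index(left, right):
    """Conventional lateralization index LI = (L - R) / (L + R).
    L and R are scalar summaries (assumed here: mean FC strength) of
    homologous left/right connectivity; LI = +1 means fully
    left-lateralized, LI = -1 fully right-lateralized."""
    L, R = float(np.mean(left)), float(np.mean(right))
    return (L - R) / (L + R)

# Hypothetical within-hemisphere FC strengths for one network:
print(lateralization_index([0.42, 0.38, 0.45], [0.30, 0.28, 0.33]))  # ~0.16
```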
Humans
;
Speech Perception/physiology*
;
Music
;
Male
;
Functional Laterality/physiology*
;
Female
;
Aged
;
Adult
;
Young Adult
;
Aging/physiology*
;
Middle Aged
;
Magnetic Resonance Imaging
;
Brain/physiology*
6.The function of auditory cortex in the elderly using functional near-infrared spectroscopy technology.
Liu YANG ; You Nuo CHEN ; Song Jian WANG ; Yuan WANG ; Ting CHEN ; Ying LIANG ; Shuo WANG
Chinese Journal of Otorhinolaryngology Head and Neck Surgery 2022;57(4):458-466
Objective: Functional near-infrared spectroscopy (fNIRS) was used to study the effect of aging on the neuroimaging characteristics of the cerebral cortex during speech perception. Methods: Thirty-four adults with normal hearing were recruited from March 2021 to June 2021: 17 in the young group (6 males, 11 females; age (32.1±5.0) years, range 20-39 years) and 17 in the elderly group (6 males, 11 females; age (63.2±2.8) years, range 60-70 years). The test material was the sentence lists of the Mandarin Hearing in Noise Test (MHINT). A task-state block design was adopted, with the temporal lobe, Broca's area, Wernicke's area and the motor cortex as regions of interest. Objective brain imaging (fNIRS) combined with subjective psychophysical testing was used to analyze the cortical areas activated during auditory speech perception, and their degree of activation, in elderly and young people under different listening conditions (quiet; signal-to-noise ratios of 10 dB, 5 dB, 0 dB and -5 dB). SPSS 23 software was used for statistical analysis. Results: The activation area and degree of activation in the elderly group were lower than those in the young group under every task condition. The young group had significantly more activated channels than the elderly group, and more channels were activated in the left hemisphere than in the right, although this hemispheric difference was not statistically significant; more channels were affected by age in the left hemisphere than in the right. The activation degree of the young group at a signal-to-noise ratio of 0 dB was significantly higher than under the other conditions (P<0.05), whereas the elderly group showed no significant difference across the five conditions (P>0.05). The speech recognition score of the young group was higher than that of the elderly group under all conditions. In quiet and at a signal-to-noise ratio of 10 dB, the scores of both groups were at or near 100%. As the signal-to-noise ratio decreased, a significant between-group difference appeared at 5 dB; the speech recognition accuracy of the young group dropped markedly at 0 dB, while that of the elderly group dropped markedly at 5 dB. Conclusions: With increasing age, speech perception in noisy environments and the underlying cortical activity gradually deteriorate, and the speech-dominant (left) hemisphere is particularly affected by aging. The overall activation area and activation degree of the elderly under the various speech tasks are lower than those of the young.
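As a rough illustration of how a single fNIRS channel's activation can be quantified in a block design like the one above, the sketch below block-averages oxygenated-hemoglobin (HbO) changes against a pre-block baseline; the window lengths and the activation criterion are assumptions, not the study's analysis.

```python
import numpy as np

def block_activation(hbo, sr, onsets, task_s=20.0, base_s=5.0):
    """Block-averaged activation for one fNIRS channel: mean HbO during
    each task block minus the mean of the preceding baseline window.
    `onsets` are block start times in seconds; window lengths assumed."""
    deltas = []
    for t in onsets:
        i = int(t * sr)
        base = hbo[max(0, i - int(base_s * sr)):i].mean()
        deltas.append(hbo[i:i + int(task_s * sr)].mean() - base)
    return float(np.mean(deltas))   # > 0 suggests task-related activation
```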
Acoustic Stimulation/methods*
;
Adolescent
;
Adult
;
Aged
;
Auditory Cortex/physiology*
;
Female
;
Humans
;
Male
;
Middle Aged
;
Spectroscopy, Near-Infrared
;
Speech Perception/physiology*
;
Technology
;
Young Adult
7.The neural encoding of continuous speech - recent advances in EEG and MEG studies.
Xun-Yi PAN ; Jia-Jie ZOU ; Pei-Qing JIN ; Nai DING
Acta Physiologica Sinica 2019;71(6):935-945
Speech comprehension is a central cognitive function of the human brain. In cognitive neuroscience, a fundamental question is how neural activity encodes the acoustic properties of a continuous speech stream while resolving multiple levels of linguistic structure at the same time. This paper reviews recently developed research paradigms that employ electroencephalography (EEG) or magnetoencephalography (MEG) to capture neural tracking of the acoustic features or linguistic structures of continuous speech. The review focuses on two questions in speech processing: (1) the encoding of the continuously changing acoustic properties of speech; (2) the representation of hierarchical linguistic units, including syllables, words, phrases and sentences. Studies have found that low-frequency cortical activity tracks the speech envelope, and that cortical activity on different time scales tracks multiple levels of linguistic units, constituting a representation of hierarchically organized linguistic structure. Together, the reviewed studies provide new insights into how the human brain processes continuous speech.
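Envelope tracking, the first phenomenon the review describes, is commonly quantified by relating low-frequency neural activity to the speech amplitude envelope. The sketch below computes a zero-lag correlation as the simplest such measure; published studies typically use temporal response functions or cross-correlation over lags, so this is an illustrative simplification (band limits and filter order are assumptions).

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def envelope_tracking(speech, eeg, sr_speech, sr_eeg, band=(1.0, 8.0)):
    """Correlate low-frequency EEG with the speech amplitude envelope:
    Hilbert envelope of the speech, resampled to the EEG rate, against
    band-passed EEG. Returns a zero-lag Pearson r."""
    env = np.abs(hilbert(speech))                    # amplitude envelope
    env = np.interp(np.arange(len(eeg)) / sr_eeg,    # resample to EEG rate
                    np.arange(len(speech)) / sr_speech, env)
    nyq = sr_eeg / 2.0
    b, a = butter(4, [band[0] / nyq, band[1] / nyq], "bandpass")
    eeg_lf = filtfilt(b, a, eeg)                     # 1-8 Hz cortical band
    return np.corrcoef(env, eeg_lf)[0, 1]
```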
Acoustic Stimulation
;
Electroencephalography
;
Humans
;
Magnetoencephalography
;
Speech
;
physiology
;
Speech Perception
8.Facial Expression Enhances Emotion Perception Compared to Vocal Prosody: Behavioral and fMRI Studies.
Heming ZHANG ; Xuhai CHEN ; Shengdong CHEN ; Yansong LI ; Changming CHEN ; Quanshan LONG ; Jiajin YUAN
Neuroscience Bulletin 2018;34(5):801-815
Facial and vocal expressions are essential modalities mediating the perception of emotion and social communication. Nonetheless, currently little is known about how emotion perception and its neural substrates differ across facial expression and vocal prosody. To clarify this issue, functional MRI scans were acquired in Study 1, in which participants were asked to discriminate the valence of emotional expression (angry, happy or neutral) from facial, vocal, or bimodal stimuli. In Study 2, we used an affective priming task (unimodal materials as primers and bimodal materials as target) and participants were asked to rate the intensity, valence, and arousal of the targets. Study 1 showed higher accuracy and shorter response latencies in the facial than in the vocal modality for a happy expression. Whole-brain analysis showed enhanced activation during facial compared to vocal emotions in the inferior temporal-occipital regions. Region of interest analysis showed a higher percentage signal change for facial than for vocal anger in the superior temporal sulcus. Study 2 showed that facial relative to vocal priming of anger had a greater influence on perceived emotion for bimodal targets, irrespective of the target valence. These findings suggest that facial expression is associated with enhanced emotion perception compared to equivalent vocal prosodies.
Adult
;
Brain Mapping
;
methods
;
Cerebral Cortex
;
diagnostic imaging
;
physiology
;
Emotions
;
physiology
;
Facial Expression
;
Facial Recognition
;
physiology
;
Female
;
Humans
;
Magnetic Resonance Imaging
;
Psychomotor Performance
;
physiology
;
Social Perception
;
Speech Perception
;
physiology
;
Young Adult
9.Change of Swallowing in Patients With Head and Neck Cancer After Concurrent Chemoradiotherapy.
Sehi KWEON ; Bon Seok KOO ; Sungju JEE
Annals of Rehabilitation Medicine 2016;40(6):1100-1107
OBJECTIVE: To evaluate the functional characteristics of swallowing and to analyze the parameters of dysphagia in head and neck cancer patients after concurrent chemoradiotherapy (CCRT). METHODS: The medical records of 32 patients with head and neck cancer who were referred for a videofluoroscopic swallowing study from January 2012 to May 2015 were retrospectively reviewed. The patients were allocated by duration after starting CCRT into early-phase (<1 month after radiation therapy) and late-phase (>1 month after radiation therapy) groups. We measured the modified penetration aspiration scale (MPAS) and the American Speech-Language-Hearing Association National Outcome Measurement System swallowing scale (ASHA-NOMS). The oral transit time (OTT), pharyngeal delay time (PDT), and pharyngeal transit time (PTT) were recorded to assess swallowing physiology. RESULTS: Among the 32 cases, 18 (56%) were early phase. In both groups, the most common tumor site was the hypopharynx (43.75%), with a histologic type of squamous cell carcinoma (75%). PTT was significantly longer in the late phase (p=0.03). For all bolus types except soup, the two phases differed significantly in MPAS results. The mean ASHA-NOMS level was 5.83±0.78 for the early phase and 3.79±1.80 for the late phase, a statistically significant difference (p=0.01). PTT and the ASHA-NOMS level showed a statistically significant correlation (correlation coefficient=-0.52, p=0.02), but no relationship with the MPAS results was found. CONCLUSION: Our results suggest that in the late phase after CCRT, the OTT, PDT, and PTT were longer than in the early phase, with the PTT prolongation being statistically significant. Therefore, swallowing therapy targeting the pharyngeal phase is recommended after CCRT.
American Speech-Language-Hearing Association
;
Carcinoma, Squamous Cell
;
Chemoradiotherapy*
;
Deglutition Disorders
;
Deglutition*
;
Head and Neck Neoplasms*
;
Head*
;
Humans
;
Hypopharynx
;
Medical Records
;
Physiology
;
Retrospective Studies
10.Performance-intensity function of short Mandarin monosyllabic word list for normal-hearing listeners.
Rui ZHOU ; Hua ZHANG ; Shuo WANG ; Jing CHEN ; Dan WU
Journal of Clinical Otorhinolaryngology Head and Neck Surgery 2014;28(6):396-399
OBJECTIVE:
To analyze the short monosyllabic lists of the Mandarin speech test materials (MSTMs) that have been evaluated for equivalence of difficulty, and to establish the performance-intensity (P-I) function for people with normal hearing as a clinical reference for hearing recovery and for an individual's ability to perceive and process speech.
METHOD:
Thirty-seven normal-hearing subjects aged 18 to 26 years who speak Mandarin in their daily lives participated in this study. Eight lists of the short Mandarin monosyllabic materials (20 words per list) with equal difficulty were utilized. The results were analyzed with the Statistical Package for the Social Sciences (SPSS) software, version 17.0.
RESULT:
The P-I function for the short monosyllabic word list was x = 98.557 / (1 + 12.243·exp(-0.17(P - 15))), with x_max = 98.557, where P is the presentation level in dB and x the recognition score (%). The speech sound pressure level corresponding to a 50% recognition score was 29.6 dB SPL (9.6 dB HL), and the slope of the P-I function for the Mandarin materials was 3.1% per dB.
CONCLUSION:
The study established the P-I function of the Mandarin short monosyllabic word list materials with equal difficulty, which provides the normative data for identifying the normal hearing in a clinical setting.
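As a quick numerical check of the logistic P-I function above (as reconstructed from the garbled original text), the following Python snippet inverts it for the 50% point; the result, about 29.9 dB, agrees with the reported 29.6 dB SPL to within rounding of the fitted constants.

```python
import math

def pi_function(P):
    """Reconstructed P-I function: recognition score (%) at presentation
    level P (dB). Constants are those printed in the abstract."""
    return 98.557 / (1.0 + 12.243 * math.exp(-0.17 * (P - 15.0)))

# Invert the logistic for the level giving a 50% recognition score.
p50 = 15.0 - math.log((98.557 / 50.0 - 1.0) / 12.243) / 0.17
print(f"{pi_function(p50):.1f}% at {p50:.1f} dB")   # 50.0% at 29.9 dB
```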
Adult
;
Auditory Perception
;
Female
;
Hearing Tests
;
methods
;
Humans
;
Male
;
Speech
;
Speech Perception
;
physiology
