1.A review of deep learning methods for non-contact heart rate measurement based on facial videos.
Shuyue GUAN ; Yimou LYU ; Yongchun LI ; Chengzhi XIA ; Lin QI ; Lisheng XU
Journal of Biomedical Engineering 2025;42(1):197-204
Heart rate is a crucial indicator of human health with significant physiological importance. Traditional contact methods for measuring heart rate, such as electrocardiography or wristbands, may not always meet the need for convenient health monitoring. Remote photoplethysmography (rPPG) provides a non-contact method for measuring heart rate and other physiological indicators by analyzing blood volume pulse signals. This approach is non-invasive, requires no direct contact, and allows for long-term healthcare monitoring. Deep learning has emerged as a powerful tool for processing complex image and video data, and has been increasingly employed to extract heart rate signals remotely. This article reviewed the latest research advances in rPPG-based heart rate measurement using deep learning, summarized available public datasets, and explored future research directions and potential advancements in non-contact heart rate measurement.
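The classical signal-processing baseline that the deep learning methods in this review build on can be sketched briefly: average the green channel over the face region in each frame, then read the heart rate off the dominant frequency in the pulse band. This is an illustrative sketch only (the function name and band limits are assumptions, not from any reviewed paper):

```python
import numpy as np

def estimate_heart_rate(green_trace, fps):
    """Estimate heart rate (bpm) from a per-frame mean green-channel trace."""
    x = np.asarray(green_trace, dtype=float)
    x = x - x.mean()                              # remove the DC component
    spectrum = np.abs(np.fft.rfft(x))             # magnitude spectrum
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 4.0)        # plausible pulse band: 42-240 bpm
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_hz                         # Hz -> beats per minute

# Synthetic check: a weak 1.2 Hz (72 bpm) pulse buried in noise
fps = 30.0
t = np.arange(0, 20, 1.0 / fps)
rng = np.random.default_rng(0)
trace = 0.05 * np.sin(2 * np.pi * 1.2 * t) + 0.01 * rng.standard_normal(t.size)
hr = estimate_heart_rate(trace, fps)
```

Deep rPPG models replace the hand-crafted averaging and band-pass steps with learned spatiotemporal features, but the output target is the same dominant pulse frequency.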
Humans
;
Deep Learning
;
Heart Rate/physiology*
;
Photoplethysmography/methods*
;
Video Recording
;
Face
;
Monitoring, Physiologic/methods*
;
Signal Processing, Computer-Assisted
2.Small bowel video keyframe retrieval based on multi-modal contrastive learning.
Xing WU ; Guoyin YANG ; Jingwen LI ; Jian ZHANG ; Qun SUN ; Xianhua HAN ; Quan QIAN ; Yanwei CHEN
Journal of Biomedical Engineering 2025;42(2):334-342
Retrieving the keyframes most relevant to a text query from small intestine videos with given labels can efficiently and accurately locate pathological regions. However, training directly on raw video data is extremely slow, while learning visual representations from image-text datasets leads to computational inconsistency. To tackle this challenge, a small bowel video keyframe retrieval framework based on multi-modal contrastive learning (KRCL) is proposed. This framework fully utilizes textual information from video category labels to learn video features closely related to text, while modeling temporal information within a pretrained image-text model. It transfers knowledge learned from image-text multimodal models to the video domain, enabling interaction among medical videos, images, and text data. Experimental results on the Hyper-Kvasir gastrointestinal disease detection dataset and the Microsoft Research video-to-text (MSR-VTT) retrieval dataset demonstrate the effectiveness and robustness of KRCL, with the proposed method achieving state-of-the-art performance across nearly all evaluation metrics.
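The multi-modal contrastive objective underlying frameworks like KRCL is typically a symmetric InfoNCE loss: matched (keyframe, text) embedding pairs are pulled together and mismatched pairs pushed apart in both retrieval directions. A minimal numpy sketch (the exact loss used by KRCL is not specified in the abstract; this is the standard formulation):

```python
import numpy as np

def info_nce_loss(frame_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of matched (keyframe, text) pairs."""
    f = frame_emb / np.linalg.norm(frame_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = f @ t.T / temperature                 # pairwise cosine similarities
    idx = np.arange(len(f))

    def xent(l):                                   # row-wise softmax cross-entropy
        l = l - l.max(axis=1, keepdims=True)       # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()              # diagonal entries are positives

    # average the frame->text and text->frame retrieval directions
    return 0.5 * (xent(logits) + xent(logits.T))

# Perfectly aligned pairs should yield a near-zero loss
e = np.eye(4)
aligned = info_nce_loss(e, e)
```

At retrieval time the same similarity matrix is used directly: the keyframe whose embedding has the highest cosine similarity to the query text is returned.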
Humans
;
Video Recording
;
Intestine, Small/diagnostic imaging*
;
Machine Learning
;
Image Processing, Computer-Assisted/methods*
;
Algorithms
3.Application of multi-scale spatiotemporal networks in physiological signal and facial action unit measurement.
Journal of Biomedical Engineering 2025;42(3):552-559
Multi-task learning (MTL) has demonstrated significant advantages in the field of physiological signal measurement. This approach enhances the model's generalization ability by sharing parameters and features between similar tasks, even in data-scarce environments. However, traditional multi-task physiological signal measurement methods face challenges such as feature conflicts between tasks, task imbalance, and excessive model complexity, which limit their application in complex environments. To address these issues, this paper proposes an enhanced multi-scale spatiotemporal network (EMSTN) based on Eulerian video magnification (EVM), super-resolution reconstruction, and a convolutional multilayer perceptron. First, EVM is introduced in the input stage of the network to amplify subtle color and motion changes in the video, significantly improving the model's ability to capture pulse and respiratory signals. Additionally, a super-resolution reconstruction module is integrated into the network to enhance image resolution, thereby improving detail capture and increasing the accuracy of facial action unit (AU) tasks. Then, a convolutional multilayer perceptron is employed to replace traditional 2D convolutions, improving feature-extraction efficiency and flexibility and significantly boosting the performance of heart rate and respiratory rate measurements. Finally, comprehensive experiments on the Binghamton-Pittsburgh 4D Spontaneous Facial Expression Database (BP4D+) fully validate the effectiveness and superiority of the proposed method in multi-task physiological signal measurement.
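The core idea of EVM used in the input stage above can be illustrated with a heavily simplified sketch: band-pass each pixel's intensity over time and add the amplified band back. The published method additionally uses a spatial pyramid decomposition, which is omitted here for brevity:

```python
import numpy as np

def eulerian_magnify(frames, fps, low, high, alpha):
    """Simplified EVM: ideal temporal bandpass per pixel, amplified and added back.

    frames: (T, H, W) grayscale video stacked along the time axis.
    """
    spec = np.fft.fft(frames, axis=0)
    freqs = np.abs(np.fft.fftfreq(frames.shape[0], d=1.0 / fps))
    spec[(freqs < low) | (freqs > high)] = 0      # keep only the band of interest
    bandpassed = np.real(np.fft.ifft(spec, axis=0))
    return frames + alpha * bandpassed            # amplify the subtle variation

# A 1 Hz flicker of amplitude 0.01 becomes (1 + alpha) times larger
fps, T = 16.0, 64
t = np.arange(T) / fps
frames = 0.5 + 0.01 * np.sin(2 * np.pi * 1.0 * t)[:, None, None] * np.ones((1, 4, 4))
out = eulerian_magnify(frames, fps, low=0.5, high=2.0, alpha=10.0)
```

Choosing the band to cover plausible pulse (around 0.7-4 Hz) or respiration (around 0.1-0.5 Hz) frequencies is what makes the amplified changes useful for the respective measurement task.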
Humans
;
Neural Networks, Computer
;
Signal Processing, Computer-Assisted
;
Face/physiology*
;
Video Recording
;
Facial Expression
;
Heart Rate
;
Algorithms
4.A multi-feature fusion-based model for fetal orientation classification from intrapartum ultrasound videos.
Ziyu ZHENG ; Xiaying YANG ; Shengjie WU ; Shijie ZHANG ; Guorong LYU ; Peizhong LIU ; Jun WANG ; Shaozheng HE
Journal of Southern Medical University 2025;45(7):1563-1570
OBJECTIVES:
To construct an intelligent analysis model for classifying fetal orientation from intrapartum ultrasound videos based on multi-feature fusion.
METHODS:
The proposed model consists of Input, Backbone Network and Classification Head modules. The Input module carries out data augmentation to improve sample quality and the generalization ability of the model. The Backbone Network performs feature extraction based on YOLOv8 combined with the CBAM, ECA and PSA attention mechanisms and the AIFI feature interaction module. The Classification Head consists of a convolutional layer and a softmax function that outputs the final probability of each class. Images of the key structures (the eyes, face, head, thalamus, and spine) were annotated with bounding boxes by physicians for model training to improve the classification accuracy of the occiput anterior, occiput posterior, and occiput transverse positions.
RESULTS:
The experimental results showed that the proposed model had excellent performance in the fetal orientation classification task, with a classification accuracy of 0.984, an area under the PR curve (average precision) of 0.993, an area under the ROC curve of 0.984, and a kappa consistency test score of 0.974. The predictions of the deep learning model were highly consistent with the actual classifications.
CONCLUSIONS
The multi-feature fusion model proposed in this study can efficiently and accurately classify fetal orientation in intrapartum ultrasound videos.
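The kappa consistency score reported in the results above measures agreement between model predictions and reference labels beyond chance. For reference, Cohen's kappa is computed from a confusion matrix as follows (a generic sketch, not the authors' evaluation code):

```python
import numpy as np

def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: method, cols: reference)."""
    m = np.asarray(confusion, dtype=float)
    n = m.sum()
    p_observed = np.trace(m) / n                        # raw agreement rate
    p_chance = (m.sum(axis=1) @ m.sum(axis=0)) / n**2   # agreement expected by chance
    return (p_observed - p_chance) / (1.0 - p_chance)

k = cohens_kappa([[45, 5], [5, 45]])   # strong but imperfect agreement
```

A kappa of 0.974, as reported here, indicates near-perfect agreement after correcting for chance.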
Humans
;
Female
;
Ultrasonography, Prenatal/methods*
;
Pregnancy
;
Fetus/diagnostic imaging*
;
Neural Networks, Computer
;
Video Recording
5.A Method for Detecting Depression in Adolescence Based on an Affective Brain-Computer Interface and Resting-State Electroencephalogram Signals.
Zijing GUAN ; Xiaofei ZHANG ; Weichen HUANG ; Kendi LI ; Di CHEN ; Weiming LI ; Jiaqi SUN ; Lei CHEN ; Yimiao MAO ; Huijun SUN ; Xiongzi TANG ; Liping CAO ; Yuanqing LI
Neuroscience Bulletin 2025;41(3):434-448
Depression is increasingly prevalent among adolescents and can profoundly impact their lives. However, the early detection of depression is often hindered by the time-consuming diagnostic process and the absence of objective biomarkers. In this study, we propose a novel approach for depression detection based on an affective brain-computer interface (aBCI) and the resting-state electroencephalogram (EEG). By fusing EEG features associated with both emotional and resting states, our method captures comprehensive depression-related information. The final depression detection model, derived through decision fusion with multiple independent models, further enhances detection efficacy. Our experiments involved 40 adolescents with depression and 40 matched controls. The proposed model achieved an accuracy of 86.54% on cross-validation and 88.20% on the independent test set, demonstrating the efficiency of multimodal fusion. In addition, further analysis revealed distinct brain activity patterns between the two groups across different modalities. These findings hold promise for new directions in depression detection and intervention.
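The decision fusion step described above, combining multiple independent models into a final detector, is commonly implemented as a weighted average of per-model class probabilities followed by an argmax. A minimal sketch (the exact fusion rule and weights used in the study are not given in the abstract; the model names below are illustrative):

```python
import numpy as np

def fuse_decisions(prob_list, weights=None):
    """Late (decision-level) fusion: weighted average of per-model class
    probabilities, then argmax per sample.

    prob_list: list of (n_samples, n_classes) probability arrays, one per model.
    """
    probs = np.stack(prob_list)                        # (n_models, n_samples, n_classes)
    if weights is None:
        weights = np.full(len(prob_list), 1.0 / len(prob_list))
    fused = np.tensordot(np.asarray(weights), probs, axes=1)   # weighted mean
    return fused.argmax(axis=1)                        # predicted class per sample

# Two independent models disagree on sample 0; averaging resolves it
emotion_model = np.array([[0.9, 0.1], [0.2, 0.8]])     # e.g., aBCI emotional features
resting_model = np.array([[0.4, 0.6], [0.1, 0.9]])     # e.g., resting-state EEG features
preds = fuse_decisions([emotion_model, resting_model])
```

Fusing at the decision level lets each modality-specific model be trained and validated independently before combination, which suits multimodal EEG pipelines like the one described.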
Humans
;
Male
;
Female
;
Adolescent
;
Case-Control Studies
;
Depression/diagnosis*
;
Early Diagnosis
;
Rest
;
Electroencephalography/methods*
;
Brain-Computer Interfaces
;
Models, Psychological
;
Reproducibility of Results
;
Affect/physiology*
;
Photic Stimulation/methods*
;
Video Recording
;
Brain/physiopathology*
6.Improving children's cooperativeness during magnetic resonance imaging using interactive educational animated videos: a prospective, randomised, non-inferiority trial.
Evelyn Gabriela UTAMA ; Seyed Ehsan SAFFARI ; Phua Hwee TANG
Singapore medical journal 2024;65(1):9-15
INTRODUCTION:
A previous prospective, randomised controlled trial showed that animated videos shown to children before a magnetic resonance imaging (MRI) scan reduced the proportion of children needing repeated MRI sequences and improved the children's confidence in staying still for at least 30 min. Children preferred the interactive video. We hypothesised that showing the interactive video alone is non-inferior to showing two videos (regular and interactive) in improving children's cooperativeness during MRI scans.
METHODS:
In this Institutional Review Board-approved prospective, randomised, non-inferiority trial, 558 children aged 3-20 years scheduled for an elective MRI scan from June 2017 to March 2019 were randomised into an interactive-video-only group and a combined (regular and interactive) videos group. Children were shown the videos before their scan. Repeated MRI sequences, general anaesthesia (GA) requirement and improvement in confidence of staying still for at least 30 min were assessed.
RESULTS:
In the interactive video group (n = 277), 86 (31.0%) children needed repeated MRI sequences, two (0.7%) needed GA and the proportion of children who had confidence in staying still for more than 30 min increased by 22.1% after the video. In the combined videos group (n = 281), 102 (36.3%) children needed repeated MRI sequences, six (2.1%) needed GA and the proportion of children who had confidence in staying still for more than 30 min increased by 23.2% after the videos; the results were not significantly different between the two groups.
CONCLUSION
The interactive video group demonstrated non-inferiority to the combined videos group.
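A non-inferiority comparison of two proportions, as in the trial above, is often assessed through the risk difference and its confidence interval: non-inferiority holds when the interval's upper bound stays below the pre-specified margin. A sketch using the repeated-sequence counts reported in the results (the trial's actual margin and statistical method are not stated in the abstract, so the margin below is left symbolic):

```python
import math

def risk_difference_ci(events1, n1, events2, n2, z=1.96):
    """Risk difference p1 - p2 with a Wald 95% confidence interval."""
    p1, p2 = events1 / n1, events2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

# Repeated-sequence proportions reported above: 86/277 interactive vs 102/281 combined
diff, lower, upper = risk_difference_ci(86, 277, 102, 281)
# non-inferiority at margin delta holds when `upper` stays below +delta
```

With these counts the point estimate actually favours the interactive-video-only group (fewer repeated sequences), consistent with the non-inferiority conclusion.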
Child
;
Humans
;
Anesthesia, General
;
Magnetic Resonance Imaging
;
Prospective Studies
;
Simulation Training
;
Child, Preschool
;
Adolescent
;
Young Adult
;
Video Recording
7.Time to intubation with McGrath ™ videolaryngoscope versus direct laryngoscope in powered air-purifying respirator: a randomised controlled trial.
Qing Yuan GOH ; Sui An LIE ; Zihui TAN ; Pei Yi Brenda TAN ; Shin Yi NG ; Hairil Rizal ABDULLAH
Singapore medical journal 2024;65(1):2-8
INTRODUCTION:
During the coronavirus disease 2019 (COVID-19) pandemic, multiple guidelines have recommended videolaryngoscope (VL) for tracheal intubation. However, there is no evidence that VL reduces time to tracheal intubation, and this is important for COVID-19 patients with respiratory failure.
METHODS:
To simulate intubation of COVID-19 patients, we randomly assigned 28 elective surgical patients to be intubated with either McGrath™ MAC VL or direct laryngoscope (DL) by specialist anaesthetists who donned 3M™ Jupiter™ powered air-purifying respirators (PAPR) and N95 masks. The primary outcome was time to intubation.
RESULTS:
The median time to intubation was 61 s (interquartile range [IQR] 37-63 s) and 41.5 s (IQR 37-56 s) in the VL and DL groups, respectively (P = 0.35). The closest mean distance between the anaesthetist and patient during intubation was 21.6 ± 4.8 cm and 17.6 ± 5.3 cm in the VL and DL groups, respectively (P = 0.045). There were no significant differences in the median intubation difficulty scale scores, proportion of successful intubations at the first laryngoscopic attempt and proportion of intubations requiring adjuncts. All the patients underwent successful intubation with no adverse event.
CONCLUSION
There was no significant difference in the time to intubation of elective surgical patients with either McGrath™ VL or DL by specialist anaesthetists who donned PAPR and N95 masks. The distance between the anaesthetist and patient was significantly greater with VL. When resources are limited or disrupted during a pandemic, DL could be a viable alternative to VL for specialist anaesthetists.
Humans
;
COVID-19
;
Intubation, Intratracheal
;
Laryngoscopes
;
Laryngoscopy
;
Respiratory Protective Devices
;
Video Recording
8.Video Feedback Improves Anesthesia Residents' Communication Skill and Performance on Showing Empathy During Preoperative Interviews.
Di XIA ; Ya-Hong GONG ; Xia RUAN ; Li XU ; Li-Jian PEI ; Xu LI ; Rui-Ying WANG
Chinese Medical Sciences Journal 2024;39(4):282-287
OBJECTIVES:
To determine the impact of scenario-based lecture and personalized video feedback on anesthesia residents' communication skills during preoperative visits.
METHODS:
A total of 24 anesthesia residents were randomly divided into a video group and a control group. Residents in both groups took part in a simulated interview and received a scenario-based lecture on how to communicate with patients during preoperative visits. Afterwards, residents in the video group received personalized video feedback recorded during the simulated interview. One week later all the residents undertook another simulated interview. The communication skills of all the residents were assessed using the Consultation and Relational Empathy measure (CARE) scale by two examiners and one standardized patient (SP), both of whom were blinded to the group allocation.
RESULTS:
CARE scores were comparable between the two groups before training and improved significantly after training in both groups (all P < 0.05). The video group showed a significantly greater increase in CARE scores after training than the control group, especially when assessed by the SP (t = 6.980, P < 0.001). There were significant correlations between the examiner-assessed and SP-assessed scores (both P = 0.001).
CONCLUSIONS
Scenario-based lectures with simulated interviews provide a good method for training the communication skills of anesthesia residents, and personalized video feedback can enhance their performance on showing empathy during preoperative interviews.
Humans
;
Internship and Residency
;
Empathy
;
Communication
;
Anesthesiology/education*
;
Male
;
Female
;
Adult
;
Video Recording
;
Feedback
;
Clinical Competence
9.Analysis of vocal fold movement and voice onset behavior in patients with laryngopharyngeal reflux based on laryngeal high-speed videoendoscopy.
Xinlin XU ; Xueqiong HUANG ; Xiangping LI ; Peiyun ZHUANG
Journal of Clinical Otorhinolaryngology Head and Neck Surgery 2024;38(11):1031-1037
Objective: Patients with laryngopharyngeal reflux (LPR) have chronic inflammation of the laryngeal mucosa that leaves the larynx in a hyperresponsive state, which may make vocal fold movement too fast. This paper discusses the characteristics of vocal fold movement and voice onset in patients with LPR by analyzing laryngeal high-speed videoendoscopy. Methods: Forty patients with LPR were enrolled as the LPR group. The diagnostic criteria for LPR were a positive reflux symptom index (RSI) and reflux finding score (RFS) to identify suspected LPR, followed by objective oropharyngeal Dx pH monitoring, with a positive Ryan index indicating reflux. Forty healthy volunteers matched for age and sex were selected as the normal group. Laryngeal high-speed videoendoscopy was performed, and vocal fold motion and vibration parameters were measured, including vocal fold adduction time, vocal fold abduction time, voice onset (time and mode), and the open quotient of the vocal fold vibration cycle. Statistical analysis was performed using SPSS 25.0. Results: The vocal fold adduction time in the LPR group (mean 225.81 ms) was significantly shorter than that in the normal group (mean 277.01 ms) (P<0.05). The voice onset time in the LPR group was significantly longer than that in the normal group (P<0.05). High-speed videoendoscopy showed hard voice onset in 17 patients in the LPR group versus 8 in the normal group, a statistically significant difference (P<0.05). There was no significant difference in the open quotient of vocal fold vibration between the two groups (P>0.05). The vocal fold abduction time in the LPR group (mean 372.92 ms) was shorter than that in the normal group (mean 426.98 ms), but the difference was not statistically significant (P>0.05). The time difference between bilateral vocal fold abduction in the LPR group was significantly greater than that in the normal group (P<0.05). Conclusion: The larynx of LPR patients is in a hyperresponsive state: the vocal folds move faster, and a hard voice onset is more likely. These changes may result in voice dysfunction.
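The open quotient analyzed in this study is the fraction of each vibration cycle during which the glottis is open. From a glottal area waveform extracted frame by frame from high-speed videoendoscopy, it can be computed as follows (a simplified illustration; the study's actual extraction from endoscopic images is more involved, and the threshold below is an assumption):

```python
import numpy as np

def open_quotient(glottal_area, threshold=0.0):
    """Fraction of one vibration cycle during which the glottis is open.

    glottal_area: glottal area samples covering exactly one cycle;
    the glottis counts as open wherever area > threshold.
    """
    area = np.asarray(glottal_area, dtype=float)
    return np.count_nonzero(area > threshold) / area.size

# One synthetic cycle: glottis open for the first 60 of 100 samples
cycle = np.concatenate([np.linspace(0.1, 1.0, 60), np.zeros(40)])
oq = open_quotient(cycle)
```

High-speed videoendoscopy makes this per-cycle computation possible because, unlike stroboscopy, it samples many frames within each individual vibration cycle.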
Humans
;
Vocal Cords/physiopathology*
;
Laryngopharyngeal Reflux/diagnosis*
;
Laryngoscopy/methods*
;
Male
;
Video Recording
;
Female
;
Middle Aged
;
Adult
;
Voice/physiology*
;
Case-Control Studies
;
Vibration
10.Development of a program to prevent sexual violence among teens in Japan: education using DVD video teaching materials and web-based learning.
Miyuki NAGAMATSU ; Narumi OOSHIGE ; Nozomi SONODA ; Mika NIINA ; Ken-Ichi HARA
Environmental Health and Preventive Medicine 2021;26(1):41-41
BACKGROUND:
This study aimed to develop an education system using DVD video-based teaching materials or web-based learning to reduce sexual violence among teens in Japan.
METHODS:
During the first stage, June 2018 to March 2019, an education program using DVD video teaching materials was carried out at three high schools and four universities with research consent from the director of each facility. Of the 1337 high school students and first- and second-year university students, those in their teen years were targeted for analysis. A survey was conducted at baseline and after the DVD video teaching. During the second stage, November 2019 to March 2020, web-based learning using improved video teaching materials was developed and carried out. Of the adolescents who participated in the web-based learning, those in their teen years were targeted for analysis. A survey was conducted at baseline and after the web-based learning.
RESULTS:
In the first stage, 876 students consented to and participated in the education using DVD video teaching materials and the baseline and post-education surveys (collection rate 65.5%). Among these, 705 respondents in their teens completed both the baseline and post-education surveys (valid response rate 80.4%). In the second stage, 250 respondents in their teens who received web-based learning using the improved video teaching materials completed both surveys (valid response rate 87.1%). Improvements from both programs were observed in attitudes that lead to physical violence, attitudes that lead to mental violence, attitudes that promote healthy conflict resolution, and dangerous attitudes that lead to sexual violence from persons in the community or through the Internet. The web-based learning program also improved preventive attitudes toward sexual violence.
CONCLUSIONS
The education program using DVD video teaching materials or web-based learning may help prevent sexual violence among teens in Japan.
Adolescent
;
Compact Disks
;
Female
;
Humans
;
Internet
;
Japan
;
Male
;
Sex Offenses/statistics & numerical data*
;
Students
;
Teaching Materials
;
Video Recording
