1. Construction and value of a vestibular function calibration test recognition model based on dual-stream ViT and ConvNeXt architecture
Xu LUO ; Peixia WU ; Weiming HAO ; Yinhong QU ; Han CHEN
Chinese Journal of Clinical Medicine 2025;32(2):207-211
Objective To improve the efficiency and accuracy of evaluating videonystagmography calibration test results, and to enable effective recognition of saccadic undershoot waveforms, by developing a deep learning model based on a dual-stream architecture. Methods A vestibular function calibration test recognition model with cross-modal feature fusion was constructed by integrating a vision transformer (ViT) and a modified ConvNeXt convolutional network. The model took trajectory images and spatial distribution maps as inputs and used a multi-task learning framework both to classify calibration data and to directly evaluate undershoot waveforms. Results The model performed strongly in assessing calibration compliance. Accuracy, sensitivity, and specificity for the left side, middle, and right side were all greater than 90%, and AUC values were all greater than 0.99; the best accuracy was 97.66% (middle), the best sensitivity was 98.98% (middle), the best specificity was 96.87% (right side), and