A joint cognitive representation learning method based on multi-modal variational autoencoders
10.7644/j.issn.1674-9960.2024.07.006
- VernacularTitle:一种基于多模态变分自编码器的联合认知表征学习方法
- Author:
Qiuyue SONG 1,2; Yuan CHEN; Shuyu JIA; Xiaomin YING; Zhen HE
Author Information
1. College of Artificial Intelligence, Nanjing Agricultural University, Nanjing 210031, China
2. Academy of Military Medical Sciences, Academy of Military Sciences, Beijing 100850, China
- Keywords:
multimodal variational autoencoders;
cognitive representation;
electroencephalogram;
cross-modal generation
- From:
Military Medical Sciences
2024;48(7):516-523
- Country:China
- Language:Chinese
- Abstract:
Objective To develop multimodal joint cognitive representations for research on visual cognitive activities of the brain, enhance the classification performance of cognitive representations of visual information, predict brain electroencephalogram (EEG) responses from visual image features, and decode visual images from EEG signals. Methods An architecture combining a multimodal variational autoencoder network using the Mixture-of-Products-of-Experts (MoPoE) approach with a style-based generative adversarial network with adaptive discriminator augmentation (StyleGAN2-ADA) was used to facilitate the learning of cognitive representations and the encoding and decoding of EEG signals. This framework not only supported classification tasks but also enabled cross-modal generation of images and EEG data. Results The present study integrated features from different modalities, improving the classification accuracy of cognitive representations of visual information. By aligning the feature spaces of the different modalities into a shared latent space, cross-modal generation tasks became possible. The cross-modal generation results for EEG and images, derived from this unified latent space, outperformed the one-way mapping methods from one modality to another used in previous research. Conclusion This study effectively integrates and aligns information from different modalities, achieving classification performance of joint cognitive representations beyond that of any single modality. Moreover, it demonstrates superior outcomes in cross-modal generation tasks compared with modality-specific unidirectional mappings, which is expected to offer a new line of thought for unified encoding and decoding models of visual cognitive information in the brain.
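The Methods section describes fusing EEG and image posteriors into a shared latent space with a MoPoE-style multimodal VAE. The following minimal PyTorch sketch illustrates only the MoPoE fusion step under stated assumptions: encoder outputs, tensor shapes, variable names, and the two-modality setup are illustrative, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of Mixture-of-Products-of-Experts (MoPoE)
# fusion for a two-modality (EEG + image) variational autoencoder.
# All shapes and names below are illustrative assumptions.
import itertools
import torch

def product_of_experts(mus, logvars):
    """Fuse Gaussian experts by precision weighting, including a standard-normal prior expert."""
    mus = [torch.zeros_like(mus[0])] + list(mus)          # prepend prior N(0, I)
    logvars = [torch.zeros_like(logvars[0])] + list(logvars)
    precisions = [torch.exp(-lv) for lv in logvars]       # 1 / sigma^2 per expert
    precision_sum = torch.stack(precisions).sum(dim=0)
    mu_joint = torch.stack([m * p for m, p in zip(mus, precisions)]).sum(dim=0) / precision_sum
    logvar_joint = -torch.log(precision_sum)
    return mu_joint, logvar_joint

def mopoe_posterior(unimodal_params):
    """MoPoE: uniform mixture over the PoE posteriors of every non-empty modality subset."""
    modalities = list(unimodal_params.keys())
    components = []
    for r in range(1, len(modalities) + 1):
        for subset in itertools.combinations(modalities, r):
            mus = [unimodal_params[m][0] for m in subset]
            logvars = [unimodal_params[m][1] for m in subset]
            components.append(product_of_experts(mus, logvars))
    return components  # one (mu, logvar) pair per mixture component

# Usage: each modality encoder is assumed to output a Gaussian posterior over a shared latent space.
batch, latent_dim = 8, 128
unimodal = {
    "eeg":   (torch.randn(batch, latent_dim), torch.randn(batch, latent_dim)),
    "image": (torch.randn(batch, latent_dim), torch.randn(batch, latent_dim)),
}
components = mopoe_posterior(unimodal)                    # subsets: {eeg}, {image}, {eeg, image}
mu, logvar = components[-1]                               # joint (both-modality) component
z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
```

In a full training loop, one mixture component would be sampled per example and both modalities decoded from the sampled latent, which is what makes EEG-to-image and image-to-EEG cross-modal generation possible from a single aligned latent space; feeding such latents to an image generator such as StyleGAN2-ADA, as the abstract describes, is outside the scope of this sketch.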