Heterogeneous graph construction and node representation learning method of Treatise on Febrile Diseases based on graph convolutional network
10.1016/j.dcmed.2022.12.007
- Author:
Junfeng YAN 1; Zhihua WEN 2; Beiji ZOU 3
Author Information
1. School of Informatics, Hunan University of Chinese Medicine, Changsha, Hunan 410208, China
2. School of Informatics, Hunan University of Chinese Medicine, Changsha, Hunan 410208, China;School of Computer Science and Engineering, Hunan University of Technology, Zhuzhou, Hunan 412008, China
3. School of Informatics, Hunan University of Chinese Medicine, Changsha, Hunan 410208, China;School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Publication Type:Journal Article
- Keywords:
Graph convolutional network (GCN);
Heterogeneous graph;
Treatise on Febrile Diseases (Shang Han Lun,《伤寒论》);
node representations on heterogeneous graph;
node representation learning
- From:
Digital Chinese Medicine
2022;5(4):419-428
- Country:China
- Language:English
- Abstract:
Objective:To construct symptom-formula-herb heterogeneous graphs from the Treatise on Febrile Diseases (Shang Han Lun,《伤寒论》) dataset and to explore an optimal method for learning node representations with node attributes based on a graph convolutional network (GCN).
Methods:Clauses containing symptoms, formulas, and herbs were extracted from Treatise on Febrile Diseases to construct symptom-formula-herb heterogeneous graphs, on which a GCN-based node representation learning method, the Traditional Chinese Medicine Graph Convolution Network (TCM-GCN), was proposed. The symptom-formula, symptom-herb, and formula-herb heterogeneous graphs were processed with the TCM-GCN to perform high-order message passing and neighbor aggregation, yielding new node representations. The resulting sum-aggregations of the symptom, formula, and herb nodes lay a foundation for the downstream prediction models.
Results:Node representations with multi-hot encoding, non-fusion encoding, and fusion encoding were compared. In the model's prediction experiments, the Precision@10, Recall@10, and F1-score@10 of the fusion encoding were 9.77%, 6.65%, and 8.30% higher, respectively, than those of the non-fusion encoding.
Conclusion:Node representations by fusion encoding achieved comparatively good results, indicating that the TCM-GCN is effective in learning node-level representations of the heterogeneous-graph-structured Treatise on Febrile Diseases dataset and can improve the performance of the downstream diagnosis model.
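The message passing and sum-aggregation described in the Methods can be illustrated with a minimal sketch. This is not the authors' TCM-GCN implementation: the toy symptom-formula graph, the node indices, and the helper names `one_hot` and `sum_aggregate` are all hypothetical, and the learned weight matrices and nonlinearities of a real GCN layer are omitted. It only shows how two rounds of neighbor sum-aggregation let a symptom node pick up information from symptoms that share a formula with it.

```python
# Toy bipartite symptom-formula graph (hypothetical, not taken from the paper's dataset).
# Each edge (s, f) means symptom s appears in a clause together with formula f.
edges = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1)]
n_symptoms, n_formulas = 3, 2


def one_hot(n):
    """Initial node attributes: one indicator vector per node."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def sum_aggregate(edge_list, src_feats, n_dst):
    """Give each destination node the sum of its neighbors' feature vectors."""
    dim = len(src_feats[0])
    out = [[0.0] * dim for _ in range(n_dst)]
    for src, dst in edge_list:
        for k in range(dim):
            out[dst][k] += src_feats[src][k]
    return out


# Hop 1: formulas aggregate the attributes of their symptoms.
symptom_feats = one_hot(n_symptoms)
formula_repr = sum_aggregate(edges, symptom_feats, n_formulas)

# Hop 2: symptoms aggregate the updated formula representations,
# so each symptom now reflects symptoms that co-occur through shared formulas.
reversed_edges = [(f, s) for s, f in edges]
symptom_repr = sum_aggregate(reversed_edges, formula_repr, n_symptoms)

print(formula_repr)
print(symptom_repr)
```

In this toy graph, symptom 2 is linked only to formula 0, yet after the second hop its representation has nonzero entries for symptoms 0 and 1, which is the high-order propagation effect the abstract refers to. The same aggregation could be applied to symptom-herb and formula-herb graphs under the same assumptions.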
- Full text:yanjunfeng.pdf