Heterogeneous graph construction and node representation learning method of Treatise on Febrile Diseases based on graph convolutional network
10.1016/j.dcmed.2022.12.007
- Author:Junfeng YAN 1; Zhihua WEN 1, 2; Beiji ZOU 1, 3
Author Information
1. School of Informatics, Hunan University of Chinese Medicine, Changsha, Hunan 410208, China
2. School of Computer Science and Engineering, Hunan University of Technology, Zhuzhou, Hunan 412008, China
3. School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Publication Type:Journal Article
- Keywords:
Graph convolutional network (GCN);
Heterogeneous graph;
Treatise on Febrile Diseases (Shang Han Lun,《伤寒论》);
node representations on heterogeneous graph;
node representation learning
- From:
Digital Chinese Medicine
2022;5(4):419-428
- Country:China
- Language:English
- Abstract:
Objective:To construct symptom-formula-herb heterogeneous graphs from the Treatise on Febrile Diseases (Shang Han Lun,《伤寒论》) dataset and to explore an optimal method for learning node representations with node attributes based on graph convolutional network (GCN).
Methods:Clauses containing symptoms, formulas, and herbs were extracted from Treatise on Febrile Diseases to construct symptom-formula-herb heterogeneous graphs. On these graphs, a GCN-based node representation learning method, the Traditional Chinese Medicine Graph Convolution Network (TCM-GCN), was proposed. The symptom-formula, symptom-herb, and formula-herb heterogeneous graphs were processed with the TCM-GCN, which performs high-order message passing and neighbor aggregation to obtain new node representations. The resulting sum-aggregated representations of symptom, formula, and herb nodes lay a foundation for downstream prediction tasks.
Results:Node representations with multi-hot encoding, non-fusion encoding, and fusion encoding were compared. In the prediction experiments, the Precision@10, Recall@10, and F1-score@10 of the fusion encoding were 9.77%, 6.65%, and 8.30% higher, respectively, than those of the non-fusion encoding.
Conclusion:Node representations by fusion encoding achieved comparatively good results, indicating that the TCM-GCN is effective in learning node-level representations of the heterogeneous-graph-structured Treatise on Febrile Diseases dataset and can improve the performance of the downstream tasks of the diagnosis model.
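The Methods describe high-order message passing with sum-aggregation over bipartite graphs such as symptom-herb. The abstract does not give the TCM-GCN equations, so the following is only a minimal sketch of generic multi-hop sum-aggregation on one toy bipartite graph. It assumes NumPy, has no learned weights or nonlinearity, and the names `sum_aggregate`, `propagate`, and the toy matrices are hypothetical, not taken from the paper.

```python
import numpy as np


def sum_aggregate(adj, neighbour_feats):
    """Each row node receives the sum of its neighbours' feature vectors."""
    return adj @ neighbour_feats


def propagate(adj, x_sym, x_herb, num_layers=2):
    """Alternate sum-aggregation across a symptom-herb bipartite graph.

    adj has shape (n_symptoms, n_herbs); adj[i, j] = 1 means symptom i
    co-occurs with herb j. Returns the per-layer representations, with
    layer 0 being the input attributes. Illustrative only: a real GCN
    layer would also apply learned weights and normalisation.
    """
    layers_s, layers_h = [x_sym], [x_herb]
    for _ in range(num_layers):
        new_s = sum_aggregate(adj, layers_h[-1])    # herbs -> symptoms
        new_h = sum_aggregate(adj.T, layers_s[-1])  # symptoms -> herbs
        layers_s.append(new_s)
        layers_h.append(new_h)
    return layers_s, layers_h


# Toy graph (assumed data): 3 symptoms, 2 herbs, 2-dimensional attributes.
adj = np.array([[1, 0],
                [1, 1],
                [0, 1]])
x_sym = np.array([[1, 0], [0, 1], [1, 1]])
x_herb = np.array([[1, 0], [0, 1]])

layers_s, layers_h = propagate(adj, x_sym, x_herb, num_layers=2)
print(layers_s[2])  # symptom representations after two hops
print(layers_h[2])  # herb representations after two hops
```

After two layers, each symptom node has absorbed information from symptoms that share a herb with it, which is the sense in which stacking layers yields high-order propagation.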
- Full text:yanjunfeng.pdf
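The Results are reported as Precision@10, Recall@10, and F1-score@10. The abstract does not state how these were computed, so the sketch below uses the common top-K definitions for a single query (precision divides hits by K, recall divides hits by the number of relevant items). The function name `metrics_at_k`, the herb identifiers, and the scores are hypothetical.

```python
def metrics_at_k(scores, relevant, k=10):
    """Return (precision@k, recall@k, f1@k) for one ranked prediction.

    scores:   dict mapping item id -> predicted score
    relevant: set of ground-truth item ids
    """
    # Keep the k highest-scoring items.
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for item in ranked if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example (assumed data): four candidate herbs, evaluated at k=2.
scores = {"herb_a": 0.9, "herb_b": 0.8, "herb_c": 0.4, "herb_d": 0.1}
relevant = {"herb_a", "herb_c", "herb_d"}
p, r, f = metrics_at_k(scores, relevant, k=2)
print(p, r, f)
```

In a full evaluation these per-query values would typically be averaged over all test clauses; whether the paper averages per-query F1 or derives F1 from averaged precision and recall is not stated in the abstract.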