A retrieval method of drug molecules based on graph collapsing.
- Author:
Jing Wei QU
1
;
Xiao Qing LV
2
,
3
;
Zhen Ming LIU
4
;
Yuan LIAO
1
;
Peng Hui SUN
1
;
Bei WANG
1
;
Zhi TANG
2
,
3
Author Information
1. Institute of Computer Science & Technology, Peking University, Beijing 100080, China.
2. Institute of Computer Science & Technology, Peking University, Beijing 100080, China
3. State Key Laboratory of Digital Publishing Technology, Beijing 100080, China.
4. Department of Medical Chemistry, Peking University School of Pharmaceutical Sciences, Beijing 100191, China.
- Publication Type:Journal Article
- MeSH:
Algorithms;
Databases, Chemical;
Information Storage and Retrieval;
Molecular Structure
- From:
Journal of Peking University(Health Sciences)
2018;50(2):368-374
- CountryChina
- Language:Chinese
-
Abstract:
OBJECTIVE:To establish a compact and efficient hypergraph representation and a graph-similarity-based retrieval method of molecules to achieve effective and efficient medicine information retrieval.
METHODS:Chemical structural formula (CSF) was a primary search target as a unique and precise identifier for each compound at the molecular level in the research field of medicine information retrieval. To retrieve medicine information effectively and efficiently, a complete workflow of the graph-based CSF retrieval system was introduced. This system accepted the photos taken from smartphones and the sketches drawn on tablet personal computers as CSF inputs, and formalized the CSFs with the corresponding graphs. Then this paper proposed a compact and efficient hypergraph representation for molecules on the basis of analyzing factors that directly affected the efficiency of graph matching. According to the characteristics of CSFs, a hierarchical collapsing method combining graph isomorphism and frequent subgraph mining was adopted. There was yet a fundamental challenge, subgraph overlapping during the collapsing procedure, which hindered the method from establishing the correct compact hypergraph of an original CSF graph. Therefore, a graph-isomorphism-based algorithm was proposed to select dominant acyclic subgraphs on the basis of overlapping analysis. Finally, the spatial similarity among graphical CSFs was evaluated by multi-dimensional measures of similarity.
RESULTS:To evaluate the performance of the proposed method, the proposed system was firstly compared with Wikipedia Chemical Structure Explorer (WCSE), the state-of-the-art system that allowed CSF similarity searching within Wikipedia molecules dataset, on retrieval accuracy. The system achieved higher values on mean average precision, discounted cumulative gain, rank-biased precision, and expected reciprocal rank than WCSE from the top-2 to the top-10 retrieved results. Specifically, the system achieved 10%, 1.41, 6.42%, and 1.32% higher than WCSE on these metrics for top-10 retrieval results, respectively. Moreover, several retrieval cases were presented to intuitively compare with WCSE. The results of the above comparative study demonstrated that the proposed method outperformed the existing method with regard to accuracy and effectiveness.
CONCLUSION:This paper proposes a graph-similarity-based retrieval approach for medicine information. To obtain satisfactory retrieval results, an isomorphism-based algorithm is proposed for dominant subgraph selection based on the subgraph overlapping analysis, as well as an effective and efficient hypergraph representation of molecules. Experiment results demonstrate the effectiveness of the proposed approach.