1.Development and validation of PhenoRAG: A visualization tool for automated human phenotype ontology term annotation based on large language models and retrieval-augmented generation technology.
Wei ZHONG ; Yousheng YAN ; Kai YANG ; Yan LIU ; Xinyu FU ; Zhengyang YAO ; Chenghong YIN
Chinese Journal of Medical Genetics 2026;43(1):36-43
OBJECTIVE:
To develop a user-friendly visualization application for the automatic annotation of Human Phenotype Ontology (HPO) terms based on large language models and retrieval-augmented generation (RAG) technology, and to validate its performance in an authoritative case dataset.
METHODS:
By integrating the domestic open-source large language model DeepSeek-V3 with RAG technology, an interactive web application was deployed on the Streamlit cloud platform. Using only the latest official HPO dataset as the data source, the lightweight sentence-embedding model BAAI/bge-small-en-v1.5 was employed to construct a FAISS vector index. During the online phase, a four-step closed-loop process is completed automatically: multilingual translation, phenotype phrase extraction, RAG candidate retrieval, term mapping, and official database validation. A total of 121 English case reports publicly released by BMJ Case Reports and Oxford Medical Case Reports (with a gold-standard HPO set of 1 794 terms) were selected for validation. Precision, recall, and F1 score were calculated and compared with those of traditional dictionary tools, standalone large language models, and the similar application "RAG-HPO". Finally, the model was replaced with the more advanced ChatGPT-5, and its performance was evaluated on a newly extracted dataset.
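The candidate-retrieval step described above can be illustrated with a minimal sketch. It is not the PhenoRAG implementation: a toy bag-of-words vector stands in for the BAAI/bge-small-en-v1.5 embedding, a sorted cosine-similarity scan stands in for the FAISS index, and the function names and the short term list are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (assumed stand-in for a sentence-embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(phrase, terms, k=2):
    """Return the k term labels most similar to the extracted phenotype phrase."""
    query = embed(phrase)
    ranked = sorted(terms, key=lambda t: cosine(query, embed(t)), reverse=True)
    return ranked[:k]

# Hypothetical, illustrative label list; a real index would cover the full HPO dataset.
TERMS = ["Seizure", "Short stature", "Global developmental delay", "Microcephaly"]

candidates = retrieve("delay in global development", TERMS, k=1)
print(candidates)
```

In a pipeline of this kind, the retrieved candidates would then be passed to the language model for term mapping and checked against the official database.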
RESULTS:
A visualization application for automatic HPO term annotation, named PhenoRAG and based on large language models and RAG technology, was successfully developed. Users can access it directly via a web link. Across the 112 cases, a total of 2 150 HPO terms were generated; 2 064 (96.0%) were fully validated by the official database, with a hallucination rate of 1.3% and an HPO ID-name mismatch rate of 2.7%. After deduplication, 1 906 terms remained for testing. The overall precision was 63.65%, recall was 67.34%, and F1 was 65.44%, significantly outperforming traditional annotation tools (F1: 0.45-0.49, P < 0.001). Although PhenoRAG's F1 was lower than that of RAG-HPO (F1 = 0.78, P < 0.001), which relies on a manually constructed synonym database of 54 000 entries plus the HPO dataset, PhenoRAG requires no additional dictionary maintenance and can be used without any background in computer programming. Moreover, after switching to the GPT-5 model, PhenoRAG exhibited no hallucinations on the new dataset, and its F1 score increased significantly (P = 0.038).
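The reported metrics can be related to one another with a short sketch. It assumes the standard set-based definitions of precision, recall, and F1; the abstract does not state how matches were counted, and the function names and the example HPO ID sets are hypothetical.

```python
def prf(predicted, gold):
    """Set-based precision, recall, and F1 for one case (assumed definitions)."""
    tp = len(predicted & gold)  # terms present in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

def f1_from_pr(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical predicted and gold-standard ID sets for a single case.
predicted = {"HP:0001250", "HP:0001263", "HP:0000252"}
gold = {"HP:0001250", "HP:0000252", "HP:0004322"}
print(prf(predicted, gold))

# Consistency check on the figures reported in the abstract.
print(round(f1_from_pr(63.65, 67.34), 2))  # 65.44
print(round(2064 / 2150 * 100, 1))         # 96.0
```

The harmonic mean of the reported precision and recall reproduces the reported F1 of 65.44%, and 2 064 of 2 150 terms corresponds to the stated 96.0% validation rate.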
CONCLUSION:
Without constructing a synonym database, PhenoRAG achieved high-accuracy automatic mapping from clinical text to standard HPO terms. It features a low usage threshold, free access, and a Chinese-language interface, and can directly serve rare disease diagnosis, genetic counseling, and research scenarios in China and worldwide, warranting further clinical promotion and multicenter validation.
Humans; Phenotype; Biological Ontologies; Language; Software; Large Language Models
3.Analysis of the ontology construction approach to acupoint anatomy.
Wenwen LIU ; Xianghong JING ; Feng YANG
Chinese Acupuncture & Moxibustion 2025;45(5):694-702
Through an investigation of the relevant literature, the concepts, methods, languages and tools of ontology were explored, and suitable methods and tools for the ontology construction of acupoint anatomy were selected. The current mainstream anatomical ontologies and related ontologies of TCM were investigated to provide a reference for the ontology construction of acupoint anatomy. According to the knowledge attributes of acupoint anatomy, the foundational model of anatomy (FMA) served as the reusable ontology, and, in association with the attribute classification of the traditional Chinese medicine language system (TCMLS), the construction approach to an acupoint anatomical ontology was explored. Taking "anatomical entity of acupoints" as the top-level concept, a demonstrative study of anatomical ontology construction was conducted on the acupoints of the lung meridian of hand-taiyin.
Acupuncture Points; Humans; Meridians; Medicine, Chinese Traditional; Biological Ontologies
4.The MAP1 family: a new perspective for exploring unknown functions.
Qing WANG ; Mei LIU ; Zhang-Ji DONG
Acta Physiologica Sinica 2025;77(5):876-892
As an important part of the cytoskeleton, microtubules play a crucial role in many cellular processes, such as cell division, intracellular transport, and maintaining cell morphology. The MAP1 family is an important family of microtubule-associated proteins, which includes three members: MAP1A, MAP1B, and MAP1S. These proteins are widely involved in the dynamic regulation of the cytoskeleton and play a key role in the development and function of the central nervous system, especially in the development and function of neurons. This study reviews the research progress of the MAP1 family, mainly focusing on the structure and function of MAP1 family members, and paying particular attention to their roles in neuronal development and regeneration, regulatory mechanisms, and neurodegenerative diseases.
Humans; Animals; Microtubule-Associated Proteins/classification*; Neurons/cytology*; Neurodegenerative Diseases/physiopathology*; Microtubules/physiology*; Cytoskeleton/physiology*
5.Research progress on variety breeding of root- and rhizome-derived traditional Chinese medicine.
Yan CHEN ; Miao-Yin DONG ; Zhan-Feng CAO ; Xue-Zhou LIU ; Meng-Fei LI ; Jian-He WEI
China Journal of Chinese Materia Medica 2025;50(2):363-383
Germplasm degeneration occurs during the long-term cultivation of root- and rhizome-derived traditional Chinese medicine(RR-TCM), which seriously restricts the high-quality development of the industry. Therefore, it is urgent to solve the problem of germplasm degeneration through variety breeding. In this paper, based on previously published research articles, monographs, and news reports, the research progress on the number and origins, breeding methods, and selection of new varieties of RR-TCM listed in the Chinese Pharmacopoeia(Edition 2020) was summarized and analyzed. The results show that there are 169 kinds of RR-TCM listed in the Chinese Pharmacopoeia(Edition 2020), originating from 223 origins with three breeding methods(i.e., seed propagation, vegetative reproduction, and tissue culture), and there are 215 species derived from seed propagation, 177 species derived from vegetative reproduction, and 164 species derived from tissue culture. To date, 62 origins have yielded new varieties through conventional breeding, cross breeding, mutation breeding, ploidy breeding, or modern biotechnology breeding methods, including 57 origins with 145 new varieties bred through conventional breeding, 10 origins with 43 new varieties bred through mutation breeding, and seven origins with 12 new varieties bred through cross breeding. These varieties are bred mainly to improve yield, disease resistance, and active ingredient content, but only a few new varieties have been widely used. This review provides useful references for variety breeding, quality breeding, and standardized planting of RR-TCM.
Plant Breeding/methods*; Plant Roots/growth & development*; Rhizome/growth & development*; Drugs, Chinese Herbal; Plants, Medicinal/classification*; Medicine, Chinese Traditional
6.Characteristics, microbial composition, and mycotoxin profile of fermented traditional Chinese medicines.
Hui-Ru ZHANG ; Meng-Yue GUO ; Jian-Xin LYU ; Wan-Xuan ZHU ; Chuang WANG ; Xin-Xin KANG ; Jiao-Yang LUO ; Mei-Hua YANG
China Journal of Chinese Materia Medica 2025;50(1):48-57
Fermented traditional Chinese medicine(TCM) has a long history of medicinal use, such as Sojae Semen Praeparatum, Arisaema Cum Bile, Pinelliae Rhizoma Fermentata, red yeast rice, and Jianqu. Fermentation technology was recorded in the earliest TCM work, Shen Nong's Classic of the Materia Medica. Microorganisms are essential components of the fermentation process. However, the contamination of fermented TCM by toxigenic fungi and mycotoxins due to unstandardized fermentation processes seriously affects the quality of TCM and poses a threat to the life and health of consumers. In this paper, the characteristics, microbial composition, and mycotoxin profile of fermented TCM are systematically summarized to provide a theoretical basis for its quality and safety control.
Fermentation; Mycotoxins/analysis*; Drugs, Chinese Herbal/analysis*; Fungi/classification*; Bacteria/genetics*; Drug Contamination; Medicine, Chinese Traditional
7.Identification and functional analysis of β-amyrin synthase gene in Dipsacus asper.
Huan LEI ; Hua HE ; Jiao XU ; Chang-Gui YANG ; Wei-Ke JIANG ; Tao ZHOU ; Lan-Ping GUO
China Journal of Chinese Materia Medica 2025;50(4):1043-1050
Dipsaci Radix is a commonly used Chinese herbal medicine, with triterpenoid saponins as the main active components. β-Amyrin synthase, a member of the oxidosqualene cyclase superfamily, plays a crucial role in the biosynthesis of oleanane-type triterpenoid saponins. Asperosaponin Ⅵ is an oleanane-type triterpenoid saponin. To explore the β-amyrin synthase genes involved in the biosynthesis of asperosaponin Ⅵ in Dipsacus asper, this study screened candidate genes from the transcriptome data of D. asper. Two β-amyrin synthase genes, Da OSC1 and Da OSC2, were identified by phylogenetic analysis and correlation analysis. The coding sequences of Da OSC1 and Da OSC2 were 2 286 bp and 2 295 bp in length, encoding 761 and 764 amino acids, respectively. Multiple sequence alignments showed that Da OSC1 and Da OSC2 had three conserved motifs (DCTAE, QW, and MWCYCR) unique to the oxidosqualene cyclase family. Real-time quantitative PCR results showed that Da OSC1 and Da OSC2 had the highest expression levels in the roots. Compared with normal growth conditions, the low-temperature treatment significantly upregulated the expression of Da OSC1 and Da OSC2. Agrobacterium-mediated transient expression of Da OSC1 and Da OSC2 in Nicotiana benthamiana resulted in the production of β-amyrin, which suggested that Da OSC1 and Da OSC2 were able to catalyze the synthesis of β-amyrin. This study clarified the catalytic functions of two β-amyrin synthases in D. asper and analyzed their expression patterns in different tissues and at low temperatures. The findings provide a foundation for further studying the biosynthetic pathway and regulatory mechanism of asperosaponin Ⅵ in D. asper.
Intramolecular Transferases/chemistry*; Phylogeny; Plant Proteins/chemistry*; Gene Expression Regulation, Plant; Dipsacaceae/classification*; Saponins/metabolism*; Oleanolic Acid/metabolism*
8.Multi-gene molecular identification and pathogenicity analysis of pathogens causing root rot of Atractylodes lancea in Hubei province.
Tie-Lin WANG ; Yang XU ; Xiu-Fu WAN ; Zhao-Geng LYU ; Bin-Bin YAN ; Yong-Xi DU ; Chuan-Zhi KANG ; Lan-Ping GUO
China Journal of Chinese Materia Medica 2025;50(7):1721-1726
To clarify the species, pathogenicity, and distribution of the pathogens causing the root rot of Atractylodes lancea in Hubei province, the tissue separation method was used to isolate the pathogens from root rot samples in the main planting areas of A. lancea in Hubei. Based on the preliminary identification of the Fusarium genus by the internal transcribed spacer(ITS) sequence, three housekeeping genes, EF1/EF2, Btu-F-FO1/Btu-F-RO1, and FF1/FR1, were amplified and sequenced. Subsequently, a phylogenetic tree was constructed based on these TEF gene sequences to classify the pathogens. The pathogenicity of these strains was determined using the root irrigation method. A total of 194 pathogen strains were isolated using the tissue separation method. Molecular identification using the three housekeeping genes identified the pathogens as F. solani, F. oxysporum, F. commune, F. equiseti, F. tricinctum, F. redolens, F. fujikuroi, F. avenaceum, F. acuminatum, and F. incarnatum. Among them, F. solani and F. oxysporum were the dominant strains, widely distributed in multiple regions, with F. solani accounting for approximately 54% of the total isolated strains and F. oxysporum accounting for approximately 34%. Other strains accounted for a relatively small proportion, totaling approximately 12%. The results of pathogenicity determination showed that there were certain differences in pathogenicity among strains. The analysis of the pathogenicity differentiation of the widely distributed F. solani and F. oxysporum strains revealed that these dominant strains in Hubei were mainly highly pathogenic. This study determined the species, pathogenicity, and distribution of the pathogens causing the root rot of A. lancea in Hubei province. The results provide a scientific basis for further understanding the root rot of A. lancea and its epidemic occurrence and scientifically preventing and controlling this disease.
Plant Diseases/microbiology*; Atractylodes/microbiology*; Phylogeny; Plant Roots/microbiology*; Fusarium/classification*; China; Virulence; Fungal Proteins/genetics*
9.Detection and sequence analysis of broad bean wilt virus 2 on Rehmannia glutinosa.
Xiao-Long DENG ; Jie YAO ; Lang QIN ; Shi-Wen DING ; Tie-Lin WANG ; Kun ZHANG ; Lei CHENG ; Zhen HE
China Journal of Chinese Materia Medica 2025;50(7):1741-1747
To clarify the occurrence and distribution of broad bean wilt virus 2(BBWV2) on Rehmannia glutinosa, this study collected 87 R. glutinosa samples with typical symptoms of viral disease, such as chlorosis and crumpling, from Wenxian county and Wuzhi county in Jiaozuo city, Henan province and Qiaocheng district in Bozhou city, Anhui province. The BBWV2 CP target band was amplified from 37 R. glutinosa samples by RT-PCR. The overall detection rate was 42.5%; the detection rate was 43.0% in samples from Henan province and 37.5% in samples from Anhui province. A total of 37 BBWV2 CP sequences were obtained by cloning and sequencing of BBWV2-positive samples(data have been submitted to GenBank, accession numbers: PP407959-PP407995), and comparison of these CP sequences with those of 91 other BBWV2 isolates in GenBank showed high genetic diversity, with a consistency rate of 70.8%-100%. Meanwhile, phylogenetic analysis showed that BBWV2 could be divided into three groups according to CP sequences, and the BBWV2 isolates from R. glutinosa obtained in this study were all located in group 3. This study identified the differences in the occurrence, distribution, and genetic diversity of BBWV2 in R. glutinosa from Henan province and Anhui province and provided a theoretical basis for the prevention and control of BBWV2.
Rehmannia/virology*; Phylogeny; Plant Diseases/virology*; China; Molecular Sequence Data; Fabavirus/classification*
10.Functional characterization of flavonoid glycosyltransferase AmGT90 in Astragalus membranaceus.
Guo-Qing PENG ; Bing-Yan XU ; Jian-Ping HUANG ; Zhi-Yin YU ; Sheng-Xiong HUANG
China Journal of Chinese Materia Medica 2025;50(6):1534-1543
Astragalus membranaceus(A. membranaceus), a traditional tonic, contains flavonoids as one of its main bioactive components and key indicators for quality standard detection. These compounds predominantly exist in glycosylated forms after glycosylation modification within the plant. The catalytic products of flavonoid glycosyltransferases in A. membranaceus have been reported to be mostly monoglycosides, and only AmUGT28 catalyzes luteolin to form diglycosides. In this study, we cloned a glycosyltransferase gene, AmGT90, from A. membranaceus, with an ORF length of 1 335 bp, encoding 444 amino acids, and the protein had a relative molecular mass of 50.5 kDa. Phylogenetic tree analysis indicated that AmGT90 belongs to the UGT74 family. In vitro enzymatic reaction showed that AmGT90 had broad substrate specificity and could catalyze the glycosylation of various flavonoids, including isoflavones, flavones, flavanones, and chalcones. AmGT90 not only catalyzed the formation of monoglycosides but also diglycosides. In addition, the mechanism of AmGT90 catalyzing the formation of diglycosides from luteolin was preliminarily explored. The experimental results showed that AmGT90 may preferentially recognize C4'-OH of luteolin and then recognize C7-OH to form diglycosides. This study reported a glycosyltransferase from A. membranaceus capable of converting flavonoids into monoglycosides and diglycosides. This finding not only enhances our understanding of the biosynthetic pathways of flavonoid glycosides in A. membranaceus but also introduces a new component for glycoside production through synthetic biology.
Glycosyltransferases/chemistry*; Flavonoids/chemistry*; Astragalus propinquus/classification*; Phylogeny; Glycosylation; Plant Proteins/chemistry*; Substrate Specificity; Cloning, Molecular; Amino Acid Sequence

