1.Screening of pathogenic molecular markers of Staphylococcus aureus in children based on whole genome sequencing technology.
Jian-Yu CHEN ; Xu-Lin WANG ; Wen-Yu LI ; Min-Qi CHEN ; Jun-Li ZHOU ; Zhen-Jiang YAO ; Jin-Jian FU ; Xiao-Hua YE
Chinese Journal of Contemporary Pediatrics 2023;25(11):1161-1169
OBJECTIVES:
To explore the molecular characteristics of Staphylococcus aureus (S. aureus) in children, and to compare the molecular characteristics of different types of strains (infection and colonization strains) so as to reveal pathogenic molecular markers of S. aureus.
METHODS:
A cross-sectional study design was used: nasopharyngeal swabs were collected from healthy children in the community, and clinical samples were collected from infected children in the hospital. Whole genome sequencing was used to detect antibiotic resistance genes and virulence genes. A random forest method was used to screen pathogenic markers.
RESULTS:
A total of 512 S. aureus strains were detected, including 272 infection strains and 240 colonization strains. For virulence genes, the carrying rates of enterotoxin genes (seb and sep), extracellular enzyme coding genes (splA, splB, splE and edinC), leukocytotoxin genes (lukD, lukE, lukF-PV and lukS-PV) and epidermal exfoliating genes (eta and etb) were higher in infection strains than in colonization strains. In contrast, the carrying rates of enterotoxin genes (sec, sec3, seg, seh, sei, sel, sem, sen, seo and seu) were lower in infection strains than in colonization strains (P<0.05). For antibiotic resistance genes, the carrying rates of lnuA, lnuG, aadD, tetK and dfrG were significantly higher in infection strains than in colonization strains (P<0.05). The cross-validation accuracy of the random forest model for screening pathogenic markers of S. aureus before and after screening was 69% and 68%, respectively, and the area under the curve was 0.75 and 0.70, respectively. The random forest model finally screened out 16 pathogenic markers (sem, etb, splE, sep, ser, mecA, lnuA, sea, blaZ, cat(pC233), blaTEM-1A, aph(3')-III, ermB, ermA, ant(9)-Ia and ant(6)-Ia). The top five variables in the variable importance ranking were sem (OR=0.40), etb (OR=3.95), splE (OR=1.68), sep (OR=3.97), and ser (OR=1.68).
CONCLUSIONS:
The random forest model can screen out pathogenic markers of S. aureus and exhibits a superior predictive performance, providing genetic evidence for tracing highly pathogenic S. aureus and conducting precise targeted interventions.
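The odds ratios reported for the top-ranked markers compare gene carriage between infection and colonization strains. The abstract does not state how they were estimated; the following is a minimal sketch, assuming each OR comes from a simple 2×2 carriage table. The function name and the carriage counts are hypothetical, and only the group sizes (272 and 240) come from the abstract.

```python
def odds_ratio(inf_pos, inf_total, col_pos, col_total):
    """Odds of carrying a gene in infection strains relative to colonization strains."""
    inf_neg = inf_total - inf_pos
    col_neg = col_total - col_pos
    if min(inf_pos, inf_neg, col_pos, col_neg) == 0:
        raise ValueError("a zero cell makes the odds ratio undefined")
    return (inf_pos * col_neg) / (inf_neg * col_pos)


# Hypothetical carriage counts; only the group sizes come from the abstract.
example_or = odds_ratio(inf_pos=60, inf_total=272, col_pos=20, col_total=240)
print(f"OR = {example_or:.2f}")
```

An OR above 1 means the gene is carried more often by infection strains (as reported for etb and sep), and an OR below 1 means it is carried more often by colonization strains (as reported for sem).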
Child
;
Humans
;
Staphylococcus aureus/genetics*
;
Cross-Sectional Studies
;
Enterotoxins/genetics*
;
Staphylococcal Infections
;
Whole Genome Sequencing
2.Complete chloroplast genome sequencing and phylogeny of wild Atractylodes lancea from Yuexi, Anhui province.
Jian-Peng HU ; Lu JIANG ; Rui XU ; Jun-Xian WU ; Feng-Ya GUAN ; Jin-Chen YAO ; Jun-Ling LIU ; Ya-Zhong ZHANG ; Liang-Ping ZHA
China Journal of Chinese Materia Medica 2023;48(1):52-59
This study investigated the chloroplast genome sequence of wild Atractylodes lancea from Yuexi in Anhui province by high-throughput sequencing, followed by characterization of the genome structure, which laid a foundation for the species identification, analysis of genetic diversity, and resource conservation of A. lancea. Specifically, total genomic DNA was extracted from the leaves of A. lancea with the improved CTAB method. The chloroplast genome of A. lancea was sequenced by high-throughput sequencing technology, then assembled with metaSPAdes and annotated with CPGAVAS2. Bioinformatics methods were employed for the analysis of simple sequence repeats (SSRs), inverted repeat (IR) borders, codon bias, and phylogeny. The results showed that the whole chloroplast genome of A. lancea was 153 178 bp, with an 84 226 bp large single copy (LSC) and an 18 658 bp small single copy (SSC) separated by a pair of IRs (25 147 bp each). The genome had a GC content of 37.7% and 124 genes: 87 protein-coding genes, 8 rRNA genes, and 29 tRNA genes. It had 26 287 codons and encoded 20 amino acids. Phylogenetic analysis showed that Atractylodes species clustered into one clade and that A. lancea had a close genetic relationship with A. koreana. This study established a method for sequencing the chloroplast genome of A. lancea and enriched the genetic resources of Compositae. The findings are expected to lay a foundation for species identification, analysis of genetic diversity, and resource conservation of A. lancea.
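The reported region lengths can be checked against the quadripartite structure of a chloroplast genome (one LSC, one SSC, and two IR copies). The following is a minimal sketch of that check, together with a simple GC-content helper. The function names are illustrative, the region lengths and gene counts are taken from the abstract, and the toy sequence is not study data.

```python
# Region lengths in bp and gene counts, as reported in the abstract.
LSC = 84_226
SSC = 18_658
IR = 25_147        # length of each inverted repeat copy
TOTAL = 153_178


def quadripartite_length(lsc, ssc, ir):
    """Total length of a plastome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


def gc_content(seq):
    """Percentage of G and C bases in a DNA string (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# The reported parts add up to the reported whole.
assert quadripartite_length(LSC, SSC, IR) == TOTAL
assert 87 + 8 + 29 == 124  # protein-coding + rRNA + tRNA genes

toy_gc = gc_content("ATGCATTA")  # toy sequence for illustration only
print(toy_gc)
```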
Phylogeny
;
Atractylodes/genetics*
;
Genome, Chloroplast
;
Whole Genome Sequencing
;
Microsatellite Repeats
;
Lamiales
3.Analysis of the chloroplast genome of Incarvillea younghusbandii Sprague.
Yaying ZHANG ; Wanyao JIAO ; Wenrui JIAO ; Tianle QIAO ; Zhiyang SU ; Shuo FENG
Chinese Journal of Biotechnology 2023;39(7):2954-2964
Incarvillea younghusbandii Sprague is a traditional tonic herb. The roots are used as herbal medicine for nourishing and strengthening, as well as for treating postpartum milk deficiency and weakness. In this study, the chloroplast genome of I. younghusbandii was sequenced and assembled by high-throughput sequencing technology. The sequence characteristics, sequence repeats, codon usage bias, phylogenetic relationships and estimated divergence time of I. younghusbandii were analyzed. The 159 323 bp sequence contained a large single copy (80 197 bp), a small single copy (9 030 bp) and two inverted repeat sequences (35 048 bp each). It contained 120 genes, including 77 protein-coding genes, 8 ribosomal RNA genes and 35 transfer RNA genes. AAA was the most frequent codon in the chloroplast coding sequence of I. younghusbandii. A total of 42 simple sequence repeats were identified in the chloroplast genome. Phylogenetic analysis revealed that I. younghusbandii was most closely related to its taxonomically close relative Incarvillea compacta. The divergence between I. younghusbandii and I. compacta was dated to 4.66 million years ago. This study is significant for the scientific conservation and development of resources related to I. compacta. It also provides a basic genetic resource for subsequent species identification in the genus Incarvillea and for the study of population genetic diversity in Bignoniaceae.
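Codon usage statistics such as "AAA was the most frequent codon" come from counting in-frame triplets across the coding sequences. The abstract does not name the tool used; the following is a minimal sketch of the counting step only, with a hypothetical function name and a toy sequence that is not taken from the I. younghusbandii plastome.

```python
from collections import Counter


def codon_counts(cds):
    """Count in-frame codons in one coding sequence (a trailing partial codon is ignored)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


# Toy coding sequence for illustration only.
toy_cds = "ATGAAAGGTAAATTTAAATAA"
counts = codon_counts(toy_cds)
most_common_codon, n = counts.most_common(1)[0]
print(most_common_codon, n)
```

For a whole genome, the per-gene counters would be summed over all annotated protein-coding sequences before ranking the codons.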
Phylogeny
;
Molecular Sequence Annotation
;
Genome, Chloroplast
;
Sequence Analysis, DNA
;
Whole Genome Sequencing
4.Exploring the association between de novo mutations and non-syndromic cleft lip with or without palate based on whole exome sequencing of case-parent trios.
Xi CHEN ; Si Yue WANG ; En Ci XUE ; Xue Heng WANG ; He Xiang PENG ; Meng FAN ; Meng Ying WANG ; Yi Qun WU ; Xue Ying QIN ; Jing LI ; Tao WU ; Hong Ping ZHU ; Jing LI ; Zhi Bo ZHOU ; Da Fang CHEN ; Yong Hua HU
Journal of Peking University(Health Sciences) 2022;54(3):387-393
OBJECTIVE:
To explore the association between de novo mutations (DNM) and non-syndromic cleft lip with or without palate (NSCL/P) using case-parent trio design.
METHODS:
Whole-exome sequencing was conducted for twenty-two NSCL/P trios and Genome Analysis ToolKit (GATK) was used to identify DNM by comparing the alleles of the cases and their parents. Information of predictable functions was annotated to the locus with SnpEff. Enrichment analysis for DNM was conducted to test the difference between the actual number and the expected number of DNM, and to explore whether there were genes with more DNM than expected. NSCL/P-related genes indicated by previous studies with solid evidence were selected by literature reviewing. Protein-protein interactions analysis was conducted among the genes with protein-altering DNM and NSCL/P-related genes. R package "denovolyzeR" was used for the enrichment analysis (Bonferroni correction: P=0.05/n, n is the number of genes in the whole genome range). Protein-protein interactions among genes with DNM and genes with solid evidence on the risk factors of NSCL/P were predicted depending on the information provided by STRING database.
RESULTS:
A total of 339 908 SNPs qualified for the subsequent analysis after quality control. The number of high-confidence DNM identified by GATK was 345. Among those DNM, forty-four were missense mutations, one was a nonsense mutation, two were splice-site mutations, twenty were synonymous mutations, and the others were located in intronic or intergenic regions. The results of enrichment analysis showed that the number of protein-altering DNM in the exome regions was larger than expected (P < 0.05), and five genes (KRTCAP2, HMCN2, ANKRD36C, ADGRL2 and DIPK2A) had more DNM than expected (P < 0.05/(2×19 618)). Protein-protein interaction analysis was conducted among forty-six genes with protein-altering DNM and thirteen genes associated with NSCL/P selected by literature review. Six pairs of interactions occurred between the genes with DNM and known NSCL/P-related genes. The score measuring the confidence level of the predicted interaction between RGPD4 and SUMO1 was 0.868, which was higher than the scores for other pairs of genes.
CONCLUSION:
Our study provided novel insights into the development of NSCL/P and demonstrated that functional analyses of genes carrying DNM were warranted to understand the genetic architecture of complex diseases.
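The trio comparison and the gene-level significance threshold described in this abstract can be illustrated with a short sketch. The rule below is a deliberately naive simplification (a child allele seen in neither parent) and is not the GATK procedure the authors used, which works from genotype likelihoods and quality filters. The function names and toy genotypes are hypothetical; the threshold 0.05/(2×19 618) is taken from the abstract.

```python
def is_candidate_dnm(child, father, mother):
    """True if the child carries an allele seen in neither parent.

    Genotypes are tuples of alleles, e.g. ("A", "G").
    """
    parental = set(father) | set(mother)
    return any(allele not in parental for allele in child)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


# Toy genotypes for illustration only.
novel = is_candidate_dnm(("A", "G"), ("A", "A"), ("A", "A"))
inherited = is_candidate_dnm(("A", "G"), ("A", "G"), ("A", "A"))
print(novel, inherited)

# Gene-level threshold quoted in the abstract: 0.05 / (2 x 19 618).
gene_threshold = bonferroni_threshold(0.05, 2 * 19_618)
print(f"{gene_threshold:.3e}")
```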
Asians
;
Case-Control Studies
;
Cleft Lip/genetics*
;
Cleft Palate/genetics*
;
Genetic Predisposition to Disease
;
Genome-Wide Association Study
;
Genotype
;
Humans
;
Mutation
;
Parents
;
Polymorphism, Single Nucleotide
;
Whole Exome Sequencing
5.Comprehensive analysis of RNA-seq and whole genome sequencing data reveals no evidence for SARS-CoV-2 integrating into host genome.
Yu-Sheng CHEN ; Shuaiyao LU ; Bing ZHANG ; Tingfu DU ; Wen-Jie LI ; Meng LEI ; Yanan ZHOU ; Yong ZHANG ; Penghui LIU ; Yong-Qiao SUN ; Yong-Liang ZHAO ; Ying YANG ; Xiaozhong PENG ; Yun-Gui YANG
Protein & Cell 2022;13(5):379-385
7.Prenatal diagnosis of fetuses with renal anomalies by whole genome sequencing.
Fengchang QIAO ; Ping HU ; Cuiping ZHANG ; Yan WANG ; Ran ZHOU ; Chunyu LUO ; Zhengfeng XU
Chinese Journal of Medical Genetics 2022;39(8):819-823
OBJECTIVE:
To explore the genetic basis for fetuses with renal anomalies.
METHODS:
Genomic DNA of four fetuses and their parents was extracted from amniotic fluid and peripheral blood samples and subjected to whole genome sequencing. Candidate variants were predicted according to the American College of Medical Genetics and Genomics (ACMG) guidelines and validated by SNP-array and Sanger sequencing.
RESULTS:
Two fetuses were found to carry a 1.45 Mb pathogenic microdeletion in 17q12 and a pathogenic 1.85 Mb microduplication at 1q21.1-21.2, respectively. One fetus was found to harbor compound heterozygous variants c.8301del (p.Asn2768Thrfs*18) and c.4481del (p.Asn1494Thrfs*6) of the PKHD1 gene, which were predicted to be pathogenic. One fetus harbored a homozygous c.1372dup (p.Thr458Asnfs*5) variant of the BBS12 gene, which was predicted to be likely pathogenic. All variants were validated by Sanger sequencing.
CONCLUSION:
Whole genome sequencing can enable efficient prenatal diagnosis for fetuses with renal anomalies with high accuracy.
Female
;
Fetus/abnormalities*
;
Humans
;
Pregnancy
;
Prenatal Diagnosis
;
Whole Genome Sequencing
8.Estimation of molecular clock of Mycobacterium tuberculosis based on whole genome sequencing data.
Bi Lin TAO ; Yu Ting WANG ; Zhong Qi LI ; Ji Zhou WU ; Jian Ming WANG
Chinese Journal of Epidemiology 2022;43(9):1462-1468
Objective: To analyze the genomic mutations of Mycobacterium tuberculosis (M. tuberculosis) isolated during endogenous reactivation and to estimate the molecular clock based on whole genome sequencing data. Methods: Published whole genome studies of endogenously reactivated tuberculosis were retrieved, and the corresponding whole genome sequencing data were downloaded. We extracted the single nucleotide polymorphisms (SNPs) and strain isolation times at initial treatment and relapse of tuberculosis cases, explored the relationship between the SNP differences and the interval between initial treatment and relapse with a Poisson regression model, calculated the M. tuberculosis molecular clock, and estimated the mutation rate. Results: When the generation time of M. tuberculosis was 18 hours, the mutation rate in 0-2 years, i.e. short-term endogenous reactivation, was 6.47×10⁻¹⁰ (95%CI: 5.59×10⁻¹⁰ to 7.44×10⁻¹⁰), which was significantly higher than that in 2-14 years, i.e. long-term endogenous reactivation (3.27×10⁻¹⁰, 95%CI: 2.88×10⁻¹⁰ to 3.69×10⁻¹⁰). The mutation rates for intervals of 0-, 1-, 2-, 3-, 5- and 7-14 years were 7.10×10⁻¹⁰, 6.06×10⁻¹⁰, 4.24×10⁻¹⁰, 5.34×10⁻¹⁰, 2.59×10⁻¹⁰ and 1.26×10⁻¹⁰, respectively. Conclusions: In the period of endogenous reactivation, the mutation rate of M. tuberculosis decreases as the interval between initial treatment and relapse increases, which verifies the clinically observed phenomenon that relapse often occurs within two years after the initial treatment of tuberculosis.
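To show how an 18-hour generation time turns a count of SNPs into a per-generation mutation rate, the following sketch computes a simple point estimate. It is not the authors' Poisson regression model. It assumes the rate is expressed per site per generation, which the abstract does not state explicitly, and the function name, the 4.4 Mb genome length and the SNP count are illustrative inputs rather than study data.

```python
HOURS_PER_YEAR = 365 * 24


def per_generation_rate(snps, genome_bp, years, generation_hours=18.0):
    """Mutations per site per generation from SNPs accumulated over an interval."""
    if genome_bp <= 0 or years <= 0 or generation_hours <= 0:
        raise ValueError("genome_bp, years and generation_hours must be positive")
    generations = years * HOURS_PER_YEAR / generation_hours
    return snps / (genome_bp * generations)


# Hypothetical inputs: 1 SNP between paired isolates taken 1 year apart,
# with an assumed genome length of 4.4 Mb. None of these are study data.
rate = per_generation_rate(snps=1, genome_bp=4_400_000, years=1.0)
print(f"{rate:.2e}")
```

Under these assumptions, the same number of SNPs spread over a longer interval gives a proportionally lower rate, which is the pattern the abstract reports for longer intervals between initial treatment and relapse.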
Genome, Bacterial
;
Humans
;
Mycobacterium tuberculosis/genetics*
;
Recurrence
;
Tuberculosis/microbiology*
;
Whole Genome Sequencing
9.Genome-wide analysis of aberrant DNA methylation patterns in iPSCs derived from patients with Down syndrome.
Wenbo MA ; Yanna LIU ; Jingbin YAN
Chinese Journal of Medical Genetics 2021;38(6):531-535
OBJECTIVE:
To study the correlation between DNA methylation patterns and gene expression in Down syndrome (DS).
METHODS:
Induced pluripotent stem cells (iPSCs) derived from normal controls and DS patients were subjected to whole genome bisulfite sequencing and differentially methylated region (DMR) screening. Statistical analysis of the chromosomal and gene element distribution of the DMR was carried out. Gene ontology (GO) and enrichment-based cluster analysis were used to explore the molecular function of differentially expressed genes.
RESULTS:
A total of 1569 DMR were identified in iPSCs derived from DS patients, for which the proportion of hypermethylation in promoter regions was significantly greater than that in the gene body. No DMR enrichment was noted on chromosome 21. Hypermethylation of the promoter and gene body was predicted to inhibit gene expression. Functional clustering revealed that pathways related to neurodevelopment, stem cell pluripotency and organ size regulation were significantly correlated with differentially methylated genes.
CONCLUSION:
Extensive and stochastic anomalies of genome-wide DNA methylation have been discovered in iPSCs derived from DS patients, for which the pattern and molecular regulation of methylation were significantly different from those of normal controls. These findings suggest that the DNA methylation pattern may play a vital role in the pathogenesis of both neurodevelopmental disorders and other phenotypic abnormalities during early embryonic development.
DNA Methylation
;
Down Syndrome/genetics*
;
Female
;
Humans
;
Induced Pluripotent Stem Cells
;
Pregnancy
;
Promoter Regions, Genetic
;
Whole Genome Sequencing
10.Application of the artificial intelligence-rapid whole-genome sequencing diagnostic system in the neonatal/pediatric intensive care unit.
Chinese Journal of Contemporary Pediatrics 2021;23(5):433-437
Pediatric patients in the neonatal intensive care unit (NICU) and the pediatric intensive care unit (PICU) have a high incidence rate of genetic diseases, and early rapid etiological diagnosis and targeted interventions can help to reduce mortality or improve prognosis. Whole-genome sequencing covers more comprehensive information including point mutation, copy number, and structural and rearrangement variations in the intron region and has become one of the powerful diagnostic tools for genetic diseases. Sequencing data require highly professional judgment and interpretation and are returned for clinical application after several weeks, which cannot meet the need for the diagnosis and treatment of genetic diseases in children. This article introduces the clinical application of rapid whole-genome sequencing in the NICU/PICU and briefly describes related techniques of artificial intelligence-rapid whole-genome sequencing diagnostic system, a rapid high-throughput automated platform for the diagnosis of genetic diseases. The diagnostic system introduces artificial intelligence into the processing of data after whole-genome sequencing and can solve the problems of long time and professional interpretation required for routine genome sequencing and provide a rapid diagnostic regimen for critically ill children suspected of genetic diseases within 24 hours, and therefore, it holds promise for clinical application.
Artificial Intelligence
;
Child
;
Critical Illness
;
Humans
;
Infant, Newborn
;
Intensive Care Units, Neonatal
;
Intensive Care Units, Pediatric
;
Whole Genome Sequencing
