1.HisCoM-PAGE: software for hierarchical structural component models for pathway analysis of gene expression data
Genomics & Informatics 2019;17(4):45-
To identify pathways associated with survival phenotypes using gene expression data, we recently proposed the hierarchical structural component model for pathway analysis of gene expression data (HisCoM-PAGE) method. The HisCoM-PAGE software can consider hierarchical structural relationships between genes and pathways and analyze multiple pathways simultaneously. It can be applied to various types of gene expression data, such as microarray data or RNA sequencing data. We expect that the HisCoM-PAGE software will make our method more easily accessible to researchers who want to perform pathway analysis for survival times.
Gene Expression ; Methods ; Phenotype ; Sequence Analysis, RNA
2.CircPlant: An Integrated Tool for circRNA Detection and Functional Prediction in Plants.
Peijing ZHANG ; Yongjing LIU ; Hongjun CHEN ; Xianwen MENG ; Jitong XUE ; Kunsong CHEN ; Ming CHEN
Genomics, Proteomics & Bioinformatics 2020;18(3):352-358
The recent discovery of circular RNAs (circRNAs) and the characterization of their functional roles have opened a new avenue for understanding the biology of genomes. circRNAs have been implicated in a variety of biological processes, but their precise functions remain largely elusive. Currently, a few approaches are available for novel circRNA prediction, but almost all of these methods are intended for animal genomes. Because the major differences between the organization of plant and mammalian genomes cannot be neglected, a plant-specific method is needed to enhance the validity of plant circRNA identification. In this study, we present CircPlant, an integrated tool for exploring plant circRNAs that potentially act as competing endogenous RNAs (ceRNAs), together with their potential functions. By incorporating several unique plant-specific criteria, CircPlant can accurately detect plant circRNAs from high-throughput RNA-seq data. Based on comparison tests on simulated and real RNA-seq datasets from Arabidopsis thaliana and Oryza sativa, we show that CircPlant outperforms all evaluated competing tools in both accuracy and efficiency. CircPlant is freely available at http://bis.zju.edu.cn/circplant.
Arabidopsis/metabolism* ; Oryza/metabolism* ; RNA, Circular/metabolism* ; RNA, Plant/metabolism* ; Sequence Analysis, RNA/methods*
3.Predicting RNA secondary structures including pseudoknots by covariance with stacking and minimum free energy.
Jinwei YANG ; Zhigang LUO ; Xiaoyong FANG ; Jinhua WANG ; Kecheng TANG
Chinese Journal of Biotechnology 2008;24(4):659-664
Predicting RNA secondary structures that include pseudoknots is a difficult problem in the RNA field. Current prediction methods usually have relatively low accuracy and high complexity. Considering that the stacking of adjacent base pairs is a common feature of RNA secondary structure, we present a method for predicting pseudoknots based on covariance with stacking and minimum free energy. A new scoring scheme, which combines stacked covariance with free energy, is used to evaluate base pairs. Based on this scoring scheme, we use an iterative procedure to compute, approximately, the optimized RNA secondary structure with minimum score. In each iteration, a helix of high covariance and low free energy is selected, until the sequences can no longer form a helix; two crossing helices selected in different iterations can therefore form a pseudoknot. We tested our method on data sets of ClustalW alignments and structural alignments downloaded from RNA databases. Experimental results show that our method correctly predicts the major portion of pseudoknots. It has higher average sensitivity and specificity than the reference algorithms, and performs much better on structural alignments than on ClustalW alignments. Finally, we discuss the influence of the covariance weight on performance, and conclude that the best performance is achieved when lambda1 : lambda2 = 5 : 1.
Algorithms ; Base Pairing ; Base Sequence ; Computational Biology ; methods ; Molecular Sequence Data ; Nucleic Acid Conformation ; RNA ; chemistry ; genetics ; Sequence Analysis, RNA
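The scoring idea in the abstract above (covariance reinforced by stacking, combined with free energy) can be illustrated with a minimal sketch. This is not the authors' implementation: the use of mutual information as the covariance measure, the averaging over neighbouring stacked pairs, the linear form of the combined score, and the assignment of lambda1 to covariance and lambda2 to free energy are all assumptions made for illustration. Only the 5 : 1 ratio comes from the abstract.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_i)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


def stacked_covariance(alignment, i, j):
    """Average the covariance of pair (i, j) with its stacking neighbours
    (i-1, j+1) and (i+1, j-1) where they exist. Hypothetical definition."""
    cols = list(zip(*alignment))
    pairs = [(i, j)]
    if i - 1 >= 0 and j + 1 < len(cols):
        pairs.append((i - 1, j + 1))
    if i + 1 < j - 1:
        pairs.append((i + 1, j - 1))
    return sum(mutual_information(cols[a], cols[b]) for a, b in pairs) / len(pairs)


def combined_score(covariance, free_energy, lam1=5.0, lam2=1.0):
    """Lower is better: high covariance and low free energy both reduce it.
    The linear form and the role of each weight are assumptions."""
    return lam2 * free_energy - lam1 * covariance


# Toy alignment: columns 0 and 4 covary perfectly, columns 1 and 3 are constant.
toy_alignment = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]
cov = stacked_covariance(toy_alignment, 0, 4)
print(cov, combined_score(cov, free_energy=-2.0))
```

Under this sign convention, an iterative procedure would pick the helix with the lowest combined score in each round, which matches the abstract's description of selecting helices of high covariance and low free energy.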
4.VASC: Dimension Reduction and Visualization of Single-cell RNA-seq Data by Deep Variational Autoencoder.
Genomics, Proteomics & Bioinformatics 2018;16(5):320-331
Single-cell RNA sequencing (scRNA-seq) is a powerful technique for analyzing transcriptomic heterogeneity at the single-cell level. An effective low-dimensional representation and visualization of the original scRNA-seq data is an important step in studying cell sub-populations and lineages. At the single-cell level, transcriptional fluctuations are much larger than the average of a cell population, and the low amount of RNA transcripts increases the rate of technical dropout events. Therefore, scRNA-seq data are much noisier than traditional bulk RNA-seq data. In this study, we propose the deep variational autoencoder for scRNA-seq data (VASC), a deep multi-layer generative model for the unsupervised dimension reduction and visualization of scRNA-seq data. VASC can explicitly model dropout events and find nonlinear hierarchical feature representations of the original data. Tested on over 20 datasets, VASC shows superior performance in most cases and exhibits broader dataset compatibility than four state-of-the-art dimension reduction and visualization methods. In addition, VASC provides better representations for very rare cell populations in the 2D visualization. As a case study, VASC successfully re-establishes the cell dynamics in pre-implantation embryos and identifies several candidate marker genes associated with early embryo development. Moreover, VASC also performs well on a 10× Genomics dataset with more cells and a higher dropout rate.
Computer Graphics ; Gene Expression Profiling ; methods ; Humans ; Sequence Analysis, RNA ; methods ; Single-Cell Analysis
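The abstract above does not describe the VASC architecture in detail, so the following is only a sketch of two generic building blocks that any Gaussian variational autoencoder relies on: reparameterized sampling of the latent code and the KL divergence of the approximate posterior from a standard normal prior. The function names are hypothetical, and the dropout-modeling layer of VASC is deliberately not reproduced here.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


# A 2-D latent space, as would be used for a 2D visualization.
rng = random.Random(0)
mu, log_var = [1.0, -0.5], [0.0, -1.0]
z = reparameterize(mu, log_var, rng)
print(z, kl_to_standard_normal(mu, log_var))
```

In a full model, an encoder network would output `mu` and `log_var` for each cell, and the KL term would be added to a reconstruction loss on the expression profile.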
5.Further analysis and study based on a visualized method for SARS RNA sequences.
Guoping LIU ; Jie YANG ; Zhijie XU ; Meng WANG ; Zhende HUANG
Journal of Biomedical Engineering 2007;24(1):26-31
This paper proposes a new method for visualizing genomes. Using cellular automata theory, the method transforms a one-dimensional RNA sequence into a two-dimensional image. Applying this method to SARS RNA sequence analysis reveals characteristics that distinguish SARS-CoV from non-SARS sequences. Characteristic genome fragments are extracted, visualized, and studied with pattern recognition methods such as PCA and SVM. The results show that the characteristics of SARS-CoV are classifiable. Some combined methods can exploit these characteristics more fully as a non-routine method.
Genome, Viral ; Image Processing, Computer-Assisted ; methods ; RNA, Viral ; genetics ; SARS Virus ; genetics ; Sequence Analysis, RNA ; methods
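The abstract above does not say which cellular automaton or nucleotide encoding the authors used. As a rough illustration of how a one-dimensional sequence can be unfolded into a two-dimensional image, the sketch below assumes a 2-bit encoding per nucleotide and an elementary cellular automaton (rule 90 by default) with wrap-around boundaries. Both choices are hypothetical.

```python
# Hypothetical 2-bit encoding; T is treated like U so DNA input also works.
NUCLEOTIDE_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "U": (1, 1), "T": (1, 1)}


def encode(seq):
    """Turn a nucleotide string into a flat list of bits (the first image row)."""
    bits = []
    for base in seq.upper():
        bits.extend(NUCLEOTIDE_BITS[base])
    return bits


def step(row, rule=90):
    """Apply one update of an elementary cellular automaton with wrap-around."""
    n = len(row)
    out = []
    for k in range(n):
        left, centre, right = row[(k - 1) % n], row[k], row[(k + 1) % n]
        neighbourhood = (left << 2) | (centre << 1) | right
        out.append((rule >> neighbourhood) & 1)
    return out


def sequence_to_image(seq, steps, rule=90):
    """Stack successive automaton states into a (steps + 1) x (2 * len(seq)) grid."""
    rows = [encode(seq)]
    for _ in range(steps):
        rows.append(step(rows[-1], rule))
    return rows


for image_row in sequence_to_image("ACGUACGU", steps=4):
    print("".join("#" if bit else "." for bit in image_row))
```

The resulting grid could then be flattened into a feature vector for PCA or an SVM, which is the kind of downstream analysis the abstract mentions.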
6.Development of gene microarray in screening differently expressed genes in keloid and normal-control skin.
Wei CHEN ; Xiao-bing FU ; Shi-li GE ; Xiao-qing SUN ; Gang ZHOU ; Zhi-li ZHAO ; Zhi-yong SHENG
Chinese Medical Journal 2004;117(6):877-881
BACKGROUND: Keloid is an intricate lesion that is probably regulated by many genes. In this study, the authors used complementary DNA (cDNA) microarray technology to analyse abnormal gene expression in keloids and normal control skins.
METHODS: The polymerase chain reaction (PCR) products of 8400 genes were spotted in an array on chemical-material-coated glass plates. The DNAs were fixed on the glass plates. Total RNAs were isolated from freshly excised human keloid and normal control skins, and the mRNAs were then purified. The mRNAs from both keloid and normal control skins were reverse-transcribed to cDNAs, with the incorporation of fluorescent dUTP, to prepare the hybridisation probes. The mixed probes were then hybridised to the cDNA microarray. After thorough washing, the cDNA microarray was scanned for differing fluorescent signals from the two types of tissue. Gene expression of transforming growth factor-beta1 (TGF-beta1) and of c-myc was detected with both RT-PCR and Northern blot hybridisation to confirm the effectiveness of the cDNA microarray.
RESULTS: Among the 8400 human genes, 402 were detected with different expression levels between keloid and normal control skins. Two hundred and fifty genes, including TGF-beta1 and c-myc, were up-regulated and 152 genes were down-regulated. Higher expression of TGF-beta1 and c-myc in keloid was also revealed using RT-PCR and Northern blot methods.
CONCLUSION: cDNA microarray analysis provides a powerful tool for investigating differential gene expression in keloid and normal control skins. Keloid is a complicated lesion with many genes involved.
DNA, Complementary ; analysis ; Humans ; Keloid ; genetics ; Oligonucleotide Array Sequence Analysis ; methods ; Polymerase Chain Reaction ; RNA, Messenger ; analysis ; Skin
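Two-colour microarray data of the kind described above are commonly summarised as a log ratio of the two fluorescent signals per gene. The sketch below shows that calculation. The two-fold threshold, the function name, and the signal values are assumptions for illustration, since the abstract does not state the cut-off the authors applied.

```python
import math


def classify_expression(keloid_signal, control_signal, fold_threshold=2.0):
    """Label a gene by the log2 ratio of keloid to control fluorescence.
    The default two-fold threshold is an assumed, commonly used cut-off."""
    log_ratio = math.log2(keloid_signal / control_signal)
    cutoff = math.log2(fold_threshold)
    if log_ratio >= cutoff:
        return "up"
    if log_ratio <= -cutoff:
        return "down"
    return "unchanged"


# Made-up signal intensities (keloid, control) for three hypothetical spots.
spots = {"gene_a": (400.0, 100.0), "gene_b": (100.0, 400.0), "gene_c": (110.0, 100.0)}
for name, (keloid, control) in spots.items():
    print(name, classify_expression(keloid, control))
```

Counting the "up" and "down" labels over all spots gives totals of the kind reported in the results (250 up-regulated plus 152 down-regulated, 402 in total).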
7.Genotyping and species identification of Fritillaria by DNA chips.
Pui-yan TSOI ; Hok-sin WOO ; Man-sau WONG ; Shi-lin CHEN ; Wan-fung FONG ; Pei-gen XIAO ; Meng-su YANG
Acta Pharmaceutica Sinica 2003;38(3):185-190
AIM: To investigate the genetic polymorphism of several species of Fritillaria and to develop a DNA chip for genotyping and identifying the origin of various Fritillaria species at the molecular level.
METHODS: Genomic DNA was extracted from bulbs of several Fritillaria species, and polymorphisms in the D2 and D3 regions of the 26S rDNA gene were identified by direct sequencing. Oligonucleotide probes specific for these polymorphisms were designed and printed on poly-lysine-coated slides to prepare the DNA chip. PCR products from the Fritillaria species were fluorescently labeled by incorporation of dye-labeled dideoxyribonucleotides and hybridized to the immobilized probes on the chip.
RESULTS: The polymorphisms were used as markers for discrimination among various species. Specific oligonucleotide probes were designed and immobilized on a DNA chip. The various Fritillaria species were differentiated based on hybridization of fluorescently labeled PCR products with the DNA chip.
CONCLUSION: The results demonstrate the reliability of using DNA chips to identify different species of Fritillaria. DNA chip technology can provide a rapid, high-throughput tool for genotyping and for quality assurance in plant species verification.
Base Sequence ; DNA, Plant ; analysis ; Fritillaria ; classification ; genetics ; Genotype ; Molecular Sequence Data ; Oligonucleotide Array Sequence Analysis ; methods ; Plants, Medicinal ; genetics ; Polymorphism, Single Nucleotide ; RNA, Ribosomal ; genetics ; Species Specificity
8.RNA secondary structure prediction based on support vector machine classification.
Chinese Journal of Biotechnology 2008;24(7):1140-1148
Comparative sequence analysis is the most reliable method for RNA secondary structure prediction, and many algorithms based on it have been developed in the last several decades. This paper treats RNA structure prediction as a two-class classification problem: given a sequence alignment, decide whether or not two columns of the alignment form a base pair. We employed a Support Vector Machine (SVM) to predict potential paired sites, and selected co-variation information, thermodynamic information and the fraction of complementary bases as features. Considering the effect of sequence similarity on the co-variation score, we introduced a similarity weight factor, which adjusts the contribution of co-variation and thermodynamic information to the prediction according to sequence similarity. A test on 49 Rfam seed alignments showed the effectiveness of our method, and its accuracy was better than that of many similar algorithms. Furthermore, this method can predict simple pseudoknots.
Algorithms ; Artificial Intelligence ; Base Pairing ; Computational Biology ; methods ; RNA ; chemistry ; classification ; Sequence Alignment ; methods ; statistics & numerical data ; Sequence Analysis, RNA ; Thermodynamics
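One of the features named in the abstract above, the fraction of complementary bases between two alignment columns, is simple enough to sketch directly. The code below also computes the mean pairwise sequence identity of the alignment, which is one plausible input to a similarity weight factor. The exact definitions used by the authors are not given in the abstract, so the inclusion of G-U wobble pairs, the handling of gaps (treated as ordinary mismatching characters), and the function names are assumptions. The SVM itself is not shown.

```python
from itertools import combinations

# Watson-Crick pairs plus G-U wobble pairs (including wobble is an assumption).
COMPLEMENTARY = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def fraction_complementary(alignment, i, j):
    """Share of sequences whose bases in columns i and j can pair."""
    hits = sum((seq[i], seq[j]) in COMPLEMENTARY for seq in alignment)
    return hits / len(alignment)


def mean_pairwise_identity(alignment):
    """Average fraction of identical positions over all sequence pairs.
    Requires at least two aligned sequences of equal length."""
    total = 0.0
    n_pairs = 0
    for a, b in combinations(alignment, 2):
        total += sum(x == y for x, y in zip(a, b)) / len(a)
        n_pairs += 1
    return total / n_pairs


def pair_features(alignment, i, j):
    """Feature tuple for a candidate column pair (a subset of the features
    listed in the abstract; co-variation and thermodynamics are omitted)."""
    return (fraction_complementary(alignment, i, j), mean_pairwise_identity(alignment))


toy_alignment = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]
print(pair_features(toy_alignment, 0, 4))
```

Each candidate column pair would yield one such feature vector, labeled as paired or unpaired from a reference structure, to train the two-class classifier.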
9.Advance in Deep Sequencing of Small RNAs for Virus Identification and Discovery.
Yang LI ; Hao WANG ; Zhang CHEN ; Xuejun MA
Chinese Journal of Virology 2015;31(4):457-462
Small RNAs (sRNAs) are produced abundantly in both plants and animals and function in regulating gene expression or in defense against virus infection. Deep sequencing of small RNAs is an emerging technology for virus identification and de novo assembly of virus genomes, and has been demonstrated to be an effective method for discovering new viruses and monitoring virus variation. A significant number of viruses from plants, invertebrates and human cells have been successfully identified using this technology. In this paper, we summarize the principle, operating procedure and latest advances of sRNA deep sequencing. We also demonstrate the feasibility of sRNA deep sequencing for virus detection through bioinformatic analysis of a publicly available sRNA deep sequencing dataset.
Genomics ; High-Throughput Nucleotide Sequencing ; methods ; RNA, Small Untranslated ; genetics ; RNA, Viral ; genetics ; Sequence Analysis, RNA ; methods ; Viruses ; genetics ; isolation & purification
10.Rapid Whole-genome Sequencing of Zika Viruses using Direct RNA Sequencing
Jung Heon KIM ; Jiyeon KIM ; Bon Sang KOO ; Hanseul OH ; Jung Joo HONG ; Eung Soo HWANG
Journal of Bacteriology and Virology 2019;49(3):115-123
Zika virus (ZIKV) is a pathogen that is transmitted worldwide, but there are no effective drugs or vaccines. Whole genome sequencing (WGS) of viruses can be applied to viral pathogen characterization, diagnosis, molecular surveillance, and even the discovery of novel pathogens. We established an improved method that uses direct RNA sequencing with Nanopore technology to obtain WGS of ZIKV after adding poly(A) tails to viral RNA. This method does not require specific primers, complementary DNA (cDNA) synthesis, or polymerase chain reaction (PCR)-based enrichment, which reduces biases and makes it possible to find novel RNA viruses. Nanopore technology also allows long sequences to be read, which makes WGS easier and faster through long-read assembly. In this study, we obtained WGS of two strains of ZIKV following the established protocol. The sequenced reads resulted in 99% and 100% genome coverage at depths of 63.5X and 21,136X for the ZIKV PRVABC59 and MR 766 strains, respectively. The sequence identities of the ZIKV PRVABC59 and MR 766 strains to their respective reference genomes were 98.76% and 99.72%. We also found that the maximum read length was 10,311 bp, which is almost the whole genome size of ZIKV. Such long reads could make it easier to resolve the overall structure of the whole genome, and make WGS faster and easier. The protocol in this study could provide rapid and efficient WGS that could be applied to study the biology of RNA viruses, including identification, characterization, and global surveillance.
Bias (Epidemiology) ; Biology ; Diagnosis ; DNA ; Genome ; Genome Size ; Methods ; Nanopores ; Polymerase Chain Reaction ; RNA Viruses ; RNA ; RNA, Viral ; Sequence Analysis, RNA ; Tail ; Vaccines ; Zika Virus
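The genome coverage and depth figures quoted in the abstract above are standard summary statistics of read alignments. As a minimal sketch of how such numbers can be derived, the code below takes aligned read intervals on a reference genome and returns the fraction of positions covered by at least one read and the mean depth over the whole genome. The interval convention (0-based, half-open), the function name, and the toy data are assumptions. The abstract does not state which tools or definitions the authors used.

```python
def coverage_stats(genome_length, alignments):
    """Return (fraction of positions covered, mean depth over the genome).

    alignments: iterable of (start, end) intervals, 0-based and half-open.
    Uses a difference array so the cost is linear in genome size plus reads.
    """
    delta = [0] * (genome_length + 1)
    for start, end in alignments:
        start = max(0, start)
        end = min(genome_length, end)
        if start < end:
            delta[start] += 1
            delta[end] -= 1

    covered_positions = 0
    total_depth = 0
    depth = 0
    for pos in range(genome_length):
        depth += delta[pos]
        total_depth += depth
        if depth > 0:
            covered_positions += 1
    return covered_positions / genome_length, total_depth / genome_length


# Toy example: two overlapping reads on a 10-base reference.
breadth, mean_depth = coverage_stats(10, [(0, 5), (3, 8)])
print(f"coverage = {breadth:.0%}, mean depth = {mean_depth:.1f}X")
```

With real data, the intervals would come from aligning the Nanopore reads to the reference genome of each strain.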