1.Construction and Implications of the Immunology Database and Analysis Portal:ImmPort
Zhengyong HU ; Wei ZHOU ; Anran WANG ; Yifan DUAN ; Wanfei HU ; Sizhu WU
Journal of Medical Informatics 2024;45(8):20-27
Purpose/Significance: By summarizing the construction experience of the Immunology Database and Analysis Portal (ImmPort), the study aims to provide insights and references for the development of a large-scale immunology database in China. Method/Process: It comprehensively analyzes the architecture of ImmPort and the functionality of its modules, and delineates the data flow for data collection, organization, sharing, and analysis within the database. Finally, it summarizes the practical achievements of the ImmPort platform. Result/Conclusion: In constructing an immunology database, China should prioritize the standardization of data organization and modeling. Standardized terminology should be actively adopted to provide semantic support, and data sharing should be carried out under classified and graded management, with a supporting tool system developed to ensure the safe and effective sharing and use of immunology data.
2.Discovery, Identification and Comparative Analysis of Non-Specific Lipid Transfer Protein(nsLtp) Family in Solanaceae
Liu WANFEI ; Huang DAWEI ; Liu KAN ; Hu SONGNIAN ; Yu JUN ; Gao GANG ; Song SHUHUI
Genomics, Proteomics & Bioinformatics 2010;08(4):229-237
Plant non-specific lipid transfer proteins (nsLtps) have been reported to be involved in plant defense activity against bacterial and fungal pathogens. In this study, we identified 135 (122 putative and 13 previously identified) Solanaceae nsLtps, which are clustered into eight different groups. By comparing with Boutrot's nsLtp classification, we classified these eight groups into five types (Ⅰ, Ⅱ, Ⅳ, Ⅸ and Ⅹ). We compared Solanaceae nsLtps with Arabidopsis and Gramineae nsLtps and found that (1) Types Ⅰ, Ⅱ and Ⅳ are shared by Solanaceae, Gramineae and Arabidopsis; (2) Types Ⅲ, Ⅴ, Ⅵ and Ⅷ are shared by Gramineae and Arabidopsis but not detected in Solanaceae so far; (3) Type Ⅶ is only found in Gramineae whereas Type Ⅸ is present only in Arabidopsis and Solanaceae; (4) Type Ⅹ is a new type that accounts for 52.59% of Solanaceae nsLtps in our data, and has not been reported in any other plant so far. We further built and compared the three-dimensional structures of the eight groups, and found that the major functional diversification within the nsLtp family may predate the monocot/dicot divergence, and that many gene duplications and sequence variations have occurred in the nsLtp family since the monocot/dicot divergence, especially in Solanaceae.
3.The Association Between H3K4me3 and Antisense Transcription
Cui PENG ; Liu WANFEI ; Zhao YUHUI ; Lin QIANG ; Ding FENG ; Xin CHENGQI ; Geng JIANING ; Song SHUHUI ; Sun FANGLIN ; Hu SONGNIAN ; Yu JUN
Genomics, Proteomics & Bioinformatics 2012;10(2):74-81
Histone H3 lysine 4 trimethylation (H3K4me3) is well known to occur in the promoter region of genes for transcription activation. However, when investigating the H3K4me3 profiles in the mouse cerebrum and testis, we discovered that H3K4me3 is also significantly enriched at the 3′ end of actively transcribed (sense) genes, which we term 3′-H3K4me3. 3′-H3K4me3 is associated with ~15% of protein-coding genes in both tissues. In addition, we examined transcriptional initiation signals, including RNA polymerase II (RNAPII) binding sites and 5′-CAGE tags that mark transcriptional start sites. Interestingly, we found that 3′-H3K4me3 is associated with the initiation of antisense transcription. Furthermore, 3′-H3K4me3 modification levels correlate positively with the antisense expression levels of the associated sense genes, implying that 3′-H3K4me3 is involved in the activation of antisense transcription. Taken together, our findings suggest that H3K4me3 may be involved in the regulation of antisense transcription that initiates from the 3′ end of sense genes. In addition, a positive correlation was also observed between the expression of antisense and the associated sense genes with 3′-H3K4me3 modification. More importantly, we observed the 3′-H3K4me3 enrichment among genes in human, fruit fly and Arabidopsis, and found that the sequences of 3′-H3K4me3-marked regions are highly conserved and essentially indistinguishable from known promoters in vertebrates. Therefore, we speculate that these 3′-H3K4me3-marked regions may serve as potential promoters for antisense transcription and that 3′-H3K4me3 appears to be a universal epigenetic feature in eukaryotes. Our results provide a novel insight into the epigenetic roles of H3K4me3 and the regulatory mechanism of antisense transcription.
4.Comparative Analyses of H3K4 and H3K27 Trimethylations Between the Mouse Cerebrum and Testis
Cui PENG ; Liu WANFEI ; Zhao YUHUI ; Lin QIANG ; Zhang DAOYONG ; Ding FENG ; Xin CHENGQI ; Zhang ZHANG ; Song SHUHUI ; Sun FANGLIN ; Yu JUN ; Hu SONGNIAN
Genomics, Proteomics & Bioinformatics 2012;10(2):82-93
The global features of H3K4 and H3K27 trimethylations (H3K4me3 and H3K27me3) have been well studied in recent years, but most of these studies were performed in mammalian cell lines. In this work, we generated the genome-wide maps of H3K4me3 and H3K27me3 of mouse cerebrum and testis using ChIP-seq and their high-coverage transcriptomes using ribominus RNA-seq with SOLiD technology. We examined the global patterns of H3K4me3 and H3K27me3 in both tissues and found that these modifications are closely associated with tissue-specific expression, function and development. Moreover, we revealed that H3K4me3 and H3K27me3 rarely occur in silent genes, which contradicts the findings in previous studies. Finally, we observed that bivalent domains, with both H3K4me3 and H3K27me3, existed ubiquitously in both tissues and demonstrated an invariable preference for the regulation of developmentally related genes. However, the bivalent domains tend towards a "winner-takes-all" approach to regulate the expression of associated genes. We also verified the above results in mouse ES cells. As expected, the results in ES cells are consistent with those in cerebrum and testis. In conclusion, we present two very important findings. One is that H3K4me3 and H3K27me3 rarely occur in silent genes. The other is that bivalent domains may adopt a "winner-takes-all" principle to regulate gene expression.
5.A Chromosome-level Genome Assembly of Wild Castor Provides New Insights into Its Adaptive Evolution in Tropical Desert
Lu JIANJUN ; Pan CHENG ; Fan WEI ; Liu WANFEI ; Zhao HUAYAN ; Li DONGHAI ; Wang SEN ; Hu LIANLIAN ; He BING ; Qian KUN ; Qin RUI ; Ruan JUE ; Lin QIANG ; Lü SHIYOU ; Cui PENG
Genomics, Proteomics & Bioinformatics 2022;20(1):42-59
Wild castor grows in the high-altitude tropical desert of the African Plateau, a region known for high ultraviolet radiation, strong light, and extremely dry conditions. To investigate the potential genetic basis of adaptation to both highland and tropical deserts, we generated a chromosome-level genome sequence assembly of the wild castor accession WT05, with a genome size of 316 Mb, a scaffold N50 of 31.93 Mb, and a contig N50 of 8.96 Mb. Compared with cultivated castor and other Euphorbiaceae species, the wild castor exhibits positive selection and gene family expansion for genes involved in DNA repair, photosynthesis, and abiotic stress responses. Genetic variations associated with positive selection were identified in several key genes, such as LIG1, DDB2, and RECGI, involved in nucleotide excision repair. Moreover, a study of genomic diversity among wild and cultivated accessions revealed genomic regions containing selection signatures associated with the adaptation to extreme environments. The identification of the genes and alleles with selection signatures provides insights into the genetic mechanisms underlying the adaptation of wild castor to the high-altitude tropical desert and would facilitate direct improvement of modern castor varieties.
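The scaffold and contig N50 values quoted in this abstract follow the usual assembly-statistics definition: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the assembly. A minimal Python sketch of that calculation is given here for orientation; the function name `n50` and the example lengths are illustrative assumptions, not data or code from the WT05 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L of the sequence at which the cumulative sum of
    lengths, taken from longest to shortest, first reaches at least half
    of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
example_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_contigs))  # total is 300; 80 + 70 = 150 reaches half, so 70
```

Because the statistic depends only on the length distribution, the same function applies to scaffolds and contigs alike; only the input list changes.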
6.RGAAT: A Reference-based Genome Assembly and Annotation Tool for New Genomes and Upgrade of Known Genomes.
Wanfei LIU ; Shuangyang WU ; Qiang LIN ; Shenghan GAO ; Feng DING ; Xiaowei ZHANG ; Hasan Awad ALJOHI ; Jun YU ; Songnian HU
Genomics, Proteomics & Bioinformatics 2018;16(5):373-381
The rapid development of high-throughput sequencing technologies has led to a dramatic decrease in the money and time required for de novo genome sequencing or genome resequencing projects, with new genome sequences released every week. Among such projects, the plethora of updated genome assemblies creates a need for version-dependent annotation files and other compatible public datasets for downstream analysis. To handle these tasks in an efficient manner, we developed the reference-based genome assembly and annotation tool (RGAAT), a flexible toolkit for resequencing-based consensus building and annotation update. RGAAT can detect sequence variants with comparable precision, specificity, and sensitivity to GATK and with higher precision and specificity than Freebayes and SAMtools on four DNA-seq datasets tested in this study. RGAAT can also identify sequence variants based on cross-cultivar or cross-version genomic alignments. Unlike GATK and SAMtools/BCFtools, RGAAT builds the consensus sequence by taking into account the true allele frequency. Finally, RGAAT generates a coordinate conversion file between the reference and query genomes using sequence variants and supports annotation file transfer. Compared to the rapid annotation transfer tool (RATT), RGAAT displays better performance characteristics for annotation transfer between different genome assemblies, strains, and species. In addition, RGAAT can be used for genome modification, genome comparison, and coordinate conversion. RGAAT is available at https://sourceforge.net/projects/rgaat/ and https://github.com/wushyer/RGAAT_v2 at no cost.
Genome; Genomics; High-Throughput Nucleotide Sequencing; methods; standards; Humans; Reference Standards; Sequence Analysis, DNA; methods; standards; Software