1. Identification of Conserved Regulatory Elements in Mammalian Promoter Regions: A Case Study Using the PCK1 Promoter
George E. LIU ; Matthew T. WEIRAUCH ; Curtis P. Van TASSELL ; Robert W. LI ; Tad S. SONSTEGARD ; Lakshmi K. MATUKUMALLI ; Erin E. CONNOR ; Richard W. HANSON ; Jianqi YANG
Genomics, Proteomics & Bioinformatics 2008;6(3-4):129-143
A systematic phylogenetic footprinting approach was performed to identify conserved transcription factor binding sites (TFBSs) in mammalian promoter regions using human, mouse and rat sequence alignments. We found that the score distributions of most binding site models did not follow the Gaussian distribution required by many statistical methods. Therefore, we performed an empirical test to establish the optimal threshold for each model. We gauged our computational predictions by comparing with previously known TFBSs in the PCK1 gene promoter of the cytosolic isoform of phosphoenolpyruvate carboxykinase, and achieved a sensitivity of 75% and a specificity of approximately 32%. Almost all known sites overlapped with predicted sites, and several new putative TFBSs were also identified. We validated a predicted SP1 binding site in the control of PCK1 transcription using gel shift and reporter assays. Finally, we applied our computational approach to the prediction of putative TFBSs within the promoter regions of all available RefSeq genes. Our full set of TFBS predictions is freely available at http://bfgl.anri.barc.usda.gov/tfbsConsSites.
2. BGVD: An Integrated Database for Bovine Sequencing Variations and Selective Signatures
Ningbo CHEN ; Weiwei FU ; Jianbang ZHAO ; Jiafei SHEN ; Qiuming CHEN ; Zhuqing ZHENG ; Hong CHEN ; Tad S. SONSTEGARD ; Chuzhao LEI ; Yu JIANG
Genomics, Proteomics & Bioinformatics 2020;18(2):186-193
Next-generation sequencing has yielded a vast amount of cattle genomic data for global characterization of population genetic diversity and identification of genomic regions under natural and artificial selection. However, efficient storage, querying, and visualization of such large datasets remain challenging. Here, we developed a comprehensive database, the Bovine Genome Variation Database (BGVD). It provides six main functionalities: gene search, variation search, genomic signature search, Genome Browser, alignment search tools, and the genome coordinate conversion tool. BGVD contains information on genomic variations comprising ~60.44 M SNPs, ~6.86 M indels, 76,634 CNV regions, and signatures of selective sweeps in 432 samples from modern cattle worldwide. Users can quickly retrieve distribution patterns of these variations for 54 cattle breeds through an interactive source of breed origin map, using a given gene symbol or genomic region for any of the three versions of the bovine reference genomes (ARS-UCD1.2, UMD3.1.1, and Btau 5.0.1). Signals of selection sweep are displayed as Manhattan plots and Genome Browser tracks. To further investigate and visualize the relationships between variants and signatures of selection, the Genome Browser integrates all variations, selection data, and resources from NCBI, the UCSC Genome Browser, and Animal QTLdb. Collectively, all these features make the BGVD a useful archive for in-depth data mining and analyses of cattle biology and cattle breeding on a global scale. BGVD is publicly available at http://animal.nwsuaf.edu.cn/BosVar.
3. Identification of conserved regulatory elements in mammalian promoter regions: a case study using the PCK1 promoter.
George E LIU ; Matthew T WEIRAUCH ; Curtis P Van TASSELL ; Robert W LI ; Tad S SONSTEGARD ; Lakshmi K MATUKUMALLI ; Erin E CONNOR ; Richard W HANSON ; Jianqi YANG
Genomics, Proteomics & Bioinformatics 2008;6(3-4):129-143
A systematic phylogenetic footprinting approach was performed to identify conserved transcription factor binding sites (TFBSs) in mammalian promoter regions using human, mouse and rat sequence alignments. We found that the score distributions of most binding site models did not follow the Gaussian distribution required by many statistical methods. Therefore, we performed an empirical test to establish the optimal threshold for each model. We gauged our computational predictions by comparing with previously known TFBSs in the PCK1 gene promoter of the cytosolic isoform of phosphoenolpyruvate carboxykinase, and achieved a sensitivity of 75% and a specificity of approximately 32%. Almost all known sites overlapped with predicted sites, and several new putative TFBSs were also identified. We validated a predicted SP1 binding site in the control of PCK1 transcription using gel shift and reporter assays. Finally, we applied our computational approach to the prediction of putative TFBSs within the promoter regions of all available RefSeq genes. Our full set of TFBS predictions is freely available at http://bfgl.anri.barc.usda.gov/tfbsConsSites.
Algorithms ; Amino Acid Sequence ; Animals ; Base Sequence ; Binding Sites/genetics ; Cell Line, Tumor ; Computational Biology/methods ; Conserved Sequence ; Electrophoretic Mobility Shift Assay ; Humans ; Intracellular Signaling Peptides and Proteins/genetics ; Luciferases/genetics/metabolism ; Mice ; Normal Distribution ; Oligonucleotides/genetics/metabolism ; Phosphoenolpyruvate Carboxykinase (GTP)/genetics ; Promoter Regions, Genetic/genetics ; Protein Binding ; Rats ; Recombinant Fusion Proteins/genetics/metabolism ; Regulatory Sequences, Nucleic Acid/genetics ; Reproducibility of Results ; Sp1 Transcription Factor/genetics/metabolism ; Transcription Factors/metabolism ; Transfection