1. In Silico Identification of 6-Phosphogluconolactonase Genes that are Frequently Missing from Completely Sequenced Bacterial Genomes.
Haeyoung JEONG ; Jihyun F KIM ; Hong Seog PARK
Genomics & Informatics 2006;4(4):182-187
6-Phosphogluconolactonase (6PGL) is one of the key enzymes in the ubiquitous pathways of central carbon metabolism, but bacterial 6PGL had long been known as a missing enzyme even after complete bacterial genome sequence information became available. Although recent experimental characterization suggests that there are two types of 6PGLs (DevB and YbhE), their phylogenetic distribution is severely biased. Here we show that proteins in the COG group previously described as 3-carboxymuconate cyclase (COG2706) are actually YbhE-type 6PGLs, which are widely distributed in Proteobacteria and Firmicutes. This case exemplifies how an erroneous functional description of a member of a reference database commonly used in transitive genome annotation causes systematic problems in gene prediction, even for genes with universal cellular functions.
Bias (Epidemiology); Carbon; Computer Simulation*; Genome; Genome, Bacterial*; Metabolism; Pentose Phosphate Pathway; Proteobacteria
2. An Optimized Strategy for Genome Assembly of Sanger/pyrosequencing Hybrid Data using Available Software.
Genomics & Informatics 2008;6(2):87-90
During the last four years, the pyrosequencing-based 454 platform has rapidly displaced the traditional Sanger sequencing method due to its high throughput and cost effectiveness. Meanwhile, the Sanger sequencing methodology still provides the longest reads, and paired-end sequencing that is based on that chemistry offers an opportunity to ensure accurate assembly results. In this report, we describe an optimized approach for hybrid de novo genome assembly using pyrosequencing data and varying amounts of Sanger-type reads. 454 platform-derived contigs can be used as single non-breakable virtual reads or converted to simpler contigs that consist of editable, overlapping pseudoreads. These modified contigs maintain their integrity at the first jumpstarting assembly stage and are edited by fragmenting and rejoining. Pre-existing assembly software can then be applied for mixed assembly with 454-derived data and Sanger reads. An effective method for identifying genomic differences between reference and sample sequences in whole-genome resequencing procedures is also suggested.
Chimera; Cost-Benefit Analysis; Dietary Sucrose; Genome
3. Bioinformatics services for analyzing massive genomic datasets
Gunhwan KO ; Pan-Gyu KIM ; Youngbum CHO ; Seongmun JEONG ; Jae-Yoon KIM ; Kyoung Hyoun KIM ; Ho-Yeon LEE ; Jiyeon HAN ; Namhee YU ; Seokjin HAM ; Insoon JANG ; Byunghee KANG ; Sunguk SHIN ; Lian KIM ; Seung-Won LEE ; Dougu NAM ; Jihyun F. KIM ; Namshin KIM ; Seon-Young KIM ; Sanghyuk LEE ; Tae-Young ROH ; Byungwook LEE
Genomics & Informatics 2020;18(1):e8
The explosive growth of next-generation sequencing data has resulted in ultra-large-scale datasets and ensuing computational problems. In Korea, the amount of genomic data has been increasing rapidly in recent years. Leveraging these big data requires researchers to use large-scale computational resources and analysis pipelines. A promising solution for addressing this computational challenge is cloud computing, where CPUs, memory, storage, and programs are accessible in the form of virtual machines. Here, we present a cloud computing-based system, Bio-Express, that provides user-friendly, cost-effective analysis of massive genomic datasets. Bio-Express is loaded with predefined multi-omics data analysis pipelines, which are divided into genome, transcriptome, epigenome, and metagenome pipelines. Users can employ predefined pipelines or create a new pipeline for analyzing their own omics data. We also developed several web-based services for facilitating downstream analysis of genome data. The Bio-Express web service is freely available at https://www.bioexpress.re.kr/.