1.Computational Approach for Biosynthetic Engineering of Post-PKS Tailoring Enzymes.
Genomics & Informatics 2008;6(4):227-230
Compounds of polyketide origin possess a wealth of pharmacological effects, including antibacterial, antifungal, antiparasitic, anticancer and immunosuppressive activities. Many of these compounds and their semisynthetic derivatives are used today in the clinic. Most of the gene clusters encoding commercially important drugs have also been cloned and sequenced and their biosynthetic mechanisms studied in great detail. The area of biosynthetic engineering of the enzymes involved in polyketide biosynthesis has recently advanced and been transferred into the industrial arena. In this work, we introduce a computational system to provide the user with a wealth of information that can be utilized for biosynthetic engineering of enzymes involved in post-PKS tailoring steps. Post-PKS tailoring steps are necessary to add functional groups essential for the biological activity and are therefore important in polyketide biosynthesis.
Clone Cells; Multigene Family
2.Computational Approach for the Analysis of Post-PKS Glycosylation Step.
Genomics & Informatics 2008;6(4):223-226
We introduce a computational approach for analyzing glycosylation in post-PKS tailoring steps. The method predicts the deoxysugar biosynthesis unit pathway and the substrate specificity of the glycosyltransferases involved in the glycosylation of polyketides. A directed, weighted graph is introduced to represent and predict the deoxysugar biosynthesis unit pathway, and a homology-based gene clustering method is used to predict the substrate specificity of glycosyltransferases. The approach is useful for the rational design of polyketide natural products, which supports in silico drug discovery.
Biological Agents; Computer Simulation; Glycosylation; Glycosyltransferases; Polyketides; Substrate Specificity
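The directed, weighted graph mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how edge weights are defined or how a pathway is selected, so everything here is an assumption: the node names (`S0` to `S3`) stand for hypothetical intermediates, the weights are hypothetical step costs, and a lowest-cost path search (Dijkstra's algorithm) stands in for whatever prediction rule the authors use.

```python
import heapq

def best_path(graph, start, goal):
    """Return (cost, path) for the lowest-cost route, or (inf, []) if unreachable.

    graph maps each node to a dict of {successor: non-negative edge weight}.
    """
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in settled:
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return float("inf"), []

# Hypothetical toy pathway: nodes are intermediates, weights are step costs.
pathway = {
    "S0": {"S1": 1.0, "S2": 4.0},
    "S1": {"S2": 1.5, "S3": 5.0},
    "S2": {"S3": 1.0},
}

cost, path = best_path(pathway, "S0", "S3")
```

With these illustrative weights, the route through `S1` and `S2` is cheaper than either direct alternative, so it is the one returned.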
3.Analysis of unmapped regions associated with long deletions in Korean whole genome sequences based on short read data
Yuna LEE ; Kiejung PARK ; Insong KOH
Genomics & Informatics 2019;17(4):40-
While studies aimed at detecting and analyzing indels or single nucleotide polymorphisms within human genomic sequences have been actively conducted, studies on detecting long insertions/deletions are not easy to orchestrate. For the last 10 years, the availability of long read data of human genomes from PacBio or Nanopore platforms has increased, which makes it easier to detect long insertions/deletions. However, because long read data have a critical disadvantage due to their relatively high cost, many next generation sequencing data are produced mainly by short read sequencing machines. Here, we constructed programs to detect so-called unmapped regions (UMRs, where no reads are mapped on the reference genome), scanned 40 Korean genomes to select UMR long deletion candidates, and compared the candidates with the long deletion break points within the genomes available from the 1000 Genomes Project (1KGP). An average of about 36,000 UMRs were found in the 40 Korean genomes tested, 284 UMRs were common across the 40 genomes, and a total of 37,943 UMRs were found. Compared with the 74,045 break points provided by the 1KGP, 30,698 UMRs overlapped. As the number of compared samples increased from 1 to 40, the number of UMRs that overlapped with the break points also increased. This eventually reached a peak of 80.9% of the total UMRs found in this study. As the total number of overlapped UMRs could probably grow to encompass 74,045 break points with the inclusion of more Korean genomes, this approach could be practically useful for studies on long deletions utilizing short read data.
Genome; Genome, Human; Humans; Nanopores; Polymorphism, Single Nucleotide
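The UMR idea in this abstract (regions of the reference where no reads map) can be sketched as a scan over per-base read depth, followed by an overlap check against known deletion intervals. This is only an illustration under stated assumptions: the authors' programs are not described in the abstract, and the function names, the `min_len` threshold, and the half-open interval convention are all hypothetical.

```python
def find_umrs(depth, min_len=1):
    """Return half-open (start, end) intervals where read depth is zero.

    depth is a list of per-base read counts along one reference sequence.
    min_len is an assumed minimum run length for reporting a region.
    """
    umrs = []
    start = None
    for pos, d in enumerate(depth):
        if d == 0 and start is None:
            start = pos                      # a zero-depth run begins
        elif d != 0 and start is not None:
            if pos - start >= min_len:
                umrs.append((start, pos))    # the run ended just before pos
            start = None
    if start is not None and len(depth) - start >= min_len:
        umrs.append((start, len(depth)))     # run extends to the sequence end
    return umrs

def overlaps(a, b):
    """True if two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]

def count_overlapping(umrs, deletions):
    """Count UMRs that overlap at least one known deletion interval."""
    return sum(any(overlaps(u, d) for d in deletions) for u in umrs)

# Toy data: two zero-depth runs of length >= 2, one isolated zero at index 6.
depth = [3, 2, 0, 0, 0, 1, 0, 4, 0, 0]
umrs = find_umrs(depth, min_len=2)
```

On real data the depth vector would come from an alignment file rather than a literal list, and a sorted sweep would replace the quadratic overlap check.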
4.Analysis of unmapped regions associated with long deletions in Korean whole genome sequences based on short read data
Yuna LEE ; Kiejung PARK ; Insong KOH
Genomics & Informatics 2019;17(4):e40-
While studies aimed at detecting and analyzing indels or single nucleotide polymorphisms within human genomic sequences have been actively conducted, studies on detecting long insertions/deletions are not easy to orchestrate. For the last 10 years, the availability of long read data of human genomes from PacBio or Nanopore platforms has increased, which makes it easier to detect long insertions/deletions. However, because long read data have a critical disadvantage due to their relatively high cost, many next generation sequencing data are produced mainly by short read sequencing machines. Here, we constructed programs to detect so-called unmapped regions (UMRs, where no reads are mapped on the reference genome), scanned 40 Korean genomes to select UMR long deletion candidates, and compared the candidates with the long deletion break points within the genomes available from the 1000 Genomes Project (1KGP). An average of about 36,000 UMRs were found in the 40 Korean genomes tested, 284 UMRs were common across the 40 genomes, and a total of 37,943 UMRs were found. Compared with the 74,045 break points provided by the 1KGP, 30,698 UMRs overlapped. As the number of compared samples increased from 1 to 40, the number of UMRs that overlapped with the break points also increased. This eventually reached a peak of 80.9% of the total UMRs found in this study. As the total number of overlapped UMRs could probably grow to encompass 74,045 break points with the inclusion of more Korean genomes, this approach could be practically useful for studies on long deletions utilizing short read data.
5.BioStore: A Repository System for Registering and Distributing Public Biology Databases.
Hongseok TAE ; Jeong Min HAN ; Bu Young AHN ; Kiejung PARK
Genomics & Informatics 2009;7(1):49-51
Although abundant biology data have accumulated in public biology databases, such as GenBank and PIR, few services with easy-to-use interfaces are available for users to access or update them. We have developed a system, named BioStore, that is composed of several programs that help users not only access public data but also share their own data easily. The service can be used for maintaining a local database as a repository of raw data files from several public databases and for distributing the data files to other users. Currently, BioStore handles major bio-databases and will be expanded to include more databases and more useful interfaces.
Biology; Databases, Nucleic Acid; Formycins; Ribonucleotides; Information Storage and Retrieval
6.COCAW: A Genome-wide Pattern Search System for Designing Microbial Probes.
Seunghee RYU ; Kiejung PARK ; Dohoon LEE ; Cheol Min KIM
Genomics & Informatics 2009;7(3):178-180
A few bioinformatics tools have been used to find conserved regions for use as probes. We have developed a system based on a heuristic method, with web interfaces, to find conserved regions in microbial genomes. The system runs in real time by using relative entropy in limited narrow regions and by detecting similar regions between pairs of regions with local alignment. The system could be useful for finding conserved regions on a genome-wide scale.
Computational Biology; Entropy; Genome
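Relative entropy, which this abstract names as the conservation measure, can be illustrated for a single alignment column. The sketch below assumes a uniform background distribution over A, C, G, and T and ignores gaps and ambiguity codes. The abstract does not specify how COCAW defines its background or windows, so the function name and these choices are hypothetical.

```python
import math
from collections import Counter

def relative_entropy(column, background=None):
    """Relative entropy (in bits) of a column's base frequencies vs. a background.

    column is a string of bases, one per aligned sequence.
    Characters other than A, C, G, T are ignored (an assumption of this sketch).
    """
    bases = "ACGT"
    if background is None:
        background = {b: 0.25 for b in bases}   # assumed uniform background
    counts = Counter(column.upper())
    total = sum(counts[b] for b in bases)
    if total == 0:
        return 0.0
    score = 0.0
    for b in bases:
        p = counts[b] / total
        if p > 0:                               # 0 * log(0) is taken as 0
            score += p * math.log2(p / background[b])
    return score

conserved = relative_entropy("AAAA")   # every sequence agrees
variable = relative_entropy("ACGT")    # matches the uniform background
```

Under the uniform background, a fully conserved column scores 2 bits and a column matching the background scores 0, so higher values indicate stronger conservation.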
7.WinBioDBs: A Windows-based Integrated Program for Manipulating Major Biological Databases.
Hyeweon NAM ; Jin Ho LEE ; Kiejung PARK
Genomics & Informatics 2009;7(3):175-177
We have developed WinBioDBs, a program with Windows interfaces that includes importing modules and searching interfaces for 10 major public databases: GenBank, PIR, SwissProt, Pathway, EPD, ENZYME, REBASE, Prosite, Blocks, and Pfam. User databases can be constructed from the results of search queries, and their entries can be edited. The program is a stand-alone database search program for Windows PCs. Database updates are supported by importing and indexing raw database files after downloading them. Users can adjust their own search environments and report formats and construct their own projects consisting of a combination of local databases. WinBioDBs is implemented in VC++, and its database is based on MySQL.
Abstracting and Indexing as Topic; Databases, Nucleic Acid; Databases, Protein
8.PromoterWizard: An Integrated Promoter Prediction Program Using Hybrid Methods.
Genomics & Informatics 2011;9(4):194-196
Promoter prediction is an important problem that is closely related to central problems of bioinformatics such as the construction of gene regulatory networks and gene function annotation. In this context, we developed PromoterWizard, an integrated promoter prediction program using hybrid methods, which can be employed to detect the core promoter region and the transcription start site (TSS) in vertebrate genomic DNA sequences, an issue of obvious importance for genome annotation efforts. PromoterWizard consists of three main modules and two auxiliary modules. The three main modules are the CDRM (Composite Dependency Reflecting Model) module, the SVM (Support Vector Machine) module, and the ICM (Interpolated Context Model) module. The two auxiliary modules, CpG Island Detector and GCPlot, may help improve the predictive accuracy of the three main modules and help human curators decide on the final annotation.
Base Sequence; Chimera; Computational Biology; CpG Islands; Dependency (Psychology); Gene Regulatory Networks; Genome; Humans; Promoter Regions, Genetic; Transcription Initiation Site; Vertebrates
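The abstract does not describe how the CpG Island Detector or GCPlot modules work internally. As a rough illustration of the kind of statistics such modules typically compute, the sketch below evaluates GC content and the observed/expected CpG ratio for one sequence window and applies the commonly cited thresholds (length of at least 200 bp, GC content above 50%, ratio above 0.6). The function names and default thresholds are assumptions, not PromoterWizard's actual parameters.

```python
def cpg_stats(seq):
    """Return (gc_fraction, observed_over_expected_cpg) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")                 # CpG dinucleotides cannot self-overlap
    gc_fraction = (c + g) / n
    # Expected CpG count under independence is (C count * G count) / length.
    ratio = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, ratio

def is_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply assumed length, GC-content, and CpG-ratio thresholds to a window."""
    gc_fraction, ratio = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and ratio > min_ratio

island_like = is_cpg_island("CG" * 100)   # 200 bp of pure CpG repeats
at_rich = is_cpg_island("AT" * 100)       # 200 bp with no C or G at all
```

A GC plot in this spirit would slide a fixed-size window along the sequence and record `cpg_stats` at each position.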
9.RGISS: Rice (Oryza sativa L. ssp. japonica) Genome Information Service System.
Daesang LEE ; Hwajung SEO ; Jang Ho HAHN ; Eun Bae KONG ; Kiejung PARK
Genomics & Informatics 2007;5(4):194-195
We have constructed the Rice Genome Information Service System (RGISS), an information service system for the Oryza sativa L. ssp. japonica (rice) genome, using the released version of the rice Build 3.0 pseudomolecules and based on the Ensembl architecture. The nonredundant library, composed of 3,360 clones of BACs, PACs, and fosmids, was used to construct supercontigs. RGISS contains 50,717 annotated genes from GenBank, 56,161 predicted genes from FgeneSH, and information on 9,587 markers, which include STS, SSR, and EST-based RFLP markers. The 20,180 ESTs sequenced by the Korea National Institute of Agricultural Biotechnology (NIAB) were aligned and mapped into 168,792 exons. By gene ontology analysis, 6,158, 4,531, and 12,364 proteins in the rice genome were mapped to molecular function, cellular component, and biological process, respectively.
Biological Processes; Biotechnology; Clone Cells; Databases, Nucleic Acid; Exons; Expressed Sequence Tags; Gene Ontology; Genome*; Information Services*; Korea; Polymorphism, Restriction Fragment Length; Oryza
10.A Bio-database Management System for the Monitoring and Automatic FTP of Public Databases.
Hongseok TAE ; Jeong Min HAN ; Bu Young AHN ; Kiejung PARK
Genomics & Informatics 2008;6(2):95-97
Many bioinformatics sites maintain local bio-databases, including major databases such as GenBank and PIR, and bear the load of keeping them updated. We have developed several programs to monitor the update status of these databases and to download them automatically via FTP. These programs can be used for keeping local bio-databases at their most recent versions and for providing up-to-date databases through FTP sites. Currently, the program serves major bio-databases and will be extended to accommodate many more bio-databases.
Computational Biology; Databases, Nucleic Acid; Formycins; Organothiophosphorus Compounds; Ribonucleotides