1.BioCovi: A Visualization Service for Comparative Genomics Analysis.
Jungsul LEE ; Daeui PARK ; Jong BHAK
Genomics & Informatics 2005;3(2):52-54
Visualization of homology information is an important method for analyzing the evolutionary and functional meanings of genes. With a database containing the model genomes of Homo sapiens, Mus musculus, and Rattus norvegicus, we constructed a web-based comparative analysis tool, BioCovi, to visualize the homology information of mammalian sequences on a very large scale. The user interface has several features: it marks regions whose identity is greater than a specified threshold, it shows or hides gaps from the result of global sequence alignment, and it inverts the graph when total identity is higher than the specified threshold.
Animals
;
Genome
;
Genomics*
;
Humans
;
Mice
;
Rats
;
Sequence Alignment
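The region-marking and gap-hiding features described in the BioCovi abstract can be illustrated with a minimal sketch. This is not the BioCovi implementation; the function names `window_identity` and `mark_regions`, the fixed non-overlapping window, and the toy sequences are all assumptions made for illustration.

```python
def window_identity(aln_a, aln_b, window=10, skip_gaps=False):
    """Percent identity per non-overlapping window of two aligned sequences.

    aln_a and aln_b are rows of a pairwise alignment ('-' marks a gap).
    With skip_gaps=True, columns containing a gap are hidden before windowing.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    cols = list(zip(aln_a, aln_b))
    if skip_gaps:
        cols = [(a, b) for a, b in cols if a != "-" and b != "-"]
    identities = []
    for start in range(0, len(cols), window):
        chunk = cols[start:start + window]
        # A column counts as a match only if both residues agree and are not gaps.
        matches = sum(1 for a, b in chunk if a == b and a != "-")
        identities.append(100.0 * matches / len(chunk))
    return identities


def mark_regions(identities, threshold):
    """Indices of windows whose identity is strictly above the threshold."""
    return [i for i, value in enumerate(identities) if value > threshold]


# Hypothetical toy alignment, not taken from any genome.
toy_a = "ACGTACGTAC"
toy_b = "ACGTAC-TAA"
shown = window_identity(toy_a, toy_b, window=5)
hidden = window_identity(toy_a, toy_b, window=5, skip_gaps=True)
marked = mark_regions(shown, threshold=80.0)
```

Hiding gap columns changes which columns fall into each window, so the same alignment can yield different per-window identities in the two display modes.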
2.Post-GWAS Strategies.
Genomics & Informatics 2011;9(1):1-4
Genome-wide association (GWA) studies are the method of choice for discovering loci associated with common diseases. More than a thousand GWA studies have reported successful identification of statistically significant association signals in human genomes for a variety of complex diseases. In this review, I discuss some of the issues related to the future of GWA studies and their biomedical applications.
Genome, Human
;
Genome-Wide Association Study
;
Humans
3.BioSubroutine: an Open Web Server for Bioinformatics Algorithms and Subroutines.
Joowon LEE ; Hana KIM ; Wonhye LEE ; Dongil CHUNG ; Jong BHAK
Genomics & Informatics 2005;3(1):35-38
We present BioSubroutine, an open depository server that automatically categorizes various subroutines frequently used in bioinformatics research. We processed a large bioinformatics subroutine library called Bio.pl that was the first Bioperl subroutine library built in 1995. Over 1000 subroutines were processed automatically and an HTML interface has been created. BioSubroutine can accept new subroutines and algorithms from any such subroutine library, as well as provide interactive user forms. The subroutines are stored in an SQL database for quick searching and accessing. BioSubroutine is an open access project under the BioLicense license scheme.
Computational Biology*
;
Licensure
4.HExDB: Human EXon DataBase for Alternative Splicing Pattern Analysis.
Junghwan PARK ; Minho LEE ; Jong BHAK
Genomics & Informatics 2005;3(3):80-85
HExDB is a database for analyzing exon and splicing pattern information in Homo sapiens. HExDB is useful for two specific purposes: 1) to design primers for exon amplification from cDNA and 2) to understand the change of ORFs by alternative splicing. HExDB was constructed by integrating data from AltExtron, which is a computationally predicted exon database, Ensembl cDNA annotation, and the recently published Affymetrix genome tile. Although it may contain false positive data, HExDB is a good starting point due to its sensitivity. At present, there are as many as 2,046,519 exons stored in HExDB. We found that 16.8% of the exons in the database were constitutive exons and 83.1% were novel gene exons.
Alternative Splicing*
;
Animals
;
DNA, Complementary
;
Ecthyma, Contagious
;
Exons*
;
Genome
;
Humans*
;
Open Reading Frames
5.Personal Genomics, Bioinformatics, and Variomics.
Jong BHAK ; Ho GHANG ; Rohit REJA ; Sangsoo KIM
Genomics & Informatics 2008;6(4):161-165
In 2008 at least five complete genome sequences are available. It is known that there are over 15,000,000 genetic variants, called SNPs, in the dbSNP database. The cost of full genome sequencing in 2009 is claimed to be less than $5000 USD. The genomics era has arrived in 2008. This review introduces technologies, bioinformatics, genomics visions, and variomics projects. Variomics is the study of the total genetic variation in an individual and populations. Research on genetic variation is the most valuable among many genomics research branches. Genomics and variomics projects will change biology and the society so dramatically that biology will become an everyday technology like personal computers and the internet. 'BioRevolution' is the term that can adequately describe this change.
Biology
;
Computational Biology
;
Genetic Variation
;
Genome
;
Genomics
;
Humans
;
Internet
;
Microcomputers
;
Polymorphism, Single Nucleotide
;
Vision, Ocular
6.A Clinical Risk Score to Predict In-hospital Mortality from COVID-19 in South Korea
Ae-Young HER ; Youngjune BHAK ; Eun Jung JUN ; Song Lin YUAN ; Scot GARG ; Semin LEE ; Jong BHAK ; Eun-Seok SHIN
Journal of Korean Medical Science 2021;36(15):e108-
Background:
Early identification of patients with coronavirus disease 2019 (COVID-19) who are at high risk of mortality is of vital importance for appropriate clinical decision making and delivering optimal treatment. We aimed to develop and validate a clinical risk score for predicting mortality at the time of admission of patients hospitalized with COVID-19.
Methods:
Collaborating with the Korea Centers for Disease Control and Prevention (KCDC), we established a prospective consecutive cohort of 5,628 patients with confirmed COVID-19 infection who were admitted to 120 hospitals in Korea between January 20, 2020, and April 30, 2020. The cohort was randomly divided using a 7:3 ratio into a development (n = 3,940) and validation (n = 1,688) set. Clinical information and complete blood count (CBC) detected at admission were investigated using Least Absolute Shrinkage and Selection Operator (LASSO) and logistic regression to construct a predictive risk score (COVID-Mortality Score). The discriminative power of the risk model was assessed by calculating the area under the curve (AUC) of the receiver operating characteristic curves.
Results:
The incidence of mortality was 4.3% in both the development and validation set. A COVID-Mortality Score consisting of age, sex, body mass index, combined comorbidity, clinical symptoms, and CBC was developed. AUCs of the scoring system were 0.96 (95% confidence interval [CI], 0.85–0.91) and 0.97 (95% CI, 0.84–0.93) in the development and validation set, respectively. If the model was optimized for > 90% sensitivity, accuracies were 81.0% and 80.2% with sensitivities of 91.7% and 86.1% in the development and validation set, respectively. The optimized scoring system has been applied to the public online risk calculator (https://www.diseaseriskscore.com).
Conclusion:
This clinically developed and validated COVID-Mortality Score, using clinical data available at the time of admission, will aid clinicians in predicting in-hospital mortality.
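The evaluation quantities named in this abstract (AUC, sensitivity, accuracy, and a cutoff chosen to reach a target sensitivity) can be sketched in plain Python. This is an illustrative sketch only, not the authors' scoring model or code: the function names, the pairwise-comparison formulation of AUC, and the toy scores and labels are assumptions.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_accuracy(scores, labels, cutoff):
    """Sensitivity and accuracy when cases with score >= cutoff are flagged."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in pairs if y == 1 and s < cutoff)
    tn = sum(1 for s, y in pairs if y == 0 and s < cutoff)
    return tp / (tp + fn), (tp + tn) / len(pairs)


def cutoff_for_sensitivity(scores, labels, target=0.9):
    """Highest observed score cutoff whose sensitivity reaches the target."""
    for cutoff in sorted(set(scores), reverse=True):
        sens, _ = sensitivity_accuracy(scores, labels, cutoff)
        if sens >= target:
            return cutoff
    raise ValueError("no cutoff reaches the target sensitivity")


# Hypothetical toy data: risk scores and outcomes (1 = died, 0 = survived).
toy_scores = [1, 2, 3, 4, 5, 6, 7, 8]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
toy_auc = auc(toy_scores, toy_labels)
toy_cutoff = cutoff_for_sensitivity(toy_scores, toy_labels, target=0.9)
toy_sens, toy_acc = sensitivity_accuracy(toy_scores, toy_labels, toy_cutoff)
```

Lowering the cutoff to reach a sensitivity target generally costs accuracy, because more survivors are flagged as high risk; the toy data show the same trade-off on a small scale.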
8.Biological Object Downloader (BOD) Service for Easy Download and Management of Biological Databases.
Daeui PARK ; Jungwoo LEE ; Giseok YOON ; Sungsam GONG ; Jong BHAK
Genomics & Informatics 2007;5(4):196-199
BOD is an FTP service management tool on the Internet. It was developed for biological researchers in South Korea. It enables easier and faster access of bioinformation without having to go through foreign FTP sites. BOD includes an automatic downloader with a management and email alert service from which the user can easily select and schedule any biological database. Once listed in BOD, the user can check and modify the download status and data from an additional email alert service.
Appointments and Schedules
;
Electronic Mail
;
Internet
;
Korea
9.Structural Bioinformatics Analysis of Disease-related Mutations.
Seong Jin PARK ; Sangho OH ; Daeui PARK ; Jong BHAK
Genomics & Informatics 2008;6(3):142-146
In order to understand the protein functions that are related to disease, it is important to detect the correlation between amino acid mutations and disease. Many mutation studies about disease-related proteins have been carried out through molecular biology techniques, such as vector design, protein engineering, and protein crystallization. However, experimental protein mutation studies are time-consuming, be it in vivo or in vitro. We therefore performed a bioinformatic analysis of known disease-related mutations and their protein structure changes in order to analyze the correlation between mutation and disease. For this study, we selected 111 diseases that were related to 175 proteins from the PDB database and 710 mutations that were found in the protein structures. The mutations were acquired from the Human Gene Mutation Database (HGMD). We selected point mutations, excluding only insertions or deletions, for detecting structural changes. To detect a structural change by mutation, we analyzed not only the structural properties (distance of pocket and mutation, pocket size, surface size, and stability), but also the physico-chemical properties (weight, instability, isoelectric point (IEP), and GRAVY score) for the 710 mutations. We detected that the distance between the pocket and disease-related mutation lay within 20 Å (98.5%, 700 proteins). We found that there was no significant correlation between structural stability and disease-causing mutations or between hydrophobicity changes and critical mutations. For large-scale mutational analysis of disease-causing mutations, our bioinformatics approach, using 710 structural mutations, called "Structural Mutatomics," can help researchers to detect disease-specific mutations and to understand the biological functions of disease-related proteins.
Computational Biology
;
Crystallization
;
Humans
;
Hydrophobic and Hydrophilic Interactions
;
Isoelectric Point
;
Molecular Biology
;
Point Mutation
;
Protein Engineering
;
Proteins
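One of the physico-chemical properties named in this abstract, the GRAVY score, is the mean Kyte-Doolittle hydropathy of a sequence, so the hydrophobicity change caused by a point mutation can be sketched as below. This is a minimal illustration, not the authors' pipeline; the function names and the toy peptide are assumptions.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: summed hydropathy divided by length."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)


def apply_point_mutation(sequence, position, wild_type, mutant):
    """Substitute one residue; position is 1-based, as in mutation notation."""
    if sequence[position - 1] != wild_type:
        raise ValueError("wild-type residue does not match the sequence")
    return sequence[:position - 1] + mutant + sequence[position:]


def gravy_change(sequence, position, wild_type, mutant):
    """GRAVY of the mutant sequence minus GRAVY of the original sequence."""
    mutated = apply_point_mutation(sequence, position, wild_type, mutant)
    return gravy(mutated) - gravy(sequence)


# Hypothetical toy peptide with an A2R substitution.
toy_peptide = "AAAA"
toy_mutant = apply_point_mutation(toy_peptide, 2, "A", "R")
toy_delta = gravy_change(toy_peptide, 2, "A", "R")
```

A negative change indicates that the substitution makes the sequence less hydrophobic on average.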
10.New Lung Cancer Panel for High-Throughput Targeted Resequencing.
Eun Hye KIM ; Sunghoon LEE ; Jongsun PARK ; Kyusang LEE ; Jong BHAK ; Byung Chul KIM
Genomics & Informatics 2014;12(2):50-57
We present a new next-generation sequencing-based method to identify somatic mutations of lung cancer. It is a comprehensive mutation profiling protocol to detect somatic mutations in 30 genes found frequently in lung adenocarcinoma. The total length of the target regions is 107 kb, and a capture assay was designed to cover 99% of it. This method exhibited about 97% mean coverage at 30x sequencing depth and 42% average specificity when sequencing of more than 3.25 Gb was carried out for the normal sample. We discovered 513 variations from targeted exome sequencing of lung cancer cells, which is 3.9-fold higher than in the normal sample. The variations in cancer cells included previously reported somatic mutations in the COSMIC database, such as variations in TP53, KRAS, and STK11 of sample H-23 and in EGFR of sample H-1650, especially with more than 1,000x coverage. Among the somatic mutations, up to 91% of single nucleotide polymorphisms from the two cancer samples were validated by DNA microarray-based genotyping. Our results demonstrated the feasibility of high-throughput mutation profiling with lung adenocarcinoma samples, and the profiling method can be used as a robust and effective protocol for somatic variant screening.
Adenocarcinoma
;
DNA
;
Exome
;
High-Throughput Nucleotide Sequencing
;
Lung
;
Lung Neoplasms*
;
Mass Screening
;
Polymorphism, Single Nucleotide
;
Sensitivity and Specificity
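The coverage and specificity figures in this abstract can be read, under one common interpretation, as the fraction of target bases sequenced to at least a given depth and the fraction of sequenced bases that fall on target. The sketch below follows that interpretation only; the abstract does not define either metric, and the function names and toy numbers are assumptions.

```python
def breadth_at_depth(depths, min_depth=30):
    """Fraction of target positions covered by at least min_depth reads."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(1 for d in depths if d >= min_depth) / len(depths)


def on_target_fraction(on_target_bases, total_bases):
    """Fraction of all sequenced bases that map inside the target regions."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return on_target_bases / total_bases


# Hypothetical per-position depths for a four-base target.
toy_depths = [10, 30, 45, 100]
toy_breadth = breadth_at_depth(toy_depths, min_depth=30)
toy_specificity = on_target_fraction(42, 100)
```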