1.TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data.
Jae Hyun LIM ; Soo Youn LEE ; Ju Han KIM
Genomics & Informatics 2017;15(1):51-53
High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous Bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline, and are difficult to appropriately integrate with one another due to their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides various functions for data management, the filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.
Base Sequence; Gene Expression; Gene Expression Profiling; Molecular Sequence Data; Programming Languages; Sequence Analysis, RNA; Statistics as Topic; Transcriptome
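The filtering, normalization, and transformation steps named in the TRAPR abstract above can be illustrated with a minimal Python sketch. This is not TRAPR's API (TRAPR is an R package); the function names, the counts-per-million (CPM) scaling, and the thresholds `min_cpm` and `min_samples` are assumptions chosen only to show the general shape of such a pipeline.

```python
import math


def counts_per_million(counts):
    """Scale each sample's raw counts to counts per million (CPM).

    counts: dict mapping sample name -> dict of gene -> raw read count.
    Assumes every sample has at least one read.
    """
    cpm = {}
    for sample, gene_counts in counts.items():
        library_size = sum(gene_counts.values())
        cpm[sample] = {g: c * 1e6 / library_size for g, c in gene_counts.items()}
    return cpm


def filter_low_expression(cpm, min_cpm=1.0, min_samples=2):
    """Keep genes whose CPM reaches min_cpm in at least min_samples samples."""
    genes = list(next(iter(cpm.values())).keys())
    return [g for g in genes
            if sum(cpm[s][g] >= min_cpm for s in cpm) >= min_samples]


def log2_transform(cpm, genes, pseudocount=1.0):
    """Log2-transform CPM values for the retained genes (pseudocount avoids log(0))."""
    return {s: {g: math.log2(cpm[s][g] + pseudocount) for g in genes} for s in cpm}


# Hypothetical toy data: two samples, three genes.
raw = {
    "sample_a": {"gene1": 500, "gene2": 0, "gene3": 499500},
    "sample_b": {"gene1": 1000, "gene2": 0, "gene3": 999000},
}
cpm = counts_per_million(raw)
kept = filter_low_expression(cpm)
logged = log2_transform(cpm, kept)
```

In this toy input, `gene2` has no reads in either sample and is dropped by the filter, while the other two genes are carried through to the log-transformed table.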
2.Circulating Tumor DNA in a Breast Cancer Patient's Plasma Represents Driver Alterations in the Tumor Tissue.
Jieun LEE ; Sung Min CHO ; Min Sung KIM ; Sug Hyung LEE ; Yeun Jun CHUNG ; Seung Hyun JUNG
Genomics & Informatics 2017;15(1):48-50
Tumor tissues from biopsies or surgery are major sources for next-generation sequencing (NGS) studies, but these procedures are invasive and are limited in their ability to overcome intratumor heterogeneity. Recent studies have shown that driver alterations in tumor tissues can be detected by liquid biopsy, which is a less invasive technique capable of both capturing the tumor heterogeneity and overcoming the difficulty in tissue sampling. However, it is still unclear whether the driver alterations in liquid biopsy can be detected by targeted NGS and how they relate to those in the tissue biopsy. In this study, we performed whole-exome sequencing of a breast cancer tissue and identified a PTEN p.H259fs*7 frameshift mutation. In the plasma DNA (liquid biopsy) analysis by targeted NGS, the same variant initially identified in the tumor tissue was also detected with a low variant allele frequency. This mutation was subsequently validated by digital polymerase chain reaction in the liquid biopsy. Our results confirm that driver alterations identified in the tumor tissue were detected in liquid biopsy by targeted NGS as well, and suggest that a higher depth of sequencing coverage is needed for detection of genomic alterations in a liquid biopsy.
Biopsy; Breast Neoplasms*; Breast*; DNA*; Frameshift Mutation; Gene Frequency; Plasma*; Polymerase Chain Reaction; Population Characteristics
3.Comparative Analysis of Predicted Gene Expression among Crenarchaeal Genomes.
Shibsankar DAS ; Brajadulal CHOTTOPADHYAY ; Satyabrata SAHOO
Genomics & Informatics 2017;15(1):38-47
Research into new methods for identifying highly expressed genes in anonymous genome sequences has been going on for more than 15 years. We present here an alternative approach based on a modified score of relative codon usage bias to identify highly expressed genes in crenarchaeal genomes. The proposed algorithm relies exclusively on sequence features for identifying the highly expressed genes. In this study, a comparative analysis of predicted highly expressed genes in five crenarchaeal genomes was performed using the score of Modified Relative Codon Bias Strength (MRCBS) as a numerical estimator of gene expression level. We found a systematically strong correlation between the Codon Adaptation Index and MRCBS. Additionally, MRCBS correlated well with other expression measures. Our study indicates that MRCBS can consistently capture the highly expressed genes.
Anonyms and Pseudonyms; Archaea; Base Composition; Bias (Epidemiology); Codon; Gene Expression*; Genome*
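The abstract above does not give the MRCBS formula, so the sketch below does not attempt it. Instead it computes relative synonymous codon usage (RSCU), a standard and simpler sequence-only measure of codon bias, to illustrate the kind of quantity such scores build on. The codon table is a deliberately partial, illustrative subset, and the example sequence is hypothetical.

```python
from collections import Counter

# Partial synonymous-codon table (illustrative subset, not the full genetic code).
SYNONYMOUS = {
    "Lys": ["AAA", "AAG"],
    "Phe": ["TTT", "TTC"],
}


def rscu(sequence):
    """Return RSCU per codon: observed count / mean count within its synonymous family."""
    usable = len(sequence) - len(sequence) % 3  # ignore a trailing partial codon
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    values = {}
    for family in SYNONYMOUS.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(family)  # count per codon under uniform usage
        for codon in family:
            values[codon] = counts[codon] / expected
    return values


# Hypothetical coding sequence: Lys, Lys, Lys, Phe, Phe.
example = "".join(["AAA", "AAA", "AAG", "TTT", "TTC"])
values = rscu(example)
```

An RSCU above 1 means a codon is used more often than expected under uniform synonymous usage; here `AAA` is favored over `AAG`, while the two Phe codons are used equally.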
4.DNA Methylation Profiles of Blood Cells Are Distinct between Early-Onset Obese and Control Individuals.
Je Keun RHEE ; Jin Hee LEE ; Hae Kyung YANG ; Tae Min KIM ; Kun Ho YOON
Genomics & Informatics 2017;15(1):28-37
Obesity is a highly prevalent, chronic disorder that has been increasing in incidence in young patients. Both epigenetic and genetic aberrations may play a role in the pathogenesis of obesity. Therefore, in-depth epigenomic and genomic analyses will advance our understanding of the detailed molecular mechanisms underlying obesity and aid in the selection of potential biomarkers for obesity in youth. Here, we performed microarray-based DNA methylation and gene expression profiling of peripheral white blood cells obtained from six young, obese individuals and six healthy controls. We observed that the hierarchical clustering of DNA methylation, but not gene expression, clearly segregates the obese individuals from the controls, suggesting that the metabolic disturbance that occurs as a result of obesity at a young age may affect the DNA methylation of peripheral blood cells without accompanying transcriptional changes. To examine the genome-wide differences in the DNA methylation profiles of young obese and control individuals, we identified differentially methylated CpG sites and investigated their genomic and epigenomic contexts. The aberrant DNA methylation patterns in obese individuals can be summarized as relative gains and losses of DNA methylation in gene promoters and gene bodies, respectively. We also observed that the CpG islands of obese individuals are more susceptible to DNA methylation compared to controls. Our pilot study suggests that the genome-wide aberrant DNA methylation patterns of obese individuals may advance not only our understanding of the epigenomic pathogenesis but also early screening of obesity in youth.
Adolescent; Biomarkers; Blood Cells*; CpG Islands; DNA Methylation*; DNA*; Epigenomics; Gene Expression; Gene Expression Profiling; Humans; Incidence; Leukocytes; Mass Screening; Obesity; Pilot Projects
5.Use of Graph Database for the Integration of Heterogeneous Biological Data.
Byoung Ha YOON ; Seon Kyu KIM ; Seon Young KIM
Genomics & Informatics 2017;15(1):19-27
Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
Biology; Data Mining
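The contrast drawn in the abstract above, between inferring relationships through multiple joins and representing them natively as a graph, can be illustrated with a small in-memory Python sketch. It uses neither Neo4j nor MySQL; the node names, relation labels, and helper functions are hypothetical. The point is only that once relationships are stored as adjacency lists, a multi-hop question becomes a traversal rather than a chain of table joins.

```python
from collections import deque

# Hypothetical heterogeneous relationships: (source, relation, target).
edges = [
    ("drugX", "targets", "geneA"),
    ("geneA", "interacts_with", "geneB"),
    ("geneB", "associated_with", "diseaseY"),
]


def build_adjacency(edge_list):
    """Index directed edges by source node so neighbors are found in one lookup."""
    adjacency = {}
    for source, relation, target in edge_list:
        adjacency.setdefault(source, []).append((relation, target))
    return adjacency


def find_path(adjacency, start, goal):
    """Breadth-first search; returns one shortest path as a list of nodes, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for _relation, neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


adjacency = build_adjacency(edges)
path = find_path(adjacency, "drugX", "diseaseY")
```

In a relational layout, the same drug-to-disease question would typically need one join per hop across the drug-target, protein-protein interaction, and gene-disease tables.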
6.A ChIP-Seq Data Analysis Pipeline Based on Bioconductor Packages.
Seung Jin PARK ; Jong Hwan KIM ; Byung Ha YOON ; Seon Young KIM
Genomics & Informatics 2017;15(1):11-18
Huge volumes of chromatin immunoprecipitation-sequencing (ChIP-Seq) data are now generated to increase knowledge of DNA-protein interactions in the cell, and accordingly, many tools have been developed for ChIP-Seq analysis. Here, we provide an example of a streamlined workflow for ChIP-Seq data analysis composed of only four packages in Bioconductor: dada2, QuasR, mosaics, and ChIPseeker. ‘dada2’ performs trimming of the high-throughput sequencing data. ‘QuasR’ and ‘mosaics’ perform quality control and mapping of the input reads to the reference genome and peak calling, respectively. Finally, ‘ChIPseeker’ performs annotation and visualization of the called peaks. This workflow runs well independently of operating systems (e.g., Windows, Mac, or Linux) and processes the input fastq files into various results in one run. R code is available at GitHub: https://github.com/ddhb/Workflow_of_Chipseq.git.
Chromatin; Chromatin Immunoprecipitation; Genome; Quality Control; Statistics as Topic*
7.Evaluation of Digital PCR as a Technique for Monitoring Acute Rejection in Kidney Transplantation.
Hyeseon LEE ; Young Mi PARK ; Yu Mee WE ; Duck Jong HAN ; Jung Woo SEO ; Haena MOON ; Yu Ho LEE ; Yang Gyun KIM ; Ju Young MOON ; Sang Ho LEE ; Jong Keuk LEE
Genomics & Informatics 2017;15(1):2-10
Early detection and proper management of kidney rejection are crucial for the long-term health of a transplant recipient. Recipients are normally monitored by serum creatinine measurement and sometimes with graft biopsies. Donor-derived cell-free deoxyribonucleic acid (cfDNA) in the recipient's plasma and/or urine may be a better indicator of acute rejection. We evaluated digital PCR (dPCR) as a system for monitoring graft status using single nucleotide polymorphism (SNP)-based detection of donor DNA in plasma or urine. We compared the detection abilities of the QX200, RainDrop, and QuantStudio 3D dPCR systems. The QX200 was the most accurate and sensitive. Plasma and/or urine samples were isolated from 34 kidney recipients at multiple time points after transplantation, and analyzed by dPCR using the QX200. We found that donor DNA was almost undetectable in plasma DNA samples, whereas a high percentage of donor DNA was measured in urine DNA samples, indicating that urine is a good source of cfDNA for patient monitoring. We found that at least 24% of the highly polymorphic SNPs used to identify individuals could also identify donor cfDNA in transplant patient samples. Our results further showed that autosomal, sex-specific, and mitochondrial SNPs were suitable markers for identifying donor cfDNA. Finally, we found that donor-derived cfDNA measurement by dPCR was not sufficient to predict a patient's clinical condition. Our results indicate that donor-derived cfDNA is not an accurate predictor of kidney status in kidney transplant patients.
Biopsy; Creatinine; DNA; Humans; Kidney Transplantation*; Kidney*; Monitoring, Physiologic; Plasma; Polymerase Chain Reaction*; Polymorphism, Single Nucleotide; Tissue Donors; Transplant Recipients; Transplants
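The abstract above does not describe how donor DNA percentages were calculated. As a hedged illustration of how SNP-based dPCR data are commonly turned into a donor fraction, the sketch below applies the standard Poisson correction for partitions that hold more than one template copy. The function names, the two-channel setup (one donor-specific allele, one recipient-specific allele), and the partition counts are assumptions, not the authors' protocol.

```python
import math


def copies_per_partition(positive, total):
    """Poisson-corrected mean copies per partition: lambda = -ln(1 - positive/total)."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def donor_fraction(donor_positive, recipient_positive, total):
    """Fraction of donor-allele copies among all allele copies detected."""
    donor = copies_per_partition(donor_positive, total)
    recipient = copies_per_partition(recipient_positive, total)
    if donor + recipient == 0:
        raise ValueError("no positive partitions in either channel")
    return donor / (donor + recipient)


# Hypothetical run: 20,000 partitions, 150 donor-allele and 4,000 recipient-allele positives.
fraction = donor_fraction(150, 4000, 20000)
```

The correction matters most when a large share of partitions is positive; at low occupancy, the corrected value is close to the raw positive fraction.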
8.Editor's Introduction to This Issue (G&I 15:1, 2017).
Genomics & Informatics 2017;15(1):1-1
No abstract available.
9.Identification and Functional Characterization of P159L Mutation in HNF1B in a Family with Maturity-Onset Diabetes of the Young 5 (MODY5).
Eun Ky KIM ; Ji Seon LEE ; Hae Il CHEONG ; Sung Soo CHUNG ; Soo Heon KWAK ; Kyong Soo PARK
Genomics & Informatics 2014;12(4):240-246
Mutation in HNF1B, the hepatocyte nuclear factor-1beta (HNF-1beta) gene, results in maturity-onset diabetes of the young (MODY) 5, which is characterized by gradual impairment of insulin secretion. However, the functional role of HNF-1beta in insulin secretion and glucose metabolism is not fully understood. We identified a family with early-onset diabetes that fulfilled the criteria of MODY. Sanger sequencing revealed that a heterozygous P159L (CCT to CTT in codon 159 in the DNA-binding domain) mutation in HNF1B segregated with affected status. To investigate the functional consequences of this HNF1B mutation, we generated a P159L HNF1B construct. The wild-type and mutant HNF1B constructs were transfected into COS-7 cells in the presence of the promoter sequence of human glucose transporter type 2 (GLUT2). The luciferase reporter assay revealed that P159L HNF1B had decreased transcriptional activity compared to wild-type (p < 0.05). Electrophoretic mobility shift assay showed reduced DNA binding activity of P159L HNF1B. In the MIN6 pancreatic beta-cell line, overexpression of the P159L mutant was significantly associated with decreased mRNA levels of GLUT2 compared to wild-type (p < 0.05). However, INS expression was not different between the wild-type and mutant HNF1B constructs. These findings suggest that the impaired insulin secretion in this family with the P159L HNF1B mutation may be related to altered GLUT2 expression in beta-cells rather than decreased insulin gene expression. In conclusion, we have identified a Korean family with an HNF1B mutation and characterized its effect on the pathogenesis of diabetes.
Animals; Codon; COS Cells; Diabetes Mellitus, Type 2*; DNA; Electrophoretic Mobility Shift Assay; Gene Expression; Glucose; Glucose Transporter Type 2; Hepatocyte Nuclear Factor 1-beta; Humans; Insulin; Luciferases; Metabolism; Point Mutation; RNA, Messenger
10.Replication of Interactions between Genome-Wide Genetic Variants and Body Mass Index in Fasting Glucose and Insulin Levels.
Kyung Won HONG ; Myungguen CHUNG ; Seong Beom CHO
Genomics & Informatics 2014;12(4):236-239
The genetic regulation of glucose and insulin levels might be modified by adiposity. With regard to the genetic factors that are altered by adiposity, a large meta-analysis on the interactions between genetic variants and body mass index with regard to fasting glucose and insulin levels was reported by the Meta-Analyses of Glucose- and Insulin-related trait Consortium (MAGIC), based on European ancestry. Because no replication study has been performed in other ethnic groups, we first examined the link between reported single-nucleotide polymorphisms (SNPs) and fasting glucose and insulin levels in a large Korean cohort (Korean Genome and Epidemiology Study cohort [KoGES], n = 5,814). The MAGIC study reported 7 novel SNPs for fasting glucose levels and 6 novel SNPs for fasting insulin levels. In this study, we attempted to replicate the association of 5 SNPs with fasting glucose levels and 5 SNPs with fasting insulin levels. One SNP (rs2293941) in PDX1 was identified as a significant obesity-modifiable factor in Koreans. Our results indicate that the novel loci that were identified by MAGIC are poorly replicated in other ethnic groups, although we do not know why.
Adiposity; Body Mass Index*; Cohort Studies; Epidemiology; Ethnic Groups; Fasting*; Genome; Glucose*; Humans; Insulin*; Magic; Polymorphism, Single Nucleotide