1. Comparative analysis of commonly used peak calling programs for ChIP-Seq analysis
Hyeongrin JEON ; Hyunji LEE ; Byunghee KANG ; Insoon JANG ; Tae-Young ROH
Genomics & Informatics 2020;18(4):e42
Chromatin immunoprecipitation coupled with high-throughput DNA sequencing (ChIP-Seq) is a powerful technology for profiling the locations of proteins of interest on a whole-genome scale. Many programs and algorithms have been proposed to identify the regions where these proteins are enriched. However, none of the commonly used peak calling programs could accurately explain the binding features of target proteins detected by ChIP-Seq. Here, publicly available data on 12 histone modifications (H3K4ac/me1/me2/me3, H3K9ac/me3, H3K27ac/me3, H3K36me3, H3K56ac, and H3K79me1/me2), generated from a human embryonic stem cell line (H1), were profiled with five peak callers (CisGenome, MACS1, MACS2, PeakSeq, and SISSRs). The performance of the peak calling programs was compared in terms of reproducibility between replicates, enriched regions detected at variable sequencing depths, specificity relative to noise signal, and sensitivity of peak prediction. There were no major differences among peak callers when analyzing point-source histone modifications. Histone modifications with low fidelity, such as H3K4ac, H3K56ac, and H3K79me1/me2, showed low performance in all parameters, which indicates that their peak positions might not be located accurately. Our comparative results could serve as a helpful guide for choosing a suitable peak calling program for specific histone modifications.
3. Bioinformatics services for analyzing massive genomic datasets
Gunhwan KO ; Pan-Gyu KIM ; Youngbum CHO ; Seongmun JEONG ; Jae-Yoon KIM ; Kyoung Hyoun KIM ; Ho-Yeon LEE ; Jiyeon HAN ; Namhee YU ; Seokjin HAM ; Insoon JANG ; Byunghee KANG ; Sunguk SHIN ; Lian KIM ; Seung-Won LEE ; Dougu NAM ; Jihyun F. KIM ; Namshin KIM ; Seon-Young KIM ; Sanghyuk LEE ; Tae-Young ROH ; Byungwook LEE
Genomics & Informatics 2020;18(1):e8
The explosive growth of next-generation sequencing data has resulted in ultra-large-scale datasets and ensuing computational problems. In Korea, the amount of genomic data has been increasing rapidly in recent years. Leveraging these big data requires researchers to use large-scale computational resources and analysis pipelines. A promising solution to this computational challenge is cloud computing, where CPUs, memory, storage, and programs are accessible in the form of virtual machines. Here, we present a cloud computing-based system, Bio-Express, that provides user-friendly, cost-effective analysis of massive genomic datasets. Bio-Express is loaded with predefined multi-omics data analysis pipelines, which are divided into genome, transcriptome, epigenome, and metagenome pipelines. Users can employ the predefined pipelines or create a new pipeline to analyze their own omics data. We also developed several web-based services to facilitate downstream analysis of genome data. The Bio-Express web service is freely available at https://www.bioexpress.re.kr/.