Identification of the associations between genes and quantitative traits using entropy-based kernel density estimation

Jaeyong YEE; Taesung PARK; Mira PARK



Author: Jaeyong YEE¹; Taesung PARK; Mira PARK

1. Department of Physiology and Biophysics, Eulji University, Daejeon 34824, Korea
Publication Type: Original article
From: Genomics & Informatics 2022;20(2):e17-
Country: Republic of Korea
Language: English
Abstract: Genetic associations have been quantified using a number of statistical measures. Entropy-based mutual information may be one of the more direct ways of estimating an association, in the sense that it does not depend on a particular parametrization. To compute it, both the entropy of the phenotype distribution and its conditional entropy must be obtained. For quantitative traits, however, entropy usually cannot be evaluated exactly: its estimation requires a probability density function, which can be approximated by kernel density estimation. We investigated the proper sequence of procedures for combining kernel density estimation with density-based entropy estimation to calculate mutual information. Genotypes and their interactions were used to define the conditions for the conditional entropy. Extensive simulated data, generated with three types of generating functions, were analyzed using two different kernels, as well as two types of multifactor dimensionality reduction and another probability density approximation method, m-spacing. Statistical power, measured as the rate of correct detection, was compared across these methods. Kernels were found to be most useful when the trait distributions were more complex than simple normal or gamma distributions. A full-scale genomic dataset was then explored to identify associations, using 2-h oral glucose tolerance test results and γ-glutamyl transpeptidase levels as phenotypes. Clearly distinguishable single-nucleotide polymorphisms (SNPs) and interacting SNP pairs associated with these phenotypes were found and are listed with empirical p-values.
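The pipeline the abstract describes (estimate the trait entropy H(Y) and the conditional entropy H(Y|G) from kernel density estimates, take the mutual information I(Y;G) = H(Y) − H(Y|G), then attach an empirical p-value) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a Gaussian kernel with Silverman's rule-of-thumb bandwidth, a leave-one-out resubstitution entropy estimate, genotypes passed as integer labels, and a label-permutation scheme for the empirical p-value. The function names are hypothetical, and the paper's two kernels and bandwidth choices are not specified here.

```python
import numpy as np


def kde_entropy(y, bandwidth=None):
    """Estimate differential entropy H(Y) = -E[log f(Y)] in nats.

    Uses a Gaussian-kernel density estimate evaluated at each sample with that
    sample left out (leave-one-out resubstitution). The default bandwidth is
    Silverman's rule of thumb, an assumption rather than the paper's setting.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    if n < 3:
        raise ValueError("need at least 3 observations")
    if bandwidth is None:
        sd = np.std(y, ddof=1)
        q75, q25 = np.percentile(y, [75, 25])
        spread = min(sd, (q75 - q25) / 1.34) if q75 > q25 else sd
        bandwidth = 0.9 * spread * n ** (-0.2)
    z = (y[:, None] - y[None, :]) / bandwidth
    k = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    np.fill_diagonal(k, 0.0)  # leave each point out of its own density estimate
    f_hat = k.sum(axis=1) / ((n - 1) * bandwidth)
    return -np.mean(np.log(np.maximum(f_hat, 1e-300)))


def conditional_entropy(y, g):
    """H(Y | G) = sum over genotypes of P(G=g) * H(Y | G=g), each via kde_entropy."""
    y = np.asarray(y, dtype=float)
    g = np.asarray(g)
    return sum(np.mean(g == level) * kde_entropy(y[g == level]) for level in np.unique(g))


def mutual_information(y, g):
    """I(Y; G) = H(Y) - H(Y | G) for a quantitative trait y and genotype labels g."""
    return kde_entropy(y) - conditional_entropy(y, g)


def permutation_pvalue(y, g, n_perm=199, seed=0):
    """Empirical p-value from shuffling genotype labels (an assumed null scheme)."""
    rng = np.random.default_rng(seed)
    g = np.asarray(g)
    observed = mutual_information(y, g)
    null = np.array([mutual_information(y, rng.permutation(g)) for _ in range(n_perm)])
    return observed, (1 + np.sum(null >= observed)) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    g = rng.integers(0, 3, size=300)    # one hypothetical SNP coded 0/1/2
    y = 0.8 * g + rng.normal(size=300)  # simulated trait whose mean shifts with genotype
    mi, p = permutation_pvalue(y, g, n_perm=99)
    print(f"MI = {mi:.3f} nats, empirical p = {p:.3f}")
```

For an interacting SNP pair coded 0/1/2 each, passing `3 * g1 + g2` as the genotype label gives up to nine conditioning cells, which is one simple way to condition on the interaction; cells with fewer than three observations would need to be merged or dropped first, since `kde_entropy` rejects them.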
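The m-spacing method named in the abstract as a comparison estimates entropy from gaps between sorted observations rather than from a smoothed density. A minimal Python sketch of Vasicek's m-spacing estimator follows; it is one standard form and may differ from the variant used in the paper. The function names and the default window m = round(√n) are illustrative assumptions, and each genotype group needs at least two observations.

```python
import numpy as np


def m_spacing_entropy(y, m=None):
    """Vasicek's m-spacing estimate of differential entropy (nats).

    H_hat = mean_i log( n / (2m) * (y_(i+m) - y_(i-m)) ), where y_(k) are the
    order statistics and indices are clamped to the sample range. The default
    m = round(sqrt(n)) is a common heuristic, assumed here.
    """
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    if n < 2:
        raise ValueError("need at least 2 observations")
    if m is None:
        m = max(1, int(round(np.sqrt(n))))
    idx = np.arange(n)
    spacing = y[np.minimum(idx + m, n - 1)] - y[np.maximum(idx - m, 0)]
    # Guard against zero spacings from tied values (log(0) would be -inf).
    spacing = np.maximum(spacing, np.finfo(float).tiny)
    return np.mean(np.log(n / (2.0 * m) * spacing))


def mutual_information_spacing(y, g):
    """I(Y; G) = H(Y) - sum_g P(G=g) H(Y | G=g), every term via m-spacing."""
    y = np.asarray(y, dtype=float)
    g = np.asarray(g)
    h_cond = sum(np.mean(g == level) * m_spacing_entropy(y[g == level]) for level in np.unique(g))
    return m_spacing_entropy(y) - h_cond
```

Because spacings need no kernel or bandwidth, this estimator is a natural baseline; the window m plays a role similar to the bandwidth, with larger m lowering variance at the cost of more bias.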