Statistical methods for extremely unbalanced data in genome-wide association study (2)

Ning XIE; Wenjian BI; Zhongwen ZHANG; Fang SHAO; Yongyue WEI; Yang ZHAO; Ruyang ZHANG; Feng CHEN

VernacularTitle:全基因组关联研究中极端不平衡数据的统计分析方法（二）
Author: Ning XIE ¹ ; Wenjian BI ; Zhongwen ZHANG ; Fang SHAO ; Yongyue WEI ; Yang ZHAO ; Ruyang ZHANG ; Feng CHEN
Author Information

1. Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing 211166, China
Publication Type: Journal Article
Keywords: Genome-wide association study; Extremely unbalanced data; Firth correction; Saddle point approximation; Rare variant
From: Chinese Journal of Epidemiology 2025;46(1):147-153
Country: China
Language: Chinese
Abstract: Extremely unbalanced data are datasets in which the independent or dependent variables show severely imbalanced proportions, which can cause classical test statistics to deviate from their theoretical distributions and make type Ⅰ error difficult to control. The growing availability of genome-wide resources from large population cohorts has increased the demand for efficient and accurate statistical methods for handling extremely unbalanced data, promoting the development of genetic statistical methods. This paper introduces two correction methods widely used in current genome-wide association studies for extremely unbalanced data, namely Firth correction and saddle point approximation, describes their effectiveness in controlling type Ⅰ error as confirmed by simulation experiments, and finally summarizes the commonly used software for extremely unbalanced genomic data, to provide a theoretical reference and suggestions for the statistical analysis of extremely unbalanced data in the future.