Application of self-organizing maps in the design of longevity genetic research: sample selection in a nested case-control study
10.3760/cma.j.cn112338-20220616-00536
- VernacularTitle:自组织神经网络在长寿基因研究设计中的应用:巢式病例对照研究样本选择
- Author: Zhenping ZHAO¹; Yan LI; Limin WANG; Mei ZHANG; Zhengjing HUANG; Detao ZHANG; Jiangmei LIU; Fan MAO; Yuchang ZHOU; Yaning LIU; Chao NIE; Maigeng ZHOU
Author Information
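1. National Center for Chronic and Noncommunicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 100050, China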
- Keywords:
Longevity;
Cohort;
Nested case-control;
Genome-wide association studies
- From:
Chinese Journal of Epidemiology
2023;44(2):326-334
- Country:China
- Language:Chinese
- Abstract:
Objective: To improve the design of genetic studies of longevity by using self-organizing maps to select the control group. Methods: Participants came from a natural population-based cohort formed by linking the Chinese Chronic Diseases and Risk Factors Surveillance in 2013 with the China Death Surveillance System. The study included Han Chinese participants aged 90 years and above (longevity group) and those who died before the age of 80 years (control group); participants who died of injury, infectious diseases, parasitic diseases, or malignant tumors were excluded. Self-organizing maps, trained over multiple iterations to cluster participants, were used to select controls who resembled the population aged 90 years and above in demographic characteristics, diseases, living habits, social behaviors, and mental and psychological factors. PLINK 1.9 was used to assess the quality of the whole genome sequencing data and to fit logistic regression models of longevity on autosomal single nucleotide polymorphisms (SNPs). Q-Q plots were used to visualize the P values for the associations between SNPs and longevity. Results: A total of 1 019 samples were selected for genome sequencing from the 177 099 participants surveyed at baseline, including 517 in the longevity group and 502 in the control group. The longevity and control groups were generally similar in smoking, drinking, diet, sleep duration, blood lipid level, and self-assessed oral health status, but differed significantly in socioeconomic status, physical activity time, BMI, and self-assessed health status. After quality control of the whole genome sequencing data, 4 618 216 SNPs were included in the association analysis. In the Q-Q plot of the longevity association results, observed P values departed from the expected distribution from about P=1e-4 onward, falling below their expected values, and significant signals were also detected in the P<1e-7 range.
Conclusions: Self-organizing maps can take socioeconomic and behavioral risk factors into account together when selecting longevity controls from participants with a known age at death and cause of death in a large-scale natural population cohort, thereby improving the efficiency of genome-wide association analysis of longevity. This study provides a methodological reference for sample selection in nested case-control studies drawn from large-scale natural population cohorts.
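
The abstract does not say how the self-organizing map output was turned into a control sample, so the following is only a minimal sketch of one plausible approach, not the authors' implementation. It assumes that cases and candidate controls are described by standardized numeric covariates, that both are mapped onto one small rectangular map, and that controls are drawn from the map nodes where cases fall. The function names (`train_som`, `best_matching_units`, `select_controls`), the 3x3 grid, the learning schedule, and the synthetic data are all hypothetical.

```python
import numpy as np

def train_som(data, rows=3, cols=3, n_iter=500, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of each node, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise node weights from randomly chosen samples.
    weights = data[rng.integers(0, len(data), size=rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighbourhood radius
        x = data[rng.integers(0, len(data))]     # one random training sample
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best matching unit
        dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-dist2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood weights
        weights += lr * h[:, None] * (x - weights)
    return weights

def best_matching_units(data, weights):
    """Index of the closest SOM node for every row of data."""
    d = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def select_controls(case_nodes, candidate_nodes, seed=0):
    """Per node, sample as many candidates as there are cases (or all that are available)."""
    rng = np.random.default_rng(seed)
    chosen = []
    for node in np.unique(case_nodes):
        n_cases = int((case_nodes == node).sum())
        pool = np.flatnonzero(candidate_nodes == node)
        take = min(n_cases, len(pool))
        if take == 0:
            continue  # no candidate shares this node with the cases
        chosen.extend(rng.choice(pool, size=take, replace=False).tolist())
    return sorted(chosen)

# Synthetic stand-in data: 50 "cases" and 500 candidate controls, 4 covariates each.
rng = np.random.default_rng(1)
cases = rng.normal(0.0, 1.0, size=(50, 4))
candidates = rng.normal(0.2, 1.2, size=(500, 4))
both = np.vstack([cases, candidates])
z = (both - both.mean(axis=0)) / both.std(axis=0)  # standardise the covariates
weights = train_som(z)
nodes = best_matching_units(z, weights)
controls = select_controls(nodes[:50], nodes[50:])  # indices into `candidates`
```

Sampling within nodes keeps the selected controls close to the cases on all covariates at once, which is the property the abstract attributes to the method; a real analysis would also need to handle categorical covariates and check covariate balance after selection.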