Research on The Genealogical Inference Efficiency of High-density SNPs

Jing LI; Yi-Jie SUN; Wen-Ting ZHAO; Zi-Chen TANG; Jing LIU; Cai-Xia LI

VernacularTitle: 高密度单核苷酸多态性的系谱推断效能研究
Author: Jing LI ¹ ; Yi-Jie SUN ² ; Wen-Ting ZHAO ³ ; Zi-Chen TANG ⁴ ; Jing LIU ³ ; Cai-Xia LI ¹
Author Information

1. College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Hebei Medical University, Shijiazhuang 050017, China
2. Institute of Criminal Investigation, People’s Public Security University of China, Beijing 100038, China
3. Key Laboratory of Forensic Genetics, Beijing Engineering Research Center of Crime Scene Evidence Examination, Institute of Forensic Science, Beijing 100038, China
4. Jiangsu Key Laboratory of Phylogenomics and Comparative Genomics, School of Life Sciences, Jiangsu Normal University, Xuzhou 221000, China
Publication Type: Journal Article
Keywords: high-density SNPs panel; forensic SNP genealogy inference; whole genome sequencing; genealogy inference; identity by descent algorithm
From: Progress in Biochemistry and Biophysics 2026;53(3):740-753
Country: China
Language: Chinese
Abstract: Objective: This study explores how well single-nucleotide polymorphism (SNP) locus combinations of different orders of magnitude predict distant kinship. A high-density SNP locus set was constructed and its inference capability was comprehensively assessed.
Methods: First, three commercial chip panels were selected: CGA (Chinese genotyping array, Illumina), GSA (Global screening array, Illumina), and Affy (23MF_V2 high-density SNP array, Affymetrix). After quality control, they were merged into a high-density SNP locus panel (1 180 k). Second, peripheral blood was collected from 161 individuals and subjected to whole-genome sequencing. The kinship relationships among these samples covered the full range from degree 1 to degree 9, with more than 50 kinship pairs at each degree. The 1 180 k loci were extracted from the whole-genome sequencing data of the 161 samples; this extracted set is referred to below as the high-density SNP locus set. Kinship was inferred using the identity-by-descent (IBD) algorithm with the selected optimal parameters. To evaluate the performance of the high-density SNP locus set in kinship inference, it was compared with the three commercial chip panels, the intersection of the loci of the three chips, and control sets constructed by randomly reducing the number of loci in the high-density SNP locus set. Its kinship inference capability was assessed from the changes in IBD length and the trends in prediction accuracy.
Results: After screening, a set of 1 184 334 autosomal SNPs was obtained. When screening for the optimal IBD length threshold, 0 cM, 1 cM, and 2 cM all showed good applicability. However, to avoid the large amount of redundant information produced by an overly low IBD length threshold, 2 cM was selected as the optimal threshold. Compared with the average results of the three chip panels, the high-density SNP locus set increased both the total IBD length and the average IBD length across degrees 1-9; the confidence interval accuracy for degree 8 was 70.97%, an improvement of 3.50%; the average confidence interval accuracy for degrees 1-8 was 91.39%, an increase of 1.00%; and the false negative rates at degrees 8 and 9 were reduced by 2.42% and 6.76%, respectively. The system efficacy of the high-density SNP locus set for kinship inference of first- to eighth-degree relationships reached 98.91%. Random reduction of the high-density SNP locus set showed that the detection efficiency of IBD length rose significantly as the number of SNPs in the panel increased. The accuracy of kinship prediction and the confidence interval accuracy likewise increased steadily as more loci were added.
Conclusion: The high-density SNP panel significantly enhances the efficacy of distant kinship inference and accurately covers the kinship degrees tested, with the average confidence interval accuracy for first- to eighth-degree relationships stably above 90%. Increasing the number of SNPs in the panel can improve the ability to predict distant kinship.
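The locus-set comparisons and the IBD length threshold described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy rsIDs, the segment lengths, and the choice to treat the threshold as inclusive (length >= threshold) are all assumptions made for illustration.

```python
import random


def merge_panels(*panels):
    """Union of SNP identifiers across chip panels (the 'merged' panel)."""
    return set().union(*panels)


def shared_loci(*panels):
    """Intersection of SNP identifiers across chip panels."""
    return set.intersection(*(set(p) for p in panels))


def downsample(loci, n, seed=0):
    """Random control set of n loci drawn from a larger locus set."""
    rng = random.Random(seed)          # fixed seed keeps the draw reproducible
    return set(rng.sample(sorted(loci), n))


def summarize_ibd(segments_cm, min_cm=2.0):
    """Keep IBD segments of at least min_cm and report their total and mean length.

    Assumption: the threshold is inclusive; the abstract does not say.
    """
    kept = [s for s in segments_cm if s >= min_cm]
    total = sum(kept)
    mean = total / len(kept) if kept else 0.0
    return {"n_segments": len(kept), "total_cm": total, "mean_cm": mean}


if __name__ == "__main__":
    # Hypothetical toy panels; real panels hold hundreds of thousands of loci.
    cga = {"rs1", "rs2", "rs3"}
    gsa = {"rs2", "rs3", "rs4"}
    affy = {"rs3", "rs4", "rs5"}

    merged = merge_panels(cga, gsa, affy)
    print(len(merged), sorted(shared_loci(cga, gsa, affy)))
    print(sorted(downsample(merged, 3)))

    # Hypothetical IBD segment lengths (cM) for one pair of individuals.
    print(summarize_ibd([0.8, 1.5, 2.0, 3.5, 12.0], min_cm=2.0))
```

Lowering `min_cm` to 0 or 1 retains more short segments, which corresponds to the redundancy concern the abstract gives for choosing 2 cM.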