Development and validation of colorectal cancer risk prediction model based on the big data in laboratory medicine
10.3760/cma.j.cn114452-20210630-00408
- VernacularTitle:基于检验大数据的结直肠癌风险预测模型建立与验证
- Author:Jie GUO¹; Haidong LIU; Qin WEI; Zehui CHEN; Jianying WANG; Fan YANG; Shanrong LIU
Author Information
1. Department of Laboratory Diagnostics, First Affiliated Hospital of Naval Medical University, Shanghai 200433, China
- Keywords:Colorectal cancer; Machine learning; Laboratory diagnosis; Big data; Risk prediction
- From:Chinese Journal of Laboratory Medicine 2021;44(10):914-920
- Country:China
- Language:Chinese
- Abstract:
Objective: To explore a colorectal cancer risk prediction model using machine learning algorithms based on big data in laboratory medicine.

Methods: Patients who underwent colonoscopy at Shanghai Changhai Hospital from January 1, 2013 to June 30, 2019, together with outpatients and inpatients from January 1, 2010 to June 30, 2019, were divided into a colorectal cancer group and a non-colorectal cancer group, labeled according to colonoscopy combined with pathology or according to ICD-10 codes. Four machine learning algorithms, extreme gradient boosting (XGBoost), artificial neural network (ANN), support vector machine (SVM) and random forest (RF), were used to mine the data from all routine laboratory test items of the enrolled patients, select model features and establish a classification model for colorectal cancer. The effectiveness of the model was then prospectively validated in patients from across Changhai Hospital from July 1, 2019 to August 31, 2020.

Results: A colorectal cancer risk prediction model (CRC-Lab7) was constructed with the XGBoost algorithm. It includes 7 features: fecal occult blood, carcinoembryonic antigen, red blood cell distribution width, lymphocyte count, albumin/globulin ratio, high-density lipoprotein cholesterol and hepatitis B virus core antibody. The AUCs of the model in the validation set and the prospective validation set were 0.799 and 0.816, respectively, significantly higher than those of fecal occult blood (0.68 and 0.706, respectively). The model also showed high diagnostic accuracy for colorectal cancer in patients with negative fecal occult blood results and in patients younger than 50 years.

Conclusion: In this study, a colorectal cancer risk prediction model was established by mining routine laboratory big data. The model's performance is better than that of fecal occult blood, and it has high diagnostic accuracy for colorectal cancer in patients with negative fecal occult blood results and in patients younger than 50 years.
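The abstract compares the model and fecal occult blood by AUC. As a rough illustration of what that metric measures, the sketch below computes AUC as the probability that a randomly chosen cancer case receives a higher score than a randomly chosen non-cancer case. The function name `roc_auc` and all data values are hypothetical and invented for illustration; they are not the study's data, code, or results.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = colorectal cancer, 0 = not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical continuous risk scores from a multi-feature model.
model_score = [0.9, 0.8, 0.6, 0.4, 0.5, 0.3, 0.2, 0.1]
# Hypothetical binary fecal occult blood results (1 = positive).
fobt_result = [1, 1, 0, 0, 1, 0, 0, 0]

auc_model = roc_auc(labels, model_score)
auc_fobt = roc_auc(labels, fobt_result)
print(f"model AUC: {auc_model:.4f}, fecal occult blood AUC: {auc_fobt:.4f}")
```

A binary test such as fecal occult blood produces many tied pairs, which is one reason a continuous multi-feature score can reach a higher AUC on the same patients.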