Computational Discrimination of Breast Cancer for Korean Women Based on Epidemiologic Data Only.
10.3346/jkms.2015.30.8.1025
- Author:Chiwon LEE (1) ; Jung Chan LEE ; Boyoung PARK ; Jonghee BAE ; Min Hyuk LIM ; Daehee KANG ; Keun Young YOO ; Sue K PARK ; Youdan KIM ; Sungwan KIM
Author Information
1. The Interdisciplinary Program for Bioengineering, Graduate School, Seoul National University, Seoul, Korea.
- Publication Type:Original Article ; Research Support, Non-U.S. Gov't
- Keywords:
Breast Neoplasms;
Support Vector Machines;
Neural Networks;
Computers
- MeSH:
Adult;
Aged;
Aged, 80 and over;
Breast Neoplasms/*diagnosis/*epidemiology;
Diagnosis, Computer-Assisted/*methods;
Early Detection of Cancer/*methods;
Female;
Humans;
*Machine Learning;
Middle Aged;
Pattern Recognition, Automated/methods;
Prevalence;
Reproducibility of Results;
Republic of Korea/epidemiology;
Risk Assessment/methods;
Risk Factors;
Sensitivity and Specificity;
Women's Health/*statistics & numerical data
- From:Journal of Korean Medical Science
2015;30(8):1025-1034
- Country:Republic of Korea
- Language:English
- Abstract:
Breast cancer is the second leading cancer among Korean women, and its incidence rate has been increasing annually. If early diagnosis could be implemented with epidemiologic data, women could easily assess their breast cancer risk over the internet. The National Cancer Institute in the United States has released a web-based Breast Cancer Risk Assessment Tool based on the Gail model. However, it is not directly applicable to Korean women because breast cancer risk depends on race, and it also shows low accuracy (58%-59%). In this study, breast cancer discrimination models for Korean women are developed using only epidemiological case-control data (n = 4,574). The models are built with three classification techniques: support vector machine, artificial neural network, and Bayesian network. Random sub-sampling validation, repeated 1,000 times, is performed for each of a range of parameter conditions. Performance is evaluated and compared using the area under the receiver operating characteristic curve (AUC). The AUC, accuracy, sensitivity, specificity, and calculation time of all models were calculated and compared by age group and classification technique. Although the support vector machine took the longest calculation time, it achieved the highest classification performance, in women older than 50 yr (AUC = 64%). The proposed model relies on demographic characteristics, reproductive factors, and lifestyle habits, without any clinical or genetic test. The model is expected to be implementable as a web-based discrimination tool for breast cancer. Such a tool could encourage women prone to breast cancer to go to the hospital for diagnostic tests.
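The validation scheme named in the abstract, repeated random sub-sampling scored by AUC, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the data are synthetic, the 30% test fraction and 100 repetitions are arbitrary, and the nearest-centroid style scorer `fit_centroid_scorer` is a hypothetical stand-in for the support vector machine, artificial neural network, and Bayesian network models.

```python
import random
from statistics import mean


def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid_scorer(X, y):
    """Hypothetical stand-in classifier (not one of the paper's models).

    Scores a sample by projecting it onto the difference of class means.
    """
    dim = len(X[0])
    mu1 = [mean(x[j] for x, t in zip(X, y) if t == 1) for j in range(dim)]
    mu0 = [mean(x[j] for x, t in zip(X, y) if t == 0) for j in range(dim)]
    w = [a - b for a, b in zip(mu1, mu0)]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def repeated_subsampling_auc(X, y, n_repeats=100, test_fraction=0.3, seed=0):
    """Mean test AUC over repeated random train/test splits.

    n_repeats and test_fraction are illustrative defaults, not study values.
    """
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_test = int(len(X) * test_fraction)
    aucs = []
    for _ in range(n_repeats):
        rng.shuffle(idx)  # draw a fresh random split on every repetition
        test, train = idx[:n_test], idx[n_test:]
        score = fit_centroid_scorer([X[i] for i in train],
                                    [y[i] for i in train])
        aucs.append(auc([score(X[i]) for i in test],
                        [y[i] for i in test]))
    return mean(aucs)


def make_synthetic(n=200, seed=1):
    """Synthetic two-feature data with weak class separation (not study data)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate controls (0) and cases (1)
        X.append([rng.gauss(0.5 * label, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    print("mean AUC:", round(repeated_subsampling_auc(X, y), 3))
```

Averaging over many random splits reduces the dependence of the reported AUC on any single partition of the data, which is presumably why the study repeats the split 1,000 times per parameter condition.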