Canregtools: a tool package for routine statistical analysis of Chinese population-based cancer registry data based on R language
10.3760/cma.j.cn112152-20241226-00592
- VernacularTitle:Canregtools:基于R语言的中国肿瘤登记常规统计分析工具包
- Author:Qiong CHEN 1; Rongshou ZHENG; Shuzheng LIU; Hongwei LIU; Yin LIU; Ranran QIE; Shaokai ZHANG
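- Author Information:1. Department of Disease Prevention and Control, The Affiliated Cancer Hospital of Zhengzhou University (Henan Cancer Hospital); Henan Engineering Research Center of Cancer Prevention and Control; Henan International Joint Laboratory of Cancer Prevention, Zhengzhou 450008, China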
- Publication Type:Journal Article
- Keywords:Neoplasms; Registration; Software; R package; China
- From:Chinese Journal of Oncology 2025;47(11):1074-1079
- Country:China
- Language:Chinese
- Abstract:
Objective: To develop an R-based tool package that meets the routine statistical analysis requirements of population-based cancer registries in China, with the aim of improving data quality and efficiency and promoting the scientific use of cancer registry data nationwide.
Methods: The statistical analysis needs of population-based cancer registry staff were collected through questionnaires or face-to-face interviews. Functions were built on the generic functions of R's S3 object system: specific S3 classes were defined for different data types, so that the same function performs different tasks depending on the class of its input. A stepwise development strategy was adopted to ensure logical coherence among functional modules, and all functions were systematically tested and validated in accordance with standard R package development guidelines.
Results: Six categories of functions were developed to support routine statistical analysis of population-based cancer registry data: data reading, data manipulation, data processing, statistical calculation, visualization, and statistical reporting. Data reading functions read the data formats required by the National Cancer Registry. Data manipulation functions enable conditional filtering of registry data and support regrouping, merging, or transforming the data by registry attributes (such as urban/rural location) to accommodate different analytical needs. Data processing functions include age grouping, International Classification of Diseases, 10th Revision (ICD-10) classification, childhood cancer classification, and population estimation. Statistical calculation functions compute age-standardized rates, truncated rates, cumulative rates, cumulative risks, and life tables, and expand abridged life tables to complete life tables. Visualization functions generate commonly used statistical charts, including population pyramids, bar charts, and line graphs. Statistical reporting functions integrate key indicators, charts, and narrative descriptions into comprehensive cancer registry reports.
Conclusion: An R package named Canregtools was developed based on the concept of S3 generic functions. The package is free of charge, open-source, and highly efficient. Through standardized data processing workflows, it can meet diverse needs in cancer registry data analysis, visualization, and reporting, thereby enhancing the quality and efficiency of routine statistical analysis in population-based cancer registries in China.
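
For readers unfamiliar with the indicators named in the Results, the following is a minimal sketch of the standard arithmetic behind a directly age-standardized rate, a truncated rate, a cumulative rate, and a cumulative risk. It is written in Python for illustration only and does not reproduce Canregtools' R functions or API. All function names, the four coarse age bands, the case and population counts, and the standard-population weights are assumptions made up for this example.

```python
import math

# Made-up illustrative inputs for four hypothetical 20-year age bands
# (0-19, 20-39, 40-59, 60-79); these are not registry data.
cases = [2, 10, 40, 120]
population = [50_000, 40_000, 20_000, 10_000]
band_width_years = [20, 20, 20, 20]
# Hypothetical standard-population weights (not the Segi or WHO standard).
std_weights = [40, 30, 20, 10]


def age_specific_rates(cases, population):
    """Rate per person-year in each age band."""
    return [c / p for c, p in zip(cases, population)]


def standardized_rate(rates, weights, per=100_000):
    """Direct standardization: weighted mean of age-specific rates."""
    return per * sum(r * w for r, w in zip(rates, weights)) / sum(weights)


def truncated_rate(rates, weights, first, last, per=100_000):
    """Standardized rate restricted to bands first..last (inclusive),
    with the weights renormalized over that range."""
    return standardized_rate(rates[first:last + 1], weights[first:last + 1], per)


def cumulative_rate(rates, widths):
    """Sum of age-specific rates, each multiplied by its band width in years."""
    return sum(r * w for r, w in zip(rates, widths))


def cumulative_risk(cum_rate):
    """Convert a cumulative rate to a probability: 1 - exp(-cumulative rate)."""
    return 1.0 - math.exp(-cum_rate)


rates = age_specific_rates(cases, population)
crude = 100_000 * sum(cases) / sum(population)
asr = standardized_rate(rates, std_weights)
trunc = truncated_rate(rates, std_weights, first=1, last=2)
cum_rate = cumulative_rate(rates, band_width_years)
cum_risk = cumulative_risk(cum_rate)

print(f"crude rate per 100,000:        {crude:.1f}")
print(f"age-standardized per 100,000:  {asr:.1f}")
print(f"truncated (bands 1-2):         {trunc:.1f}")
print(f"cumulative rate:               {cum_rate:.4f}")
print(f"cumulative risk:               {cum_risk:.4f}")
```

In practice registries typically use 5-year age bands and a published standard population; the coarse bands here only keep the arithmetic easy to check by hand.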