Rapid identification of geographic origins of Zingiberis Rhizoma by NIRS combined with chemometrics and machine learning algorithms.
10.19540/j.cnki.cjcmm.20220514.103
- Author:
Dai-Xin YU 1;
Sheng GUO 1;
Xia ZHANG 2;
Hui YAN 1;
Zhen-Yu ZHANG 1;
Hai-Yang LI 1;
Jian YANG 3;
Jin-Ao DUAN 1
Author Information
1. National and Local Collaborative Engineering Center of Chinese Medicinal Resources Industrialization and Formulae Innovative Medicine/Jiangsu Collaborative Innovation Center of Chinese Medicinal Resources Industrialization/Jiangsu Key Laboratory for High Technology Research of Traditional Chinese Medicine Formulae, Nanjing University of Chinese Medicine, Nanjing 210023, China.
2. College of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, China.
3. State Key Laboratory Breeding Base of Dao-di Herbs, National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China.
- Publication Type:Journal Article
- Keywords:
NIRS;
Zingiberis Rhizoma;
chemometrics;
machine learning algorithms;
traceability
- MeSH:
Algorithms;
Chemometrics;
China;
Ginger;
Least-Squares Analysis;
Plant Extracts;
Principal Component Analysis;
Support Vector Machine
- From:
China Journal of Chinese Materia Medica
2022;47(17):4583-4592
- Country:China
- Language:Chinese
- Abstract:
In this study, near-infrared spectroscopy (NIRS) was used to acquire spectral information from 280 batches of Zingiberis Rhizoma samples from nine producing areas. Multiple chemometric and machine learning methods, namely principal component analysis (PCA), partial least squares-discriminant analysis (PLS-DA), orthogonal partial least squares-discriminant analysis (OPLS-DA), K-nearest neighbors (KNN), support vector machine (SVM), random forest (RF), artificial neural network (ANN), and gradient boosting (GB), were applied to trace the origins. Spectral preprocessing by standard normal variate (SNV) transformation coupled with the first derivative gave a discriminative accuracy of 93.9% and could be used to construct the discrimination model. PCA and PLS-DA score plots showed that samples from Shandong, Sichuan, Yunnan, and Guizhou could be effectively distinguished, whereas samples from the remaining areas partially overlapped. With the machine learning algorithms, the AUC values of KNN, SVM, RF, ANN, and GB were 0.96, 0.99, 0.99, 0.99, and 0.98, respectively, with overall prediction accuracies of 83.3%, 89.3%, 90.5%, 91.7%, and 89.3%. These results indicated that the developed models were reliable and that combining machine learning algorithms with NIRS is feasible for origin identification. OPLS-DA showed that Zingiberis Rhizoma from Sichuan (the genuine producing area) could be significantly distinguished from that of other regions with good discriminative accuracy, suggesting that the NIRS method established in this study, combined with chemometrics, can be used to identify Zingiberis Rhizoma from Sichuan. This study established a rapid, nondestructive identification method with reliable data analysis for tracing the origin of Zingiberis Rhizoma, which is expected to provide a new approach for the origin tracing of Chinese medicinal materials.
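
The spectral preprocessing named in the abstract (standard normal variate transformation followed by a first derivative) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `snv`, `first_derivative`, and `preprocess` are hypothetical, the derivative is taken by simple finite differences because the abstract does not state which derivative method was used, and the spectra are synthetic.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def first_derivative(spectra):
    """First derivative along the wavelength axis (finite differences, an assumption)."""
    return np.gradient(np.asarray(spectra, dtype=float), axis=1)


def preprocess(spectra):
    """SNV followed by the first derivative, in the order named in the abstract."""
    return first_derivative(snv(spectra))


# Synthetic demo (not data from the study): one underlying band shape recorded
# with different multiplicative and additive offsets, mimicking scatter effects.
base = np.sin(np.linspace(0.0, 3.0, 50))
raw = np.vstack([a * base + b for a, b in [(1.0, 0.0), (2.0, 0.5), (0.5, -1.0)]])
processed = preprocess(raw)
```

Because SNV removes each spectrum's own offset and scale, the three synthetic rows become identical after preprocessing, which is the scatter-correction effect the transformation is meant to provide before a classifier is trained.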