A Variable Selection Method Based on Mayfly Algorithm for Near-infrared Spectroscopy

Ruo-Xin WANG; Guang-He YAN; Peng LIU; Yan ZHANG; Xi-Hui BIAN

Vernacular Title: 基于蜉蝣算法的近红外光谱变量选择方法研究
Author: Ruo-Xin WANG¹; Guang-He YAN; Peng LIU; Yan ZHANG; Xi-Hui BIAN
Author Information

1. Tiangong University, Tianjin Key Laboratory of Green Chemical Process Engineering, Tianjin 300387, China
Keywords: Near-infrared spectroscopy; Variable selection; Mayfly algorithm; Partial least squares; Swarm intelligence optimization
From: Chinese Journal of Analytical Chemistry 2024;52(11):1717-1725
Country: China
Language: Chinese
Abstract: Near-infrared (NIR) spectroscopy has become a widely used analytical technique for qualitative and quantitative analysis of complex systems because it is simple, rapid, and non-destructive. However, NIR spectra often contain numerous redundant wavelengths that are not correlated with the target components, which reduces the prediction accuracy of the model. Therefore, it is necessary to select spectral variables before modeling. In this research, a discretized mayfly algorithm (MA) was first developed for quantitative analysis by NIR spectroscopy. The MA simulates the courtship and mating behavior of mayflies. Initially, equal numbers of male and female mayflies were set. The positions of the mayflies were updated and discretized. The mayflies produced 20 offspring through mating and mutation, and these offspring were added to the initial population of search agents. To evaluate the performance of the MA, NIR data of corn and adulterated vegetable oils were used for partial least squares (PLS) modeling. The influence of the gravity coefficient, the number of iterations, and the population size of the MA was investigated. The MA-PLS model was compared with the full-spectrum PLS model. The results showed that the root mean square error of prediction (RMSEP) of the MA-PLS model for oil, moisture, protein, and starch contents in the corn dataset decreased by 30.59%, 40.24%, 36.96%, and 27.93%, respectively, compared with PLS, and that the RMSEP of MA-PLS for perilla seed oil, soybean oil, corn oil, and cottonseed oil in the adulterated vegetable oil dataset decreased by 83.85%, 90.90%, 81.60%, and 92.18%, respectively, compared with PLS. In addition, MA-PLS used fewer variables than PLS. Therefore, MA can effectively reduce the complexity of the PLS model and improve its prediction accuracy.
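
As an illustration of the quantities named in the abstract, the following minimal Python sketch shows how an RMSEP value and its percentage decrease relative to a full-spectrum model could be computed, and how a continuous mayfly position might be discretized into a 0/1 wavelength mask. The function names (rmsep, percent_decrease, discretize, select_variables) are hypothetical, and the sigmoid transfer rule is an assumption borrowed from common binary swarm optimizers; the abstract does not state which discretization rule the authors used.

```python
import math
import random


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a prediction set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def percent_decrease(rmsep_full, rmsep_selected):
    """Relative RMSEP reduction of the variable-selected model, in percent."""
    return 100.0 * (rmsep_full - rmsep_selected) / rmsep_full


def discretize(position, rng):
    """Map a continuous position vector to a 0/1 wavelength mask.

    Assumption: sigmoid transfer function compared against a uniform random
    number. This is a common choice, not necessarily the paper's rule.
    """
    mask = []
    for x in position:
        prob = 1.0 / (1.0 + math.exp(-x))  # probability of keeping the wavelength
        mask.append(1 if rng.random() < prob else 0)
    return mask


def select_variables(spectrum, mask):
    """Keep only the absorbance values whose mask entry is 1."""
    return [value for value, keep in zip(spectrum, mask) if keep]


if __name__ == "__main__":
    rng = random.Random(0)               # seeded for reproducibility
    position = [2.0, -1.5, 0.3, -3.0]    # hypothetical mayfly position
    mask = discretize(position, rng)
    print(mask, select_variables([0.11, 0.25, 0.31, 0.42], mask))
    print(percent_decrease(2.0, 1.0))    # 50.0
```

In a full workflow, each mask would be scored by fitting a PLS model on the selected wavelengths and using the resulting prediction error as the fitness of that mayfly; that step is omitted here because it depends on the PLS implementation and data split, which the abstract does not specify.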