1.Data mining in traditional Chinese medicine product quality review.
Sheng ZHANG ; Hou-Liu CHEN ; Hai-Bin QU
China Journal of Chinese Materia Medica 2023;48(5):1264-1272
Traditional Chinese medicine (TCM) enterprises have accumulated a large amount of product quality review (PQR) data. Mining these data can reveal the hidden knowledge in production and help improve pharmaceutical manufacturing technology. However, there are few studies involving the mining of PQR data and thus enterprises lack the guidance to analyze the data. This study proposed a method to mine the PQR data, which consisted of 4 functional modules: data collection and preprocessing, risk classification of variables, risk evaluation by batches, and the regression analysis of quality. Further, we carried out a case study of the formulation process of a TCM product to illustrate the method. In the case study, the data of 398 batches of products during 2019-2021 were collected, which contained 65 process variables. The risks of variables were classified according to the process performance index. The risk of each batch was analyzed through short-term and long-term evaluation, and the critical variables with the strongest impact on the product quality were identified by partial least squares regression. The results showed that 1 variable and 13 batches were of high risk, and the critical process variable was the quality of the intermediates. The proposed method enables enterprises to comprehensively mine the PQR data and helps to enhance the process understanding and improve the quality control.
Medicine, Chinese Traditional
;
Drugs, Chinese Herbal
;
Data Mining/methods*
;
Quality Control
;
Technology, Pharmaceutical
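As a rough illustration of the risk-classification step described in record 1, the sketch below computes a process performance index (Ppk) for one variable and maps it to a risk class. The abstract gives neither the specification limits nor the classification thresholds, so the limits, the cut-offs of 1.0 and 1.33, the function names, and the sample values are all assumptions made up for illustration.

```python
from statistics import mean, stdev


def ppk(values, lsl, usl):
    """Process performance index using the overall sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    # Ppk is the distance from the mean to the nearer limit, in units of 3 sigma.
    return min((usl - m) / (3 * s), (m - lsl) / (3 * s))


def risk_class(index, high_below=1.0, low_from=1.33):
    """Map an index to a risk label. The thresholds are assumed, not from the study."""
    if index < high_below:
        return "high"
    if index < low_from:
        return "medium"
    return "low"


# Made-up measurements of one process variable across five batches.
batch_values = [98.0, 100.0, 102.0, 100.0, 100.0]
index = ppk(batch_values, lsl=94.0, usl=106.0)
print(round(index, 3), risk_class(index))
```

A lower Ppk means the variable sits closer to, or spreads further toward, a specification limit, which is why it can serve as a per-variable risk score.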
2.Examining patterns of traditional Chinese medicine use in pediatric oncology: A systematic review, meta-analysis and data-mining study.
Chun Sing LAM ; Li Wen PENG ; Lok Sum YANG ; Ho Wing Janessa CHOU ; Chi-Kong LI ; Zhong ZUO ; Ho-Kee KOON ; Yin Ting CHEUNG
Journal of Integrative Medicine 2022;20(5):402-415
BACKGROUND:
Traditional Chinese medicine (TCM) is becoming a popular complementary approach in pediatric oncology. However, few or no meta-analyses have focused on clinical studies of the use of TCM in pediatric oncology.
OBJECTIVE:
We explored the patterns of TCM use and its efficacy in children with cancer, using a systematic review, meta-analysis and data mining study.
SEARCH STRATEGY:
We conducted a search of five English (Allied and Complementary Medicine Database, Embase, PubMed, Cochrane Central Register of Controlled Trials, and ClinicalTrials.gov) and four Chinese databases (Wanfang Data, China National Knowledge Infrastructure, Chinese Biomedical Literature Database, and VIP Chinese Science and Technology Periodicals Database) for clinical studies published before October 2021, using keywords related to "pediatric," "cancer," and "TCM."
INCLUSION CRITERIA:
We included studies which were randomized controlled trials (RCTs) or observational clinical studies, focused on patients aged < 19 years old who had been diagnosed with cancer, and included at least one group of subjects receiving TCM treatment.
DATA EXTRACTION AND ANALYSIS:
The methodological quality of RCTs and observational studies was assessed using the six-item Jadad scale and the Effective Public Health Practice Project Quality Assessment Tool, respectively. Meta-analysis was used to evaluate the efficacy of combining TCM with chemotherapy. Study outcomes included the treatment response rate and occurrence of cancer-related symptoms. Association rule mining (ARM) was used to investigate the associations among medicinal herbs and patient symptoms.
RESULTS:
The 54 studies included in this analysis comprised RCTs (63.0%) and observational studies (37.0%). Most RCTs focused on hematological malignancies (41.2%). The study outcomes included chemotherapy-induced toxicities (76.5%), infection rate (35.3%), and response, survival or relapse rate (23.5%). The methodological quality of most of the RCTs (82.4%) and observational studies (80.0%) was rated as "moderate." In studies of leukemia patients, adding TCM to conventional treatment significantly improved the clinical response rate (odds ratio [OR] = 2.55; 95% confidence interval [CI] = 1.49-4.36), lowered infection rate (OR = 0.23; 95% CI = 0.13-0.40), and reduced nausea and vomiting (OR = 0.13; 95% CI = 0.08-0.23). ARM showed that Radix Astragali, the most commonly used medicinal herb (58.0%), was associated with treating myelosuppression, gastrointestinal complications, and infection.
CONCLUSION
There is growing evidence that TCM is an effective adjuvant therapy for children with cancer. We proposed a checklist to improve the quality of TCM trials in pediatric oncology. Future work will examine the use of ARM techniques on real-world data to evaluate the efficacy of medicinal herbs and drug-herb interactions in children receiving TCM as a part of integrated cancer therapy.
Adult
;
Child
;
China
;
Combined Modality Therapy
;
Complementary Therapies
;
Data Mining
;
Drugs, Chinese Herbal/therapeutic use*
;
Humans
;
Medicine, Chinese Traditional/methods*
;
Observational Studies as Topic
;
Randomized Controlled Trials as Topic
;
Young Adult
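The association rule mining mentioned in record 2 rests on three standard measures: support, confidence, and lift. The sketch below computes them for a single rule over a handful of records. The item names (herb_A, symptom_X, and so on) and the records are invented placeholders, not data from the review, and the function name is hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    c = frozenset(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)         # records containing the antecedent
    n_c = sum(1 for t in transactions if c <= t)         # records containing the consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # records containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Invented records: each set lists the herbs and symptoms noted in one study arm.
records = [
    {"herb_A", "symptom_X"},
    {"herb_A", "symptom_X"},
    {"herb_A", "symptom_Y"},
    {"herb_B", "symptom_X"},
]
print(rule_metrics(records, {"herb_A"}, {"symptom_X"}))
```

A lift above 1 suggests the herb and symptom co-occur more often than chance would predict; in this toy data the lift is below 1, so the rule would not be reported as an association.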
3.Screening and identification of key genes ATP1B3 and ENAH in the progression of hepatocellular carcinoma: based on data mining and clinical validation.
Xue Jia YANG ; Yu Jie LI ; Deng Qiang WU ; Yi Li MA ; Su Fang ZHOU
Journal of Southern Medical University 2022;42(6):815-823
OBJECTIVE:
To explore the marker genes correlated with the prognosis, progression and clinical diagnosis of hepatocellular carcinoma (HCC) based on bioinformatics methods.
METHODS:
The TCGA-LIHC, GSE84432, GSE143233 and GSE63898 datasets from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) were analyzed. The differentially expressed genes (DEGs) shared by different disease types were obtained using GEO2R and the edgeR package, and Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of the DEGs were performed. The expression levels of these DEGs in normal and cancerous tissues were verified in TCGA-LIHC to identify the upregulated genes in HCC. Survival analysis, receiver-operating characteristic (ROC) curve analysis, and correlation analysis between the key genes and the clinical features of the patients were carried out using the R language. The differential expressions of 15 key genes were verified in clinical samples of HCC and adjacent tissues using RT-qPCR.
RESULTS:
A total of 118 common DEGs were obtained in the database, and among them two genes, namely ATPase Na+/K+ transporting subunit beta 3 (ATP1B3) and actin regulator (ENAH), showed increased expressions with disease progression. Survival analysis combined with the TCGA-LIHC dataset suggested that high expressions of ATP1B3 and ENAH were both significantly correlated with a poor prognosis of HCC patients (P < 0.05), and their AUC values were 0.821 and 0.933, respectively. A high expression of ATP1B3 was correlated with T stage, pathological stage and pathological grade of the tumors (P < 0.05), while that of ENAH was associated only with an advanced tumor grade (P < 0.05). The results of RT-qPCR showed that ATP1B3 and ENAH were both significantly upregulated in clinical HCC tissues (P < 0.05).
CONCLUSION
ATP1B3 and ENAH are both upregulated in HCC, and their high expressions may serve as biomarkers of progression of liver diseases and a poor prognosis of HCC.
Carcinoma, Hepatocellular/pathology*
;
Data Mining
;
Gene Expression Profiling/methods*
;
Gene Expression Regulation, Neoplastic
;
Humans
;
Liver Neoplasms/pathology*
;
Microfilament Proteins/metabolism*
;
Sodium-Potassium-Exchanging ATPase/metabolism*
4.OryzaGP: rice gene and protein dataset for named-entity recognition
Pierre LARMANDE ; Huy DO ; Yue WANG
Genomics & Informatics 2019;17(2):e17-
Text mining has become an important research method in biology, with its original purpose to extract biological entities, such as genes, proteins and phenotypic traits, to extend knowledge from scientific papers. However, few thorough studies on text mining and application development for plant molecular biology data have been performed, especially for rice, resulting in a lack of datasets available to solve named-entity recognition tasks for this species. Because few benchmarks are available for rice, we faced various difficulties in exploiting advanced machine learning methods for accurate analysis of the rice literature. To evaluate several approaches to automatically extract information from gene/protein entities, we built a new dataset for rice as a benchmark. This dataset is composed of a set of titles and abstracts, extracted from scientific papers focusing on the rice species and downloaded from PubMed. During the 5th Biomedical Linked Annotation Hackathon, a portion of the dataset was uploaded to PubAnnotation for sharing. Our ultimate goal is to offer a shared task of rice gene/protein name recognition through the BioNLP Open Shared Tasks framework using the dataset, to facilitate an open comparison and evaluation of different approaches to the task.
Benchmarking
;
Biology
;
Data Mining
;
Dataset
;
Machine Learning
;
Methods
;
Molecular Biology
;
Natural Language Processing
;
Oryza
;
Plants
5.Health Information Technology Trends in Social Media: Using Twitter Data
Jisan LEE ; Jeongeun KIM ; Yeong Joo HONG ; Meihua PIAO ; Ahjung BYUN ; Healim SONG ; Hyeong Suk LEE
Healthcare Informatics Research 2019;25(2):99-105
OBJECTIVES: This study analyzed the health technology trends and sentiments of users using Twitter data in an attempt to examine the public's opinions and identify their needs. METHODS: Twitter data related to health technology, from January 2010 to October 2016, were collected. An ontology related to health technology was developed. Frequently occurring keywords were analyzed and visualized with the word cloud technique. The keywords were then reclassified and analyzed using the developed ontology and sentiment dictionary. Python and the R program were used for crawling, natural language processing, and sentiment analysis. RESULTS: In the developed ontology, the keywords are divided into 'health technology' and 'health information'. Under health technology, there are six subcategories, namely, health technology, wearable technology, biotechnology, mobile health, medical technology, and telemedicine. Under health information, there are four subcategories, namely, health information, privacy, clinical informatics, and consumer health informatics. The number of tweets about health technology has consistently increased since 2010; the number of posts in 2014 was double that in 2010, which was about 150 thousand posts. Posts about mHealth accounted for the majority, and the dominant words were 'care', 'new', 'mental', and 'fitness'. Sentiment analysis by subcategory showed that most of the posts in nearly all subcategories had a positive tone with a positive score. CONCLUSIONS: Interest in mHealth has risen recently, and consequently, posts about mHealth were the most frequent. Examining social media users' responses to new health technology can be a useful method to understand the trends in rapidly evolving fields.
Biomedical Technology
;
Biotechnology
;
Boidae
;
Data Mining
;
Informatics
;
Medical Informatics
;
Methods
;
Natural Language Processing
;
Privacy
;
Public Opinion
;
Social Media
;
Telemedicine
6.Classification of Common Relationships Based on Short Tandem Repeat Profiles Using Data Mining
Su Jin JEONG ; Hyo Jung LEE ; Soong Deok LEE ; Seung Hwan LEE ; Su Jeong PARK ; Jong Sik KIM ; Jae Won LEE
Korean Journal of Legal Medicine 2019;43(3):97-105
We reviewed past studies on the identification of familial relationships using 22 short tandem repeat markers. As a result, we can obtain a high discrimination power and a relatively accurate cut-off value in parent-child and full sibling relationships. However, in the case of pairs of uncle-nephew or cousin, we found a limit of low discrimination power of the likelihood ratio (LR) method. Therefore, we compare the LR ranking method and data mining techniques (e.g., logistic regression, linear discriminant analysis, diagonal linear discriminant analysis, diagonal quadratic discriminant analysis, K-nearest neighbor, classification and regression trees, support vector machines, random forest [RF], and penalized multivariate analysis) that can be applied to identify familial relationships, and provide a guideline for choosing the most appropriate model under a given situation. RF, one of the data mining techniques, was found to be more accurate than other methods. The accuracy of RF is 99.99% for parent-child, 99.44% for full siblings, 90.34% for uncle-nephew, and 79.69% for first cousins.
Classification
;
Data Mining
;
Discrimination (Psychology)
;
Forests
;
Humans
;
Logistic Models
;
Methods
;
Microsatellite Repeats
;
Siblings
;
Support Vector Machine
;
Trees
7.Identification and Validation of Circulating MicroRNA Signatures for Breast Cancer Early Detection Based on Large Scale Tissue-Derived Data.
Xiaokang YU ; Jinsheng LIANG ; Jiarui XU ; Xingsong LI ; Shan XING ; Huilan LI ; Wanli LIU ; Dongdong LIU ; Jianhua XU ; Lizhen HUANG ; Hongli DU
Journal of Breast Cancer 2018;21(4):363-370
PURPOSE: Breast cancer is the most commonly occurring cancer among women worldwide, and therefore, improved approaches for its early detection are urgently needed. As microRNAs (miRNAs) are increasingly recognized as critical regulators in tumorigenesis and possess excellent stability in plasma, this study focused on using miRNAs to develop a method for identifying noninvasive biomarkers. METHODS: To discover critical candidates, differential expression analysis was performed on tissue-originated miRNA profiles of 409 early breast cancer patients and 87 healthy controls from The Cancer Genome Atlas database. We selected candidates from the differentially expressed miRNAs and then evaluated every possible molecular signature formed by the candidates. The best signature was validated in independent serum samples from 113 early breast cancer patients and 47 healthy controls using reverse transcription quantitative real-time polymerase chain reaction. RESULTS: The miRNA candidates in our method were revealed to be associated with breast cancer according to previous studies and showed potential as useful biomarkers. When validated in independent serum samples, the area under curve of the final miRNA signature (miR-21-3p, miR-21-5p, and miR-99a-5p) was 0.895. Diagnostic sensitivity and specificity were 97.9% and 73.5%, respectively. CONCLUSION: The present study established a novel and effective method to identify biomarkers for early breast cancer. The method is also suitable for other cancer types. Furthermore, a combination of three miRNAs was identified as a prospective biomarker for breast cancer early detection.
Area Under Curve
;
Biomarkers
;
Biomarkers, Tumor
;
Breast Neoplasms*
;
Breast*
;
Carcinogenesis
;
Data Mining
;
Early Detection of Cancer
;
Female
;
Genome
;
Humans
;
Methods
;
MicroRNAs*
;
Plasma
;
Prospective Studies
;
Real-Time Polymerase Chain Reaction
;
Reverse Transcription
;
Sensitivity and Specificity
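Record 7 reports an area under the curve together with sensitivity and specificity. As a minimal sketch of how these three quantities relate to a set of signature scores, the code below computes the AUC as the fraction of case/control pairs ranked correctly, then evaluates one cut-off. The scores, the cut-off of 0.5, and the function names are invented for illustration and do not come from the study.

```python
def auc(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control."""
    wins = 0.0
    for p in case_scores:
        for q in control_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_specificity(case_scores, control_scores, cutoff):
    """Call a sample positive when its score is at or above the cutoff."""
    true_pos = sum(1 for s in case_scores if s >= cutoff)
    true_neg = sum(1 for s in control_scores if s < cutoff)
    return true_pos / len(case_scores), true_neg / len(control_scores)


# Invented signature scores for three cases and three controls.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
print(auc(cases, controls))
print(sensitivity_specificity(cases, controls, cutoff=0.5))
```

The AUC summarizes ranking quality over all cut-offs, whereas sensitivity and specificity describe a single operating point, so a report usually needs both.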
8.A Study of the Trends in Korean Nursing Research on Critical Care in the Last 10 Years (2008–2017) Using Integrated Review and Key Word Analysis
Jiyeon KANG ; Soo Gyeong KIM ; Young Shin CHO ; Hyunyoung KO ; Ji Hyun BACK ; Su Jin LEE
Journal of Korean Critical Care Nursing 2018;11(2):75-85
PURPOSE: The purpose of this study was to examine the possible direction of critical care nursing research in the future by analyzing the trends of recent Korean studies. METHOD: Using a database search, we selected 263 articles on critical care nursing that were published in Korean journals between 2008 and 2017. Then, we conducted an integrative review of the contents of the selected articles and analyzed the English abstracts using the relevant packages and functions of the R program. RESULTS: The number of studies concerning critical care nursing has increased over the 10-year period, and the specific topic of each study has diversified according to the time at which it was conducted. In terms of quality, the majority of the research was published in high-level academic journals. The key words regularly studied over the past decade were: knowledge, delirium, education, restraint, stress, and infection. Studies related to vancomycin-resistant enterococci infection, compliance, and standards have decreased, while studies related to death, communication, and safety have increased. CONCLUSION: Randomized controlled trials and protocol research for evidence-based critical care need to be conducted, as does research on family involvement. The key word analysis of unstructured text used in this study is a relatively new method; it is suggested that this method be applied to various critical care nursing research and developed methodologically.
Compliance
;
Critical Care Nursing
;
Critical Care
;
Data Mining
;
Delirium
;
Education
;
Humans
;
Korea
;
Methods
;
Nursing Research
;
Nursing
;
Vancomycin-Resistant Enterococci
9.Systematic Review of Data Mining Applications in Patient-Centered Mobile-Based Information Systems.
Mina FALLAH ; Sharareh R NIAKAN KALHORI
Healthcare Informatics Research 2017;23(4):262-270
OBJECTIVES: Smartphones represent a promising technology for patient-centered healthcare. It is claimed that data mining techniques have improved mobile apps to address patients’ needs at subgroup and individual levels. This study reviewed the current literature regarding data mining applications in patient-centered mobile-based information systems. METHODS: We systematically searched PubMed, Scopus, and Web of Science for original studies reported from 2014 to 2016. After screening 226 records at the title/abstract level, the full texts of 92 relevant papers were retrieved and checked against inclusion criteria. Finally, 30 papers were included in this study and reviewed. RESULTS: Data mining techniques have been reported in development of mobile health apps for three main purposes: data analysis for follow-up and monitoring, early diagnosis and detection for screening purpose, classification/prediction of outcomes, and risk calculation (n = 27); data collection (n = 3); and provision of recommendations (n = 2). The most accurate and frequently applied data mining method was support vector machine; however, decision tree has shown superior performance to enhance mobile apps applied for patients’ self-management. CONCLUSIONS: Embedded data-mining-based features in mobile apps, such as case detection, prediction/classification, risk estimation, or collection of patient data, particularly during self-management, would save, apply, and analyze patient data during and after care. More intelligent methods, such as artificial neural networks, fuzzy logic, and genetic algorithms, and even the hybrid methods may result in more patient-centered recommendations, providing education, guidance, alerts, and awareness of personalized output.
Artificial Intelligence
;
Data Collection
;
Data Mining*
;
Decision Trees
;
Delivery of Health Care
;
Early Diagnosis
;
Education
;
Follow-Up Studies
;
Fuzzy Logic
;
Humans
;
Information Systems*
;
Mass Screening
;
Methods
;
Mobile Applications
;
Patient Care
;
Self Care
;
Smartphone
;
Statistics as Topic
;
Support Vector Machine
;
Telemedicine
10.Hierarchical Genetic Algorithm and Fuzzy Radial Basis Function Networks for Factors Influencing Hospital Length of Stay Outliers.
Ahmed BELDERRAR ; Abdeldjebar HAZZAB
Healthcare Informatics Research 2017;23(3):226-232
OBJECTIVES: Controlling hospital high length of stay outliers can provide significant benefits to hospital management resources and lead to cost reduction. The strongest predictive factors influencing high length of stay outliers should be identified to build a high-performance prediction model for hospital outliers. METHODS: We highlight the application of the hierarchical genetic algorithm to provide the main predictive factors and to define the optimal structure of the prediction model fuzzy radial basis function neural network. To establish the prediction model, we used a data set of 26,897 admissions from five different intensive care units with discharges between 2001 and 2012. We selected and analyzed the high length of stay outliers using the trimming method geometric mean plus two standard deviations. A total of 28 predictive factors were extracted from the collected data set and investigated. RESULTS: High length of stay outliers comprised 5.07% of the collected data set. The results indicate that the prediction model can provide effective forecasting. We found 10 common predictive factors within the studied intensive care units. The obtained main predictive factors include patient demographic characteristics, hospital characteristics, medical events, and comorbidities. CONCLUSIONS: The main initial predictive factors available at the time of admission are useful in evaluating high length of stay outliers. The proposed approach can provide a practical tool for healthcare providers, and its application can be extended to other hospital predictions, such as readmissions and cost.
Comorbidity
;
Data Mining
;
Dataset
;
Forecasting
;
Health Personnel
;
Humans
;
Intensive Care Units
;
Length of Stay*
;
Machine Learning
;
Medical Informatics
;
Methods
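Record 10 selects high length-of-stay outliers with a "geometric mean plus two standard deviations" trim point. The abstract does not say how the standard deviation is computed, so the sketch below makes an explicit assumption: both the mean and the standard deviation are taken on log-transformed stays and the result is transformed back. The function names and the length-of-stay values are hypothetical.

```python
import math
from statistics import mean, stdev


def trim_point(los_days):
    """Trim point assumed as exp(mean(ln x) + 2 * sd(ln x)); requires stays > 0."""
    logs = [math.log(d) for d in los_days]
    return math.exp(mean(logs) + 2 * stdev(logs))


def high_outliers(los_days):
    """Return the stays that exceed the trim point."""
    cut = trim_point(los_days)
    return [d for d in los_days if d > cut]


# Invented lengths of stay in days for ten admissions.
stays = [1, 2, 2, 3, 3, 3, 4, 4, 5, 60]
print(round(trim_point(stays), 1), high_outliers(stays))
```

Working on the log scale keeps the trim point from being dragged upward as strongly by the long right tail that length-of-stay data typically shows; a variant that adds two arithmetic standard deviations to the geometric mean would give a different cut-off.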
