Using text mining to identify gap in acquired immunodeficiency syndrome related information dissemination between the official channel delivery and the needs of adolescents
10.3760/cma.j.cn112150-20190816-00663
- VernacularTitle:青少年艾滋病防治投放的核心知识宣传信息与“百度知道”文本挖掘词频对比分析
- Author: Huichao WU 1; Wen SHU; Menglong LI; Ziang LI; Yifei HU
- Author Information:
1. School of Public Health, Capital Medical University, Beijing 100069
- Keywords: Acquired immunodeficiency syndrome; HIV; Official information, education and communication information; “Baidu zhidao” inquiry; Text mining
- From: Chinese Journal of Preventive Medicine 2020;54(6):685-690
- Country: China
- Language:Chinese
- Abstract:
Objective: To identify the gap in HIV/AIDS information dissemination between official channel delivery and the needs of adolescents.
Methods: We crawled all HIV/AIDS queries posted on “Baidu zhidao” up to June 11, 2018. The “Baidu zhidao” inquiries and the information from official public service announcements (hereafter “official delivery”) were the data sources for comparative analysis. Following the official categorization, we classified the text data into four categories: “prevention”, “testing and treatment”, “symptoms and infection”, and “legalization and policies”. Word segmentation was used for text mining and word frequency statistics, and word clouds were used to visualize word frequencies; all comparisons were made after removing uninformative words.
Results: In the official delivery, the prevention category accounted for the largest proportion, 32.3% (n=162), and the legalization and policies category accounted for 14.1% (n=71). In the “Baidu zhidao” inquiries, the testing and treatment category accounted for 51.7% (n=51 264) and the prevention category for 11.4% (n=11 272). Terms shared by the two channels accounted for about 60% (59.3%-63.9%) of term frequency in each category of the official delivery; in the “Baidu zhidao” inquiries, the shared terms made up a smaller proportion and the terms were more diverse. In the inquiries, the corresponding proportion of term frequency was about 45% in “prevention” and “testing and treatment”, 34.3% (n=14 781) in “symptoms and infection”, and 17.0% (n=5 744) in “legalization and policies”.
Conclusion: A large gap was identified between the information available from official sources and the terms used in inquiries, especially in word frequency for the “legalization and policies” and “prevention” categories. This underscores the need for official channels to address the needs and interests of adolescents in the future.
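The comparison described in the Methods (word frequencies after removing uninformative words, then the share of frequency contributed by terms common to both channels) can be illustrated with a minimal Python sketch. This is not the authors' code: the sample sentences, the stop-word list, and the function names `term_frequencies` and `shared_term_share` are all hypothetical, and whitespace splitting stands in for the word segmentation step, which for Chinese text would need a dedicated segmenter.

```python
from collections import Counter

# Hypothetical stop-word list; the study's actual list is not given in the abstract.
STOP_WORDS = {"the", "a", "is", "of", "how", "can", "i"}


def term_frequencies(docs, stop_words=STOP_WORDS):
    """Count terms across documents, skipping stop words.

    Whitespace splitting is an assumption made for this English toy data.
    """
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        counts.update(t for t in tokens if t not in stop_words)
    return counts


def shared_term_share(own, other):
    """Fraction of `own` term frequency that comes from terms also found in `other`."""
    total = sum(own.values())
    if total == 0:
        return 0.0
    shared = sum(n for term, n in own.items() if term in other)
    return shared / total


# Toy stand-ins for one category of each channel (not study data).
official = ["use condoms to prevent hiv", "free hiv testing is available"]
inquiries = ["how can i get hiv testing", "is a rash a symptom of hiv"]

official_tf = term_frequencies(official)
inquiry_tf = term_frequencies(inquiries)

print("official share from shared terms:", shared_term_share(official_tf, inquiry_tf))
print("inquiry share from shared terms:", shared_term_share(inquiry_tf, official_tf))
```

Computing the share separately for each channel reflects the asymmetry reported in the Results: the same set of shared terms can make up a different proportion of each channel's total term frequency.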