1. Similar Drug Proposals Based on Package Inserts Using Latent Semantic Analysis
Misa KIKUCHI ; Rie ITO ; Yuta TANAKA ; Yohsuke SHIMADA ; Satoru GOTO ; Rie OZEKI ; Masayo KOMODA
Japanese Journal of Drug Informatics 2018;20(2):111-119
Objective: The topic model is a well-known method in natural language processing (NLP) that treats a document as constructed of topics, each of which combines specific terms. The method models topic co-occurrence mathematically. In this study, we extracted topics from the feature vectors of explicit documents, namely medical package inserts, by using cluster analysis. Methods: We counted the terms (nouns) recognized by the morphological analysis engine MeCab and created a document-term matrix. A tf-idf value was calculated for each entry of this matrix as a term weight, to offset the effect of raw term frequency. We reduced the dimensionality of the matrix using singular value decomposition, which removed unnecessary data, and extracted a feature vector for each medical package insert. The distance between feature vectors was calculated as the cosine distance, and cluster analysis was performed on these distances. Results: Cluster analysis of our document-term matrix showed that package inserts of drugs with the same efficacy or active ingredient were included in the same cluster. Moreover, by using term weighting and dimensionality reduction, we could extract topics from medical package inserts. Conclusion: We obtained a foothold for applying our findings to the recommendation of similar drugs. Cluster analysis of medical package inserts using NLP can contribute to the proper application of drugs. In addition, our study revealed similarities among drugs and suggested possibilities for new applications from several points of view.
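
The steps named in the Methods (document-term matrix, tf-idf weighting, singular value decomposition, cosine distance, cluster analysis) can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the token lists are made-up stand-ins for the nouns MeCab would extract, the drug names are hypothetical, the tf-idf variant (relative frequency times log(N/df) + 1) is one common choice, and average-linkage agglomerative clustering is assumed because the abstract does not name the clustering algorithm.

```python
import numpy as np

# Hypothetical toy data: each list stands in for the nouns extracted
# from one package insert (the paper uses MeCab on Japanese text).
docs = {
    "drug_A": ["hypertension", "blood", "pressure", "calcium", "channel"],
    "drug_B": ["hypertension", "blood", "pressure", "angiotensin", "receptor"],
    "drug_C": ["infection", "bacteria", "antibiotic", "penicillin"],
    "drug_D": ["infection", "bacteria", "antibiotic", "cephalosporin"],
}


def tfidf_matrix(token_lists):
    """Build a document-term matrix and weight it with tf-idf."""
    vocab = sorted({t for toks in token_lists for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    counts = np.zeros((len(token_lists), len(vocab)))
    for i, toks in enumerate(token_lists):
        for t in toks:
            counts[i, index[t]] += 1
    tf = counts / counts.sum(axis=1, keepdims=True)   # relative term frequency
    df = (counts > 0).sum(axis=0)                     # document frequency
    idf = np.log(len(token_lists) / df) + 1.0         # assumed idf variant
    return tf * idf, vocab


def lsa_vectors(matrix, k):
    """Keep the k largest singular components as document feature vectors."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine_distance_matrix(vectors):
    """Pairwise cosine distance (1 - cosine similarity) between rows."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T


def agglomerative_clusters(dist, n_clusters):
    """Average-linkage agglomerative clustering on a distance matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.mean([dist[i, j] for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]       # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]


names = list(docs)
matrix, vocab = tfidf_matrix(list(docs.values()))
vectors = lsa_vectors(matrix, k=2)
dist = cosine_distance_matrix(vectors)
clusters = agglomerative_clusters(dist, n_clusters=2)
for c in clusters:
    print([names[i] for i in c])
```

In this toy input the two antihypertensive-style inserts share most of their terms, as do the two antibiotic-style inserts, so the reduced feature vectors of each pair point in nearly the same direction and the pairs end up in separate clusters. With real package inserts, the number of retained dimensions `k` and the number of clusters would need to be chosen empirically.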