1.Unsupervised Machine Learning to Identify Depressive Subtypes
Benson KUNG ; Maurice CHIANG ; Gayan PERERA ; Megan PRITCHARD ; Robert STEWART
Healthcare Informatics Research 2022;28(3):256-266
Objectives:
This study evaluated an unsupervised machine learning method, latent Dirichlet allocation (LDA), as a method for identifying subtypes of depression within symptom data.
Methods:
Data from 18,314 depressed patients were used to create LDA models. The outcomes included future emergency presentations, crisis events, and behavioral problems. One model was chosen for further analysis based upon its potential as a clinically meaningful construct. The associations between patient groups created with the final LDA model and outcomes were tested. These steps were repeated with a commonly-used latent variable model to provide additional context to the LDA results.
Results:
Five subtypes were identified using the final LDA model. Prior to the outcome analysis, the subtypes were labeled based upon the symptom distributions they produced: psychotic, severe, mild, agitated, and anergic-apathetic. The patient groups largely aligned with the outcome data. For example, the psychotic and severe subgroups were more likely to have emergency presentations (odds ratio [OR] = 1.29; 95% confidence interval [CI], 1.17–1.43 and OR = 1.16; 95% CI, 1.05–1.29, respectively), whereas these outcomes were less likely in the mild subgroup (OR = 0.86; 95% CI, 0.78–0.94). We found that the LDA subtypes were characterized by clusters of unique symptoms. This contrasted with the latent variable model subtypes, which were largely stratified by severity.
Conclusions
This study suggests that LDA can surface clinically meaningful, qualitative subtypes. Future work could be incorporated into studies concerning the biological bases of depression, thereby contributing to the development of new psychiatric therapeutics.