Development of a feature analysis workflow for Klebsiella pneumoniae based on clinical metagenomic next-generation sequencing data
10.3760/cma.j.cn114452-20250608-00333
- VernacularTitle: Construction and application of a Klebsiella pneumoniae feature analysis workflow based on clinical metagenomic sequencing data
- Author: Shuyi WANG¹; Qi WANG; Yuyao YIN; Yifan GUO; Shuai MA; Guankun YIN; Hui WANG
- Author Information: 1. Institute of Medical Technology, Peking University Health Science Center, Beijing 100191, China
- Publication Type: Journal Article
- Keywords: Klebsiella pneumoniae; Metagenomics; Antimicrobial drug resistance; Genotype
- From: Chinese Journal of Laboratory Medicine, 2025;48(9):1149-1157
- Country: China
- Language: Chinese
- Abstract:
Objective: To identify strain-specific features of Klebsiella pneumoniae from metagenomic next-generation sequencing (mNGS) data, thereby expanding the downstream applications of mNGS.
Methods: K. pneumoniae genome sequences were compiled from two sources: the self-built database of a long-term, multicenter research cohort in China established by Peking University People's Hospital from 2009 to 2020 (2 345 sequences), and public databases (19 648 sequences). The existing large-scale databases were compressed, and a set of strains representative of clonal groups was selected. A strain genome information library was constructed from k-mer features, and the raw mNGS data were searched against it to identify the best-matching representative sequences. Search results from the self-built and public libraries were merged and optimized to refine antimicrobial-resistance predictions and to prevent the uneven distribution of data between the two sources from biasing the results. A total of 314 clinical samples from patients in whom K. pneumoniae was detected by mNGS in the Clinical Microbiology Laboratory of Peking University People's Hospital from 2022 to 2024 were retrospectively collected, and 101 samples with positive clinical culture results were selected to validate the predictions. Predicted antimicrobial-resistance phenotypes were verified against clinical antimicrobial susceptibility testing results. Whole-genome sequencing was performed on cultured isolates from 14 samples chosen with random numbers to verify the predicted genotypes. Single nucleotide polymorphism (SNP) distance analysis was used to confirm outbreak events. Statistical analysis used the χ² test and the Mann-Whitney U test.
Results: A representative-strain k-mer feature library comprising self-built and public sub-libraries was constructed. Building the library took only about 1 hour and required <3 GB of storage, giving a high compression ratio and a low update cost. With k-mer-based analysis, strain-level characterization from mNGS data was completed within 4 minutes using <5 GB of memory. The proportion of strains resistant to more than half of the tested antibiotics differed significantly between the self-built database (90.8%, 2 130/2 345) and the public database (22.7%, 4 457/19 648) (χ²=4 634.1, P<0.001). After the search results were optimized, the mean category agreement, sensitivity, and specificity of predictions for eight antibiotics reached 84.8% (323/381), 78.9% (131/166), and 91.2% (196/215), respectively. The target genotypes were detected in 10 of 12 samples, and two outbreak events (2 samples per event) were identified.
Conclusions: An independent analysis workflow was developed to identify K. pneumoniae strain features in mNGS data. It requires minimal computational resources and processing time and, working directly from raw mNGS reads, simultaneously infers strain-level antimicrobial-resistance phenotypes of K. pneumoniae together with the corresponding genomic feature profiles.
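The abstract does not specify the k-mer size, scoring rule, or data structures used to match raw reads to representative clonal-group genomes. As a minimal sketch only, assuming exact-match canonical k-mers, a filter that keeps just the k-mers unique to one representative (our assumption, not stated in the source), and a score equal to the fraction of a representative's k-mers observed in the reads, the library-building and search steps could look like this. All function names and the toy sequences are hypothetical.

```python
from collections import Counter

# Complement table for A/C/G/T; windows with other symbols are skipped before use.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = set("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmer_set(seq: str, k: int) -> set[str]:
    """Collect canonical k-mers, skipping any window containing N or other non-ACGT symbols."""
    seq = seq.upper()
    return {
        canonical(seq[i:i + k])
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= _BASES
    }


def build_library(references: dict[str, str], k: int) -> dict[str, set[str]]:
    """Hypothetical library: for each representative, keep only k-mers absent from all others."""
    per_ref = {name: kmer_set(seq, k) for name, seq in references.items()}
    occurrences = Counter(km for kms in per_ref.values() for km in kms)
    return {name: {km for km in kms if occurrences[km] == 1} for name, kms in per_ref.items()}


def best_match(reads: list[str], library: dict[str, set[str]], k: int) -> tuple[str, float]:
    """Score each representative by the fraction of its discriminating k-mers seen in the reads."""
    observed: set[str] = set()
    for read in reads:
        observed |= kmer_set(read, k)
    scores = {
        name: len(kms & observed) / len(kms) if kms else 0.0
        for name, kms in library.items()
    }
    top = max(scores, key=scores.get)
    return top, scores[top]


# Toy usage with made-up sequences; real genomes are megabases long and k would be much larger.
refs = {
    "rep_A": "ACGTACGGTCAGTTACGATCGA",
    "rep_B": "TTGACCGTAGGCATCGATTGCA",
}
library = build_library(refs, k=5)
name, score = best_match(["CGGTCAGTTACG"], library, k=5)
print(name, round(score, 2))
```

The read here is a substring of `rep_A`, so it hits only `rep_A`'s discriminating k-mers. Discarding shared k-mers is just one simple way to make the score separate closely related clonal groups; the abstract does not describe how the authors handle k-mers shared between representatives.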
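The reported χ² can be checked against the counts in the abstract. A small sketch, assuming the comparison was a 2×2 Pearson χ² test of "resistant to more than half of the antibiotics" versus not, in the self-built versus the public database: with Yates' continuity correction the counts give χ² ≈ 4 634.1, matching the reported value, while the uncorrected statistic is ≈ 4 637.3. The abstract does not say which variant was used; the match only suggests that a continuity correction was applied. The helper name `chi2_2x2` is hypothetical.

```python
def chi2_2x2(a: int, b: int, c: int, d: int, yates: bool = True) -> float:
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], optionally with Yates' correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction subtracts n/2 from |ad - bc| (floored at zero).
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: self-built vs public database; columns: resistant to >half of antibiotics vs not.
self_res, self_total = 2130, 2345
pub_res, pub_total = 4457, 19648
corrected = chi2_2x2(self_res, self_total - self_res, pub_res, pub_total - pub_res)
uncorrected = chi2_2x2(self_res, self_total - self_res, pub_res, pub_total - pub_res, yates=False)
print(f"Yates-corrected chi2 = {corrected:.1f}")    # Yates-corrected chi2 = 4634.1
print(f"uncorrected chi2     = {uncorrected:.1f}")  # uncorrected chi2     = 4637.3
```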
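The abstract does not give the SNP-calling pipeline or the distance cut-off used to declare an outbreak. As an illustrative sketch only, assuming a core-genome SNP alignment of the isolates is already available and using a hypothetical threshold `max_snps`, pairwise SNP distances and candidate outbreak pairs could be computed as follows. The sample names and sequences are made up.

```python
from itertools import combinations


def snp_distance(seq1: str, seq2: str) -> int:
    """Count positions where two aligned sequences differ, ignoring gaps and ambiguous bases."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for x, y in zip(seq1.upper(), seq2.upper())
        if x in valid and y in valid and x != y
    )


def putative_outbreak_pairs(alignment: dict[str, str], max_snps: int) -> list[tuple[str, str, int]]:
    """Return sample pairs whose SNP distance is at or below the (hypothetical) threshold."""
    pairs = []
    for (n1, s1), (n2, s2) in combinations(sorted(alignment.items()), 2):
        d = snp_distance(s1, s2)
        if d <= max_snps:
            pairs.append((n1, n2, d))
    return pairs


# Toy alignment; max_snps=2 is an arbitrary illustrative cut-off, not the authors' value.
alignment = {
    "sample1": "ACGTACGTAC",
    "sample2": "ACGTACGTAT",
    "sample3": "TTGTACCAAC",
}
result = putative_outbreak_pairs(alignment, max_snps=2)
print(result)  # [('sample1', 'sample2', 1)]
```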