Abstract:
Antimicrobial resistance (AMR) poses a growing global health threat, particularly
in the treatment of tuberculosis (TB) and of co-infections with pathogens such
as Klebsiella pneumoniae and Staphylococcus aureus. Traditional antimicrobial
susceptibility testing (AST), while accurate, is time-intensive and inaccessible in
many clinical settings. This study presents a genome-based machine learning
(ML) framework that integrates AMR gene and transposon detection to enhance
the prediction of drug resistance. In whole-genome sequences (WGS) sourced
from NCBI, transposons were identified with TnComp_finder and AMR genes with
ABRicate. Feature engineering focused on transposon-AMR gene co-occurrence,
and five supervised ML models—Logistic Regression, Random Forest, XGBoost,
AdaBoost, and Support Vector Machine—were trained and evaluated with and
without SMOTE oversampling. Model performance was assessed using accuracy,
precision, recall, AUC, and F2-score, with top-performing models achieving AUC
scores above 0.85 and F2-scores above 0.80 for several antibiotics. Feature
importance analysis provided explainability, highlighting key AMR
gene-transposon interactions influencing resistance. A web application,
ResistGen, was developed to operationalize this pipeline, enabling users to input WGS
data in FASTA format and obtain rapid resistance predictions. This approach
underscores the significant role of transposons in AMR dissemination and offers a
scalable, interpretable, and clinically relevant tool for guiding antibiotic treatment
strategies.
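
For readers unfamiliar with the F2-score reported above: it is the F-beta measure with beta = 2, which weights recall more heavily than precision. The following is a minimal sketch of that formula only, not the study's evaluation code. The function name `fbeta_score` and the precision and recall values are hypothetical, chosen for illustration.

```python
def fbeta_score(precision: float, recall: float, beta: float = 2.0) -> float:
    """Compute the F-beta score from precision and recall.

    With beta > 1, recall counts more than precision; beta = 2 gives F2.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    denominator = beta**2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero; define the score as 0.
        return 0.0
    return (1 + beta**2) * precision * recall / denominator


# Hypothetical values for illustration only (not results from the study).
f2 = fbeta_score(precision=0.70, recall=0.90)            # beta = 2
f1 = fbeta_score(precision=0.70, recall=0.90, beta=1.0)  # harmonic mean
print(f"F2 = {f2:.3f}, F1 = {f1:.3f}")
```

When recall exceeds precision, as in this hypothetical case, F2 comes out higher than F1, reflecting the extra weight placed on recall.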