dc.description.abstract |
Hepatitis C continues to be a major health concern in the Philippines, yet
machine-learning-based decision support tools that incorporate explainable AI
remain scarce. This study aims to develop an interpretable, web-based binary classification
model for hepatitis C virus (HCV) prediction using biochemical markers from a
publicly available dataset from the UCI Machine Learning Repository. Data preprocessing
involved removal of insignificant attributes, encoding, normalization, and handling
of missing values using KNN imputation. Class imbalance was addressed using the
Synthetic Minority Over-sampling Technique (SMOTE), and five supervised
machine learning algorithms (K-Nearest Neighbors, Random Forest, Logistic Regression,
Support Vector Machine, and Extreme Gradient Boosting) were evaluated
using GridSearchCV with 10-fold cross-validation. Among all model configurations,
the Random Forest model trained with SMOTE, without imputation, and without
hyperparameter tuning achieved perfect scores (100% accuracy, precision, recall,
and F1 score) and was implemented in a functional web application.
Explainability was provided through SHapley Additive exPlanations (SHAP) and Local
Interpretable Model-agnostic Explanations (LIME). SHAP identified aspartate
aminotransferase (AST), alanine aminotransferase (ALT), and bilirubin (BIL) as the
most influential features, consistent with domain knowledge of liver enzyme and
bilirubin levels in HCV patients. LIME explanations further supported model
transparency at the level of individual predictions. This study demonstrates the
viability of interpretable machine learning for HCV prediction and contributes a
usable web application that may aid early disease detection and
patient education. |
en_US |