Abstract:
Stroke is a cerebrovascular disease caused by an infarction or hemorrhage in the
brain, potentially leaving irreversible tissue damage, loss of neurons, and physiological
damage. As of 2023, it is the third leading cause of death in the Philippines,
with a higher incidence in younger adults. Gaps in stroke care (shortages of
neurologists, diagnostic machines, and stroke protocols) create
a need for a transparent decision support tool for stroke diagnosis that is easily
accessible and feasible for community-based programs. Only a few local tools
use explainable AI (XAI) to predict stroke incidence based on modifiable and nonmodifiable
factors, and most stroke prediction models do not incorporate XAI. This study
used several machine learning techniques to develop a stroke incidence classifier
for integration into a web application. The models used were Random Forest
(RF), Support Vector Machine (SVM), XGBoost (XGB), 1D Convolutional Neural
Network (CNN), and EasyEnsemble Classifier (EEC). Missing values were
imputed using KNN imputation and mode imputation, numerical variables were
z-score standardized, and categorical variables were one-hot or ordinal encoded. The
study also compared several imbalance handling methods, namely Random
Undersampling (RUS), SMOTE-NC, and SMOTE-RUS. Hyperparameter
tuning with 10-fold stratified cross-validation was applied in an attempt
to improve model performance. Results showed that EEC with RUS
and hybrid imputation was the best model, with 91.72% recall and an area under
the precision-recall curve (AUCPR) of 0.4797.
SHapley Additive exPlanations (SHAP) results showed that age was the most
important feature in the model: old age increased the predicted stroke incidence by at most 12%,
followed by a higher average glucose level, which increased it by at most 2%. Finally,
the model was integrated into a web application with a Local Interpretable
Model-agnostic Explanations (LIME) explainer to establish transparency between
the model and its users by showing local feature importance for every prediction.
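
For illustration only, the following Python sketch shows how random undersampling and the two reported metrics, recall and AUCPR, can be computed. It is a minimal standard-library example under stated assumptions, not the study's implementation: the function names (random_undersample, recall, average_precision) and the toy data are hypothetical, and AUCPR is approximated here by average precision, with tied scores ordered by input position.

```python
import random
from typing import List, Sequence, Tuple


def random_undersample(
    X: Sequence[Sequence[float]], y: Sequence[int], seed: int = 0
) -> Tuple[List[Sequence[float]], List[int]]:
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row and an equally sized random subset of the majority.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of actual positives (stroke cases) that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values measured at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` predictions
    return total / hits if hits else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: 2 positive cases out of 10 records.
    X_toy = [[float(i)] for i in range(10)]
    y_toy = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
    X_bal, y_bal = random_undersample(X_toy, y_toy, seed=0)
    print(len(y_bal), sum(y_bal))
    print(recall([1, 1, 0, 0], [1, 0, 0, 1]))
    print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

Recall and AUCPR are generally more informative than accuracy when stroke cases are rare, because a model that never predicts a stroke can still achieve high accuracy while missing every positive case.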