Web-based Stroke Prediction Using Explainable Machine Learning

Ordoñez, Marie Ashley C.

URI: http://dspace.cas.upm.edu.ph:8080/xmlui/handle/123456789/3135

Date: 2025-07

Abstract:

Stroke is a cerebrovascular disease caused by an infarction or hemorrhage in the brain, potentially leaving irreversible tissue damage, loss of neurons, and physiological damage. It is the third leading cause of death in the Philippines as of 2023, with a higher incidence in younger adults. Because of gaps in stroke care (shortages of neurologists, diagnostic machines, and stroke protocols), there is a need for a transparent decision support tool for diagnosing stroke that is easily accessible and feasible for community-based programs. Only a few local tools use explainable AI (XAI) to predict stroke incidence based on modifiable and nonmodifiable factors, and most stroke prediction models do not have XAI. This study used various machine learning techniques to develop a stroke incidence classifier for integration into a web application. The models used were Random Forest (RF), Support Vector Machine (SVM), XGBoost (XGB), 1D Convolutional Neural Network (CNN), and EasyEnsemble Classifier (EEC). Missing values were imputed using KNN imputation and mode imputation, numerical variables were Z-scaled, and categorical variables were one-hot encoded or ordinal encoded. The study also explored and compared various imbalance handling methods, namely Random Undersampling (RUS), SMOTE-NC, and SMOTE-RUS. Hyperparameter tuning with 10-fold stratified cross-validation was used in an attempt to improve model performance. Results showed that EEC with RUS and hybrid imputation was the best model, with 91.72% recall and 0.4797 AUCPR. Shapley Additive Explanations (SHAP) results showed that age was the most important feature in the model, with stroke incidence increasing by at most 12% due to old age, followed by an increase of at most 2% due to higher average glucose level.
Finally, the model was integrated into a web application with a Local Interpretable Model-agnostic Explanations (LIME) explainer to establish transparency between the model and its users by showing local feature importance for every prediction.
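
As a rough illustration of two ideas named in the abstract, the sketch below shows Random Undersampling (RUS) and the two reported metrics, recall and AUCPR, in plain Python. It is not the study's code: the function names, the toy labels, and the use of average precision as the AUCPR estimate are all assumptions made for illustration.

```python
import random

def random_undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]

def recall(y_true, y_pred):
    """Fraction of actual positives (stroke cases) that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (assumed AUCPR estimate)."""
    total_pos = sum(y_true)
    if total_pos == 0:
        return 0.0
    # Rank samples from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            ap += tp / rank  # precision at each recalled positive
    return ap / total_pos

# Hypothetical toy data: 2 stroke cases among 10 records.
toy_labels = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
toy_rows = [[float(i)] for i in range(10)]
balanced_rows, balanced_labels = random_undersample(toy_rows, toy_labels)
```

After undersampling, the toy set keeps both stroke cases and two randomly chosen non-stroke records. Recall rewards catching stroke cases, while AUCPR also penalizes false alarms, which is one way to read a high recall reported alongside a modest AUCPR on imbalanced data.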
