Abstract:
Accurate and efficient extraction of ICD-10 codes from electronic medical records
(EMRs) remains a critical task for automating clinical documentation and supporting
healthcare analytics. However, the large size and computational demands
of pre-trained language models (PLMs) pose challenges for deployment in real-world
and resource-constrained settings. This study investigates the effectiveness
of distilled BERT-based models—specifically CompactBioBERT, DistilBioBERT,
Roberta-PM-distill, TinyBioBERT, and Bio-MobileBERT—for ICD-10 code prediction
using the PLM-ICD framework on the MIMIC-IV dataset. Evaluation
metrics, including Micro AUC, Micro Precision, Micro F1, and Precision at K (P@K), were
used to assess model performance. Among the models tested, Roberta-PM-distill
achieved the best results with a Micro AUC of 97.91% and a Micro F1 score of
46.15%, while also maintaining strong performance on P@K metrics. Although lower
than results reported for full-sized PLMs, this performance is comparable to that of
similar studies, providing a basis for the viability of distilled models for scalable
and efficient ICD code prediction. A
web application was developed to deploy the best-performing model for practical
use.