Machine learning for early prediction of clinical deterioration in emergency settings: A systematic review of retrospective cohort studies

Kanayo Kizito Uka; Chukwu Alphonsus Chekwube; Isaac Amuzie Ene; Amujiogu Ikechukwu Peter; Philip Omede Alexander; Onia Orinate Peters

doi:10.53730/ijhs.v10nS1.15935

Authors

Kanayo Kizito Uka, Imo State University, Owerri, Nigeria
Chukwu Alphonsus Chekwube, Darent Valley Hospital, Dartford, England, United Kingdom
Isaac Amuzie Ene, Okparavero Memorial Hospital, Nigeria
Amujiogu Ikechukwu Peter, Enugu State University of Technology Teaching Hospital, Parklane, Enugu, Nigeria
Philip Omede Alexander, Southend University Hospital, Essex, England, United Kingdom
Onia Orinate Peters, Imo State University, Nigeria (orinatepeters3@gmail.com)

Keywords:

Machine learning, clinical deterioration, emergency department, early warning system, prediction model, systematic review

Abstract

Background: Early detection of clinical deterioration in emergency departments (EDs) remains challenging, and traditional Early Warning Systems (EWS) show limited sensitivity and specificity. Machine learning (ML) offers potential improvements by analyzing complex, high-dimensional clinical data.

Objective: This systematic review evaluated retrospective cohort studies that applied ML algorithms to predict clinical deterioration (within 6–48 hours) in adult ED patients, assessing predictive performance against traditional EWS, the interpretability of ML models, and key predictor variables.

Methods: Following PRISMA guidelines, a systematic search of PubMed, Embase, Scopus, Web of Science, and Google Scholar (January 2015–December 2025) identified 2,173 records. After duplicate removal and screening, 64 retrospective cohort studies met the inclusion criteria. Study quality was assessed with PROBAST (Prediction model Risk Of Bias ASsessment Tool).

Results: ML models significantly outperformed traditional EWS (pooled area under the receiver operating characteristic curve [AUROC]: 0.86 vs. 0.73; p<0.001). Gradient boosting achieved the highest performance (AUROC=0.89). However, 67% of studies had a high risk of bias, primarily due to inappropriate handling of missing data (50%) and lack of calibration assessment (44%). Only 34% addressed interpretability, and 14% conducted clinician-facing user testing. Key predictors included vital signs (100%), lactate (HR=1.73), and Glasgow Coma Scale (GCS) score (HR=1.88).
