Raising High Risk-aware in Hemodynamic Treatment Optimization with Reinforcement Learning for Septic Shock Patients

Meicheng Yang, Runfa Li, Tong Hao, Caiyun Ma, Jianqing Li, Chengyu Liu
Southeast University


Septic shock is a life-threatening condition in the intensive care unit (ICU). Reinforcement learning has been used to optimize hemodynamic treatment for maintaining blood pressure, but existing approaches do not assess treatment safety, which makes them unreliable for clinical decision-making. It is therefore necessary to raise awareness of high-risk treatments that might lead to poor outcomes. To address this issue, we included retrospective data from 7190 septic shock patients (mortality of 22.6%) admitted to the ICU of the Beth Israel Deaconess Medical Center. For each patient, the first 80 h after ICU admission were coded as a multivariate discrete-time series with 4-h time steps. The treatments of interest were defined as the total volume of intravenous (IV) fluids and the maximum dose of vasopressors administered in each 4-h period. The reward was defined as +100 for survival and -100 for death. Patient states, measured by vital signs and clinical tests, were embedded using an auto-encoder network. To estimate the probability of transitioning to a poor or a good outcome, we trained two separate double deep Q-networks, each producing value estimates for the embedded patient states and all possible treatments. Models were developed on 85% of the patients (4728 survivors, 1383 nonsurvivors) and validated on the remaining 15% (834 survivors, 245 nonsurvivors). More than 6.1% of treatments administered to nonsurviving patients were identified as detrimental at least 12 hours before death, with a 0.7% false-positive rate. This proportion increased to 16.7% four hours before death, with a false-positive rate of only 0.6%.
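
The data coding and value-learning steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The event tuple layout, the function names, the zero intermediate reward, and the discount factor `gamma` are assumptions made for illustration; only the 80-h window, the 4-h step, the two treatment definitions, and the +100/-100 terminal reward come from the abstract. The last function shows the generic double deep Q-network target, in which the online network selects the next action and the target network evaluates it.

```python
from typing import List, Sequence, Tuple

STEP_HOURS = 4       # step width stated in the abstract
HORIZON_HOURS = 80   # observation window stated in the abstract


def aggregate_treatments(
    events: Sequence[Tuple[float, float, float]],
) -> List[Tuple[float, float]]:
    """Collapse timed events into per-step (total IV fluid, max vasopressor dose).

    Each event is a hypothetical tuple:
    (hours_since_icu_admission, iv_fluid_volume, vasopressor_dose).
    """
    n_steps = HORIZON_HOURS // STEP_HOURS  # 20 steps of 4 h each
    fluid_total = [0.0] * n_steps
    vaso_max = [0.0] * n_steps
    for hour, fluid, vaso in events:
        if not 0 <= hour < HORIZON_HOURS:
            continue  # ignore events outside the 80-h window
        step = int(hour // STEP_HOURS)
        fluid_total[step] += fluid                     # total volume per step
        vaso_max[step] = max(vaso_max[step], vaso)     # maximum dose per step
    return list(zip(fluid_total, vaso_max))


def terminal_reward(survived: bool) -> float:
    """Terminal reward from the abstract: +100 for survival, -100 for death."""
    return 100.0 if survived else -100.0


def double_dqn_target(
    reward: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.99,  # assumed discount factor, not given in the abstract
) -> float:
    """Generic double DQN bootstrap target for a single transition."""
    if done:
        return reward  # no bootstrapping past the end of the trajectory
    # The online network picks the action; the target network scores it.
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]
```

As a usage example, `aggregate_treatments([(0.5, 250.0, 0.1), (3.0, 500.0, 0.3)])` places both events in the first 4-h step, giving a fluid total of 750.0 and a maximum vasopressor dose of 0.3 for that step. Separating action selection from action evaluation is the usual motivation for double Q-learning, because it reduces the overestimation bias of a single-network maximum.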