Analysis of Reward Formulation Based on Mean Arterial Pressure in Reinforcement Learning for Critically Ill Septic Patients

Cristian Drudi, Maximiliano Mollura, Riccardo Barbieri
Politecnico di Milano


Abstract

Aims: The optimal reward formulation in Reinforcement Learning (RL) remains an open question. The aim of this study is to show that formulating the reward in RL for sepsis treatment around Mean Arterial Pressure (MAP) is a viable approach that can improve patient outcomes.

Methods: The data were extracted from the MIMIC-III database. Patient data from 20,496 intensive care unit (ICU) stays were modeled with two Markov Decision Processes that differed in their reward formulation: the Mortality Model used a reward linked only to 90-day mortality, while the Target MAP Model added a reward component that penalized the RL agent whenever the patient's MAP fell below 65 mmHg.
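For concreteness, a minimal sketch of the two reward functions is given below, assuming a terminal reward of magnitude R tied to 90-day survival and a fixed per-step penalty c for hypotension; R, c, and the exact functional form are illustrative assumptions, not values reported in the study:

r_t^{\mathrm{mort}} = \begin{cases} +R, & \text{terminal step, patient alive at 90 days} \\ -R, & \text{terminal step, patient dead at 90 days} \\ 0, & \text{otherwise} \end{cases}

r_t^{\mathrm{MAP}} = r_t^{\mathrm{mort}} - c \cdot \mathbb{1}\left[\mathrm{MAP}_t < 65\ \text{mmHg}\right]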

Results: The Target MAP Model achieved the better performance, with a 95% lower bound (LB) on the estimated policy value of 88.64, compared with 86.01 for the Mortality Model, despite its more penalizing reward. In hypotensive patients, the Target MAP Model administers fewer intravenous fluids and resorts more often to aggressive vasopressor dosages.
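The abstract does not state how the 95% LB on the policy value is obtained; a common choice in the sepsis-RL literature is a bootstrapped lower bound on a weighted importance sampling (WIS) off-policy estimate. The Python sketch below illustrates that construction under this assumption; the inputs (per-trajectory cumulative importance ratios and observed returns) are hypothetical:

import numpy as np

def wis_estimate(ratios, returns):
    """Weighted importance sampling estimate of policy value.
    ratios: per-trajectory cumulative ratios pi_e/pi_b (numpy array)
    returns: per-trajectory observed returns (numpy array)."""
    weights = ratios / ratios.sum()
    return float(np.dot(weights, returns))

def bootstrap_lower_bound(ratios, returns, n_boot=2000, alpha=0.05, seed=0):
    """Lower bound on policy value: the alpha-quantile (5th percentile
    for a 95% LB) of the bootstrap distribution of the WIS estimate."""
    rng = np.random.default_rng(seed)
    n = len(returns)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)  # resample trajectories with replacement
        estimates[b] = wis_estimate(ratios[idx], returns[idx])
    return float(np.percentile(estimates, 100 * alpha))

Under this assumption, the reported LB values of 88.64 and 86.01 would correspond to the 5th percentile of the bootstrap distribution of the WIS estimate for each learned policy.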

Conclusions: The results show that tying the reward to MAP is a viable approach. The denser reward that comes from tying the reward to a high-temporal-resolution cardiovascular feature makes it possible to evaluate individual actions rather than only whole sequences of actions leading to the final outcome, allowing the RL agent to learn a better policy.