The Effect of Missing Data When Predicting Readmission in Heart Failure Patients

Filip Plesinger¹, Zuzana Koscova¹, Eniko Vargova¹, Jan Pavlus¹, Radovan Smisek², Ivo Viscor¹, Veronika Bulkova³
¹Institute of Scientific Instruments of the CAS; ²Brno University of Technology, Faculty of Electrical Engineering and Communication, Department of Biomedical Engineering; ³Medical Data Transfer, s.r.o.


Abstract

Background: Although the discharge of patients from hospital care is regulated by guidelines, readmission of heart failure (HF) patients remains a common problem, and several calculators have been published to predict it.

Aims: In this paper, we examine how prediction performance decreases when features are missing. We also identify which features a user should always provide to maintain acceptable prediction performance.

Method: We prepared a balanced dataset of HF patients from the MIMIC-III database (N=2,004) with 16 features. Using the training data (80%) with four-fold cross-validation, we evaluated all feature combinations (N=2^16 - 1 = 65,535) and found the optimal feature set for a logistic regression model. We also evaluated how often each feature appeared in the top-performing models (N=655) and identified the essential features. Finally, we trained the resulting model on all training data and evaluated the effect of missing features (N=2^8 = 256 combinations) on the separate test data (20%).
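The exhaustive search and the feature-presence analysis described above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names (`all_feature_subsets`, `rank_subsets`, `feature_presence`) are hypothetical, and the `score` callable stands in for whatever four-fold cross-validated performance measure of the logistic regression model was used, which the abstract does not specify.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Iterator, Sequence


def all_feature_subsets(features: Sequence[str]) -> Iterator[tuple]:
    """Yield every non-empty subset of `features` (2**n - 1 subsets in total)."""
    for k in range(1, len(features) + 1):
        yield from combinations(features, k)


def rank_subsets(features: Sequence[str], score: Callable[[tuple], float]) -> list:
    """Score every non-empty subset and return (score, subset) pairs, best first.

    `score` is a hypothetical placeholder, e.g. the mean cross-validated
    performance of a logistic regression trained on that subset.
    """
    scored = [(score(subset), subset) for subset in all_feature_subsets(features)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def feature_presence(top_models: list) -> Counter:
    """Count how many of the given top-ranked subsets contain each feature."""
    counts = Counter()
    for _, subset in top_models:
        counts.update(subset)
    return counts


if __name__ == "__main__":
    names = [f"feature_{i}" for i in range(16)]  # hypothetical feature names
    weights = {name: i for i, name in enumerate(names)}  # toy score, not a real model
    ranked = rank_subsets(names, lambda s: sum(weights[n] for n in s) / len(s))
    print(len(ranked))  # 65535 subsets for 16 features
    print(feature_presence(ranked[:655]).most_common(3))
```

Features that appear in nearly all of the top-ranked subsets are candidates for "essential" features; features that rarely appear are candidates for removal.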

Results: We identified five less important features and three essential features (age, blood urea nitrogen, and systolic blood pressure). Removing the five less important features from the original 16 yielded a resulting model with eleven features. On the test data, the hazard ratio (HR) was 1.99 (95% CI 1.57-2.51) when all eleven features were present, and 1.72 (95% CI 1.37-2.17) when only the three essential features were present and the others were missing (i.e., replaced by zeros).
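Replacing a missing feature by zero in a logistic regression model simply drops that feature's term from the linear predictor. The sketch below illustrates this and enumerates the 2^8 = 256 missingness patterns of the eight non-essential features while the three essential ones are always kept. All coefficients and the names of the eight optional features are hypothetical; the helper names are illustrative only. If the inputs were standardized (an assumption; the abstract does not say), a zero would correspond to the training mean of that feature.

```python
import math
from itertools import combinations


def predict_proba(intercept: float, coefs: dict, x: dict) -> float:
    """Logistic-regression probability; any feature absent from `x` is replaced by 0."""
    z = intercept + sum(w * x.get(name, 0.0) for name, w in coefs.items())
    return 1.0 / (1.0 + math.exp(-z))


def missingness_patterns(optional: list):
    """Yield every subset of optional features that could be missing (2**len(optional) patterns, including none)."""
    for k in range(len(optional) + 1):
        yield from combinations(optional, k)


def mask(x: dict, missing: tuple) -> dict:
    """Return a copy of the patient record with the `missing` features removed."""
    return {name: value for name, value in x.items() if name not in missing}


if __name__ == "__main__":
    essential = ["age", "bun", "sbp"]  # the three essential features, always present
    optional = [f"optional_{i}" for i in range(1, 9)]  # hypothetical names for the other eight
    coefs = {name: 0.1 for name in essential + optional}  # made-up coefficients
    patient = {name: 1.0 for name in essential + optional}  # one hypothetical patient record
    patterns = list(missingness_patterns(optional))
    print(len(patterns))  # 256
    for missing in patterns[:3]:
        print(missing, round(predict_proba(0.0, coefs, mask(patient, missing)), 3))
```

In a full evaluation, each pattern would be applied to every test patient and the chosen performance measure (here, the hazard ratio) recomputed per pattern.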