A machine learning approach to detect hypertension, diabetes and cardiovascular disease from PPG

George Searle1, Stefan van Duijvenboden2, Julia Ramírez3, Andrew Tinker4, Patricia Munroe4, Pier Lambiase1, Alun Hughes1, Michele Orini5
1UCL, 2UCL Institute of Cardiovascular Science, 3University of Zaragoza, 4QMUL, 5University College London, Institute of Cardiovascular Science


Abstract

Background. As the burden of cardiovascular disease increases due to aging and lifestyle, novel early risk stratification methods are needed. The photoplethysmogram (PPG) is a non-invasive, low-cost signal rich in cardiovascular information that can be recorded by sensors embedded in mobile phones and smartwatches. Hypertension and diabetes are leading global causes of death and, despite being on the rise, often go undiagnosed. Previous PPG studies have been limited by relatively small sample sizes. Aim. To develop and test a machine learning approach to detect hypertension (HT), type 2 diabetes (T2D), and past major adverse cardiovascular events (MACE) from PPG. Methods. Forty morphological PPG features were derived from a signal-averaged PPG waveform in 165,340 UK Biobank study participants (age 60 (52-65), sex 53% female). A machine learning model (XGBoost) was trained on 80% of the dataset and tested on the remaining 20%. Three models were developed with the following predictors: age, sex and body mass index (BMI) (M0); PPG features (M1); and M0+M1 (M2). Discrimination was assessed using the area under the ROC curve (AUC). Results. Prevalence of HT, T2D and MACE was 54.7%, 6.0% and 5.5%, respectively. Models based on PPG features (M1) discriminated clinical outcomes with an AUC ranging between 0.62 and 0.70, while models based on demographics (M0) showed AUCs between 0.69 and 0.74. Combining PPG features with demographic data (M2) significantly improved discrimination, with AUC reaching 0.79, 0.77 and 0.73 for HT, T2D and MACE, respectively, with narrow confidence intervals. For HT and T2D, PPG features were also found to be more important than age, sex or BMI in discrimination. Conclusions. In this large study, a machine learning approach showed that features extracted from signal-averaged PPG waveforms can improve discrimination of primary cardiovascular risk factors and events.
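The evaluation design described in Methods (an 80%/20% train/test split, three predictor sets, and AUC as the discrimination metric) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the random seed, and the column names (in particular the placeholder names for the forty PPG features) are assumptions made for illustration, and the XGBoost model itself is left out so the example needs only the standard library.

```python
import random

# Hypothetical column names; the abstract does not name the forty PPG features.
PREDICTOR_SETS = {
    "M0": ["age", "sex", "bmi"],                          # demographics
    "M1": ["ppg_feature_%d" % i for i in range(1, 41)],   # forty PPG features
}
PREDICTOR_SETS["M2"] = PREDICTOR_SETS["M0"] + PREDICTOR_SETS["M1"]  # M0 + M1


def split_80_20(n, seed=0):
    """Shuffle indices 0..n-1 and return (train, test) with an 80%/20% split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = (4 * n) // 5
    return idx[:cut], idx[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve via its rank interpretation.

    AUC equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count as 0.5).
    This pairwise version is quadratic in sample size and is meant only to
    show the definition, not to scale to a cohort of this size.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

In such a setup, a classifier would be fitted on the training indices once per predictor set and outcome, and `roc_auc` would be applied to its predicted probabilities on the held-out test indices; an AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation.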