PPG Foundation Models, Morphological Features and Hypertension and Diabetes in 215,000 Subjects

George Searle1, Stefan van Duijvenboden2, Julia Ramírez3, Andrew Tinker4, Patricia Munroe4, Pier Lambiase1, Alun Hughes1, Michele Orini5
1UCL, 2UCL Institute of Cardiovascular Science, 3University of Zaragoza, 4QMUL, 5University College London, Institute of Cardiovascular Science


Abstract

Background: Cardiovascular disease prevalence is rising globally due to aging populations and lifestyle changes, emphasizing the need for novel approaches to early and accessible risk screening. Hypertension (HTN) and type 2 diabetes (T2D) contribute significantly to cardiovascular morbidity and mortality but often remain undiagnosed despite screening efforts. Photoplethysmography (PPG), a non-invasive, cost-effective technology, has emerged as a promising tool for cardiovascular risk assessment via wearable devices. Historically, PPG-based risk assessment has relied on manually derived morphological features or on convolutional neural networks tailored to specific datasets. Recently, pretrained foundation models developed specifically for PPG signals, such as PaPaGeI, have shown potential by generating generalizable embeddings suitable for cardiovascular risk classification tasks.

Aim: To compare PaPaGeI-derived embeddings with traditional morphological PPG features for detecting HTN and T2D.

Methods: Eighty morphological features were derived from signal-averaged PPG waveforms of 215,468 UK Biobank participants (median age 60 years, 53% female). PaPaGeI embeddings were extracted for the same participants. XGBoost models were trained with an 80:20 train-test split. Five models were developed for each outcome (HTN and T2D): baseline (age, sex, BMI; M0), PaPaGeI embeddings only (P0), M0 plus PaPaGeI embeddings (P1), morphological features only (P2), and M0 plus morphological features (P3). Performance was evaluated using the area under the receiver operating characteristic curve (AUROC).
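As an illustration of the model design and evaluation metric described in the Methods, the following is a minimal sketch and not the authors' code. The feature-group names (`BASELINE`, `PAPAGEI`, `MORPHOLOGY`) and the `auroc` helper are hypothetical and introduced only for this example; the abstract does not state how AUROC was computed, so the sketch uses the standard rank-sum (Mann-Whitney U) identity in plain Python and assumes labels coded as 0/1.

```python
from typing import Dict, List, Sequence

# Hypothetical feature-group names; the abstract does not name any columns.
BASELINE = ["age", "sex", "bmi"]
PAPAGEI = ["papagei_embedding"]          # placeholder for the embedding vector
MORPHOLOGY = ["morphological_features"]  # placeholder for the 80 features

# The five input configurations listed in the Methods.
MODEL_INPUTS: Dict[str, List[str]] = {
    "M0": BASELINE,
    "P0": PAPAGEI,
    "P1": BASELINE + PAPAGEI,
    "P2": MORPHOLOGY,
    "P3": BASELINE + MORPHOLOGY,
}


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the rank-sum identity, using average ranks for tied scores.

    Assumes labels are 0 (negative) or 1 (positive).
    """
    n = len(labels)
    if n != len(scores):
        raise ValueError("labels and scores must have the same length")

    # Assign 1-based ranks to scores, averaging ranks within tied groups.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC requires both classes to be present")

    # U statistic of the positive class, normalised by the number of pairs.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

In this sketch, `auroc(y_test, predicted_probabilities)` would be evaluated on the held-out 20% for each of the five configurations in `MODEL_INPUTS`. A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from non-cases.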

Results: The prevalence of HTN was 54.7% and that of T2D was 6.0%. For HTN, morphological features significantly outperformed PaPaGeI embeddings, both alone (AUROC, P2: 0.69 vs. P0: 0.67) and when combined with the baseline factors (P3: 0.81 vs. P1: 0.78). For T2D, PaPaGeI embeddings alone outperformed morphological features alone (P0: 0.64 vs. P2: 0.61), with similar performance when each was combined with the baseline factors (P1: 0.76 vs. P3: 0.75).

Conclusions: Morphological PPG features outperformed PaPaGeI embeddings for HTN detection, both alone and in combination with age, sex and BMI. However, PaPaGeI embeddings used alone provided a slight advantage for T2D detection, highlighting the potential of foundation models in specific cardiovascular risk assessments.