Aims: Pulmonary hypertension (PH) is a hemodynamic condition in which the pressure in the pulmonary artery and right ventricle is elevated. To date, right heart catheterization is the gold-standard diagnostic test for PH, but it is an invasive and expensive procedure. Deep learning (DL) techniques applied to heart sounds have previously shown promising performance for PH screening.
Methods: In this work, we analyze the impact of different input representations on PH detection with convolutional neural networks (CNNs). We use a private dataset of 42 subjects (29 with PH, 13 without). Three types of feature maps were constructed from the signals: 1) S2s: the S2 segments collected in the time domain (one image per recording); 2) TFDs: time-frequency distributions that add frequency-domain information through the Wigner-Ville distribution (one image per beat); 3) MFCCs: Mel Frequency Cepstral Coefficients (one image per beat). Two CNN models were designed: A) a 2D CNN, where each image is provided to the model either as an independent input or after averaging; and B) a 3D CNN, where images from the same recording are stacked into a 3D matrix and provided as a single input. We performed bootstrapped 5-fold cross-validation and assessed the macro-averaged AUC.
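The three ways of presenting per-beat feature maps to the networks can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_inputs`, the channel-first layout, the map size, and the reading of "averaged" as a mean over the beats of one recording are all assumptions made for illustration.

```python
import numpy as np

def build_inputs(beat_maps, n_beats=30):
    """Arrange per-beat feature maps (e.g. TFDs or MFCCs) for a 2D or a 3D CNN.

    beat_maps: list of 2D arrays of identical shape (H, W), one per heartbeat
               of a single recording. Hypothetical helper; layout is assumed.
    n_beats:   number of beats kept per recording. Recordings with fewer
               beats are not padded in this sketch.
    """
    maps = np.stack(beat_maps[:n_beats], axis=0)           # (beats, H, W)

    # 2D CNN, variant 1: every beat is an independent single-channel image.
    per_beat = maps[:, np.newaxis, :, :]                   # (beats, 1, H, W)

    # 2D CNN, variant 2: beats averaged into one image per recording
    # (assumed interpretation of "averaged").
    averaged = maps.mean(axis=0)[np.newaxis, np.newaxis]   # (1, 1, H, W)

    # 3D CNN: beats stacked along a depth axis, one volume per recording.
    stacked = maps[np.newaxis, np.newaxis]                 # (1, 1, beats, H, W)

    return per_beat, averaged, stacked


# Synthetic example: 40 beats, each a 13 x 20 map (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
beats = [rng.standard_normal((13, 20)) for _ in range(40)]
per_beat, averaged, stacked = build_inputs(beats, n_beats=30)
print(per_beat.shape, averaged.shape, stacked.shape)
```

Only the stacked variant keeps the beat-to-beat variability within a single input; the per-beat variant discards the relation between beats, and the averaged variant collapses it.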
Results: Results are presented in Table I. Considering each heartbeat as an independent input yielded systematically lower performance than considering the recordings as a whole: preserving the information about the variability over the heartbeats is key. Feature maps that retain time-domain information (S2s, TFDs) outperformed MFCCs. Combining time and frequency (TFDs) proved consistently most effective. Reducing the number of heartbeats to 30 did not affect performance, and even reducing to 10 beats preserved the diagnostic value.
Conclusions: We believe that the proposed analysis clarifies the importance of input representation in DL and moves the applicability of DL for PH detection from heart sounds one step closer to clinical practice.