Classification of Heart Murmurs by a Hybrid Deep-Learning Architecture based on Residual Convolutional Neural Networks and Attention-based Convolution-free Audio Spectrogram Transformers

Enrique Almar Muñoz1, Marawan Elbatel1, Santiago Jimenez2, Conrado Calvo3
1MAIA, Universitat de Girona, 2Universitat Politècnica de València, 3Universitat de València; Universitat Politècnica de València


Non-invasive assessment and analysis of phonocardiogram (PCG) recordings to detect normal or abnormal mechanical function and heart murmurs may provide critical clinical information for the early diagnosis of serious heart disease. Recent efforts have focused on automatically detecting such abnormalities using advanced signal transformations and artificial neural networks. As part of the 2022 CinC Challenge, we focus on detecting the presence of murmurs from multiple heart sound recordings taken at multiple auscultation locations.

Expert-labeled heart sound recordings from 1568 patients, obtained sequentially from one or multiple auscultation locations, were used to test our classification strategy. The training set contains 3163 recordings from 942 patients. All data were preprocessed with digital signal filtering and conditioning. The empirical wavelet transform (EWT) was used for spectral decomposition. S1 and S2 sounds were detected and segmented by band-pass filtering and normalized average Shannon energy envelope analysis. In addition to the spectrogram, 28 of 50 candidate features, spanning the temporal, spectral and statistical domains, were selected after dimensionality reduction.
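As an illustration of the envelope step, the following is a minimal sketch of a normalized average Shannon energy envelope in plain Python. The frame length, hop size, sampling rate and synthetic test signal are illustrative assumptions, not the settings used in this work, and the function name `shannon_envelope` is hypothetical.

```python
import math


def shannon_envelope(signal, frame_len, hop):
    """Normalized average Shannon energy envelope of a 1-D signal.

    frame_len and hop are given in samples (illustrative values only).
    """
    peak = max(abs(s) for s in signal)
    if peak == 0:
        raise ValueError("signal is all zeros")
    # Scale to [-1, 1] so that x**2 lies in [0, 1].
    x = [s / peak for s in signal]
    eps = 1e-12  # avoids log(0) on silent samples
    # Per-sample Shannon energy: -x^2 * log(x^2).
    energy = [-(v * v) * math.log(v * v + eps) for v in x]
    # Average the energy over sliding frames.
    frames = []
    for start in range(0, len(energy) - frame_len + 1, hop):
        frame = energy[start:start + frame_len]
        frames.append(sum(frame) / frame_len)
    # Normalize the envelope to zero mean and unit standard deviation.
    mean = sum(frames) / len(frames)
    std = math.sqrt(sum((f - mean) ** 2 for f in frames) / len(frames))
    if std == 0:
        return [0.0] * len(frames)
    return [(f - mean) / std for f in frames]


# Synthetic 1 s signal at an assumed 1000 Hz sampling rate:
# silence with two 50 ms bursts standing in for S1 and S2.
fs = 1000
signal = [0.0] * fs
for n in list(range(100, 150)) + list(range(400, 450)):
    signal[n] = math.sin(2 * math.pi * 50 * n / fs)

# 20 ms frames with a 10 ms hop (assumed values).
env = shannon_envelope(signal, frame_len=20, hop=10)
peak_idx = env.index(max(env))
```

Peaks of the envelope mark candidate heart sound locations; a real pipeline would band-pass filter the recording first and then apply thresholding and timing rules to label S1 and S2.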

Our team (UV_MAIANS) devised a hybrid deep learning strategy combining a residual CNN and transformers for heart murmur classification. On its own, our ResNet18-based model obtained average classification results (AUC 0.618 after 10 epochs). We obtained a score of 1496 on the challenge classification metric. Recall reached 0.462 on the training set and 0.51 on the test set.
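For readers unfamiliar with the recall figure, the sketch below shows how per-class and macro-averaged recall can be computed. The labels and predictions are invented toy data, not challenge results, and the choice of macro averaging is an assumption, since the text does not state how the reported recall was aggregated.

```python
def per_class_recall(y_true, y_pred, classes):
    """Recall for each class: correctly predicted / actually present."""
    recalls = {}
    for c in classes:
        support = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls[c] = hits / support if support else float("nan")
    return recalls


# Murmur labels used in the 2022 challenge; the data below is a toy example.
classes = ["Present", "Unknown", "Absent"]
y_true = ["Present", "Present", "Absent", "Absent", "Absent", "Unknown"]
y_pred = ["Present", "Absent", "Absent", "Absent", "Present", "Absent"]

recalls = per_class_recall(y_true, y_pred, classes)
# Macro average: unweighted mean over classes (assumed aggregation).
macro_recall = sum(recalls.values()) / len(recalls)
```

Macro averaging weights each class equally, which matters when murmur-positive patients are a minority of the data.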

We surmise that a hybrid deep learning strategy combining multilayer CNNs and vision transformers may enhance our classification performance. We propose to pursue this strategy by combining the ResNet18 network with transformers, increasing the number of features available to detect the multidomain, sequential, weak and strong variations found in heart murmurs. Specifically, we would like to transfer the state-of-the-art techniques learned from Audio Spectrogram Transformers (AST) to multisite murmur classification tasks.
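To give a sense of the input an AST would operate on, the sketch below counts the patch tokens produced from a log-mel spectrogram. It assumes the patching scheme of the original AST design (16x16 patches with a stride of 10 in both frequency and time); whether those settings would be kept for PCG spectrograms is an open choice, and the function name `ast_patch_grid` is hypothetical.

```python
def ast_patch_grid(n_mels, n_frames, patch=16, stride=10):
    """Number of patches along frequency, along time, and in total.

    Assumes unpadded overlapping patches, as in the original AST setup.
    """
    if n_mels < patch or n_frames < patch:
        raise ValueError("spectrogram is smaller than one patch")
    f_patches = (n_mels - patch) // stride + 1
    t_patches = (n_frames - patch) // stride + 1
    return f_patches, t_patches, f_patches * t_patches


# 128 mel bins and 1024 frames (about 10 s at a 10 ms frame shift).
grid_long = ast_patch_grid(128, 1024)
# A shorter recording of 512 frames.
grid_short = ast_patch_grid(128, 512)
```

The sequence length grows linearly with recording duration, which is relevant when recordings from several auscultation sites differ in length.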