Maiby's Algorithm: a Denoiser, a Murmur Detection, and a Decision Model for Congenital Heart Disease Automatic Auscultation

Matheus Araujo1, Dewen Zeng2, Joao Palotti3, Xinrong Hu2, Yiyu Shi2, Wei Guang Shao4, Lee Pyles5, Quan Ni4
1Cleveland Clinic Foundation, 2University of Notre Dame, 3Qatar Computing Research Institute, 4One Heart Health, 5West Virginia University


Congenital heart disease (CHD) is a major cause of mortality among congenital disabilities. For cardiac centers in low-income countries, distinguishing innocent from pathologic murmurs through cardiac auscultation does not scale because of limited resources. Consequently, hearts are not being screened, and lives are not being saved. To address this limitation, we propose an automatic auscultation model trained to achieve specialist-like performance. Our model identifies the specific sounds that turbulent blood flow generates in pathological murmurs, using recordings from multiple chest positions, and predicts whether an individual has an abnormal heart sound. To train and validate our model, we use the CirCor DigiScope dataset, with 5282 heart sound recordings collected from 1568 children in the Paraiba state of Brazil. The recordings range from 5 to 168 seconds in duration and are sampled at 4000 Hz. To address the challenge of pathological pattern recognition in a high-dimensional feature space, our classification model uses an ensemble strategy that combines the outputs of three independent deep-learning (DL) models for audio classification. The first DL model uses transfer learning based on the OpenL3 Python library, trained over ten days on four GPUs, to generate a low-dimensional representation of sounds. The second model is an end-to-end 1-D ResNet architecture that addresses the "vanishing/exploding gradient problem" for long sequence data. The third model uses a transformer-based architecture, the current state-of-the-art DL architecture, whose attention layers permit the model to focus on the relevant parts of the input during training and inference, leveraging spatial and temporal patterns. Since false negatives have considerably worse consequences than false positives, we apply data augmentation and adjust the sensitivity of our ensemble model in the context of the PhysioNet Challenge.
Our cross-validation score was 526, and our PhysioNet Challenge score was 528 during the unofficial phase.
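
The abstract does not specify how the three model outputs are combined or how the sensitivity adjustment is carried out. As a minimal sketch only, assuming the ensemble averages per-recording probabilities, that a patient is flagged when any chest position looks abnormal, and that sensitivity is raised by lowering the decision threshold, the idea could look as follows. The function names, the probabilities, and the threshold values are hypothetical and are not taken from the described system.

```python
from statistics import mean


def ensemble_probability(model_probs):
    """Average the abnormal-sound probabilities of several models
    for a single recording (assumed combination rule)."""
    return mean(model_probs)


def patient_prediction(recordings, threshold=0.5):
    """Aggregate recordings from multiple chest positions for one patient.

    recordings: one list of model probabilities per recording.
    Returns (flag, score), where the patient score is the highest
    ensemble probability across recordings (assumed aggregation rule).
    """
    scores = [ensemble_probability(probs) for probs in recordings]
    patient_score = max(scores)
    return patient_score >= threshold, patient_score


# Hypothetical outputs for one patient: two chest positions, three models each.
recordings = [[0.20, 0.30, 0.10], [0.50, 0.30, 0.40]]

# Default threshold versus a lowered threshold that favors sensitivity,
# trading more false positives for fewer false negatives.
default_flag, score = patient_prediction(recordings, threshold=0.5)
sensitive_flag, _ = patient_prediction(recordings, threshold=0.35)
```

With these illustrative numbers, the same patient is missed at the default threshold but flagged once the threshold is lowered, which is the trade-off the sensitivity adjustment is meant to exploit.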