Evidential Deep Learning Model for Atrial Fibrillation Detection from Holter Recordings

Md Moklesur Rahman1, Massimo W Rivolta1, Pierre Maison Blanche2, Fabio Badilini3, Roberto Sassi1
1Dipartimento di Informatica, Università degli Studi di Milano, 2Department of Cardiology, Hôpital Bichat, Paris, France, 3AMPS LLC


Abstract

Background: Deep learning (DL) has shown promising results for atrial fibrillation (AF) detection from electrocardiograms (ECGs). However, DL models tend to be overconfident in their predictions, and their output probabilities are often poorly calibrated. Uncertainty estimation is therefore key to making DL models reliable in practical applications. Evidential DL has recently emerged as an approach that quantifies uncertainty explicitly by treating model outputs (logits) as evidence that parameterizes a Dirichlet distribution. Empirically, it has yielded notable improvements in uncertainty estimation.

Purpose: The main purpose of this study is to develop a DL model that estimates uncertainty at no additional computational cost compared to standard softmax-output models, leveraging a large cohort of 2-lead Holter ECG recordings.

Methods: We developed an evidential residual-based DL model whose outputs are interpreted as evidence values derived from the input features. The evidence parameterizes a Dirichlet distribution, and the predicted probabilities are treated as subjective opinions. The evidential DL model was trained on 250 Holter recordings (patients), with 18 more used for validation, encompassing three distinct rhythms: AF, atrial flutter (AFL) and a third class (Non-AF) combining normal sinus rhythm and atrial tachycardia.
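
The mapping from model outputs to a Dirichlet-based opinion can be illustrated with a minimal Python sketch. It follows the common subjective-logic formulation of evidential DL and is not necessarily the configuration used in this study: the ReLU evidence activation, the function name dirichlet_opinion, and the example logits are assumptions made for illustration.

```python
def dirichlet_opinion(logits):
    """Map raw network outputs to a Dirichlet-based subjective opinion.

    Assumption: ReLU turns logits into non-negative evidence. Other
    activations (e.g. softplus or exp) are also used in practice.
    """
    k = len(logits)                              # number of classes
    evidence = [max(0.0, z) for z in logits]     # non-negative evidence
    alpha = [e + 1.0 for e in evidence]          # Dirichlet parameters
    strength = sum(alpha)                        # Dirichlet strength S
    belief = [e / strength for e in evidence]    # per-class belief mass
    uncertainty = k / strength                   # remaining uncertainty mass
    prob = [a / strength for a in alpha]         # expected class probabilities
    return {
        "alpha": alpha,
        "belief": belief,
        "uncertainty": uncertainty,
        "prob": prob,
    }


# Hypothetical logits for the three classes (AF, AFL, Non-AF).
confident = dirichlet_opinion([9.0, 0.0, 0.0])   # strong evidence for one class
vague = dirichlet_opinion([0.0, 0.0, 0.0])       # no evidence at all

print(confident["uncertainty"], vague["uncertainty"])
```

In this formulation the belief masses and the uncertainty mass sum to one, so an input that produces no evidence receives uniform expected probabilities and maximal uncertainty. All quantities come from a single forward pass, which is why no extra computational cost is incurred relative to a softmax output.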

Results: The evidential DL model was tested on 393 Holter recordings. It achieved sensitivities of 0.91, 0.79, and 0.93 for detecting AF, AFL, and Non-AF, respectively. The corresponding AUC scores were 0.96 for AF and Non-AF, and 0.90 for AFL. In terms of calibration, the evidential DL model achieved an expected calibration error (ECE) of 0.09, compared with 0.17 for the softmax-based DL model.
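
For readers unfamiliar with the ECE, the sketch below shows one common way to compute it: predictions are grouped into equal-width confidence bins, and the gap between accuracy and mean confidence in each bin is averaged, weighted by bin size. The number of bins (10), the function name, and the toy inputs are assumptions for illustration; the abstract does not state the binning used in the study.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error with equal-width confidence bins.

    confidences: predicted probability of the chosen class, in [0, 1]
    correct:     1 if the prediction was right, 0 otherwise
    n_bins:      assumed bin count (not specified in the abstract)
    """
    n = len(confidences)
    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    count = [0] * n_bins

    for c, hit in zip(confidences, correct):
        # Bin m covers [m / n_bins, (m + 1) / n_bins); c == 1.0 goes in the last bin.
        m = min(int(c * n_bins), n_bins - 1)
        conf_sum[m] += c
        hit_sum[m] += hit
        count[m] += 1

    ece = 0.0
    for m in range(n_bins):
        if count[m] == 0:
            continue  # empty bins contribute nothing
        accuracy = hit_sum[m] / count[m]
        mean_conf = conf_sum[m] / count[m]
        ece += (count[m] / n) * abs(accuracy - mean_conf)
    return ece


# Toy example: four predictions at 0.95 confidence (three correct)
# and two predictions at 0.55 confidence (one correct).
toy_conf = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
toy_correct = [1, 1, 1, 0, 1, 0]
toy_ece = expected_calibration_error(toy_conf, toy_correct)
print(round(toy_ece, 4))
```

A lower value means predicted confidences track observed accuracy more closely, and a perfectly calibrated model has an ECE of 0.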

Conclusion: Our experiments suggest that evidential DL models may offer better calibration than standard softmax models, thereby making their AF detections more reliable.