Robustness of Residual Network in Predicting PR Interval Trained using Noisy Labels

Loc Cao1, Hamid Ghanbari2, Negar Farzaneh1, Kevin Ward1, Sardar Ansari3
1Department of Emergency Medicine, University of Michigan, 2Division of Internal Medicine, Section of Cardiovascular Medicine, University of Michigan, 3Emergency Medicine, University of Michigan


Abstract

Background: Deep learning models have recently been proposed to estimate ECG intervals. These models are often validated using datasets collected as part of routine clinical care, whose labels are noisy because automatically generated measurements are not always corrected by human readers. This poses challenges for both the training and the validation of these models. Additionally, validation should account for the human eye's inability to detect differences smaller than 0.25 mm (10 ms) on an ECG strip.
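The equivalence between 0.25 mm and 10 ms follows from the ECG paper speed. A minimal sketch of the conversion, assuming the conventional speed of 25 mm/s (the abstract does not state the speed; the function name and default are illustrative):

```python
def mm_to_ms(distance_mm, paper_speed_mm_per_s=25.0):
    """Convert a horizontal distance on an ECG strip to a duration in ms.

    Assumes a paper speed of 25 mm/s unless another value is passed;
    this speed is an assumption, not a detail given in the abstract.
    """
    return distance_mm * 1000.0 / paper_speed_mm_per_s


# At 25 mm/s, a 0.25 mm difference corresponds to 10 ms.
resolution_ms = mm_to_ms(0.25)
```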

Methods: Using a dataset of >1.1M 12-lead ECGs (885,583 for training and 221,269 for testing) from the University of Michigan, a residual neural network (ResNet) was trained to estimate PR intervals. To address the problem of validation with noisy labels, blinded manual adjudication was performed by an electrophysiologist on a stratified sample of 160 ECGs. For each ECG, the reviewer determined whether the model estimate and the label were each accurate. Cases with an error of <10 ms were considered accurate since the human eye is unable to detect smaller differences.
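One way to turn stratified adjudication results into a population-level accuracy is to weight each stratum's accuracy by its share of the test set. The sketch below is an assumption about how such an estimate could be formed, not the authors' stated procedure; the normal-approximation interval, the function name, and the toy strata are all illustrative.

```python
import math


def stratified_accuracy(strata, z=1.96):
    """Estimate overall accuracy from per-stratum adjudication counts.

    strata: list of (weight, n_reviewed, n_accurate) tuples, where weight is
    the stratum's size or share of the population.
    Returns (estimate, (ci_low, ci_high)) using a normal approximation,
    which is an assumed choice of interval.
    """
    total_weight = sum(w for w, _, _ in strata)
    estimate = 0.0
    variance = 0.0
    for weight, n_reviewed, n_accurate in strata:
        share = weight / total_weight
        p = n_accurate / n_reviewed
        estimate += share * p
        variance += share ** 2 * p * (1.0 - p) / n_reviewed
    half_width = z * math.sqrt(variance)
    return estimate, (estimate - half_width, estimate + half_width)


# Toy, made-up strata: 90% of cases in a stratum judged fully accurate,
# 10% in a stratum where half of the reviewed cases were accurate.
toy_estimate, toy_ci = stratified_accuracy([(0.9, 50, 50), (0.1, 50, 25)])
```

The toy numbers are placeholders chosen only to exercise the function; they are not taken from the study.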

Results: We estimated that ~4% of the PR interval labels were erroneous, with errors as large as ~500 ms. The ResNet achieved a correlation of 94.1%, a bias of 0.06 ms, and a mean absolute error of 4.78 ms against the noisy test labels. 90.7% of model estimates were within 10 ms of the labels and were considered accurate, which was verified by manual adjudication of a random sample. A stratified sample of the remaining ECGs was also manually adjudicated. The ResNet accurately estimated 96.9% [95% CI: 96.56-97.24] of the PR intervals, higher than the estimated accuracy of the noisy labels at 96.2% [95% CI: 95.88-96.52].
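The agreement statistics reported against the noisy labels can be computed along the following lines. This is a minimal sketch, assuming Pearson correlation, bias defined as the mean of (estimate minus label), and the strict <10 ms tolerance from the Methods; the function name and the toy values are illustrative and are not study data.

```python
import numpy as np


def agreement_metrics(pred_ms, label_ms, tol_ms=10.0):
    """Compare model estimates with labels, both given in milliseconds."""
    pred = np.asarray(pred_ms, dtype=float)
    label = np.asarray(label_ms, dtype=float)
    diff = pred - label
    return {
        # Pearson correlation coefficient between estimates and labels
        "pearson_r": float(np.corrcoef(pred, label)[0, 1]),
        # Mean signed difference (estimate minus label)
        "bias_ms": float(diff.mean()),
        # Mean absolute error
        "mae_ms": float(np.abs(diff).mean()),
        # Fraction of cases whose absolute error is strictly below tol_ms
        "frac_within_tol": float((np.abs(diff) < tol_ms).mean()),
    }


# Toy, made-up PR intervals in ms, used only to exercise the function.
metrics = agreement_metrics([150, 162, 178, 205], [152, 160, 180, 190])
```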

Conclusions: The results indicate that a ResNet trained on noisy PR interval labels can make correct predictions and can outperform the noisy labels it was trained on.