Background: Deep learning models have recently been proposed to estimate ECG intervals. These models are often validated on datasets collected during routine clinical care, in which labels are noisy because automatically generated, incorrect labels are not always corrected by human readers. This poses challenges for both the training and the validation of these models. Additionally, validation should account for the human eye's inability to detect differences smaller than 0.25mm (10ms) on an ECG strip.
Methods: Using a dataset of >1M 12-lead ECGs (751,886 for training and 221,424 for testing) from the University of Michigan, a residual neural network (ResNet) was trained to estimate PR intervals. To address the problem of validation with noisy labels, blinded manual adjudication of a stratified sample of 200 ECGs was performed by an electrophysiologist, who determined whether the model estimate, the label, or both were accurate. Cases with error <10ms were considered accurate, since the human eye cannot detect smaller differences.
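The adjudication design above rests on partitioning ECGs by whether the model estimate and the clinical label agree to within the 10ms visual-resolution threshold; disagreements form the pool from which a stratified review sample is drawn. A minimal sketch of that check, with purely illustrative names and data (not the study's actual pipeline):

```python
# Differences below ~0.25mm (10ms) on a standard strip are not visually
# detectable, so estimate/label pairs closer than this are treated as agreeing.
TOLERANCE_MS = 10.0

def split_for_adjudication(labels_ms, estimates_ms, tol=TOLERANCE_MS):
    """Partition ECG indices into 'agree' (|error| < tol, treated as accurate)
    and 'disagree' (candidates for blinded manual adjudication)."""
    agree, disagree = [], []
    for i, (label, est) in enumerate(zip(labels_ms, estimates_ms)):
        (agree if abs(est - label) < tol else disagree).append(i)
    return agree, disagree

# Toy example: the third label is grossly wrong, e.g. an uncorrected auto-label.
labels = [160.0, 180.0, 620.0, 200.0]
estimates = [158.0, 185.0, 170.0, 209.0]
agree, disagree = split_for_adjudication(labels, estimates)
# agree -> [0, 1, 3]; disagree -> [2]
```

Note that a large estimate/label disagreement cannot by itself say which side is wrong, which is why the disagreeing stratum is sent to a blinded human reader.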
Results: We estimated that ~4% of the PR interval labels had errors of up to ~500ms. The ResNet achieved a correlation of 94.3%, a bias of 0.27ms, and a mean absolute error of 4.7ms against the noisy test labels. 91.3% of model estimates fell within 10ms of the labels and were considered accurate, which was verified by manual adjudication of a random sample; a stratified sample of the remaining ECGs was also manually adjudicated. The ResNet accurately estimated 96.9% [95% CI: 96.6-97.4] of the PR intervals, higher than the estimated accuracy of the noisy labels themselves at 96.0% [95% CI: 95.7-96.3].
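The summary statistics reported above (bias, mean absolute error, correlation, and the fraction of estimates within the 10ms tolerance) follow from standard formulas over paired label/estimate arrays. A self-contained sketch on toy data, with illustrative names only:

```python
import math

def pr_metrics(labels_ms, estimates_ms, tol_ms=10.0):
    """Bias (mean signed error), mean absolute error, Pearson correlation,
    and the fraction of estimates within the visual tolerance."""
    n = len(labels_ms)
    errors = [e - l for l, e in zip(labels_ms, estimates_ms)]
    bias = sum(errors) / n                       # mean signed error, in ms
    mae = sum(abs(err) for err in errors) / n    # mean absolute error, in ms
    mean_l = sum(labels_ms) / n
    mean_e = sum(estimates_ms) / n
    cov = sum((l - mean_l) * (e - mean_e)
              for l, e in zip(labels_ms, estimates_ms))
    var_l = sum((l - mean_l) ** 2 for l in labels_ms)
    var_e = sum((e - mean_e) ** 2 for e in estimates_ms)
    corr = cov / math.sqrt(var_l * var_e)        # Pearson correlation
    within = sum(abs(err) < tol_ms for err in errors) / n
    return bias, mae, corr, within

# Toy data: four PR intervals in ms, not from the study.
labels = [150.0, 170.0, 190.0, 210.0]
estimates = [152.0, 168.0, 195.0, 209.0]
bias, mae, corr, within = pr_metrics(labels, estimates)
# bias -> 1.0, mae -> 2.5, within -> 1.0
```

Note that because the reference labels are themselves noisy, these metrics bound agreement with the labels, not ground-truth accuracy; that is precisely the gap the manual adjudication closes.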
Conclusions: The results indicate that a ResNet trained on noisy PR interval labels can make accurate predictions and can outperform the noisy labels it was trained on.