WAVIE: A Modular and Open-Source Python Implementation for Fully Automated Digitisation of Paper Electrocardiograms

Mathilde A Verlyck, Joshua R Dillon, Stephen A Creamer, Debbie Zhao
Auckland Bioengineering Institute, University of Auckland


Abstract

Introduction—The electrocardiogram (ECG) is a ubiquitous tool for the assessment of heart disease. Traditionally, ECGs have been stored in paper format for manual interpretation. While considerable effort has been directed at ECG digitisation to facilitate artificial intelligence applications, limitations persist in the generalisability of existing methods. To address this challenge, we present a fully automated, modular, and open-source framework for ECG digitisation to handle the heterogeneity of real-world data.

Methods—Using the PTB-XL dataset, 3000 synthetic paper ECGs (2874 for training, 126 for validation) were generated with known waveforms and lead-specific bounding boxes, under known variations and artefacts. A three-stage framework was developed to reconstruct the waveforms, consisting of open-source deep-learning models for orientation correction, object detection, and waveform extraction. First, Deep-OAD v2, which leverages depthwise separable convolutions and ImageNet features, was fine-tuned (100 epochs) to learn the angle by which each ECG was rotated. Next, the YOLOv5s model was fine-tuned (300 epochs) to detect bounding boxes around the data of individual leads. The layout was determined using hierarchical and k-means clustering of the bounding box coordinates, yielding one sub-image per lead. Finally, a 2D nnU-Net was trained (200 epochs) to extract masks of the signal from each sub-image, which were converted into waveforms based on gridlines detected using Otsu's binarisation.
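The final conversion step can be illustrated with a minimal sketch. It is not the WAVIE implementation: the function names, the `baseline_row` and `pixels_per_mm` parameters, and the 10 mm/mV calibration are assumptions made for illustration. The sketch shows a histogram-based Otsu threshold, which could separate gridline pixels from background, and a column-wise conversion of a binary signal mask into millivolt samples once the grid spacing in pixels is known.

```python
from statistics import mean


def otsu_threshold(pixels):
    """Return the 8-bit grey level that maximises between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # summed intensity of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # no pixels in the lower class yet
        if w1 == 0:
            break     # upper class is empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def mask_to_waveform(mask, pixels_per_mm, baseline_row, mm_per_mv=10.0):
    """Convert a binary mask (rows of 0/1) into one millivolt sample per column.

    pixels_per_mm is assumed to come from the detected grid spacing, and
    mm_per_mv=10.0 is an assumed calibration, not a value from the paper.
    Columns without any mask pixel yield None.
    """
    n_rows = len(mask)
    n_cols = len(mask[0]) if n_rows else 0
    samples = []
    for col in range(n_cols):
        rows = [r for r in range(n_rows) if mask[r][col]]
        if not rows:
            samples.append(None)
            continue
        centre = mean(rows)  # vertical centre of the trace in this column
        # Image rows grow downward, so a smaller row index is a higher voltage.
        samples.append((baseline_row - centre) / pixels_per_mm / mm_per_mv)
    return samples
```

Gaps returned as `None` would need interpolation before the samples are resampled to a fixed sampling rate, a step that this sketch leaves out.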

Results—Inference on the validation set produced a median signal-to-noise ratio of 5.12 and a mean of -3.96 ± 6.32. Fully automated digitisation of a single image took approximately 180 seconds (wall clock) on an 8 GB NVIDIA RTX 2000 GPU.
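For reference, a signal-to-noise ratio between a known waveform and its reconstruction is commonly computed as sketched below. The abstract does not state the exact definition or units used, so the decibel form here is an assumption, and the function name `snr_db` is hypothetical.

```python
import math


def snr_db(reference, reconstructed):
    """Return 10*log10(signal power / error power) for two equal-length signals."""
    if len(reference) != len(reconstructed):
        raise ValueError("signals must have the same length")
    signal_power = sum(x * x for x in reference)
    noise_power = sum((x - y) ** 2 for x, y in zip(reference, reconstructed))
    if signal_power == 0:
        raise ValueError("reference signal has zero power")
    if noise_power == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_power / noise_power)
```

Under this definition, a reconstruction whose error is one tenth of the reference amplitude scores about 20 dB, and a negative value means the error power exceeds the signal power.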

Conclusion—Our framework provides a comprehensive and generalisable baseline that can be fine-tuned for specific ECG digitisation tasks and for dataset features that fall beyond the original training set. This is an important advantage given the diversity of real-world data and ensures adaptability for future research applications.