Digitising ECG Images: A Vision to Time Series Transformer Vs. A Pipeline of Specialised Deep-Learning Models

Felix H Krones, Terry J Lyons, Adam Mahdi, Benjamin Walker
University of Oxford


Abstract

This work presents the contributions of our team, SignalSavants, to the 2024 George B. Moody PhysioNet Challenge. The Challenge has two goals: reconstructing ECG signals from ECG printouts (our focus) and predicting cardiac diseases.

ECG digitisation techniques can enable automatic, low-cost diagnosis of heart conditions, thereby improving cardiac care. They also make it possible to build datasets that combine data from diverse sources and populations. Although most ECGs today are recorded digitally, countless paper ECGs remain stored worldwide. Varying recording standards and poor image quality necessitate a data-centric approach to developing robust models that generalise well.

We used two approaches for the digitisation task. The first is a pre-trained vision transformer, fine-tuned and modified to output time series. Its penultimate hidden state is fed into two independent final transformer layers: one output is trained directly on the signal and the other on the path signature of the signal. This dual objective is intended to improve stability during fine-tuning. The second is a pipeline of specialised models: object detection to locate individual signals and grid information, image segmentation, and signal vectorisation. We rigorously assessed the performance of our models using cross-validation on the 21,799 recordings of the PTB-XL dataset, following its suggested splits and creating multiple image versions of each signal.
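To make the dual objective concrete, the sketch below computes a depth-2 path signature of a (time-augmented) signal and combines a signal-space MSE with an MSE on the signature terms. This is a minimal illustration, not the paper's implementation: the actual model may use a deeper signature, a library implementation, and a different balancing weight; the helper names and `weight` value are hypothetical.

```python
import numpy as np

def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path.

    path: (T, d) array (e.g. an ECG signal augmented with a time channel).
    Returns (level1, level2) with shapes (d,) and (d, d).
    """
    dx = np.diff(path, axis=0)            # segment increments, shape (T-1, d)
    level1 = dx.sum(axis=0)               # equals path[-1] - path[0]
    cum = np.cumsum(dx, axis=0) - dx      # displacement at the start of each segment
    # Iterated integral: sum over segments of (x_t - x_0) (x) dx_t,
    # plus the exact within-segment contribution 0.5 * dx (x) dx.
    level2 = cum.T @ dx + 0.5 * dx.T @ dx
    return level1, level2

def dual_objective(pred, target, weight=0.1):
    """MSE on the raw signal plus MSE on signature terms.

    `weight` is a hypothetical balancing coefficient, not from the paper.
    """
    mse = np.mean((pred - target) ** 2)
    p1, p2 = signature_depth2(pred)
    t1, t2 = signature_depth2(target)
    sig_mse = np.mean((p1 - t1) ** 2) + np.mean((p2 - t2) ** 2)
    return mse + weight * sig_mse
```

One sanity check on such an implementation is the shuffle identity at depth 2: the symmetric part of the level-2 term must equal half the outer product of the level-1 term.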
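The vectorisation step of the pipeline can be sketched as follows: given a binary mask of one segmented trace, take the column-wise centroid of trace pixels and convert pixel rows to millivolts using a grid-calibration factor. The function name, the `px_per_mv` parameter, and the median-baseline fallback are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def vectorise_trace(mask, px_per_mv, baseline_row=None):
    """Convert a binary trace mask (H, W) into a 1-D signal in millivolts.

    Hypothetical helper: assumes `px_per_mv` was estimated from the detected
    grid and that each image column contains at most one trace segment.
    """
    H, W = mask.shape
    rows = np.arange(H, dtype=float)[:, None]
    counts = mask.sum(axis=0)
    # Column-wise centroid of trace pixels -> vertical position per time step.
    centroid = (rows * mask).sum(axis=0) / np.maximum(counts, 1)
    if baseline_row is None:
        # Fallback baseline estimate; a real pipeline would use grid information.
        baseline_row = np.median(centroid[counts > 0])
    signal = (baseline_row - centroid) / px_per_mv   # image y-axis points down
    signal[counts == 0] = np.nan                     # gaps where no trace was found
    return signal
```

Columns without any trace pixels are marked as missing rather than interpolated, leaving gap handling to a later stage.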

On the digitisation task, our model achieved a local signal-to-noise ratio (SNR) of $0.00$ and an unofficial Challenge score of $0.00$ on the hidden set. On the classification task, we achieved a local F-score of $0.16$ and a Challenge score of $0.52$.

Our study highlights the challenges of building robust, generalisable signal digitisation approaches. Such models require substantial resources (data, time, and computational power), but have great potential to diversify the data available.