Convolutional Neural Networks with Spatial Transformer Modules for End-to-End Digitization of Paper Electrocardiogram Records

Haoliang Shang, Clemens Hutter, Rodrigo Casado Noguerales, Yani Zhang
Department of Information Technology and Electrical Engineering, ETH Zurich, Switzerland


Abstract

Automatic digitization of paper-based electrocardiogram (ECG) records, that is, extracting 1-D ECG time series from 2-D scanned images, requires a model that is robust to the distortions present in scanned images, including rotation, cropping, creases, and text artifacts. As part of the George B. Moody PhysioNet Challenge 2024, our team (mins-eth) proposes a convolutional neural network (CNN) architecture that is end-to-end trainable and performs the digitization task automatically. Specifically, we employ a convolutional module called a spatial transformer, which is explicitly invariant to spatial deformation of features, to crop out single leads from the input image. Each single lead is then processed by a denoising U-Net module to produce a binarized 2-D signal, from which the 1-D time series is read out. Because the network is trainable end to end, we can leverage the provided training set with data augmentation. Currently, our model achieves a reconstruction SNR of 8.57 in cross-validation on the training set. We did not receive a score during the unofficial phase because of a timeout during evaluation.
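
To make the last two stages of the pipeline concrete, the following is a minimal sketch of how a 1-D series might be read out of a single-lead 2-D mask and how a reconstruction SNR might be computed. It is an illustration under stated assumptions, not the implementation used in this work: the column-wise centroid readout, the function names `read_out_column_centroids` and `snr_db`, the parameters `baseline_row` and `units_per_pixel`, and the decibel form of the SNR are all hypothetical choices made here, and the challenge's official metric may differ (for example, in how signals are aligned before scoring).

```python
import math


def read_out_column_centroids(mask, baseline_row, units_per_pixel):
    """Convert a 2-D mask (rows x columns, values in [0, 1]) to a 1-D series.

    Assumption: the trace position in each column is the intensity-weighted
    mean row index. Columns without any signal pixels yield NaN.
    """
    n_rows = len(mask)
    n_cols = len(mask[0])
    series = []
    for c in range(n_cols):
        weight = sum(mask[r][c] for r in range(n_rows))
        if weight == 0:
            series.append(float("nan"))
            continue
        centroid = sum(r * mask[r][c] for r in range(n_rows)) / weight
        # Image rows grow downward, so a smaller row index is a higher amplitude.
        series.append((baseline_row - centroid) * units_per_pixel)
    return series


def snr_db(reference, estimate):
    """Signal-to-noise ratio in decibels (one common definition, assumed here)."""
    signal_power = sum(x * x for x in reference)
    noise_power = sum((x - y) ** 2 for x, y in zip(reference, estimate))
    if noise_power == 0:
        return float("inf")
    return 10.0 * math.log10(signal_power / noise_power)


# Toy 4x4 mask with one active pixel per column.
toy_mask = [
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
toy_series = read_out_column_centroids(toy_mask, baseline_row=2, units_per_pixel=0.5)
print(toy_series)  # [0.0, 0.5, 1.0, 0.0]
```

A centroid readout is only one option; it tolerates traces that are several pixels thick but would average overlapping traces from neighboring leads, which is one reason a per-lead crop before readout is useful.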