Diffusion Models and Masked Training Help Digitization and Classification of Multiple Layout ECG Images

Zisheng Liang1, Shanwei Zhang2, 可鑫 王2, Deyun Zhang3, Shijia Geng3, Jun Li4, Qinghao Zhao5, Yuxi Zhou6, Shenda Hong7
1National Institute of Health Data Science, Peking University, Beijing, China, 2Tianjin University of Technology, 3Heartvoice Medical Technology, 4Jilin University, 5Department of Cardiology, Peking University People's Hospital, 6Peking University, 7Georgia Institute of Technology


Abstract

Recent advancements have led to the development of algorithms for interpreting ECG time series. However, the continued prevalence of ECG images (photos, screenshots, or scans of paper-based ECGs), especially in underdeveloped countries, underscores the need for digitization and affordable data analysis to ensure comprehensive cardiac care and capture the diverse manifestations of cardiovascular diseases worldwide. As part of the George B. Moody PhysioNet Challenge 2024, we propose a series of pragmatic components for the digitization and classification of ECG images. First, we generate training samples with the ECG-Image-Kit and refine them using diffusion models for data augmentation. Then, we employ a U-Net architecture to perform ECG digitization, using these large-scale ECG images paired with their corresponding ground-truth time series. Next, we pre-train a RegNet model for ECG classification using large-scale ECG time series data from open-source datasets. This pre-trained classifier is then fine-tuned on the digitized ECG time series derived from ECG images. Additionally, we devise an adaptable meta-model and a masked training strategy to address the varying lengths and asynchronization that arise when digitizing diverse ECG image layouts. Our team, PKU NIHDS, achieved an SNR of -1.103 on the reconstruction task and an F-measure of 0.421 on the classification task on the hidden test set.
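
The abstract does not give the details of the masked training strategy. As an illustration only, the sketch below shows one plausible reading: when a layout prints only a short segment of each lead, the loss is computed over the observed samples and the remaining positions are ignored. The function names `layout_mask` and `masked_mse`, the `segments` argument, and the toy values are all hypothetical and are not taken from the paper.

```python
def layout_mask(n_leads, n_samples, segments):
    """Build a 0/1 mask of shape (n_leads, n_samples).

    `segments` maps a lead index to the (start, stop) sample range that the
    image layout actually shows for that lead. This is an assumed way of
    describing a layout, not the authors' format.
    """
    mask = [[0] * n_samples for _ in range(n_leads)]
    for lead, (start, stop) in segments.items():
        for t in range(start, stop):
            mask[lead][t] = 1
    return mask


def masked_mse(pred, target, mask):
    """Mean squared error over the positions where mask == 1 only."""
    total = 0.0
    count = 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:  # unobserved positions contribute nothing to the loss
                total += (p - t) ** 2
                count += 1
    return total / count if count else 0.0


# Toy example: 2 leads, 4 samples; each lead is visible for a different half.
mask = layout_mask(2, 4, {0: (0, 2), 1: (2, 4)})
pred = [[1.0, 2.0, 9.0, 9.0], [9.0, 9.0, 3.0, 5.0]]
target = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 3.0, 3.0]]
loss = masked_mse(pred, target, mask)
```

Because the masked positions never enter the sum, leads of different lengths or with different time offsets can share one fixed-size target array, which is the kind of mismatch the abstract attributes to diverse layouts.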