Overcoming Modality Gaps: Self-Supervised Learning for Image-based Cardiovascular Disease Detection

Yeongyeon Na1, Minje Park1, Taehyung Yu1, Jeonghwa Lim1, Younghoon Ji2, Sunghoon Joo1
1VUNO Inc., 2VUNO Inc


Abstract

The electrocardiogram (ECG) is a simple, non-invasive tool for diagnosing cardiovascular disease. Several studies have proposed artificial intelligence (AI) models that detect cardiovascular diseases such as arrhythmia, heart failure, and acute myocardial infarction from ECGs. Most of these models assume one-dimensional raw signal inputs. However, in clinical practice, ECGs are frequently handled as images, and the raw signal is sometimes unavailable. This creates a growing demand for AI models that can detect disease from image-based ECGs. We found that directly applying computer vision models to ECG images yields inferior performance compared to signal-based models. We propose leveraging self-supervised learning (SSL) to mitigate the performance gap arising from the modality difference between raw ECG signals and ECG images. Among SSL techniques, Masked Image Modeling (MIM) is well suited to ECG images because it 1) avoids domain-specific data augmentations and 2) scales better than contrastive learning methods. Our experiments show that MIM helps the model extract meaningful features from ECG images. With our MIM algorithm, we achieved an F1 score of 0.84, surpassing the 0.82 attained by the model trained without MIM. Notably, this performance is comparable to that of the signal-based model, which yielded an F1 score of 0.85. These findings demonstrate the potential for extending deep learning-based ECG diagnosis to settings with limited access to ECG waveform data by using ECG images alone.
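
As a rough illustration of the masking step that MIM in general relies on, the Python sketch below hides a random subset of non-overlapping patches in an image; a model would then be trained to reconstruct the hidden regions. This is not the authors' implementation. The function name `mask_random_patches`, the patch size of 16, the mask ratio of 0.75, and the choice to fill masked pixels with zeros are all assumptions made for this example.

```python
import numpy as np


def mask_random_patches(image, patch_size=16, mask_ratio=0.75, seed=0):
    """Zero out a random subset of non-overlapping square patches.

    Hypothetical helper for illustration only; patch_size and mask_ratio
    are assumed values, not settings taken from the paper.
    Returns the masked image and a boolean grid (True = patch was masked).
    """
    height, width = image.shape[:2]
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")

    rows, cols = height // patch_size, width // patch_size
    num_patches = rows * cols
    num_masked = int(round(mask_ratio * num_patches))

    # Pick which patches to hide, without replacement.
    rng = np.random.default_rng(seed)
    masked_ids = rng.choice(num_patches, size=num_masked, replace=False)
    patch_mask = np.zeros(num_patches, dtype=bool)
    patch_mask[masked_ids] = True
    patch_mask = patch_mask.reshape(rows, cols)

    # Expand the patch-level grid to pixel resolution.
    pixel_mask = patch_mask.repeat(patch_size, axis=0).repeat(patch_size, axis=1)

    masked = image.copy()
    masked[pixel_mask] = 0  # assumed fill value for hidden pixels
    return masked, patch_mask


# Usage: a dummy 64x128 grayscale "ECG image" split into a 4x8 patch grid.
dummy_image = np.ones((64, 128), dtype=np.float32)
masked_image, patch_mask = mask_random_patches(dummy_image)
```

Because the masking is applied uniformly at random to patches, it needs no ECG-specific augmentation design, which is the first of the two advantages listed in the abstract.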