Open Science to Foster Progress in Automatic ECG Analysis: Status and Future Directions

Nils Strodthoff
Carl von Ossietzky Universität Oldenburg


Abstract

In recent years, there has been a significant increase in the availability of publicly accessible ECG datasets. These datasets have been used in thousands of research works, which underlines their profound impact on the research landscape. However, despite their widespread use, many currently available public datasets lack crucial components. First, they often lack high-quality expert annotations based on standardized ontologies or mapping schemes that facilitate comparisons across datasets. Second, enriched metadata, such as interoperable ECG features or automatic diagnostic statements, would greatly improve their usability and interoperability. The most glaring shortfall of current publicly available datasets, compared to those used in recent high-impact publications, is the absence of clinical ground truth. This deficiency prevents the broader research community from replicating and extending published work based on closed in-hospital datasets. Consequently, a significant portion of the research community is excluded, which impedes scientific progress and results in suboptimal solutions.

In my talk, I will trace the recent history of publicly available ECG datasets, using the PTB-XL dataset as an example, and discuss possible directions for the enrichment of existing datasets, using the PTB-XL+ feature dataset as an example. I will close with a possible path towards ECG datasets with clinical ground truth, based on recent work in the context of the MIMIC-IV-ECG dataset.