Past, Present and Future Challenges in Sharing Science: From PhysioNet to Foundation Models

Gari Clifford
Emory University and Georgia Institute of Technology


Abstract

Over the last 25 years, the sharing of data and models for research in cardiology has evolved from sneakernet to the internet: from mailing tapes and compact discs holding a handful of well-curated recordings of an array of arrhythmias to the high-speed download of an entire hospital database. Yet bandwidth and local computation have not kept pace with the rate at which we can strip-mine our data archives. Recently, the trend towards the development of large foundation models has required enormous computing power, making large-scale local clusters or centralized cloud instances of data and compute the only viable solution for developing such models. Instead of democratizing access to data and compute, as was the intent of the pioneers in this field, this trend is leading to the balkanization of innovation in the hands of the rich and powerful. Moreover, claims of foundation models in cardiology, physiology, and medical data in general are largely premature, because foundation models must be trained on a broad enough range of data that they may be applied across a wide range of use cases. Doing so requires broad representation of both individuals and medical conditions. This article examines these trends and discusses the most promising future directions, together with potential solutions to the concentration of power and lack of diversity in foundation models.