Reproducibility of machine learning models for paroxysmal atrial fibrillation onset prediction

Cédric Gilon1, Jean-Marie Grégoire1, Jérome Hellinckx1, Stéphane Carlier2, Hugues Bersini1
1Université Libre de Bruxelles, 2Université de Mons


Background Atrial fibrillation (AF) is the most common cardiac arrhythmia. Predicting the onset of paroxysmal AF is a much more complex task than screening for AF. Methods published using the Physionet AFPDB database report excellent results, suggesting that AF episodes can be forecast by machine learning (ML) models based on heart rate variability (HRV) parameters.

Aims To reproduce the results reported by published studies, using both the Physionet database and a larger database of unselected real-life patients.

Methods We searched the literature for all articles on forecasting paroxysmal AF episodes. We analysed in depth the methodology of 3 recent ML-based studies in order to replicate their results. We screened our database of 11,833 Holter ECG recordings to identify those containing paroxysmal AF episodes; a total of 214 Holter recordings with paroxysmal AF were labelled. We developed two ML models, a deep neural network and a random forest, to forecast AF from 13 HRV parameters. We compared the performance of the published models and of our models on both the Physionet database and our real-life patient database.
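The abstract does not list the 13 HRV parameters used. As a hedged illustration of what such features look like, the sketch below computes a few standard time-domain HRV parameters (mean RR, mean heart rate, SDNN, RMSSD, pNN50) from a window of RR intervals in milliseconds. The function name `hrv_time_domain` and the choice of parameters are assumptions for illustration only, not the authors' feature set or pipeline.

```python
import math
import statistics


def hrv_time_domain(rr_ms):
    """Compute a few standard time-domain HRV parameters from RR intervals (ms).

    Hypothetical, illustrative subset: the study's actual 13 HRV parameters
    are not specified in the abstract.
    """
    if len(rr_ms) < 3:
        raise ValueError("need at least 3 RR intervals")
    # Successive differences between consecutive RR intervals
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    mean_rr = statistics.fmean(rr_ms)
    return {
        "mean_rr": mean_rr,
        "mean_hr": 60000.0 / mean_rr,  # beats per minute
        "sdnn": statistics.stdev(rr_ms),  # sample standard deviation of RR
        "rmssd": math.sqrt(statistics.fmean(d * d for d in diffs)),
        # Percentage of successive differences larger than 50 ms
        "pnn50": 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs),
    }


features = hrv_time_domain([800, 810, 790, 870, 800])
```

In a forecasting setup, such a feature vector would typically be computed per window of ECG preceding a labelled AF onset (or from an AF-free control period) and fed to the classifier; the windowing scheme is likewise an assumption here.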

Results Twenty-one papers dedicated to predicting the onset of AF episodes have been published so far, reporting promising results with sensitivity of up to 98%, specificity of up to 95% and accuracy of up to 98%. Following the model description given in each publication, we could not reach the published performance on the Physionet database. In addition, our own models obtained a lower sensitivity of 84% for a specificity of 49% on the Physionet database, similar to the sensitivity of 80.1% for a specificity of 52.8% obtained on our larger database.
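For readers comparing these figures, the sketch below shows how sensitivity, specificity and accuracy follow from binary confusion-matrix counts, plus balanced accuracy (the mean of sensitivity and specificity). Treating "segment preceding an AF onset" as the positive class is an assumption, and the helper names are hypothetical. Balanced accuracy computed from the reported figures is about 0.665 on Physionet (84% / 49%) and about 0.665 on the larger database (80.1% / 52.8%).

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    sensitivity: float
    specificity: float
    accuracy: float


def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts.

    Positive class is assumed to be a window preceding an AF onset.
    """
    return Metrics(
        sensitivity=tp / (tp + fn),  # share of pre-AF windows detected
        specificity=tn / (tn + fp),  # share of control windows correctly rejected
        accuracy=(tp + tn) / (tp + fp + tn + fn),
    )


def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity; unaffected by class proportions."""
    return (sensitivity + specificity) / 2
```

Unlike sensitivity and specificity, accuracy depends on the class proportions of the test set: with made-up counts of 0 true positives, 5 false negatives and 95 true negatives, accuracy is 0.95 while sensitivity is 0.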

Conclusion ML models must be described in more detail if the reported results are to be reproducible. Further progress is needed before algorithms that anticipate paroxysmal AF can be used clinically. Larger databases are mandatory for this type of prediction.