Rationale: Neurologic prognostication following cardiac arrest remains challenging. Electroencephalography (EEG) can aid real-time prognostication, but the volume of data generated requires an automated analytic approach. In response to the 2023 George B. Moody PhysioNet Challenge, we propose an automated, unsupervised pre-training approach to predict neurologic outcomes after cardiac arrest.
Methods: Our model architecture consisted of three parts: a pre-processor that converted raw EEGs to two-dimensional spectrograms, a three-layer convolutional autoencoder (CAE) for unsupervised pre-training, and a Time Series Transformer (TST). We trained the CAE on randomly selected five-minute EEG samples from the Temple University EEG Corpus (TUEG). We then incorporated the pre-trained encoder into the TST as a base layer and trained the model as a classifier on EEGs from the 2023 PhysioNet Challenge dataset. Model performance was assessed using the F1-score with five-fold cross-validation. For comparison, we also tested a TST whose convolutional base layer was randomly initialized rather than pre-trained.
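The pre-processing stage described above can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: the sampling rate, window length, overlap, channel count, and log scaling are all assumed values chosen for demonstration.

```python
import numpy as np
from scipy.signal import spectrogram

def eeg_to_spectrograms(eeg, fs=100, nperseg=256, noverlap=128):
    """Convert a raw multi-channel EEG segment to a stack of 2-D spectrograms.

    eeg: array of shape (n_channels, n_samples)
    returns: array of shape (n_channels, n_freqs, n_times)
    All parameter defaults are illustrative assumptions, not paper values.
    """
    specs = []
    for channel in eeg:
        # Short-time Fourier power spectrum per channel
        _, _, Sxx = spectrogram(channel, fs=fs, nperseg=nperseg,
                                noverlap=noverlap)
        # Log scaling compresses the dynamic range before the CAE
        specs.append(np.log1p(Sxx))
    return np.stack(specs)

# Example: one five-minute, 19-channel sample at an assumed 100 Hz
rng = np.random.default_rng(0)
sample = rng.standard_normal((19, 5 * 60 * 100))
spec = eeg_to_spectrograms(sample)
```

The resulting channel-by-frequency-by-time arrays are the kind of two-dimensional input a convolutional encoder can compress into latent representations for the downstream TST classifier.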
Results: The TUEG dataset included 14,927 subjects. The 2023 PhysioNet Challenge dataset included 607 post-cardiac arrest patients with a mean of 189 minutes of EEG data per patient. Neurologic outcomes were classified as "poor" (37.1%) or "good" (62.9%). In a side-by-side comparison, the model performed better when the CAE encoder was pre-trained on unlabeled EEG data before being trained in tandem with the TST weights (F1-score 0.762 ± 0.017) than when it was randomly initialized (0.683 ± 0.032).
Conclusion: The CAE learned latent representations of EEGs by training on a large unlabeled dataset that was not specifically curated for the task of predicting post-arrest outcomes. These latent representations proved useful for training the TST to predict post-arrest outcomes from the labeled PhysioNet dataset, demonstrating that unlabeled EEG data can be leveraged to improve performance on modeling tasks that depend on smaller labeled EEG datasets.