Exploring Pixel Value Scaling in the Application of Convolutional Neural Network U-Net Models for Segmentation of the Myocardium in Magnetic Resonance Images

Trygve Eftestøl¹, Mina Farmanbar¹, Tomas Royal Choat², Otto Nessa Ljosdal², Mathias Dyvik², Casper Cappelen², Vidar Frøysa³, Jørn Kværness⁴, Gøran Jansson Berg², Stein Ørn³
¹University of Stavanger, ²Department of Electrical and Computer Science, University of Stavanger, P.O. box 8600, 4036 Stavanger, Norway, ³Department of Cardiology, Stavanger University Hospital, Armauer Hansens vei 20, 4011, Stavanger, Norway, ⁴God Klinisk Forskning

Abstract

Aims: Two deep learning models are trained and evaluated on one dataset (A) and then applied to a separate dataset (B) comprising Late Gadolinium Enhanced Cardiac Magnetic Resonance Images (LGE-CMRI) for myocardium segmentation. Because the pixel value distributions differ between the two datasets, different scaling strategies for preprocessing the images of dataset B before applying the models are explored.

Methods: Two deep learning models are trained on dataset A: a modified U-Net with added residual connections (RES_MOD) and a multiresolution U-Net (MUL_MOD). The maximum pixel value, pv_m, over all slices of each patient-specific image set is determined. To standardize pixel values in dataset B, each pixel value is divided by one of five scaling factors: s_std=256, s_max=pv_m, s_floor=2^⌊log2(pv_m)⌋, s_round=2^⌊log2(pv_m)+1/2⌋ and s_ceil=2^⌈log2(pv_m)⌉. Both models are applied to each set of scaled images. Training and evaluation are conducted with s_std scaling on dataset A, which comprises MRI scans from 51 patients divided into training (30), test (13), and validation (8) sets. Dataset B includes MRI scans from 58 patients.
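As an illustration only, the five scaling factors defined above can be computed as in the following minimal Python sketch. The function names `patient_max`, `scaling_factors`, and `scale_slices`, and the representation of an image set as nested lists of pixel values, are assumptions made for this example and are not taken from the study's implementation.

```python
import math


def patient_max(slices):
    """Maximum pixel value pv_m over all slices of one patient's image set.

    `slices` is assumed to be a list of 2-D images, each a list of rows.
    """
    return max(pixel for image in slices for row in image for pixel in row)


def scaling_factors(pv_m):
    """Return the five candidate divisors for a given maximum pixel value."""
    log2_max = math.log2(pv_m)
    return {
        "s_std": 256,                                  # fixed divisor
        "s_max": pv_m,                                 # patient maximum
        "s_floor": 2 ** math.floor(log2_max),          # power of two at or below pv_m
        "s_round": 2 ** math.floor(log2_max + 0.5),    # exponent rounded to nearest integer
        "s_ceil": 2 ** math.ceil(log2_max),            # power of two at or above pv_m
    }


def scale_slices(slices, divisor):
    """Divide every pixel value in every slice by the chosen divisor."""
    return [[[pixel / divisor for pixel in row] for row in image]
            for image in slices]


# Hypothetical image set with two 2x2 slices; the values are illustrative only.
example = [[[0, 150], [300, 600]], [[10, 20], [30, 40]]]
factors = scaling_factors(patient_max(example))
scaled = scale_slices(example, factors["s_floor"])
```

For a hypothetical maximum of 600, s_floor and s_round both give 512 while s_ceil gives 1024, so s_floor-scaled images can contain values slightly above 1.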

Results: The Jaccard scores (mean(SD)) on the test data for dataset A were the same for both models, 0.73(0.01). For dataset B, MUL_MOD with s_floor scaling achieved the highest score, 0.60(0.14), outperforming all other combinations of scaling method and model. With s_std scaling, the same model scored 0.58(0.19). The corresponding scores for RES_MOD were 0.56(0.18) and 0.54(0.18) for s_std and s_floor scaling, respectively.
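For readers unfamiliar with the metric, the sketch below shows one common way to compute a Jaccard score for a pair of binary masks and to summarize per-image scores as mean and SD. It is not the study's evaluation code: the helper names `jaccard` and `mean_sd`, the nested-list mask format, the convention of returning 1.0 when both masks are empty, and the use of the sample standard deviation are all assumptions of this example.

```python
import statistics


def jaccard(pred, truth):
    """Jaccard score |P ∩ T| / |P ∪ T| for two binary masks of equal shape."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption: two empty masks are treated as a perfect match.
    return intersection / union if union else 1.0


def mean_sd(scores):
    """Mean and sample standard deviation of a list of scores."""
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical 2x2 masks; the values are illustrative only.
predicted = [[1, 1], [0, 0]]
reference = [[1, 0], [0, 0]]
score = jaccard(predicted, reference)  # one overlapping pixel out of two in the union
```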

Conclusion: This study highlights how strongly the choice of dataset affects the assessment of a model: a model trained and tested on one dataset can perform markedly worse on another dataset with a different pixel value distribution. The findings indicate that scaling could be effectively employed in semi-automated segmentation processes.