Exploring Pixel Value Scaling in the Application of Convolutional Neural Network U-Net Models for Segmentation of the Myocardium in Magnetic Resonance Images

Trygve Eftestøl1, Mina Farmanbar1, Tomas Royal Choat2, Otto Nessa Ljosdal2, Mathias Dyvik2, Casper Cappelen2, Vidar Frøysa3, Jørn Kværness4, Gøran Jansson Berg2, Stein Ørn3
1University of Stavanger, 2Department of Electrical and Computer Science, University of Stavanger, P.O. box 8600, 4036 Stavanger, Norway, 3Department of Cardiology, Stavanger University Hospital, Armauer Hansens vei 20, 4011, Stavanger, Norway, 4God Klinisk Forskning


Abstract

Aims: Two deep learning models are trained and evaluated on one dataset (A) and then applied to a separate dataset (B) comprising Late Gadolinium Enhanced Cardiac Magnetic Resonance Images (LGE-CMRI) for myocardium segmentation. Because the pixel value distributions differ between the two datasets, different scaling strategies for preprocessing the images of dataset B before applying the models are explored.

Methods: Two deep learning models are trained on dataset A: a modified U-Net with added residual connections (RES_MOD) and a multiresolution U-Net (MUL_MOD). For each patient-specific image set, the maximum pixel value over all slices, pv_m, is determined. To standardize pixel values in dataset B, each pixel value is divided by a scaling factor; the factors explored include s_std=256, s_max=pv_m, s_floor=2^⌊log2(pv_m)⌋, s_round=2^⌊log2(pv_m)+1/2⌋ and s_ceil=2^⌈log2(pv_m)⌉. Both models are applied to each set of scaled images. Training and evaluation are conducted with s_std scaling on dataset A, which comprises MRI scans from 51 patients divided into training (30), test (13), and validation (8) sets. Dataset B includes MRI scans from 58 patients.
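
The scaling factors defined in the Methods can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the study's implementation: the function names `scaling_factors` and `scale_image` are hypothetical, images are represented as nested lists of pixel values, and pv_m is assumed to be positive.

```python
import math


def scaling_factors(pv_m):
    """Return the five candidate divisors for one patient's image set.

    pv_m is the maximum pixel value over all slices of that patient.
    (Hypothetical helper; the names mirror the symbols in the abstract.)
    """
    if pv_m <= 0:
        raise ValueError("pv_m must be positive")
    log2_max = math.log2(pv_m)
    return {
        "s_std": 256.0,                                # fixed divisor
        "s_max": float(pv_m),                          # patient maximum
        "s_floor": 2.0 ** math.floor(log2_max),        # power of two at or below pv_m
        "s_round": 2.0 ** math.floor(log2_max + 0.5),  # nearest power of two on a log2 scale
        "s_ceil": 2.0 ** math.ceil(log2_max),          # power of two at or above pv_m
    }


def scale_image(pixels, s):
    """Divide every pixel value in a 2D image (list of rows) by the factor s."""
    return [[value / s for value in row] for row in pixels]


# Example with an assumed patient maximum of 1000.
factors = scaling_factors(1000)
scaled = scale_image([[0, 500], [750, 1000]], factors["s_max"])
```

Because s_floor ≤ pv_m < 2·s_floor, dividing by s_floor maps the patient maximum into the interval [1, 2), whereas s_max and s_ceil keep all scaled values at or below 1.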

Results: The Jaccard scores (mean(SD)) on the test data for dataset A were 0.73(0.01) for both models. For dataset B, however, MUL_MOD with s_floor scaling achieved the highest score, 0.60(0.14), outperforming all other combinations of scaling method and model. With s_std scaling, MUL_MOD scored 0.58(0.19). The corresponding scores for RES_MOD were 0.56(0.18) and 0.54(0.18) for s_std and s_floor scaling, respectively.
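
For reference, the Jaccard score of two segmentation masks is the size of their intersection divided by the size of their union. The sketch below assumes binary masks stored as nested lists of 0/1 values; the function name `jaccard_score` is hypothetical, and returning 1.0 when both masks are empty is a convention chosen here, not one stated in the study.

```python
def jaccard_score(pred, truth):
    """Intersection over union of two binary masks of identical shape."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    if union == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return intersection / union


# Example: one overlapping pixel out of two labelled pixels in total.
example = jaccard_score([[1, 1], [0, 0]], [[1, 0], [0, 0]])
```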

Conclusion: This study highlights how strongly segmentation performance depends on the dataset when a model trained and tested on one dataset is applied to another with a different pixel value distribution. The findings indicate that scaling could be effectively employed in semi-automated segmentation processes.