Background: Cardiotocography (CTG) can identify babies at risk of fetal hypoxia by detecting changes in fetal heart rate and uterine contractions. However, variability in how CTGs are interpreted affects the timing of interventions. Machine learning (ML) has been applied to this problem and has shown the potential to classify CTGs accurately. Previous research mainly used babies' blood pH as the benchmark for fetal hypoxia, with varying pH thresholds, an approach that is not directly usable for clinical decision-making. Aim: We propose using the 5-minute Apgar score as the benchmark for hypoxia in our ML algorithms. Low Apgar scores have shown a high correlation with hypoxia diagnoses and abnormal CTGs, and the Apgar score is a routine, standardised assessment of a baby's condition after birth, covering skin appearance, heart rate, reflexes, muscle tone and breathing. Method: We used the CTU-UHB database, which contains 552 CTGs. First, we extracted features and signal characteristics from the CTGs and used them to build the ML models. We applied the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance. Next, the dataset was split into training and test sets, and grid search with cross-validation was used to tune hyperparameters. We trained and compared five algorithms: decision tree (DT), random forest (RF), support vector machine (SVM), k-nearest neighbour (kNN) and artificial neural network (ANN). Performance was evaluated using precision, recall, F1 score and area under the receiver operating characteristic curve (AUROC). Results: The ANN with four deep layers achieved the highest recall (100%), while the RF classifier achieved the highest F1 score (97%), AUROC (99.73%) and precision (97%) (Table 1). Of the 21 features, the longest deceleration length was the most important. Conclusion: Apgar scores can be used as a surrogate hypoxia marker for classifying CTGs, making model results more informative for clinical decision-making. Training on more CTGs could further improve model performance.
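
Illustrative sketch: The two generic building blocks named in the Method, SMOTE-style oversampling of the minority class and the precision, recall and F1 metrics, can be sketched in plain Python as follows. This is not the study's code. The feature rows, class counts, labels and the parameters `k` and `seed` are made-up assumptions for illustration only, and the study's own feature extraction, model training and grid search are not reproduced here.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a randomly chosen
    minority-class row and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical minority-class rows: [baseline heart rate, longest deceleration length].
# These values are invented for the example and do not come from CTU-UHB.
minority_rows = [[150.0, 90.0], [155.0, 120.0], [160.0, 100.0], [148.0, 110.0]]
majority_count = 10  # assumed size of the majority class

# Oversample until both classes have the same number of rows.
synthetic_rows = smote(minority_rows, n_new=majority_count - len(minority_rows))
balanced_minority = minority_rows + synthetic_rows

# Made-up labels and predictions, only to exercise the metric function.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(len(balanced_minority), precision, recall, f1)
```

Because each synthetic row lies on a line segment between two real minority rows, the oversampled class stays within the range of the observed feature values. In practice, a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would normally be preferred over a hand-written version.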