Clinical and dental predictors of preterm birth using machine learning methods: the MOHEPI study
In this study, we found that dental factors such as the MGI and DMFT index could function as major predictors in a PTB prediction model constructed using ML methods. What differentiates our study from previous ones is that we added dental factors to the well-known clinical risk factors, including various clinical backgrounds and obstetric histories. In previous prediction models using either ML or multivariate regression analysis, model performance measured by AUC ranged from 0.61 to 0.71 [23–28]. The types of birth used as the outcome variables were SPTBs and PTBs. In this study, among the five ML methods tested (decision tree, naïve Bayes, RF, support vector machine, and artificial neural network), RF showed the highest performance, with an AUC of 0.73 in the PTB model and 0.86 in the SPTB model. RF creates many training sets, trains many decision trees, and makes a prediction by majority vote (bagging). The RF in this study included 1000 decision trees. A majority vote from 1000 doctors would be more robust than a vote from 1 doctor. Similarly, a majority vote from 1000 decision trees would be more robust than the prediction of a single decision tree.
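The intuition behind the majority-vote analogy can be illustrated with a small calculation. The sketch below is not the study's RF implementation; it assumes, purely for illustration, that every voter (doctor or tree) is correct independently with the same probability, which real decision trees in a forest only approximate because they are correlated. The function name `majority_vote_accuracy` and the 0.6 individual accuracy are hypothetical.

```python
from math import exp, lgamma, log


def majority_vote_accuracy(n_voters: int, p_correct: float) -> float:
    """Probability that a strict majority of n independent voters is correct.

    Illustrative assumptions: each voter is right with the same probability
    p_correct (0 < p_correct < 1), votes are independent, and a tie counts
    as a wrong decision.
    """
    needed = n_voters // 2 + 1  # smallest number of correct votes that wins
    total = 0.0
    for k in range(needed, n_voters + 1):
        # Binomial probability of exactly k correct votes, computed in log
        # space so that large n_voters does not overflow.
        log_pmf = (
            lgamma(n_voters + 1) - lgamma(k + 1) - lgamma(n_voters - k + 1)
            + k * log(p_correct)
            + (n_voters - k) * log(1.0 - p_correct)
        )
        total += exp(log_pmf)
    return total


if __name__ == "__main__":
    # Hypothetical individual accuracy of 0.6 for every voter.
    for n in (1, 11, 101, 1000):
        print(n, round(majority_vote_accuracy(n, 0.6), 4))
```

Under these idealized assumptions the accuracy of the vote rises toward 1 as the number of voters grows. In practice the gain from bagging is smaller, because trees trained on overlapping samples tend to make correlated errors.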
This study confirmed the effectiveness of introducing ML for the development of predictive models for PTB. This can help to establish guidelines for PTB prediction models and serve as critical evidence for early screening and personalized preventive interventions based on these risk factors. In particular, identifying maternal dental health, such as the MGI, as an important predictor of PTB underscores the importance of making dental examinations routine for pregnant women. Furthermore, integrating medical and dental evaluations into prenatal care protocols can facilitate early interventions, which may improve outcomes for both mothers and newborns.
This model is significant because it is the first attempt to adopt dental factors as independent variables for PTB prediction. Surprisingly, the AUC values for prediction performance were comparable or even superior to those of previous publications. The rankings of RF variable importance verified the significance of the dental factors in this model (Table 3). MGI ranked second in the PTB model and sixth in the SPTB model. MGI thus outranked well-known medical risk factors for PTB, such as maternal age (5th), prior PTB (14th), preeclampsia (3rd), chronic hypertension (15th), and GDM (10th) [8]. MGI indicates the level of gingival inflammation during the examination period and represents susceptibility to chronic inflammation [29]. In addition, our patient population had two specific characteristics in terms of periodontal status: (1) mild/moderate periodontitis (stages 1 and 2) in general; and (2) no observable differences between the two groups with regard to periodontitis severity and amount of plaque biofilm. Higher MGI scores meaningfully reflect the gingivitis susceptibility of the host, which correlates strongly with future periodontitis, especially in young adults [30]. In this context, a high MGI score, especially a mean MGI ≥ 2 (Fig. 2a), at prenatal screening or regular check-ups during pregnancy should indicate either high maternal susceptibility to infections that may occur during pregnancy or a tendency toward progressive periodontitis, a disease whose causative pathogens are known to influence the activation of preterm labor [15].
Numerous publications, including systematic reviews and meta-analyses, have addressed the association between periodontitis and PTB or adverse pregnancy outcomes [14, 16–18, 31–38]. Although the overall results indicate that maternal periodontitis is significantly associated with the PTB rate, the results are often inconsistent, and causal relationships are still far from being determined. Many of the studies included in the meta-analyses are case–control studies; therefore, biases and limitations are inherent [14]. Importantly, the definitions of periodontitis vary across studies, and the parameters considered in periodontal diagnosis are often confined to probing pocket depth or CAL.
In this prospective study, we collected data on multiple dental parameters, including periodontitis stage, MGI, PI, and DMFT index. Periodontitis staging represents the severity of the disease and is based on a newly established periodontitis classification [39]. The periodontal status of our participants, evaluated using the staging system, did not differ between the two groups. Also, there were very few severe periodontitis cases corresponding to stages 3 or 4, and the prevalence did not differ between the groups. This result is somewhat different from that of a previous publication from our group, in which the incidence of periodontitis was generally higher in preterm mothers [40]. It is possible that our study population did not fully reflect real-world situations because the number of participants was relatively small.
Considering the high AUC (0.86) of our SPTB model and the associated RF variable importance rankings, preterm labor with or without rupture of membranes (SPTB) should be considered the main cause of PTBs. It is regarded as a syndrome with multiple causes, including infections, vascular disorders, decidual senescence, breakdown of maternal–fetal tolerance, a decline in progesterone action, and cervical disease [41]. A delicate balance in maternal immunology is a prerequisite for healthy pregnancy and labor processes. The shift from a quiescent to a proinflammatory state activates labor, and inflammatory cytokines such as interleukin-1, interleukin-6, and tumor necrosis factor-α are involved in this process [42]. In this context, when a pregnant mother becomes vulnerable to infection because of a multitude of factors such as BMI, stress, behavioral factors, genetics, and nutritional deficiency, the labor process can be brought forward abnormally [43]. According to our results, a high MGI score can serve as a screening tool for the mother's susceptibility to infection. It may be a simple, noninvasive, and cost-effective test that can be included in prenatal screening. Moreover, because mothers with high MGI scores are likely to have more pathogenic bacteria in their dental plaque, periodontal therapy should be carried out in the prenatal or mid-term period to prevent hematogenous dissemination of oral pathogens.
Further, the dependence plot showed a negative correlation within the range of DMFT index ≤ 10 (Fig. 2b), implying that dental caries-susceptible patients had a low probability of SPTB. Dental caries and periodontitis are two contrasting infections of the oral cavity, because the bacterial species associated primarily with each disease entity have nearly opposite characteristics [44]. In this context, periodontitis-susceptible patients have a low tendency to be caries-prone and therefore present a low DMFT index and a high SHAP value.
The strength of this study lies in the inclusion of dental risk factors such as MGI and DMFT index in predicting PTB. Several studies have established PTB prediction models based on LR or ML analysis [45–47]. However, all of these studies included only clinical data based on electronic health records. Unlike previous models, this study incorporates dental factors in addition to maternal clinical data, based on the correlation between periodontal disease and PTB. This is a novel approach and introduces an important new insight to existing research on PTB prediction. Another strength of this study is the use of cutting-edge ML approaches, such as RF variable importance and SHAP summary/dependence plots, to identify the major predictors and explain the directions of their associations. In conventional statistical approaches such as linear or LR analysis, an unrealistic assumption underlies the methodology, namely ceteris paribus, “all the other variables staying constant.” By contrast, SHAP considers all realistic scenarios. Assume that there are three predictors of SPTB: PROM, pre-pregnancy BMI, and MGI. Here, the SHAP value of MGI for a participant is the weighted average of the contribution of MGI to the prediction across the following four scenarios: (1) PROM excluded, pre-pregnancy BMI excluded; (2) PROM included, pre-pregnancy BMI excluded; (3) PROM excluded, pre-pregnancy BMI included; and (4) PROM included, pre-pregnancy BMI included. In other words, the SHAP value combines the results of all possible subgroup analyses that are ignored in conventional methods.
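The three-predictor example can be made concrete with a short calculation of an exact Shapley value, the quantity that SHAP estimates. The sketch below is illustrative only: the function `toy_risk` and all of its numbers are hypothetical and are not taken from the study's model or data. It also makes explicit that the four scenarios are combined with Shapley weights (1/3, 1/6, 1/6, and 1/3 in the three-predictor case) rather than a simple unweighted mean.

```python
from itertools import combinations
from math import factorial


def shapley_value(target, players, value):
    """Exact Shapley value of `target` for a value function over coalitions.

    `value` maps a frozenset of included predictors to a number, here a
    hypothetical predicted risk.
    """
    others = [p for p in players if p != target]
    n = len(players)
    total = 0.0
    for size in range(len(others) + 1):
        # Shapley weight for coalitions of this size that exclude the target.
        weight = factorial(size) * factorial(n - size - 1) / factorial(n)
        for subset in combinations(others, size):
            coalition = frozenset(subset)
            # Marginal contribution of adding the target to this coalition.
            total += weight * (value(coalition | {target}) - value(coalition))
    return total


def toy_risk(included):
    """Hypothetical SPTB risk given which predictors are included."""
    risk = 0.10  # made-up baseline risk with no predictors included
    if "PROM" in included:
        risk += 0.30
    if "BMI" in included:
        risk += 0.05
    if "MGI" in included:
        risk += 0.10
    if "PROM" in included and "MGI" in included:
        risk += 0.06  # made-up interaction between PROM and MGI
    return risk


if __name__ == "__main__":
    predictors = ["PROM", "BMI", "MGI"]
    for name in predictors:
        print(name, round(shapley_value(name, predictors, toy_risk), 4))
```

For MGI, the loop visits exactly the four scenarios listed above (neither, PROM only, BMI only, and both of the other predictors included). A useful check on any such calculation is that the Shapley values of all predictors sum to the difference between the prediction with every predictor included and the baseline prediction.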
There were several limitations in this study. Firstly, the sample size of the dataset was small. Normally, ML techniques can structure data without explicit programming, and prediction performance should increase and become more reliable when a large dataset is used. However, in this study, we could still obtain solid performance using the RF method, that is, an AUC of 0.73 for PTB and 0.86 for SPTB, from a relatively small dataset. Unlike previous studies where population-based retrospective data were used in ML-based modeling, our dataset was prospectively collected and contained multifaceted parameters, including dental, clinical, and obstetric factors. It appears that the high-quality data enabled the construction of a robust model. However, it will be necessary to confirm our results using a larger dataset. Also, the small sample size might have led to a study population with more favorable periodontal conditions than those seen in the real world. Secondly, parental social factors, including SES indicators such as educational level, occupation, and income, as well as level of physical activity, were not included in this analysis. SES and physical activity may serve as risk factors for PTB [48, 49], but research findings on this matter have not yielded consistent conclusions [50, 51]. Additional research may be necessary to evaluate the impact of these variables. Thirdly, dental records were collected after delivery. Although dental examinations were conducted within several days after delivery, hormonal changes that accompany delivery can affect the periodontal status of the mothers. Ideally, serial records of dental examinations during antenatal visits would provide more valuable information for clarifying the significance of periodontal status in PTB prediction. This should be taken into consideration in further studies.
Fourthly, there were potential biases in this study, including selection bias arising from the focus on women who delivered via cesarean section, which may not represent the broader population of expectant mothers. Additionally, reliance on self-reported medical histories could introduce reporting bias, affecting the accuracy of the predictors identified. Consequently, while the findings provide valuable insights, caution should be exercised when generalizing these results to all pregnant women, and further research is needed to validate the predictive model across diverse populations and settings.