A stacking ensemble machine learning model for predicting postoperative axial pain intensity in patients with degenerative cervical myelopathy

This study identified three key findings: (1) classifier performance improved significantly when an ensemble learning approach integrated three commonly used ML models (EmbeddingLR-RF, EmbeddingRF-MLP, and RFE-SVM); (2) stacking these models improved performance over the individual machine learning classifiers, with the SVM ensemble classifier achieving the best results; and (3) preoperative axial pain intensity, JOARR, preoperative C2-7 Cobb angle, HADS-D, and age were the five most important clinical predictors of postoperative axial pain following posterior cervical decompression surgery in DCM patients.

Determining the optimal NRS cut-off between mild and moderate pain is essential for identifying patients who require pain management (moderate-to-severe pain) versus those who do not (mild pain)21. However, the exact threshold remains debated. Current literature suggests varying cut-off points for moderate-to-severe pain, with NRS thresholds ranging from 3 to 6, depending on diagnostic criteria and analytical methods21. Recent studies show a tendency to set the threshold at 4 for identifying postoperative patients needing pain treatment22. Accordingly, this study defines postoperative axial pain as an NRS score of 4 or higher.
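As a minimal sketch of the outcome definition above, the NRS score can be dichotomized at the stated cut-off of 4. The function name and the example scores are hypothetical and are not drawn from the study data.

```python
NRS_CUTOFF = 4  # cut-off stated in the text: NRS >= 4 defines postoperative axial pain


def binarize_nrs(nrs_score, cutoff=NRS_CUTOFF):
    """Return 1 if the NRS score meets the cut-off (pain present), else 0."""
    if not 0 <= nrs_score <= 10:
        raise ValueError("NRS scores range from 0 to 10")
    return int(nrs_score >= cutoff)


# Hypothetical example scores, for illustration only.
example_scores = [0, 3, 4, 7, 10]
labels = [binarize_nrs(score) for score in example_scores]
print(labels)  # [0, 0, 1, 1, 1]
```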

In this study, we converted the continuous outcome (postoperative axial pain intensity) into a binary variable for the machine learning classifiers, for three primary reasons23. First, with a limited sample size, binary variables simplify model construction by reducing data complexity. Second, this transformation minimizes the impact of outliers, enhancing predictive reliability. Finally, binary variables are less prone to noise and measurement errors, which improves the model’s robustness and stability, qualities critical for clinical applications.

Early identification of patients with PAP enables clinicians to develop new perioperative management strategies and early interventions to reduce postoperative pain incidence. Over recent decades, efforts have focused on creating clinical prediction models to forecast PAP severity in DCM patients undergoing posterior cervical decompression surgery24,25,26. An accurate prediction model would help spine surgeons identify individuals at higher risk of severe pain postoperatively, facilitating the creation of personalized treatment plans based on each patient’s risk profile. However, current predictive models for PAP following this procedure remain limited. Prior studies have offered some insights. Kimura et al. used logistic regression to identify predictors of PAP, including anterolisthesis, smoking, moderate-to-severe baseline neck pain, and lower SF-36 Mental Component Summary scores12. Ionse et al. found that preoperative and postoperative cervical lordosis in extension independently predicted postoperative neck pain11. Additionally, Cao et al. used logistic regression on 144 patients to predict PAP after cervical decompression surgery, achieving an AUC of 0.78, with a sensitivity of 0.77 and specificity of 0.65, and identified preoperative C2-C7 Cobb angle as an independent PAP risk factor27. This study systematically examined and compared various widely used machine learning algorithms to identify the most effective predictive models for PAP in DCM patients. Future research should build on this work, focusing on model refinement and optimization to enhance predictive accuracy and clinical relevance.
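The discrimination metrics quoted for prior models (AUC, sensitivity, and specificity) can be computed from binary labels and predicted scores as sketched below. The labels, scores, and the 0.5 decision threshold are hypothetical values chosen for illustration; they are not results from any cited study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc_score(y_true, scores):
    """AUC as the probability that a positive case outranks a negative case."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = PAP) and model scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.7, 0.6]
y_pred = [int(s >= 0.5) for s in scores]  # assumed decision threshold of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = auc_score(y_true, scores)
print(auc, sens, spec)  # 0.875 0.75 0.75
```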

In this study, the three top-performing classifiers, selected based on AUC, were integrated into an ensemble model utilizing an SVM classifier. Ensemble learning provides distinct advantages over individual machine learning models by harnessing the collective strengths of multiple algorithms. By combining diverse models, ensemble methods reduce the risk of overfitting and improve generalization, as different algorithms capture unique patterns within the data. This diversity allows the ensemble to compensate for the weaknesses of any single model, resulting in more robust predictions. Additionally, ensemble techniques enhance model stability by aggregating multiple outputs, thereby minimizing the influence of performance variability. Another critical benefit is their ability to explore the solution space more comprehensively, helping prevent models from becoming trapped in local minima or overly influenced by specific data patterns. The synergistic effect of ensemble learning ultimately yields more accurate and reliable predictive outcomes, especially valuable in complex clinical scenarios.
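The stacking design described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the study's implementation: the data are synthetic, the hyperparameters are arbitrary, and the composition of each base pipeline (a feature selection step followed by a classifier) is our reading of the names EmbeddingLR-RF, EmbeddingRF-MLP, and RFE-SVM.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import RFE, SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a clinical feature table (NOT the study data).
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Assumed reading of "EmbeddingLR-RF": LR-based embedded selection, then RF.
lr_rf = make_pipeline(
    StandardScaler(),
    SelectFromModel(LogisticRegression(max_iter=1000),
                    max_features=5, threshold=-np.inf),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
# Assumed reading of "EmbeddingRF-MLP": RF-based embedded selection, then MLP.
rf_mlp = make_pipeline(
    StandardScaler(),
    SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0),
                    max_features=5, threshold=-np.inf),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0),
)
# Assumed reading of "RFE-SVM": recursive feature elimination, then SVM.
rfe_svm = make_pipeline(
    StandardScaler(),
    RFE(SVC(kernel="linear"), n_features_to_select=5),
    SVC(),
)

# Base-model outputs from 5-fold cross-validation feed an SVM meta-classifier.
stack = StackingClassifier(
    estimators=[("lr_rf", lr_rf), ("rf_mlp", rf_mlp), ("rfe_svm", rfe_svm)],
    final_estimator=SVC(),
    cv=5,
)
stack.fit(X_train, y_train)

# Evaluate on the held-out split, mirroring the idea of independent testing.
auc = roc_auc_score(y_test, stack.decision_function(X_test))
print(f"Held-out AUC of the stacked model: {auc:.2f}")
```

Training the meta-classifier on cross-validated base-model outputs, rather than on in-sample predictions, is what keeps the second stage from simply memorizing the base models' training fit.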

To enhance methodological rigor, this study incorporated insights from prior research and introduced independent testing, which was absent in similar studies. Independent testing enabled objective assessment of model performance, reduction of overfitting, and increased reliability of findings. For example, Jiang et al. developed a predictive model for forecasting JOARR in DCM patients and improved the AUC from 0.78 to 0.81 through an ensemble learning approach. Similarly, an AUC increase from 0.81 to 0.92 was observed in this study after stacking the top three predictive models20. This improvement underscores the potential of ensemble learning to enhance classification accuracy and affirms the model’s practical applicability in clinical settings.

To further assess the role of each feature in the final ensemble model, the relative importance of predictors was evaluated. Machine learning-driven feature selection offers significant advantages, particularly in reducing the subjective bias commonly associated with manual selection methods28,29,30. By algorithmically determining the significance of predictors, this approach ensures an objective and data-driven process. It is especially effective in managing large datasets with numerous variables, facilitating the identification of the most relevant features. This reduction in dimensionality minimizes redundancy, irrelevant information, and the risk of overfitting. Streamlined models resulting from this process not only operate more efficiently—requiring fewer data and computational resources—but also maintain high levels of generalizability. Moreover, isolating the most impactful factors provides valuable scientific insights by revealing key causal relationships, which can guide future research by emphasizing high-value variables that drive outcomes. In summary, automated and unbiased feature selection enhances model performance and efficiency while deepening the understanding of complex phenomena, making it essential for advancements across various scientific fields.
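One common way to rank predictors algorithmically is permutation importance, sketched below with scikit-learn. This section does not state which importance measure the study used, so the technique, the random forest model, and the synthetic data with generic feature names are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data (NOT the study data); with shuffle=False the first five
# columns are the informative ones and the remaining five are noise.
X, y = make_classification(n_samples=400, n_features=10, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # generic labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Importance = mean drop in held-out AUC when one feature's values are shuffled.
result = permutation_importance(model, X_test, y_test, scoring="roc_auc",
                                n_repeats=20, random_state=0)

ranking = sorted(zip(feature_names, result.importances_mean),
                 key=lambda item: item[1], reverse=True)
for name, score in ranking[:5]:
    print(f"{name}: {score:.3f}")
```

Because the scores are computed on held-out data, a feature ranks highly only if the fitted model relies on it for out-of-sample discrimination; the ranking reflects predictive contribution and does not by itself establish causation.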

Our analysis identified preoperative axial pain intensity, JOARR, preoperative cervical C2–7 Cobb angle, HADS-D score, and age as the most predictive features, aligning with established predictors of postoperative axial pain (PAP) in patients with degenerative cervical myelopathy (DCM)7,14,26. Notably, preoperative axial pain emerged as the most significant risk factor for predicting postoperative pain. One study revealed that approximately 40% of patients experienced postoperative axial pain, predominantly those with a history of preoperative pain31. Research by Su et al. linked preoperative pain hypersensitivity to the development of PAP in DCM patients, indicating that individuals with preoperative axial pain exhibited higher levels of pain hypersensitivity, leading to increased postoperative pain intensity8. Additionally, our findings highlighted the JOARR as a significant predictor of prognosis, serving as an important indicator of neurological recovery and functional outcomes after surgery. Yoshida et al. found that patients with poor JOARR often experience more severe postoperative axial symptoms, suggesting that degenerative changes in the dorsal horn of the spinal cord may contribute to chronic axial pain in those with impaired neurological function7. Toyama et al. also indicated that certain cervical myelopathic pain types may stem from abnormalities in second-order neurons located in the dorsal horn32. The preoperative cervical C2–7 Cobb angle (CCA) is another risk factor for predicting PAP, consistent with previous studies that have demonstrated a negative correlation between preoperative CCA and postoperative axial symptoms14. Biomechanically, a cervical spine with reduced lordosis or kyphotic alignment experiences increased flexural stress, contributing to postoperative axial pain33. Chavanne et al. found that a cervical lordosis of less than 7.5° resulted in elevated intramedullary pressure within the spinal cord, hypothesizing a higher likelihood of developing postoperative axial pain34. Notably, our study identified the HADS-D score as a key predictive factor for PAP intensity, indicating that depressive states in DCM patients are associated with postoperative pain severity. Previous research highlights a bidirectional relationship between chronic pain and comorbid depression, with Kroenke et al. demonstrating that pain predicts depression severity while depression predicts pain intensity35. In this context, chronic neck pain in DCM patients may lead to psychological comorbidities such as depression, exacerbating the severity of PAP after posterior cervical decompression surgery. The role of age as a risk factor for PAP in DCM patients remains controversial7,26,36. Kato et al. found that older age (greater than 63 years) significantly reduced the risk of postoperative axial pain, while another study reported that patients over 70 years of age experienced significantly higher levels of axial pain26. Our findings suggest that age is one of the most important predictive factors for PAP intensity, as age-related spinal cord changes and comorbidities may hinder elderly patients from achieving the same level of functional improvement as younger patients, potentially leading to PAP7. Additionally, degenerative changes in the dorsal horn of the spinal cord may be associated with postoperative axial pain in patients with cervical spondylosis. In conclusion, our analysis not only confirmed established predictors but also identified new determinants, including the HADS-D score, thereby improving the accuracy of PAP predictions in patients with DCM.

It is worth noting that the choice of surgical techniques in this study may also influence the occurrence of PAP. Simple laminectomy achieves decompression by removing the lamina but may cause biomechanical instability by disrupting posterior muscles and ligaments, leading to atrophy, dysfunction, and persistent neck pain37. Laminoplasty, which preserves part of the posterior structures, reduces cervical instability but may still result in muscle atrophy and pain due to muscle detachment or damage to facet joints and ligaments31,36,38. Laminectomy with fusion combines decompression and stabilization using internal fixation, minimizing instability-related pain but restricting cervical motion, which can cause compensatory stress and new sources of discomfort37. Additionally, recent advances, such as the application of endoscopic techniques and minimally invasive tubular approaches in DCM patients, have shown promising outcomes in reducing postoperative pain and improving recovery. These approaches minimize tissue disruption, preserve posterior muscle and ligament integrity, and shorten recovery times, offering potential advantages over traditional surgical methods39,40.
