Development of machine learning models for gait-based classification of incomplete spinal cord injuries and cauda equina syndrome

Data collection

In this study, we used gait data collected from 2013 to 2021 at Chungnam National University Hospital from patients diagnosed with various neurological conditions. All data were acquired in a clinical motion analysis laboratory using a 3D motion capture system (Vicon MX, T20 model; Vicon Motion Systems, Ltd., Oxford, UK). After anthropometric measurements were taken, the spatial coordinate system of the laboratory was calibrated. Reflective markers and surface electromyography (EMG) pads were attached to anatomical landmarks according to the Plug-in Gait Lower Body protocol. Patients then walked along a predefined path while motion data were recorded. Data acquisition and initial processing were performed using Vicon Nexus software (Vicon, Oxford, UK). Although software versions may have changed during the extended data collection period, the acquisition protocol, marker placement, and calibration procedures remained consistent. All gait assessments followed standardized clinical procedures.

This study was a retrospective analysis of previously collected clinical gait data. It was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board (IRB) of Gachon University Gil Hospital (IRB no. GCIRB2023-439). Data access for research purposes began on December 12, 2023, and continued until the end of the study. All data were anonymized prior to analysis.

The initial dataset included 844 reports containing patient information. Data were excluded if they lacked temporal gait parameters, age or sex information, or a confirmed diagnosis of incomplete tetraplegia, incomplete paraplegia, or cauda equina syndrome. For incomplete tetraplegia and paraplegia, only cases corresponding to ASIA grades C and D, indicating partial motor function preservation, were included. After applying these criteria, the final dataset consisted of 214 cases: 95 with incomplete tetraplegia, 68 with incomplete paraplegia, and 51 with cauda equina syndrome. The study flowchart is provided in Fig. 6.

Fig. 6

Overview of the study process, including data preprocessing, feature selection, and statistical analysis, from gait data collection to classification model evaluation.

Although approximately 75% of the initial records were excluded (47 missing temporal gait parameters, 573 missing neurological diagnosis, 6 missing age, and 4 missing sex), the final dataset maintained a balanced distribution by sex and age and included all three diagnostic categories. Therefore, the potential for selection bias introduced by these exclusions is considered minimal.
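
As a check on these figures, and assuming each excluded record is counted under exactly one exclusion reason (the text does not state this explicitly), the counts reconcile with the final sample size:

$$47 + 573 + 6 + 4 = 630,\qquad 844 - 630 = 214,\qquad \frac{630}{844} \approx 74.6\%.$$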

Temporal parameters

In gait analysis, various temporal parameters were used to evaluate patients’ gait characteristics.

These parameters quantitatively measure different aspects of gait, serving as key indicators for defining normal gait patterns and detecting abnormalities or deviations in patients’ walking behavior.

Temporal gait parameters were automatically extracted using the gait event detection (GED) algorithm built into the Plug-in Gait model. This algorithm detects key events such as heel strike and toe off based on the vertical position and velocity of the heel and toe markers, without the use of force plates. A gait cycle was defined from the heel strike to the subsequent ipsilateral heel strike. The gait cycle duration was computed as the sum of step times from both feet, and parameters such as single support, double support, and foot off were converted into percentages of the total gait cycle time. Only unilateral gait cycles were analyzed to ensure consistency and comparability across all participants. Although gait events were detected separately for the left and right sides, the mean values of bilateral parameters were used for machine learning classification and statistical comparisons, considering the symmetric nature of trauma-related gait impairment in SCI and cauda equina syndrome patients.

The temporal gait parameters evaluated in this study were:

Cadence: number of steps per minute.

Double support: percentage of time when both feet are on the ground during a gait cycle.

Foot off: point in the gait cycle, expressed as a percentage, at which the foot leaves the ground.

Limp index: a measure of gait asymmetry reflecting mobility impairment.

Opposite foot contact: timing of opposite foot contact when one foot strikes the ground.

Opposite foot off: percentage of time from the moment one foot leaves the ground to the moment the opposite foot leaves the ground.

Single-support time: percentage of time during a gait cycle in which one foot is in contact with the ground.

Step length: distance between the initial contact points of opposite feet.

Step time: time taken to complete one step.

Step width: horizontal distance between the feet.

Stride length: distance for one foot to return to the same position.

Stride time: time required to complete one stride.

Walking speed: distance walked per second.

Institutional normative reference values for each gait parameter were obtained from clinical gait analysis reports at Chungnam National University Hospital. These values were obtained under the same measurement conditions (i.e., Vicon MX system and Plug-in Gait protocol) as used in the present study and represent standard gait performance in healthy individuals. They were used as a consistent reference baseline to support interpretation of deviations in gait characteristics among patients with neurological impairments.

Data preprocessing

The 214 cases were classified into three disease groups (incomplete tetraplegia, incomplete paraplegia, and cauda equina syndrome), and their gait patterns were analyzed. The Isolation Forest algorithm was used to remove the top 1% of outliers. To address class imbalance, class weights were set inversely proportional to the sample size of each group; these weights were calculated automatically using the class_weight='balanced' option of scikit-learn. To resolve unit inconsistencies, double support and single support durations, originally measured in seconds, were converted into percentages of the gait cycle time. The gait cycle time was first calculated by summing the step times of the left and right feet, and each duration was then expressed as a percentage of that cycle time.
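
The outlier removal and class weighting described above can be sketched as follows. This is a minimal illustration on random synthetic data, not the study's code: the feature matrix, the random seeds, and the mapping of "top 1% of outliers" to `contamination=0.01` are assumptions. Only the group sizes (95, 68, 51) come from the text.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.utils.class_weight import compute_class_weight

rng = np.random.default_rng(0)

# Synthetic stand-in for the gait feature matrix (214 cases x 13 parameters);
# the values are random and carry no clinical meaning.
X = rng.normal(size=(214, 13))
# Labels with the reported group sizes:
# 0 = incomplete tetraplegia, 1 = incomplete paraplegia, 2 = cauda equina syndrome.
y = np.repeat([0, 1, 2], [95, 68, 51])

# Flag roughly the most anomalous 1% of cases (assumed contamination=0.01).
iso = IsolationForest(contamination=0.01, random_state=0)
keep = iso.fit_predict(X) == 1  # +1 = inlier, -1 = outlier
X_clean, y_clean = X[keep], y[keep]

# 'balanced' weights: n_samples / (n_classes * number of samples in each class).
classes = np.unique(y_clean)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_clean)
manual = len(y_clean) / (len(classes) * np.bincount(y_clean))

print("cases removed:", int((~keep).sum()))
print("class weights:", dict(zip(classes.tolist(), weights.round(3).tolist())))
```

Passing `class_weight="balanced"` directly to a classifier applies the same formula to whatever training subset that classifier is fitted on, so the weights can differ slightly between cross-validation folds.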

The calculation processes are as follows:

  1. Gait cycle time calculation:

    $$\text{Gait cycle time (cycle time)} = \text{Lt}_{\text{step time}} + \text{Rt}_{\text{step time}},$$

    where Lt and Rt indicate left and right, respectively.

  2. Conversion of time values to percentage units:

    $$\text{Percentage (\%)} = \frac{\text{Time (s)}}{\text{Gait cycle time (s)}} \times 100.$$
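
The two calculations above can be written as a short Python sketch. The function names and the numeric values are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the study data.

```python
def gait_cycle_time(left_step_time_s, right_step_time_s):
    """Gait cycle time (s) as the sum of the left and right step times."""
    return left_step_time_s + right_step_time_s


def to_percent_of_cycle(duration_s, cycle_time_s):
    """Express a duration in seconds as a percentage of the gait cycle time."""
    if cycle_time_s <= 0:
        raise ValueError("gait cycle time must be positive")
    return duration_s / cycle_time_s * 100.0


# Hypothetical values for one patient (not taken from the study data).
cycle = gait_cycle_time(0.58, 0.62)
double_support_pct = to_percent_of_cycle(0.30, cycle)
single_support_pct = to_percent_of_cycle(0.45, cycle)
print(f"cycle = {cycle:.2f} s, double support = {double_support_pct:.1f}%, "
      f"single support = {single_support_pct:.1f}%")
```

With these inputs the cycle time is 1.20 s, so a 0.30 s double support phase corresponds to 25% of the gait cycle and a 0.45 s single support phase to 37.5%.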

Feature selection and model training

We used SelectFromModel (SFM) from scikit-learn to select the main features of the gait data. SFM uses a RandomForestClassifier to calculate the importance of each feature and automatically removes features whose importance falls below a predefined threshold. A random forest is well suited to feature selection because it evaluates the importance of each variable reliably while capturing nonlinear relationships and interactions between variables. Although SFM was initially selected for its interpretability and its performance in preliminary testing, we additionally evaluated three alternative feature selection methods (recursive feature elimination [RFE], LASSO, and Ridge regression) to validate the robustness of our approach. Each method was applied independently using the same preprocessing pipeline.

The SFM threshold was set automatically to the mean feature importance, so that features with below-average importance were excluded. Based on the selected features, we trained three classifiers: support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGB). To handle the multiclass problem, we applied the One-vs-Rest (OvR) strategy, which converts it into multiple binary classification problems. OvR trains one binary classifier per class, treating all remaining classes as a single group, which allows inherently binary algorithms such as SVMs to be applied to multiclass problems. Hyperparameters were optimized with GridSearchCV, which uses cross-validation to identify the combination that maximizes the generalization performance of the model.
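
A minimal sketch of this workflow for the SVM classifier is shown below, assuming scikit-learn and synthetic three-class data in place of the gait parameters. The search grid, the macro-F1 scoring, the random seeds, and the choice to place feature selection inside the pipeline are all assumptions made for illustration; the text does not specify them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic three-class data standing in for the 13 temporal gait parameters.
X, y = make_classification(
    n_samples=214, n_features=13, n_informative=6, n_redundant=2,
    n_classes=3, random_state=0,
)

pipeline = Pipeline([
    # Keep features whose random-forest importance is at least the mean importance.
    ("select", SelectFromModel(
        RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=0),
        threshold="mean",
    )),
    # One binary SVM per class (One-vs-Rest).
    ("clf", OneVsRestClassifier(SVC(class_weight="balanced"))),
])

# Hypothetical search grid; the study's actual grid is not given in the text.
param_grid = {
    "clf__estimator__C": [0.1, 1, 10],
    "clf__estimator__kernel": ["linear", "rbf"],
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="f1_macro")
search.fit(X, y)

selected = search.best_estimator_.named_steps["select"].get_support()
print("features kept:", int(selected.sum()), "of", X.shape[1])
print("best parameters:", search.best_params_)
print("mean cross-validated macro F1:", round(search.best_score_, 3))
```

Placing the selector inside the pipeline means it is refitted on the training portion of every fold, which keeps validation data out of the feature selection step. Whether the study did this or selected features once beforehand is not stated.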

Model evaluation metrics and statistical analysis methods

Model performance was evaluated using fivefold StratifiedKFold cross-validation, in which 80% of the data were used for training and 20% for validation in each fold. StratifiedKFold divides the data so that each fold preserves the class proportions of the original dataset, which keeps the folds comparable when classes are imbalanced. The multiclass classification performance of the model was evaluated based on the true positives, true negatives, false positives, and false negatives obtained by comparing the actual and predicted values. We used accuracy, precision, recall, and F1-score as evaluation metrics. We also analyzed the receiver operating characteristic (ROC) curve and calculated the area under the ROC curve (AUC). The AUC ranges from 0 to 1, with values closer to 1 indicating better model performance. Finally, we analyzed the classification performance for each class (incomplete tetraplegia, incomplete paraplegia, and cauda equina syndrome) using the confusion matrix, which shows the relationship between the model predictions and the actual classes, including the number and type of misclassified samples. This was used to assess the tendency of the model to overpredict or misclassify certain conditions.

Differences in gait variables between groups divided by sex and age (<60 vs. ≥60 years), as well as differences in model performance across classifiers, were statistically analyzed. The Shapiro–Wilk test was performed to assess the normality of each variable. Variables that violated the normality assumption were analyzed using the Mann–Whitney U test, while normally distributed variables were compared using independent samples t-tests. To control for Type I error inflation due to multiple comparisons, the false discovery rate (FDR) correction was applied to the analysis of gait parameters. Statistical significance was determined using FDR-adjusted p-values, with p < 0.05 considered significant. All statistical analyses were performed using IBM SPSS Statistics version 20 (IBM Corp., Armonk, NY, USA).
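
The test selection and multiple-comparison steps can be illustrated in Python, although the study itself used SPSS. The sketch below rests on assumptions: the data are synthetic, the rule "use the t-test only if both groups pass Shapiro–Wilk" is one plausible reading of the procedure, and the Benjamini-Hochberg method is assumed because the text does not name the FDR procedure. The function names are hypothetical.

```python
import numpy as np
from scipy import stats


def compare_groups(a, b, alpha=0.05):
    """Choose the two-group test from Shapiro-Wilk checks; return (test name, p-value)."""
    normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
    if normal:
        return "t-test", stats.ttest_ind(a, b).pvalue
    return "Mann-Whitney U", stats.mannwhitneyu(a, b, alternative="two-sided").pvalue


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards, then cap at 1.
    adjusted_sorted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


rng = np.random.default_rng(0)
# Hypothetical data: three gait parameters compared between two subgroups.
comparisons = {
    "cadence": (rng.normal(95, 10, 60), rng.normal(88, 10, 55)),
    "walking speed": (rng.normal(0.8, 0.2, 60), rng.normal(0.8, 0.2, 55)),
    "double support": (rng.exponential(10, 60), rng.exponential(14, 55)),
}
results = {name: compare_groups(a, b) for name, (a, b) in comparisons.items()}
raw = np.array([p for _, p in results.values()])
adjusted = benjamini_hochberg(raw)
for (name, (test, p)), p_adj in zip(results.items(), adjusted):
    print(f"{name}: {test}, p = {p:.4f}, FDR-adjusted p = {p_adj:.4f}")
```

A comparison would then be reported as significant when its FDR-adjusted p-value is below 0.05.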
