Introduction

Cervical spondylotic myelopathy (CSM) is the most common cause of spinal cord dysfunction worldwide. The disease is caused by the degeneration of various components of the vertebra, including the vertebral body, the intervertebral disk, the supporting ligaments and the facet joints [1]. Static factors, including the protrusion of osteophytic spurs (spondylosis), disk desiccation, ossification of the posterior longitudinal ligament (OPLL) and hypertrophy of the ligamentum flavum, may lead to the narrowing of the spinal canal and to cord compression [2]. Longstanding compression of the spinal cord can result in irreversible damage including demyelination and necrosis of the gray matter. CSM generally has an insidious onset and progresses in a stepwise fashion [3, 4]. Upon diagnosis of symptomatic CSM, a physician often recommends surgical treatment to decompress the spinal cord [5]. Surgery has proven to be an effective intervention for the full range of myelopathy severity [6].

Given that CSM is a prevalent cause of spinal cord injury, and since surgery is often an appropriate intervention, it would be useful to identify the most important predictors of surgical outcome. Prediction is a valuable tool in a clinical setting. Being able to predict a patient’s surgical outcome can help determine which patients are most likely to benefit from surgery and help assess their likely degree of functional improvement [7]. This allows surgeons to provide valuable prognostic information to concerned patients, helping to manage expectations, and to implement and direct appropriate treatment programs.

Holly et al. [8] conducted a similar systematic review of the literature and found that the most common predictors of surgical outcome for patients with CSM were age, duration of symptoms and severity of myelopathy. These three clinical factors are most frequently reported in the literature. Controversy still remains as to the significance, strength and direction of the relationship between surgical outcome and age, duration of symptoms and baseline severity.

The objective of this paper is to conduct a comprehensive literature search to determine the most important clinical predictors of outcome in surgically treated CSM patients. This paper will address whether age, duration of symptoms and baseline severity score are indeed predictors, and will also examine other clinical factors, including comorbidities, smoking status, signs and symptoms, to determine their predictive value.

Materials and methods

A literature search was performed using MEDLINE, MEDLINE in Process, EMBASE and Cochrane Central Register of Controlled Trials. The keywords used for the search were Cervical Spondylotic Myelopathy AND Surgery or Postoperative AND Prediction/Prognosis AND observational studies. The search was limited to humans, aged 18 years or older. The total number of citations found for this review was 1,677.

Articles were included if they were observational studies on patients >18 years with degenerative cervical myelopathy, treated surgically and followed postoperatively. Articles must have either directly or indirectly assessed the ability of a clinical factor to predict surgical outcome. Articles were eliminated if they were review articles or opinions; studies on patients with traumatic spinal cord injuries, thoracic myelopathy, radiculopathy, or non-degenerative cervical myelopathy; studies assessing only radiographic factors as predictors and studies that used complications as an outcome measure. Articles that were not in English or Japanese were excluded. Japanese articles were translated by Dr. Iwasaki and were included in the analysis.

All 1,677 abstracts and titles were reviewed independently by two authors (LAT, AK) and were sorted based on pre-determined inclusion and exclusion criteria. Figure 1 displays the search and review process in detail. Ninety-one articles were included. Three of these were translated from Japanese to English. Each article was assessed for quality with respect to methodology and overall structure. Several rating scales were examined, including Altman [9], Hayden et al. [10], and the Scottish Inter-Collegiate Guidelines Network (SIGN) scale for prognostic studies [11]. A modified version of the SIGN scale was used to rate the articles.

Fig. 1 Search strategy and detailed review process. CSM cervical spondylotic myelopathy

A modified version of the SIGN scoring system was implemented in a systematic review published by Kalsi-Ryan and Verrier [12]. Since the incidence of spinal cord injury is comparatively low, high quality research in this field is challenging. Studies often have small sample sizes with no opportunity for blinded assessment and randomization. Kalsi-Ryan and Verrier modified SIGN so that it was more specific to the nature of literature they were reviewing. We selected the modified SIGN system and further altered it to increase its applicability to literature reporting clinical predictors of surgical outcome in patients with CSM. Questions 15 and 16 were changed from dichotomous scoring to trichotomous scoring as studies may vary greatly in quality of statistical analysis, methodology and bias elimination (Table 1). It was arbitrarily decided that an article whose score was <7 would be classified as POOR, 7–9 as GOOD and 10–14 as EXCELLENT. The results from the EXCELLENT studies were compared to the combined results from the EXCELLENT and GOOD studies which were compared to the results from all the studies.
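The cut-offs described above amount to a simple rule, and the three comparison groups are cumulative. The following is a minimal Python sketch of that rule, offered for illustration only and not taken from the review. The names `classify_sign_score` and `analysis_groups`, and the assumption that totals are integers from 0 to 14, are hypothetical.

```python
def classify_sign_score(score: int) -> str:
    """Map a modified SIGN total to the quality tier described in the text.

    Assumption: totals are integers from 0 to 14. The upper bound of 14 is
    the top of the EXCELLENT band in the text; the lower bound is assumed.
    """
    if not 0 <= score <= 14:
        raise ValueError(f"score outside assumed range 0-14: {score}")
    if score < 7:
        return "POOR"
    if score <= 9:
        return "GOOD"
    return "EXCELLENT"


# Each tier contributes to every comparison group that contains it.
_GROUPS = {
    "EXCELLENT": ["EXCELLENT", "GOOD + EXCELLENT", "POOR + GOOD + EXCELLENT"],
    "GOOD": ["GOOD + EXCELLENT", "POOR + GOOD + EXCELLENT"],
    "POOR": ["POOR + GOOD + EXCELLENT"],
}


def analysis_groups(tier: str) -> list[str]:
    """Return the comparison groups in which a study of this tier is counted."""
    return list(_GROUPS[tier])


if __name__ == "__main__":
    for s in (6, 7, 9, 10, 14):
        tier = classify_sign_score(s)
        print(s, tier, analysis_groups(tier))
```

Under this reading, an EXCELLENT study is counted in all three analyses, whereas a POOR study contributes only to the pooled POOR + GOOD + EXCELLENT analysis.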

Table 1 Modified Scottish Inter-collegiate Guidelines Network (SIGN) used to rate all articles

For each study, the association between various clinical factors and surgical outcome, evaluated by the (modified) Japanese Orthopaedic Association scale (mJOA/JOA), Nurick score or “other” measures, was extracted. A relationship between the outcome and predictor was defined as conditional if it was significant for certain groups of patients but not others, or when using one statistical test but not another.

Results

This review consisted of 37 POOR [13–49], 38 GOOD [50–87] and 16 EXCELLENT [88–103] articles (Table 2).

Table 2 List of EXCELLENT, GOOD and POOR studies included in systematic review

Fifteen of the 16 EXCELLENT studies controlled for confounding variables when looking at the association between outcome and age, duration of symptoms and baseline severity score. The outcome measure used in all EXCELLENT studies was either the Nurick or the mJOA/JOA, with one study commenting on both.

Duration of symptoms

Thirteen articles evaluated duration of symptoms as a predictor of surgical outcome. Nine reported a negative, three a non-significant and one a conditional relationship. It is evident that outcome, assessed on both the mJOA/JOA and Nurick scale, is dependent on preoperative duration of symptoms as indicated by significantly more articles reporting a negative association than a non-significant association. The R values for this negative relationship ranged from weak to strong [95] (Table 3).

Table 3 Evaluation of duration of symptoms, age and baseline severity score as predictors of surgical outcome by EXCELLENT, GOOD + EXCELLENT and POOR + GOOD + EXCELLENT studies

Baseline severity score

Nine articles reported on the relationship between baseline severity score and surgical outcome. One article suggested a negative, seven a positive and one a non-significant association. All studies (7) that used JOA as the primary outcome measure demonstrated that more severe preoperative myelopathy is predictive of a worse outcome. When this association was assessed on the Nurick scale, one article reported a positive, one a negative and one a non-significant relationship, making it difficult to draw a conclusion as to the predictive value of preoperative severity on Nurick. One study recorded a strong R value (0.61) for this positive association [95] (Table 3).

Age

All 16 EXCELLENT articles explored the importance of age on surgical outcome. Six studies found a negative, eight a non-significant and two a conditional relationship. Broken down by scale, two articles reported a negative association and two no association between age and Nurick, while four and six studies found a negative association and no association, respectively, between age and JOA/mJOA. The two conditional studies were not included in this count. The association between age and outcome evaluated by Nurick is therefore unclear, but it is possible to suggest that age may not be predictive of outcome on the JOA/mJOA scale. Furlan et al. [91] found that age was a significant predictor of outcome on both scales using multiple regression, but not after dichotomizing the mJOA outcome. In addition, Kim et al. [93] suggested that age was an important predictor, but only in patients with diabetes. The R values for this negative relationship ranged from moderate to strong (Table 3).

Most of the GOOD articles were not rated EXCELLENT because of flaws in their statistical analysis, such as a lack of control for confounding variables. In contrast to the EXCELLENT studies, the GOOD studies used a wider variety of scales and measures to assess outcome, such as the Cooper scale, the neurosurgical cervical spine scale (NCSS), neurological assessments, questionnaires and evaluation of symptom improvement.

Duration of symptoms

Thirty-nine articles investigated duration of symptoms as a potential predictor of outcome. Twenty-five reported a negative, ten a non-significant and four a conditional relationship. It is evident that there is a significant negative association between duration of symptoms and outcome evaluated on both the JOA/mJOA scale and other measures: 22 versus 8 articles identified a negative versus a non-significant relationship. Using the Nurick scale, on the other hand, the results were inconclusive: four articles reported a negative association and four no association. Inclusion of the conditional articles did not alter these results. The R values for this negative relationship ranged from weak to strong (Table 3).

Baseline severity score

Thirty-eight studies assessed baseline severity score as a potential predictor of surgical outcome. Five reported a negative, 17 a positive, 12 a non-significant and 4 a conditional relationship. It is evident that outcome, evaluated by mJOA/JOA, is positively dependent on the baseline severity score: 15 papers suggested a positive association, while only 8 reported a non-significant relationship. It is hard to define the relationship between baseline score and Nurick score as two versus four papers reported negative versus positive associations. It is clear that baseline score is a significant predictor of Nurick, but the direction of the relationship is unclear. With respect to all the other outcome measures, two and three studies suggested a negative and a non-significant relationship, respectively. The R-values of this association ranged from moderate to strong (Table 3).

Age

Fifty articles reported on age as a predictor of outcome. Sixteen identified a negative, one a positive, 27 a non-significant and 6 a conditional association between age and outcome. Age was not found to be a predictor of outcome, whether assessed using the Nurick, mJOA/JOA or other measures. Two and 14 papers found age had a negative association with Nurick and JOA/mJOA, respectively. Five and 18 studies, on the other hand, reported no relationship with Nurick or JOA/mJOA, respectively. It is important to incorporate the conditional studies into this analysis, especially those that used JOA/mJOA as the primary outcome measure. Nagashima et al. and Ogawa et al. [74, 76] both identified age as a significant predictor of outcome in more severe myelopathy groups, but not in moderate severity (10–12) groups. Furlan et al. [91] identified age as an important negative predictor using multiple regression, but not stepwise logistic regression. Finally, Koyanagi et al. [69] suggested that age was a significant predictor in patients with OPLL and CDH, but not CSM. Incorporating these results into our assessment of age as a predictor, we still conclude that it is a non-significant predictor. The R values of the significant associations ranged from weak to strong (Table 3).

The POOR studies had significant flaws, including weaknesses in study design, poor statistical power and inadequate statistical control. In addition, many of these studies used unreliable outcome measures to evaluate surgical improvement and suffered on their ratings as a result.

Duration of symptoms

Sixty-three articles explored duration of symptoms as a predictor of surgical outcome. Forty-two reported a negative, 16 a non-significant and 5 a conditional relationship. The results were clear for all outcome measures: a longer duration of symptoms was predictive of a worse outcome. The R value of this association was reported in six studies and ranged from weak to strong (Table 3).

Baseline severity score

Fifty-six papers assessed baseline severity score as a predictor of outcome. Twenty-nine reported a positive, 17 a non-significant, 6 a negative and 4 a conditional association. Baseline severity score was a definite positive predictor of outcome assessed using the JOA/mJOA and other measures. Twenty-two versus ten papers reported a positive versus a non-significant association. The relationship between baseline severity score and Nurick was inconclusive: three papers identified a negative, four a positive and two a non-significant association. As in the GOOD + EXCELLENT analysis, it is evident that preoperative severity is related to Nurick score, but the direction of this association is unclear. The R value of this positive association was reported in six studies and ranged from moderate to strong (Table 3).

Age

Seventy-four studies commented on age as a potential predictor. Twenty-seven identified a negative, 1 a positive, 40 a non-significant and 6 a conditional association. Age was not a significant predictor of outcome, assessed using either the Nurick score or other measures. Nine papers reported a non-significant relationship between age and Nurick, whereas only four suggested a negative relationship. Without looking at the conditional associations, JOA/mJOA was also not dependent on age as indicated by 25 articles reporting no relationship and 20 suggesting a negative one. Five articles identified a conditional association between age and mJOA/JOA. The results from these studies do not affect these conclusions. The R values from studies that reported an association ranged from weak to moderate (Table 3).

Other predictors

Articles included in this review also explored the predictive value of other factors including gender, signs and symptoms, disease progression pattern and various comorbidities. The results from these studies are displayed in Table 4. There is insufficient evidence in the literature to conclude that the presence of a particular sign, symptom or comorbidity is predictive of outcome.

Table 4 Other prognostic indicators of surgical outcome reported either by EXCELLENT, GOOD or POOR studies

Discussion

This review compared the results from the EXCELLENT, GOOD + EXCELLENT and POOR + GOOD + EXCELLENT papers (Table 5).

Table 5 Summary of results: percentages of articles reporting a negative, positive, non-significant or conditional association between surgical outcome and duration of symptoms, baseline severity score or age
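Percentages of the kind summarized in Table 5 follow from the article counts given in the Results. As a hedged illustration, the Python sketch below converts the duration-of-symptoms counts reported in the text for each comparison group into percentages. The function name `percentages`, the dictionary layout and the rounding to one decimal place are assumptions made for this example, and the output is not claimed to reproduce the published table.

```python
def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Convert article counts per association type into percentages.

    Rounding to one decimal place is an assumption made for illustration.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no articles to summarize")
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Duration-of-symptoms counts as stated in the Results text.
DURATION_COUNTS = {
    "EXCELLENT": {"negative": 9, "non-significant": 3, "conditional": 1},
    "GOOD + EXCELLENT": {"negative": 25, "non-significant": 10, "conditional": 4},
    "POOR + GOOD + EXCELLENT": {"negative": 42, "non-significant": 16, "conditional": 5},
}

if __name__ == "__main__":
    for group, counts in DURATION_COUNTS.items():
        print(group, sum(counts.values()), percentages(counts))
```

Expressing each group as proportions rather than raw counts makes the three quality tiers directly comparable despite their different numbers of articles.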

One of the major findings of this review was that patients with a longer duration of symptoms and a more severe baseline score are more likely to have an unfavorable surgical result. The rationale behind this finding is that both severe and chronic, longstanding compression of the spinal cord may lead to irreversible damage due to demyelination and necrosis of the gray matter. Secondly, controversy exists in the literature as to the significance, strength and direction of the relationship between surgical outcome and age. Age was a non-significant predictor on all the scales when looking at the GOOD + EXCELLENT and the POOR + GOOD + EXCELLENT studies. When looking at only the higher quality studies (modified SIGN ≥ 10), however, age went from a non-significant predictor to a potential predictor. Although most surgeons will not discriminate on the basis of age, they should be aware that the elderly are not able to translate neurological recovery to functional improvement as well as the younger population. Potential explanations for this discrepancy include: (1) the elderly experience age-related changes in their spinal cord, including a decrease in γ-motoneurons, number of anterior horn cells and number of myelinated fibers in the corticospinal tracts and posterior funiculus; (2) older patients are more likely to have unrelated comorbidities that may affect outcome; or (3) the elderly may not be able to conduct all activities on a certain functional scale due to these comorbidities (e.g. walking time may be affected by osteoarthritis) [35, 75, 88, 92, 93]. Finally, our review determined that factors such as signs (hyperreflexia, leg spasticity and Babinski sign), symptoms (gait impairment, clumsy hands and numbness), comorbidities (diabetes and psychological issues), and smoking status do carry some predictive value.

Physicians should progressively incorporate predictive modeling into their practices to provide valuable prognostic information to their patients and direct appropriate treatment programs. When evaluating a CSM patient’s likely surgical outcome, the surgeon must weigh the patient’s preoperative severity, duration of symptoms and age accordingly, while keeping in mind the ability of other factors to affect the outcome.

As shown in this review, results may differ depending on what scale is used to evaluate surgical outcome. This may be due to limitations in the scales rather than an indication of the actual association between the predictor and outcome. The Nurick score is a scale with lower sensitivity: it is graded out of five and is largely weighted towards lower limb function [104]. When outcome was assessed using the Nurick score, its association with various predictors was less conclusive. For example, duration of symptoms was significantly associated with Nurick score when looking at the EXCELLENT and the POOR + GOOD + EXCELLENT group, but was a questionable predictor in the GOOD + EXCELLENT group. In addition, in the GOOD + EXCELLENT and POOR + GOOD + EXCELLENT studies, there was a significant relationship between preoperative condition and Nurick, but the direction of the association was not evident. The articles that identified a negative association, however, had more biased samples: Gok et al., Huang et al. and Rajshekhar and Kumar [22, 62, 97] all had stricter inclusion criteria. On the other hand, the results were more definite when the outcome was evaluated on either the JOA or mJOA score: a longer duration of symptoms and a more severe baseline severity score were associated with a worse outcome. The mJOA and JOA are widely accepted standards for CSM assessment and separately evaluate lower and upper limb, sphincter and sensory function. Although JOA has been validated and shown to have high inter- and intra-rater reliability [105], its modified version has not.

In a research setting, when looking at a relationship between various factors and outcome, it is important to control for the confounders baseline severity score and duration of symptoms. When assessing statistical control in our review to rate the articles, we ensured that the studies controlled for age, duration of symptoms and baseline severity as these were identified as important predictors by Holly et al. [5]. According to this review, age may be a less important confounder. Few articles reported on the R values for the significant associations between various clinical factors and outcome. This makes it difficult for clinicians and researchers to evaluate the strength of these correlations.

Holly et al. [5] indicated that the limitations of their review were that there were very few prospective studies, that many studies assessed the outcome using un-validated measures and that it was hard to analyze functional outcome due to the use of different scales between studies. Our study had a much larger pool of articles and consisted of higher quality literature, including some prospective studies that evaluated outcome using the validated JOA scale or Nurick score. There was also a sufficient number of articles to compare predictors on the same scale. In addition, the differences in our methodology, including a comparison of results among the three groups, also allowed for the incorporation of quality assessment in the analysis. Since Japanese researchers have contributed substantially to research in the field of spinal cord injury, including the creation of the JOA scale, we also translated all Japanese articles into English and incorporated them into our analysis. Finally, our systematic review differed from Holly et al.’s as it included a preliminary analysis of other predictors including signs and symptoms, comorbidities, gender and smoking status to determine their predictive value.

There are limitations to our study: (1) we did not separate studies based on length of follow-up time; (2) articles that dichotomized a predictor (e.g. age) might have done so using different cut-offs; and (3) some of the articles with relevant abstracts or titles were excluded because they were not available or were in a language other than Japanese or English. Future systematic reviews should address these limitations to provide a less biased evaluation of important predictors of outcome.

The results from this review should encourage further exploration in this area. Even though many studies have examined important predictors of surgical outcome in CSM, there still remains a lack of evidence in the form of high quality, prospective studies using validated outcome measures. A large prospective analysis is required to reemphasize the predictive value of duration of symptoms and baseline severity score, to settle the controversy surrounding age and to confirm that signs, symptoms and comorbidities do impact surgical results.