Appropriate Use Criteria
Osteoporotic Compression Fracture
Methodology
A common methodology for AUC is the RAND/UCLA Appropriateness Method, a modified Delphi process, where AUCs are developed using evidence-based information in conjunction with the clinical expertise of physicians from multiple specialties.1 The NASS AUC methodology is closely patterned after the RAND method, although not identical. The rating process is the same, but more steps are taken in the prerating process to ensure precision of definitions and optimal development of scenarios. For this AUC, the topic was selected based on active clinical guideline development to take advantage of the literature search for both efforts and to simultaneously create complementary information for the members.
Process:
1. AUC Groups. Members were assembled from NASS volunteers. Training in evidence-based medicine was helpful but not mandatory. Multispecialty representation was stressed. National experts and/or thought leaders were selected.
2. Patient Population. The patient cohort encompassed adults (18 years or older) with symptoms related to osteoporotic vertebral fractures.
Table 1. Specific definitions and inclusion/exclusion criteria utilized for this document
3. Title. The title of the clinical practice guideline (CPG), “Osteoporotic Vertebral Compression Fracture,” was modified for the AUC to “Osteoporotic Vertebral Fracture” to also include fractures with minimal endplate involvement that are often not visible on radiographs but are identified on MRI.
4. Standardization of Definitions. The definitions developed in the CPG for osteoporotic vertebral compression fracture were used. Minor modifications were made to further clarify the intent of each definition (Table 1).
Table 2.
■ Function
• Homebound ambulator
• Community ambulator
■ Pain
• Mechanical, VAS 0-6
• Mechanical, VAS 7-10
• Non mechanical
■ Duration
• Acute
• Chronic
■ Fracture morphology
• Simple (height loss <80%, intact posterior wall)
• Complex (height loss <80%, non-intact posterior wall or height loss >80%)
■ Spinal stenosis
• No
• Yes, no change in sensory / motor / bladder function
• Yes, plus change in sensory / motor / bladder function
■ Fracture stability
• Stable
• Unstable
5. Scenario Writing. The key modifiers or variables were determined, and a matrix of scenarios was developed based on these modifiers (Table 2). The number of scenarios varies with the breadth of the topic but is usually in the hundreds. Scenarios are intended to represent the most common situations that arise during clinical practice. They are comprehensive and include, but are not limited to, technical, diagnostic, demographic, and psychosocial factors. Conflicts of interest are permissible within the Scenario Writing Work Group if writers adhere to the NASS Disclosure Policy.
6. Scenario Review. A separate group independently reviewed the scenarios. This step is the primary difference from the RAND methodology and provides an additional opportunity to ensure optimal development of scenarios. Feedback was given to the Scenario Writing Work Group to consider for further refinement, and a final draft of the scenarios was created. As part of the review process, the review group scored the scenario document as if they were raters in order to provide feedback and suggest improvements. Conflicts of interest are permissible as long as reviewers adhere to the NASS Disclosure Policy.
7. Literature Review Group. Concurrent with scenario writing and review, a literature review was conducted by the CPG work group on this topic and used by the AUC work group. Members of the CPG literature review group are required to be trained in evidence-analysis techniques. Literature review and completion of the accompanying evidentiary tables are critical components of the process. All relevant studies were summarized in evidentiary tables by the CPG work group, and these tables were then provided to the AUC raters to support informed voting. All references and evidentiary tables can be found on the NASS website, www.spine.org.
8. Rating. A multidisciplinary group of 13 raters was identified, representing the specialties of orthopedics, neurosurgery, physical medicine and rehabilitation, internal medicine, anesthesia, rheumatology, radiology, and chiropractic. All were considered thought leaders in the field. There was a total of 7 nonoperative specialists and 6 surgeons. Three raters were part of the ongoing CPG development on this topic. The group also included two independent and experienced moderators, one of whom was part of the CPG group. Raters were not required to be trained in evidence analysis, but they were required to have participated in the NASS disclosure process. The group was introduced to the scenarios and the rating method on a prerating conference call, which was also recorded. Scenarios were to be rated based on appropriateness at large, not relative to a rater’s own practice. The raters each rated the scenarios independently and anonymously. The raters then met for a 2-day meeting to discuss the scenarios and participate in a second round of anonymous rating. The meeting started with introductions and relevant disclosures. Deidentified scores from the first round of rating were compiled and presented to the group to facilitate discussion. Scenarios were clarified if needed. Most of the discussion centered on scenarios where more disagreement existed. Consensus was not a goal, and costs were not considered. Raters were directed to consider whether a procedure was reasonable, in general, relative to the scenarios presented. Raters combined evidence with personal experience and voted on the appropriateness of each scenario using a 1 through 9 scale. The votes were recorded, and the median was used to determine the final score, in line with RAND methodology. Scores of 1 to 3 indicate the procedure is rarely appropriate, 4 to 6 uncertain/may be appropriate, and 7 to 9 appropriate.
An option of N/A was included in rating each scenario to flag implausible clinical scenarios. Each treatment was considered independently, resulting in some treatments receiving similar ratings for the same scenarios.
9. Agreement. As previously described in the RAND manual, scenarios were characterized as ‘with agreement’ when the interpercentile range adjusted for symmetry (IPRAS) was greater than the interpercentile range (IPR) of the ratings; the IPRAS is derived from the IPR required for disagreement under perfect symmetry. When the IPRAS was less than the IPR, the scenario was characterized as ‘with disagreement’. This method has previously demonstrated higher sensitivity and specificity for identifying disagreement than classical methods.
10. Final Rating. All scenarios received a final rating based on the median score and whether agreement existed among the raters. The final ratings were as follows:
• 1 to 3 = Rarely Appropriate with Agreement
• 4 to 6 = Uncertain/Maybe Appropriate or Disagreement
• 7 to 9 = Appropriate with Agreement
Independent of the median score, all scenarios with disagreement were assigned a final rating of ‘Uncertain/Maybe Appropriate or Disagreement.’
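The rating, agreement, and final rating steps can be illustrated with a minimal Python sketch. It is not the work group's implementation. The constants 2.35 and 1.5, the 30th and 70th percentiles, and the central point of 5 are taken from the RAND/UCLA manual's IPRAS definition as commonly cited and are not stated in this document; the percentile interpolation rule, the handling of half-point medians, and all function names are assumptions made for illustration.

```python
from statistics import median

RARELY = "Rarely Appropriate with Agreement"
UNCERTAIN = "Uncertain/Maybe Appropriate or Disagreement"
APPROPRIATE = "Appropriate with Agreement"


def percentile(values, q):
    """Percentile by linear interpolation (q between 0 and 1).

    The interpolation convention is an assumption; other conventions
    can give slightly different results for small panels.
    """
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def has_disagreement(ratings):
    """Return True when the observed IPR exceeds the IPRAS."""
    p30 = percentile(ratings, 0.30)
    p70 = percentile(ratings, 0.70)
    ipr = p70 - p30                          # observed interpercentile range
    central_point = (p30 + p70) / 2          # midpoint of the IPR
    asymmetry_index = abs(5 - central_point) # distance from the scale centre
    ipras = 2.35 + 1.5 * asymmetry_index     # assumed RAND constants
    return ipr > ipras


def final_rating(ratings):
    """Combine the median score with the agreement check."""
    if has_disagreement(ratings):
        return UNCERTAIN                     # disagreement overrides the median
    score = median(ratings)
    if score <= 3:
        return RARELY
    if score >= 7:
        return APPROPRIATE
    return UNCERTAIN                         # includes half-point medians (assumption)


# Hypothetical panels of 13 votes, invented purely for illustration.
consistent_panel = [8, 8, 8, 9, 7, 8, 9, 8, 7, 8, 9, 8, 8]
polarized_panel = [1, 1, 1, 1, 1, 7, 7, 8, 8, 9, 9, 9, 9]

print(final_rating(consistent_panel))
print(final_rating(polarized_panel))
```

In the second hypothetical panel the median is 7, yet the votes are split between the two ends of the scale, so the scenario falls into the uncertain/disagreement category regardless of the median.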
Construction of Scenarios:
Key modifiers were selected that most influenced treatment decisions (Table 2). Modifiers included function, pain, duration, morphology, spinal stenosis, and fracture stability. From these modifiers, a matrix was constructed of all possible modifier combinations, resulting in 144 scenarios for each treatment. Appropriateness of each treatment (medical therapy, cement augmentation, and surgery) was independently rated for all scenarios.
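The scenario count follows directly from the modifier levels in Table 2: 2 × 3 × 2 × 2 × 3 × 2 = 144. The short Python sketch below enumerates the full matrix to confirm this; the variable names are illustrative and the level labels are abbreviated from Table 2.

```python
from itertools import product

# Modifier levels as listed in Table 2 (labels abbreviated).
MODIFIERS = {
    "Function": ["Homebound ambulator", "Community ambulator"],
    "Pain": ["Mechanical, VAS 0-6", "Mechanical, VAS 7-10", "Non mechanical"],
    "Duration": ["Acute", "Chronic"],
    "Fracture morphology": ["Simple", "Complex"],
    "Spinal stenosis": [
        "No",
        "Yes, no change in function",
        "Yes, plus change in function",
    ],
    "Fracture stability": ["Stable", "Unstable"],
}
TREATMENTS = ["Medical therapy", "Cement augmentation", "Surgery"]

# One scenario per combination of modifier levels.
scenarios = [
    dict(zip(MODIFIERS, combination))
    for combination in product(*MODIFIERS.values())
]

print(len(scenarios))                    # 144 scenarios per treatment
print(len(scenarios) * len(TREATMENTS))  # 432 scenario-treatment ratings
```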
Assumptions:
The purpose of this document was to evaluate the appropriateness of treatment for osteoporotic vertebral fractures. Where medical/interventional/surgical treatment is mentioned, the document assumes that the type of medical/interventional treatment provided was within an acceptable community standard of care to the reader.
Statistical Analysis:
Modifiers were treated as categorical variables and expressed as frequencies with percentages. Final appropriateness ratings were compared between modifier responses using the Fisher exact test. Decision trees were constructed for cement augmentation and surgery to identify the most important modifiers when deciding on the appropriateness of each treatment. Modifiers that did not improve the accuracy of the classification were not included in the final decision tree. Relative modifier importance (summing to one) was computed for each treatment. To examine modifiers that contributed toward agreement for surgery, all scenarios with a median rater score between 7 and 9 were used to fit a multivariable logistic regression model with the final rating (based on median score and agreement) as the outcome. Adjusted odds ratios were computed to determine the odds of agreement among the raters for each modifier. All analyses were performed using R 4.2.2, the gtsummary package, and the rpart package.
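For readers unfamiliar with the Fisher exact test, the following Python sketch shows the two-sided calculation for the simplest case, a 2 × 2 table. It is only an illustration: the analysis described above was run in R, the comparisons in this document may involve tables larger than 2 × 2, and the counts in the example are invented and are not results from this work.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def table_probability(k):
        # Probability that the top-left cell equals k, given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / total_tables

    observed = table_probability(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    probabilities = [table_probability(k) for k in range(k_min, k_max + 1)]
    # Small tolerance guards against floating-point ties.
    return sum(p for p in probabilities if p <= observed * (1 + 1e-9))


# Invented counts for illustration only (rows: two levels of a modifier,
# columns: rated appropriate vs not rated appropriate).
p_value = fisher_exact_2x2(3, 1, 1, 3)
print(round(p_value, 4))
```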