Exercise on reliability and validity

Please read the Exercise introduction tab first. Then answer the questions in the ‘Questions’ tab.

Imagine you are part of a research project looking at the effect of a multidisciplinary intervention for patients with chronic whiplash associated disorder (WAD). These patients are seen both in:

  • the primary sector (medical doctors, physiotherapists, chiropractors) and
  • the secondary sector (outpatient hospital units)

within the Danish health care system.

To measure pain-related function, you decide to use the Neck Disability Index (NDI). The NDI has 10 items, each with 6 response options scored from 0–5. The total score is converted to a 0–100 scale, where a higher score means more disability.
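The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not an official scoring routine: the function name `ndi_score` is hypothetical, the rescaling (raw sum divided by 50, times 100) is one straightforward reading of "converted to a 0–100 scale", and the sketch assumes all 10 items have been answered.

```python
def ndi_score(item_scores):
    """Return the NDI total on a 0-100 scale from ten item scores (each 0-5).

    Assumption: all ten items are answered; missing items are not handled.
    """
    if len(item_scores) != 10:
        raise ValueError("the NDI has 10 items")
    if any(s not in range(6) for s in item_scores):
        raise ValueError("each item must be an integer from 0 to 5")
    raw_total = sum(item_scores)      # raw range is 0-50
    return raw_total / 50 * 100       # rescaled to 0-100


# Hypothetical responses from one patient (items n1 to n10).
example = [3, 2, 4, 1, 3, 2, 3, 2, 4, 1]
print(ndi_score(example))  # 50.0
```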

Because you are not yet sure whether the NDI is the right tool for your study, you want to check how reliable and valid it is in your sample.

You can view the NDI in full here: Neck Disability Index

Start with the questions in the tab ‘1. Internal consistency’ and, when finished, continue to the next tab.

Please write your answers to the questions in a Word document. The exercise, including answers, can be downloaded from the webpage as an HTML document after the course.

You calculate the internal consistency (Cronbach’s alpha) and find the results shown in Table 1.

Table 1. Cronbach’s alpha at item level
Item | Name | Alpha | 95% CI
n1 | Pain intensity | 0.933 | -
n2 | Personal care | 0.897 | -
n3 | Lifting | 0.896 | -
n4 | Reading | 0.898 | -
n5 | Headache | 0.937 | -
n6 | Concentration | 0.897 | -
n7 | Work | 0.899 | -
n8 | Driving car | 0.897 | -
n9 | Sleep | 0.897 | -
n10 | Recreation | 0.897 | -
Total | All items | 0.914 | [0.899, 0.928]
Note: n = 300
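For readers who want to see how a Cronbach’s alpha is obtained from raw item responses, here is a minimal sketch using only the Python standard library. It applies the usual formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). The function name `cronbach_alpha` and the small data set are hypothetical; they are not the study data behind Table 1, and the exercise does not state which software produced the table.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    items = list(zip(*rows))                          # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])      # variance of the sum score
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses: 5 respondents, 3 items (not the NDI data).
responses = [
    [2, 3, 2],
    [4, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
    [5, 4, 5],
]
print(round(cronbach_alpha(responses), 3))
```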

Questions

1.1 The Cronbach’s alpha for the NDI is presented in Table 1. What does this value mean for the questionnaire?

1.2 Look at the alpha for n1 to n10 and explain what you see.

Your data include measurements at baseline (t1) and at follow-up 2 weeks later (t2). You have ensured that the respondents were stable between the two measurements. The summary statistics of the test-retest data are presented in Table 2.

Table 2. NDI test-retest summary statistics
N | Min score | Max score | Mean score t1 | Mean score t2 | Mean difference | SD difference | 95% prediction interval
300 | 0 | 100 | 48.5 | 43.3 | −5.2 | 9.8 | 19.2
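The columns of Table 2 appear to be related as in the short check below. Two points are assumptions rather than statements from the exercise: that the mean difference is calculated as t2 minus t1, and that the last column is 1.96 × the SD of the differences. Both reproduce the tabulated values.

```python
# Values copied from Table 2.
mean_t1, mean_t2 = 48.5, 43.3
sd_diff = 9.8

# Assumption: difference is taken as t2 - t1.
mean_diff = mean_t2 - mean_t1

# Assumption: the 95% prediction interval column is 1.96 x SD of the differences.
half_width = 1.96 * sd_diff

print(round(mean_diff, 1))   # -5.2
print(round(half_width, 1))  # 19.2
```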

Questions

2.1 Draw a Bland & Altman limits of agreement plot using Table 2.

2.2 What is the measurement error with 95% prediction intervals and which unit does it have?

2.3 Are you happy with the size of the measurement error?

2.4 What have you learned about the systematic error?

The patients included in the study are from both the primary sector (physiotherapy, chiropractic and GP practices) and the secondary sector (ambulatory hospital units). You decide to calculate the ICC for both groups and find the results shown in Table 3.

Table 3. Reliability of the NDI in two different populations
Population | ICCconsistency | 95% CI | ICCagreement | 95% CI
Primary sector patients | 0.93 | [0.91, 0.95] | 0.91 | [0.82, 0.95]
Secondary sector patients | 0.83 | [0.75, 0.88] | 0.80 | [0.63, 0.88]
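As background, the sketch below shows one common way to compute both coefficients from test-retest pairs, using the single-measure two-way ANOVA formulas often written ICC(C,1) and ICC(A,1). The exercise does not state which ICC model was used for Table 3, so the choice of model is an assumption, and the function name `icc_two_way` and the data are hypothetical.

```python
from statistics import mean


def icc_two_way(pairs):
    """Return (ICC consistency, ICC agreement) for a list of (t1, t2) scores.

    Single-measure, two-way ANOVA formulas; k = 2 measurements per person.
    """
    n, k = len(pairs), 2
    grand = mean([x for p in pairs for x in p])
    row_means = [mean(p) for p in pairs]                             # per person
    col_means = [mean([p[j] for p in pairs]) for j in range(k)]      # per occasion

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for p in pairs for x in p)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                  # between-person mean square
    ms_cols = ss_cols / (k - 1)                  # between-occasion mean square
    ms_err = ss_err / ((n - 1) * (k - 1))        # residual mean square

    consistency = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    agreement = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    return consistency, agreement


# Hypothetical NDI scores at t1 and t2 for six patients (not the study data).
scores = [(40, 36), (62, 55), (30, 31), (74, 66), (50, 47), (22, 18)]
icc_c, icc_a = icc_two_way(scores)
print(round(icc_c, 2), round(icc_a, 2))
```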

Questions

3.1 Why do you see a difference between ICCconsistency and ICCagreement?

3.2 Why do you think the ICCs are lower in the secondary sector patients?

3.3 Which parameter do you prefer: ICCconsistency or ICCagreement? Justify your choice.

Question

4.1 Validation is a continuous process. Give at least two reasons why this is so.

Imagine you have had intermittent neck pain for the past 2-3 years. Right now you have neck pain radiating into the left shoulder region. You want to know if the Neck Disability Index works for your problem. The content of the 10 items of the NDI is outlined in Table 4.

Table 4. Content of the NDI
Item | Name
n1 | Pain intensity
n2 | Personal care
n3 | Lifting
n4 | Reading
n5 | Headache
n6 | Concentration
n7 | Work
n8 | Driving car
n9 | Sleep
n10 | Recreation

(You can view the NDI in full here: Neck Disability Index)

Questions

5.1 Are the included items relevant for you as a neck pain patient? If not, please state why.

5.2 Are there any missing areas/domains/constructs which would be relevant for a neck pain patient? If yes, please state what is missing and why.

5.3 Having considered the questions included in the NDI, how would you rate its content validity?

You also want to measure criterion validity for the NDI.

Question

6.1 Name 2–3 good criteria for measuring neck function. Explain why you think they are good.

You have decided to include a generic multidimensional outcome measure in addition to the NDI: the SF-36 (Short Form, 36 items), which has been validated in Danish. The SF-36 consists of 8 scales and 2 summary scales, as follows:

Box 4. Internal consistency

Each scale of the SF-36 is briefly described below:

Physical Functioning (PF). Assesses limitations in normal physical activities (lifting, climbing stairs, bending, kneeling, walking moderate distances) and is designed to estimate the severity of the limitation (10 questions).

Role/Physical (RP). Assesses limitations in work function caused by physical health problems. ‘Role’ applies to work or everyday responsibilities (a job, community activity or volunteer work) typical for a specific age (4 questions).

Bodily Pain (BP). Assesses the severity of pain and the extent to which it interferes with daily activities (2 questions).

General Health (GH). Assesses physical health status (current and prior health) and has been documented to be a good predictor of health care expenditure (5 questions).

Vitality/Energy (VT). Assesses a subjective feeling of well-being, including energy and fatigue (4 questions).

Social Functioning (SF). Assesses the quantity and quality of interaction with others (social relationships), extending measurement beyond exclusively physical and mental health concepts (2 questions).

Role/Emotional (RE). Assesses ‘role’ limitations due to emotional problems (see above for an explanation of ‘role’) (3 questions).

Mental Health/Emotional Well-being (MH). Assesses the 4 major mental health dimensions: anxiety, depression, loss of behavioral or emotional control, and psychological well-being (5 questions).

Summary measures. The SF-36 also provides 2 important summary measures of health-related quality of life: the Physical Component Summary (PCS) and the Mental Component Summary (MCS). The strength of both summary measures lies in their ability to distinguish a physical from a mental outcome.

Question

7.1 You want to test construct validity (hypothesis testing) of the NDI. Please describe at least 3 a priori hypotheses which are specific (i.e. have direction, strength and reason).