Reliability, validity and responsiveness of a Norwegian version of the Chronic Sinusitis Survey

Background: The Chronic Sinusitis Survey (CSS) is a valid, disease-specific questionnaire for assessing health status and treatment effectiveness in chronic rhinosinusitis. In the present study, we developed a Norwegian version of the CSS and assessed its psychometric properties. Methods: In the pooled data set of 65 patients from a trial of treatment for chronic sinusitis with long-standing symptoms and signs of sinusitis on computed tomography (CT), we assessed the reliability, validity and responsiveness of the CSS. Results: Test-retest reliability of the two CSS scales and the total scale ranged from 0.87 to 0.92, while internal consistency reliability ranged from 0.31 to 0.55. CSS subscale scores were associated with other items on sinusitis symptoms, and with the Mental health and Bodily pain scales of the SF-36. There was little association of the CSS scale scores with sinus CT findings. The patients with chronic sinusitis had worse scores on all three CSS scales than a healthy reference population (n = 42) (p < 0.001). The CSS sinus symptoms subscale and the total scale were sensitive to improvement in global symptoms during 12 weeks. Conclusion: The Norwegian version of the CSS had acceptable test-retest reliability, but lower internal consistency reliability than the accepted standard criteria. The results support the construct validity of the measure, and the sinusitis symptoms subscale and the total scale were responsive to change. This supports the use of the questionnaire in interventions for chronic sinusitis, but points to problems with the internal consistency reliability.


Background
Sinusitis is a common condition that causes frequent physician visits. In the United States in 1999-2000, chronic sinusitis accounted for more than 13 million ambulatory care visits, or 1.3% of all visits [1]. The symptoms of chronic sinusitis are not easy to quantify [2] and show little association with computed tomography (CT) findings [3]. Postoperatively, imaging is also unreliable because of postoperative changes and scarring [4].
Health-related quality of life (HRQoL) has recently gained increased attention as an outcome measure for interventions in chronic sinusitis [2,5-11]. Some studies of interventions in chronic sinusitis have used generic HRQoL measures, which are developed for use in a wide range of conditions and typically include aspects of physical, emotional and social health or functioning, such as the Short Form 36 (SF-36) [12]. To increase the sensitivity to change, disease-specific measures have been developed to assess the impairment of chronic sinusitis and the effects of interventions. A review reported that only three instruments for outcome assessment in chronic sinusitis had acceptable performance characteristics with documented reliability, validity and responsiveness [5,13-15]; however, new instruments have since been introduced [16].
The Chronic Sinusitis Survey (CSS) is a valid, disease-specific questionnaire for assessing health status and treatment effectiveness in chronic rhinosinusitis [2,5,10]. In the present study, we developed a Norwegian version of the CSS and assessed its psychometric properties in the pooled sample of patients in an intervention for chronic sinusitis [17].

Subjects and study design
We included patients above 17 years of age with sinusitis symptoms for more than three months and sinus swelling, fluid retention, or opacification on CT. Patients with polypous sinusitis or pansinusitis, pregnancy, previous acupuncture treatment, previous surgery for chronic sinusitis, or recent medication use that could influence the results of the study were excluded [17]. We evaluated more than 500 patients with sinusitis for eligibility for the study. In total, 65 patients were included from August 1996 to December 2000. Patients were initially recruited from the clinical practice of one otorhinolaryngologist and later also through advertising in local newspapers and a magazine.
One otorhinolaryngologist examined and included all patients. He allocated them to one of three groups according to a six-block randomisation algorithm. The major reasons for exclusion, estimated post hoc by the otorhinolaryngologist, were: normal CT (30%), heavy allergies (20%), refusal of conventional medical therapy (10%), trigeminal neuralgia (10%) and unwillingness to have a CT scan (5%).
The patients had one of three treatments: conventional medical therapy with antibiotics and local decongestants, traditional Chinese acupuncture, or minimal acupuncture at non-acupoints. No treatments were given during the Norwegian allergy season (February-September), because some of the patients were expected to have seasonal allergies [17]. For the purpose of this validation study, we pooled the three study arms in the analysis.

Health status assessment
We assessed HRQoL at baseline, after 12 weeks and after 13 weeks with several instruments. The purpose of the 13-week assessment was to assess test-retest reliability of the questionnaires, by comparison with the 12-week assessment.

Chronic sinusitis survey (CSS)
The CSS is a 6-item duration-based, sinus-specific questionnaire with a symptom and a medication subscale. It was developed at the Massachusetts Eye and Ear Infirmary [2,5,10] and also exists in Chinese and Turkish versions [18,19]. We obtained permission to use the CSS and translated the questionnaire into Norwegian according to a recommended procedure [20]. First, two individuals translated the CSS into Norwegian independently. They then met and discussed their translations with a third person, agreeing on a consensus version. Later, this consensus version was back-translated into English by an American fluent in Norwegian. Comparison of the back-translation with the original English version revealed few discrepancies, and the questionnaires were considered conceptually and linguistically equivalent. The Norwegian version of the CSS is enclosed [see Additional file 1].

Short Form 36 (SF-36)
The general health status questionnaire SF-36 assesses eight dimensions of health status including physical functioning, role limitations due to physical problems, bodily pain, general health, vitality, social functioning, role limitations due to emotional problems and mental health [12,21]. The scales were scored from 0 (lowest level of functioning) to 100 (highest level of functioning). The SF-36 has been extensively validated in general populations [12,21], and in many diseases including subjects with chronic rhinosinusitis [9,10,15,22]. We used the Norwegian standard SF-36 version 1.2 [23]. Additionally, we scored the physical component summary (PCS) and mental component summary (MCS) scales [24]. These two scales were scored and transformed for comparison with a U.S. general population with mean 50 and SD 10.

Sinus computed tomography
At baseline and 4-6 weeks later the patients had sinus CT scans. An otorhinolaryngologist assessed soft tissue swelling in millimetres and signs of fluid retention and opacification on the CT scans [17].

Sinusitis symptom assessment
The patients reported on six symptoms of chronic sinusitis using a self-administered questionnaire [17]: (1) mucus production, (2) maxillary headache, (3) stuffed nose, (4) frontal headache, (5) ability to smell, and (6) feeling of illness. The first four items were scored on an ordinal scale with the response alternatives none, little, some, much, very much (recoded on a 0-4 scale). The fifth item had the response alternatives none, little, some, close to normal, normal (recoded on a 4-0 scale). The sixth item had the response alternatives very ill, ill, a little ill, healthy (recoded 3-0). The six recoded symptom scores were summed to give an aggregate "sinusitis symptom score", ranging from 0 (minimal symptoms) to 23 (maximal symptoms) [17].
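The recoding and summation described above can be sketched as follows. This is a minimal illustration only: the response labels and score ranges are taken from the text, while the dictionary names, the item keys and the function `sinusitis_symptom_score` are hypothetical and not part of the original questionnaire or analysis.

```python
# Recoding tables: response labels come from the text above; the
# dictionary and function names are hypothetical illustrations.
SEVERITY = {"none": 0, "little": 1, "some": 2, "much": 3, "very much": 4}
SMELL = {"none": 4, "little": 3, "some": 2, "close to normal": 1, "normal": 0}
ILLNESS = {"very ill": 3, "ill": 2, "a little ill": 1, "healthy": 0}

# The first four items share the 0-4 severity coding.
SEVERITY_ITEMS = (
    "mucus production",
    "maxillary headache",
    "stuffed nose",
    "frontal headache",
)


def sinusitis_symptom_score(answers):
    """Sum the six recoded items; 0 = minimal and 23 = maximal symptoms."""
    total = sum(SEVERITY[answers[item]] for item in SEVERITY_ITEMS)
    total += SMELL[answers["ability to smell"]]      # reverse-coded, 4-0
    total += ILLNESS[answers["feeling of illness"]]  # recoded 3-0
    return total
```

Under this coding, a patient answering "very much" on the first four items, "none" for ability to smell and "very ill" reaches the maximum of 4 × 4 + 4 + 3 = 23.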

Healthy comparison group
To have a healthy comparison group, with which we could compare the CSS scores, we used a convenience sample of hospital personnel (n = 42). These subjects only responded to the CSS and items on age and sex.

Statistical analysis
Descriptive statistics are presented with means and SDs, or percentages. Group characteristics were compared using the t-test or χ² test.
Internal consistency reliability of the scales was assessed using Cronbach's α [25]. Test-retest reliability of the aggregate scales was assessed with an intraclass correlation coefficient (ICC), using the average of raters in a two-way mixed model with an absolute agreement definition. To minimize the subjects' recall of their previous answers, the test-retest analysis was done using individual patient scores 12 and 13 weeks after randomization. We excluded assessments more than 21 days apart. The median time between the assessments was 7 days (range 6 to 21).
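For readers unfamiliar with the two reliability coefficients, the sketch below shows one way they can be computed. It is an illustration under stated assumptions, not the Stata or SPSS procedure used in the study: the function names and data layout are hypothetical, sample variances are used for Cronbach's α, and the ICC uses the standard average-measures, absolute-agreement formula from a two-way analysis of variance.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of respondent scores per item."""
    k = len(items)
    # Total scale score for each respondent (sum across items).
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def icc_absolute_agreement_average(rows):
    """Average-measures ICC with absolute agreement from a two-way ANOVA.

    `rows` holds one list per subject, e.g. [week12_score, week13_score].
    """
    n, k = len(rows), len(rows[0])
    grand = mean(x for row in rows for x in row)
    row_means = [mean(row) for row in rows]
    col_means = [mean(col) for col in zip(*rows)]
    # Sums of squares for subjects (rows), occasions (columns) and error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
```

Because α depends on the ratio of summed item variances to total-score variance, a scale with few, weakly correlated items tends to yield a low α even when the same scale is highly stable across repeated administrations.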
Construct validity of the scales of the CSS was assessed by correlations with: (1) corresponding items of the SF-36, and (2) the items on the sinusitis symptom scale. We used Spearman's rank correlations because of the ordinal scale on sinusitis symptoms items. A finding of higher correlations between items measuring related phenomena than between non-corresponding items would support construct validity.
The discriminant validity of the CSS and the SF-36 was evaluated by the capacity of the subscales to differentiate between two groups with expected differences in health status. For this purpose, we used grading of CT sinus soft tissue swelling (sum of six measures of sinus soft tissue swelling) in millimeters, dividing the patients into two groups according to score below or above the median of 12 mm. In this comparison, we adjusted for age using analysis of variance. We similarly compared scores between two groups divided according to the median of the overall symptom score (≤ 9 vs. >9).
Finally, we compared CSS scores in the total sample of patients with chronic sinusitis with the healthy comparison group of hospital personnel, using the t-test for independent samples.
Responsiveness was assessed in the pooled sample using change in the overall symptom score as an indicator of global change. The SD of the baseline overall symptom score was 3.1. We used a change of 2 units on the overall symptom score (equivalent to 0.65 SD) to denote a meaningful change. We had no empirical evidence for this threshold; however, previous reports have suggested that minimal clinically important changes are frequently about 0.5 SD [26], so our choice is in accordance with this. We report responsiveness as standardized response mean (SRM) (mean change/SD of change) and effect size (mean change/SD at baseline) [27]. Because patients in one of the treatment arms were given antibiotics and other medication as part of the protocol, we excluded patients in this treatment arm from the analysis of responsiveness of the CSS Medication usage and Total scales. These scales are influenced directly by medication use. Instead, we would expect this group to report deterioration in score on the CSS Medication usage subscale.
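The two responsiveness indices defined above can be sketched in a few lines. This is a minimal illustration: the function name `responsiveness` and the paired-list data layout are assumptions, and sample SDs are used.

```python
from statistics import mean, stdev


def responsiveness(baseline, follow_up):
    """Return (SRM, ES) for paired baseline and follow-up scale scores."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    mean_change = mean(changes)
    srm = mean_change / stdev(changes)  # mean change / SD of change
    es = mean_change / stdev(baseline)  # mean change / SD at baseline
    return srm, es
```

The same arithmetic underlies the threshold in the text: a 2-unit change relative to a baseline SD of 3.1 gives 2/3.1 ≈ 0.65 SD.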
We chose a significance level of 5%. For statistical analysis, we used Stata version 8.2 (Stata Corp., College Station, TX) and SPSS version 12.0 (SPSS Inc, Chicago, IL). The Regional Committee for Medical Research Ethics and The Norwegian Data Inspectorate approved the study.

Results
The mean age of the participants was 43 years, 51% were women and 35% were current smokers. On baseline sinus CT, 17 patients had opacification, 2 had fluid retention, and 57 had sinus soft tissue swelling. There was no difference in age or sex between the chronic sinusitis group and the healthy comparison group (table 1).
Completers, who responded to the sinusitis symptoms scale of the CSS at baseline and after 12 weeks (n = 47), tended to be somewhat older, had suffered from chronic sinusitis longer, and had better baseline CSS and SF-36 scores on all subscales than dropouts (n = 18) after 12 weeks, although only the differences on the General health (p = 0.03) and the Social functioning scales (p = 0.04) of the SF-36 were statistically significant.
At baseline the mean CSS sinus symptoms score was 40 (SD 26) and the medication usage score 83 (SD 18).
The CSS and SF-36 scales were not able to discriminate between the group of patients with aggregate sinus soft tissue swelling ≤ 12 mm versus >12 mm (table 4). In contrast, the CSS sinus symptom and total scales, and the SF-36 Bodily Pain and Mental health scales discriminated between patients divided into groups according to overall symptom scale score above or below the median (table 4). When comparing CSS scores in the total sample of patients with chronic sinusitis with the healthy comparison group, there was a marked difference in scores on the two subscales and the total score (table 5).
Of the 47 patients who completed the study, 22 reported an improved overall symptom score (change < -2 units), 19 were unchanged (change of -2 to 2 units), and 6 were worse (change > 2 units). The worse group was considered too small for further analysis. The responsiveness indices were in general larger in the improved group than in the unchanged group and in the right direction for all indices. In the conventional group with medication as part of the intervention, the unchanged symptom group reported SRM and ES of -0.29 (n = 4). In the improved group SRM was -0.40 and ES -0.64 (n = 10), in accordance with greater use of medication during the study than before.

Discussion
In this study we have documented translation of the CSS into Norwegian and assessed aspects of its reliability, validity and responsiveness in patients with CT-verified chronic sinusitis. The internal consistency reliability of the CSS scales was moderate to fair, and lower than the level of 0.70 usually considered acceptable for group use [28].
The internal consistency reliability in the present study was lower than previously reported for the CSS scales in some studies [5,18], for some scales of the RhinoQOL [6,16] and some other instruments [14,15,29,30]. However, the internal consistency reliability was at the level of the CSS in another study [6] and some other RhinoQOL scales [6,18]. In contrast, all eight SF-36 scales in the present study had internal consistency reliability <0.70. The problems with internal consistency of the CSS could be related to the low number of CSS items, as internal consistency normally increases with increasing number of items. Hence, there is a trade-off between ease of administration and internal consistency.
The test-retest reliability in the present study was substantial for all CSS and SF-36 scales, as previously reported for the CSS [5,18] and higher than reported for some other disease-specific instruments [6,29]. However, these comparisons should be interpreted carefully, because of differences in samples and assessment methods.
Assessment of cross-sectional validity in the present study showed that the CSS total scale was moderately associated with the Bodily pain scale of the SF-36, at the level previously reported [5]. The association of the CSS total with the Role-emotional scale of the SF-36 was higher than previously reported [5]. Further, there was a marked difference between CSS subscale and total scores among patients with chronic sinusitis and the healthy comparison group. These findings and the associations with symptom scores were in line with expectations and give support to the construct validity of the CSS [31]. In contrast, when relating the CSS and SF-36 scale scores to the CT findings, there was little association. This finding supports previous findings that there is little association between symptoms and CT-based severity staging in chronic sinusitis [32][33][34]. The results from the comparison of CSS scores with healthy controls in the present study are in line with previous similar comparisons with general population values [2,35].
Some subscales of both the CSS and the SF-36 were sensitive to change in this pooled sample of patients receiving three different interventions, in accordance with previous reports of surgical interventions [2,6,10]. The ES for the responsive scales ranged from medium to large, using the nomenclature of Cohen, where an ES of 0.2 is considered small, 0.5 medium, and 0.8 large [36]. In the present study, the sinusitis symptoms subscale of the CSS was more responsive than the medication use subscale, as previously reported [6]. The medication use subscale of the CSS is not feasible for use with interventions using pharmaceuticals, as in one of the arms of the present study; however, we think this subscale is more justified in surgical interventions.
Compared with other disease-specific questionnaires for use in chronic sinusitis, the CSS is short, easy to use, has documented validity, and scores are available for healthy populations. In the present study, we did not compare the questionnaire with other disease-specific instruments. However, we think the CSS is a feasible disease-specific instrument for use in many interventions, and more documentation exists for this instrument than for other sinus-specific HRQoL instruments.
Some weaknesses of our study should be mentioned. The sample size was small, which reduced the power of the study. Periodicity or seasonality of symptoms or environmental factors might have influenced our measured outcomes. Lacking a gold standard, we compared scores in chronic sinusitis with those in a healthy convenience sample and assessed cross-sectional associations with CT findings and a symptom score that we had developed for this study. However, this symptom score has not been subject to the same rigorous testing as the HRQoL measures. We based its use in the present study on an assumption of face validity of the items. We also did not use a standardized and validated CT staging system, but the lack of association with our system was in accordance with previous reports [32][33][34].
Responsiveness was assessed using change in overall symptoms as the benchmark, which we thought was the best estimate of global outcome that we could find.
Because we only included patients with CT-verified sinusitis, we included a small proportion of all patients presenting symptoms of chronic sinusitis. Hence, one should be careful extrapolating the results of the study beyond patients with CT-verified sinus soft tissue swelling.

Conclusion
We have documented the translation of the CSS into Norwegian and shown that this version of the CSS had substantial test-retest reliability, but there were problems with the internal consistency reliability. This suggests that the two three-item scales were not homogeneous. The cross-sectional associations give support to the validity of the scales in chronic sinusitis. Finally, scales of the CSS and the SF-36 were responsive in this pooled population receiving three different interventions.