The risk of bias in randomized controlled trials in otorhinolaryngology: hardly any improvement since 1950
BMC Ear, Nose and Throat Disorders volume 17, Article number: 3 (2017)
Randomized Controlled Trials (RCTs) represent the most valuable study design to evaluate the effectiveness of therapeutic interventions. However, flaws in design, conduct, analysis, and reporting of RCTs can cause the effect of an intervention to be under- or overestimated. These biased RCTs may be included in literature reviews. To make the assessment of Risk of Bias (RoB) consistent and transparent, Cochrane published a RoB tool, with which RoB is assessed per item as “low”, “unclear” or “high”. Our objective was to provide an overview of RoB assessments of RCTs in otorhinolaryngology over time, and to identify items where improvement is still warranted.
We retrieved Cochrane reviews in the otorhinolaryngologic research field published in 2012 and 2013. We used all judgments per item as assessed by the review authors of the included RCTs. We evaluated the association between “low RoB” vs. “unclear and high RoB” and the year of publication (time strata: ‘<1990’, ‘1990–1995’, ‘1996–2000’, ‘2001–2005’, ‘2006–2012’) per item using binary logistic regression.
We extracted the RoB assessments from 42 Cochrane reviews that had included 402 RCTs (median number of RCTs per review: 7, range 1–40). In total 2,356 items were assessed (mean number of assessed items per RCT: 5.9, standard deviation 1.8). On binary logistic regression, RCTs published in 2006–2012, compared with those published before 1990, were more likely to have a low RoB for two items: random sequence generation (odds ratio 6.09 [95% confidence interval: 3.11–11.95]) and allocation concealment (3.59 [1.87–6.90]). On all other items, there was no significant increase in the proportion of low RoB when comparing RCTs published in 2006–2012 with RCTs published before 1990.
Although there were some positive developments in the RoB assessments in otorhinolaryngology, a further decrease in RoB is still warranted on several items. Currently, biased RCTs are included in Cochrane reviews and effects of therapeutic interventions can be under- or overestimated, with implications for clinical patient care.
Randomized Controlled Trials (RCTs) represent the most valuable study design for individual studies to evaluate the effectiveness of therapeutic interventions. Adequate randomization ensures that known and unknown confounding factors are distributed evenly across groups. However, flaws in the design, conduct, analysis, and reporting of RCTs can cause the effect of an intervention to be under- or overestimated [1, 2]; this is referred to as “bias”. When biased RCTs are included in literature reviews, the findings of these reviews may also be biased [3, 4]. Since the conclusions of reviews are used directly in clinical practice, patients may be at risk because of poorly conducted RCTs.
To make the process of assessing Risk of Bias (RoB) more consistent and transparent, Cochrane developed and validated the Cochrane RoB tool. The first version of the Cochrane RoB tool was presented in 2008, and in 2011 a revision was published (the most recent version can be accessed online). In the revised version, separate assessments were recommended for some items (e.g. assessing blinding of participants and personnel and blinding of outcome assessment separately, rather than blinding as a single item). Cochrane recommends that authors of Cochrane reviews carefully consider the potential limitations of the included studies to obtain reliable conclusions, and that they discuss the impact of including trials with a high RoB on the results of the Cochrane review. Using the Cochrane RoB tool, authors of Cochrane reviews classify the RoB in each included RCT on specific items in three categories (“low”, “unclear” or “high” RoB). In the Cochrane RoB tool, the following items are included: 1) selection bias (items: random sequence generation, allocation concealment), 2) performance bias (blinding of participants and personnel), 3) detection bias (blinding of outcome assessment), 4) attrition bias (incomplete outcome data), 5) reporting bias (selective reporting), and 6) other bias (other sources of bias).
In this paper, we provide an overview of the RoB assessments in the literature of the otorhinolaryngologic research field. We aimed to assess how the RoB has developed over time per item. We hypothesized that the RoB has decreased over time for all items. Subsequently, we identified items where improvement is still warranted.
Selection of Cochrane reviews
We retrieved all Cochrane reviews on otorhinolaryngologic topics published in 2012 and 2013. The search syntax to retrieve these reviews in PubMed was described in detail elsewhere, and is also available as Additional file 1. In short, search syntaxes for otorhinolaryngologic articles, reviews and the journal (Cochrane Database of Systematic Reviews) were used and restricted by publication type (no editorials, letters to the editor, news or comments) and publication date (2012 and 2013).
Two authors (JPMP and IS) assessed whether the retrieved Cochrane reviews were truly conducted in otorhinolaryngology.
We collected the year of publication, the total number of included RCTs, and the RoB assessments (“low”, “unclear” or “high” RoB) per item as judged by the original review authors from the included reviews.
All Cochrane review authors used the Cochrane RoB tool. However, some review authors did not use all items or used sub-categories for certain items. For example, some reviews assessed blinding as a single item, whereas newer reviews assessed blinding of participants and personnel and blinding of outcome assessment separately. When multiple outcomes were assessed, we adopted the RoB assessment of the primary reported outcome.
Descriptive statistics of the included Cochrane reviews and of the RoB assessments of the individual RCTs were computed. The frequency of low, unclear and high RoB was calculated per item. Subsequently, we calculated the proportion of assessments that were scored as low, unclear or high RoB per item per time stratum. To this end, we divided all RCTs into five time strata based on the year of publication: ‘<1990’, ‘1990–1995’, ‘1996–2000’, ‘2001–2005’, and ‘2006–2012’.
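The stratification and the per-stratum proportions described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example assessments are hypothetical (they are not the study data), and the analysis itself was run in SPSS.

```python
from collections import Counter

# Time strata as listed in the text: (label, first year, last year).
# Labels use ASCII hyphens here for convenience.
STRATA = [
    ("<1990", None, 1989),
    ("1990-1995", 1990, 1995),
    ("1996-2000", 1996, 2000),
    ("2001-2005", 2001, 2005),
    ("2006-2012", 2006, 2012),
]

def time_stratum(year):
    """Return the stratum label for a publication year."""
    for label, first, last in STRATA:
        if (first is None or year >= first) and year <= last:
            return label
    raise ValueError(f"year {year} falls outside the studied period")

def proportions_per_stratum(assessments):
    """assessments: iterable of (year, judgment), judgment in low/unclear/high.
    Returns {stratum: {judgment: proportion}} for a single RoB item."""
    counts = {}
    for year, judgment in assessments:
        counts.setdefault(time_stratum(year), Counter())[judgment] += 1
    return {
        stratum: {j: c[j] / sum(c.values()) for j in ("low", "unclear", "high")}
        for stratum, c in counts.items()
    }

# Hypothetical assessments for one item (NOT the study data).
example = [
    (1985, "unclear"), (1988, "low"),
    (2007, "unclear"), (2008, "low"), (2010, "low"), (2011, "high"),
]
result = proportions_per_stratum(example)
```

In this sketch each item would be passed through `proportions_per_stratum` separately, which mirrors the "per item per time stratum" layout of the figures.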
To explore the development of RoB over time, we performed a binary logistic regression analysis. The RoB assessment “low” was the reference category, and was compared to “unclear and high” RoB per item; this resulted in an odds ratio (OR, with 95% confidence intervals (CI)) per time stratum with ‘<1990’ as reference time stratum.
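To make the odds ratio concrete, the sketch below computes it from a 2×2 table of counts, with a Wald 95% confidence interval on the log-odds scale. It assumes a model whose only predictor is the categorical time stratum; in that case the fitted odds ratio for a stratum against the reference stratum equals the cross-product ratio of the corresponding 2×2 table. The function name and the counts are hypothetical and are not taken from the study data.

```python
import math

def odds_ratio_wald(low_later, other_later, low_ref, other_ref, z=1.96):
    """Odds ratio for a low RoB judgment (later stratum vs. reference stratum)
    with a Wald confidence interval computed on the log-odds scale.
    'other' means unclear or high RoB; all four counts must be positive."""
    odds_ratio = (low_later / other_later) / (low_ref / other_ref)
    se_log_or = math.sqrt(
        1 / low_later + 1 / other_later + 1 / low_ref + 1 / other_ref
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts (NOT the study data): 50 low vs. 25 unclear/high in a
# later stratum, 25 low vs. 75 unclear/high in the reference stratum '<1990'.
odds_ratio, lower, upper = odds_ratio_wald(50, 25, 25, 75)
```

An interval that excludes 1 corresponds to a statistically significant difference from the reference stratum at the 5% level.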
Statistical package SPSS v22 was used. A p-value of < .05 was considered statistically significant.
Selection of Cochrane reviews
The search retrieved 91 Cochrane reviews, of which 42 reviews were not conducted in the otorhinolaryngologic research field. The remaining 49 Cochrane reviews were included in our study (Fig. 1). Of these 49 reviews, six did not include any RCTs (so-called “empty reviews”) and thus did not assess RoB. Furthermore, one review only reported individual items that had high RoB in the included RCTs, but did not report on items that might have had a low or unclear RoB. Consequently, these seven reviews were excluded, leaving a total of 42 included reviews (22 were published in 2012, 20 were published in 2013).
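The selection flow above can be checked with simple arithmetic. The variable names are illustrative; the numbers are those reported in the text.

```python
# Review selection flow, using the counts reported in the text.
retrieved = 91        # Cochrane reviews returned by the search
not_in_field = 42     # not conducted in otorhinolaryngology
empty_reviews = 6     # included no RCTs, so no RoB assessment
high_rob_only = 1     # reported only the items judged high RoB

in_field = retrieved - not_in_field
included = in_field - empty_reviews - high_rob_only

# Included reviews by year of publication.
by_year = {2012: 22, 2013: 20}
```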
Selection of individual RCTs
All 42 reviews included a total of 402 individual RCTs (median number of RCTs included per review: 7, range 1–40). The median year of publication of the individual RCTs was 1998 (range 1950–2012). We included different, but comparable, numbers of RCTs per time stratum (<1990: n = 96, 1990–1995: n = 66, 1996–2000: n = 69, 2001–2005: n = 92, and 2006–2012: n = 79).
Of the 402 included RCTs, 10 were reported in two reviews; thus, 392 unique RCTs were assessed. Since the number of overlapping RCTs is so small (<2.5%), we based our analysis on the 402 RoB assessments. Moreover, the RoB assessments of the overlapping RCTs were often different between reviews (data not shown).
In total, 2,356 items were assessed (mean number of items per RCT 5.9, standard deviation 1.8).
As explained before, review authors used different versions of the Cochrane RoB tool, which resulted in a different total number of assessments per item. All items of the standard Cochrane RoB tool were assessed more than 100 times: random sequence generation (n = 384), allocation concealment (n = 399), blinding of participants and personnel (n = 137), blinding of outcome assessment (n = 130), blinding (n = 227), incomplete outcome data (n = 345), selective reporting (n = 337) and other bias (n = 273). All other items were assessed <20 times (Additional file 2). For readability of the remainder of this paper, in our following analyses we only assessed the items from the standard Cochrane RoB tool. Together, these items constituted 2,232 (94.7%) of all 2,356 RoB assessments and therefore form a representative sample of our data. Thirty-six studies (9.0%) had a low RoB on all assessed items, and 208 studies (51.7%) had at least one item with a high RoB.
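The totals and percentages in this paragraph follow directly from the per-item counts, as the short check below illustrates. The dictionary keys are just labels chosen for this sketch; the counts are those reported in the text.

```python
# Number of assessments per standard item, as reported in the text.
item_counts = {
    "random sequence generation": 384,
    "allocation concealment": 399,
    "blinding of participants and personnel": 137,
    "blinding of outcome assessment": 130,
    "blinding": 227,
    "incomplete outcome data": 345,
    "selective reporting": 337,
    "other bias": 273,
}
total_assessments = 2356   # all assessed items, standard and non-standard
n_rcts = 402               # RoB assessments of RCTs (392 unique RCTs)

standard_total = sum(item_counts.values())
standard_share = 100 * standard_total / total_assessments   # in percent
mean_items_per_rct = total_assessments / n_rcts
all_low_share = 100 * 36 / n_rcts        # studies with low RoB on all items
any_high_share = 100 * 208 / n_rcts      # studies with >= 1 high-RoB item
```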
Figures 2a-h show the proportions of low, unclear and high RoB per item per time stratum (for the data tables of these figures, see Additional file 3). For example, Fig. 2a depicts the development of RoB over time for random sequence generation. Before 1990, 28% of RCTs had a low RoB for random sequence generation, whereas the proportion of RCTs with a low RoB was 70% for RCTs published between 2006–2012. Between these two time strata, a gradual increase in the proportion of low RoB can be observed. The proportion of items that were assessed as unclear RoB gradually declined from 61% (<1990) to 24% (2006–2012), and the proportion of items that were assessed as high RoB slightly declined (<1990: 11%, 2006–2012: 5%). For other items, the development over time is less gradual (e.g. blinding of outcome assessment).
When we look at the data for random sequence generation using binary logistic regression (Table 1), we note that there was no statistically significant difference in the odds for low RoB between RCTs published in 1990–1995 compared to RCTs published before 1990. However, RCTs that were published in all three later time strata were significantly more likely to score a low RoB compared to RCTs published before 1990 (1996–2000: 2.20 (1.17–4.50), 2001–2005: 3.07 (1.69–5.57), 2006–2012: 6.09 (3.11–11.95)).
Also for allocation concealment, RCTs that were published in the two latest time strata were significantly more likely to score a low RoB than RCTs published before 1990 (2001–2005: 2.09 (1.13–3.88), 2006–2012: 3.59 (1.87–6.90)). For the items selective reporting and other bias, RCTs that were published in later time strata had significantly lower RoB than RCTs published before 1990 (selective reporting: 1996–2000: 2.31 (1.06–5.01), 2001–2005: 3.65 (1.90–7.00); other bias: 2001–2005: 2.23 (1.12–4.42)).
On the other hand, for the items blinding outcome assessment and incomplete outcome data, RCTs that were published in later time strata were significantly more likely to have an unclear or high RoB than RCTs published before 1990 (blinding outcome assessment: 2001–2005: 0.17 (0.05–0.62); incomplete outcome data: 1990–1995: 0.49 (0.24–0.99)).
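As a rough plausibility check, the odds ratio for random sequence generation can be approximated from the rounded proportions of low RoB given for Fig. 2a (28% before 1990, 70% in 2006–2012). The result is close to, but not identical with, the reported 6.09, presumably because the percentages are rounded. The same sketch shows how an odds ratio below 1 for low RoB can be read as its inverse for unclear or high RoB. The helper name is illustrative.

```python
def odds(p):
    """Convert a proportion into odds."""
    return p / (1 - p)

# Rounded proportions of low RoB for random sequence generation (Fig. 2a).
p_low_before_1990 = 0.28
p_low_2006_2012 = 0.70
or_approx = odds(p_low_2006_2012) / odds(p_low_before_1990)   # about 6.0

# An OR of 0.17 for low RoB (blinding of outcome assessment, 2001-2005)
# is equivalent to an OR of 1 / 0.17 for unclear or high RoB.
or_low_rob = 0.17
or_unclear_or_high = 1 / or_low_rob                            # about 5.9
```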
We provided an overview of the development of RoB in a sample of otorhinolaryngologic RCTs published from 1950–2012. When looking per item, random sequence generation and allocation concealment were significantly more likely to score a low RoB when comparing RCTs published between 2006–2012 to RCTs published before 1990. These two items are two of the key factors that make RCTs the most valuable study design to evaluate the effectiveness of therapeutic interventions, so we consider this a positive development.
Comparison with literature
An analysis like ours was performed by Reveiz et al., who analyzed all RoB assessments in issue 12 (2012) of the Cochrane Database of Systematic Reviews; they thus investigated multiple medical specialties. They identified a lower RoB on items random sequence generation, allocation concealment, incomplete outcome data and selective reporting for articles published between 2006–2012 compared to articles published before 1990. These data are concordant with our findings with respect to items random sequence generation and allocation concealment.
They found that the rate of RCTs judged as having a low or high RoB significantly increased over time, whereas the rate of RCTs judged as having an unclear RoB decreased for several domains. In our study, we also observed that the rate of RCTs judged as having an unclear RoB decreased over time for items random sequence generation and allocation concealment. This would reflect better reporting of items, since the review authors could adequately identify the RoB from the articles. On the other hand, in our sample for items blinding of participants and personnel and blinding, the rate of RCTs judged as having an unclear RoB increased over time. In the RCTs published between 2006–2012, the proportion of RCTs with an unclear RoB was 48% and 30%, respectively (Additional file 3). We think these proportions are very large, and could easily decrease if authors reported their RCTs better. To help authors report their RCTs better, the Consolidated Standards of Reporting Trials (CONSORT) Statement (www.consort-statement.org) was developed (and later revised in 2001 and 2010 [13, 14]). The CONSORT Checklist lists all important items that should be reported in an RCT. Adherence to the CONSORT reporting guideline has been associated with improved reporting [15, 16].
Yordanov et al. also published an analysis of RoB similar to ours, based on 1,286 trials from multiple medical specialties. They also found that blinding was not done properly in many included trials. Additionally, they calculated the avoidable waste of research, as identified previously by Chalmers and Glasziou: easy methodological adjustments at no or little cost were possible to lower the RoB in 50% of trials.
Both articles included RCTs from multiple medical specialties. A medical specialty-specific analysis like ours has not been performed previously; hence, we cannot directly compare our findings to specific other medical specialties.
Our study is characterized by several strengths. We based our conclusions on a large sample (n = 2,356 RoB assessments in 402 RCTs) of otorhinolaryngologic literature. Furthermore, we used a transparent strategy to yield our final selection of studies. Finally, we performed a unique analysis in our research field and we hope to inspire research groups from other medical disciplines to conduct a similar analysis.
However, we must also take some uncertainties of our study into account. First, our sample of Cochrane reviews and RCTs may be biased: not all therapeutic interventions in otorhinolaryngology have been researched in a randomized study design (e.g. surgical interventions for which randomization is considered unethical), nor have they been reviewed in a Cochrane review. Therefore, some subspecialties may be underrepresented in our sample, limiting the generalizability of our findings to the total field of otorhinolaryngology. Second, we did not look into the specific types of RCTs in our analysis. One could argue that RCTs with a placebo-controlled design may have a lower RoB, and studies with a pragmatic study design may have a higher RoB (because the determinant is not standardized). However, there were only three pragmatic trials in our sample; a subanalysis for these articles would not have been feasible, nor would it have had a significant impact on our findings. Third, the RoB assessments were done by individual Cochrane review authors and may have been done inconsistently across the reviews. However, all Cochrane Centres provide training and support for all Cochrane authors on “Writing a Cochrane review of intervention studies” to increase their skills and knowledge (including how to assess RoB). Therefore, we assume that all Cochrane authors have assessed the RoB of the included RCTs based on the Cochrane Handbook (minimum two independent reviewers, consensus must be reached, etc.). Although even these assessments remain subjective, this is the best possible standardization of RoB assessments. Finally, there is limited possibility to improve the high RoB identified in items blinding of participants and personnel, blinding of outcome assessment, and blinding (Additional file 3). We acknowledge that blinding may be difficult to perform in otorhinolaryngology, as investigated treatments are often surgical interventions.
However, authors should then report that they were unable to blind patients or outcome assessors due to the nature of the investigated intervention, so that the RoB is clear for readers and review authors.
In our analysis, we observed that two key items of RCTs (random sequence generation and allocation concealment) improved over time. However, the other six items did not show significant improvement over time. Of these six items, three items were associated with blinding; RoB on these items can never be completely eliminated in surgical trials. In contrast, researchers should focus on proper design, conduct, and reporting of RCTs for the other three items (incomplete outcome data, selective reporting and other bias).
Furthermore, for all these six items, the proportion of unclear RoB can be decreased by transparent reporting of RCTs. Researchers may find the CONSORT Statement helpful to check if all important items have been addressed in their manuscript [13, 14]. Finally, we encourage journal editors to strictly adhere to reporting guidelines, and embed the reporting guidelines in their submission process. Ultimately, all these actions will lead to increased value of research findings and to higher quality of patient care.
We provided an overview of the development of RoB in a selected sample of otorhinolaryngological RCTs published from 1950–2012. When looking at specific items, random sequence generation and allocation concealment were significantly more likely to score a low RoB when comparing RCTs published between 2006–2012 to RCTs published before 1990. On all other items, there was no significant increase in the proportion of low RoB when comparing RCTs published in 2006–2012 with RCTs published before 1990.
- CONSORT Statement: Consolidated Standards of Reporting Trials Statement
- RCT: Randomized Controlled Trial
- RoB: Risk of Bias
1. Higgins JP, Altman DG, Gotzsche PC, Jüni P, Moher D, Oxman AD, et al. The Cochrane Collaboration’s tool for assessing risk of bias in randomised trials. BMJ. 2011;343:d5928.
2. Ioannidis JP, Greenland S, Hlatky MA, Khoury MJ, Macleod MR, Moher D, et al. Increasing value and reducing waste in research design, conduct, and analysis. Lancet. 2014;383(9912):166–75.
3. Chan AW, Altman DG. Epidemiology and reporting of randomised trials published in PubMed journals. Lancet. 2005;365(9465):1159–62.
4. Jüni P, Altman DG, Egger M. Systematic reviews in health care: assessing the quality of controlled clinical trials. BMJ. 2001;323:42–6.
5. Higgins JPT, Green S (editors). Cochrane Handbook for Systematic Reviews of Interventions Version 5.0.0 [updated February 2008]. The Cochrane Collaboration, 2008.
6. The Cochrane Collaboration, Cochrane Handbook, Ch 8.5., The Cochrane Collaboration’s tool for assessing risk of bias. Available via http://handbook.cochrane.org/chapter_8/8_5_the_cochrane_collaborations_tool_for_assessing_risk_of_bias.htm. Accessed 3 Sept 2014.
7. Peters JPM, Hooft L, Grolman W, Stegeman I. Reporting Quality of Systematic Reviews and Meta-Analyses of Otorhinolaryngologic Articles Based on the PRISMA Statement. PLoS ONE. 2015;10(8):e0136540.
8. The editorial team, Cochrane Ear Nose and Throat Disorders Group. About the Cochrane Collaboration (Cochrane Reviews Group (CRGs)), 2012 issue 7, art. no.: ENT. CENTRAL search strategy. Available via http://onlinelibrary.wiley.com/o/cochrane/clabout/articles/ENT/sect0-meta.html. Accessed 3 Sept 2014.
9. National Institutes of Health, US National Library of Medicine. Search Strategy Used to Create the Systematic Reviews Subset on PubMed. Available via http://www.nlm.nih.gov/bsd/pubmed_subsets/sysreviews_strategy.html. Accessed 3 Sept 2014.
10. Reveiz L, Chapman E, Asial S, Munoz S, Bonfill X, Alonso-Coello P. Risk of bias of randomized trials over time. J Clin Epidemiol. 2015;68(9):1036–45.
11. Begg C, Cho M, Eastwood S, Horton R, Moher D, Olkin I, et al. Improving the quality of reporting of randomized controlled trials. The CONSORT Statement. JAMA. 1996;276:637–9.
12. Moher D, Schulz KF, Altman DG. The CONSORT Statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. Lancet. 2001;357(9263):1191–4.
13. Schulz KF, Altman DG, Moher D, for the CONSORT group. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:c332.
14. Moher D, Hopewell S, Schulz KF, Montori V, Gøtzsche PC, Devereaux PJ, for the CONSORT group, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel groups randomised trials. BMJ. 2010;340:c869.
15. Plint AC, Moher D, Morrison A, Schulz K, Altman DG, Hill C, et al. Does the CONSORT checklist improve the quality of reports of randomised controlled trials? A systematic review. Med J Aust. 2006;185(5):263–7.
16. Turner L, Shamseer L, Altman DG, Weeks L, Peters J, Kober T, et al. Consolidated standards of reporting trials (CONSORT) and the completeness of reporting randomized controlled trials (RCTs) published in medical journals. Cochrane Database Syst Rev. 2012;11:MR000030.
17. Yordanov Y, Dechartres A, Porcher R, Boutron I, Altman DG, Ravaud P. Avoidable waste of research related to inadequate methods in clinical trials. BMJ. 2015;350:h809.
18. Chalmers I, Glasziou P. Avoidable waste in the production and reporting of research evidence. Lancet. 2009;374(9683):86–9.
19. Boutron I, Tubach F, Giraudeau B, Ravaud P. Blinding was judged more difficult to achieve and maintain in nonpharmacologic than pharmacologic trials. J Clin Epidemiol. 2004;57(6):543–50.
The authors received no specific funding for this work.
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
JPMP conception and design, acquisition of data, analysis and interpretation of data, drafting the article and revising it critically for important intellectual content, final approval of the version to be submitted. IS conception and design, revising the article critically for important intellectual content, final approval of the version to be submitted. WG revising the article critically for important intellectual content, final approval of the version to be submitted. LH conception and design, revising the article critically for important intellectual content, final approval of the version to be submitted.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Number of assessments per item. Items 1–8 are sorted in order of appearance in Cochrane’s RoB tool. Items 9–18 are sorted in order of total number of assessments. Items 9–18 are left out of the analysis (see manuscript). (DOCX 17 kb)
Cite this article
Peters, J.P.M., Stegeman, I., Grolman, W. et al. The risk of bias in randomized controlled trials in otorhinolaryngology: hardly any improvement since 1950. BMC Ear Nose Throat Disord 17, 3 (2017). https://doi.org/10.1186/s12901-017-0036-x