

Comparative effectiveness research in critically ill patients: risks associated with mischaracterising usual care

Willard N Applefeld*, Jeffrey Wang*, Harvey G Klein, Robert L Danner, Peter Q Eichacker, Charles Natanson (*equal first authors)

Crit Care Resusc 2020; 22 (2): 110-118

  • Competing Interests
    None declared
  • Abstract
    Comparative effectiveness research can help guide the use of common, routine medical practices. However, to be safe and informative, such trials must include at least one treatment arm that accurately portrays current practices. While comparative effectiveness research is widely perceived as safe and to involve no or only minimal risks, these assumptions may not hold true if unrecognised deviations from usual care exist in one or more study arms. For critically ill subjects in particular, such practice deviations may increase the risk of death or injury and undermine safety monitoring. Furthermore, unrecognised unusual care seems likely to corrupt informed consent documents, with underappreciated risks shrouded under the reassuring "comparative effectiveness" research label. At present, oversight measures are inadequate to ensure that research subjects enrolled in comparative effectiveness trials are actually receiving usual and not unusual care. Oversight by governmental and non-governmental entities with appropriate expertise, empowered to ensure that current clinical practice has been properly represented, could help prevent occurrences in clinical trials of unusual care masquerading as usual care.
  • References
    1. Macklin R, Natanson C. Misrepresenting “usual care” in research: an ethical and scientific error. Am J Bioeth 2020; 20: 31-9.
    2. SUPPORT Study Group of the Eunice Kennedy Shriver NICHD Neonatal Research Network; Carlo WA, Finer NN, Walsh MC, et al. Target ranges of oxygen saturation in extremely preterm infants. N Engl J Med 2010; 362: 1959-69.
    3. NICHD Neonatal Research Network. The Surfactant Positive Airway Pressure Pulse Oximetry Trial in Extremely Low Birth Weight Infants: the SUPPORT trial; 2005. (viewed Feb 2020).
    4. Lockwood C, Riley L, Blackmon L, Lemons JA; editors. Guidelines for perinatal care, 6th ed. American Academy of Pediatrics, American College of Obstetricians and Gynecologists, 2007.
    5. Tin W, Milligan DW, Pennefather P, Hey E. Pulse oximetry, severe retinopathy, and outcome at one year in babies of less than 28 weeks gestation. Arch Dis Child Fetal Neonatal Ed 2001; 84: F106-10.
    6. Cortés-Puch I, Wesley RA, Carome MA, et al. Usual care and informed consent in clinical trials of oxygen management in extremely premature infants. PLoS One 2016; 11: e0155005.
    7. Hagadorn JI, Furey AM, Nghiem TH, et al. Achieved versus intended pulse oximeter saturation in infants less than 28 weeks’ gestation: the AVIOx study. Pediatrics 2006; 118: 1574-82.
    8. US Department of Health and Human Services. 45 CFR 46.111 Criteria for IRB approval of research. (viewed Feb 2020).
    9. Schmidt B, Whyte RK, Asztalos EV, et al. Effects of targeting higher vs lower arterial oxygen saturations on death or disability in extremely preterm infants: a randomized clinical trial. JAMA 2013; 309: 2111-20.
    10. BOOST II United Kingdom Collaborative Group; BOOST II Australia Collaborative Group; BOOST II New Zealand Collaborative Group; Stenson BJ, Tarnow-Mordi WO, Darlow BA, et al. Oxygen saturation and outcomes in preterm infants. N Engl J Med 2013; 368: 2094-104.
    11. Slutsky AS. Mechanical ventilation: American College of Chest Physicians’ Consensus Conference. Chest 1993; 104: 1833-59.
    12. Acute Respiratory Distress Syndrome Network; Brower RG, Matthay MA, Morris A, et al. Ventilation with lower tidal volumes as compared with traditional tidal volumes for acute lung injury and the acute respiratory distress syndrome. N Engl J Med 2000; 342: 1301-08.
    13. Carome MA. Human research subject protections under Multiple Project Assurance (MPA) M-1106 [letter from the Office for Human Research Protections]. Rockville, MD: OHRP, 2002. (viewed Feb 2020).
    14. Carome MA. Human research subject protections under Multiple Project Assurance (MPA) M-1025 [letter from the Office for Human Research Protections]. Rockville, MD: OHRP, 2002. (viewed Feb 2020).
    15. Carome MA. Human research subject protections under Multiple Project Assurance (MPA) M-1183 [letter from the Office for Human Research Protections]. Rockville, MD: OHRP, 2002. (viewed Feb 2020).
    16. Eichacker PQ, Gerstenberger EP, Banks SM, et al. A meta-analysis of acute lung injury and acute respiratory distress syndrome trials testing low tidal volumes. Am J Respir Crit Care Med 2002; 166: 1510-14.
    17. Deans KJ, Minneci PC, Suffredini AF, et al. Randomization in clinical trials of titrated therapies: unintended consequences of using fixed treatment protocols. Crit Care Med 2007; 35: 1509-16.
    18. Deans KJ, Minneci PC, Banks SM, et al. Tidal volumes in acute respiratory distress syndrome — one size does not fit all. Crit Care Med 2006; 34: 264-7.
    19. Deans KJ, Minneci PC, Cui X. Mechanical ventilation in ARDS: one size does not fit all. Crit Care Med 2005; 33: 1141-3.
    20. Carmichael LC, Dorinsky PM, Higgins SB, et al. Diagnosis and therapy of acute respiratory distress syndrome in adults: an international survey. J Crit Care 1996; 11: 9-18.
    21. Hébert PC, Wells G, Blajchman MA, et al. A multicenter, randomized, controlled clinical trial of transfusion requirements in critical care. N Engl J Med 1999; 340: 409-17.
    22. Hébert PC, Wells G, Martin C, et al. A Canadian survey of transfusion practices in critically ill patients. Crit Care Med 1998; 26: 482-7.
    23. Consensus conference. Perioperative red blood cell transfusion. JAMA 1988; 260: 2700-03.
    24. Deans KJ, Minneci PC, Klein HG, Natanson C. The relevance of practice misalignments to trials in transfusion medicine. Vox Sang 2010; 99: 16-23.
    25. Cortés-Puch I, Wiley BM, Sun J, et al. Risks of restrictive red blood cell transfusion strategies in patients with cardiovascular disease (CVD): a meta-analysis. Transfus Med 2018; 28: 335-45.
    26. Jaswal DS, Leung JM, Sun J, et al. Tidal volume and plateau pressure use for acute lung injury from 2000 to present: a systematic review. Crit Care Med 2014; 42: 2278-89.
    27. Writing group for the PReVENT Investigators; Simonis FD, Serpa Neto A, Binnekade JM, et al. Effect of low vs intermediate tidal volume strategy on ventilator-free days in intensive care unit patients without ARDS: a randomized clinical trial. JAMA 2018; 320: 1872-80.
    28. Patient-Centered Outcomes Research Institute. About our research. Washington, DC: PCORI, 2019. (viewed Feb 2020).
One of us (CN) recently co-authored a critique of design errors in three purportedly usual care trials enrolling critically ill subjects that was published in a United States-based ethics journal. 1 These missteps incorrectly represented unusual care as usual, thereby jeopardising subject safety, confounding trial conclusions, and undermining the consent process. This prior examination was focused on ethical issues arising from these errors and was intended for an audience of medical ethicists. Here, our emphasis is on medical risks for patients emanating from these trial design errors. Previously published data are included to demonstrate how those risks may have affected study results and patient outcomes.

The three case studies presented here challenge the widespread belief that randomised trials of putative usual care, particularly those enrolling critically ill subjects, are invariably safe. Properly designed head-to-head comparisons of contemporary care can improve clinical decision making by better quantifying relative risks and benefits. However, for such research to be informative, at least one arm must be truly representative of current medical practice. Some trials purporting to compare usual care practices may not accurately reflect those practices and instead provide unusual care, either to all enrolled subjects or to important subgroups within a cohort. If subjects are critically ill with a high baseline risk of death, this unusual care may have grave consequences that include but are not limited to the following:
  • Inadequate informed consent: for research to be ethical and consent to be informed, subjects must understand how their care will change and the potential risks of study enrolment.
  • Suboptimal care: unusual therapies that are inferior to usual care can put trial subjects at increased risk, particularly when administered to critically ill subjects.
  • Inability to monitor safety: when a trial compares two unusual therapies, the benchmark for safety monitoring is lost. Without a usual care comparator, harm from either or both unusual care arms may go unrecognised.
  • Inaccurate and harmful conclusions: without a usual care comparator arm, if both therapies studied are inferior to usual care, there is a risk that the least harmful intervention will displace usual care in practice guidelines, putting future patients at risk.
In the three examples described below, unusual care misrepresented as usual care resulted in one or more of these unintended harmful consequences. Not accurately capturing current practice is a design error that has been difficult to recognise; the three protocols presented underwent extensive review processes, conducted by multiple groups at many institutions, before approval and implementation. Failure of the existing system to correct misconceptions of usual care can occur for many reasons (Box 1). However, as we describe at the conclusion of this review, awareness, regulatory guidance, and the development of mitigating procedures can help ensure that future trials intended to study usual care avoid these pitfalls.

Three case studies

Misapplication of a guideline as representative of usual care: the Surfactant, Positive Pressure, and Pulse Oximetry Randomized Trial (SUPPORT)

Premature, high risk infants have underdeveloped lungs and often require supplemental oxygen therapy at the time of delivery, and sometimes for weeks to months afterwards. However, both the upper and lower ends of the oxygen concentration range used in these infants can have different but potentially serious adverse effects. With too much oxygen, these infants can develop blindness or retinopathy of prematurity; with too little oxygen, they can have neurological injury or death. SUPPORT randomised 1316 premature, low birth weight, high risk infants to a high (91–95%) or low (85–89%) target range of oxygen saturation measured by pulse oximetry (Spo2) in an attempt to find an Spo2 target range that minimises these competing risks. 2 From the SUPPORT study protocol, it does not appear that Spo2 target ranges in common use at the time of the study were adequately investigated and used to validate the trial design. 3 Instead, investigators relied on the American Academy of Pediatrics’ (AAP) guidelines 4 and compared two narrow Spo2 target ranges lying at the extreme ends of the much broader range that was actually recommended. To justify the lower target range included in SUPPORT, data were cited from a decade-old, retrospective chart review of similar neonates done in northern England by Tin and colleagues. 5

The methodology employed for the design of study groups in SUPPORT failed for two main reasons. First, the two Spo2 target ranges at either end of the AAP guideline systematically underrepresented the middle of the recommended range, which better reflected most, if not all, of usual care at the time. Second, the study by Tin and colleagues 5 was an inadequate source to justify using the very low Spo2 range in SUPPORT. It presented only observational outcome data, collected a decade before SUPPORT began enrolling subjects. The study had a mortality rate close to double that seen at the time of SUPPORT, indicating that overall practices had changed substantially by the time SUPPORT was undertaken. Therefore, the Tin et al study was not relevant to care when SUPPORT was designed and conducted (Online Appendix, e-Supplementary Box 1).

To determine usual care at the time of SUPPORT, we performed a comprehensive review of all contemporaneous medical literature describing Spo2 targets that was available before the trial began. 6 One study, the AVIOx trial, specifically examined usual care at the time SUPPORT was designed. It included 84 infants born at less than 28 weeks’ gestation who required oxygen therapy, and collected extensive information on Spo2 levels over an extended period at 14 neonatal intensive care units (NICUs) in the US, the United Kingdom, Australia and New Zealand. 7 Importantly, subjects in the AVIOx study met enrolment criteria for SUPPORT. Our review of the AVIOx study found that maintaining infants in the low Spo2 range targeted in SUPPORT, encompassing the bottom half of the AAP guideline, was not consistent with practices at any of the 14 NICUs included in the study (ie, the low oxygen saturation arm was unusual care in those NICUs) (Figure 1). 2, 6 Only the high Spo2 target range of SUPPORT was consistent with usual care in AVIOx NICUs. This finding that the low Spo2 range targeted in SUPPORT was not usual care was confirmed in our systematic review of more than 100 other distinct NICUs reporting such data. 6
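The kind of consistency check described above, comparing a trial's target range with the ranges actually achieved in practice, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the cited review: the function names, the 50% overlap threshold and the per-unit ranges are all hypothetical, and the unit values are invented for the example rather than taken from AVIOx or SUPPORT. Only the two SUPPORT target ranges come from the text.

```python
from typing import List, Tuple

Range = Tuple[float, float]  # (low, high) Spo2 in percent


def overlap_fraction(observed: Range, target: Range) -> float:
    """Fraction of the observed range that lies inside the target range."""
    width = observed[1] - observed[0]
    if width <= 0:
        raise ValueError("observed range must have positive width")
    lo = max(observed[0], target[0])
    hi = min(observed[1], target[1])
    return max(0.0, hi - lo) / width


def units_consistent_with_target(
    observed_by_unit: List[Range], target: Range, min_overlap: float = 0.5
) -> int:
    """Count units whose observed range mostly falls inside the target."""
    return sum(
        1 for obs in observed_by_unit
        if overlap_fraction(obs, target) >= min_overlap
    )


# Hypothetical ranges of achieved Spo2 at three units
# (illustrative values only, NOT data from AVIOx or SUPPORT).
hypothetical_units = [(90.0, 96.0), (91.0, 95.0), (89.0, 95.0)]

low_arm = (85.0, 89.0)   # SUPPORT low target range, from the text
high_arm = (91.0, 95.0)  # SUPPORT high target range, from the text

low_count = units_consistent_with_target(hypothetical_units, low_arm)
high_count = units_consistent_with_target(hypothetical_units, high_arm)
print(low_count, high_count)
```

With these invented inputs, no unit is consistent with the low arm and all three are consistent with the high arm. A real assessment would need the observed distributions from each enrolling unit and a threshold justified in advance.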

Notably, an observational study of usual care could have been performed before or during the design of SUPPORT or the same information could have been obtained by conducting a thorough literature search to ascertain usual care practices. Furthermore, a survey of practices at the more than 21 NICUs enrolling subjects in SUPPORT would have most likely determined that the low Spo2 range to be studied was unusual care. Instead, expert opinion, a very general guideline recommendation, and a single, decade-old study seem to have been the major sources of information for the design of the SUPPORT trial. Because of this, the experimental nature of the low Spo2 range arm and the potential risks it posed to enrolled subjects were not recognised at the time of the trial. The informed consent documents from the 21 enrolling centres described both Spo2 ranges targeted in SUPPORT as “standard of care”, “standard care”, “normal”, “routinely used”, “currently used”, “desired approach”, “best approach” or “already being used”, without any discussion of the risks of using an overall lower target Spo2 range than currently used in US NICUs. 6 Failure to accurately define usual care in the types of infants studied in SUPPORT exposed trial subjects to abnormally low Spo2 ranges and increased risks. 2, 6 The result was inaccurate and misleading informed consent documents 2, 6, 8 and a treatment regimen that inflicted unnecessary harm.

Unfortunately, two trials that started after SUPPORT used the same methodology and examined similar extreme ranges of oxygen treatment in high risk, premature infants: one predominantly in Canada (Canadian Oxygen Trial [COT]), 9 and the other in the UK, Australia and New Zealand (Benefits of Oxygen Saturation Targeting [BOOST II]). 10

Employing outdated rather than current practices as usual care: Acute Respiratory Distress Syndrome Network Lower Tidal Volume (ARMA) trial

The early stages of acute respiratory distress syndrome (ARDS) are characterised by diffuse alveolar damage for which patients frequently require mechanical ventilatory support. Starting in the late 1980s, concerns grew that excessive tidal volumes and airway pressures associated with mechanical ventilation would aggravate alveolar injury in ARDS. 11 In the latter half of the 1990s, the ARMA trial randomised 861 critically ill mechanically ventilated patients with ARDS, independent of any acute need for a change in their ventilator-delivered tidal volume, to receive a fixed large 12 mL/kg predicted body weight (PBW) tidal volume or a fixed small 6 mL/kg PBW tidal volume to compare the impact of these settings on lung damage and mortality. 12 Investigators described the large tidal volume at the time of the study as “traditional” and used it as the usual care control for assessing safety and benefit. “Traditional control” has no accepted definition and may have been confusing to research subjects and their surrogates. On its face, “traditional” suggests care more consistent with treatment in the past. While the ARMA protocol and subsequent publications referred to the large tidal volume arm, somewhat accurately, as “traditional”, consent documents also described this “traditional volume” as being currently or commonly used by clinicians during usual care at the time of the trial. For example, the following statements appear in the ARMA consent forms:

“One [purpose of the study] is to compare two ways of inflating your lungs while on the machine. Doctors currently use both methods to breathe for patients, but it is not known if one method is better [than the other].” 13

“Presently doctors use different size breaths of oxygen-rich air to inflate the patient’s lungs. It is unknown whether it is better to use large (12 mL/kg) or small (6 mL/kg) [tidal volumes]. Both ways of inflating the lungs are acceptable methods that are commonly used to treat patients with [acute lung injury] and [ARDS].” 14

“Presently doctors use varying volumes of oxygen-enriched air to inflate the lungs. It is unknown whether it is better to use a large or small volume of oxygen-enriched air to inflate [the] lungs of patients with lung injury. Both ways of inflating a patient’s lungs are considered acceptable methods and are commonly used [in] medical practice.” 15

An analysis of tidal volumes administered to patients before their enrolment and randomisation in the ARMA trial showed that usual care was not commonly or predominantly 6 mL/kg PBW or 12 mL/kg PBW but actually varied over a wide, normally distributed range related to the level of lung injury and underlying lung compliance (Figure 2 and Figure 3). 16, 17, 18, 19 During usual care pre-enrolment, patients with less compliant lungs received lower tidal volumes and those with more compliant lungs received higher tidal volumes. This was consistent with a survey published shortly before the ARMA trial, which reported that clinicians were more concerned with excessive airway pressures during mechanical ventilation than with specific tidal volume levels. 20 To avoid high pressures in less compliant lungs, one common method used by clinicians was to reduce tidal volumes. The “traditional” 12 mL/kg PBW tidal volume chosen for the trial was being used routinely in only about 10–15% of patients with ARDS and the 6 mL/kg PBW tidal volume in less than 5% of patients with ARDS at the enrolling institutions (Online Appendix, e-Supplementary Box 2).
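To make the pre-enrolment comparison concrete, the following Python sketch shows one way to ask how often usual care already matched a fixed trial setting. It is a minimal illustration under stated assumptions: the function names, the 0.5 mL/kg tolerance and the list of tidal volumes are hypothetical, and the values are invented for the example rather than drawn from ARMA data.

```python
from typing import Sequence


def share_near(values: Sequence[float], setpoint: float, tol: float = 0.5) -> float:
    """Share of patients whose pre-enrolment value is within tol of setpoint."""
    if not values:
        raise ValueError("values must not be empty")
    hits = sum(1 for v in values if abs(v - setpoint) <= tol)
    return hits / len(values)


def share_below(values: Sequence[float], setpoint: float, tol: float = 0.5) -> float:
    """Share of patients whose value would have to be raised to reach setpoint."""
    if not values:
        raise ValueError("values must not be empty")
    hits = sum(1 for v in values if v < setpoint - tol)
    return hits / len(values)


# Hypothetical pre-enrolment tidal volumes in mL/kg PBW
# (illustrative values only, NOT data from the ARMA trial).
pre_enrolment = [6.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 12.0, 12.0]

near_12 = share_near(pre_enrolment, 12.0)
near_6 = share_near(pre_enrolment, 6.0)
raised_to_12 = share_below(pre_enrolment, 12.0)
print(near_12, near_6, raised_to_12)
```

If only a small share of patients sits near either fixed setting, as in this invented sample, neither arm can reasonably be described as commonly used care, and most patients assigned to the larger setting would have their tidal volume changed by the protocol.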

For about 80% of ARMA subjects assigned to the “traditional” 12 mL/kg PBW arm, ventilator tidal volumes had to be increased after randomisation to reach this level. Accordingly, airway pressures also increased in a substantial number of these participants beyond levels believed safe. 16 Contemporaneous patients with ARDS who met the eligibility criteria but were not enrolled for various reasons, and received usual care in the same intensive care units, had a significantly lower mortality rate than participants in the “traditional” arm (Figure 4). 19 As pre-randomisation ARMA data showed, usual care for these unenrolled patients with ARDS entailed adjusting tidal volumes to keep airway pressures on average at levels considered safe. 17, 18, 19 The absence of a truly representative usual care arm receiving titrated care deprived the data and safety monitoring board of an accurate benchmark against which to compare mortality and serious adverse events during the trial. Without such usual care data, the data and safety monitoring board lacked a clear basis to stop the trial early, when it would have otherwise become evident during interim analyses that the large “traditional” tidal volumes were harmful (Figure 4 and Online Appendix, e-Supplementary Box 2).

Ignoring usual care, resulting in harm to subgroups receiving care opposite to usual care treatment: Transfusion Requirements in Critical Care (TRICC) trial

The TRICC trial 21 randomised 838 critically ill but stable surgical patients who were not actively bleeding to either a “restrictive” or a “liberal” red blood cell (RBC) transfusion strategy. The restrictive strategy withheld RBC transfusions until haemoglobin concentrations decreased to low levels (transfusion trigger, 7.0 g/dL; maintenance, 7–9 g/dL). The liberal strategy triggered transfusion when patients still had relatively high haemoglobin levels (transfusion trigger, 10.0 g/dL; maintenance, 10–12 g/dL). The TRICC trial reported that overall hospital mortality was significantly higher for patients assigned to the high RBC transfusion trigger arm compared with the low one. Based on this finding, the study concluded that critically ill surgical patients should receive RBC transfusions only if haemoglobin levels fall to low concentrations similar to the trial’s restrictive arm.

Randomising subjects to two fixed haemoglobin thresholds meant that transfusion strategies were no longer based on individual patient needs, comorbidities or underlying physiology. Instead, patients were randomly assigned to transfusion regimens chosen primarily to conserve blood supplies. Before beginning the trial, investigators surveyed 193 physicians to ascertain existing practices 22 (Online Appendix, e-Supplementary Box 3). This survey found that physicians used a wide range of haemoglobin levels to trigger transfusion but these triggers were not random. Physicians transfused RBCs to achieve higher haemoglobin levels in older patients and in those with particular comorbidities. Only 3% of physicians reported that they would use a haemoglobin trigger as low as 7 g/dL for patients with known cardiovascular disease and only 12% reported they would use one as high as 10.0 g/dL for healthy young patients. This titration of care was consistent with consensus guidelines on transfusion practices at the time of the trial, which recommended individualising the administration of RBC transfusions. 23

The TRICC trial did not include a usual care group with transfusion triggers based on an individualised assessment of patient needs. The design of the trial heeded neither the results of the investigator survey nor the existing consensus guidelines on transfusion practices, instead making both arms unusual care for important patient subgroups by setting arbitrarily fixed triggers. 22, 23 This approach simplified the design but did not minimise risks.

We performed an analysis using data from the original TRICC trial publication combined with subsequently published trial data and demonstrated that subjects with pre-existing severe cardiovascular disease had a significantly different response to the two fixed transfusion thresholds compared with patients without that condition. 17, 24, 25 A more recent meta-analysis that included 15 subsequent studies randomly allocating patients to receive RBC transfusions triggered by fixed low or high haemoglobin levels and enrolling thousands of additional patients found the same result (Figure 5). 25 In both analyses, a restrictive strategy compared with a liberal one was associated with increased mortality in subgroups with severe cardiovascular disease, an outcome opposite to that in patients without this comorbidity. Subgroup analysis of the original TRICC trial by its investigators also indicated that younger, healthier patients with low severity of illness scores primarily drove the significant mortality increase in the liberal arm. 21

Randomisation to high and low haemoglobin thresholds in the TRICC trial ensured that one subgroup (younger, stable patients) in the high trigger arm would receive transfusions when not necessarily clinically indicated. 17, 21, 24 In contrast, another subgroup (older patients with cardiovascular disease) in the low trigger arm would receive fewer transfusions. For these different specific subgroups nested within the overall study population, one or the other of the two transfusion approaches represented unusual care. Excess mortality within the two subgroups — young, stable patients and older patients with cardiovascular disease — occurred in opposite arms of the study, rendering the overall comparison misleading. The young healthier patients in the liberal arm were exposed to unnecessary transfusions and volume challenges, while older patients with cardiovascular disease in the restrictive arm had an increased number of cardiac ischaemic events. A usual, individualised care arm might have had lower mortality than either arm. However, lack of an arm representing current medical practice, as identified in the original survey of clinicians, deprived the TRICC trial of a supported scientific basis for recommending a fixed transfusion threshold for most patients.
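The arithmetic behind this argument can be illustrated with a short Python sketch. It assumes two subgroups whose mortality moves in opposite directions between the fixed arms, and it models an individualised arm by the simplifying (and hypothetical) assumption that each subgroup receives whichever fixed strategy suits it better. All counts are invented for the illustration and are not TRICC data; the class and function names are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subgroup:
    name: str
    n_per_arm: int            # patients in this subgroup in each arm
    deaths_restrictive: int   # deaths under the fixed low trigger
    deaths_liberal: int       # deaths under the fixed high trigger


def pooled_mortality(groups: List[Subgroup], arm: str) -> float:
    """Overall mortality in one fixed arm across all subgroups."""
    total = sum(g.n_per_arm for g in groups)
    if arm == "restrictive":
        deaths = sum(g.deaths_restrictive for g in groups)
    elif arm == "liberal":
        deaths = sum(g.deaths_liberal for g in groups)
    else:
        raise ValueError("arm must be 'restrictive' or 'liberal'")
    return deaths / total


def individualised_mortality(groups: List[Subgroup]) -> float:
    """Mortality if each subgroup received its better-suited strategy.

    This is a simplifying assumption standing in for titrated usual care.
    """
    total = sum(g.n_per_arm for g in groups)
    deaths = sum(min(g.deaths_restrictive, g.deaths_liberal) for g in groups)
    return deaths / total


# Hypothetical counts (illustrative only, NOT data from the TRICC trial).
groups = [
    Subgroup("younger, stable", 600, deaths_restrictive=60, deaths_liberal=96),
    Subgroup("older, cardiovascular disease", 400,
             deaths_restrictive=120, deaths_liberal=88),
]

restrictive = pooled_mortality(groups, "restrictive")
liberal = pooled_mortality(groups, "liberal")
individualised = individualised_mortality(groups)
print(restrictive, liberal, individualised)
```

In this invented example the pooled arms differ only slightly (0.18 versus 0.184) although each subgroup fares markedly worse in one of them, and the individualised scenario (0.148) is lower than both. The sketch shows why a pooled comparison of two fixed thresholds can mislead; it does not estimate what an actual usual care arm would have achieved.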

Remedies to improve usual care research

The three case studies above illustrate the pitfalls of designing usual care trials without first carefully documenting and understanding contemporaneous clinical practices at participating institutions and elsewhere. As shown for the three examples reviewed here, misunderstanding or oversimplifying usual care can render results uninformative, lead to inaccurate conclusions, obscure research risks, and fail to adequately protect research participants. Unfortunately, the methods employed by SUPPORT, ARMA and TRICC have been used in a large number of subsequent trials that similarly lacked treatment arms closely reflecting usual care practices. 24, 25, 26, 27 With the growth of comparative effectiveness research and its role in seeking to improve outcomes and cost-effectiveness, 28 it has become essential to carefully and rigorously document contemporaneous practices and incorporate that knowledge into trial designs and the processes for approving human subjects research. As previously described, adherence to the six recommendations listed in Box 2 may help safeguard participants by improving both the scientific and ethical integrity of comparative effectiveness research.


Randomised clinical trials of purported usual care currently have inadequate regulatory and institutional safeguards to guarantee that protocol-determined care has not become in fact unusual. Recognising and understanding risks associated with an inaccurate or incomplete representation of usual care is a critical first step toward preventing such errors in future trials. This awareness is necessary for developing regulations, guidance, and protocol approval processes that increase the rigour with which investigators, sponsors and funding agencies define usual care. Ultimately, comparative effectiveness research can only improve both resource utilisation and patient outcomes if trials have at least one study arm that is truly representative of medical practice at the time of the trial. Addressing pitfalls stemming from the incomplete, oversimplified or imperfect implementation of usual care promises to make such trials safer, more informative and better able to improve patient outcomes. Adherence to scientific rigour and accuracy in defining and delivering actual usual care is an essential cornerstone for such research to minimise risks to human subjects and realise its full potential.
Acknowledgements: We thank Michael Carome, for reviewing versions of this article; Ruth Macklin, for her work on the unrevised version of this manuscript; and Juli Maltagliati and Kelly Byrne, for help editing and submitting the manuscript. This research was supported by the Intramural Research Program of the National Institutes of Health Clinical Center. The opinions expressed in this article are the authors’ own and do not represent any position or policy of the National Institutes of Health, the Department of Health and Human Services, or the United States Government. The corresponding author confirms that they had access to all the data in the study and had final responsibility for the decision to submit for publication.