Common Misconceptions About Factorial Experiments

The RCT and the factorial design are very different designs intended for different purposes. Both can be efficient when properly applied, but they are efficient for different research questions. Because the logical underpinnings of the two types of designs are so different, it is understandable that people whose design background is primarily in RCTs might have some misconceptions about factorial experiments. It would be a shame if these misconceptions kept scientists from recognizing the advantages of factorial experiments for certain kinds of research questions! To try to prevent this, here we review some potential misconceptions and offer some suggestions for additional reading.

Overall message: If you see a 32-condition factorial experiment, try not to think of it as a 32-arm RCT, and keep an open mind about power.

Read an informal introduction to factorial experiments aimed at those with a background in the RCT.

You may also be interested in the FAQ about Factorial Experiments section of this web site. A brief but citable treatment of some of this material can be found in Collins, Dziak, Kugler, & Trail (2014).


Misconception 1: A factorial experiment is essentially an RCT with a lot of experimental conditions, and therefore is extremely difficult to power.

Reality: The RCT and the factorial experiment have very different logical underpinnings. In an RCT the main idea is direct comparison of a small number of experimental conditions. In a factorial experiment, the objective is estimation of main effects and interactions. These estimates are obtained by combining experimental conditions in a principled way by means of factorial analysis of variance (ANOVA). In fact, individual experimental conditions of a factorial experiment are NEVER directly compared in a factorial ANOVA (which is very counterintuitive for those trained in RCTs).

This difference in the underlying logic extends to how RCTs and factorial experiments are powered. An RCT with a small number of subjects per experimental condition is unlikely to have sufficient statistical power. In contrast, a factorial experiment with a small number of subjects per condition may have excellent statistical power. Why? Power for estimation of main effects and interactions in factorial ANOVA is based on comparison of combinations of experimental conditions, not direct comparison of individual conditions. So the number of subjects in each individual condition does not matter; what matters for power is the total sample size across all experimental conditions.
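
To make this concrete, here is a minimal Monte Carlo sketch in Python. It is our own illustration rather than anything from the readings cited on this page, and the effect size, error standard deviation, and significance threshold are arbitrary assumptions; the point is simply that a 32-condition factorial with only 8 subjects per condition gives about the same power for a main effect as a two-condition comparison with the same total sample size.

    # Minimal power sketch: with effect-coded factors in a balanced design, the
    # standard error of each main effect depends on the TOTAL sample size, not on
    # the per-condition n. Effect size, sigma, and alpha are illustrative.
    import itertools
    import numpy as np

    rng = np.random.default_rng(1)

    def main_effect_power(n_factors, per_condition_n, beta=0.15, sigma=1.0, reps=2000):
        """Monte Carlo power for the main effect of the first factor in a
        balanced 2**n_factors experiment analyzed with effect (-1, +1) coding."""
        conditions = np.array(list(itertools.product([-1, 1], repeat=n_factors)))
        X = np.repeat(conditions, per_condition_n, axis=0).astype(float)
        X = np.column_stack([np.ones(len(X)), X])        # add intercept column
        n_total = X.shape[0]
        hits = 0
        for _ in range(reps):
            y = beta * X[:, 1] + rng.normal(0.0, sigma, n_total)
            b, *_ = np.linalg.lstsq(X, y, rcond=None)
            mse = ((y - X @ b) ** 2).sum() / (n_total - X.shape[1])
            # With orthogonal +/-1 codes, X'X = n_total * I, so SE(b) = sqrt(mse / n_total).
            hits += abs(b[1]) / np.sqrt(mse / n_total) > 1.96
        return hits / reps

    print(main_effect_power(n_factors=5, per_condition_n=8))    # 32 conditions, N = 256
    print(main_effect_power(n_factors=1, per_condition_n=128))  # 2 conditions,  N = 256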

Read an informal introduction to factorial experiments aimed at those with a background in the RCT.


Misconception 2: Factorial experimental designs require larger numbers of subjects than available alternative designs.

Reality: When used to address suitable research questions, balanced factorial experimental designs often require many fewer subjects than alternative designs. For a brief explanation, see Collins et al. (2014); for a more extensive explanation, see Collins, Dziak, and Li (2009).

For example, Collins et al. (2011) wanted to use a factorial experiment to examine six components under consideration for inclusion in a clinic-based smoking cessation intervention. They found that whereas conducting individual experiments on each of the components would have required over 3,000 subjects, with a factorial design they would have sufficient power with about 500 subjects. In other words, conducting a factorial experiment rather than six individual experiments meant that they needed about 2,500 fewer subjects.
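
The arithmetic behind this kind of comparison can be sketched with a standard normal-approximation sample-size formula. The code below is our own illustration, not the power analysis reported by Collins et al. (2011); the effect size, alpha level, and power target are assumptions chosen only to show the general pattern.

    # Rough normal-approximation comparison: six separate two-arm experiments
    # versus one balanced 2^6 factorial analyzed with effect coding.
    # d, alpha, and the power target are illustrative assumptions.
    from scipy.stats import norm

    def total_n_two_arm(d, alpha=0.05, power=0.80):
        """Approximate TOTAL N for one two-arm comparison (two-sided test)."""
        z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
        per_group = 2 * (z / d) ** 2
        return 2 * per_group

    d = 0.3                                  # assumed effect size of every component
    one_experiment = total_n_two_arm(d)

    # Six separate two-arm experiments, one per component:
    print(round(6 * one_experiment))         # total subjects across all six experiments

    # One balanced 2^6 factorial: every subject contributes to every main-effect
    # estimate, so the total N needed to detect a main effect of size d is about
    # the same as for a SINGLE two-arm experiment:
    print(round(one_experiment))

With these illustrative inputs the six separate experiments require roughly six times as many subjects as the single factorial experiment; the absolute numbers differ from the example above, but the pattern is the same.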

References

Collins, L. M., Baker, T. B., Mermelstein, R. J., Piper, M. E., Jorenby, D. E., Smith, S. S., Schlam, T. R., Cook, J. W., & Fiore, M. C. (2011). The multiphase optimization strategy for engineering effective tobacco use interventions. Annals of Behavioral Medicine, 41, 208-226. PMCID: PMC3053423

Collins, L. M., Dziak, J. J., & Li, R. (2009). Design of experiments with multiple independent variables: A resource management perspective on complete and reduced factorial designs. Psychological Methods, 14, 202-224. PMCID: PMC2796056

Collins, L. M., Dziak, J. J., Kugler, K. C., & Trail, J. B. (2014). Factorial experiments: Efficient tools for evaluation of intervention components. American Journal of Preventive Medicine, 47, 498-504.


Misconception 3: If you want to add a factor to a balanced factorial experiment, you will have to increase the number of subjects dramatically to maintain power.

Reality: This depends on the anticipated effect size of the factor to be added. If the factor to be added has an effect size greater than or equal to that of the factor with the smallest effect size that is already in the experiment, power will be about the same without any increase in the number of subjects. If the factor to be added has a smaller effect size than those upon which the power analysis was previously based, it will be necessary to increase the sample size accordingly to maintain power. However, unless the new effect is considerably smaller, the required increase will be modest. For more about this, see Collins et al. (2014) and Collins, Dziak, and Li (2009).

The power of a factorial experiment depends on the overall sample size, not the number of experimental conditions or the number of subjects in each condition (except to the extent that these impact overall sample size). Scientists whose backgrounds are primarily in designs like the RCT often find this counterintuitive.
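
Continuing the normal-approximation logic (again our own sketch, with assumed effect sizes, error standard deviation, and alpha; not a calculation from the cited papers), the consequence of adding a factor can be put in rough numbers:

    # Back-of-the-envelope sketch: at a fixed total N, the approximate power for
    # an effect-coded main effect depends only on its coefficient (beta), sigma,
    # and N -- the number of factors never enters the formula. Values are illustrative.
    from scipy.stats import norm

    def main_effect_power(beta, n_total, sigma=1.0, alpha=0.05):
        """Approximate power for an effect-coded coefficient beta; with balanced
        +/-1 coding its standard error is sigma / sqrt(n_total)."""
        z_crit = norm.ppf(1 - alpha / 2)
        return 1 - norm.cdf(z_crit - abs(beta) * n_total ** 0.5 / sigma)

    N = 512
    print(main_effect_power(beta=0.15, n_total=N))   # same whether the design has 5 or 6 factors

    # If the added factor's effect is smaller, its power drops ...
    print(main_effect_power(beta=0.10, n_total=N))
    # ... and N must grow by roughly (old_beta / new_beta)^2 to restore it:
    print(main_effect_power(beta=0.10, n_total=round(N * (0.15 / 0.10) ** 2)))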

Read an informal introduction to factorial experiments aimed at those with a background in the RCT.

References

Collins, L. M., Dziak, J. J., Kugler, K. C., & Trail, J. B. (2014). Factorial experiments: Efficient tools for evaluation of intervention components. American Journal of Preventive Medicine, 47, 498-504.

Collins, L. M., Dziak, J. J., & Li, R. (2009). Design of experiments with multiple independent variables: A resource management perspective on complete and reduced factorial designs. Psychological Methods, 14, 202-224. PMCID: PMC2796056


Misconception 4: The only reason to conduct a factorial experiment is to test for interactions between factors.

Reality: Even if it were somehow known with certainty that there were no interactions between factors, a factorial experiment might still be attractive if it required fewer research subjects than the alternatives being considered.

In fact, in some ways not expecting any interactions is an ideal scenario for the use of factorial designs, because it provides a strong justification for the use of extremely efficient fractional factorial designs. (A brief introduction to fractional factorial designs can be found in Collins, Dziak, and Li, 2009, and in Dziak, Nahum-Shani, and Collins, 2012.)
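
To give a flavor of how a fractional factorial design buys this efficiency, here is a small sketch; the factor labels and the generator are arbitrary assumptions for illustration, not a design recommended in the cited papers.

    # 2^(4-1) half-fraction sketch: run 8 of the 16 possible conditions by using
    # the generator D = ABC. The cost is aliasing: D's main effect is confounded
    # with the A*B*C interaction, which is acceptable exactly when such
    # interactions are assumed to be negligible.
    import itertools
    import numpy as np

    abc_full = np.array(list(itertools.product([-1, 1], repeat=3)))  # full 2^3 in A, B, C
    d_col = abc_full.prod(axis=1, keepdims=True)                     # D = A * B * C
    half_fraction = np.hstack([abc_full, d_col])

    print(half_fraction)        # 8 experimental conditions instead of 2**4 = 16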

References

Collins, L. M., Dziak, J. J., & Li, R. (2009). Design of experiments with multiple independent variables: A resource management perspective on complete and reduced factorial designs. Psychological Methods, 14, 202-224. PMCID: PMC2796056

Dziak, J. J., Nahum-Shani, I., & Collins, L. M. (2012). Multilevel factorial experiments for developing behavioral interventions: Power, sample size, and resource considerations. Psychological Methods, 17, 153-175. PMCID: PMC3351535


Misconception 5: There is always less statistical power for interactions than for main effects in a factorial ANOVA. Power decreases as the order of the interaction increases.

Reality: When effect coding is used, statistical power is the same for all regression coefficients of the same size, whether they correspond to main effects or interactions, and irrespective of the order of the interaction. (Note that the regression coefficient is not the only way to express the effect size of an interaction.)

However, the effect sizes for interactions may be smaller than those for the main effects in a given study, and the effect sizes for higher-order interactions may be smaller than those for lower-order interactions. (This is consistent with the sparsity, or Pareto, principle in engineering.) If that is the case, then power will of course be lower for the smaller effects. But the lower power is due to the smaller effect size, not to anything inherent in interactions or in the use of a factorial design.
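
The first point is easy to verify numerically. The sketch below is our own, using an arbitrary balanced 2^3 design: with effect coding and equal cell sizes, every column of the model matrix, from the main effects up to the three-way interaction, is an orthogonal +/-1 contrast, so regression coefficients of the same size have the same standard error and therefore the same power.

    # In a balanced 2^3 factorial with effect (-1, +1) coding, X'X = N * I for the
    # full model matrix, so every coefficient (main effect or interaction, of any
    # order) has standard error sigma / sqrt(N).
    import itertools
    import numpy as np

    per_condition_n = 4
    conditions = np.array(list(itertools.product([-1, 1], repeat=3)))
    A, B, C = np.repeat(conditions, per_condition_n, axis=0).T

    # Intercept, 3 main effects, 3 two-way interactions, 1 three-way interaction:
    X = np.column_stack([np.ones_like(A), A, B, C, A*B, A*C, B*C, A*B*C])

    print(X.T @ X)              # N = 32 on the diagonal, zeros everywhere else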


Misconception 6: Any interaction between factors necessarily makes interpretation of main effects impossible.

Reality: We recommend use of effect (-1,1) coding for component selection experiments in MOST. When effect coding is used and there are equal ns per condition, main effects and interactions are orthogonal. Of course it is always important to consider interactions thoughtfully when interpreting main effects, but the main effects are more readily interpretable when all the effects are orthogonal.

When dummy (0,1) coding is used, many of the effects being tested are not orthogonal. This can lead to interpretational difficulties.

Dummy coding and effect coding produce estimates of different effects, and thus the ANOVA results must be interpreted differently. For more information, please see Kugler, Trail, Dziak, and Collins (2012).
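
The orthogonality point can be illustrated in a few lines of code. The sketch below is our own (consistent with, but not taken from, Kugler et al., 2012): in a balanced 2 x 2 design, the effect-coded main effects are uncorrelated with their interaction column, whereas the corresponding dummy-coded columns are not.

    # Balanced 2 x 2 design: effect-coded main effects are orthogonal to the
    # interaction column; dummy-coded columns are not. Cell size is illustrative.
    import itertools
    import numpy as np

    per_condition_n = 10
    cells = np.array(list(itertools.product([0, 1], repeat=2)))
    A01, B01 = np.repeat(cells, per_condition_n, axis=0).T        # dummy (0, 1) codes
    A, B = 2 * A01 - 1, 2 * B01 - 1                               # effect (-1, 1) codes

    def corr(x, y):
        return np.corrcoef(x, y)[0, 1]

    print(corr(A, A * B), corr(B, A * B))              # 0.0 and 0.0: orthogonal
    print(corr(A01, A01 * B01), corr(B01, A01 * B01))  # about 0.58 each: entangled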

Reference

Kugler, K. C., Trail, J. B., Dziak, J. J., & Collins, L. M. (2012). Effect coding versus dummy coding in analysis of data from factorial experiments (Technical Report No. 12-120). University Park, PA: The Methodology Center, Penn State.