Maximum Likelihood vs. Multiple Imputation

Which is better for handling missing data: maximum likelihood approaches like the one incorporated in the structural equation modeling program AMOS, or multiple imputation approaches like the one implemented in Joe Schafer’s software NORM? — Signed, Not Uniformly There


Dear NUT,

Maximum likelihood (ML) and multiple imputation (MI) are two modern missing data approaches. Neither is inherently better than the other; in fact, when implemented in comparable ways, the two approaches produce nearly identical results. In practice, however, ML and MI are sometimes implemented differently, in ways that can affect data analysis results (Collins, Schafer, & Kam, 2001).

With either ML or MI, the information used for modeling missing data comes, naturally, from the set of variables available to the procedure. With ML, this set of variables is typically confined to those included in the particular scientific analysis at hand, even if this means omitting one or more variables that contain information necessary to the missing data model. For example, in an analysis examining the relation between parental characteristics and offspring self-reported substance use, offspring reading test scores may not be included because this variable is not of immediate scientific interest. However, if slow readers are less likely to complete the questionnaire, then omitting this variable may mean that missing data will bias the results even though an ML procedure was used.

Because with MI the imputation is typically done separately from the scientific data analyses, many additional variables in a data set can easily be included in the imputation process. Thus the likelihood of omitting a variable important to the missing data model is greatly reduced. We currently see this as a practical advantage of MI, but it is not an inherent one, because additional variables can also be included in ML for the purpose of enhancing the missing data model (Graham, 2003). Unfortunately, most ML software makes this more difficult than it needs to be, and many users are not aware that adding such variables is either beneficial or possible.
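
The reading-score example can be illustrated with a small simulation. The sketch below is not how NORM or AMOS works internally; it is a simplified, bootstrap-based regression imputation written with NumPy, and every variable name, coefficient, and sample size in it is an assumption made up for illustration. It generates data in which slow readers are less likely to report substance use, then compares the estimated mean substance use when the reading score is left out of the imputation model (a restrictive strategy) and when it is included (an inclusive strategy).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating model; all coefficients are illustrative.
parent = rng.normal(size=n)    # parental characteristic
reading = rng.normal(size=n)   # offspring reading test score
use = 0.5 * parent - 0.6 * reading + rng.normal(size=n)  # substance use

# Slow readers are less likely to complete the questionnaire.
p_complete = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * reading)))
observed = rng.random(n) < p_complete


def mi_mean(predictors, m=20):
    """Average the mean of `use` over m imputed data sets."""
    X = np.column_stack([np.ones(n), *predictors])
    obs_idx = np.flatnonzero(observed)
    missing = ~observed
    estimates = []
    for _ in range(m):
        # Bootstrap the complete cases to reflect parameter uncertainty.
        boot = rng.choice(obs_idx, size=obs_idx.size, replace=True)
        beta, *_ = np.linalg.lstsq(X[boot], use[boot], rcond=None)
        resid = use[boot] - X[boot] @ beta
        sigma = resid.std(ddof=X.shape[1])
        # Fill in each missing value with its prediction plus random noise.
        completed = use.copy()
        completed[missing] = X[missing] @ beta + rng.normal(0.0, sigma, missing.sum())
        estimates.append(completed.mean())
    return float(np.mean(estimates))


full_mean = float(use.mean())            # known only because the data are simulated
complete_case = float(use[observed].mean())
restrictive = mi_mean([parent])          # reading score omitted
inclusive = mi_mean([parent, reading])   # reading score included

print(f"full data (no missingness): {full_mean:+.3f}")
print(f"complete cases only:        {complete_case:+.3f}")
print(f"MI, reading omitted:        {restrictive:+.3f}")
print(f"MI, reading included:       {inclusive:+.3f}")
```

Under these assumptions, the imputation that omits the reading score should stay about as far from the full-data mean as the complete-case estimate, while the imputation that includes it should land close to the full-data mean. The same logic applies to ML: the estimate improves only if the variable that drives missingness is made available to the procedure.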


Collins, L.M., Schafer, J.L., & Kam, C.M. (2001). A comparison of inclusive and restrictive strategies in modern missing-data procedures. Psychological Methods, 6, 330-351.

Graham, J.W. (2003). Adding missing-data-relevant variables to FIML-based structural equation models. Structural Equation Modeling, 10, 80-100.