Modeling Multiple Risk Factors

I want to investigate multiple risk factors for health risk behaviors in a national study, but do not know how to handle the high levels of covariation among the different risk factors. Do you recommend that I regress the outcome on the entire set of risk factors using multiple regression analysis? Or should I create a cumulative risk index by summing risk exposure, and regress the outcome on that index? Signed, Waiting to Regress


Dear WTR,

Recognizing that individuals develop within multiple contexts, and can therefore be exposed simultaneously to numerous (and often highly correlated) risk factors, is critical in studies of human behavior and development. Multiple risks for a poor outcome have typically been modeled using one of two approaches: multiple regression analysis or a cumulative risk index.

Multiple regression allows us to examine the relative importance of each risk factor in predicting the outcome, but it has drawbacks. Unless many higher-order interactions are included, it is impossible to examine how exposure to particular combinations of risk factors affects the outcome. In addition, high levels of multicollinearity (for example, among risk factors such as maternal education, neighborhood disorganization, and residential crowding) inflate the standard errors of the coefficients and make the estimates unstable, which can severely distort inference based on multiple regression.
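
One common way to gauge how much multicollinearity is at play is the variance inflation factor (VIF), which is 1 / (1 - R^2) from regressing each predictor on all the others. The sketch below is only an illustration on simulated data: the three "risk factors" are hypothetical, two of them are built to share a common source, and the helper name variance_inflation_factors is my own.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (rows = people, columns = predictors)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress predictor j on an intercept plus all remaining predictors.
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2 = 1.0 - resid.var() / y.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Simulated (hypothetical) data: the first two risk factors share a common
# source of disadvantage, the third is unrelated to both.
rng = np.random.default_rng(0)
n = 2000
shared = rng.normal(size=n)
risks = np.column_stack([
    shared + 0.3 * rng.normal(size=n),
    shared + 0.3 * rng.normal(size=n),
    rng.normal(size=n),
])

vifs = variance_inflation_factors(risks)
print(vifs)
```

In this simulation the two overlapping predictors have much larger VIFs than the independent one, which stays close to 1. Cutoffs for a "worrying" VIF are conventions rather than rules, so treat any threshold as a judgment call.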

Alternatively, a cumulative risk index can be created by counting, for each individual, the number of risk factors to which that individual is exposed. Instead of regressing the outcome on all individual risk factors, the outcome is simply regressed on the index score. Early work in this area represented an important step forward in thinking about multiple risks. A downside to this approach is that every risk factor is weighted equally: exposure to each additional risk factor, regardless of which one, is assumed to correspond to the same increase in risk. Furthermore, such an index provides no insight into how particular risk factors co-occur or interact with each other.
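
The equal-weighting point can be made concrete with a toy example. The sketch below assumes three binary risk indicators and a made-up outcome in which the third risk matters four times as much as the other two; none of these numbers come from a real study, and the function names are hypothetical.

```python
import numpy as np

def cumulative_risk_index(exposures):
    """Count the risk factors each person is exposed to (rows = people)."""
    return np.asarray(exposures).sum(axis=1)

def ols(x, y):
    """Least-squares fit of y on x with an intercept; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# All eight exposure patterns for three hypothetical binary risk factors.
exposures = np.array([
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
])

# Made-up outcome: the third risk factor carries four times the weight.
outcome = 1.0 * exposures[:, 0] + 1.0 * exposures[:, 1] + 4.0 * exposures[:, 2]

index = cumulative_risk_index(exposures)
intercept, slope = ols(index, outcome)   # outcome regressed on the index
full = ols(exposures, outcome)           # outcome regressed on each risk factor

print("index scores:", index)
print("index slope:", slope)
print("separate slopes:", full[1:])
```

Here the index regression assigns a single slope to every additional risk, so a person exposed only to the first risk and a person exposed only to the third receive the same predicted outcome even though their simulated outcomes differ. The regression on the separate indicators recovers the unequal weights.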

A person-centered approach to modeling multiple risks was recently demonstrated by Lanza and colleagues (Lanza, Rhoades, Nix, Greenberg, and CPPRG, 2010). This study used latent class analysis to identify four unique risk profiles based on exposure to thirteen risk factors across child, family, school, and neighborhood domains. Each risk profile characterized a unique group of children: (1) Two-Parent Low Risk, (2) Single Parent with a History of Problems, (3) One-Parent Multilevel Risk, and (4) Two-Parent Multilevel Risk. Examining the link between risk profile membership during kindergarten and externalizing problems, school failure, and low academic achievement in fifth grade provided a more nuanced understanding of the early precursors to negative outcomes than a cumulative risk index did. The application of latent class analysis to multiple risks holds promise for informing the refinement of preventive interventions for groups of children who share particular combinations of risk factors.
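
To give a feel for what latent class analysis does, here is a minimal sketch that fits a latent class model to binary risk indicators with the EM algorithm. It is not the procedure or software used in the study cited above: the data are simulated, there are only two classes and six indicators, and fit_lca is a hypothetical helper written for this illustration. It also assumes binary indicators that are independent within each class.

```python
import numpy as np

def fit_lca(X, n_classes, n_iter=200, seed=0):
    """Fit a latent class model to binary data by EM.

    Returns (class_weights, item_probs, posteriors), where item_probs[c, j]
    is the estimated probability of exposure to risk j within class c.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    rng = np.random.default_rng(seed)
    weights = np.full(n_classes, 1.0 / n_classes)
    item_probs = rng.uniform(0.25, 0.75, size=(n_classes, p))
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each person,
        # computed on the log scale for numerical stability.
        log_joint = (X @ np.log(item_probs).T
                     + (1.0 - X) @ np.log(1.0 - item_probs).T
                     + np.log(weights))
        log_joint -= log_joint.max(axis=1, keepdims=True)
        post = np.exp(log_joint)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: update class sizes and within-class exposure probabilities.
        class_totals = post.sum(axis=0)
        weights = class_totals / n
        item_probs = (post.T @ X) / class_totals[:, None]
        item_probs = np.clip(item_probs, 1e-6, 1.0 - 1e-6)
    return weights, item_probs, post

# Simulated (hypothetical) data: 30% of people belong to a high-risk class
# with 0.8 exposure probability per risk; the rest have 0.1.
rng = np.random.default_rng(1)
n = 3000
high_risk = rng.random(n) < 0.3
true_probs = np.where(high_risk[:, None], 0.8, 0.1)
X = (rng.random((n, 6)) < true_probs).astype(int)

weights, item_probs, post = fit_lca(X, n_classes=2)

# Class labels are arbitrary, so order classes from lowest to highest risk.
order = np.argsort(item_probs.mean(axis=1))
print("class sizes:", weights[order])
print("exposure probabilities by class:")
print(item_probs[order])
```

Each estimated class is described by its size and its profile of exposure probabilities, which is what allows classes to be interpreted and named. In practice one would compare models with different numbers of classes and use several random starts, neither of which this sketch does.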


Lanza, S. T., Rhoades, B. L., Nix, R. L., Greenberg, M. T., & The Conduct Problems Prevention Research Group (2010). Modeling the interplay of multilevel risk factors for future academic and behavior problems: A person-centered approach. Development and Psychopathology, 22, 313-335.