
Chapter 5. Issues in Building Multiple Regression Models

There are several other issues we will consider in conceptualizing how to build explanatory models for quantitative outcomes. In particular, several classes of variables are often the focus of multiple regression and analysis of variance (ANOVA). These include confounding variables, mediating variables, and moderating variables.

I. Confounding Variables

The idea of “confounding variables” or confounds arises in both ANOVA and multiple regression. Confounds arise when two predictor variables are highly related to each other. Such variables are said to be confounded, or “confused,” with each other because they are interrelated and their relationships to the outcome are hard to separate.

At the extreme, confounding can lead to multicollinearity. This sort of confounding is not good because it means that two variables are so similar that we cannot tell “which is which.”

However, if two confounded X variables are related, but not so highly related that they are multicollinear, it may be important to include both of them in our regression models in order to “control for” the confound.

If we include potential confounding variables, we may be able to eliminate “spurious relationships” between other X and the outcome. We’ll consider an example.

Suppose that we are interested in the relationship of numbers of publications to faculty salaries among UBalt faculty. If we simply count publications (X) and measure faculty salaries (Y), we will likely find a positive relationship.

However, it is also true that the longer a person has been on the faculty, the higher their salary will be. Also, in general, years in the profession is likely to be related to number of publications.

The relationship between numbers of publications and salary is likely to be smaller once we control for years in the profession. So years in profession and number of publications are somewhat confounded.

If we think that we are seeing a stronger relationship than is warranted because we have not controlled for a confounder, then we are seeing a so-called “spurious relationship.” In such cases it is important to include potential confounds in our regression models in order to reveal the “true” nature of the relationships of other X to the outcome. If we are interested in causal inferences, confounds are similar to mediators.
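To make the salary example concrete, here is a minimal sketch in Python using simulated data. The data-generating numbers, the variable names (`years`, `pubs`, `salary`), and the small `ols` helper are all assumptions invented for illustration; they are not actual UBalt data. The sketch fits salary on publications alone, and then again with years in the profession added as a control.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 500

# Hypothetical data-generating process (all numbers are invented):
# years in the profession drives both publications and salary.
years = rng.uniform(1, 30, size=n)
pubs = 1.5 * years + rng.normal(0, 5, size=n)
salary = 60 + 2.0 * years + 0.2 * pubs + rng.normal(0, 5, size=n)  # in $1,000s


def ols(y, *predictors):
    """Return least-squares coefficients [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


# Slope for publications when years in the profession is left out.
b_simple = ols(salary, pubs)[1]
# Slope for publications when we control for years in the profession.
b_controlled = ols(salary, pubs, years)[1]

print(f"publications slope, no control:         {b_simple:.2f}")
print(f"publications slope, controlling years:  {b_controlled:.2f}")
```

Under these assumed numbers, the slope for publications should be noticeably smaller in the second model. Part of the apparent publications-salary relationship in the first model was really carrying the effect of years in the profession.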

II. Mediating Variables

The idea of mediating variables arises when we think of causal relationships. Influences of a prior variable are transferred or transmitted to an outcome via an intervening or mediating variable. In mediator relationships the prior or precursor variable is typically found to have no direct relationship to the outcome. Its role occurs only via the mediator.

Mediating variables can take many forms but let us focus on a simple type of relationship—suppose X_1 precedes X_2, which precedes Y. We may draw

X_1 \longrightarrow X_2 \longrightarrow Y

X_2 is a mediator if there is no direct relationship of X_1 to Y.

This example of a mediating relationship is from research I did with Frances Berry and Donwe Choi on predicting risk acceptability among community residents living near a proposed nuclear power plant (Nam-Speers et al., 2020). The variable X_1 was the extent of residents’ trust in local government, and X_2 represented the benefits they perceived from the nuclear facility. The outcome Y was a measure of residents’ risk acceptability of the proposed plant.

In studies where researchers had examined the direct relationship

X_1 \longrightarrow Y

Trust in Local Government → Risk Acceptability

they had found no relationship. However, when they examined

Trust in Local Gov’t → Benefit Perception → Risk Acceptability

by adding the mediator of “Benefit Perception,” they found that, when residents perceived that there would be benefits for them, their level of trust in local government had a greater effect on their risk acceptability of the proposed plant.

The X_2 (benefit perception) was a mediator of the relationship between trust in local government and the outcome. Without the mediator, the relationship between trust in local government and risk acceptability was not revealed. This example shows why correct model specification is important.
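The simple chain X_1 → X_2 → Y described earlier in this section can be sketched with simulated data. This is only an illustration of the definition (no direct relationship of X_1 to Y once the mediator is in the model). It does not reproduce the nuclear plant study, and the path sizes (0.7 and 0.5), the sample size, and the `ols` helper are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is reproducible
n = 1000

# Hypothetical full mediation: x1 influences y only through x2.
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)  # path from X_1 to X_2
y = 0.5 * x2 + rng.normal(size=n)   # path from X_2 to Y, no direct X_1 term


def ols(y, *predictors):
    """Return least-squares coefficients [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


a = ols(x2, x1)[1]             # X_1 -> X_2
_, direct, b = ols(y, x1, x2)  # X_1 -> Y and X_2 -> Y, both in the model
total = ols(y, x1)[1]          # X_1 -> Y with the mediator left out

print(f"a (X_1 -> X_2):            {a:.2f}")
print(f"b (X_2 -> Y):              {b:.2f}")
print(f"direct effect of X_1 on Y: {direct:.2f}")
print(f"indirect effect (a * b):   {a * b:.2f}")
print(f"total effect of X_1 on Y:  {total:.2f}")
```

In this sketch the direct coefficient for X_1 should be close to zero once X_2 is in the model, while the indirect path (a times b) carries the influence of X_1. For least-squares fits on the same sample, the total effect equals the direct effect plus a times b.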

III. Moderating Variables

Moderator variables are often categorical variables—they moderate or change the results of another set of variables. Some authors have argued that it is reasonable to think of the relationship of two variables (say, X and Y) as a function of a third variable Z. The third variable is the moderator variable.

Clearly, in its simplest form, this kind of situation occurs when we have interactions: an interaction tells us that the effects of one predictor on an outcome depend on (are a function of) a second factor. When we have interactions in multiple regression, where perhaps the nature of the relationship of X to Y depends on the subject’s gender, we can say that gender moderates the X-Y relationship.

Moderator variables typically appear in regressions as dummy variables (or sets of dummy variables) and interactions. Here they may both represent a difference in the level of the outcome (the raw dummy variables) and act as a moderator, which we investigate by creating an interaction term for the regression model using the product of the dummy (moderator) and another X.

In multiple regression the interaction tells us that the relationship of one predictor to the outcome depends on (is a function of) the moderator variable. In both multiple regression and ANOVA a moderator can change the sign of the effect or relationship, or simply modify the size of the effect or relationship.
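As a rough illustration of a dummy moderator, the sketch below simulates data in which the slope of X differs between two groups and then fits a regression with the dummy, X, and their product. The coefficients used to generate the data, the variable names (`x`, `d`, `y`), and the group coding are assumptions made for the example only.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed so the sketch is reproducible
n = 400

x = rng.normal(size=n)
d = rng.integers(0, 2, size=n)  # dummy moderator coded 0 or 1

# Hypothetical model: the slope of x is 1.0 when d = 0
# and 1.0 + (-1.5) = -0.5 when d = 1.
y = 2.0 + 1.0 * x + 0.5 * d - 1.5 * x * d + rng.normal(0, 1, size=n)

# Design matrix: intercept, x, dummy, and the product term (interaction).
X = np.column_stack([np.ones(n), x, d, x * d])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

slope_d0 = b1       # estimated X-Y slope in the group coded 0
slope_d1 = b1 + b3  # estimated X-Y slope in the group coded 1

print(f"level difference between groups (dummy): {b2:.2f}")
print(f"interaction coefficient:                 {b3:.2f}")
print(f"slope of x when d = 0:                   {slope_d0:.2f}")
print(f"slope of x when d = 1:                   {slope_d1:.2f}")
```

The dummy coefficient captures the difference in level of the outcome, and the interaction coefficient captures how much the X-Y slope changes between groups. With these assumed values the slope changes sign across groups, which is one of the ways a moderator can act.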

 

Sources: Modified from the class notes of Salih Binici (2012) and Russell G. Almond (2012).