# When categorical variables and moderation analysis go wrong…

I stumbled across a dissertation (Bosh, 2020) in which the student performed a moderation analysis using categorical variables. By performing a moderation analysis, a researcher is examining whether the causal relationship between an independent variable (X) and the dependent variable (Y) changes upon the introduction of a moderating variable (M). To test for moderation, both X and M must be entered into the regression formula, to examine the main or simple effects, along with their interaction (X*M).

Y = i + aX + bM + cXM + e                      (1)
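As a rough illustration of Equation (1), here is a minimal sketch in Python using NumPy. The data are synthetic and noise-free, and the coefficient values (i = 1, a = 2, b = 0.5, c = 1.5) are assumptions chosen only so the fit can be checked by eye; nothing here comes from the dissertation.

```python
import numpy as np

# Synthetic, noise-free data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=n)              # focal predictor
M = rng.integers(0, 2, size=n)      # dichotomous moderator coded 0/1
Y = 1.0 + 2.0 * X + 0.5 * M + 1.5 * X * M   # i=1, a=2, b=0.5, c=1.5

# Design matrix for Equation (1): intercept, X, M, and the product X*M.
design = np.column_stack([np.ones(n), X, M, X * M])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
i_hat, a_hat, b_hat, c_hat = coef

# Simple slopes of X at each level of M.
slope_when_m0 = a_hat            # effect of X when M = 0
slope_when_m1 = a_hat + c_hat    # effect of X when M = 1
```

The point of the sketch is that the product column X*M must be in the design matrix; its coefficient (c) is what carries the moderation. With real data one would also need standard errors and a significance test for c, which this sketch leaves out.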

If the coefficient of the interaction term (c) is statistically significant, then the effect of X depends on the level of M; the main effects are no longer interpreted on their own, and the interaction becomes the focus. I have found moderation analysis can be confusing to students who don’t have a good grasp of statistics.

The student examined categorical variables as moderators. Categorical variables with three or more levels should be dummy-coded; categorical variables with two levels are naturally dichotomous (0/1). This study had four categorical variables: Age, Gender, Marital Status, and Tenure (p. 110). The student references dummy coding, but only in relation to Gender and Marital Status, two variables that are either naturally dichotomous (Gender) or artificially dichotomized in the study (Marital Status). No reference to dummy coding was made for Age and Tenure (p. 117). Student Note #1: When using categorical variables in regression, make sure you understand dummy coding.

In dummy coding, a researcher transforms a nominal or ordinal variable into k-1 variables (k refers to the number of levels). For each new variable, one specific category is coded as 1 and all other categories are coded as 0. Age, for example, has three levels: 18-33, 34-49, and 50-65. Age would be dummy coded into two variables (Age1 and Age2), with 34-49 being represented by a 1 in Age1 and 50-65 being represented by a 1 in Age2. The base level, 18-33, would be represented as a 0 in both Age1 and Age2. Thus, the coefficients of Age1 and Age2 would be interpreted as differences from the base level. For a great discussion of dummy coding, effect coding, and weighted effect coding, see Grotenhuis et al. (2017).
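The Age example can be sketched in a few lines of plain Python. The helper name `dummy_code` and the five sample values are hypothetical, made up for illustration; only the three age bands and the choice of 18-33 as the base level come from the discussion above.

```python
def dummy_code(values, levels, base):
    """Return k-1 indicator columns, one per non-base level."""
    if base not in levels:
        raise ValueError("base must be one of the levels")
    non_base = [lvl for lvl in levels if lvl != base]
    # Each column is 1 where the observation falls in that level, else 0.
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in non_base}

# Hypothetical sample of five respondents.
age = ["18-33", "34-49", "50-65", "34-49", "18-33"]
coded = dummy_code(age, levels=["18-33", "34-49", "50-65"], base="18-33")
age1 = coded["34-49"]   # [0, 1, 0, 1, 0]
age2 = coded["50-65"]   # [0, 0, 1, 0, 0]
```

Respondents in the base level (18-33) end up with 0 in both columns, which is why three levels need only two dummy variables.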

When the student begins testing hypotheses (p. 134), I know that two variables are coded correctly and two are questionable. However, upon inspection of the output, I note that there is no evidence that moderation was examined. In reviewing Table 22 in the study, the main effect of Job Satisfaction was used as a predictor of Affective Commitment in Model 1. However, the two poorly formed moderating variables, Age and Tenure, were entered as a block in Model 2. Entering additional variables into a regression formula and examining the changes is not moderation analysis; it is simply a measurement of change in a model upon the inclusion of additional variables. The other two dichotomous variables, Gender and Marital Status, are entered as a block in the third model.
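The difference between block entry and a moderation model can be sketched with NumPy. Everything below is synthetic: `x` merely stands in for Job Satisfaction, `group` for a three-level age variable, and the coefficients are assumptions chosen so that the slope of `x` differs by group. It is not a reanalysis of the dissertation's data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)                 # stands in for Job Satisfaction
group = rng.integers(0, 3, size=n)     # stands in for three age bands
age1 = (group == 1).astype(float)      # dummy for the second band
age2 = (group == 2).astype(float)      # dummy for the third band

# Assumed truth: the slope of x is 1.0, 2.0, and 3.0 across the groups.
y = 0.5 + 1.0 * x + 0.3 * age1 + 0.6 * age2 + 1.0 * x * age1 + 2.0 * x * age2

def fit(design, y):
    """Ordinary least squares; returns (R squared, coefficients)."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    total = y - y.mean()
    return 1.0 - (resid @ resid) / (total @ total), coef

ones = np.ones(n)
# Block entry: dummies added as extra predictors, no product terms.
block = np.column_stack([ones, x, age1, age2])
# Moderation: the same predictors plus x*age1 and x*age2.
moderation = np.column_stack([ones, x, age1, age2, x * age1, x * age2])

r2_block, coef_block = fit(block, y)
r2_mod, coef_mod = fit(moderation, y)
```

The block model forces a single slope for `x` across all groups, so it cannot show that the relationship changes with age. Only the model with the product terms estimates group-specific slopes, and those product coefficients are what a moderation hypothesis is about.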

What does all this mean? Well, the student didn’t structure the moderation analysis properly. First, ordinal independent variables were not dummy-coded properly. Second, interaction was not examined. Could there be a moderating effect of these categorical variables? Maybe. We’ll never know. Technically, this is an example of a combined Type II and Type III error.

I reached out to Capella University via email to request the student’s email address so I could include his thoughts in this discussion and, potentially, work with him on a post-publication analysis of the data. The University did not reply to my email nor to my follow-up phone call to the University’s FERPA representative. I also reached out to the student’s chairperson for comment. No reply.

Instructions to Students

Ignore the results of this study. However, the framework set by Bosh (2020) is ripe for replication. Simply cite the results of the study and the problems in the analysis as a reason for the need to replicate, and do the analysis correctly.

References:

Bosh, G. B. (2020). Explanatory relationships among employees personal characteristics, job satisfaction, and employee organizational commitment (Doctoral dissertation). ProQuest Dissertations Publishing. (27837234)

Grotenhuis, M., Pelzer, B., Eisinga, R., Nieuwenhuis, R., Schmidt-Catran, A., & Konig, R. (2017). When size matters: advantages of weighted effect coding in observational studies. International Journal of Public Health, 62(1), 163–167. https://doi.org/10.1007/s00038-016-0901-1