Ethnicity: M = 1.26, SD = .529?

I understand that one has to “numericize” categories for quantitative analysis, but any student or faculty member should understand that the resulting numbers are only labels; their mean and standard deviation mean nothing (see Figure 1).

Figure 1: Barplot of Ethnicity (Race) with Distribution overlay (Deonarinesingh, 2019, p. 59).
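To make the point concrete, here is a minimal sketch in base R. The codes and group labels are hypothetical (they are not the study's data); the sketch only shows that the "mean" of a nominal variable changes when the arbitrary codes are swapped, while the counts per category do not.

```r
# Hypothetical nominal codes for 12 respondents (illustrative only; not the study's data).
ethnicity_code <- c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3)

# R will happily compute these, but the values depend entirely on the arbitrary codes.
mean_original <- mean(ethnicity_code)
sd_original <- sd(ethnicity_code)

# Swap the labels (1 <-> 3): same people, same groups, different "mean".
recoded <- c(3, 2, 1)[ethnicity_code]
mean_recoded <- mean(recoded)

# What a nominal variable does support: counts and proportions per category.
ethnicity <- factor(ethnicity_code, levels = 1:3,
                    labels = c("Group A", "Group B", "Group C"))
counts <- table(ethnicity)
proportions <- round(prop.table(counts), 2)

print(c(mean_original = mean_original, mean_recoded = mean_recoded))
print(counts)
print(proportions)
```

A frequency table or a plain bar chart of `counts` is the appropriate display; a distribution overlay implies an ordering and spacing that the categories do not have.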

A chairperson has to perform a lot of reading when reviewing a student’s dissertation. A committee member can hopefully help. But this type of error appeared on every chart in this study’s Chapter 4, regardless of the type of variable (e.g., categorical, interval). Did the faculty not know, or did they simply not read the study?

Student Note: Understand your variables and how best to display them. Don’t rely on your committee; they might not know or remember.

This study will return in a later post…stay tuned.


Deonarinesingh, S. (2019). The effect of cultural intelligence upon organizational citizenship behavior, mediated by openness to experience (Doctoral dissertation). ProQuest Dissertations & Theses Global: The Humanities and Social Sciences Collection. (13880805)


Quantizing qualitative data outside of a mixed methods study?

As part of an upcoming paper on the quality of doctoral research in DBA programs, I came across a study where the emerging scholar committed one of my pet peeves: quantizing qualitative information outside of a mixed methods research design. Below, I use techniques from a prior post to illustrate how this approach can weaken the impact of a study.

Wagner (2019) explored the effectiveness of the Department of Defense’s Transition Assistance Program (TAP) as it related to California veterans. According to the author’s sources, California is home to over 1.8M veterans, 230,000 of whom served after 9/11. The emerging scholar used a questionnaire, rather than an interview guide, to collect data. From information collected via questionnaire (N = 10), the scholar coded participants’ responses into three levels (high, moderate, and low) reflecting each participant’s confidence in a service provided by the TAP. For example, relating to preparedness to transition from the military to the private sector (Appendix E, p. 121), the scholar coded responses into three categories –

  • High – Veteran utilized resources provided by TAP for financial health, relocation, and career search
  • Moderate – Veteran able to locate financial, networking, and relocation resources, but did not use
  • Low – Veteran did not know where to (sic) locate essential transition resources.

Once coded, two faculty members reviewed the coding and confirmed the ‘classification’ (pp. 52-53). Student Note: This type of step is needed for internal validity in some research designs.

I chose one item for illustration purposes, but the other items are similar in form and content. Participants were asked about drafting a basic resume (see pp. 53-57). From their responses, the emerging scholar classified them into the three levels of confidence and created a cross tabulation of the results. Based on these results, the scholar stated –

The majority of respondents (60%) had a high degree of confidence in drafting a basic resume and cover letter after participating in TAP.

Wagner, 2019, p. 53

Later, the scholar wrote –

High confidence was exhibited by participants who felt empowered by the TAP workshop and were capable of drafting a basic resume. Moderate confidence was demonstrated by veterans who obtained skills to create a basic resume while veterans with low confidence struggled to translate their military career to a basic resume or lacked focus

Wagner, 2019, p. 54 (emphasis added)

By simply describing and interpreting the responses, and not quantizing them into levels, the emerging scholar’s analysis may have had more influence on a reader. There are still questions about the depth of inquiry (e.g., questionnaire vs. in-depth interviews), but that’s hard to explore without obtaining transcripts. However, when quantizing comes into play, a reader has to consider how much confidence to place in a “majority of respondents (60%)” statement based on a sample size of 10.

Using the information in Table 3 of the study, I added 95% CI error bars that equate to an N = 230,000 and an n = 10 (CI = 31%; Figure 1) –

Figure 1. Confidence in Drafting a Resume after receiving TAP training with 95% CI = 31% (N = 10)
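As a rough check on the 31% figure, the sketch below computes the margin of error for a proportion with a finite population correction. It assumes the usual normal-approximation formula with the conservative p = 0.5; the function name `moe_prop` is my own, and this is not necessarily how the error bars in the figure were produced. Because the normal approximation is shaky at n = 10, it also computes an exact (Clopper-Pearson) interval for the 6 of 10 respondents behind the "60%" statement.

```r
# Margin of error for a proportion at a given confidence level,
# with a finite population correction (FPC). p = 0.5 is the most conservative case.
# moe_prop is a hypothetical helper name, not a function from any package.
moe_prop <- function(n, N = Inf, p = 0.5, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  fpc <- if (is.finite(N)) (N - n) / (N - 1) else 1
  z * sqrt(p * (1 - p) / n * fpc)
}

# n = 10 respondents drawn from roughly 230,000 post-9/11 California veterans.
moe_tap <- moe_prop(n = 10, N = 230000)
print(round(moe_tap, 2))   # about 0.31

# Exact (Clopper-Pearson) 95% interval for 6 "high" responses out of 10.
exact_ci <- binom.test(x = 6, n = 10)$conf.int
print(round(exact_ci, 3))
```

Either way, the interval around 60% comfortably includes 50%, so "majority" is not a claim the sample can support for the population.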

As one can see, each level’s confidence interval covers the other levels, and detracts from the effect of the study. Just for the record, there was no statistical difference between the three groups, χ²(2) = 0.9722, p = .615, due to the small sample size.

This was not a qualitative study; this was merely a quantitative descriptive study. Using the author’s words, the analysis was based on data elicited from “17 formalized questions used during the interview process” (p. 89). Later, the author used the phrase ‘general consensus’ when describing how pre-2011 Veterans Opportunity Work Act participants felt that TAP was a “check in the box as part of out-processing” (p. 51), and TAP provided “adequate support to draft a basic resume and cover letter” (p. 93). Had the scholar simply reported the descriptive statistics based on a larger sample, he may have had something; however, would the University have granted a doctorate for that level of rigor?

From a management perspective, if you were in a position to redirect the TAP program and read this study, would you act on these types of results?

Student Note: Make sure you clearly align your research question, research method, and research design. Also, make sure you speak to several faculty members at your university who perform research to get their view on your proposed study methodology. Some faculty focus on only one type of method and try to stuff every study into that mold…right or wrong. Some faculty focus on certain types of QUAN or QUAL. Others only know the method they performed when they did their study. Heck, they may have done their study incorrectly…Remember: It’s your study and it will become a public record.


Wagner, J. D. (2019). Effectiveness of the Transition Assistance Process (TAP) in building career self-efficacy for California post-9/11 veterans (Doctoral dissertation). ProQuest LLC. (13865682)

Luck, inadvertent omission, or lack of knowledge?

Johnson (2018) explored the willingness to hire people who were convicted of drug crimes. The scope of the study was limited to the Central Virginia region. To answer the first research question (How does the willingness to hire returning citizens by Central Virginia employers differ by position/job role in private sector, for-profit business firms?), the emerging scholar used descriptive measures (rather than inferential statistics).

Johnson stated that the null hypothesis could be rejected for three types of jobs: Unskilled, Semi-skilled Labor, and Skilled Labor. I suppose that the decision was based on point estimates above 50% (Figure 1) –

Figure 1. Willingness to hire by position/job role (Johnson, 2018, p. 56)

However, when rejecting null hypotheses based on sample data, confidence intervals must be considered. Based on the information provided by the emerging scholar in his study, there were 653,193 businesses in the sample frame. A quota sample of 635 was chosen (p. 35). Using R and the samplingbook package (Manitz, 2017), that equates to having a 95% CI of 3.89% (see below).

library(samplingbook)
sample.size.prop(e = .0389, N = 653193, level = .95)

sample.size.prop object: Sample size for proportion estimate
With finite population correction: N=653193, precision e=0.0389 and expected proportion P=0.5

Sample size needed: 635

I then recreated the graphic using the ggplot2 package (Wickham et al., 2020), and added the 95% CI (Figure 2).

Figure 2. Recreated Willingness to hire by position/job role with 95% CI = 3.89%

Okay. I see it. However, only 105 complete responses were obtained, not the target sample of 635. Using the same method to calculate the 95% CI above, I backed into a 9.6% 95% CI (see below):

sample.size.prop(e = .096, N = 653193, level = .95)

sample.size.prop object: Sample size for proportion estimate
With finite population correction: N=653193, precision e=0.096 and expected proportion P=0.5

Sample size needed: 105
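Instead of backing into the precision by trial and error, the margin of error can be computed directly from the sample size. The base R sketch below assumes the standard normal-approximation formula with a finite population correction and P = 0.5; it is offered as a cross-check and is not the samplingbook implementation.

```r
# Sampling frame and the planned vs. achieved sample sizes reported in the post.
N <- 653193
n <- c(planned = 635, achieved = 105)

# 95% margin of error for a proportion at P = 0.5, with finite population correction.
z <- qnorm(0.975)
moe <- z * sqrt(0.25 / n * (N - n) / (N - 1))

print(round(100 * moe, 2))                      # margins in percentage points
ratio <- moe[["achieved"]] / moe[["planned"]]   # how much wider the achieved interval is
print(round(ratio, 1))
```

Because the margin shrinks with the square root of n, obtaining about one-sixth of the planned responses widens the interval by a factor of roughly the square root of six.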

Thus, the 95% CI changed from a planned 3.89% to an actual 9.6%, roughly a 2.5x increase in interval width. When overlaying the new 95% CI on the data, new perspectives emerge (Figure 3).

Figure 3. Willingness to hire by position/job role (Johnson, 2018, p. 56) with 95% CI = 9.6%

Visually, one can see that the emerging scholar is correct when stating that Semi-skilled Labor and Skilled Labor fall above the 50% line; even when accounting for the 95% CI. However, the error bar for Unskilled Labor (a) drops below 50%, and the error bar for Clerical Labor (d) rises above 50%. Should Unskilled Labor be omitted from the rejection? Should Clerical Labor be included in the rejection of the null hypothesis? It appears both a Type I and a Type II error occurred.
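The visual check can be turned into an inferential one by testing each job role's proportion against 50%. The sketch below uses exact binomial tests from base R. The counts are hypothetical placeholders chosen only to mimic the pattern described (the study's exact percentages are not reproduced here); only the 105 complete responses come from the study.

```r
n_respondents <- 105  # complete responses reported in the study

# Hypothetical "willing to hire" counts per job role (placeholders, not the study's data).
willing <- c(Unskilled = 60, `Semi-skilled` = 75, Skilled = 72, Clerical = 45)

# Exact binomial test of each proportion against the 50% reference line.
results <- do.call(rbind, lapply(names(willing), function(role) {
  bt <- binom.test(willing[[role]], n_respondents, p = 0.5)
  data.frame(role = role,
             estimate = unname(bt$estimate),
             lower = bt$conf.int[1],
             upper = bt$conf.int[2],
             p_value = bt$p.value)
}))

# A role supports rejecting the null only if its whole interval sits on one side of 50%.
results$excludes_50 <- results$lower > 0.5 | results$upper < 0.5
print(results, digits = 3)
```

With these placeholder counts, a point estimate of 57% does not clear the 50% line once the interval is considered, which is exactly the risk of deciding from point estimates alone.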

One note: The emerging scholar reported his findings as being similar to research performed in Sweden by Ahmed and Lang (2017). The authors wrote –

We found that ex-offenders were discriminated against in the occupations accounting clerk, cleaner, preschool teacher, restaurant worker, sales person, and software developer. However, we did not observe any statistically significant discrimination against ex-offenders in the occupations auto mechanic, enrolled nurse, and truck driver. 

Ahmed & Lang, 2017, p. 17

Well, they don’t now. Also…Virginia = Sweden? That may be a stretch…

Student Note: Descriptive statistics are not inferential statistics. Know the difference.


Ahmed, A., & Lang, E. (2017). The employability of ex-offenders: A field experiment in the Swedish labor market. IZA Journal of Labor Policy, 6(1), Article 6.

Johnson, R. (2018). Willingness of employers to hire individuals convicted of drug crimes in Central Virginia (Doctoral dissertation). ProQuest LLC. (13421921)

Manitz, J. (2017, May 21). samplingbook: Survey sampling procedures.

Wickham, H., Chang, W., Henry, L., Pedersen, T. L., Takahashi, K., Wilke, C., Woo, K., Yutani, H., Dunnington, D., & RStudio (2020, June 19). ggplot2: Create elegant data visualizations using the Grammar of Graphics.

DV not measured but part of the title and other issues…

In Transformational Leadership, Organizational Commitment and Taking Charge in Small Businesses in the Northwestern United States, Willis (2020) reports having measured three dimensions –

  • Transformational Leadership
  • Organizational Commitment
  • Taking Charge

The emerging scholar used the MLQ-5X Short Form to measure transformational leadership, and the Three-Component Model (TCM) of commitment Employee Commitment Survey to measure organizational commitment (pp. 61-67). Both are widely used in social science research. I was intrigued about how the “Taking Charge” dimension would be measured. During my time reviewing student dissertations, I had never seen the phrase operationalized. However, I was surprised to not find a discussion about it in the methodology section, although it is prominently listed in RQ2: What is the relationship between transformational leadership and organizational employee take-charge behavior? (emphasis added).

After a search for the phrase in the document, I found an interesting passage –

The instruments used by Morrison and Phelps and Kim and Lui ultimately did not measure take charge behavior as defined in this study. No instrument was found to measure take charge behavior adequately.

Willis, 2020, p. 27

So, the phrase has been defined, operationalized, and written about in peer-reviewed journals by Morrison and Phelps (1999), and used and confirmed by Kim and Lui (2017); however, that didn’t match the definition adopted by the emerging scholar (and the committee)? Why not simply change the definition to match other researchers? At a minimum, create an instrument that matches the new operationalized definition? Regardless, why keep the phrase in the title for search engines to find? Post-publication review begins!

Willis cited Love and Dustin (2014) as the source for this definition of Taking Charge –

The efforts both voluntary and constructive in the nature and exertion of the individual employee’s desire to affect change within the organization about how tasks are executed

Willis, 2020, p. 18

If Love and Dustin’s definition differed from Morrison and Phelps and Kim and Lui, why not use the instrument that Love and Dustin used in their study? Guess what? They measured the Taking Charge dimension using the Morrison and Phelps instrument!

I can’t believe somebody read my doctoral study

I don’t understand why the committee didn’t press the issue with the student (if they even read the study), and how the title and abstract, which state RQ2 was never answered, got through at least three reviewers. Now back to RQ1.

In RQ1, the emerging scholar explored the relationship between transformational leadership and organizational commitment. This is a common research question for doctoral students. To answer the question, one would follow a standard process –

  • Collect data via survey
  • Form the dimensions and subdimensions of inquiry by averaging items (e.g., add items a, b, c, & d and divide by 4). Don’t forget to reverse code when necessary!
  • Report descriptive statistics (M, SD, SE)
  • Perform exploratory data analysis such as examining outliers and making a decision about the distribution of each variable by looking at graphs and performing statistical tests
  • Perform a statistical test appropriate for interval variables such as Pearson Product-Moment Correlation or Spearman Rank-Order Correlation.
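The steps above can be sketched in base R as follows. Everything here is simulated: the item names, the 5-point scale, and the sample size are assumptions for illustration and have nothing to do with the actual MLQ-5X or TCM items.

```r
set.seed(2020)  # arbitrary seed so the simulated example is repeatable
n <- 120

# Step 1: "collect" data. Simulated 5-point Likert responses to hypothetical items.
latent <- rnorm(n)
likert <- function(x) pmin(5, pmax(1, round(3 + x)))
survey <- data.frame(
  tl_a = likert(latent + rnorm(n, sd = 0.7)),
  tl_b = likert(latent + rnorm(n, sd = 0.7)),
  tl_c = likert(-latent + rnorm(n, sd = 0.7)),  # negatively worded item
  oc_a = likert(0.5 * latent + rnorm(n, sd = 0.8)),
  oc_b = likert(0.5 * latent + rnorm(n, sd = 0.8))
)

# Step 2: reverse code the negatively worded item (1-5 scale), then average items.
survey$tl_c <- 6 - survey$tl_c
survey$tl <- rowMeans(survey[, c("tl_a", "tl_b", "tl_c")])
survey$oc <- rowMeans(survey[, c("oc_a", "oc_b")])

# Step 3: descriptive statistics (M, SD, SE) for each dimension.
describe <- function(x) c(M = mean(x), SD = sd(x), SE = sd(x) / sqrt(length(x)))
descriptives <- sapply(survey[, c("tl", "oc")], describe)

# Step 4: exploratory checks. Outliers via boxplot statistics, distribution via Shapiro-Wilk.
outliers <- lapply(survey[, c("tl", "oc")], function(x) boxplot.stats(x)$out)
normality <- sapply(survey[, c("tl", "oc")], function(x) shapiro.test(x)$p.value)

# Step 5: Pearson if both dimensions look roughly normal, otherwise Spearman.
method <- if (all(normality > 0.05)) "pearson" else "spearman"
result <- cor.test(survey$tl, survey$oc, method = method, exact = FALSE)

print(round(descriptives, 2))
print(result)
```

In a real study the Shapiro-Wilk p-values would be read alongside histograms and Q-Q plots rather than used as the sole switch between Pearson and Spearman; the automatic choice here is only to keep the sketch short.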

What was done? The emerging scholar did not report any descriptive statistics regarding the dimensions formed (if they were even formed), and there is no reporting on the variables’ distributions. A Chi-square test was used to reject the null hypothesis. A Chi-square test is used to examine the association of categorical variables; think of a 2 × 5 table of Gender (0/1) and Education (0:4). In other words, the wrong test was used. The emerging scholar merely reported the effect size (Cramér’s V) and p-value.
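For contrast, here is a minimal sketch of the situation a Chi-square test is designed for: two categorical variables cross-tabulated into counts. The data are simulated (the sample size and seed are arbitrary assumptions), and Cramér's V is computed by hand from its usual definition.

```r
set.seed(42)  # arbitrary seed for a repeatable simulation
n <- 200

# Two categorical variables: Gender coded 0/1 and Education coded 0 through 4.
gender <- factor(sample(0:1, n, replace = TRUE), levels = 0:1)
education <- factor(sample(0:4, n, replace = TRUE), levels = 0:4)

# Contingency table of counts: 2 rows (Gender) by 5 columns (Education).
tab <- table(gender, education)

# Chi-square test of association on the table of counts.
chi <- chisq.test(tab)

# Cramér's V: sqrt(chi-square / (n * (min(rows, columns) - 1))).
cramers_v <- sqrt(unname(chi$statistic) / (n * (min(dim(tab)) - 1)))

print(tab)
print(chi)
print(round(cramers_v, 3))
```

Averaged scale scores such as leadership and commitment dimensions are not counts in categories, which is why a correlation (or regression) is the natural choice for RQ1.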

Who reviewed this study? The review process let this student down. As a result, the results of this study should be ignored.

Student Note: Here’s a great chance to do some research; just do it correctly.


Liu, Y., Loi, R., & Lam, L. W. (2011). Linking organizational identification and employee performance in teams: The moderating role of team-member exchange. International Journal of Human Resource Management, 22

Love, M. S., & Dustin, S. L. (2014). An investigation of coworker relationship and psychological collectivism on employee propensity to take charge. The International Journal of Human Resource Management, 25(9), 1208-1226.

Morrison, E. W., & Phelps, C. C. (1999). Taking charge at work: Extrarole efforts to initiate workplace change. Academy of Management Journal, 42(4), 403–419.

Willis, S. G. (2020). Transformational leadership, organizational commitment and taking charge in small businesses in the Northwestern United States (Doctoral dissertation). ProQuest Dissertations & Theses Global: The Humanities and Social Sciences Collection. (27998399)

P-P plot vs. Q-Q plot…

I noticed a pattern at one University. The students in the business program were using a P-P plot to examine the distribution of residuals in regression models, when a Q-Q plot is widely referenced in statistics textbooks. So I looked deeper into the differences between P-P and Q-Q plots with simulated data.

Data Creation

First, I used R to create a normally distributed data set (N = 100) with an M = 3.0 and an SD = 1.0.

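# Store the simulated values in a data frame; later snippets refer to test$ndata
test <- data.frame(ndata = rnorm(n = 100, mean = 3.0, sd = 1.0))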

Review of Histogram

Next, I used ggplot2 (Wickham et al., 2020) to create a histogram of the data.

The data looks approximately normal; however, note the distance between the two tails and the other data points.
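A histogram of this kind could be drawn along the following lines. This is a sketch only: the seed, bin count, and axis labels are my assumptions (the post does not report them), so the picture will not match the original figure.

```r
set.seed(123)  # arbitrary seed (an assumption; the post does not report one)
test <- data.frame(ndata = rnorm(n = 100, mean = 3.0, sd = 1.0))

# Draw the histogram only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data = test, mapping = aes(x = ndata)) +
    geom_histogram(bins = 15, fill = "grey70", colour = "white") +
    labs(x = "Simulated value", y = "Count")
  print(p)
}
```

Changing `bins` can make the same data look more or less normal, which is one reason a histogram alone is weak evidence about a distribution.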

Tests of Normality

Next, I’ll perform a series of statistical tests to see if the data follows a theoretical normal distribution. For this illustration, I’ll use six different tests: Shapiro-Wilk test (found in the stats package, which is loaded automatically by R), Anderson-Darling, Cramer-von Mises, Kolmogorov-Smirnov w/Lilliefors correction, Pearson Chi-Square, and Shapiro-Francia (found in the nortest package; Gross & Ligges, 2015).


Shapiro-Wilk normality test

data: test$ndata
W = 0.99155, p-value = 0.02226

Anderson-Darling normality test

data: test$ndata
A = 0.72545, p-value = 0.05808

Cramer-von Mises normality test

data: test$ndata
W = 0.10841, p-value = 0.0859

Lilliefors (Kolmogorov-Smirnov) normality test

data: test$ndata
D = 0.042662, p-value = 0.07763

Pearson chi-square normality test

data: test$ndata
P = 62.88, p-value = 1.344e-06

Shapiro-Francia normality test

data: test$ndata
W = 0.99237, p-value = 0.03868
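Output of this form comes from calls along the following lines. The sketch re-simulates the data with an arbitrary seed, so its statistics and p-values will differ from those reported above; the nortest calls are skipped if the package is not installed.

```r
set.seed(123)  # arbitrary seed (an assumption; the post does not report one)
test <- data.frame(ndata = rnorm(n = 100, mean = 3.0, sd = 1.0))

# Shapiro-Wilk ships with base R (stats package).
results <- list(shapiro_wilk = shapiro.test(test$ndata))

# The remaining five tests come from the nortest package.
if (requireNamespace("nortest", quietly = TRUE)) {
  results$anderson_darling <- nortest::ad.test(test$ndata)
  results$cramer_von_mises <- nortest::cvm.test(test$ndata)
  results$lilliefors <- nortest::lillie.test(test$ndata)
  results$pearson_chi_square <- nortest::pearson.test(test$ndata)
  results$shapiro_francia <- nortest::sf.test(test$ndata)
}

# Collect the p-values side by side for comparison.
p_values <- sapply(results, function(x) x$p.value)
print(round(p_values, 4))
```

Lining the p-values up in one vector makes disagreements between tests easy to spot.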

Interesting…three of the tests (Anderson-Darling, Cramer-von Mises, and Kolmogorov-Smirnov w/Lilliefors correction) found the distribution to follow a theoretical normal distribution (p > .05), while three others (Shapiro-Wilk, Pearson Chi-square, and Shapiro-Francia) did not. What to do?

One could pick a test and make a decision, but the histogram and test may demonstrate to the reader that the decision was subjective. Let’s try to plot the data against a theoretical normal distribution.

The P-P Plot

Using ggplot2 and qqplotr (Almeida et al., 2020), I created a P-P plot based on the data and plotted a 95% CI band on the AB line –

library(ggplot2)
library(qqplotr)

ggplot(data = test, mapping = aes(sample = ndata)) +
  stat_pp_band() +
  stat_pp_line() +
  stat_pp_point() +
  labs(x = "Probability Points", y = "Cumulative Probability")

Note the “submarine sandwich” 95% CI band around the data. A P-P plot focuses on the skewness or asymmetry of the distribution. Thus, the mode is magnified. If relying on a P-P plot, an emerging researcher could rely on some of the statistical tests to state that the distribution follows a normal distribution and use a P-P plot to support that conclusion.

The Q-Q Plot

Next, let’s plot a Q-Q plot using the same parameters –

ggplot(data = test, mapping = aes(sample = ndata)) +
  stat_qq_band() +
  stat_qq_line() +
  stat_qq_point() +
  labs(x = "Theoretical Quantiles", y = "Sample Quantiles")

Interesting. In the Q-Q plot, points at both tails deviate from the 95% CI of a theoretical normal distribution. A Q-Q plot magnifies deviations at the tails. Thus, if an emerging scholar was looking at a Q-Q plot with certain tests of normality, one could decide that a residual or a variable did (or did not) follow a normal distribution.

It appears a P-P plot is best when used to explore extremely peaked distributions, while a Q-Q plot is best used to explore the influence of tails of a distribution.
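One way to see why the two plots emphasize different regions is to compute their coordinates by hand. The base R sketch below is an illustration under my own assumptions (arbitrary seed, one observation deliberately pushed into the upper tail, the helper name `plot_coords`); it is not how qqplotr builds its plots internally.

```r
set.seed(123)  # arbitrary seed
x <- rnorm(n = 100, mean = 3.0, sd = 1.0)
x[which.max(x)] <- max(x) + 3   # push one observation far into the upper tail

# plot_coords is a hypothetical helper that returns the points each plot would draw.
plot_coords <- function(x) {
  x <- sort(x)
  p <- ppoints(length(x))   # plotting positions in (0, 1)
  m <- mean(x)
  s <- sd(x)
  list(
    # Q-Q: sample quantiles against theoretical normal quantiles (data units).
    qq = data.frame(theoretical = qnorm(p, m, s), sample = x),
    # P-P: empirical cumulative probabilities against theoretical ones (0 to 1).
    pp = data.frame(theoretical = pnorm(x, m, s), empirical = p)
  )
}

coords <- plot_coords(x)

# Largest gap from the identity line, expressed as a share of each plot's axis range.
qq_gap <- max(abs(coords$qq$sample - coords$qq$theoretical)) / diff(range(x))
pp_gap <- max(abs(coords$pp$empirical - coords$pp$theoretical))  # axis range is 1

print(round(c(qq_gap = qq_gap, pp_gap = pp_gap), 3))
```

Cumulative probabilities are squeezed toward 0 and 1 in the tails, so an extreme value barely moves a P-P point, while the same value is plotted at full distance in data units on a Q-Q plot.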

Why is a P-P plot chosen more frequently at this school?

I corresponded with a methodologist at this University and she shared a few thoughts –

  • Many universities (and students) use SPSS in their coursework. In the regression menu option, there is a Probability Plot option box. If checked, it creates a P-P plot. Note: A Q-Q plot is not offered within the regression menu. See this link on how to create a Q-Q plot from regression residuals in SPSS.
  • Field (2018) is used as the associated textbook when teaching SPSS in doctoral business programs. The author prominently discusses P-P plots in this version of the textbook. Note: He also covers Q-Q plots, but in a more subtle way, and the discussion is buried in a graphics section. When found, the author refers to an earlier discussion on quantiles and quartiles. In the R version of the book (Field et al., 2012), the Q-Q plot is referenced and there is no reference to a P-P plot.

Student Notes: Don’t be a slave to a single author’s view: Expand your knowledge by reading different points of view. Don’t be a slave to a menu-based system: Learn about the statistical tests, how they are interpreted, and what the plots represent.


Almeida, A., Loy, A., & Hofmann, H. (2020, February 4). qqplotr: Quantile-quantile plot extensions for ‘ggplot2’.

Field, A. (2018). Discovering statistics using IBM SPSS Statistics (5th Ed.). SAGE Publications.

Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R. SAGE Publications.

Gross, J., & Ligges, U. (2015, July 29). nortest: Tests for normality.

Wickham, H., Chang, W., Henry, L., Pedersen, T. L., Takahashi, K., Wilke, C., Woo, K., Yutani, H., & Dunnington, D. (2020, June 19). ggplot2: Create elegant data visualizations using the Grammar of Graphics.