Alignment of themes to research question…

I started writing this blog post about priming interviewees in qualitative research. Once I got into writing, though, I realized I had simply found another poorly performed qualitative study. Still, I did want to discuss aligning research-deduced themes with research questions. Here’s the study –

Job Satisfaction and Job-Related Stress among NCAA Division II Athletic Directors in Historically Black Colleges and Universities

Name withheld (but you can search for the study)

I’ve been involved with many students who are exploring job satisfaction and job-related stress in a variety of industries, but I’ve never heard of a study on this topic among university athletic directors (ADs). What surprised me was that the study wasn’t quantitative; it was qualitative.

The emerging scholar’s overarching research question was –

What strategies do ADs at HBCUs implement to manage departments with limited resources?

p. 14

What does the phrase ‘limited resources’ mean? It would seem that some form of quantitative measure would need to be used to separate athletic departments into categories based on resources. However, I found this sentence –

…there was an assumption that HBCU athletic directors would experience job dissatisfaction and job-related stress due to decreased funding, inadequate facility management, and inconsistent roster management

p. 19

Wow! This statement makes it easy for a researcher…I’ll just assume something is happening whether true or not.

Now, a quick note about priming. The interview guide can be found in Appendix C of the dissertation. Honestly, it’s not really an interview guide. The student employed the ‘oral survey’ Q&A approach often suggested by faculty who have a limited understanding of qualitative data collection methodologies. Rather than critique the self-described “interview questions,” I will point out one issue –

Q3 – What strategies have you implemented to motivate your staff and thereby increase job satisfaction?

p. 133

This question requires the interviewee to –

  • Understand the word strategy or, at a minimum, understand the researcher’s definition of the term
  • Differentiate a strategy from a tactic
  • Reflect on how a strategy has been specifically applied to or influenced staff motivation
  • Reflect on staff responses to the strategy and subjectively estimate its influence on their own level of job satisfaction

In other words, the emerging scholar placed the responsibility for the study’s results on the interviewee responses, not on the interpretation of the responses. Ugh!

What would have happened if the emerging scholar simply started with –

  • How do you motivate your employees?
  • How do your employees respond to the techniques you employ to motivate?
  • When do you decide to change methods?

The aforementioned approach allows the interviewees to describe the methods they use to motivate employees, which would then be analyzed by the emerging scholar as a strategy or tactic. Each motivational technique could be explored in-depth by follow-up questions and, subsequently, tied back to the literature. Next, the emerging scholar could explore in-depth with the interviewee the responses by employees. Did the description provided by the interviewee align with the expectations found in the literature? Finally, discussing a change in methods and its impetus could result in an alignment with the research question.

When I finally got to the themes, I chuckled:

  • Shared responsibility – “participants believed the workplace demands they face daily do not allow them to have the ability to make all decisions for the department. Having shared responsibilities among other leaders within the department was essential for each athletic director” (p. 97). Every job has some level of work demand. Some demands are based on the lack of resources (e.g., human capital), some are not (e.g., heavy lifting). In the academic literature, sharing responsibility within an organizational unit is the tenet of work-based teams. It would seem the study participants are simply employing widely-referenced management techniques. However, since the emerging scholar assumed all HBCU ADs face limited resources, this had to be a theme.
  • Empowering staff – The emerging scholar didn’t describe the meaning of this phrase; rather, paraphrased material was listed from external sources (two sources cited weren’t listed in the References). However, similar to shared responsibility, employee empowerment is an oft-studied topic in the literature.
  • Limited resources to grow facilities – The term ‘resources’ in this context relates to financial resources. ADs are often held accountable for promotion of their programs; however, how much of that job is part of their normal duties? Based on how the emerging scholar phrased the research question, this theme is not aligned with the research question.
  • Limited female participation – The emerging researcher delved into gender equity, the recruitment of females to play sports, and the balance between males and females in sports. This topic relates to recruitment, probably more about society than management…again unrelated to the research question.

In the emerging scholar’s biography, she stated that she works for an HBCU athletic department, so I acknowledge the interest. She also stated that she would like to pursue an athletic department job. That’s great! If you, too, are an emerging researcher and you look at this study for references, that’s fine…just be wary of citing these results. Redo the research.



I was asked by a colleague to review a nearly completed doctoral manuscript to opine on a chairperson’s recommendation to the student on how to address a small sample size. According to the student’s use of G*Power, a sample size of 67 was required.

While the student received 67 responses, only 46 (68.7%) were usable. In an attempt to help the student, the chairperson recommended the student (a) randomly select 4 records from the 46, and (b) add them back to the sample (increasing the sample from 46 to 50). Why 4? Why not 3? 5? Why not duplicate the entire sample and have an N = 92? With an N = 92, one could find an r = .327 (assuming statistical power of .95). The student couldn’t answer those questions as he was merely following the instructions of the faculty. That is a topic for another day…
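To see why padding a sample with copies of its own records is a problem, here is a minimal sketch in base R. The data are simulated purely for illustration (the seed, the variable names, and the 0.25 slope are my assumptions, not the student's data). Duplicating every record leaves r unchanged but shrinks the p-value, because the test treats the copies as if they were new, independent observations.

```r
# Hypothetical illustration: simulated data, not the student's records
set.seed(1)
n <- 46
x <- rnorm(n)
y <- 0.25 * x + rnorm(n)
original <- data.frame(x = x, y = y)

# "Increase" the sample by stacking the same 46 records twice (N = 92)
doubled <- rbind(original, original)

test_original <- cor.test(original$x, original$y)
test_doubled  <- cor.test(doubled$x, doubled$y)

# Same effect size, smaller p-value: no new information was collected
round(c(r_original = unname(test_original$estimate),
        r_doubled  = unname(test_doubled$estimate)), 3)
round(c(p_original = test_original$p.value,
        p_doubled  = test_doubled$p.value), 4)
```

The effect size is the honest number here; the smaller p-value is purely an artifact of counting the same people twice.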

What could the student have done?

Option 1: Use what you have

Do the study with the 46 records. If one reduces their view of statistical power to that of the widely-referenced .80 associated with social science (see the seminal work of the late Jacob Cohen), an r = .349 (below the hypothesized effect size) could still be found.
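For readers without G*Power, a rough cross-check is possible with the Fisher z approximation. The sketch below is only an approximation under that assumption, so its answers can differ by a few records from G*Power's exact calculation; the function name and its arguments are mine, for illustration only.

```r
# Approximate sample size needed to detect a correlation r
# (Fisher z approximation; an assumption, not G*Power's exact method)
approx_n_for_r <- function(r, power = 0.80, alpha = 0.05, two_sided = TRUE) {
  z_alpha <- if (two_sided) qnorm(1 - alpha / 2) else qnorm(1 - alpha)
  z_beta  <- qnorm(power)
  ceiling(((z_alpha + z_beta) / atanh(r))^2 + 3)
}

# Hypothesized effect size from the student's study, at two power levels
approx_n_for_r(0.377, power = 0.95, two_sided = FALSE)
approx_n_for_r(0.377, power = 0.80, two_sided = FALSE)
```

The point of the comparison is the direction, not the exact counts: relaxing power from .95 to .80 lowers the sample needed to detect the same effect.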

Whatever effect size is found by the student, the value can be compared to the effect size hypothesized (r = .377), and differences explored by the student in the study. A learned faculty would suggest the student report 95% Confidence Intervals (CI), and compare the hypothesized effect size to the upper and lower CI. If the hypothesized values are within the range of the CI, then it’s probably a sampling error issue. If the hypothesized values are NOT within the range of the CI, either the hypothesized effect size was in error or something is unique in the sample and more exploration is needed.
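The comparison described above can be sketched in base R using the Fisher z transformation. The function name is mine, and the observed value plugged in (r = .398, N = 46) is the simulated result reported later in this post, used here only as a stand-in for whatever the student actually finds.

```r
# 95% CI for a Pearson correlation via the Fisher z transformation
r_confint <- function(r, n, level = 0.95) {
  z      <- atanh(r)
  se     <- 1 / sqrt(n - 3)
  margin <- qnorm(1 - (1 - level) / 2) * se
  tanh(c(lower = z - margin, upper = z + margin))
}

# Stand-in observed value versus the hypothesized effect size
ci <- r_confint(0.398, 46)
hypothesized_r <- 0.377
inside <- hypothesized_r >= ci["lower"] && hypothesized_r <= ci["upper"]

round(ci, 3)
inside
```

With these inputs the interval rounds to (.122, .617), so the hypothesized r = .377 falls inside it, which is the "probably sampling error" branch of the reasoning above.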

Option 2: Bootstrapping

Bootstrapping uses sampling with replacement to allow an inference to be made about a population (Efron, 1979). Bootstrapping is used to assess uncertainty by “resampling data, creating a set of simulated datasets that can be used to approximate some aspects of the sampling distribution, thus giving some sense of the variation that could be expected if the data collection process had been re-done” (Gelman et al., 2021, p. 74). Using bootstrapping, I’m hypothesizing that the test statistic will be within the 95% CI, but the CIs will be wider than those of the original data set, and the distribution of effect sizes will approximate a normal distribution.
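Stripped of any package, the idea of sampling with replacement looks like the minimal base R sketch below. The data are simulated here and are not the data set used in this post; the seed, the number of replicates, and the percentile interval are my assumptions for illustration.

```r
# Hypothetical data: 46 simulated records (not the data set used in this post)
set.seed(2021)
n <- 46
x <- rnorm(n)
y <- 0.25 * x + sqrt(1 - 0.25^2) * rnorm(n)

# Each replicate redraws n row indices WITH replacement and recomputes r
n_replicates <- 1000
boot_r <- replicate(n_replicates, {
  i <- sample.int(n, size = n, replace = TRUE)
  cor(x[i], y[i])
})

# Percentile 95% CI: the middle 95% of the bootstrap estimates
boot_ci <- quantile(boot_r, probs = c(0.025, 0.975))
boot_ci
```

Every replicate still contains 46 records; what grows with more replicates is the number of simulated datasets, not the amount of data collected.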

To illustrate, I simulated a 46-record data set with two variables that had a projected relationship of r = .25 using the faux package in R (DeBruine et al., 2021). Rather than assume a directional hypothesis, I used the standard NHST assumption (two-sided test). The relationship between the two variables was positively sloped, moderate in size, and statistically significant, r(44) = .398, p = .006, 95% CI (.122, .617).

Next, I bootstrapped three different sample sizes (50, 67, and 1000) using the boot package (Canty & Ripley, 2021). The N = 50 represented what the faculty was trying to recommend. The N = 67 was the originally hypothesized sample size, and the N = 1000 is the sample size widely used in bootstrapping. Note that in the boot package these values are passed as R, the number of bootstrap replicates; each replicate resamples all 46 records with replacement. The following table displays the results –

Data Set                              Effect Size (r)   95% CI Lower   95% CI Upper
Simulated Data (N = 46)               .398              .122           .617
Bootstrapped Simulation (N = 50)      .360              .105           .646
Bootstrapped Simulation (N = 67)      .360              .109           .608
Bootstrapped Simulation (N = 1000)    .360              .093           .638

Since bootstrapping involves sampling, it’s possible the test statistic may vary between iterations. To address that, I set the R set.seed() command to “040121”.

Note in the table how the confidence intervals change as the sample increases. A better way to view the differences is to look at the density function between the three bootstrapped samples –

Note how the distribution base ‘fattens’ as the sample size increases from N = 50 to N = 67. Finally, note how the effect size distribution becomes normally distributed with an N = 1000. Regardless, there is always a chance that the true effect size is less than or equal to 0, as depicted by the lower CI meeting or crossing the dotted red line (representing r = 0).

I speculate the chairperson was trying to suggest bootstrapping to the student. Either the faculty didn’t have sufficient knowledge of bootstrapping to guide the student, or the concept of bootstrapping was above the ability of the student to comprehend. I also speculate that faculty was trying to address a high p-value issue. Since the calculation of p-value is based on the sample size, there is nothing a student or faculty can do when a sample size is small except focus on the size of the effect. Perhaps that is what truly was lost in translation. Students, and the faculty advising them, need to understand that it’s not the p-value that is necessarily important but the effect size.
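To make the link between p-value and sample size concrete, the sketch below computes the smallest correlation that would reach two-sided significance at a given N, using the standard t test for a Pearson r. The function name and the sample sizes chosen are mine, for illustration only.

```r
# Smallest |r| that reaches two-sided significance at a given sample size
critical_r <- function(n, alpha = 0.05) {
  t_crit <- qt(1 - alpha / 2, df = n - 2)
  t_crit / sqrt(t_crit^2 + n - 2)
}

sample_sizes <- c(20, 46, 67, 200)
critical_values <- sapply(sample_sizes, critical_r)
round(critical_values, 3)
```

The threshold falls as N grows, so the same effect size can be "significant" in one study and "not significant" in another. That is why the effect size, reported with its CI, is the more useful thing to discuss.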

I suspect faculty and students will see more and more low sample sizes over the next year or so as people are fatigued or disinterested in completing surveys (thanks to COVID-19). Students need to be prepared to find a larger population to sample to counter potentially lower than expected response rates.


Canty, A., & Ripley, B. (2021, February 12). boot: Bootstrap functions.

DeBruine, L., Krystalli, A., & Heiss, A. (2021, March 27). faux: Simulation for factorial designs.

Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7(1), 1-26.

Gelman, A., Hill, J., & Vehtari, A. (2021). Regression and other stories. Cambridge University Press.

Code Snippet used in analysis:

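# load the packages used below (faux simulates the data, boot runs the bootstrap)
library(faux)
library(boot)
# set random number seed to allow recreation ("040121" in the text above)
set.seed(40121)
# create a 46-record dataset with an r = 0.25
df <- rnorm_multi(46, 2, r = 0.25)
# perform correlation with original 46 records
cor_value <- cor.test(df$X1, df$X2, method = "pearson")
# bootstrap statistic: correlation of the resampled rows
# (Spearman here versus Pearson above, which presumably explains .360 vs .398)
boot_cor <- function(data, i) {
  cor(data[i, "X1"], data[i, "X2"], method = "spearman", use = "complete.obs")
}
# bootstrap with N = 50 (passed to boot() as R, the number of replicates)
boot_example1 <- boot(df, statistic = boot_cor, R = 50)
# 95% CIs; boot.ci() is assumed here, given the surviving type argument
boot.ci(boot_example1, type = c("norm", "basic", "perc", "bca"))
# density of the bootstrap estimates; the plot() call is an assumed reconstruction
plot(density(boot_example1$t[, 1]), main = "Bootstrap (N = 50)")
# red dashed = 0, blue solid = estimate, blue dashed = 95% CI limits
abline(v = c(0, .35998, 0.1054, 0.6461),
       lty = c("dashed", "solid", "dashed", "dashed"),
       col = c("red", "blue", "blue", "blue"))
# bootstrap with N = 67
boot_example2 <- boot(df, statistic = boot_cor, R = 67)
boot.ci(boot_example2, type = c("norm", "basic", "perc", "bca"))
plot(density(boot_example2$t[, 1]), main = "Bootstrap (N = 67)")
abline(v = c(0, .35998, 0.1091, 0.6078),
       lty = c("dashed", "solid", "dashed", "dashed"),
       col = c("red", "blue", "blue", "blue"))
# bootstrap with N = 1000
boot_example3 <- boot(df, statistic = boot_cor, R = 1000)
boot.ci(boot_example3, type = c("norm", "basic", "perc", "bca"))
plot(density(boot_example3$t[, 1]), main = "Bootstrap (N = 1000)")
abline(v = c(0, .35998, 0.0932, 0.6382),
       lty = c("dashed", "solid", "dashed", "dashed"),
       col = c("red", "blue", "blue", "blue"))
# create three plots in one graphic (three stacked panels)
par(mfrow = c(3, 1))
plot(density(boot_example1$t[, 1]), main = "Bootstrap (N = 50)")
abline(v = c(0, .35998, 0.1054, 0.6461),
       lty = c("dashed", "solid", "dashed", "dashed"),
       col = c("red", "blue", "blue", "blue"))
plot(density(boot_example2$t[, 1]), main = "Bootstrap (N = 67)")
abline(v = c(0, .35998, 0.1091, 0.6078),
       lty = c("dashed", "solid", "dashed", "dashed"),
       col = c("red", "blue", "blue", "blue"))
plot(density(boot_example3$t[, 1]), main = "Bootstrap (N = 1000)")
abline(v = c(0, .35998, 0.0932, 0.6382),
       lty = c("dashed", "solid", "dashed", "dashed"),
       col = c("red", "blue", "blue", "blue"))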

Superficial vs Thorough Research…

I had interesting conversations with two colleagues, independently but about the same student. We chatted about how some students just touch the surface in their description of survey instruments (among other things), while others will dig deep to demonstrate thoroughness and thoughtfulness.

The student in question cited Campbell and Park (2017) as the source of a subjective-based measure to assess company (firm) performance. There was no discussion in the student’s study about the use of a subjective-based vs objective-based instrument; only that the instrument used in the study had a Cronbach’s alpha (α) of .87. That was it!

I have found that privately-held companies don’t like to share financial information with researchers. Thus, if a student goes down a path of surveying these types of businesses to collect objective data (e.g., sales, gross margin), I advise them they can expect to have a higher non-response rate than anticipated, potentially a larger number of items not answered, and should plan ($$$) to obtain a larger sample.

I had two questions –

Where did the use of subjective-based instruments to measure firm performance begin?

Are subjective-based instruments just as valid as objective-based instruments?

I couldn’t ask the student these questions, since he didn’t discuss it in his proposal. He probably doesn’t know. So, I started reading…starting with Campbell and Park (2017). Campbell and Park cited Campbell et al. (2011) and Runyan et al. (2008) on p. 305. Those references led me to Frazier (2000), Niehm (2002), Droge et al. (2004), Runyan et al. (2006), Richard et al. (2009), and (most importantly) Venkatraman and Ramanujam (1986). Let’s start there…

Venkatraman and Ramanujam (1986) explored ten different approaches to measuring business performance. They posited that business performance has two dimensions: financial v operational, and primary data sources v secondary data sources. Relating to this student’s study, the second dimension was of interest. Venkatraman and Ramanujam discussed the benefits and limitations of using primary and secondary data as a measure of business performance (p. 808). More importantly, they discussed the use of financial data from secondary sources and operational data from primary sources to “enlarge the conceptualization of business performance” (p. 811). For example, a gross margin of 65% could be conceptualized as doing better or worse than the competition. Makes sense…but where was this type of instrument used first?

Frazier (2000), citing Venkatraman and Ramanujam, wrote “subjective assessments of performance are generally consistent with secondary performance measures” (p. 53). Frazier appears to have created a three-item instrument to measure firm performance. The three items, measured on a 5-point Likert scale from poor (1) to excellent (5), were –

  • How would you describe the overall performance of your store(s) last year?
  • How would you describe your performance relative to your major competitors?
  • How would you describe your performance relative to other stores like yours in the industry?

Frazier reported an α = .84 (N = 112). Niehm (2002), using a similarly worded instrument, reported an α = .82 (N = 569). Runyan et al. (2008), citing Frazier (2000) and Niehm (2002), used the same instrument and reported an α = .82 (N = 267). However, what’s important to note is that Runyan et al. discussed an advantage that subjective questions have over objective questions – increased response rates – citing a study of one of the co-authors (Droge et al., 2004). Runyan et al. (2008) and Campbell et al. (2011) followed similar approaches and both reported an α = .87 (which could be an editorial error as the wording is similar). Campbell et al. (2011) also incorporated research performed by Richard et al. (2009) where the authors posit that the context of the study should dictate whether to use subjective or objective measures.
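For readers who have only ever seen α reported, here is a minimal sketch of how Cronbach’s alpha is computed for a three-item scale. The responses are simulated (the seed, the sample size of 112, and the noise level are my assumptions), so the resulting value says nothing about Frazier’s actual data.

```r
# Cronbach's alpha from a respondents-by-items matrix
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)
  item_variances <- apply(items, 2, var)
  total_variance <- var(rowSums(items))
  (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
}

# Hypothetical responses: three 5-point items driven by one latent rating
set.seed(42)
latent <- rnorm(112)
responses <- sapply(1:3, function(j) {
  raw <- round(3 + latent + rnorm(112, sd = 0.6))
  pmin(pmax(raw, 1), 5)  # keep answers on the 1 (poor) to 5 (excellent) scale
})

alpha_value <- cronbach_alpha(responses)
alpha_value
```

Alpha rises when the items move together, which is why it speaks to internal consistency rather than to whether the instrument measures firm performance validly.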

What did I learn in 2-3 hours of reading and writing –

  • Subjective-based instruments appear to be similar in validity to objective-based instruments, but we (researchers) should periodically confirm by issuing both and examining construct validity.
  • A subjective instrument could reduce non-response rates, which is always an issue in research and incredibly important in today’s COVID-19 world as companies and people appear to be over-surveyed and not responsive.
  • The three-item subjective-based instrument developed by Frazier (2000) appears to show consistent internal reliability across studies (α = .82 to .87).

I also reflected on typical responses from students when asked about their instrument –

  • Superficial Student – “This person used it. Why can’t I?”
  • Thorough Student – “What would you like to know?”


Campbell, J. M., Line, N., Runyan, R. C., & Swinney, J. L. (2010). The moderating effect of family-ownership on firm performance: An examination of entrepreneurial orientation and social capital. Journal of Small Business Strategy, 21(2), 27-46.

Campbell, J. M., & Park, J. (2017). Extending the resource-based view: Effects of strategic orientation toward community on small business practice. Journal of Retailing and Consumer Services, 34(1), 302-308.

Droge, C., Jayaram, J., & Vickery, S. K. (2004). The effects of internal versus external integration practices on time-based performance and overall performance. Journal of Operations Management, 22(6), 557-573.

Frazier, B. J. (2000). The influence of network characteristics on information access, marketing competence, and perceptions of performance in small, rural businesses (Doctoral dissertation: Michigan State University).

Niehm, L. S. (2002). Retail superpreneurs and their influence on small communities (Doctoral dissertation: Michigan State University).

Richard, P., Devinney, T., Yip, G., & Johnson, G. (2009). Measuring organizational performance: Towards methodological best practice. Journal of Management, 35(3), 718-804.

Runyan, R., Droge, C., & Swinney, J. (2008). Entrepreneurial orientation versus small business orientation: What are their relationships to firm performance? Journal of Small Business Management, 46(4), 567-588.

Runyan, R., Huddleston, P., & Swinney, J. (2006). Entrepreneurial orientation and social capital as small firm strategies: A study of gender differences from a resource-based view. The International Entrepreneurship and Management Journal, 2(4), 455-477.

Venkatraman, N., & Ramanujam, V. (1986). Measurement of business performance in strategy research: A comparison of approaches. Academy of Management Review, 11(4), 801-814.

Book Review: Evaluating Research in Academic Journals: A Practical Guide to Realistic Evaluation (Pyrczak & Tcherni-Buzzeo, 2019)

I’m back! I’ve taken off a few months to read, recharge, and frame some study proposals with colleagues. I have a lot of things in the hopper…

During my break, I had a chance to help a PhD student graduate. It took some heavy lifting (another blog post by itself), but he made it! During my discussion with the emerging scholar, he asked me a question –

Which books do you own that could help me?

Hmmm…I have many, but let me start a list.

First up: Evaluating Research in Academic Journals: A Practical Guide to Realistic Evaluation

The most recent 7th Edition is written by Maria Tcherni-Buzzeo of the University of New Haven and pays homage to the originator of the book, the late Fred Pyrczak (1945-2014). The authors focus on how to read everything from Abstracts, Introductions, and Literature Reviews through the Analysis and Results section, ending in the Discussion section. In a checklist/rubric format, the book provides items (with example narrative in most places) such as –

  • Are primary variables mentioned in the title? (p. 17)
  • Does the introduction move from topic to topic instead of from citation to citation? (p. 43)
  • If the response rate was low, did the researcher make multiple attempts to contact potential participants? (p. 67)
  • If any differences are statistically significant but substantively small, have the researchers noted they are small? (p. 123)

There are also specific sections in QUAN, QUAL, and MM research, which I have found invaluable.

This book is great for emerging scholars as they can apply it to learn how to critique academic research. It’s also great for chairpersons and people like me that critique research all day. It’s a must read (and buy!).

Research Question Error Types

If you review almost any research methods textbook, there will be an explanation of Type I and Type II errors and why they must be avoided when performing quantitative research. To refresh –

  • A Type I error occurs when the null hypothesis is rejected in error, and the alternative hypothesis is accepted. This is a big deal since what you are saying is that there is some difference or relationship between two variables when there is not.
    • A frequent cause of Type I errors in research performed by novices has to do with performing multiple tests with the same dependent variable. Using the widely-used p < .05 as the standard, what a researcher is saying is that they are willing to accept a 1 in 20 chance of error. If a dependent variable is examined more than once, then the 1 in 20 chance needs to be adjusted via a reduction in the accepted p-value, or the novice must accept that they are willing to make a Type I error. For example, if a dependent variable was tested 5 times, then there is roughly a 23% chance (1 - .95^5) of making at least one Type I error, assuming the tests are independent.
    • I’ve seen too many students and faculty not understand this concept, and when it is pointed out during a manuscript review or during a defense, it can be embarrassing for both the student and faculty. Bonferroni correction anybody?
  • A Type II error occurs when the null hypothesis is erroneously retained when it should be rejected. This is an error, but it’s not as bad as a Type I error. This situation causes one’s work to be referenced in the future as a need for future research (best case scenario), or as a study performed in error (worst case scenario).
    • A common cause for a Type II error is misinterpretation. Another culprit is low statistical power.
    • A novice researcher (and their faculty) should have a full understanding of how to perform a power analysis. The team should be aware of prior research in the area and perform a weighted average of prior effect size measures (e.g., Pearson’s r, Cohen’s d) or, at a minimum, hypothesize an estimated effect size BEFORE determining the required sample size. A study that doesn’t have a sufficient sample size to identify a hypothesized effect is called underpowered, and is a waste of time.
    • Conversely, using the wrong sampling method, such as a method for proportional sampling, might result in a sample size in excess of what is necessary to identify a hypothesized effect size. An overpowered study is a waste of resources and, in some domains, unethical.
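The arithmetic behind the multiple-testing point above can be sketched in a few lines of base R. The p-values below are made up for illustration, the function name is mine, and the calculation assumes the tests are independent.

```r
# Chance of at least one false positive across m independent tests
familywise_error <- function(m, alpha = 0.05) 1 - (1 - alpha)^m

familywise_error(5)   # five tests, each at p < .05
0.05 / 5              # Bonferroni-adjusted threshold per test

# Hypothetical p-values from five tests of the same dependent variable
p_values <- c(0.004, 0.020, 0.030, 0.045, 0.200)
adjusted <- p.adjust(p_values, method = "bonferroni")
adjusted
```

Four of the five made-up results clear p < .05 on their own, but only the first survives the Bonferroni adjustment, which is the kind of surprise better discovered before the defense than during it.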

How could Type I and II errors occur with computer software (e.g., R, SPSS, SAS, G*Power) readily available? Who knows? But, I want to explore two other types of errors that novice researchers make.

Type III Error

A Type III error is closely related to a Type I error. However, instead of rejecting the null hypothesis in error, the null hypothesis is rejected for the wrong reason. This type of error is not as severe as a Type I error since one arrives at the correct conclusion. Contributing factors to a Type III error are incorrect definition or operationalization of variables or poor theory. As stated by Schwartz and Carpenter (1999), a Type III error is a situation of obtaining the right answer to the wrong question.

Type IV Error

A Type IV error is also related to a Type III error. In fact, some scholars say it is a subset of the Type III error. Regardless, a Type IV error involves correctly rejecting the null hypothesis but misinterpreting the data. Common reasons are running the wrong test based on the data structure, collinearity in a regression model, or interpreting variables incorrectly (a three-level ordinal variable treated as interval).

To learn more about Type III and Type IV errors, see Gelman and Carlin (2014) for their discussion of Type S and Type M errors, Tate (2015) on Type III errors relating to mediation, MacKinnon and Pirlott (2014) for their discussion of Type IV errors relating to confounding in mediators, and Umesh et al. (1996) for Type IV errors in marketing research.


Gelman, A., & Carlin, J. (2014). Beyond power calculations: Assessing Type S (Sign) and Type M (Magnitude) errors. Perspectives on Psychological Science, 9(6), 641-651.

MacKinnon, D. P., & Pirlott, A. G. (2014). Statistical approaches for enhancing causal interpretation of the M to Y relation in mediation analysis. Personality and Social Psychology Review, 19(1), 30-43.

Schwartz, S., & Carpenter, K. M. (1999). The right answer for the wrong question: Consequences of Type III error for public health research. American Journal of Public Health, 89(8), 1175-1180.

Tate, C. U. (2015). On the overuse and misuse of mediation analysis: It may be a matter of timing. Basic and Applied Social Psychology, 37(4), 235-246.

Umesh, U. N., Peterson, R. A., McCann-Nelson, M., & Vaidyanathan, R. (1996). Type IV error in marketing research: The investigation of ANOVA interactions. Journal of the Academy of Marketing Science, 24(1), 17-26.