Measuring perception…

A week ago, a doctoral business student was referred to me to discuss a specific research design. The primary research question was somewhat convoluted, but once edited, the focus was on measuring a group's behavior as perceived through the lens of another group (the sample), then relating that perception to the sample's perception of project performance. If one were to ask a survey participant a single question, it would be "What do you think the relationship is between A and B?" That type of question falls (loosely) into the category of public opinion research, not business research.

Perceptions (and attitudes) are often studied using a qualitative research methodology. The perception of something or someone is generally explored via interviews, where the researcher collapses groups of thoughts into themes that answer the research question. This approach is covered in some depth in many research methods and design textbooks.

In quantitative research, though, measuring perception focuses on the self-assessment of a sample. Examples include the Perceived Stress Scale, which measures the perception of one's own stress, and the Buss-Durkee Hostility Inventory, which measures aspects of hostility and guilt; both instruments were developed by psychologists.

Using a subject’s perception of another person is problematic due to cognitive bias, which involves systematic error in one’s perception of others. Within cognitive bias, three groups are especially relevant here –

  • Fundamental Attribution Error, which involves labeling people without sufficient information, knowledge, or time
  • Confirmation Bias, widely referred to as a common judgmental bias; research has shown that people focus on a small piece of information that confirms an already developed belief
  • Self-serving Bias, which involves perceiving a situation in a manner that casts the perceiver in a more positive light

How would you measure validity in the proposed study? Have the sample assess the behaviors of the other people, measure those behaviors directly, compare the two assessments for accuracy, and factor that accuracy into the study? That seems like a long way to go, and all you are really doing is measuring the assessment ability of the sample.

I don’t know who is at fault for not identifying this fundamental issue before the student’s research proposal was developed. Perhaps it was identified by faculty along the way and ignored by the student. Perhaps faculty didn’t really understand what the student was proposing because of how the research question was formed.

Bootstrapping…

A colleague asked me to review a nearly completed doctoral manuscript and opine on a chairperson’s recommendation to the student on how to address a small sample size. According to the student’s G*Power analysis, a sample size of 67 was required (see below):

While the student received 67 responses, only 46 (68.7%) were usable. In an attempt to help the student, the chairperson recommended the student (a) randomly select 4 records from the 46, and (b) add them back to the sample (increasing the sample from 46 to 50). Why 4? Why not 3? Or 5? Why not duplicate the entire sample and have an N = 92? With an N = 92, one could find an r = .327 (assuming statistical power of .95). The student couldn’t answer those questions, as he was merely following the instructions of the faculty. That is a topic for another day…
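For readers who want to check numbers like these themselves, here is a minimal sketch using the pwr package in R (my choice of tool; the student used G*Power, whose exact bivariate-normal test and tail settings can produce slightly different values). It solves for the smallest detectable correlation at several candidate sample sizes:

library(pwr)
# smallest detectable r at alpha = .05 (two-sided) and power = .95
# for several candidate sample sizes
for (n in c(46, 50, 67, 92)) {
  res <- pwr.r.test(n = n, sig.level = 0.05, power = 0.95)
  cat("n =", n, "-> minimum detectable r =", round(res$r, 3), "\n")
}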

What could the student have done?

Option 1: Use what you have

Do the study with the 46 records. If one relaxes statistical power to the widely referenced .80 used in the social sciences (see the seminal work of the late Jacob Cohen), an effect as small as r = .349 (below the hypothesized effect size of r = .377) could still be detected (see below):

Whatever effect size the student finds can be compared to the hypothesized effect size (r = .377), and the differences explored in the study. A learned faculty member would suggest the student report the 95% confidence interval (CI) and compare the hypothesized effect size to its lower and upper bounds. If the hypothesized value falls within the CI, the difference is probably a sampling-error issue. If the hypothesized value falls outside the CI, either the hypothesized effect size was in error or something is unique about the sample, and more exploration is needed.
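A minimal sketch of that comparison in R, using placeholder data (the variable names and values below are illustrative, not the student's):

# placeholder data standing in for the student's two variables (n = 46)
set.seed(1)
x <- rnorm(46)
y <- 0.35 * x + rnorm(46)

fit <- cor.test(x, y)   # Pearson r with a 95% CI by default
fit$estimate            # observed effect size
fit$conf.int            # 95% confidence interval
# If the hypothesized r = .377 falls inside fit$conf.int, sampling error is a
# plausible explanation for the difference; if it falls outside, revisit the
# hypothesized effect size or look for something unusual in the sample.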

Option 2: Bootstrapping

Bootstrapping uses sampling with replacement to allow an inference to be made about a population (Efron, 1979). Bootstrapping is used to assess uncertainty by “resampling data, creating a set of simulated datasets that can be used to approximate some aspects of the sampling distribution, thus giving some sense of the variation that could be expected if the data collection process had been re-done” (Gelman et al., 2021, p. 74). Using bootstrapping, I hypothesized that the test statistic would fall within the 95% CI, that the bootstrapped CIs would be wider than those from the original data set, and that the distribution of bootstrapped effect sizes would approximate a normal distribution.
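Before turning to the boot package, here is a bare-bones sketch of the resampling idea itself, using placeholder data (the full analysis code appears at the end of this post): draw rows with replacement from the observed data, recompute the statistic, and repeat many times.

set.seed(123)
dat <- data.frame(X1 = rnorm(46), X2 = rnorm(46))   # placeholder 46-row data set

boot_r <- replicate(1000, {
  idx <- sample(nrow(dat), replace = TRUE)   # resample rows with replacement
  cor(dat$X1[idx], dat$X2[idx])              # recompute the correlation
})

quantile(boot_r, c(0.025, 0.975))            # simple percentile 95% CI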

To illustrate, I simulated a 46-record data set with two variables that had a projected relationship of r = .25 using the faux package in R (DeBruine et al., 2021). Rather than assume a directional hypothesis, I used the standard NHST assumption (two-sided test). The relationship between the two variables was positively sloped, moderate in size, and statistically significant, r(44) = .398, p = .006, 95% CI (.122, .617).

Next, I ran the bootstrap with three different numbers of replicates (R = 50, 67, and 1000) using the boot package (Canty & Ripley, 2021); each replicate resamples the original 46 records with replacement. The R = 50 run mirrors the sample size the faculty member was trying to reach, the R = 67 run mirrors the originally hypothesized sample size, and R = 1000 is a widely used number of bootstrap replicates. The following table displays the results –

Data Set                              Effect Size (r)   95% CI Lower   95% CI Upper
Simulated Data (N = 46)               .398              .122           .617
Bootstrapped Simulation (R = 50)      .360              .105           .646
Bootstrapped Simulation (R = 67)      .360              .109           .608
Bootstrapped Simulation (R = 1000)    .360              .093           .638
Since bootstrapping involves random resampling, the estimates can vary from run to run. To make the analysis reproducible, I set the random number seed in R with set.seed(040121).

Note in the table how the confidence intervals change as the number of bootstrap replicates increases. A better way to view the differences is to compare the density plots of the three sets of bootstrapped estimates –

Note how the base of the distribution “fattens” as the number of replicates increases from R = 50 to R = 67. Finally, note how the distribution of bootstrapped effect sizes becomes approximately normal with R = 1000. Regardless, there is always a chance that the true effect size is less than or equal to 0, as depicted when the lower CI meets or crosses the dashed red line (representing r = 0).

I speculate the chairperson was trying to suggest bootstrapping to the student; either the faculty member didn’t have sufficient knowledge of bootstrapping to guide the student, or the concept was beyond the student’s ability to comprehend. I also speculate that the faculty member was trying to address a high p-value. Since the p-value depends heavily on the sample size, there is nothing a student or faculty member can do when a sample is small except focus on the size of the effect. Perhaps that is what truly was lost in translation. Students, and the faculty advising them, need to understand that it is not the p-value that is necessarily important but the effect size.
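A small illustration of that point, using hypothetical numbers: hold the observed correlation fixed and the p-value is driven almost entirely by the sample size (via the standard t transformation of r).

# same effect size, different sample sizes, very different p-values
r <- 0.30
for (n in c(20, 46, 200)) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)   # t statistic for a correlation
  p <- 2 * pt(-abs(t_stat), df = n - 2)       # two-sided p-value
  cat("n =", n, " p =", signif(p, 3), "\n")
}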

I suspect faculty and students will see more and more small samples over the next year or so as people grow fatigued by, or uninterested in, completing surveys (thanks to COVID-19). Students need to be prepared to sample from a larger population to counter lower-than-expected response rates.

References:

Canty, A., & Ripley, B. (2021, February 12). boot: Bootstrap functions. https://cran.r-project.org/web/packages/boot/boot.pdf

DeBruine, L., Krystalli, A., & Heiss, A. (2021, March 27). faux: Simulation for factorial designs. https://cran.r-project.org/web/packages/faux/faux.pdf

Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7(1), 1-26. https://doi.org/10.1214/aos/1176344552

Gelman, A., Hill, J., & Vehtari, A. (2021). Regression and other stories. Cambridge University Press.

Code Snippet used in analysis:

library(tidyverse)
library(faux)
library(car)
library(boot)
# set random number seed to allow recreation
set.seed(040121)
# create a 46-record dataset with two correlated variables (target r = 0.25)
df <- rnorm_multi(46, 2, r = 0.25)
# perform correlation with original 46 records
cor_value <- cor.test(df$X1, df$X2, method = "pearson")
cor_value
# bootstrap with R = 50 replicates (each resample draws 46 rows with replacement)
set.seed(040121)
boot_example1 <- boot(df, 
  statistic = function(data, i) {
    # note: Spearman's rho here, unlike the Pearson r reported for the original data
    cor(data[i, "X1"], data[i, "X2"], method = "spearman", use = "complete.obs")
  },
  R = 50
)
boot_example1   # bootstrap estimate, bias, and standard error
boot.ci(boot_example1, type = c("norm", "basic", "perc", "bca")) 
plot(density(boot_example1$t))
abline(v = c(0, .35998, 0.1054, 0.6461),
       lty = c("dashed", "solid", "dashed", "dashed"),
       col = c("red", "blue", "blue", "blue")
       )
# bootstrap with R = 67 replicates
set.seed(040121)
boot_example2 <- boot(df, 
  statistic = function(data, i) {
    cor(data[i, "X1"], data[i, "X2"], method = "spearman")
  },
  R = 67
)
boot_example2   # bootstrap estimate, bias, and standard error
boot.ci(boot_example2, type = c("norm", "basic", "perc", "bca")) 
plot(density(boot_example2$t))
abline(v = c(0, .35998, 0.1091, 0.6078),
       lty = c("dashed", "solid", "dashed", "dashed"),
       col = c("red", "blue", "blue", "blue")
       )
# bootstrap with R = 1000 replicates
set.seed(040121)
boot_example3 <- boot(df, 
  statistic = function(data, i) {
    cor(data[i, "X1"], data[i, "X2"], method = "spearman")
  },
  R = 1000
)
boot_example3   # bootstrap estimate, bias, and standard error
boot.ci(boot_example3, type = c("norm", "basic", "perc", "bca")) 
plot(density(boot_example3$t))
abline(v = c(0, .35998, 0.0932, 0.6382),
       lty = c("dashed", "solid", "dashed", "dashed"),
       col = c("red", "blue", "blue", "blue")
       )
# create three plots in one graphic
par(mfrow=c(1,3))
plot(density(boot_example1$t))
abline(v = c(0, .35998, 0.1054, 0.6461),
       lty = c("dashed", "solid", "dashed", "dashed"),
       col = c("red", "blue", "blue", "blue")
       )
plot(density(boot_example2$t))
abline(v = c(0, .35998, 0.1091, 0.6078),
       lty = c("dashed", "solid", "dashed", "dashed"),
       col = c("red", "blue", "blue", "blue")
       )
plot(density(boot_example3$t))
abline(v = c(0, .35998, 0.0932, 0.6382),
       lty = c("dashed", "solid", "dashed", "dashed"),
       col = c("red", "blue", "blue", "blue")
       )

Superficial vs Thorough Research…

I had interesting conversations with two colleagues, independently, but about the same student. We chatted about how some students just touch the surface in their description of survey instruments (among other things), while others dig deep to demonstrate thoroughness and thoughtfulness.

The student in question cited Campbell and Park (2017) as the source of a subjective measure of company (firm) performance. There was no discussion in the student’s study about using a subjective versus an objective instrument; only that the instrument used in the study had a Cronbach’s alpha (α) of .87. That was it!

I have found that privately held companies don’t like to share financial information with researchers. Thus, if a student goes down the path of surveying these types of businesses to collect objective data (e.g., sales, gross margin), I advise them to expect a higher non-response rate than anticipated and potentially more unanswered items, and to plan ($$$) to obtain a larger sample.

I had two questions –

Where did the use of subjective-based instruments to measure firm performance begin?

Are subjective-based instruments just as valid as objective-based instruments?

I couldn’t ask the student these questions, since he didn’t discuss the topic in his proposal. He probably doesn’t know. So, I started reading…starting with Campbell and Park (2017). Campbell and Park cited Campbell et al. (2011) and Runyan et al. (2008) on p. 305. Those references led me to Frazier (2000), Niehm (2002), Droge et al. (2004), Runyan et al. (2006), Richard et al. (2009), and (most importantly) Venkatraman and Ramanujam (1986). Let’s start there…

Venkatraman and Ramanujam (1986) explored ten different approaches to measuring business performance. They posited that business performance measurement has two dimensions: financial vs. operational measures, and primary vs. secondary data sources. For this student’s study, the second dimension was of interest. Venkatraman and Ramanujam discussed the benefits and limitations of using primary and secondary data as measures of business performance (p. 808). More importantly, they discussed using financial data from secondary sources and operational data from primary sources to “enlarge the conceptualization of business performance” (p. 811). For example, a gross margin of 65% could be conceptualized as doing better or worse than the competition. Makes sense…but where was this type of instrument first used?

Frazier (2000), citing Venkatraman and Ramanujam, wrote “subjective assessments of performance are generally consistent with secondary performance measures” (p. 53). Frazier appears to have created a three-item instrument to measure firm performance. The three items, measured on a 5-point Likert scale from poor (1) to excellent (5), were –

  • How would you describe the overall performance of your store(s) last year?
  • How would you describe your performance relative to your major competitors?
  • How would you describe your performance relative to other stores like yours in the industry?

Frazier reported an α = .84 (N = 112). Niehm (2002), using a similarly worded instrument, reported an α = .82 (N = 569). Runyan et al. (2008), citing Frazier (2000) and Niehm (2002), used the same instrument and reported an α = .82 (N = 267). However, what’s important to note is that Runyan et al. discussed an advantage that subjective questions have over objective questions – increased response rates – citing a study by one of the co-authors (Droge et al., 2004). Runyan et al. (2008) and Campbell et al. (2011) followed similar approaches and both reported an α = .87 (which could be an editorial error, as the wording is similar). Campbell et al. (2011) also incorporated research by Richard et al. (2009), in which the authors posit that the context of the study should dictate whether to use subjective or objective measures.
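For students wondering how those alpha values are produced, here is a hedged sketch in R using the psych package and simulated responses to a three-item, 5-point scale (the item names and data are stand-ins, not Frazier’s actual items or sample):

library(psych)

# simulate 200 respondents whose three item scores share a common factor
set.seed(123)
latent <- rnorm(200)
likert <- function(x) pmin(pmax(round(3 + x), 1), 5)   # squeeze onto a 1-5 scale
items <- data.frame(
  overall     = likert(latent + rnorm(200, sd = 0.7)),
  vs_rivals   = likert(latent + rnorm(200, sd = 0.7)),
  vs_industry = likert(latent + rnorm(200, sd = 0.7))
)

psych::alpha(items)$total$raw_alpha   # Cronbach's alpha for the three-item scale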

What did I learn in 2-3 hours of reading and writing –

  • Subjective-based instruments appear to be similar in validity to objective-based instruments, but we (researchers) should periodically confirm this by administering both and examining construct validity.
  • A subjective instrument could reduce non-response rates, which is always an issue in research and incredibly important in today’s COVID-19 world, as companies and people appear to be over-surveyed and unresponsive.
  • The three-item subjective instrument developed by Frazier (2000) appears to show consistent reliability across the studies that have reused it.

I also reflected on typical responses from students when asked about their instrument –

  • Superficial Student – “This person used it. Why can’t I?”
  • Thorough Student – “What would you like to know?”

References:

Campbell, J. M., Line, N., Runyan, R. C., & Swinney, J. L. (2010). The moderating effect of family-ownership on firm performance: An examination of entrepreneurial orientation and social capital. Journal of Small Business Strategy, 21(2), 27-46.

Campbell, J. M., & Park, J. (2017). Extending the resource-based view: Effects of strategic orientation toward community on small business practice. Journal of Retailing and Consumer Services, 34(1), 302-308. https://doi.org/10.1016/j.jretconser.2016.01.013

Droge, C., Jayaram, J., & Vickery, S. K. (2004). The effects of internal versus external integration practices on time-based performance and overall performance. Journal of Operations Management, 22(6), 557-573. https://doi.org/10.1016/j.jom.2004.08.001

Frazier, B. J. (2000). The influence of network characteristics on information access, marketing competence, and perceptions of performance in small, rural businesses (Doctoral dissertation, Michigan State University).

Niehm, L. S. (2002). Retail superpreneurs and their influence on small communities (Doctoral dissertation, Michigan State University).

Richard, P., Devinney, T., Yip, G., & Johnson, G. (2009). Measuring organizational performance: Towards methodological best practice. Journal of Management, 35(3), 718-804. https://doi.org/10.1177/0149206308330560

Runyan, R., Droge, C., & Swinney, J. (2008). Entrepreneurial orientation versus small business orientation: What are their relationships to firm performance. Journal of Small Business Management, 46(4), 567-588. https://doi.org/10.1111/j.1540-627x.2008.00257.x

Runyan, R., Huddleston, P., & Swinney, J. (2006). Entrepreneurial orientation and social capital as small firm strategies: A study of gender differences from a resource-based view. The International Entrepreneurship and Management Journal, 2(4), 455-477. https://doi.org/10.1007/s11365-006-0010-3

Venkatraman, N., & Ramanujam, V. (1986). Measurement of business performance in strategy research: A comparison of approaches. Academy of Management Review, 11(4), 801-814. https://doi.org/10.2307/258398

Book Review: Evaluating Research in Academic Journals: A Practical Guide to Realistic Evaluation (Pyrczak & Tcherni-Buzzeo, 2019)

I’m back! I’ve taken off a few months to read, recharge, and frame some study proposals with colleagues. I have a lot of things in the hopper…

During my break, I had a chance to help a PhD student graduate. It took some heavy lifting (another blog post by itself), but he made it! During my discussion with the emerging scholar, he asked me a question –

Which books do you own that could help me?

Hmmm…I have many, but let me start a list.

First up: Evaluating Research in Academic Journals: A Practical Guide to Realistic Evaluation

The most recent 7th edition is written by Maria Tcherni-Buzzeo of the University of New Haven and pays homage to the originator of the book, the late Fred Pyrczak (1945-2014). The authors focus on how to read everything from the Abstract, Introduction, and Literature Review through the Analysis and Results sections, ending with the Discussion section. In a checklist/rubric format, the book provides items (with example narrative in most places) such as –

  • Are primary variables mentioned in the title? (p. 17)
  • Does the introduction move from topic to topic instead of from citation to citation? (p. 43)
  • If the response rate was low, did the researcher make multiple attempts to contact potential participants? (p. 67)
  • If any differences are statistically significant but substantively small, have the researchers noted that they are small? (p. 123)

There are also specific sections on quantitative (QUAN), qualitative (QUAL), and mixed-methods (MM) research, which I have found invaluable.

This book is great for emerging scholars, as they can apply it to learn how to critique academic research. It’s also great for chairpersons and people like me who critique research all day. It’s a must-read (and a must-buy!).