# Bootstrapping…

A colleague asked me to review a nearly completed doctoral manuscript and opine on a chairperson’s recommendation to the student about how to address a small sample size. According to the student’s use of G*Power, a sample size of 67 was required (see below):

While the student received 67 responses, only 46 (68.7%) were usable. In an attempt to help the student, the chairperson recommended that the student (a) randomly select 4 records from the 46, and (b) add them back to the sample (increasing the sample from 46 to 50). Why 4? Why not 3? 5? Why not duplicate the entire sample and have an N = 92? With an N = 92, one could find an r = .327 (assuming statistical power of .95). The student couldn’t answer those questions as he was merely following the instructions of the faculty. That is a topic for another day…

What could the student have done?

Option 1: Use what you have

Do the study with the 46 records. If one lowers the target statistical power to the widely referenced .80 associated with social science (see the seminal work of the late Jacob Cohen), an r = .349 (below the estimated effect size) could still be found (see below):
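As a rough cross-check on these figures, the smallest detectable correlation for a given sample size can be approximated with Fisher’s z transformation. The sketch below assumes a one-tailed test at α = .05 (an assumption, not a confirmed G*Power setting), and `min_detectable_r` is a hypothetical helper name. Because it relies on a normal approximation, it should land near, but not exactly on, the exact values G*Power reports.

```r
# Approximate minimum detectable correlation via Fisher's z transformation.
# Assumptions: one-tailed test, alpha = .05; min_detectable_r is a hypothetical helper.
min_detectable_r <- function(n, power = 0.80, alpha = 0.05, tails = 1) {
  z_alpha <- qnorm(1 - alpha / tails)  # critical value for the chosen alpha
  z_power <- qnorm(power)              # quantile matching the desired power
  tanh((z_alpha + z_power) / sqrt(n - 3))
}

r_46 <- min_detectable_r(46, power = 0.80)  # the 46 usable records
r_92 <- min_detectable_r(92, power = 0.95)  # the "duplicate everything" scenario
round(c(n46 = r_46, n92 = r_92), 3)
```

Setting `tails = 2` gives the two-sided version, which requires a larger correlation for the same power.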

Whatever effect size is found by the student, the value can be compared to the effect size hypothesized (r = .377), and differences explored by the student in the study. A learned faculty would suggest the student report 95% Confidence Intervals (CI), and compare the hypothesized effect size to the upper and lower CI. If the hypothesized value falls within the CI, any difference is probably a sampling error issue. If the hypothesized value does NOT fall within the CI, either the hypothesized effect size was in error or something is unique in the sample and more exploration is needed.
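The comparison described above can be sketched in a few lines of base R. This example builds a Fisher z confidence interval, using as the observed value the r = .398 from the simulated 46-record example later in this post; `r_ci` is a hypothetical helper name, not part of any package.

```r
# Fisher z confidence interval for a Pearson correlation (base R only).
# r_ci is a hypothetical helper name.
r_ci <- function(r, n, level = 0.95) {
  z    <- atanh(r)                     # Fisher z transform of r
  se   <- 1 / sqrt(n - 3)              # standard error on the z scale
  crit <- qnorm(1 - (1 - level) / 2)   # two-sided critical value
  tanh(z + c(-1, 1) * crit * se)       # back-transform to the r scale
}

r_observed     <- 0.398  # observed correlation in the simulated example
r_hypothesized <- 0.377  # effect size hypothesized by the student

ci <- r_ci(r_observed, n = 46)
inside <- r_hypothesized >= ci[1] && r_hypothesized <= ci[2]

round(ci, 3)
inside
```

Here the hypothesized value sits inside the interval, so the gap between .377 and .398 would be read as sampling error.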

Option 2: Bootstrapping

Bootstrapping uses sampling with replacement to allow an inference to be made about a population (Efron, 1979). Bootstrapping is used to assess uncertainty by “resampling data, creating a set of simulated datasets that can be used to approximate some aspects of the sampling distribution, thus giving some sense of the variation that could be expected if the data collection process had been re-done” (Gelman et al., 2021, p. 74). With bootstrapping, I hypothesize that the test statistic will fall within the 95% CIs, that the CIs will be wider than those of the original data set, and that the distribution of effect sizes will approximate a normal distribution.
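To make “sampling with replacement” concrete, here is a minimal base R sketch of the idea. The data are simulated stand-ins and the variable names are illustrative; it is not the analysis reported in this post, just the mechanics of resampling rows and recomputing a correlation.

```r
# A minimal bootstrap of a correlation in base R.
# The data are simulated stand-ins; variable names are illustrative.
set.seed(123)
n <- 46
x <- rnorm(n)
y <- 0.25 * x + sqrt(1 - 0.25^2) * rnorm(n)  # population correlation of 0.25

n_resamples <- 1000
boot_r <- replicate(n_resamples, {
  idx <- sample(n, size = n, replace = TRUE)  # draw n row indices with replacement
  cor(x[idx], y[idx])                         # recompute the statistic on the resample
})

# percentile 95% CI taken from the bootstrap distribution
ci <- quantile(boot_r, c(0.025, 0.975))
cor(x, y)
ci
```

Each resample has the same number of rows as the original data; only which rows appear, and how often, changes.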

To illustrate, I simulated a 46-record data set with two variables that had a projected relationship of r = .25 using the faux package in R (DeBruine et al., 2021). Rather than assume a directional hypothesis, I used the standard NHST assumption (two-sided test). The relationship between the two variables was positively sloped, moderate in size, and statistically significant, r(44) = .398, p = .006, 95% CI (.122, .617).

Next, I ran three bootstraps (N = 50, 67, and 1000) using the boot package (Canty & Ripley, 2021). In boot(), these values are passed as R, the number of bootstrap resamples; each resample is drawn with replacement from the 46 original records. N = 50 represents what the faculty was trying to recommend, N = 67 is the originally required sample size, and N = 1000 is a number widely used in bootstrapping. The following table displays the results –

Note in the table how the confidence intervals change as the sample increases. A better way to view the differences is to look at the density functions of the three bootstrapped samples –

Note how the distribution base “fattens” as the sample size increases from N = 50 to N = 67. Finally, note how the effect size distribution becomes normally distributed with an N = 1000. Regardless, there is always a chance that the true effect size is less than or equal to 0, as depicted by the lower CI meeting or crossing the dashed red line (representing r = 0).
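For readers who want to check what changes between the three runs, the sketch below uses `boot.array()` from the boot package to inspect the resampling indices. The data are simulated in base R as a stand-in (an assumption, so the numbers will not match the results above), and `cor_stat` is an illustrative name.

```r
library(boot)

# Simulated stand-in data: 46 records with a population correlation of 0.25.
set.seed(1)
n <- 46
x <- rnorm(n)
dat <- data.frame(X1 = x, X2 = 0.25 * x + sqrt(1 - 0.25^2) * rnorm(n))

# statistic: Pearson correlation of the resampled rows i
cor_stat <- function(data, i) cor(data[i, "X1"], data[i, "X2"])

reps <- c(50, 67, 1000)
fits <- lapply(reps, function(R) boot(dat, statistic = cor_stat, R = R))

# boot.array(indices = TRUE) returns one row per resample, one column per drawn record
dims <- t(sapply(fits, function(b) dim(boot.array(b, indices = TRUE))))

summary_tbl <- data.frame(
  R              = reps,
  n_per_resample = dims[, 2],
  lower          = unname(sapply(fits, function(b) quantile(b$t, 0.025))),
  upper          = unname(sapply(fits, function(b) quantile(b$t, 0.975)))
)
summary_tbl
```

The `n_per_resample` column stays at 46 in every row, while `R` controls how many resamples feed the density estimate and the percentile limits.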

I speculate the chairperson was trying to suggest bootstrapping to the student. Either the faculty didn’t have sufficient knowledge of bootstrapping to guide the student, or the concept of bootstrapping was above the ability of the student to comprehend. I also speculate that the faculty was trying to address a high p-value issue. Since the calculation of the p-value is based on the sample size, there is nothing a student or faculty can do when a sample size is small except focus on the size of the effect. Perhaps that is what truly was lost in translation. Students, and the faculty advising them, need to understand that it’s not the p-value that is necessarily important but the effect size.
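The dependence of the p-value on sample size is easy to see from the t statistic for a correlation, t = r·sqrt(n − 2) / sqrt(1 − r²). The sketch below is a base R illustration with a hypothetical helper, `p_for_r`; it holds the effect size fixed and varies only n.

```r
# Two-sided p-value for a Pearson correlation r from a sample of size n.
# p_for_r is a hypothetical helper name.
p_for_r <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)  # t statistic with n - 2 df
  2 * pt(-abs(t_stat), df = n - 2)
}

# the simulated example reported above: r(44) = .398
p_46 <- p_for_r(0.398, 46)

# same effect size (r = .25), different sample sizes
sizes  <- c(20, 46, 92)
p_by_n <- sapply(sizes, function(n) p_for_r(0.25, n))

round(p_46, 3)
round(setNames(p_by_n, paste0("n=", sizes)), 3)
```

The effect size never changes in the second calculation, yet the p-value shrinks as n grows, which is why the effect size and its interval deserve the attention.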

I suspect faculty and students will see more and more low sample sizes over the next year or so as people are fatigued or disinterested in completing surveys (thanks to COVID-19). Students need to be prepared to find a larger population to sample to counter potentially lower than expected response rates.

References:

Canty, A., & Ripley, B. (2021, February 12). boot: Bootstrap functions. https://cran.r-project.org/web/packages/boot/boot.pdf

DeBruine, L., Krystalli, A., & Heiss, A. (2021, March 27). faux: Simulation for factorial designs. https://cran.r-project.org/web/packages/faux/faux.pdf

Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7(1), 1-26. https://doi.org/10.1214/aos/1176344552

Gelman, A., Hill, J., & Vehtari, A. (2021). Regression and other stories. Cambridge University Press.

Code Snippet used in analysis:

```r
library(tidyverse)
library(faux)
library(car)
library(boot)
```
```r
# set random number seed to allow recreation
set.seed(040121)
# create a 46-record dataset with a target correlation of r = 0.25
df <- rnorm_multi(46, 2, r = 0.25)
# perform a Pearson correlation test on the original 46 records
cor_value <- cor.test(df$X1, df$X2, method = "pearson")
cor_value
```
```r
# bootstrap with N = 50 (passed to boot() as R, the number of resamples)
set.seed(040121)
boot_example1 <- boot(df,
  # statistic: Spearman correlation of the resampled rows i
  statistic = function(data, i) {
    cor(data[i, "X1"], data[i, "X2"], method = "spearman", use = "complete.obs")
  },
  R = 50
)
summary(boot_example1)
boot.ci(boot_example1, type = c("norm", "basic", "perc", "bca"))
plot(density(boot_example1$t))
# reference lines: r = 0 (red), then hard-coded estimate and CI limits (blue)
abline(v = c(0, .35998, 0.1054, 0.6461),
  lty = c("dashed", "solid", "dashed", "dashed"),
  col = c("red", "blue", "blue", "blue")
)
```
```r
# bootstrap with N = 67 (passed to boot() as R, the number of resamples)
set.seed(040121)
boot_example2 <- boot(df,
  # statistic: Spearman correlation of the resampled rows i
  statistic = function(data, i) {
    cor(data[i, "X1"], data[i, "X2"], method = "spearman")
  },
  R = 67
)
summary(boot_example2)
boot.ci(boot_example2, type = c("norm", "basic", "perc", "bca"))
plot(density(boot_example2$t))
# reference lines: r = 0 (red), then hard-coded estimate and CI limits (blue)
abline(v = c(0, .35998, 0.1091, 0.6078),
  lty = c("dashed", "solid", "dashed", "dashed"),
  col = c("red", "blue", "blue", "blue")
)
```
```r
# bootstrap with N = 1000 (passed to boot() as R, the number of resamples)
set.seed(040121)
boot_example3 <- boot(df,
  # statistic: Spearman correlation of the resampled rows i
  statistic = function(data, i) {
    cor(data[i, "X1"], data[i, "X2"], method = "spearman")
  },
  R = 1000
)
summary(boot_example3)
boot.ci(boot_example3, type = c("norm", "basic", "perc", "bca"))
plot(density(boot_example3$t))
# reference lines: r = 0 (red), then hard-coded estimate and CI limits (blue)
abline(v = c(0, .35998, 0.0932, 0.6382),
  lty = c("dashed", "solid", "dashed", "dashed"),
  col = c("red", "blue", "blue", "blue")
)
```
```r
# create three plots in one graphic (one row, three columns)
par(mfrow = c(1, 3))
# panel 1: N = 50
plot(density(boot_example1$t))
abline(v = c(0, .35998, 0.1054, 0.6461),
  lty = c("dashed", "solid", "dashed", "dashed"),
  col = c("red", "blue", "blue", "blue")
)
# panel 2: N = 67
plot(density(boot_example2$t))
abline(v = c(0, .35998, 0.1091, 0.6078),
  lty = c("dashed", "solid", "dashed", "dashed"),
  col = c("red", "blue", "blue", "blue")
)
# panel 3: N = 1000
plot(density(boot_example3$t))
abline(v = c(0, .35998, 0.0932, 0.6382),
  lty = c("dashed", "solid", "dashed", "dashed"),
  col = c("red", "blue", "blue", "blue")
)
```