# Luck, inadvertent omission, or lack of knowledge?

Johnson (2018) explored the willingness to hire people who were convicted of drug crimes. The scope of the study was limited to the Central Virginia region. To answer the first research question (How does the willingness to hire returning citizens by Central Virginia employers differ by position/job role in private sector, for-profit business firms?), the emerging scholar used descriptive measures (rather than inferential statistics).

Johnson stated that the null hypothesis could be rejected for three types of jobs: Unskilled, Semi-skilled Labor, and Skilled Labor. I suppose that the decision was based on point estimates above 50% (Figure 1).

However, when rejecting null hypotheses based on sample data, confidence intervals must be considered. Based on the information provided by the emerging scholar in his study, there were 653,193 businesses in the sample frame. A quota sample of 635 was chosen (p. 35). Using R and the samplingbook package (Manitz, 2017), that sample size corresponds to a margin of error of ±3.89% at the 95% confidence level (see below).

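```
library(samplingbook)  # provides sample.size.prop()

# Sample size needed for a margin of error e at the 95% confidence level
sample.size.prop(e = .0389, N = 653193, level = .95)

# Output:
# sample.size.prop object: Sample size for proportion estimate
# With finite population correction: N=653193, precision e=0.0389 and expected proportion P=0.5
#
# Sample size needed: 635
```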

I then recreated the graphic using the ggplot2 package (Wickham et al., 2020), and added the 95% CI (Figure 2).
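Since the figures are not reproduced here, the following is only a minimal sketch of how such a chart could be built. The proportions are hypothetical placeholders (not the values Johnson reported), the object names `plot_data` and `fig` are mine, and only the four job roles named in this post are included. The margin of error is the planned ±3.89%.

```r
# Hypothetical placeholder proportions; NOT the values reported by Johnson (2018).
plot_data <- data.frame(
  job = c("Unskilled", "Semi-skilled Labor", "Skilled Labor", "Clerical Labor"),
  p   = c(0.55, 0.65, 0.70, 0.45)
)

moe <- 0.0389  # planned margin of error at the 95% level (n = 635)
plot_data$lower <- plot_data$p - moe
plot_data$upper <- plot_data$p + moe

# Draw the chart only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  fig <- ggplot(plot_data, aes(x = job, y = p)) +
    geom_col() +                                              # point estimates
    geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2) +  # 95% CI
    geom_hline(yintercept = 0.5, linetype = "dashed") +       # 50% reference line
    labs(x = "Job role", y = "Proportion willing to hire")
  print(fig)
}
```

The dashed line at 0.5 makes it easy to see which error bars cross the 50% threshold.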

Okay. I see it. However, only 105 complete responses were obtained, not the target sample of 635. Using the same method as above, I backed into a margin of error of ±9.6% at the 95% confidence level (see below):

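```
library(samplingbook)  # provides sample.size.prop()

# Margin of error e chosen so that the required sample size matches the 105 responses
sample.size.prop(e = .096, N = 653193, level = .95)

# Output:
# sample.size.prop object: Sample size for proportion estimate
# With finite population correction: N=653193, precision e=0.096 and expected proportion P=0.5
#
# Sample size needed: 105
```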
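Rather than backing into the margin of error by trial and error, it can also be computed directly from the sample size. The sketch below uses base R only and assumes the standard normal-approximation formula for a proportion with a finite population correction and P = 0.5. The helper name `moe_prop` is hypothetical and not part of samplingbook.

```r
# Margin of error for a proportion, with finite population correction (FPC).
# moe_prop() is a hypothetical helper, not a samplingbook function.
moe_prop <- function(n, N, p = 0.5, level = 0.95) {
  z   <- qnorm(1 - (1 - level) / 2)   # critical value, about 1.96 for 95%
  fpc <- sqrt((N - n) / (N - 1))      # finite population correction
  z * sqrt(p * (1 - p) / n) * fpc
}

N <- 653193                           # businesses in the sample frame
planned <- moe_prop(n = 635, N = N)   # target quota sample
actual  <- moe_prop(n = 105, N = N)   # complete responses obtained

round(c(planned = planned, actual = actual), 4)
round(actual / planned, 2)            # how much wider the actual interval is
```

Under these assumptions the two margins come out at roughly 3.9% and 9.6%, a ratio of about 2.5.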

Thus, the margin of error grew from a planned ±3.89% to an actual ±9.6%, roughly a 2.5x increase in interval width. When the wider 95% CI is overlaid on the data, new perspectives emerge (Figure 3).

Visually, one can see that the emerging scholar is correct when stating that Semi-skilled Labor and Skilled Labor fall above the 50% line, even when accounting for the 95% CI. However, the error bar for Unskilled Labor (a) drops below 50%, and the error bar for Clerical Labor (d) rises above 50%. Should Unskilled Labor be omitted from the rejection? Should Clerical Labor be included in the rejection of the null hypothesis? It appears that both a Type I and a Type II error may have occurred.
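The visual check can also be written as a simple rule: a point estimate only supports a claim about the 50% line when its whole interval sits on one side of it. The sketch below is an illustration under that assumption. The helper `classify_interval` and the example proportions are hypothetical and are not taken from Johnson's data.

```r
# Classify an estimate by where its interval falls relative to a reference value.
# classify_interval() is a hypothetical helper for illustration only.
classify_interval <- function(p_hat, moe, reference = 0.5) {
  lower <- p_hat - moe
  upper <- p_hat + moe
  ifelse(lower > reference, "above",
         ifelse(upper < reference, "below", "inconclusive"))
}

moe_actual <- 0.096  # margin of error with 105 responses, 95% level

# Hypothetical point estimates, NOT the values reported in the dissertation.
p_hat <- c(clearly_above = 0.70, just_above = 0.55, just_below = 0.45)
result <- classify_interval(p_hat, moe_actual)
result
```

An estimate a few points above or below 50% ends up "inconclusive" at this margin of error, which is the situation described for Unskilled Labor and Clerical Labor.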

One note: The emerging scholar reported his findings as being similar to research performed in Sweden by Ahmed and Lang (2017). The authors wrote:

> We found that ex-offenders were discriminated against in the occupations accounting clerk, cleaner, preschool teacher, restaurant worker, sales person, and software developer. However, we did not observe any statistically significant discrimination against ex-offenders in the occupations auto mechanic, enrolled nurse, and truck driver.
>
> (Ahmed & Lang, 2017, p. 17)

Well, once the 95% CI is accounted for, the findings no longer line up. Also… Virginia = Sweden? That may be a stretch…

Student Note: Descriptive statistics are not inferential statistics. Know the difference.

References:

Ahmed, A., & Lang, E. (2017). The employability of ex-offenders: A field experiment in the Swedish labor market. IZA Journal of Labor Policy, 6(1), Article 6. https://doi.org/10.1186/s40173-017-0084-2

Johnson, R. (2018). Willingness of employers to hire individuals convicted of drug crimes in Central Virginia (Doctoral dissertation). ProQuest LLC. (13421921)

Manitz, J. (2017, May 21). samplingbook: Survey sampling procedures. https://cran.r-project.org/web/packages/samplingbook/samplingbook.pdf

Wickham, H., Chang, W., Henry, L., Pedersen, T. L., Takahashi, K., Wilke, C., Woo, K., Yutani, H., Dunnington, D., & RStudio (2020, June 19). ggplot2: Create elegant data visualizations using the Grammar of Graphics. https://cran.r-project.org/web/packages/ggplot2/ggplot2.pdf