Week 6: Scales & Indices, and Dimension Reduction
Reading
Chapter 17: Exploratory Factor Analysis, in Field, Miles, and Field (2012), Discovering Statistics Using R.
Learning Objectives
By the end of this class, students should be able to (1) define, (2) know when to use, (3) interpret R output for, and (4) with the assistance of methods101.com and Google, run the R commands for the following types of statistical analysis:
Lecture
The lecture is broken into four parts:
Exercise
Using the dataset for your project, complete the following three tasks. For each task, please post screenshots of your results (e.g. figures or tables) to the Google Doc here, along with two or three sentences explaining what is important, surprising, or interesting about the results.
External students and those who miss class: please post your answers to the blog on iLearn. Put your code and images in the body of the blog post, not in attachments, but do attach your dataset/s so that we can all follow along.
SOCI832: Overview: Week 6
Some handy code for cleaning data
Cheatsheet for Data Manipulation and Data Cleaning
Note useful functions like:
- dplyr::filter() - select rows that meet a criterion
- dplyr::distinct() - remove duplicate rows
- dplyr::select() - select columns by name
- dplyr::mutate() - make new variable
- dplyr::left_join() - joins matching rows on specified column
- dplyr::bind_rows() - binds rows to bottom of dataset
- dplyr::bind_cols() - binds columns to right side of dataset
- dplyr::group_by() - group data into rows with same values - can use to create multiple groups for summary statistics, or regressions
- tidyr::drop_na() - drop cases that have missing values in one or more variables
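The short sketch below shows how several of these functions can be chained together. It uses a small made-up data frame rather than the course dataset, so the names areas, regions, high_unemp, and all of the values are hypothetical and are there only to make each step easy to check by eye.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, invented for illustration only
areas <- data.frame(
  lga      = c("A", "B", "B", "C", "D"),
  unemploy = c(4.2, 6.1, 6.1, NA, 8.3),
  robbery  = c(10, 25, 25, 12, 40),
  stringsAsFactors = FALSE
)
regions <- data.frame(
  lga    = c("A", "B", "C", "D"),
  region = c("metro", "metro", "rural", "rural"),
  stringsAsFactors = FALSE
)

cleaned <- areas %>%
  dplyr::distinct() %>%                         # remove the duplicated row for "B"
  tidyr::drop_na() %>%                          # drop "C", which is missing unemploy
  dplyr::left_join(regions, by = "lga") %>%     # add region by matching on lga
  dplyr::mutate(high_unemp = unemploy > 5) %>%  # make a new logical variable
  dplyr::select(lga, region, unemploy, high_unemp, robbery)

# Keep only the rows that meet a criterion
high_only <- dplyr::filter(cleaned, high_unemp)

# Summary statistics within each group
by_region <- cleaned %>%
  dplyr::group_by(region) %>%
  dplyr::summarise(mean_robbery = mean(robbery), n = dplyr::n(), .groups = "drop")

# Add a new row to the bottom of the original data
extra <- data.frame(lga = "E", unemploy = 5.5, robbery = 18, stringsAsFactors = FALSE)
all_areas <- dplyr::bind_rows(areas, extra)

print(cleaned)
print(by_region)
```

Writing each call with its package prefix (for example dplyr::filter() rather than filter()) is optional once the packages are loaded, but it avoids picking up a function of the same name from another package, such as stats::filter().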
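# Load packages into memory
library(dplyr)
library(tidyr)
# Read the NSW LGA crime dataset
lga <- readRDS(url("https://methods101.com/data/nsw-lga-crime-clean.RDS"))
# Mean unemployment, used below to split the LGAs into two groups
mean_unemp <- mean(lga$unemploy, na.rm = TRUE)
# Model 1: robbery, all LGAs with complete data on the three variables
lga %>%
  dplyr::select(giniinc, unemploy, robbery) %>%
  tidyr::drop_na() %>%
  stats::lm(robbery ~ giniinc + unemploy, data = .) %>%
  summary()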
## Call:
## stats::lm(formula = robbery ~ giniinc + unemploy, data = .)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -29.369 -14.717  -5.500   8.512 185.460
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -30.468     30.242  -1.007  0.31654
## giniinc       60.649     53.466   1.134  0.25980
## unemploy       4.994      1.838   2.716  0.00798 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 27.5 on 86 degrees of freedom
## Multiple R-squared:  0.08177, Adjusted R-squared:  0.06042
## F-statistic: 3.829 on 2 and 86 DF,  p-value: 0.02552
# Model 2: sexoff, only LGAs with above-mean unemployment
lga %>%
  dplyr::select(giniinc, unemploy, sexoff) %>%
  tidyr::drop_na() %>%
  dplyr::filter(unemploy > mean_unemp) %>%
  stats::lm(sexoff ~ giniinc + unemploy, data = .) %>%
  summary()
## Call:
## stats::lm(formula = sexoff ~ giniinc + unemploy, data = .)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -177.661  -58.942   -4.406   63.487  253.247
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -129.59     169.50  -0.765  0.44808
## giniinc       860.64     300.39   2.865  0.00604 **
## unemploy       -6.05      11.00  -0.550  0.58466
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 95.01 on 51 degrees of freedom
## Multiple R-squared:  0.1479, Adjusted R-squared:  0.1144
## F-statistic: 4.425 on 2 and 51 DF,  p-value: 0.01691
# Model 3: robbery, only LGAs with below-mean unemployment
lga %>%
  dplyr::select(giniinc, unemploy, robbery) %>%
  tidyr::drop_na() %>%
  dplyr::filter(unemploy < mean_unemp) %>%
  stats::lm(robbery ~ giniinc + unemploy, data = .) %>%
  summary()
## Call:
## stats::lm(formula = robbery ~ giniinc + unemploy, data = .)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -22.186  -8.022  -2.730   6.177  32.350
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    6.763     24.842   0.272    0.787
## giniinc      -43.777     31.227  -1.402    0.170
## unemploy       7.479      3.150   2.374    0.023 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.93 on 36 degrees of freedom
## Multiple R-squared:  0.2645, Adjusted R-squared:  0.2237
## F-statistic: 6.474 on 2 and 36 DF,  p-value: 0.003964
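The models above split the sample by hand, using one filter() call per group. As a rough sketch of the group_by() approach mentioned in the function list, the same regression can be fitted within each group in a single pipeline. The example below is not the code used to produce the output above: it runs on simulated data (the data frame toy and the grouping variable unemp_group are hypothetical, and the values are randomly generated), and it uses dplyr::group_modify(), which applies a function to each group and stacks the resulting data frames.

```r
library(dplyr)

# Simulated toy data, NOT the NSW LGA dataset; values are random
set.seed(832)
toy <- data.frame(
  unemploy = runif(60, min = 2, max = 12),
  giniinc  = runif(60, min = 0.3, max = 0.6)
)
toy$robbery <- 5 + 4 * toy$unemploy + rnorm(60, sd = 10)

mean_unemp <- mean(toy$unemploy, na.rm = TRUE)

by_group <- toy %>%
  # Label each row as above or below mean unemployment
  dplyr::mutate(
    unemp_group = ifelse(unemploy > mean_unemp, "above_mean", "below_mean")
  ) %>%
  dplyr::group_by(unemp_group) %>%
  # Fit the same model within each group; .x is the data for one group
  dplyr::group_modify(~ {
    fit <- stats::lm(robbery ~ giniinc + unemploy, data = .x)
    data.frame(
      term     = names(stats::coef(fit)),
      estimate = unname(stats::coef(fit)),
      n        = nrow(.x)
    )
  }) %>%
  dplyr::ungroup()

# One row per group and model term
print(by_group)
```

This returns only the coefficient estimates and group sizes. To see the full summary() output for each group, as printed above, the separate filter() pipelines remain the simpler option.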