The final lab introduces 1) how to run a multiple regression model, 2) how to use dummy variables in the regression model, and 3) how to create regression tables using R. We use five packages in this lab. Load them with the following code:
library(dplyr)
library(sjlabelled)
library(sjmisc)
library(summarytools)
library(sjPlot)
Also, run the following code. Otherwise, you will see scientific notations (e.g., 2e-16) instead of numbers in the R output.
options(digits=5, scipen=15)
This lab uses the Crime Rates Dataset of NSW Local Government Areas (NSW Crime) that you used in the last lab. I assume that the dataset is already in your working folder. If so, run the following code:
crime <- readRDS("nsw-lga-crime-2023.RDS")
In Lab 10, you constructed a dataset of complete observations from the crime dataset. We will use this dataset again in this lab. Therefore, you will need to run the follwoing code again to construct this complete observation dataset. The new dataset name is crime2. And you also rescale the median sales price of houses by dividing it by 10,000. If you can’t remember how to construct a dataset of complete observations, see Creating a Dataset of Complete Cases.
crime2 <- crime %>%
select(code, lga, medage, unemploy, housmedprice, sexoff, region, urban)
crime2 <- crime2 %>%
mutate(mark = 0)
crime2 <- crime2 %>%
mutate(mark = replace(mark, complete.cases(crime2), 1))
crime2 <- crime2 %>%
filter(mark == 1)
crime2 <- crime2 %>%
mutate(housmedprice = housmedprice/10000)
crime2$housmedprice <- set_label(crime2$housmedprice, label = "Median sale price of house ($10,000)")
Simple Regression Analysis
Let’s start with some simple regression analyses. In the first model, we estimate the effect of the median sales price of houses (housmedprice) on the rate of sexual offences (sexoff). Run the following code.
model.2.1 <- lm(sexoff ~ housmedprice, data = crime2)
summary(model.2.1)
##
## Call:
## lm(formula = sexoff ~ housmedprice, data = crime2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -208.5 -56.8 -17.1 45.9 314.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 276.6542 11.1605 24.79 < 0.0000000000000002 ***
## housmedprice -0.6382 0.0929 -6.87 0.00000000033 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 88.6 on 118 degrees of freedom
## Multiple R-squared: 0.286, Adjusted R-squared: 0.279
## F-statistic: 47.2 on 1 and 118 DF, p-value: 0.000000000326
In the second model, we estimate the effect of the median age of residents (medage) on the rate of sexual offences. Run the following code.
model.2.2 <- lm(sexoff ~ medage, data = crime2)
summary(model.2.2)
##
## Call:
## lm(formula = sexoff ~ medage, data = crime2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -182.4 -89.6 -11.3 54.6 356.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 156.81 75.27 2.08 0.039 *
## medage 1.58 1.76 0.90 0.371
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 105 on 118 degrees of freedom
## Multiple R-squared: 0.00679, Adjusted R-squared: -0.00163
## F-statistic: 0.807 on 1 and 118 DF, p-value: 0.371
In the third model, we estimate the effect of the unemployment rate (unemploy) on the rate of sexual offences. Run the following code.
model.2.3 <- lm(sexoff ~ unemploy, data = crime2)
summary(model.2.3)
##
## Call:
## lm(formula = sexoff ~ unemploy, data = crime2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -210.74 -63.10 -9.37 46.23 269.19
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 74.44 32.85 2.27 0.025 *
## unemploy 24.69 5.23 4.72 0.0000065 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 96.2 on 118 degrees of freedom
## Multiple R-squared: 0.159, Adjusted R-squared: 0.152
## F-statistic: 22.3 on 1 and 118 DF, p-value: 0.00000654
Multiple Regression Analysis
Running multiple regression models is quite simple. We use lm()
again but add more independent variables. Thus, model name <- lm(dependent ~ independent 1 + independent 2 + independent 3 + …, data = data name)
will conduct a multiple regression analysis. The following code simmultaneously estimates the effect of the median sales price of houses and the median age of residents on the rate of sexual offences. For an interpretation of this result, see the week 13 lecture sldies.
model.2.4 <- lm(sexoff ~ housmedprice + medage, data = crime2)
summary(model.2.4)
##
## Call:
## lm(formula = sexoff ~ housmedprice + medage, data = crime2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -196.7 -50.8 -16.9 44.0 315.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 343.1565 69.4031 4.94 0.00000257768 ***
## housmedprice -0.6655 0.0971 -6.85 0.00000000036 ***
## medage -1.5153 1.5608 -0.97 0.33
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 88.7 on 117 degrees of freedom
## Multiple R-squared: 0.291, Adjusted R-squared: 0.279
## F-statistic: 24 on 2 and 117 DF, p-value: 0.00000000179
In the next model, let’s try to estimate the effect of the median sales price of houses and the unemploment rates on the rate of sexual offences simultaneously. Run the following code.
model.2.5 <- lm(sexoff ~ housmedprice + unemploy, data = crime2)
summary(model.2.5)
##
## Call:
## lm(formula = sexoff ~ housmedprice + unemploy, data = crime2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -187.3 -48.4 -17.3 41.0 250.2
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 172.6219 33.7375 5.12 0.000001232 ***
## housmedprice -0.5423 0.0941 -5.76 0.000000068 ***
## unemploy 15.8759 4.8810 3.25 0.0015 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 85.3 on 117 degrees of freedom
## Multiple R-squared: 0.345, Adjusted R-squared: 0.334
## F-statistic: 30.8 on 2 and 117 DF, p-value: 0.0000000000181
In the final model, let’s try to estimate the effect of all the three independent variables on the rate of sexual offences simultaneously. Run the following code. As you can see in the code, all the three independent variables are added using the plus(+) sign.
model.2.6 <- lm(sexoff ~ housmedprice + medage + unemploy, data = crime2)
summary(model.2.6)
##
## Call:
## lm(formula = sexoff ~ housmedprice + medage + unemploy, data = crime2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -180.4 -46.7 -13.6 39.8 249.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 216.7383 78.0375 2.78 0.0064 **
## housmedprice -0.5616 0.0993 -5.66 0.00000011 ***
## medage -0.9506 1.5153 -0.63 0.5317
## unemploy 15.5098 4.9284 3.15 0.0021 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 85.5 on 116 degrees of freedom
## Multiple R-squared: 0.347, Adjusted R-squared: 0.33
## F-statistic: 20.5 on 3 and 116 DF, p-value: 0.0000000000952
Creating a Regression Table
So far, We have run the six regression models that examine the effect of community characteristics on the rate of sexual offences. However, it is not efficient to look at the results of each model separately. Instead, we can present all of the regression models in a just one single table. We will use the tab_model()
function to summarise all the regression models so far. The following code will combine the six regression models we have run and create a single regression table.
tab_model(model.2.1, model.2.2, model.2.3, model.2.4, model.2.5, model.2.6,
dv.labels = c("Model 1", "Model 2", "Model 3", "Model 4", "Model 5", "Model 6"),
title = "Regression Coefficients on Sexual Offence Rates", digits = 3,
show.se = TRUE, show.ci = FALSE, collapse.se = TRUE, p.style = "stars")
Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | |
---|---|---|---|---|---|---|
Predictors | Estimates | Estimates | Estimates | Estimates | Estimates | Estimates |
(Intercept) |
276.654 *** (11.161) |
156.814 * (75.269) |
74.438 * (32.853) |
343.156 *** (69.403) |
172.622 *** (33.737) |
216.738 ** (78.038) |
Median sale price of house($10,000) |
-0.638 *** (0.093) |
-0.665 *** (0.097) |
-0.542 *** (0.094) |
-0.562 *** (0.099) |
||
Median age of residents |
1.582 (1.761) |
-1.515 (1.561) |
-0.951 (1.515) |
|||
Unemployment rate |
24.686 *** (5.230) |
15.876 ** (4.881) |
15.510 ** (4.928) |
|||
Observations | 120 | 120 | 120 | 120 | 120 | 120 |
R2 / R2 adjusted | 0.286 / 0.279 | 0.007 / -0.002 | 0.159 / 0.152 | 0.291 / 0.279 | 0.345 / 0.334 | 0.347 / 0.330 |
|
In the code, you need to list all the name of regression models to be included in a regression table. dv.labels = c()
sets the labels of the models to be displayed in the first row of the regression table. title = “”
sets the title of the table. digits = value
sets the decimal places for the numbers in the table. show.se = TRUE
shows the standard errors of the coefficients, show.ci = FALSE
does not show the confidence intervals of the coefficients, and collapse.se = TRUE
shows the standard errors together with the coefficients in the same cell. p.style = “stars”
displays the p-value in star format.
Dummy Variable Analysis
Creating a Dummy Variable.
A dummy variable is a binary variable used in the regression analysis that represents subgroups (or categories) of the sample in your study. And it is used to estimate the effect of nominal variables. Let’s make a dummy variable of urban. The following code shows two categories of urban: rural and urban.
frq(crime2$urban)
## Urban or Rural (x) <categorical>
## # total N=120 valid N=120 mean=0.59 sd=0.49
##
## Value | Label | N | Raw % | Valid % | Cum. %
## ------------------------------------------------
## 0 | 0. Rural | 49 | 40.83 | 40.83 | 40.83
## 1 | 1. Urban | 71 | 59.17 | 59.17 | 100.00
## <NA> | <NA> | 0 | 0.00 | <NA> | <NA>
urban has two categories and the reference category, rural, already has a lower value assigned to it. Therefore, we can use urban as it is for a dummy variable. However, we need to change this dummy variable to a factor. Otherwise, R will not treat it as a dummy variable in the regression analysis.
crime2$urban <- to_factor(crime2$urban)
In some cases, you may want to use urban as a reference category instead of rural. In this case, you can easily change the reference category by using the ref_lvl(variable, lvl = value for the reference category)
function. Make sure that the variable used for ref_lvl()
is one that has already been converted to a factor. The following code creates a new variable (rural) where 0 is urban, and 1 is rural.
crime2$rural <- ref_lvl(crime2$urban, lvl = 1)
Then, let’s make dummy variables for region. The following code shows two categories of region.
frq(crime2$region)
## Regions of LGA (x) <categorical>
## # total N=120 valid N=120 mean=5.93 sd=3.92
##
## Value | Label | N | Raw % | Valid % | Cum. %
## ----------------------------------------------------------------------
## 1 | 1. Greater metropolitan Sydney | 30 | 25.00 | 25.00 | 25.00
## 2 | 2. Sydney surrounds | 4 | 3.33 | 3.33 | 28.33
## 3 | 3. Mid north coast | 6 | 5.00 | 5.00 | 33.33
## 4 | 4. Murray | 9 | 7.50 | 7.50 | 40.83
## 5 | 5. Murrumbidgee | 9 | 7.50 | 7.50 | 48.33
## 6 | 6. Hunter | 10 | 8.33 | 8.33 | 56.67
## 7 | 7. Illawarra | 5 | 4.17 | 4.17 | 60.83
## 8 | 8. Richmond-Tweed | 6 | 5.00 | 5.00 | 65.83
## 9 | 9. Canberra region | 8 | 6.67 | 6.67 | 72.50
## 10 | 10. Northern | 12 | 10.00 | 10.00 | 82.50
## 11 | 11. Central west | 13 | 10.83 | 10.83 | 93.33
## 12 | 12. North western | 7 | 5.83 | 5.83 | 99.17
## 13 | 13. Far west | 1 | 0.83 | 0.83 | 100.00
## <NA> | <NA> | 0 | 0.00 | <NA> | <NA>
As this result shows, region has 13 categories. We will recode it into a variable with three categories following the recoding scheme of <Table 1>.
Values | Labels | Values | Labels |
---|---|---|---|
1 | Greater Metropolitan Sydney | 1 | Greater Metropolitan Sydney |
2 | Sydney Surrounds | 2 | Sydney Surrounds |
3 | Mid North Coast | 3 | Rural and Regional Areas |
4 | Murray | ||
5 | Murrumbidgee | ||
6 | Hunter | ||
7 | Illawarra | ||
8 | Richmond-Tweed | ||
9 | Canberra Region | ||
10 | Northern | ||
11 | Central West | ||
12 | North Western | ||
13 | Far west |
The following code will create this new three-region cateogy variable.
crime2 <- rec(crime2, region, rec = "1=1; 2=2; 3:13=3")
Then, we rename the newly-recoded variable(region_r) to region2 and assign an appropriate variable name and value labels.
crime2 <- var_rename(crime2, region_r = region2)
crime2$region2 <- set_label(crime2$region2, label = "Three regions")
crime2$region2 <- set_labels(crime2$region2, labels = c("Greater Metropolitan Sydney" = 1,
"Sydney Surrounds" = 2,
"Rural and Regional Areas" = 3))
Finally, we need to hange this dummy variable to a factor.
crime2$region2 <- to_factor(crime2$region2)
Dummy Variable Regression Analysis
First, we estimate the effect of urban on the rate of sexual offences, which shows the difference in the sexual offences between urban and rural LGAs.
model.3.1 <- lm(sexoff ~ urban, data = crime2)
summary(model.3.1)
##
## Call:
## lm(formula = sexoff ~ urban, data = crime2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -215.0 -75.3 -11.7 78.2 312.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 270.6 13.9 19.47 < 0.0000000000000002 ***
## urban1 -79.0 18.1 -4.37 0.000027 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 97.3 on 118 degrees of freedom
## Multiple R-squared: 0.139, Adjusted R-squared: 0.132
## F-statistic: 19.1 on 1 and 118 DF, p-value: 0.0000268
In the output, ‘urban1’ indicates ‘being a urban LGA (when the value of urban is 1). Thus, the coefficient of ‘urban1’ means that the sexual offence rate of urban LGAs is lower than that of rural LGAs by 79.0. This difference is statistically significant (the p-value is less than .05).
Second, we estimate the effect of region on the rate of sexual offences, which shows the difference in the sexual offences across the three regions.
model.3.2 <- lm(sexoff ~ region2, data = crime2)
summary(model.3.2)
##
## Call:
## lm(formula = sexoff ~ region2, data = crime2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -204.0 -49.3 -8.9 32.0 323.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 127.2 16.0 7.94 0.0000000000013 ***
## region22 52.8 46.7 1.13 0.26
## region23 132.4 18.6 7.12 0.0000000000925 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 87.7 on 117 degrees of freedom
## Multiple R-squared: 0.307, Adjusted R-squared: 0.295
## F-statistic: 25.9 on 2 and 117 DF, p-value: 0.000000000491
The output shows two coefficients of region2. In region2, 2 means “Sydney Surrounds”, and 3 means “Rural or Regional Areas”. Thus, the coefficient of ‘region22’ is for “Sydney Surrounds” and that of ‘region23’ is for Rural or Regional Areas”. In this dummy variable, the reference category is “Greater Metropolitan Sydney”. As a result, the interpretation is that the sexual offence rate of LGAs in Sydney Surrounds is higher than that of LGAs in Greater Metropolitan Sydney by 52.8. However, this difference is NOT statistically significant (the p-value is greater than .05). It also indicate that the sexual offence rate of LGAs in Rural and Regional Areas is higher than that of LGAs in Greater Metropolitan Sydney by 132.4. And this difference is statistically significant (the p-value is less than .05).
Third, we estimate the effect of region on the rate of sexual offences after controlling for the median sales price of houses, the median age of residents, and the unemployment rate.
model.3.3 <- lm(sexoff ~ urban + housmedprice + medage + unemploy, data = crime2)
summary(model.3.3)
##
## Call:
## lm(formula = sexoff ~ urban + housmedprice + medage + unemploy,
## data = crime2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -188.87 -53.53 -2.45 42.31 212.38
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 273.411 77.288 3.54 0.00058 ***
## urban1 -61.209 19.368 -3.16 0.00201 **
## housmedprice -0.389 0.110 -3.53 0.00060 ***
## medage -2.385 1.529 -1.56 0.12144
## unemploy 19.817 4.940 4.01 0.00011 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 82.3 on 115 degrees of freedom
## Multiple R-squared: 0.399, Adjusted R-squared: 0.378
## F-statistic: 19.1 on 4 and 115 DF, p-value: 0.00000000000454
Fourth, we estimate the effect of region on the rate of sexual offences fter controlling for the median sales price of houses, the median age of residents, and the unemployment rate.
model.3.4 <- lm(sexoff ~ region2 + housmedprice + medage + unemploy, data = crime2)
summary(model.3.4)
##
## Call:
## lm(formula = sexoff ~ region2 + housmedprice + medage + unemploy,
## data = crime2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -153.1 -47.7 -13.7 33.7 240.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 207.3124 72.8374 2.85 0.00525 **
## region22 73.0265 46.4907 1.57 0.11901
## region23 141.0931 30.3602 4.65 0.0000091 ***
## housmedprice -0.0716 0.1407 -0.51 0.61176
## medage -4.4925 1.5875 -2.83 0.00550 **
## unemploy 18.0721 4.6425 3.89 0.00017 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 78.9 on 114 degrees of freedom
## Multiple R-squared: 0.454, Adjusted R-squared: 0.43
## F-statistic: 18.9 on 5 and 114 DF, p-value: 0.000000000000116
Finally, we estimate the effect of urban and region on the rate of sexual offences fter controlling for the median sales price of houses, the median age of residents, and the unemployment rate.
model.3.5 <- lm(sexoff ~ urban + region2 + housmedprice + medage + unemploy, data = crime2)
summary(model.3.5)
##
## Call:
## lm(formula = sexoff ~ urban + region2 + housmedprice + medage +
## unemploy, data = crime2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -160.62 -50.81 0.96 31.28 236.06
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 254.7107 71.5715 3.56 0.00055 ***
## urban1 -58.8171 18.3154 -3.21 0.00172 **
## region22 100.3419 45.5029 2.21 0.02947 *
## region23 138.7570 29.2004 4.75 0.0000060 ***
## housmedprice 0.0911 0.1445 0.63 0.52952
## medage -5.7397 1.5750 -3.64 0.00041 ***
## unemploy 22.6293 4.6839 4.83 0.0000043 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 75.8 on 113 degrees of freedom
## Multiple R-squared: 0.499, Adjusted R-squared: 0.473
## F-statistic: 18.8 on 6 and 113 DF, p-value: 0.00000000000000463
Creating a Regression Table
Let’s make a regression table that shows all the results of dummy variable analysis.
tab_model(model.3.1, model.3.2, model.3.3, model.3.4, model.3.5,
pred.labels = c("Intercept", "Urban (ref. = Rural)",
"Sydney surrounds (ref. = Greater Sydney)",
"Rural or regional (ref. = Greater Sydney)",
"Median sale price of houses",
"Median age of residents",
"Unemployment rates"),
dv.labels = c("Model 1", "Model 2", "Model 3", "Model 4", "Model 5"),
title = "Regional Differences in Sexual Offence Rates", digits = 3,
show.se = TRUE, show.ci = FALSE, collapse.se = TRUE, p.style = "stars")
Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | |
---|---|---|---|---|---|
Predictors | Estimates | Estimates | Estimates | Estimates | Estimates |
Intercept |
270.596 *** (13.899) |
127.190 *** (16.010) |
273.411 *** (77.288) |
207.312 ** (72.837) |
254.711 *** (71.571) |
Urban (ref. = Rural) |
-78.972 *** (18.069) |
-61.209 ** (19.368) |
-58.817 ** (18.315) |
||
Sydney surrounds (ref. = Greater Sydney) |
52.835 (46.676) |
73.026 (46.491) |
100.342 * (45.503) |
||
Rural or regional (ref. = Greater Sydney) |
132.446 *** (18.594) |
141.093 *** (30.360) |
138.757 *** (29.200) |
||
Median sale price of houses |
-0.389 *** (0.110) |
-0.072 (0.141) |
0.091 (0.144) |
||
Median age of residents |
-2.385 (1.529) |
-4.492 ** (1.587) |
-5.740 *** (1.575) |
||
Unemployment rates |
19.817 *** (4.940) |
18.072 *** (4.643) |
22.629 *** (4.684) |
||
Observations | 120 | 120 | 120 | 120 | 120 |
R2 / R2 adjusted | 0.139 / 0.132 | 0.307 / 0.295 | 0.399 / 0.378 | 0.454 / 0.430 | 0.499 / 0.473 |
|
We use codes similar to what we used for the multiple regression model. But we changed variable names using pred.labels
. pred.labels = c()
sets the labels of independent variables displayed in the first column of the table.
Lab 11 Participation Activity |
No Lab Participation Activity this week. Completing R Analysis Task 4 will contribute to your participation mark. |
The R codes you have written so far look like:
################################################################################
# Lab 11: Multiple Regression
# 29/5/2024
# SOCI8015 & SOCX8015
################################################################################
# Load packages
library(dplyr)
library(sjlabelled)
library(sjmisc)
library(summarytools)
library(sjPlot)
# Avoid scientific notation
options(digits=5, scipen=15)
# Load datasets
crime <- readRDS("nsw-lga-crime-2023.RDS")
# Creating a Dataset of Complete Cases
crime2 <- crime %>%
select(code, lga, medage, unemploy, housmedprice, sexoff, region, urban)
crime2 <- crime2 %>%
mutate(mark = 0)
crime2 <- crime2 %>%
mutate(mark = replace(mark, complete.cases(crime2), 1))
crime2 <- crime2 %>%
filter(mark == 1)
# Changing the Scale of Variables
crime2 <- crime2 %>%
mutate(housmedprice = housmedprice/10000)
crime2$housmedprice <- set_label(crime2$housmedprice, label = "Median sale price of house ($10,000)")
# Simple regression: Model 1
model.2.1 <- lm(sexoff ~ housmedprice, data = crime2)
summary(model.2.1)
# Simple regression: Model 2
model.2.2 <- lm(sexoff ~ medage, data = crime2)
summary(model.2.2)
# Simple regression:Model 3
model.2.3 <- lm(sexoff ~ unemploy, data = crime2)
summary(model.2.3)
# Multiple regression: Model 4
model.2.4 <- lm(sexoff ~ housmedprice + medage, data = crime2)
summary(model.2.4)
# Multiple regression: Model 5
model.2.5 <- lm(sexoff ~ housmedprice + unemploy, data = crime2)
summary(model.2.5)
# Multiple regression: Model 6
model.2.6 <- lm(sexoff ~ housmedprice + medage + unemploy, data = crime2)
summary(model.2.6)
# Creating a regression table
tab_model(model.2.1, model.2.2, model.2.3, model.2.4, model.2.5, model.2.6,
dv.labels = c("Model 1", "Model 2", "Model 3", "Model 4", "Model 5", "Model 6"),
title = "Regression Coefficients on Sexual Offence Rates", digits = 3,
show.se = TRUE, show.ci = FALSE, collapse.se = TRUE, p.style = "stars")
# Check the variable of urban
frq(crime2$urban)
# convert into a factor for dummy
crime2$urban <- to_factor(crime2$urban)
# make a dummy variable for rural in which urban are the reference group
crime2$rural <- ref_lvl(crime2$urban, lvl = 1)
# Creating a dummy variable for region
frq(crime2$region)
crime2 <- rec(crime2, region, rec = "1=1; 2=2; 3:13=3")
# rename variable
crime2 <- var_rename(crime2, region_r = region2)
# Assign variable name and value labels
crime2$region2 <- set_label(crime2$region2, label = "Three regions")
crime2$region2 <- set_labels(crime2$region2, labels = c("Greater Metropolitan Sydney" = 1,
"Sydney Surrounds" = 2,
"Rural and Regional Areas" = 3))
# convert into a factor for dummy
crime2$region2 <- to_factor(crime2$region2)
# Dummy Variable Analysis
model.3.1 <- lm(sexoff ~ urban, data = crime2)
summary(model.3.1)
model.3.1.2 <- lm(sexoff ~ rural, data = crime2)
summary(model.3.1.2)
model.3.3 <- lm(sexoff ~ urban + housmedprice + medage + unemploy, data = crime2)
summary(model.3.3)
model.3.2 <- lm(sexoff ~ region2, data = crime2)
summary(model.3.2)
model.3.4 <- lm(sexoff ~ region2 + housmedprice + medage + unemploy, data = crime2)
summary(model.3.4)
model.3.5 <- lm(sexoff ~ urban + region2 + housmedprice + medage + unemploy, data = crime2)
summary(model.3.5)
# Creating regression table
tab_model(model.3.1, model.3.2, model.3.3, model.3.4, model.3.5,
pred.labels = c("Intercept", "Urban (ref. = Rural)",
"Sydney surrounds (ref. = Greater Sydney)",
"Rural or regional (ref. = Greater Sydney)",
"Median sale price of houses",
"Median age of residents",
"Unemployment rates"),
dv.labels = c("Model 1", "Model 2", "Model 3", "Model 4", "Model 5"),
title = "Regional Differences in Sexual Offence Rates", digits = 3,
show.se = TRUE, show.ci = FALSE, collapse.se = TRUE, p.style = "stars")