# Example 7.12, A Linear Probability Model of Arrests
# Data set: crime1

# Function for result reporting
source("_report.R")

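# Load the data, create the binary dependent variable and estimate the model
load("crime1.Rdata")
# arr86 = 1 if the man was arrested at least once in 1986, 0 otherwise
data$arr86=as.numeric(data$narr86!=0)
model=lm(arr86~pcnv+avgsen+tottime+ptime86+qemp86,data=data)
# Number of digits used when reporting each estimate
dig=c(3,3,4,4,3,3,4)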

# Describe the model
cat("This example uses the arrest data set from Example 5.3, where we estimated the model: narr86 = beta0 + beta1 * pcnv + beta2 * avgsen + beta3 * tottime + beta4 * ptime86 + beta5 * qemp86 + u",
    "\nwhere narr86 is ", paste(desc[desc[,1]=="narr86",2]),
    "\npcnv is ", paste(desc[desc[,1]=="pcnv",2]),
    "\navgsen is ", paste(desc[desc[,1]=="avgsen",2]),
    "\ntottime is ", paste(desc[desc[,1]=="tottime",2]),
    "\nptime86 is ", paste(desc[desc[,1]=="ptime86",2]),
    "\nand qemp86 is ", paste(desc[desc[,1]=="qemp86",2]),
    "\nHere, we change the dependent variable to arr86, a dummy variable that equals 1 if a man was arrested during 1986 and 0 otherwise. This makes the model a linear probability model",
    sep="")

# Report results
{
  cat("The estimated regression line is")
  reportreg(model,dig)
}

# Interpretation
cat("The intercept, ", printcoef(model,1,dig[1]), ", is the predicted probability of arrest when all independent variables equal zero, i.e. for a man who has not been convicted (so pcnv = 0 and avgsen = 0), has spent no time in prison since age 18, spent no time in prison in 1986, and was unemployed throughout the year",
    "\nThe variables avgsen and tottime are statistically insignificant both individually and jointly",
    "\nThe effect of pcnv is statistically significant, but its magnitude should be interpreted carefully because pcnv is a proportion between 0 and 1. Increasing pcnv by 0.5 is predicted to decrease the probability of arrest by ",
    round(as.numeric(printabscoef(model,2,dig[2]))*0.5,3),
    "\nAn increase of six months in ptime86 is predicted to decrease the probability of arrest by ",
    round(as.numeric(printabscoef(model,5,dig[5]))*6,3), ". Note that the linear probability model cannot hold over all ranges of the independent variables. A man who is in prison for all 12 months of 1986 cannot be arrested in 1986, yet, setting all other variables equal to zero, the predicted arr86 when ptime86 = 12 is ",
    printcoef(model,1,dig[1]), " - ", printabscoef(model,5,dig[5]), "(12) = ",
    round(as.numeric(printcoef(model,1,dig[1]))+12*as.numeric(printcoef(model,5,dig[5])),3),
    ", which is not zero. Nevertheless, if we start from the unconditional probability of arrest, ",
    round(mean(model$model$arr86),3), ", 12 months in prison reduces the probability to essentially zero: ",
    round(mean(model$model$arr86),3), " - ", printabscoef(model,5,dig[5]), "(12) = ",
    round(round(mean(model$model$arr86),3)+12*as.numeric(printcoef(model,5,dig[5])),3),
    "\nA man employed in all four quarters of 1986 is predicted to be ",
    printabscoef(model,6,dig[6]), "(4) = ", round(as.numeric(printabscoef(model,6,dig[6]))*4,3),
    " less likely to be arrested than one who is unemployed throughout the year",
    sep="")
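
# Supplementary checks (a sketch, not part of the textbook example).
# They reuse `model` and `data` from above; `restricted`, `ftest` and `fit`
# are names introduced here for illustration only, and the code assumes that
# both regressions are estimated on the same observations.

# Joint significance of avgsen and tottime: compare the full model with a
# restricted model that drops both variables, using the usual F test
restricted=lm(arr86~pcnv+ptime86+qemp86,data=data)
ftest=anova(restricted,model)
cat("\nThe F statistic for the joint significance of avgsen and tottime is ",
    round(ftest[2,"F"],3), " with a p-value of ", round(ftest[2,"Pr(>F)"],3),
    sep="")

# Range of the fitted values: a linear probability model does not restrict
# predictions to the unit interval, so count how many fall outside [0, 1]
fit=fitted(model)
cat("\nThe fitted probabilities range from ", round(min(fit),3), " to ",
    round(max(fit),3), ", and ", sum(fit<0|fit>1), " of ", length(fit),
    " fall outside the unit interval",
    sep="")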