Use fl2003.RData
, which is a cleaned up version of the data from Fearon and Laitin (2003). Fit a model where onset is explained by all variables. Use cv.glmnet()
to fit elastic net models for a variety of \(\alpha\) values, using a loss function that is appropriate for the binomial nature of the data. Present plots of the model’s predictive accuracy for different \(\alpha\) values. Fit a model with glmnet()
using the \(\alpha\) value you found that minimizes predictive error. Report coefficient estimates for all variables, and plot the changes in coefficient values vs. the L1 norm, log-lambda value, and deviance explained.
Randomly sample five cases where onset = 0
and five where onset = 1
. Fit an elastic net with the optimal \(\alpha\) value you found for the whole dataset. Are the most important coefficients the same?
library(dplyr)
library(glmnet)
library(caret)
library(parallel)
load('fl2003.RData')
# create predictors and response
fl_x <- as.matrix(fl[, -1])
fl_y <- as.factor(fl$onset)
# sequences of alpha values to evaluate
alphas <- seq(0, 1, by = .1)
# cross validation elastic nets for different penalty parameters
fits <- mclapply(alphas, function(x) cv.glmnet(fl_x, fl_y, type.measure = 'auc',
alpha = x, family = 'binomial'))
# plot AUC for different penalty parameters
par(mfrow = c(3,1))
plot(fits[[1]])
plot(fits[[6]])
plot(fits[[11]])
# penalty parameter w/ highest AUC
alpha_best <- which.max(sapply(fits, function(x) max(x$cvm)))
# fit elastic net w/ best penalty parameters
best_fit <- glmnet(fl_x, fl_y, family = 'binomial', alpha = alphas[alpha_best])
# plot coefficients
par(mfrow = c(3,1))
plot(best_fit, xvar = 'norm', label = T)
plot(best_fit, xvar = 'lambda', label = T)
plot(best_fit, xvar = 'dev', label = T)
# coefficients from whole dataset
betas <- varImp(best_fit, lambda = best_fit$lambda[length(best_fit$lambda)])
# sample 10 observations for each outcome
fl_samp <- fl %>% group_by(onset) %>% sample_n(10)
# create predictors and response
fl_x_samp <- as.matrix(fl_samp[, -1])
fl_y_samp <- as.factor(fl_samp$onset)
# fit sample data w/ best penalty parameter from entire dataset
best_fit_samp <- glmnet(fl_x_samp, fl_y_samp, family = 'binomial', alpha = alphas[alpha_best])
# plot coefficients
plot(best_fit_samp, xvar = 'norm', label = T)
plot(best_fit_samp, xvar = 'lambda', label = T)
plot(best_fit_samp, xvar = 'dev', label = T)
# coefficients for sampled data
betas_samp <- varImp(best_fit_samp, lambda = best_fit_samp$lambda[length(best_fit_samp$lambda)])
# compare coefficients for both models
kable(data.frame(betas, betas_samp) %>% rename(All = Overall, Sample = Overall.1))
All | Sample | |
---|---|---|
warl | 1.0222 | 0.0000 |
gdpenl | 0.2353 | 2.3865 |
lpopl1 | 0.2668 | 7.3504 |
lmtnest | 0.2198 | 3.8555 |
ncontig | 0.7021 | 12.0521 |
Oil | 0.6626 | 1.0948 |
nwstate | 1.6557 | 0.0000 |
instab | 0.6097 | 8.2590 |
polity2l | 0.0247 | 0.8997 |
ethfrac | 0.0407 | 7.2016 |
relfrac | 0.0010 | 21.7585 |
western | 1.9047 | 0.0000 |
eeurop | 0.6906 | 0.0000 |
lamerica | 0.3673 | 9.8158 |
ssafrica | 0.0650 | 1.0215 |
asia | 0.2107 | 0.2941 |
Fearon, James D., and David D. Laitin. 2003. “Ethnicity, Insurgency, and Civil War.” The American Political Science Review 97 (1): 75–90.