Load the UCDP conflict termination data in conflict termination.csv. Drop any asterisks from observations in Years, and drop any text between parentheses in SideB. Convert EpStartDate to a date object, and then report the range of dates present in the data. Using Years, create a dataframe of conflicts that last more than one year, and a second dataframe of conflicts that only span one year.
library(lubridate)
# read in conflict termination data
term <- read.csv('conflict termination.csv')
# drop asterisks
term$Years <- gsub('\\*', '', term$Years)
# drop anything between parentheses
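# [^)]* stops at the first closing parenthesis, so text between two separate
# parenthesised groups is kept; trimws() removes the whitespace left behind
term$SideB <- trimws(gsub('\\([^)]*\\)', '', term$SideB))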
# convert episode start date to date
term$EpStartDate <- parse_date_time(term$EpStartDate, orders = c('mdy', 'dmy', 'ymd'))
# range of dates
range(term$EpStartDate)
## [1] "1946-01-01 UTC" "2009-12-17 UTC"
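# create dataframe of multiyear conflicts (Years contains a hyphen, e.g. a start-end range)
multi_year <- term[grepl('-', term$Years, fixed = TRUE), ]
# create dataframe of single year conflicts (no hyphen in Years)
single_year <- term[!grepl('-', term$Years, fixed = TRUE), ]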
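
The cleaning and splitting steps can be checked on a small toy data frame before running them on the full file. The sketch below is only an illustration: the values in toy are made up, and it assumes, as the solution above does, that a multiyear conflict is recorded in Years as a hyphenated range. It uses base R only.

```r
# Toy data with hypothetical values, mimicking the Years and SideB columns
toy <- data.frame(
  Years = c('1946', '1948-1951*', '1990*', '2001-2009'),
  SideB = c('Group A (GA)', 'Group B', 'Group C (GC) ', 'Group D (GD)'),
  stringsAsFactors = FALSE
)

# remove literal asterisks (fixed = TRUE treats '*' as a plain character)
toy$Years <- gsub('*', '', toy$Years, fixed = TRUE)

# remove parenthesised text, then trim any leftover whitespace
toy$SideB <- trimws(gsub('\\([^)]*\\)', '', toy$SideB))

# assumption: a hyphen in Years marks a conflict spanning more than one year
is_multi <- grepl('-', toy$Years, fixed = TRUE)
toy_multi <- toy[is_multi, ]
toy_single <- toy[!is_multi, ]

# every row should land in exactly one of the two dataframes
stopifnot(nrow(toy_multi) + nrow(toy_single) == nrow(toy))
```

Building the logical vector once and negating it guarantees that the two dataframes partition the rows, which is harder to verify when the pattern is written out twice.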