library(network)
library(sna)
library(latentnet)
One of the simplest distinctions among types of networks is between directed and undirected networks. Undirected networks consist of ties between actors that have no directionality, e.g. trade networks and military alliances. Directed networks still measure ties between actors, but also capture which actor created the tie, e.g. which state in an alliance proposed the agreement or which combatant in a conflict initiated the violence. The data in this lab are from Cranmer et al. (2017), and form a network of organizations involved in Swiss climate change mitigation. We have a number of exogenous covariates that we wish to use to explain the formation of ties.
Read in these data from the .csv files.
# policy forum affiliation data
# 1 = affiliation; 0 = no affiliation
# committee names are in the column labels; organizations in the row labels
forum <- as.matrix(read.table(file = 'climate0205-committee.csv',
header = T, row.names = 1, sep = ';'))
# influence reputation data
# square matrix with influence attribution
# 1 = influential; 0 = not influential
# cells contain the ratings of row organizations about column organizations
infrep <- as.matrix(read.table(file = 'climate0205-rep.csv',
header = T, row.names = 1, sep = ';'))
# collaboration; directed network
collab <- as.matrix(read.table(file = 'climate0205-collab.csv',
header = T, row.names = 1, sep = ';'))
# type of organization; vector with five character types
types <- as.character(read.table(file='climate0205-type.csv',
header = T, row.names = 1, sep = ';')[, 2])
# alliance-opposition perception; -1 = row organization perceives column organization as
# an opponent; 1 = row organization perceives column organization as an ally; 0 = neutral
allopp <- as.matrix(read.table(file = 'climate0205-allop.csv',
header = T, row.names = 1, sep = ';'))
# preference dissimilarity; Manhattan distance over four important policy issues
prefdist <- as.matrix(read.table(file = 'climate0205-prefdist.csv',
header = T, row.names = 1, sep = ';'))
Next we need to prepare the covariates for use in a network model. Multiply the forum matrix by its transpose to compute the one-mode projection of this membership matrix. Then create a matrix where each entry denotes whether its organization pair is a private-NGO pairing (hint: will this matrix be symmetric?). Finally, create matrices that capture whether the target of a tie is a government organization and whether the sender of a tie is an NGO.
# compute one-mode projection over different policy forums
forum <- forum %*% t(forum)
# 0 out the diagonal because it has no meaning
diag(forum) <- 0
# create matrix capturing all private-NGO pairs
priv_ngo <- matrix(0, nrow = nrow(collab), ncol = ncol(collab))
for (i in 1:nrow(priv_ngo)) {
for (j in 1:ncol(priv_ngo)) {
if ((types[i] == 'private' && types[j] == 'ngo') ||
(types[i] == 'ngo' && types[j] == 'private')) {
priv_ngo[i, j] <- 1
priv_ngo[j, i] <- 1
}
}
}
# create matrix capturing whether alter is a government organization
gov_alt <- matrix(rep(as.numeric(types == 'gov'), length(types)), byrow = T,
nrow = length(types))
# create matrix capturing whether ego is an NGO
ngo_ego <- matrix(rep(as.numeric(types == 'ngo'), length(types)), byrow = F,
nrow = length(types))
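As a quick check on the constructions above, here is a minimal sketch on a made-up three-organization example. All `toy_*` objects are hypothetical and are not part of the lab data; `outer()` is used here as a vectorized alternative to the double loop, not as the method the lab requires.

```r
# toy affiliation matrix: 3 organizations (rows) x 2 forums (columns)
toy_forum <- matrix(c(1, 0,
                      1, 1,
                      0, 1), nrow = 3, byrow = TRUE,
                    dimnames = list(c('A', 'B', 'C'), c('f1', 'f2')))

# one-mode projection: entry [i, j] counts the forums shared by i and j
toy_proj <- toy_forum %*% t(toy_forum)
diag(toy_proj) <- 0

# hypothetical organization types for A, B, and C
toy_types <- c('gov', 'ngo', 'private')
is_priv <- toy_types == 'private'
is_ngo <- toy_types == 'ngo'

# alter is a government organization: the value depends only on the column
toy_gov_alt <- outer(rep(1, 3), as.numeric(toy_types == 'gov'))
# ego is an NGO: the value depends only on the row
toy_ngo_ego <- outer(as.numeric(is_ngo), rep(1, 3))
# private-NGO pairing in either direction
toy_priv_ngo <- 1 * (outer(is_priv, is_ngo, '&') | outer(is_ngo, is_priv, '&'))

# isSymmetric() can be used to check each matrix
isSymmetric(toy_proj)
isSymmetric(toy_gov_alt)
```

Running `isSymmetric()` on the real `forum`, `priv_ngo`, `gov_alt`, and `ngo_ego` matrices is one way to check your answers to the symmetry questions.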
Which of these matrices will be symmetric? Which will be asymmetric? Why are some symmetric and some asymmetric? Finally, we need to convert the collaboration matrix to a network object so that we can fit a latent network model.
# create network object
nw_collab <- network(collab)
# inspect network object
nw_collab
## Network attributes:
## vertices = 34
## directed = TRUE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges= 207
## missing edges= 0
## non-missing edges= 207
##
## Vertex attribute names:
## vertex.names
##
## No edge attributes
We can see that we have no edge attributes and only one vertex attribute: the names of the vertices themselves. Since the point of a latent space network model is the ability to use exogenous covariates to predict ties, we need to get some covariates into our network object. We’ll be using several measures of network position as well as exogenous covariates to explain this network.
Betweenness centrality is a measure of how central a vertex is for paths connecting other vertices. It is calculated by:
\[ g(v) = \sum_{s\neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} \]
where \(\sigma_{st}\) is the total number of shortest paths from vertex \(s\) to vertex \(t\), and \(\sigma_{st}(v)\) is the number of those shortest paths that pass through \(v\).
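To make the fraction in this formula concrete, here is a small sketch on a hypothetical four-vertex "diamond" network (not the lab data), using `betweenness()` from the sna package. There are two shortest paths from vertex 1 to vertex 4, one through vertex 2 and one through vertex 3, so each of those vertices should be credited with half a path.

```r
library(sna)

# hypothetical directed diamond: 1 -> 2 -> 4 and 1 -> 3 -> 4
# rows are senders, columns are receivers
diamond <- matrix(0, nrow = 4, ncol = 4)
diamond[1, 2] <- 1
diamond[1, 3] <- 1
diamond[2, 4] <- 1
diamond[3, 4] <- 1

# sigma_14 = 2, sigma_14(2) = sigma_14(3) = 1,
# so vertices 2 and 3 each score 1/2 and vertices 1 and 4 score 0
toy_btw <- betweenness(diamond, gmode = 'digraph')
toy_btw
```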
In a directed network, indegree centrality is the number of ties where a vertex is the target. With rows of the adjacency matrix as senders and columns as targets, as in the matrices read in above, it is calculated in matrix form as:
\[ k^{in} = \mathbf{A}^{\top}\mathbf{e} \]
where \(\mathbf{A}\) is the adjacency matrix of the network and \(\mathbf{e}\) is a length-\(n\) vector of 1s, i.e. the column sums of the adjacency matrix. Outdegree centrality is the number of ties where a vertex is the sender, and is given by the row sums of the adjacency matrix, \(\mathbf{Ae}\). Use the set.vertex.attribute() function to add information on organization type, betweenness centrality, and indegree centrality in the subjective influence network to each vertex.
# set node attribute for organization type
set.vertex.attribute(nw_collab, 'orgtype', types)
# set node attribute for betweenness centrality
set.vertex.attribute(nw_collab, 'betweenness', betweenness(nw_collab))
# set node attribute for degree centrality in influence network
set.vertex.attribute(nw_collab, 'influence', degree(infrep, gmode = 'digraph', cmode = 'indegree'))
# inspect vertex attributes of a random organization
nw_collab$val[[17]]
## $na
## [1] FALSE
##
## $vertex.names
## [1] "AQ"
##
## $orgtype
## [1] "party"
##
## $betweenness
## [1] 24
##
## $influence
## [1] 17
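The degree calculations can also be verified by hand in base R. The sketch below uses a hypothetical three-vertex adjacency matrix (not the lab data) and assumes the convention used throughout this lab, where rows are senders and columns are targets.

```r
# hypothetical directed network: 1 -> 2, 1 -> 3, 2 -> 3
A <- matrix(c(0, 1, 1,
              0, 0, 1,
              0, 0, 0), nrow = 3, byrow = TRUE)
e <- rep(1, nrow(A))

# outdegree: ties sent by each vertex (row sums)
outdeg <- as.vector(A %*% e)
# indegree: ties received by each vertex (column sums)
indeg <- as.vector(t(A) %*% e)

outdeg  # 2 1 0
indeg   # 0 1 2
```

The same numbers come out of `rowSums(A)` and `colSums(A)`, which is a useful sanity check before trusting a centrality measure stored as a vertex attribute.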
Now we’re ready to fit a latent space network model. The ergmm() function allows us to fit these models, but the formula is significantly different from those we’re used to seeing. It still starts with nw_collab ~ since the network is our response variable, but the right side is where things get weird. We can’t just include our covariates; we have to specify how they enter the model. The nodematch() term is an indicator for whether the two vertices in a dyad share the same value of the given attribute, which captures homophily; in our case, we want homophily on organization type, so use nodematch() on our organization type vertex attribute. The edgecov() term includes the dyad-level values of a given matrix in our model, so use this term with the government target, NGO sender, private-NGO pairing, forum membership, subjective influence, preference distance, and perceived alliance-opposition matrices (one call per matrix). The nodeicov() term includes the value of a given vertex attribute for the receiver of a tie, so include our indegree centrality influence measure, along with the absdiff() term, which will include the absolute difference in influence between the two vertices in each dyad. Finally, we need to define the dimensionality of our latent space. Use the euclidean() term and set d = 2 and G = 0 for a latent space model with two dimensions and no clusters.
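For reference, the model that this formula specifies can be written compactly. This is the standard Euclidean latent space form for a binary network with a logit link; the symbols \(\mathbf{x}_{ij}\), \(\boldsymbol{\beta}\), and \(\mathbf{z}_i\) are notation introduced here for illustration rather than taken from the lab:
\[ \operatorname{logit} \Pr(Y_{ij} = 1 \mid \mathbf{z}, \mathbf{x}, \boldsymbol{\beta}) = \boldsymbol{\beta}^{\top}\mathbf{x}_{ij} - \lVert \mathbf{z}_i - \mathbf{z}_j \rVert \]
where \(Y_{ij}\) indicates a tie from organization \(i\) to organization \(j\), \(\mathbf{x}_{ij}\) collects the intercept and the covariate terms for that dyad, and \(\mathbf{z}_i\) is the position of organization \(i\) in the \(d\)-dimensional latent space. Holding the covariates fixed, organizations that sit closer together in the latent space are more likely to be tied.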
# set seed to use in GOF statistics later
seed <- 01110011
mod_0c <- ergmm(nw_collab ~
nodematch('orgtype') +
edgecov(gov_alt) +
edgecov(ngo_ego) +
edgecov(priv_ngo) +
edgecov(forum) +
edgecov(infrep) +
edgecov(prefdist) +
edgecov(allopp) +
nodeicov('influence') +
absdiff('influence') +
euclidean(d = 2, G = 0),
seed = seed,
control = control.ergmm(sample.size = 10000, burnin = 50000, interval = 100))
Let’s take a quick look at the output of our model.
summary(mod_0c)
## NOTE: It is not certain whether it is appropriate to use latentnet's BIC to select latent space dimension, whether or not to include actor-specific random effects, and to compare clustered models with the unclustered model.
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: nw_collab ~ nodematch("orgtype") + edgecov(gov_alt) + edgecov(ngo_ego) +
## edgecov(priv_ngo) + edgecov(forum) + edgecov(infrep) + edgecov(prefdist) +
## edgecov(allopp) + nodeicov("influence") + absdiff("influence") +
## euclidean(d = 2, G = 0)
## Attribute: edges
## Model: Bernoulli
## MCMC sample of size 10000, draws are 100 iterations apart, after burnin of 50000 iterations.
## Covariate coefficients posterior means:
## Estimate 2.5% 97.5% 2*min(Pr(>0),Pr(<0))
## (Intercept) -1.9936 -2.8848 -1.11 <2e-16 ***
## nodematch.orgtype 1.4304 0.8222 2.02 0.0002 ***
## edgecov.gov_alt 0.6999 0.0688 1.35 0.0278 *
## edgecov.ngo_ego 1.9418 1.0577 2.89 <2e-16 ***
## edgecov.priv_ngo -1.4141 -2.5814 -0.34 0.0084 **
## edgecov.forum 1.1867 0.4598 1.90 0.0012 **
## edgecov.infrep 1.6127 1.0942 2.15 <2e-16 ***
## edgecov.prefdist -0.9956 -1.9536 -0.02 0.0458 *
## edgecov.allopp 1.4552 0.9756 1.94 <2e-16 ***
## nodeicov.influence 0.1236 0.0840 0.17 <2e-16 ***
## absdiff.influence -0.0609 -0.1097 -0.01 0.0200 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Overall BIC: 932
## Likelihood BIC: 661
## Latent space/clustering BIC: 271
##
## Covariate coefficients MKL:
## Estimate
## (Intercept) 1.519
## nodematch.orgtype 1.277
## edgecov.gov_alt -0.246
## edgecov.ngo_ego -0.570
## edgecov.priv_ngo -1.415
## edgecov.forum 1.477
## edgecov.infrep -0.231
## edgecov.prefdist -1.809
## edgecov.allopp -0.199
## nodeicov.influence -0.061
## absdiff.influence 0.054
Coefficients indicate the effect of each statistic on the log-odds of a tie from organization \(i\) to organization \(j\), holding latent distance fixed. For example, the positive edgecov.gov_alt coefficient means that a tie is more likely to be sent to a governmental organization than to an organization of another type. Similarly, the negative edgecov.prefdist coefficient means that a tie is less likely between two organizations with greater preference distance. Least surprising of all, the presence of a perceived alliance between two organizations makes ties between them more likely.
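Because the model is a Bernoulli model with a logit link, the coefficients are on the log-odds scale, and it can help to translate one of them into odds and probabilities. The sketch below plugs in the posterior means for the intercept and edgecov.gov_alt from the summary above; setting every other covariate and the latent distance to zero is a simplifying assumption made only for illustration.

```r
# posterior means copied from the summary output above
b_intercept <- -1.9936
b_gov_alt <- 0.6999

# multiplicative change in the odds of a tie when the target is governmental
odds_ratio <- exp(b_gov_alt)

# implied tie probabilities at zero latent distance, all other covariates zero
p_other <- plogis(b_intercept)
p_gov <- plogis(b_intercept + b_gov_alt)

round(c(odds_ratio = odds_ratio, p_other = p_other, p_gov = p_gov), 3)
```

Under these assumptions, a governmental target roughly doubles the odds of a tie. The probabilities themselves depend on the baseline chosen, so they should be read as an illustration rather than as a summary of the fitted network.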
Now let’s fit two more models with the same covariates, one with one cluster and one with two clusters.
# one cluster
mod_1c <- ergmm(nw_collab ~
nodematch('orgtype') +
edgecov(gov_alt) +
edgecov(ngo_ego) +
edgecov(priv_ngo) +
edgecov(forum) +
edgecov(infrep) +
edgecov(prefdist) +
edgecov(allopp) +
nodeicov('influence') +
absdiff('influence') +
euclidean(d = 2, G = 1),
seed = seed,
control = control.ergmm(sample.size = 10000, burnin = 50000, interval = 100))
# two clusters
mod_2c <- ergmm(nw_collab ~
nodematch('orgtype') +
edgecov(gov_alt) +
edgecov(ngo_ego) +
edgecov(priv_ngo) +
edgecov(forum) +
edgecov(infrep) +
edgecov(prefdist) +
edgecov(allopp) +
nodeicov('influence') +
absdiff('influence') +
euclidean(d = 2, G = 2),
seed = seed,
control = control.ergmm(sample.size = 10000, burnin = 50000, interval = 100))
Let’s compare the results for all three models:
texreg::htmlreg(list(mod_0c, mod_1c, mod_2c), html.tag = F, head.tag = F, body.tag = F)
| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| (Intercept) | -1.99* | -2.03* | -2.06* |
| | [-2.88; -1.11] | [-2.93; -1.14] | [-2.94; -1.16] |
| nodematch.orgtype | 1.43* | 1.42* | 1.41* |
| | [0.82; 2.02] | [0.82; 2.02] | [0.81; 2.01] |
| edgecov.gov_alt | 0.70* | 0.70* | 0.69* |
| | [0.07; 1.35] | [0.05; 1.33] | [0.07; 1.34] |
| edgecov.ngo_ego | 1.94* | 1.96* | 2.00* |
| | [1.06; 2.89] | [1.10; 2.89] | [1.08; 3.07] |
| edgecov.priv_ngo | -1.41* | -1.34* | -1.36* |
| | [-2.58; -0.34] | [-2.57; -0.22] | [-2.50; -0.32] |
| edgecov.forum | 1.19* | 1.20* | 1.18* |
| | [0.46; 1.90] | [0.47; 1.93] | [0.44; 1.89] |
| edgecov.infrep | 1.61* | 1.63* | 1.61* |
| | [1.09; 2.15] | [1.12; 2.13] | [1.10; 2.16] |
| edgecov.prefdist | -1.00* | -0.98* | -0.99* |
| | [-1.95; -0.02] | [-1.96; -0.02] | [-1.93; -0.02] |
| edgecov.allopp | 1.46* | 1.48* | 1.44* |
| | [0.98; 1.94] | [1.02; 1.97] | [0.99; 1.92] |
| nodeicov.influence | 0.12* | 0.12* | 0.12* |
| | [0.08; 0.17] | [0.08; 0.17] | [0.08; 0.16] |
| absdiff.influence | -0.06* | -0.06* | -0.06* |
| | [-0.11; -0.01] | [-0.11; -0.01] | [-0.11; -0.01] |
| BIC (Overall) | 931.74 | 929.84 | 883.55 |
| BIC (Likelihood) | 660.52 | 649.31 | 654.56 |
| BIC (Latent Positions) | 271.22 | 280.53 | 228.99 |

* 0 outside the confidence interval
Unfortunately, as the note in the summary output warns, it is not clear that BIC is appropriate for comparing latent space models with different numbers of clusters, so let’s calculate some other goodness of fit statistics for each model. The gof() function allows us to supply the statistics we want (and there are a lot of them for network models). Calculate the dyad-wise and edge-wise shared partners for each model, as well as the indegree and outdegree distributions. Make sure you set control.gof.ergmm(seed = seed) so that these calculations use the same seed as the models. After obtaining these GOF statistics, use plot() to plot them against the observed values of the statistics for the collaboration network.
# goodness of fit assessments
gof_0c <- gof(mod_0c, GOF = ~ dspartners + espartners +
idegree + odegree, control = control.gof.ergmm(seed = seed))
gof_1c <- gof(mod_1c, GOF = ~ dspartners + espartners +
idegree + odegree, control = control.gof.ergmm(seed = seed))
gof_2c <- gof(mod_2c, GOF = ~ dspartners + espartners +
idegree + odegree, control = control.gof.ergmm(seed = seed))
par(mfrow = c(2,2))
plot(gof_0c, main = 'no cluster model goodness of fit')
par(mfrow = c(2,2))
plot(gof_1c, main = 'one cluster model goodness of fit')
par(mfrow = c(2,2))
plot(gof_2c, main = 'two cluster model goodness of fit')
If a model fits the data well, then the thick black line, which represents the statistics for the observed network, should pass through the box plots, which represent the distributions of these statistics calculated on 100 simulated networks. We can’t really say whether the zero or one cluster model is better, but we can probably rule out the two cluster one.
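The logic behind these plots can be reproduced in a few lines of base R. The sketch below is a toy stand-in, not what gof() does internally: the "observed" network is hypothetical, and the "model" is a simple random graph with the same density rather than a fitted latent space model. It simulates 100 networks, computes the indegree distribution of each, and checks whether the observed counts fall inside the central 95% of the simulated ones.

```r
set.seed(1)
n <- 10

# simulate a directed network with independent ties and no self-loops
sim_net <- function(n, p) {
  m <- matrix(rbinom(n * n, 1, p), nrow = n, ncol = n)
  diag(m) <- 0
  m
}

# number of vertices with indegree 0, 1, ..., n - 1
indeg_dist <- function(m) tabulate(colSums(m) + 1, nbins = nrow(m))

# hypothetical observed network and its statistic
obs <- sim_net(n, 0.3)
obs_stat <- indeg_dist(obs)

# stand-in model: random graph with the observed off-diagonal density
dens <- mean(obs[row(obs) != col(obs)])
sims <- replicate(100, indeg_dist(sim_net(n, dens)))

# 95% simulation envelope for each indegree value
envelope <- apply(sims, 1, quantile, probs = c(0.025, 0.975))
covered <- obs_stat >= envelope[1, ] & obs_stat <= envelope[2, ]
covered
```

Each `TRUE` in `covered` corresponds to a point where the observed line would pass through the simulated range in a GOF plot; a model that fits poorly produces many `FALSE` entries.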
Finally, we can plot the actual latent network estimated by our model. Note that this is only really informative for \(d \leq 2\) due to the difficulty of visualizing higher dimensional models. The + represents the center of the latent space in the no cluster model, and the center(s) of the cluster(s) in the other models.
plot(mod_0c, labels = T, print.formula = F,
main = 'no cluster model latent positions')
plot(mod_1c, labels = T, print.formula = F,
main = 'one cluster model latent positions')
plot(mod_2c, labels = T, print.formula = F,
main = 'two cluster model latent positions')
Notice that the one cluster model has a group near the center that is tightly clustered, but so does the no cluster model. However, in the no cluster model, these organizations fall roughly along a line, while the one cluster model places them in more of a blob. There are substantial differences in the relative positions of organizations in each model – where are AK, AR, and AZ in each model? Theory and cross-validation become especially important when such different models fit the data about equally well.
Cranmer, Skyler J., Philip Leifeld, Scott D. McClurg, and Meredith Rolfe. 2017. “Navigating the Range of Statistical Tools for Inferential Network Analysis.” American Journal of Political Science 61 (1): 237–51.