- Network mapping, especially in internet-limited settings, should be done off-line;
- Network mapping (probably anywhere) should be done face-to-face. Otherwise respondents are unlikely to respond;
- One should always pilot their data collection tools!
statnet suite of packages. Feel free to download
Step 1: Make sure the packages are installed
Install the latest version of ergm and sna: install.packages('ergm'); install.packages('sna')
library(ergm)
library(sna)
col.list <- c("white", "darkblue", "cornflowerblue", "darkorange1", "darkred")
palette(col.list)
Step 2: Import data and convert to network
ONASurveys.com exported the data as an edge-list, and I had to do some extensive work to delete duplicate names, ensure IDs matched, etc. But you can use the cleaned files. There is one for each network, as well as an attributes file that is used for all the networks.
Save the files somewhere and set that folder as your working directory in R.
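As an aside, the kind of cleanup mentioned above (duplicate names, mismatched IDs) can be sketched in base R. This is a minimal illustration on made-up data, not the actual cleaning script: the column names `from`, `to` and `id` and all the values are assumptions.

```r
# Toy edge list with typical problems (hypothetical data):
# stray whitespace and case differences, a duplicated tie, and a self-tie.
edges_raw <- data.frame(
  from = c("Ann ", "ann", "Bob", "Cat", "Cat"),
  to   = c("Bob", "Bob", "Ann", "Dan", "Cat"),
  stringsAsFactors = FALSE
)
# Toy attribute table; `id` is an assumed column name.
attr_toy <- data.frame(id = c("ann", "bob", "cat"), stringsAsFactors = FALSE)

# 1. Normalize names so that "Ann " and "ann" become the same ID.
clean <- data.frame(
  from = tolower(trimws(edges_raw$from)),
  to   = tolower(trimws(edges_raw$to)),
  stringsAsFactors = FALSE
)

# 2. Drop self-ties.
clean <- clean[clean$from != clean$to, ]

# 3. Treat ties as undirected: sort each pair, then drop duplicates.
pair <- data.frame(
  a = pmin(clean$from, clean$to),
  b = pmax(clean$from, clean$to),
  stringsAsFactors = FALSE
)
edges_clean <- unique(pair)

# 4. List IDs that appear in the edge list but not in the attribute table.
missing_ids <- setdiff(unique(c(edges_clean$a, edges_clean$b)), attr_toy$id)
```

Any ID that shows up in `missing_ids` would need either a row in the attributes file or a corrected spelling before the network is built.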
# Point R at the folder holding the cleaned files (adjust the path to your own).
setwd("~/Dropbox/Dissertation_jan4/Conferences/Cape Town 2014")
# Attributes shared by all networks, and the edge list for the social network.
attr <- read.csv(file="attr.csv", header=TRUE, stringsAsFactors=FALSE)
social <- read.csv(file="social.csv", header=TRUE, stringsAsFactors=TRUE)
# Build an undirected network from the edge list, convert it to an adjacency
# matrix, and rebuild it with the vertex attributes attached.
spre <- network(social, matrix.type="edgelist", directed=FALSE)
smat <- as.matrix(spre)
snet <- network(smat, matrix.type="adjacency", directed=FALSE, vertex.attr=attr)
summary(snet)
The summary() command shows us the network vertices, edges and density. Note that this is the entire network of all 1515 participants, 90% of whom did not complete the survey. Let’s plot that network to see what it looks like (and then we’ll get rid of the isolates).
Step 3: Plot the network
plot.network(snet, edge.col="darkgrey", vertex.border="black")
You can see a cluster of activity in the center, with edges in grey. But otherwise this is not a helpful graph. Let’s delete isolates and check out the summary stats.
sno_iso <- delete.vertices(snet, which(degree(snet)<1))
Use the summary(sno_iso) command to see ALL the details, or simply:
centralization(sno_iso, degree, mode="graph")
## [1] 0.1035052
network.density(sno_iso)
## [1] 0.007237999
Do people socialize with others from their region?
s2coord <- plot.network(sno_iso, edge.col="darkgrey", vertex.border="black")
plot.network(sno_iso, coord=s2coord, vertex.col="region", edge.col="darkgrey", vertex.border="black")
legend("bottomleft", legend=c("Africa", "Americas", "Europe", "South-East Asia", "Unknown"), pch=21,
       cex=1, pt.bg=c("darkblue", "cornflowerblue", "darkorange1", "darkred", "white"))
Again, it is somewhat difficult to tell with the missing attribute data, but it doesn’t seem as though there is clustering by region. (On a side note, if these data were very important, I could look up all the alters’ regions. For smaller networks this would certainly be worth it.)
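One way to go beyond eyeballing the plot is to tabulate ties by the regions of their two endpoints. The sketch below uses a made-up edge list and region lookup (all names and values are hypothetical); with the real data, the same table could presumably be built from the cleaned edge list and the attributes file.

```r
# Hypothetical region lookup, keyed by person ID.
region <- c(a = "Africa", b = "Africa", c = "Europe", d = "Americas", e = "Unknown")

# Hypothetical undirected ties.
ties <- data.frame(
  from = c("a", "a", "b", "c", "d"),
  to   = c("b", "c", "d", "e", "e"),
  stringsAsFactors = FALSE
)

r_from <- unname(region[ties$from])
r_to   <- unname(region[ties$to])

# Share of ties joining two people from the same region,
# counting only ties where both regions are known.
known <- r_from != "Unknown" & r_to != "Unknown"
same_region_share <- mean(r_from[known] == r_to[known])

# Cross-tabulate ties by endpoint regions. Order within a pair is arbitrary
# for undirected ties, so sort each pair first.
mix <- table(pmin(r_from, r_to), pmax(r_from, r_to))
```

A large share of ties involving "Unknown" in such a table is a sign that any conclusion about regional clustering rests on a small part of the network.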
Is sociality based on similar organization?
Hmm… first of all, we see that most of our respondents are from research organizations. Second, they seem to be more central in the network. Are they more likely than chance to eat lunch with other researchers? We will find out soon. But first, let’s examine by age.
Finally, which nodes are in the most strategic position to broker other nodes? This is measured by betweenness centrality, and can be applied to understand who the brokers are, and how to most efficiently disseminate ideas or information. We will calculate the betweenness centrality scores for all nodes, and then size our graphed nodes according to their betweenness.
s2between <- betweenness(sno_iso, g=1, gmode="graph", cmode="undirected")
plot.network(sno_iso, coord=s2coord, vertex.col="region", vertex.cex=s2between/1500, edge.col="darkgrey", vertex.border="black")
legend("bottomleft", legend=c("Africa", "Americas", "Europe", "South-East Asia", "Unknown"), pch=21,
       cex=1, pt.bg=c("darkblue", "cornflowerblue", "darkorange1", "darkred", "white"))
The most strategically located brokers are from South-East Asia.
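A reading like this can be checked numerically by ranking nodes on betweenness and looking at their regions. The sketch below uses made-up scores and labels; with the objects from this post, the scores would be s2between and the labels would come from the region vertex attribute.

```r
# Hypothetical betweenness scores and region labels for six nodes.
btw    <- c(1200, 30, 0, 4500, 800, 0)
region <- c("Africa", "Africa", "Europe", "South-East Asia", "Americas", "Europe")

# Rank nodes from highest to lowest betweenness and keep the top three.
top <- order(btw, decreasing = TRUE)[1:3]
top_brokers <- data.frame(node = top, betweenness = btw[top], region = region[top])

# Mean betweenness per region, highest first.
mean_by_region <- sort(tapply(btw, region, mean), decreasing = TRUE)
```

Looking at both the top-ranked nodes and the regional means helps separate "one very central person" from "a generally well-placed region".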
Step 4: Construct ergm models to test hypotheses about why conference participants socialize with each other
Ok, now let’s examine these in ergm models. Exponential random graph models (ergm) are a class of models, closely related to logistic regression, that allow us to test hypotheses related to dyads, i.e., network ties/edges. See the statnet website for a list of ergm resources. I highly recommend “Birds of a Feather or Friend of a Friend” by Goodreau, Kitts and Morris (2009) for both a master class in ergm modeling as well as a wonderful application to adolescent friendship networks.
Unlike traditional statistical models, where the covariates are some function of the units of analysis, ergm models allow us to also represent covariates that are functions of the network itself. I usually build my ergms in two waves:
1. A set of attribute-only models, where covariates are tested separately and then added to the final model stepwise if they improve model fit;
2. A set of structural-only models (following the same process as above).
Starting with the attributes, let’s test each in a model with an edges term, which is like an intercept in a traditional regression model.
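To make the intercept analogy concrete: in a model with only an edges term, the fitted coefficient is the log-odds of a tie, i.e. the logit of the network density. A minimal base-R sketch using the density reported earlier (the variable names are illustrative):

```r
# Density of the isolate-free social network, as reported by network.density().
dens <- 0.007237999

# In an edges-only model, the edges coefficient equals the logit of the density.
edges_coef <- qlogis(dens)      # same as log(dens / (1 - dens)), roughly -4.9

# Going the other way, plogis() turns a log-odds value back into a probability.
back_to_prob <- plogis(edges_coef)
```

Once other terms are added, the edges coefficient is the log-odds of a tie for a dyad whose other change statistics are all zero, and each additional coefficient shifts that log-odds up or down.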
smodel.02 <- ergm(sno_iso ~ edges+nodematch("region"))
summary(smodel.02)
## 
## ==========================
## Summary of model fit
## ==========================
## 
## Formula:   sno_iso ~ edges + nodematch("region")
## 
## Iterations:  20 
## 
## Monte Carlo MLE Results:
##                  Estimate Std. Error MCMC % p-value    
## edges            -3.66743    0.05216     NA  <1e-04 ***
## nodematch.region -3.02832    0.14465     NA  <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 82741  on 59685  degrees of freedom
##  Residual Deviance:  4375  on 59683  degrees of freedom
##  
## AIC: 4379    BIC: 4397    (Smaller is better.)
Let’s look at some models with structural covariates. What are these magical structural covariates? They are underlying social processes which have been documented empirically to occur more often than expected by chance alone. Today we will examine transitivity, or triangle formation, which describes the propensity for people to form relationships with ‘friends of friends.’ Transitivity has many implications. Think of triangles, literally, as cliques. Cliques might be fun for lunch, but they are not conducive to exposure to new ideas, innovation, behavior or policy change, etc. In the first model I test whether social ties are more likely to exist if they close a triangle. We expect that a person is more likely to socialize with their friends’ friends.
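The model that follows uses the gwesp term (geometrically weighted edgewise shared partners) to capture triangle closure. As a sketch of the idea, under the standard definition of the statistic, an edge whose endpoints have k shared partners contributes e^alpha * (1 - (1 - e^(-alpha))^k), so each additional shared partner counts for less than the one before. The function name and the decay value below are illustrative assumptions, not estimates from these data.

```r
# Contribution of one edge with k shared partners to the GWESP statistic,
# for decay parameter alpha (standard geometrically weighted form).
gwesp_weight <- function(k, alpha) {
  exp(alpha) * (1 - (1 - exp(-alpha))^k)
}

alpha <- 1                       # illustrative decay value (an assumption)
k <- 0:4
w <- gwesp_weight(k, alpha)

# Increment from each additional shared partner: 1 for the first,
# then shrinking by a factor of (1 - exp(-alpha)) each time.
increments <- diff(w)
```

With a larger decay value the increments shrink more slowly, so a tie embedded in many triangles is weighted closer to a plain triangle count.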
*A note about missing edge data: While our structural models will not be affected by missing attribute data, they will be affected by missing edge data. We only know the edges of respondents, not the edges of the alters.
smodel.06 <- ergm(sno_iso ~ edges+gwesp)
summary(smodel.06)
## 
## ==========================
## Summary of model fit
## ==========================
## 
## Formula:   sno_iso ~ edges + gwesp
## 
## Iterations:  20 
## 
## Monte Carlo MLE Results:
##             Estimate Std. Error MCMC % p-value    
## edges       -5.19997    0.05653      0  <1e-04 ***
## gwesp        0.99650    0.17815      0  <1e-04 ***
## gwesp.alpha  1.18096    0.10367      0  <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 82741  on 59685  degrees of freedom
##  Residual Deviance:  5011  on 59682  degrees of freedom
##  
## AIC: 5017    BIC: 5044    (Smaller is better.)
The gwesp term measures the change in log odds of a tie forming between two nodes given that this tie will close a triangle. Yes, even with missing edges, ties are more likely to exist if they close a triangle between three nodes. This is the result we expected for sociality, but let’s check to see whether this happens with collaboration ties. We would expect that people are more likely to collaborate with their collaborators’ collaborators.
cmodel.06 <- ergm(cno_iso ~ edges+gwesp)
summary(cmodel.06)
## 
## ==========================
## Summary of model fit
## ==========================
## 
## Formula:   cno_iso ~ edges + gwesp
## 
## Iterations:  20 
## 
## Monte Carlo MLE Results:
##             Estimate Std. Error MCMC %  p-value    
## edges       -4.96450    0.06448      0  < 1e-04 ***
## gwesp        0.93190    0.25254      0 0.000225 ***
## gwesp.alpha  1.27732    0.12876      0  < 1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 50343  on 36315  degrees of freedom
##  Residual Deviance:  3729  on 36312  degrees of freedom
##  
## AIC: 3735    BIC: 3761    (Smaller is better.)
There are a few ways to deal with missing edge data:
- We could have asked the respondents to report their alters’ edges (this is typical in ego-network sampling, but its accuracy depends on the relationship being measured).
- We can remove the nodes we didn’t interview (and thus their edges). This will leave us with a network of complete edges, but not a complete network; i.e., the network we are left with is not a real network. But neither is the missing-edges network…
- We could try to impute edges based on attribute data. Wait! This is what we’d do if we didn’t know that edges are predicted not just by attributes, but also by network structure! Network dependencies make it difficult to impute edges. Man, these networks! Next time
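The second option in the list above can be sketched on an edge list: keep only ties where both endpoints completed the survey. The data and the `responded` flag below are hypothetical; the attributes file may record survey completion differently.

```r
# Hypothetical edge list: respondents (egos) nominate alters.
ties <- data.frame(
  from = c("a", "a", "b", "c"),
  to   = c("b", "x", "c", "y"),
  stringsAsFactors = FALSE
)

# Hypothetical attribute table with a flag for who completed the survey.
people <- data.frame(
  id        = c("a", "b", "c", "x", "y"),
  responded = c(TRUE, TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

respondents <- people$id[people$responded]

# Keep a tie only if both endpoints are respondents.
keep <- ties$from %in% respondents & ties$to %in% respondents
ties_complete <- ties[keep, ]
```

The reduced edge list can then be turned into a network in the same way as the full one, with the caveat from the list above that it describes only the interviewed subgroup.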
 