Network Models

Exponential Random Graphs and their Log-Linear Representations

by Philipp Burckhardt

Outline

  1. Motivating Example
  2. Features of Social Networks
  3. History of Social Network Models
  4. Exponential Random Graph Models
    • $\beta$-Model
    • $p_1$-Model
    • Log-linear representation
    • Exponential Random Graph Models (ERGMs)

Sampson Monk Data Set

Common Features of Social Networks

  • Mutuality of ties
  • Heterogeneity in propensity to form ties
  • Homophily of attributes (formation of ties due to similar personal traits)
  • Context is important $\implies$ triads, not dyads should be fundamental social unit, Simmel (1950)
  • ...

A Bit of History

  • Early work in Sociology (Moreno, Simmel)
  • Statistical Network Models 1970s (Holland & Leinhardt, Fienberg & Wasserman)
  • Journal Social Networks published since 1979

A Bit of History

  • Early work in Sociology (Moreno, Simmel)
  • Statistical Network Models 1970s (Holland & Leinhardt, Fienberg & Wasserman)
  • Journal Social Networks published since 1979
  • Spatial Statistics (Besag 1974, Ripley 1981)
  • Exponential Family Theory (Barndorff-Nielsen 1978)
  • Statistical Physics (Barabási, Watts)
  • Machine Learning:
    • Stochastic blockmodels (Choi, Snijders)
    • Mixed-Membership Models (Airoldi et. al, 2008)
  • Graphical Models?

Edges of Graph are Random Variables

Graphical Model Network Model
Nodes Random Fixed
Edges Fixed, express conditional independence Random, express relation between nodes

Edges of Graph are Random Variables

  • We first consider undirected graphs
  • Let $X_{ij} \in \{ 0, 1\}$ be indicator random variables.
  • $X_{i,j} = X_{j,i} = 1$ if there exists an edge between $i$ and $j$.
  • $X_{ii} = 0$ by convention
  • Set $\{ X_{i,j} : i < j\}$ of indicator random variables uniquely characterizes network

Beta Model

  • Extension of Erdős–Rényi random graph model in which the probability of observing an edge between two nodes is $p$
  • Allows for heterogeneity in propensity of nodes to form ties: $P \left( X_{i,j} = 1\right)$ not all equal to $p$

Beta Model

Models log odds of existences of edge between nodes $i$ and $j$:

$$ \log \frac{P \left( X_{ij} = 1\right)}{1 - P \left( X_{ij} = 1\right)} = \beta_i + \beta_j \quad \forall i \ne j $$ $$ \definecolor{lviolet}{RGB}{114,0,172} \definecolor{lgreen}{RGB}{45,177,93} \definecolor{lred}{RGB}{251,0,29} \definecolor{lblue}{RGB}{18,110,213} \definecolor{circle}{RGB}{217,86,16} \definecolor{average}{RGB}{203,23,206} $$

Beta Model

Models log odds of existences of edge between nodes $i$ and $j$:

$$ \log \frac{ {\color{lred} P \left( X_{ij} = 1\right) }} {1 - {\color{lred} P \left( X_{ij} = 1\right)}} = \beta_i + \beta_j \quad \forall i \ne j $$
  • In simplest form, model for graph with undirected edges and $n$ vertices. Edges are Bernoulli random variables with probabilities $\color{lred}p_{i,j}$

Beta Model

Models log odds of existences of edge between nodes $i$ and $j$:

$$ \log \frac{ {\color{lred} p_{i,j} }} {1 - {\color{lred} p_{i,j} }} = \beta_i + \beta_j \quad \forall i \ne j $$
  • In simplest form, model for graph with undirected edges and $n$ vertices. Edges are Bernoulli random variables with probabilities $\color{lred}p_{i,j}$

MLE

The likelihood function for the $\beta$ model is given by

$$ p(x ; \beta) = \prod_{i < j} p_{i,j}^{x_{i,j}} \left( 1 - p_{i,j} \right)^{1-x_{i,j}} = \prod_{i < j} \left( \frac{ p_{i,j}}{1-p_{i,j}} \right)^{x_{ij}} \left( 1 - p_{i,j} \right) $$

MLE

The likelihood function for the $\beta$ model is given by

$$ p(x ; \beta) = \prod_{i < j} p_{i,j}^{x_{i,j}} \left( 1 - p_{i,j} \right)^{1-x_{i,j}} = \prod_{i < j} \left( \frac{ p_{i,j}}{1-p_{i,j}} \right)^{x_{ij}} \left( 1 - p_{i,j} \right) $$

Written in exponential family form:

$$ p(x ; \beta) = \exp \left( \sum_{i < j} x_{i,j} \log \frac{ p_{i,j}}{1-p_{i,j}} + \log(1+p_{i,j}) \right) $$

Sufficient Statistics

$$ \begin{align} p(x ; \beta) &= \exp \left( \sum_{i < j} x_{i,j} \; {\color{lred} \log \frac{ p_{i,j}}{1-p_{i,j}} } + \sum_{i < j} \log(1+p_{i,j}) \right) \\ &= \exp \left( \sum_{i < j} x_{i,j} \; {\color{lred}\beta_i} + \sum_{i < j} x_{i,j} \; {\color{lred}\beta_j} - A(\beta) \right) \\ &= \exp \left( \sum_{i=1}^n \beta_i \left[ \sum_{j > i} x_{i,j} + \sum_{j < i} x_{i,j} \right] - A(\beta) \right) \end{align} $$

Sufficient Statistics

$$ p(x ; \beta) = \exp \left( \sum_{i=1}^n \beta_i \underbrace{ \left[ \sum_{j > i} x_{i,j} + \sum_{j < i} x_{i,j}\right]}_{{\color{lviolet}d_i}} - A(\beta) \right) $$
  • Minimal sufficient statistics $\color{lviolet} d_i$, the number of neighbors of node $i$ (degree of node $i$)
  • the vector $\color{lviolet} d(x)$ consisting of all $\color{lviolet} d_i$ is called the degree sequence of the graph $x$

Moment Equations for MLE

  • Maximum likelihood estimates are given by solution to equation $\mathbb{E}[{\color{lviolet}d(X)}] = \color{lviolet} d(x)$

Moment Equations for MLE

  • Maximum likelihood estimates are given by solution to equation $\mathbb{E}[{\color{lviolet}d(X)}] = \color{lviolet} d(x)$
  • Hence, $\hat \beta$ found by $$ d_i = \sum_{j \ne i} {\color{circle}\frac{\exp(\hat \beta_i + \hat \beta_j)}{1 + \exp(\hat \beta_i + \hat \beta_j)}},\qquad i=1, \ldots, n $$ as $$ \mathbb{E}[{\color{lviolet}d_i(X)}] = \mathbb{E} \left[ {\color{lviolet} \sum_{j > i} X_{i,j} + \sum_{j < i} X_{i,j} } \right] = \mathbb{E}\left[\sum_{j \ne i} X_{ij} \right] = \sum_{j \ne i} {\color{circle}p_{ij}} $$

Existence of MLE

  • Does the MLE exist?
  • Note that parameter space $\{ \beta_i \}_{i=1, \ldots, n}$ grows with the number of nodes
  • But although we only observe one graph, there are ${n \choose 2} = \frac{n (n-1)}{2}$ indicator variables $X_{ij}$, making estimation tractable most of the time
  • Conditions for Existence of MLE in the $\beta$-model:
Rinaldo, A., Petrović, S., & Fienberg, S. E. (2011). Maximum lilkelihood estimation in the $\beta$-model.

$p_1$-Model

(Holland & Leinhardt 1981)

  • Again $n$ actors, but this time directed edges
  • Decompose X into $n \choose 2$ dyads $D_{ij} = (X_{ij},X_{ji})$ for $i < j$

$p_1$-Model

  • Model the probability of an edge between nodes $i$ and $j$:
  • $$ \begin{align} P(D_{ij} &= (0,0)) \propto \exp \left( 0 \right) \\ P(D_{ij} &= (1,0)) \propto \exp \left( \alpha_i + \beta_j + \theta \right) \\ P(D_{ij} &= (0,1)) \propto \exp \left( \alpha_j + \beta_i + \theta \right) \\ P(D_{ij} &= (1,1)) \propto \exp \left( \alpha_i + \beta_j + \alpha_j + \beta_i + 2\theta + \rho \right) \end{align} $$

$p_1$-Model

  • Model the probability of an edge between nodes $i$ and $j$:
  • $$ \begin{align} \log P(D_{ij} &= (0,0)) = \lambda_{ij} \\ \log P(D_{ij} &= (1,0)) = \lambda_{ij} + \alpha_i + \beta_j + \theta \\ \log P(D_{ij} &= (0,1)) = \lambda_{ij} + \alpha_j + \beta_i + \theta \\ \log P(D_{ij} &= (1,1)) = \lambda_{ij} + \alpha_i + \beta_j + \alpha_j + \beta_i + 2\theta + {\color{lviolet} \rho} \end{align} $$
  • ${\color{lviolet} \rho}$ is force of reciprocation

$p_1$-Model

  • Model the probability of an edge between nodes $i$ and $j$:
  • $$ \begin{align} \log P(D_{ij} &= (0,0)) = \lambda_{ij} \\ \log P(D_{ij} &= (1,0)) = \lambda_{ij} + \alpha_i + \beta_j + {\color{circle}\theta} \\ \log P(D_{ij} &= (0,1)) = \lambda_{ij} + \alpha_j + \beta_i + {\color{circle}\theta} \\ \log P(D_{ij} &= (1,1)) = \lambda_{ij} + \alpha_i + \beta_j + \alpha_j + \beta_i + 2{\color{circle}\theta} + {\color{lviolet} \rho} \end{align} $$
  • ${\color{lviolet} \rho}$ is force of reciprocation
  • ${\color{circle} \theta}$ is density parameter (= governs number of edges in digraph)

$p_1$-Model

  • Model the probability of an edge between nodes $i$ and $j$:
  • $$ \begin{align} \log P(D_{ij} &= (0,0)) = \lambda_{ij} \\ \log P(D_{ij} &= (1,0)) = \lambda_{ij} + {\color{lgreen} \alpha_i} + \beta_j + {\color{circle}\theta} \\ \log P(D_{ij} &= (0,1)) = \lambda_{ij} + {\color{lgreen} \alpha_j} + \beta_i + {\color{circle}\theta} \\ \log P(D_{ij} &= (1,1)) = \lambda_{ij} + {\color{lgreen} \alpha_i} + \beta_j + {\color{lgreen} \alpha_j} + \beta_i + 2{\color{circle}\theta} + {\color{lviolet} \rho} \end{align} $$
  • ${\color{lviolet} \rho}$ is force of reciprocation
  • ${\color{circle} \theta}$ is density parameter (= governs number of edges in digraph)
  • ${\color{lgreen} \alpha_i}$ is productivity parameter, governs how likely it is for node $i$ to have outgoing ties

$p_1$-Model

  • Model the probability of an edge between nodes $i$ and $j$:
  • $$ \begin{align} \log P(D_{ij} &= (0,0)) = \lambda_{ij} \\ \log P(D_{ij} &= (1,0)) = \lambda_{ij} + {\color{lgreen} \alpha_i} + {\color{lviolet} \beta_j} + {\color{circle}\theta} \\ \log P(D_{ij} &= (0,1)) = \lambda_{ij} + {\color{lgreen} \alpha_j} + {\color{lviolet} \beta_i} + {\color{circle}\theta} \\ \log P(D_{ij} &= (1,1)) = \lambda_{ij} + {\color{lgreen} \alpha_i} + {\color{lviolet} \beta_j} + {\color{lgreen} \alpha_j} + {\color{lviolet} \beta_i} + 2{\color{circle}\theta} + {\color{lblue} \rho} \end{align} $$
  • ${\color{lblue} \rho}$ is force of reciprocation
  • ${\color{circle} \theta}$ is density parameter (= governs number of edges in digraph)
  • ${\color{lgreen} \alpha_i}$ is productivity parameter, governs how likely it is for node $i$ to have outgoing ties
  • ${\color{lviolet} \beta_i}$ is attractiveness parameter as it governs how many incoming ties a node has

$p_1$-Model

$$ \begin{align} \log P(D_{ij} &= (0,0)) = \lambda_{ij} \\ \log P(D_{ij} &= (1,0)) = \lambda_{ij} + {\color{lgreen} \alpha_i} + {\color{lviolet} \beta_j} + {\color{circle}\theta} \\ \log P(D_{ij} &= (0,1)) = \lambda_{ij} + {\color{lgreen} \alpha_j} + {\color{lviolet} \beta_i} + {\color{circle}\theta} \\ \log P(D_{ij} &= (1,1)) = \lambda_{ij} + {\color{lgreen} \alpha_i} + {\color{lviolet} \beta_j} + {\color{lgreen} \alpha_j} + {\color{lviolet} \beta_i} + 2{\color{circle}\theta} + {\color{lblue} \rho} \end{align} $$
Likelihood is of exponential form and evaluates to $$ p_1(x) \propto \exp \left( {\color{lblue} \rho} \sum_{ij} x_{ij} x_{ji} + {\color{circle}\theta} x_{++} + \sum_i {\color{lgreen} \alpha_i} x_{i+} + \sum_j {\color{lviolet} \beta_j} x_{+j} \right) $$

Log-Linear Representation

  • Define $$ y_{{\color{blue}i}{\color{circle}j}{\color{average}k}{\color{lgreen}l}} = \begin{cases} 1 & \text{if } D_{{\color{blue}i}{\color{circle}j}} = (x_{{\color{blue}i}{\color{circle}j}}, x_{{\color{circle}j}{\color{blue}i}}) = ({\color{average}k},{\color{lgreen}l}) \\ 0 & \text{otherwise} \end{cases} $$ which yields ${\color{blue} n} \times {\color{circle}n} \times {\color{average}2} \times {\color{lgreen}2}$ array with $y_{iikl}$ equal to zero for all $i$ and $y_{ijkl} = y_{jilk}$.
  • Fienberg & Wasserman (1981) have shown that fitting the $p_1$ model to $x$ corresponds to estimating the no second-factor interaction model on $y$, i.e. the following log-linear model: $$ [12] [13] [24] [14] [23] [34] $$
  • Use of standard iterative proportional fitting

Log-Linear Representation

Corresponding log-linear model: $$ \scriptsize \log y_{ijkl} = u_{12(ij)} + u_{13(ik)} + u_{24(jl)} + u_{14(il)} + u_{23(jk)} + u_{34(kl)}. $$

Here: $$ {\scriptsize \begin{align} u_{12(ij)} &= \lambda_{ij} \\ u_{13(ik)} &= \left( \alpha_i + \theta \right) \times I(k=1) \\ u_{24(jl)} &= \left( \alpha_j + \theta \right) \times I(l=1) \\ u_{14(il)} &= \beta_i \times I(l=1) \\ u_{23(jk)} &= \beta_j \times I(k=1) \\ u_{34(kl)} &= \rho \times I(k=1,l=1) \\ \end{align} } $$

Sufficient Statistics

$$ \begin{align} 1/2 y_{++11} &= \sum_{i < j} x_{ij} x_{ji} &\text{number of mutuals} \\ y_{i+1+} &= x_{i+} &\text{out-degree of node } i \\ y_{+j+1} &= x_{+j} &\text{in-degree of node } j \\ y_{++1+} &= x_{++} &\text{total number of choices} \\ \end{align} $$

ERGMs or $p^*$-models

(Frank & Strauss 1986)

Extend the $p_1$ class by adding topoligical features of the graph, e.g. the mumber of triangles $$ T(x) = \sum_{i,j,k} x_{ij} x_{jk} x_{ki}. $$ by adding an additional parameter multiplied with $T(x)$ to the likelihood of the $p_1$ model: $$ \log p(x) \propto \tau T(x) + \rho m + \theta x_{++} + \sum_i a_i x_{i+} + \sum_j \beta_j x_{+j} $$
  • No dyadic independence anymore, joint density cannot be written as product of individual components
  • ERGMs or $p^*$-models

    (Frank & Strauss 1986)

    • Normalizing contant for $p(x)$ is a sum over all possible graphs $\implies$ intractable to compute
      • Independent-dyad approximation (Strauss & Ikeda, 1990)
      • MCMC approaches (e.g. Snijders, 2002)