General Exponential Families

\(\newcommand{\P}{\mathbb{P}}\) \(\newcommand{\R}{\mathbb{R}}\) \( \newcommand{\N}{\mathbb{N}} \) \(\newcommand{\E}{\mathbb{E}}\) \(\newcommand{\var}{\text{var}}\) \(\newcommand{\sd}{\text{sd}}\) \(\newcommand{\skw}{\text{skew}}\) \(\newcommand{\kur}{\text{kurt}}\) \( \newcommand{\bs}{\boldsymbol} \)

Basic Theory

Definition

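We start with a probability space \( (\Omega, \mathscr F, \P) \) as a model for a random experiment. So as usual, \( \Omega \) is the set of outcomes, \( \mathscr F \) the \( \sigma \)-algebra of events, and \( \P \) the probability measure on the sample space \( (\Omega, \mathscr F) \). For the general formulation that we want in this section, we need two additional spaces, a measure space \( (S, \mathscr S, \mu) \) (where the probability distributions will live) and a measurable space \( (T, \mathscr T) \) (serving the role of a parameter space). Typically, these spaces fall into our two standard categories. Specifically, the measure space \( (S, \mathscr S, \mu) \) is usually either discrete, so that \( S \) is countable, \( \mathscr S \) is the collection of all subsets of \( S \), and \( \mu \) is counting measure, or Euclidean, so that \( S \) is a sufficiently nice Borel measurable subset of \( \R^n \) for some \( n \in \N_+ \), \( \mathscr S \) is the \( \sigma \)-algebra of Borel measurable subsets of \( S \), and \( \mu \) is Lebesgue measure.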

Similarly, the parameter space \( (T, \mathscr T) \) is usually either discrete, so that \( T \) is countable and \( \mathscr T \) the collection of all subsets of \( T \), or Euclidean so that \( T \) is a sufficiently nice Borel measurable subset of \( \R^m \) for some \( m \in \N_+ \) and \( \mathscr T \) is the \( \sigma \)-algebra of Borel measurable subsets of \( T \).

Suppose now that \(X\) is a random variable defined on the probability space, taking values in \(S\), and that the distribution of \(X\) depends on a parameter \(t \in T\). For each \( t \in T \) we assume that the distribution of \( X \) has probability density function \(f_t\) with respect to \( \mu \).

For \( k \in \N_+ \), the family of distributions of \(X\) is a \(k\)-parameter exponential family if \[ f_t(x) = \alpha(t) \, g(x) \, \exp \left( \sum_{i=1}^k \beta_i(t) \, h_i(x) \right); \quad x \in S, \, t \in T\] where \(\alpha\) and \(\left(\beta_1, \beta_2, \ldots, \beta_k\right)\) are measurable functions from \( T \) into \( \R \), and where \(g\) and \(\left(h_1, h_2, \ldots, h_k\right)\) are measurable functions from \( S \) into \( \R \). Moreover, \(k\) is assumed to be the smallest such integer.

The parameters \(\left(\beta_1(t), \beta_2(t), \ldots, \beta_k(t)\right)\) are called the natural parameters of the distribution.
The random variables \(\left(h_1(X), h_2(X), \ldots, h_k(X)\right)\) are called the natural statistics of the distribution.

Although the definition may look intimidating, exponential families are useful because many important theoretical results in statistics hold for exponential families, and because many special parametric families of distributions turn out to be exponential families. It's important to emphasize that the representation of \( f_t(x) \) given in the definition must hold for all \( x \in S \) and \( t \in T \). If the representation only holds for a set of \( x \in S \) that depends on the particular \( t \in T \), then the family of distributions is not a general exponential family.
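As a small illustration of this last point (an example chosen here, not one taken from the discussion above), consider the uniform distribution on the interval \( [0, t] \) with parameter \( t \in (0, \infty) \), taking \( S = [0, \infty) \) with Lebesgue measure. The density is \[ f_t(x) = \frac{1}{t} \, \mathbf{1}(0 \le x \le t), \quad x \in [0, \infty) \] The factor \( 1 / t \) depends only on \( t \), but the indicator depends jointly on \( x \) and \( t \). In the representation \( \alpha(t) \, g(x) \, \exp\left(\sum_{i=1}^k \beta_i(t) \, h_i(x)\right) \) the exponential factor is strictly positive and \( \alpha(t) \ne 0 \), so the set of points where the density is positive would have to be \( \{x \in S: g(x) \ne 0\} \), which does not depend on \( t \). Since the support \( [0, t] \) of the uniform density does depend on \( t \), no such representation exists.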

The next result shows that if we sample from the distribution of an exponential family, then the distribution of the random sample is itself an exponential family with the same natural parameters.

Suppose that the distribution of random variable \(X\) is a \(k\)-parameter exponential family with natural parameters \((\beta_1(t), \beta_2(t), \ldots, \beta_k(t))\), and natural statistics \((h_1(X), h_2(X), \ldots, h_k(X))\). Let \(\bs{X} = (X_1, X_2, \ldots, X_n)\) be a sequence of \(n\) independent copies of \(X\). Then the distribution of \(\bs{X}\) is a \(k\)-parameter exponential family with natural parameters \((\beta_1(t), \beta_2(t), \ldots, \beta_k(t))\), and natural statistics \[ u_j(\bs{X}) = \sum_{i=1}^n h_j(X_i), \quad j \in \{1, 2, \ldots, k\} \]

Details:

Let \( f_t \) denote the PDF of \( X \) corresponding to the parameter value \( t \in T \), so that \( f_t(x) \) has the representation given in the definition for \( x \in S \) and \( t \in T \). Then for \( t \in T \), \( \bs X = (X_1, X_2, \ldots, X_n) \) has PDF \( g_t \) given by \[ g_t(x_1, x_2, \ldots, x_n) = f_t(x_1) f_t(x_2) \cdots f_t(x_n), \quad (x_1, x_2, \ldots, x_n) \in S^n \] Substituting and simplifying gives the result.
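Written out, the substitution step is \[ g_t(x_1, x_2, \ldots, x_n) = \prod_{i=1}^n \alpha(t) \, g(x_i) \, \exp\left(\sum_{j=1}^k \beta_j(t) \, h_j(x_i)\right) = [\alpha(t)]^n \left[\prod_{i=1}^n g(x_i)\right] \exp\left(\sum_{j=1}^k \beta_j(t) \sum_{i=1}^n h_j(x_i)\right) \] for \( (x_1, x_2, \ldots, x_n) \in S^n \) and \( t \in T \). This has the form required by the definition, with \( t \mapsto [\alpha(t)]^n \) in the role of \( \alpha \), with \( (x_1, x_2, \ldots, x_n) \mapsto \prod_{i=1}^n g(x_i) \) in the role of \( g \), with the same functions \( \beta_j \), and with \( (x_1, x_2, \ldots, x_n) \mapsto \sum_{i=1}^n h_j(x_i) \) in the role of \( h_j \) for \( j \in \{1, 2, \ldots, k\} \).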

Examples and Special Cases

Special Distributions

Many of the special distributions studied in this chapter are general exponential families, at least with respect to some of their parameters. When a parametric family fails to be a general exponential family, the most common reason is that the support set depends on the parameter. The following theorems give a number of examples of exponential families. Details will be given in the individual sections.

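The Bernoulli distribution is a one-parameter exponential family in the success parameter \( p \in (0, 1) \). The endpoints \( p = 0 \) and \( p = 1 \) are excluded because the support of the distribution then reduces to a single point.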
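As a quick check of this claim, assuming the usual form of the Bernoulli density on \( S = \{0, 1\} \) with respect to counting measure and a success parameter \( p \) strictly between 0 and 1, \[ f_p(x) = p^x (1 - p)^{1 - x} = (1 - p) \exp\left[x \ln\left(\frac{p}{1 - p}\right)\right], \quad x \in \{0, 1\} \] In the notation of the definition, \( \alpha(p) = 1 - p \), \( g(x) = 1 \), the natural parameter is \( \ln[p / (1 - p)] \), and the natural statistic is \( X \).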

The beta distribution is a two-parameter exponential family in the shape parameters \( a \in (0, \infty) \), \( b \in (0, \infty) \).

The beta prime distribution is a two-parameter exponential family in the shape parameters \( a \in (0, \infty) \), \( b \in (0, \infty) \).

The binomial distribution is a one-parameter exponential family in the success parameter \( p \in (0, 1) \) for a fixed value of the trial parameter \( n \in \N_+ \).
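The binomial case can be checked numerically. The sketch below assumes the standard binomial density \( \binom{n}{x} p^x (1 - p)^{n - x} \) on \( \{0, 1, \ldots, n\} \) and rewrites it with \( \alpha(p) = (1 - p)^n \), \( g(x) = \binom{n}{x} \), natural parameter \( \ln[p / (1 - p)] \), and natural statistic \( x \). The function names and the values of `n` and `p` are illustrative choices, not part of the text.

```python
import math

def binomial_pmf(x, n, p):
    """Binomial density written directly."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_pmf_expfam(x, n, p):
    """Same density in the form alpha(p) * g(x) * exp(beta(p) * h(x))."""
    alpha = (1 - p) ** n              # depends only on the parameter p
    g = math.comb(n, x)               # depends only on x (n is held fixed)
    beta = math.log(p / (1 - p))      # natural parameter
    h = x                             # natural statistic evaluated at x
    return alpha * g * math.exp(beta * h)

n, p = 10, 0.3  # arbitrary illustrative values, with 0 < p < 1

# Compare the two forms at every point of the support {0, 1, ..., n}.
checks = [
    math.isclose(binomial_pmf(x, n, p), binomial_pmf_expfam(x, n, p))
    for x in range(n + 1)
]
print(all(checks))
```

Note that the trial parameter has to be held fixed here: the support \( \{0, 1, \ldots, n\} \) changes with \( n \), so the family is not an exponential family in \( n \).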

The chi-square distribution is a one-parameter exponential family in the degrees of freedom \( n \in (0, \infty) \).

The exponential distribution is a one-parameter exponential family (appropriately enough), in the rate parameter \( r \in (0, \infty) \).

The gamma distribution is a two-parameter exponential family in the shape parameter \( k \in (0, \infty) \) and the scale parameter \( b \in (0, \infty) \).
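The gamma family is a convenient place to see both the definition and the sampling result at work. The sketch below assumes the standard gamma density with shape \( k \) and scale \( b \), namely \( x^{k-1} e^{-x/b} / [\Gamma(k) b^k] \) for \( x \in (0, \infty) \), and takes \( \alpha = 1 / [\Gamma(k) b^k] \), \( g = 1 \), natural parameters \( (k - 1, -1/b) \), and natural statistics \( (\ln x, x) \). Here \( k \) is the shape parameter, not the number of parameters in the definition. All function names, parameter values, and sample points are illustrative assumptions.

```python
import math

def gamma_pdf(x, k, b):
    """Gamma density with shape k and scale b, written directly."""
    return x ** (k - 1) * math.exp(-x / b) / (math.gamma(k) * b ** k)

def gamma_pdf_expfam(x, k, b):
    """Same density as alpha(t) * g(x) * exp(beta_1 h_1(x) + beta_2 h_2(x))."""
    alpha = 1.0 / (math.gamma(k) * b ** k)   # depends only on (k, b)
    g = 1.0                                  # trivial function of x
    beta = (k - 1.0, -1.0 / b)               # natural parameters
    h = (math.log(x), x)                     # natural statistics at x
    return alpha * g * math.exp(beta[0] * h[0] + beta[1] * h[1])

def sample_pdf_expfam(xs, k, b):
    """Joint density of independent copies, using summed natural statistics."""
    n = len(xs)
    alpha_n = (1.0 / (math.gamma(k) * b ** k)) ** n
    u1 = sum(math.log(x) for x in xs)        # u_1 = sum of h_1(x_i)
    u2 = sum(xs)                             # u_2 = sum of h_2(x_i)
    return alpha_n * math.exp((k - 1.0) * u1 - u2 / b)

k, b = 2.5, 1.5             # arbitrary illustrative parameter values
xs = [0.3, 1.2, 2.7, 4.1]   # arbitrary illustrative sample points

# Single observation: direct form versus exponential family form.
single_ok = all(
    math.isclose(gamma_pdf(x, k, b), gamma_pdf_expfam(x, k, b)) for x in xs
)
# Sample: product of marginal densities versus the summed-statistics form.
joint_ok = math.isclose(
    math.prod(gamma_pdf(x, k, b) for x in xs), sample_pdf_expfam(xs, k, b)
)
print(single_ok, joint_ok)
```

The second check illustrates the sampling result from the basic theory: the joint density depends on the data only through \( \sum_i \ln x_i \) and \( \sum_i x_i \), with the same natural parameters as for a single observation.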

The geometric distribution is a one-parameter exponential family in the success probability \( p \in (0, 1) \).

The half normal distribution is a one-parameter exponential family in the scale parameter \( \sigma \in (0, \infty) \).

The Laplace distribution is a one-parameter exponential family in the scale parameter \( b \in (0, \infty) \) for a fixed value of the location parameter \( a \in \R \).

The Lévy distribution is a one-parameter exponential family in the scale parameter \( b \in (0, \infty) \) for a fixed value of the location parameter \( a \in \R \).

The logarithmic distribution is a one-parameter exponential family in the shape parameter \( p \in (0, 1) \).

The lognormal distribution is a two-parameter exponential family in the shape parameters \( \mu \in \R \), \( \sigma \in (0, \infty) \).

The Maxwell distribution is a one-parameter exponential family in the scale parameter \( b \in (0, \infty) \).

The \( k \)-dimensional multinomial distribution is a \( k \)-parameter exponential family in the probability parameters \( (p_1, p_2, \ldots, p_k) \) for a fixed value of the trial parameter \( n \in \N_+ \).

The \( k \)-dimensional multivariate normal distribution is a \( \frac{1}{2}(k^2 + 3 k) \)-parameter exponential family with respect to the mean vector \( \bs{\mu} \) and the variance-covariance matrix \( \bs{V} \).
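The parameter count can be seen by expanding the exponent of the density. As a sketch, assuming \( \bs{V} \) is positive definite so that the distribution has a density on \( \R^k \), and writing \( \bs{x} \) for a column vector in \( \R^k \), \[ -\frac{1}{2} (\bs{x} - \bs{\mu})^T \bs{V}^{-1} (\bs{x} - \bs{\mu}) = -\frac{1}{2} \bs{x}^T \bs{V}^{-1} \bs{x} + \bs{\mu}^T \bs{V}^{-1} \bs{x} - \frac{1}{2} \bs{\mu}^T \bs{V}^{-1} \bs{\mu} \] The last term depends only on the parameters and is absorbed into \( \alpha \). The linear term contributes the \( k \) natural statistics \( x_i \) for \( i \in \{1, 2, \ldots, k\} \). Since \( \bs{V}^{-1} \) is symmetric, the quadratic term contributes the distinct products \( x_i x_j \) with \( i \le j \), of which there are \( \frac{1}{2} k (k + 1) \). The total is \[ k + \frac{1}{2} k (k + 1) = \frac{1}{2}(k^2 + 3 k) \]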

The negative binomial distribution is a one-parameter exponential family in the success parameter \( p \in (0, 1) \) for a fixed value of the stopping parameter \( k \in \N_+ \).