\(\newcommand{\P}{\mathbb{P}}\) \(\newcommand{\E}{\mathbb{E}}\) \(\newcommand{\R}{\mathbb{R}}\) \(\newcommand{\N}{\mathbb{N}}\) \(\newcommand{\ms}{\mathscr}\) \(\newcommand{\rta}{\rightarrow}\) \(\newcommand{\lfa}{\leftarrow}\) \(\newcommand{\upa}{\uparrow}\) \(\newcommand{\Upa}{\Uparrow}\) \(\newcommand{\lfrta}{\leftrightarrow}\) \(\newcommand{\nea}{\nearrow}\) \(\newcommand{\sea}{\searrow}\) \(\newcommand{\nwa}{\nwarrow}\) \(\newcommand{\swa}{\swarrow}\) \(\newcommand{\Rta}{\Rightarrow}\) \(\newcommand{\bs}{\boldsymbol}\)

4. Probability on Free Semigroups

Suppose now that \((S, \cdot)\) is the free semigroup over a countable alphabet \(I\) as described in Section 2. The corresponding graph \((S, \preceq)\) is a partial order graph whose covering graph \((S, \upa)\) is a rooted tree. So all of the results in Section 2 apply to the graphs \((S, \preceq)\), \((S, \prec)\), \((S, \upa)\), and \((S, \Upa)\) (the reflexive closure of \((S, \upa)\)). We will review some of these results using the notation of the free semigroup.

Constant Rate Distributions

Suppose that \(X\) is a random variable in \(S\) with density function \(f\). By definition, the reliability function \(F\) of \(X\) for \((S, \preceq)\) is given by \[F(x) = \P(X \succeq x) = \sum_{x \preceq y} f(y) = \sum_{t \in S} f(x t), \quad x \in S\]

A function \(F: S \to [0, \infty]\) is a reliability function for \((S, \preceq)\) if and only if

  1. \(F(e) = 1\)
  2. \(\sum_{i \in I} F(x i) \le F(x)\) for \(x \in S\)
  3. \(\sum_{d(x) = n} F(x) \to 0\) as \(n \to \infty\)

In this case, the density function \(f\) is given by \[f(x) = F(x) - \sum_{i \in I} F(x i), \quad x \in S\]

For \(\alpha \in (0, 1)\), the following recursive scheme defines a probability density function \(f\) on \(S\):

  1. \(f(e) = \alpha\)
  2. If \(f(x)\) is defined for some \(x \in S\) then define \(f(x i)\) for \(i \in I\) arbitrarily, subject to the conditions
    1. \(f(x i) \gt 0\) for \(i \in I\)
    2. \(\sum_{i \in I} f(x i) = (1 - \alpha) f(x)\)
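As a quick numerical sketch (an illustration added here, not part of the original development), the recursive scheme can be coded on the binary alphabet \(I = \{0, 1\}\). The parameter \(\alpha = 0.3\) and the 60/40 split of the child mass are arbitrary choices; any positive split satisfying condition 2 works.

```python
from itertools import product

alpha = 0.3  # rate constant in (0, 1); an illustrative choice

def f(x: str) -> float:
    """Density from the scheme: f(e) = alpha, f(x0) + f(x1) = (1 - alpha) f(x).
    Here the child mass is split 60/40 at every step (an arbitrary choice)."""
    val = alpha
    for letter in x:
        val *= (1 - alpha) * (0.6 if letter == "0" else 0.4)
    return val

def F(x: str) -> float:
    """Reliability function F(x) = sum_t f(x t); for this scheme F(x) = f(x) / alpha."""
    return f(x) / alpha

# The words of length n carry total mass alpha (1 - alpha)^n, so f sums to 1
# and the rate f(x) / F(x) is the constant alpha.
level_mass = [sum(f("".join(w)) for w in product("01", repeat=n)) for n in range(8)]
```

The level masses confirm that the number of letters is geometrically distributed with parameter \(\alpha\), matching the result stated below.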

Suppose that \(X\) has density function \(f\) with parameter \(\alpha \in (0, 1)\) as constructed in the recursive scheme above. Then

  1. \(X\) has constant rate \(\alpha\) for \((S, \preceq)\)
  2. \(X\) has constant rate \(1 / (1 - \alpha)^n\) on \((S, \upa^n)\) for each \(n \in \N\).
  3. \(X\) has constant rate \(1 / (2 - \alpha)\) on \((S, \Upa)\).
  4. \(X\) has constant rate \(\alpha / (1 - \alpha)\) on \((S, \prec)\).

Suppose again that \(X\) has density function \(f\) with parameter \(\alpha \in (0, 1)\) as constructed in the recursive scheme above. Then \(N = d(X)\), the number of letters in \(X\), has the geometric distribution on \(\N\) with parameter \(\alpha\).

Exponential Distributions

Since we have a semigroup and not just a rooted tree, we are particularly interested in exponential distributions. Let \(P_I\) denote the collection of probability density functions \(\bs{p} = (p_i: i \in I)\) on \(I\) that have complete support, so that \(p_i \gt 0\) for \(i \in I\) and \(\sum_{i \in I} p_i = 1\). Also let \(\N_I = \left\{\bs n = (n_i: i \in I): n_i \in \N \text{ for } i \in I \text{ and } n_i = 0 \text{ for all but finitely many } i \in I \right\}\) and recall the definition of the multinomial coefficient \(C(\bs n)\) for \(\bs n \in \N_I\) in Section 2.

Random variable \(X\) has an exponential distribution on \((S, \cdot)\) if and only if \(X\) has density function \(f\) of the form \[f(x) = \alpha (1 -\alpha)^{d(x)} \prod_{i \in I} p_i^{d_i(x)}, \quad x \in S\] where \(\alpha \in (0, 1)\) is the rate constant, and where \(\bs p = (p_i: i \in I) \in P_I\).

Details:

We apply results from Section 2.5. Let \(F\) denote the reliability function of \(X\) and let \(F(i) = \beta_i \in (0, 1)\) for \(i \in I\). The memoryless property requires that \[F(x) = \prod_{i \in I} \beta_i^{d_i(x)}, \quad x \in S\] The constant rate property requires that \(\sum_{x \in S} F(x) \lt \infty\) (and the reciprocal of this sum is the rate parameter). But from the multinomial theorem \[\sum_{x \in S} F(x) = \sum_{n = 0}^\infty \sum\left\{ C(\bs n) \prod_{i \in I} \beta_i^{n_i}: \bs n \in \N_I, \, \sum_{i \in I} n_i = n \right\} = \sum_{n = 0}^\infty \beta^n\] where \(\beta = \sum_{i \in I} \beta_i\). Hence we must have \(\beta \in (0, 1)\) and the rate constant is \(1 - \beta\). So \(X\) has density function \(f\) given by \[f(x) = (1 - \beta) \prod_{i \in I} \beta_i^{d_i(x)}, \quad x \in S\] Finally, we re-define the parameters: let \(\alpha = 1 - \beta\) and let \(p_i = \beta_i / \beta\) for \(i \in I\). Hence \[f(x) = \alpha \prod_{i \in I} [p_i (1 - \alpha)]^{d_i(x)} = \alpha (1 - \alpha)^{d(x)} \prod_{i \in I} p_i^{d_i(x)}, \quad x \in S\]
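The exponential density can be checked numerically. The sketch below (an illustration added here, with an assumed three-letter alphabet and arbitrarily chosen parameters \(\alpha = 0.4\) and \(\bs p = (0.5, 0.3, 0.2)\)) verifies that \(f\) sums to 1 and that the reliability function \(F(x) = f(x)/\alpha\) is memoryless.

```python
import itertools
import math

alpha = 0.4                          # rate constant (assumed)
p = {"a": 0.5, "b": 0.3, "c": 0.2}   # letter probabilities (assumed)

def f(x: str) -> float:
    """Exponential density: alpha (1 - alpha)^{d(x)} prod_i p_i^{d_i(x)}."""
    return alpha * (1 - alpha) ** len(x) * math.prod(p[i] for i in x)

def F(x: str) -> float:
    """Reliability function; constant rate alpha gives F(x) = f(x) / alpha."""
    return f(x) / alpha

# Words of length n carry mass alpha (1 - alpha)^n, so f sums to 1;
# truncating at length 10 captures all but (1 - alpha)^11 of the mass.
mass = sum(
    f("".join(w)) for n in range(11) for w in itertools.product(p, repeat=n)
)
# The memoryless property is F(x y) = F(x) F(y) for all words x, y.
```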

In particular, in the free semigroup, every memoryless distribution has constant rate and hence is exponential; of course that also follows from the general theory in Section 2.5. The converse is not true: a constant rate distribution need not be memoryless. The exponential distribution with parameters \(\alpha \in (0, 1)\) and \((p_i: i \in I) \in P_I\) corresponds to choosing \[f(x i) = (1 - \alpha) p_i f(x), \quad x \in S, \, i \in I\] in the recursive scheme above. But the family of constant rate distributions produced by that scheme includes many others not of this type. Here is a simple example:

Consider the free binary semigroup, so that \(I = \{0, 1\}\) and the elements of \(S = I^*\) are bit strings. Let \(a \gt 0\), \(b \gt 0\), \(a \neq b\), and \(a + b \lt 1\). Define \(F: S \to (0,\,1]\) by the recursion scheme above as follows: \begin{align*} F(e) &= 1 \\ F(0) &= b, \, F(1) = a\\ F(x 0) &= a F(x), \, F(x 1) = b F(x), \quad x \ne e \end{align*} Then \(F\) is a reliability function relative to \((S, \preceq)\) for a distribution with constant rate \(1 - a - b\), but this distribution is not exponential for \((S, \cdot)\), nor even a mixture of exponential distributions.
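A short numerical check of this example (added here as an illustration, with the arbitrary choices \(a = 0.2\), \(b = 0.5\)): the rate is constant but the memoryless property fails.

```python
a, b = 0.2, 0.5  # assumed values with a != b and a + b < 1

def F(x: str) -> float:
    """F(e) = 1, F(0) = b, F(1) = a, then F(x0) = a F(x), F(x1) = b F(x)."""
    if x == "":
        return 1.0
    val = b if x[0] == "0" else a
    for letter in x[1:]:
        val *= a if letter == "0" else b
    return val

def f(x: str) -> float:
    """Density recovered from the reliability function:
    f(x) = F(x) - F(x0) - F(x1)."""
    return F(x) - F(x + "0") - F(x + "1")

# The rate f(x) / F(x) equals 1 - a - b = 0.3 for every word x, yet F is
# not memoryless: F("00") = b * a = 0.1 while F("0")**2 = b**2 = 0.25.
```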

There is a simple interpretation of the exponential distribution in terms of a random product of independent, identically distributed letters.

Suppose that \(\bs U = (U_1, U_2, \ldots)\) is a sequence of independent copies of \(U\), where \(U\) is a random variable in \(I\) with density function \(\bs p \in P_I\). Let \(N\) be independent of \(\bs U\) and have a geometric distribution on \(\N\) with success parameter \(\alpha \in (0, 1)\). Let \(X\) be the random variable in \(S\) defined by \(X = U_1 U_2 \cdots U_N\). Then \(X\) has the exponential distribution with parameters \(\alpha\) and \(\bs p\).

Details:

Let \(x = i_1 i_2 \cdots i_n \in S\) where \(n \in \N\) and \((i_1, i_2, \ldots, i_n) \in I^n\). Then \begin{align*} \P(X = x) & = \P(X = x, N = n) = \P(X = x \mid N = n) \P(N = n) \\ &= \P(U_1 = i_1, U_2 = i_2, \ldots, U_n = i_n) \alpha (1 - \alpha)^n = \P(U_1 = i_1) \P(U_2 = i_2) \cdots \P(U_n = i_n) \alpha (1 - \alpha)^n \\ &= \alpha (1 - \alpha)^{d(x)} \prod_{i \in I} p_i^{d_i(x)} \end{align*}
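The random product construction is easy to simulate. This sketch (added as an illustration, with an assumed two-letter alphabet and parameters \(\alpha = 0.5\), \(\bs p = (0.7, 0.3)\)) compares empirical word frequencies with the exponential density formula.

```python
import math
import random

random.seed(7)                       # fixed seed for reproducibility
alpha = 0.5                          # success parameter of N (assumed)
letters, probs = "ab", [0.7, 0.3]    # letter distribution (assumed)

def sample_word() -> str:
    """Draw N geometric on {0, 1, 2, ...} with parameter alpha,
    then concatenate N independent letters with density p."""
    n = 0
    while random.random() >= alpha:
        n += 1
    return "".join(random.choices(letters, weights=probs)[0] for _ in range(n))

def f(x: str) -> float:
    """Exponential density with the same parameters."""
    p = dict(zip(letters, probs))
    return alpha * (1 - alpha) ** len(x) * math.prod(p[i] for i in x)

samples = [sample_word() for _ in range(100_000)]
freq = {x: samples.count(x) / len(samples) for x in ["", "a", "b", "ab"]}
# freq[x] should be close to f(x) for each short word x
```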

It's worth noting again that the number of letters \(N\) has the exponential distribution on \((\N, +)\) with rate \(\alpha\). The exponential distribution on \((S, \cdot)\) is clearly invariant under a permutation of the letters: if \(X = U_1 U_2 \cdots U_N\) has the exponential distribution as above and \((V_1, V_2, \ldots, V_N)\) is a permutation of \((U_1, U_2, \ldots, U_N)\), then \(Y = V_1 V_2 \cdots V_N\) has the same exponential distribution.

Suppose that \(X\) has the exponential distribution on \((S, \cdot)\) with parameters \(\alpha \in (0, 1)\) and \(\bs p = (p_i: i \in I) \in P_I\).

  1. For \(n \in \N\), the distribution of \((d_i(X): i \in I)\) given \(d(X) = n\) is multinomial with parameters \(n\) and \(\bs p\): \[\P[d_i(X) = n_i, \, i \in I \mid d(X) = n] = \binom{n}{n_i, i \in I} \prod_{i \in I} p_i^{n_i}, \quad \bs n = (n_i: i \in I) \in \N_I, \; \sum_{i \in I} n_i = n\]
  2. The distribution of \(d_i(X)\) is exponential on \((\N, +)\) (that is, geometric) with rate parameter \(\alpha / [\alpha + (1 - \alpha) p_i]\) for each \(i \in I\).
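Part 2 can be verified by direct summation: conditioning on \(d(X) = n\), the count \(d_i(X)\) is binomial with parameters \(n\) and \(p_i\). The sketch below (an added illustration with assumed values \(\alpha = 0.4\), \(p_i = 0.25\)) checks the resulting series against the claimed geometric form.

```python
import math

alpha, p_i = 0.4, 0.25  # assumed illustrative parameters

def pmf_d_i(k: int, terms: int = 400) -> float:
    """P(d_i(X) = k) = sum_n P(d(X) = n) C(n, k) p_i^k (1 - p_i)^{n - k};
    truncating the series at k + terms leaves negligible error."""
    return sum(
        alpha * (1 - alpha) ** n
        * math.comb(n, k) * p_i ** k * (1 - p_i) ** (n - k)
        for n in range(k, k + terms)
    )

# Claimed geometric (exponential on (N, +)) success parameter
r = alpha / (alpha + (1 - alpha) * p_i)
# pmf_d_i(k) should agree with r (1 - r)^k for each k
```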

Suppose again that \(X\) has the exponential distribution on \((S, \cdot)\) with parameters \(\alpha \in (0, 1)\) and \(\bs p = (p_i: i \in I) \in P_I\). Random variable \(X\) maximizes entropy over all random variables \(Y \in S\) with \[\E[d_i(Y)] = \E[d_i(X)] = \frac{1 - \alpha}{\alpha} p_i, \quad i \in I\]

Suppose again that \(X\) has the exponential distribution on \((S, \cdot)\) with parameters \(\alpha \in (0, 1)\) and \(\bs p \in P_I\). Then \(X\) has a compound Poisson distribution.

Details:

Recall from the random product representation above that \(X\) can be decomposed as \[X = U_1 U_2 \cdots U_N\] where \(\bs U = (U_1, U_2, \ldots)\) are independent and identically distributed on the alphabet \(I\), with probability density function \((p_i: i \in I)\), and where \(N\) is independent of \(\bs U\) and has the geometric distribution on \(\N\) with success parameter \(\alpha\). (That is, the exponential distribution on \((\N, +)\) with rate \(\alpha\).) But from Section 4.1, \(N\) has a compound Poisson distribution and can be written in the form \[N = M_1 + M_2 + \cdots + M_K\] where \(\bs M = (M_1, M_2, \ldots)\) are independent and identically distributed on \(\N_+\) with the logarithmic distribution \[\P(M = n) = -\frac{(1 - \alpha)^n}{n \ln \alpha}, \quad n \in \N_+\] and where \(K\) is independent of \(\bs M\) and has the Poisson distribution with parameter \(-\ln \alpha\). It follows that we can take \(\bs U\), \(\bs M\), and \(K\) mutually independent, and thus \[X = V_1 V_2 \cdots V_K\] where \(V_i = U_{M_1 + \cdots + M_{i-1} + 1} \cdots U_{M_1 + \cdots + M_i}\) for \(i \in \N_+\). Note that \(\bs V = (V_1, V_2, \ldots)\) are independent and identically distributed on \(S\), and thus \(X\) has a compound Poisson distribution.

So it follows that \(X\) also has an infinitely divisible distribution on \((S, \cdot)\).
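The compound Poisson representation of \(N\) can be simulated directly. This sketch (an added illustration with assumed \(\alpha = 0.4\); the samplers are minimal, not optimized) draws \(N = M_1 + \cdots + M_K\) and checks that the result looks geometric with parameter \(\alpha\).

```python
import math
import random

random.seed(3)  # fixed seed for reproducibility
alpha = 0.4     # assumed rate parameter

def sample_logarithmic() -> int:
    """M on {1, 2, ...} with P(M = n) = -(1 - alpha)^n / (n ln alpha),
    sampled by inverse transform."""
    u, n, cum = random.random(), 0, 0.0
    while cum < u:
        n += 1
        cum += -((1 - alpha) ** n) / (n * math.log(alpha))
    return n

def sample_poisson(lam: float) -> int:
    """Knuth's method: count uniforms until their product drops below e^-lam."""
    target, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= target:
            return k
        k += 1

def sample_N() -> int:
    """N = M_1 + ... + M_K with K Poisson(-ln alpha)."""
    k = sample_poisson(-math.log(alpha))
    return sum(sample_logarithmic() for _ in range(k))

draws = [sample_N() for _ in range(50_000)]
frac_zero = sum(n == 0 for n in draws) / len(draws)  # theory: alpha
mean_N = sum(draws) / len(draws)                     # theory: (1 - alpha) / alpha
```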

It's easy to construct the random walk on \((S, \cdot)\) corresponding to the exponential distribution with parameters \(\alpha \in (0, 1)\) and \(\bs p = (p_i: i \in I) \in P_I\). Let \(\bs U = (U_1, U_2, \ldots)\) be independent variables in \(I\) with common density \(\bs p\), so that \(\P(U_j = i) = p_i\) for \(i \in I\) and \(j \in \N_+\). Let \(\bs J = (J_1, J_2, \ldots)\) be independent variables, each with the geometric distribution on \(\N\) with rate \(\alpha\), and with \(\bs J\) independent of \(\bs U\). Thus \(K_n = \sum_{i=1}^n J_i\) has the negative binomial distribution on \(\N\) with parameters \(\alpha\) and \(n\) for \(n \in \N_+\).

Let \(Y_n = U_1 \cdots U_{K_n}\) for \(n \in \N_+\). Then \(\bs Y = (Y_1, Y_2, \ldots)\) is the random walk on \((S, \cdot)\) associated with the exponential distribution. For \(n \in \N_+\), the probability density function \(f_n\) of \(Y_n\) is given by \[f_n(x) = \binom{d(x) + n - 1}{n - 1} \alpha^n (1 - \alpha)^{d(x)} \prod_{i \in I} p_i^{d_i(x)}, \quad x \in S\]

Of course, this also follows from the general theory. From our standard most random interpretation, observing \(Y_{n+1} = x\) for \(n \in \N_+\) and \(x \in S\) gives no information about the increasing sequence of random prefixes \((Y_1, Y_2, \ldots, Y_n)\) of \(x\).
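The density formula can be cross-checked for \(n = 2\): the density of \(Y_2\), a product of two independent exponential words, is the convolution of \(f\) with itself over the prefix factorings of \(x\). A sketch with an assumed binary alphabet and illustrative parameters:

```python
import math

alpha = 0.3               # assumed rate parameter
p = {"0": 0.6, "1": 0.4}  # assumed letter probabilities

def f(x: str) -> float:
    """Exponential density (the step distribution of the walk)."""
    return alpha * (1 - alpha) ** len(x) * math.prod(p[i] for i in x)

def f2_conv(x: str) -> float:
    """Density of Y_2 by summing f(y) f(z) over the factorings x = y z."""
    return sum(f(x[:k]) * f(x[k:]) for k in range(len(x) + 1))

def f2_formula(x: str) -> float:
    """The closed form above with n = 2."""
    d = len(x)
    return (
        math.comb(d + 1, 1) * alpha ** 2 * (1 - alpha) ** d
        * math.prod(p[i] for i in x)
    )
# f2_conv and f2_formula agree on every word
```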

Suppose that \(\bs Y = (Y_1, Y_2, \ldots)\) is the random walk on \((S, \cdot)\) associated with the exponential distribution with parameters \(\alpha \in (0, 1)\) and \(\bs p = (p_i: i \in I) \in P_I\). Let \(n \in \N_+\).

  1. The conditional distribution of \([d_i(Y_n): i \in I]\) given \(d(Y_n) = m \in \N\) is multinomial with parameters \(m\) and \(\bs p\).
  2. The distribution of \(d_i(Y_n)\) is negative binomial with parameters \(\alpha / [\alpha + (1 - \alpha) p_i]\) and \(n\) for each \(i \in I\).

Our final discussion concerns quotient spaces as studied in Section 2.8. The most natural way to form a sub-semigroup of \((S, \cdot)\) is to start with a proper subset of letters \(J \subset I\) and then consider the set of finite words \(T = J^*\) with letters in \(J\). Clearly \((T, \cdot)\) is a complete sub-semigroup of \((S, \cdot)\), and both are free semigroups. The quotient space \(S / T\) consists of the empty word \(e\) and words \(x \in S\) with first letter in \(I \setminus J\). The assumptions in Section 2.8 are satisfied, so \(x \in S\) has the unique factoring \(x = y z\) with \(y \in T\) and \(z \in S / T\). Clearly \(y\) is the largest initial prefix of \(x\) with letters in \(J\) and \(z\) is the corresponding suffix. The following result follows easily from the general theory:

Suppose that \(X\) has the exponential distribution on \((S, \cdot)\) with parameters \(\alpha \in (0, 1)\) and \(\bs p = (p_i: i \in I) \in P_I\), and let \(p_J = \sum_{j \in J} p_j\). Consider the factoring \(X = Y Z\) where \(Y\) takes values in \(T\) and \(Z\) takes values in \(S / T\). Then

  1. The distribution of \(Y\) is the same as the conditional distribution of \(X\) given \(X \in T\). This distribution is exponential on \((T, \cdot)\) with parameters \(1 - (1 - \alpha)p_J\) and \(\bs q = (q_j: j \in J) \in P_J\) where \(q_j = p_j / p_J\) for \(j \in J\).
  2. The distribution of \(Z\) is the same as the conditional distribution of \(X\) given \(X \in S / T\), with density function \(h\) given by \[h(x) = \frac{\alpha (1 - \alpha)^{d(x)} \prod_{i \in I} p_i^{d_i(x)}}{1 - (1 - \alpha) p_J}, \quad x \in S / T\]
  3. \(Y\) and \(Z\) are independent.

Details:

The results follow from the general theory in Section 2.8, except for the form of the density functions. We use the random product representation above.

  1. Conditioning on \(N\) we have \[\P(X \in T) = \sum_{n=0}^\infty \P(U_1 \cdots U_n \in T \mid N = n) \alpha (1 - \alpha)^n = \sum_{n=0}^\infty p_J^n \alpha (1 - \alpha)^n = \frac{\alpha}{1 - (1 - \alpha) p_J}\] Hence the conditional distribution of \(X\) given \(X \in T\) has density \(g\) defined by \begin{align*} g(x) &= \frac{f(x)}{\P(X \in T)} = [1 - (1 - \alpha)p_J](1 - \alpha)^{d(x)} \prod_{i \in I} p_i^{d_i(x)} \\ &= [1 - (1 - \alpha) p_J][(1 - \alpha) p_J]^{d(x)} \prod_{j \in J} \left(\frac{p_j}{p_J}\right)^{d_j(x)}, \quad x \in T \end{align*} For the last equation, note that if \(x \in T\) then \(d_i(x) = 0\) for \(i \in I \setminus J\).
  2. Note that \(X \notin S / T\) if and only if \(N \gt 0\) and \(U_1 \in J\). Hence \(\P(X \in S/T) = 1 - (1 - \alpha) p_J\). So the conditional distribution of \(X\) given \(X \in S/T\) has density \(h\) defined by \[h(x) = \frac{f(x)}{\P(X \in S / T)} = \frac{\alpha (1 - \alpha)^{d(x)} \prod_{i \in I} p_i^{d_i(x)}}{1 - (1 - \alpha) p_J}, \quad x \in S / T\]
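The factoring is easy to simulate. The sketch below (an added illustration with an assumed three-letter alphabet, \(J = \{a, b\}\), and arbitrary parameters) checks \(\P(X \in T)\) and the exponential rate of \(Y\) on \((T, \cdot)\).

```python
import random

random.seed(11)                        # fixed seed for reproducibility
alpha = 0.5                            # assumed rate parameter
letters, probs = "abc", [0.2, 0.3, 0.5]
J = {"a", "b"}                         # sub-semigroup T = J*
p_J = 0.5                              # p_a + p_b

def sample_word() -> str:
    """Exponential word: geometric length N, then N i.i.d. letters."""
    n = 0
    while random.random() >= alpha:
        n += 1
    return "".join(random.choices(letters, weights=probs)[0] for _ in range(n))

def factor(x: str):
    """Unique factoring x = y z: y is the longest prefix with letters in J,
    z (empty or starting with a letter outside J) lies in S / T."""
    k = 0
    while k < len(x) and x[k] in J:
        k += 1
    return x[:k], x[k:]

pairs = [factor(sample_word()) for _ in range(100_000)]
frac_X_in_T = sum(z == "" for _, z in pairs) / len(pairs)   # theory: alpha / (1 - (1 - alpha) p_J)
frac_Y_empty = sum(y == "" for y, _ in pairs) / len(pairs)  # theory: 1 - (1 - alpha) p_J
```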

Simulation Exercises

Open the simulation of the exponential distribution on the binary semigroup, that is, the free semigroup over the alphabet \(\{0, 1\}\) as described in Section 1. The parameter \(\alpha\) is the constant rate and the parameter \(p\) is the probability that each letter is 1. Random variable \(N\) is the length of the random word \(X\) and random variable \(M\) is the number of 1s in \(X\). Vary the parameters and note the shape and location of the distribution graphs. For selected values of the parameters, run the simulation and note the bit strings that are created.

Open the simulation of the exponential distribution on the genetic semigroup, that is, the free semigroup over the alphabet \(\{A, C, G, T\}\) as described in Section 1. The rate parameter \(\alpha\) can be varied with the scrollbar, and the letter probabilities \(p_A\), \(p_C\), \(p_G\), and \(p_T\) can be changed with the input controls. Click the Set button after changing the probabilities. If the probabilities do not sum to 1 then they will be rescaled so that they do. Random variable \(N\) is the length of the random word \(X\). Vary the parameters and note the shape and location of the distribution graph. For selected values of the parameters, run the simulation and note the words that are created.