In this section, we consider a model that leads to interesting and explicit results and is related to certain exponential distributions in Section 5. We start with a general formulation and then successively impose additional assumptions.
Suppose that \(N\) is a random variable in \(\N\) with probability density function \(g\). Let \(\bs{V} = (V_1, V_2, \ldots)\) be a sequence of independent random variables in \(\N_+\), and independent of \(N\), with common probability density function \(h\). Define random variable \(X\) in \(S\) by \(X = \{V_1, V_1 + V_2, \ldots, V_1 + \cdots + V_N\}\).
So \(N = \#(X)\) and in particular, \(X = \emptyset\) if and only if \(N = 0\). Note that \(\bs{V}\) is a sequence of independent copies of a random variable \(V\) in \(\N_+\).
Let \(f\) denote the probability density function of \(X\). Then \(f(\emptyset) = g(0)\). If \(x = \{i_1, i_2, \ldots, i_n\} \in S_+\) with \(i_1 \lt i_2 \lt \cdots \lt i_n\) then \[f(x) = g(n) h(i_1) h(i_2 - i_1) \cdots h(i_n - i_{n - 1})\]
This follows from the assumptions in the definition above. As noted earlier, \(X = \emptyset\) if and only if \(N = 0\). For \(x \in S_+\) as defined, \begin{align*} \{X = x\} & = \{N = n, V_1 = i_1, V_1 + V_2 = i_2, \ldots, V_1 + \cdots + V_n = i_n\}\\ & = \{N = n, V_1 = i_1, V_2 = i_2 - i_1, \ldots, V_n = i_n - i_{n - 1}\} \end{align*} The result now follows from independence.
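The construction is easy to simulate. Below is a minimal Python sketch that checks the density formula by Monte Carlo; the particular choices of \(g\) (Poisson) and \(h\) (geometric), and the parameter values, are hypothetical, purely for illustration.

```python
import random
from math import exp, factorial

random.seed(1)
LAM, P = 2.0, 0.5  # hypothetical parameters for illustration

def g(n):  # density of N: Poisson(LAM), an arbitrary choice
    return exp(-LAM) * LAM ** n / factorial(n)

def h(i):  # density of V: geometric on {1, 2, ...} with parameter P
    return (1 - P) ** (i - 1) * P

def sample_N():  # Knuth's method for Poisson sampling
    k, prod = 0, random.random()
    while prod > exp(-LAM):
        prod *= random.random()
        k += 1
    return k

def sample_V():  # number of trials until the first success
    v = 1
    while random.random() >= P:
        v += 1
    return v

def sample_X():  # X = set of partial sums V1, V1+V2, ..., V1+...+VN
    n, total, x = sample_N(), 0, set()
    for _ in range(n):
        total += sample_V()
        x.add(total)
    return x

def f(x):  # density from the proposition above
    if not x:
        return g(0)
    xs = sorted(x)
    val = g(len(xs)) * h(xs[0])
    for a, b in zip(xs, xs[1:]):
        val *= h(b - a)
    return val

target = frozenset({1, 3})
hits = sum(frozenset(sample_X()) == target for _ in range(200_000))
print(hits / 200_000, f({1, 3}))  # the two values should be close
```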
The model simplifies considerably when \(V\) has a geometric distribution. So in this subsection we assume that \(V\) has the geometric distribution on \(\N_+\) with success parameter \(p \in (0, 1)\), so that \(h(i) = (1 - p)^{i - 1} p\) for \(i \in \N_+\).
Random variable \(X\) has probability density function \(f\) given by \[f(x) = g[\#(x)] p^{\#(x)} (1 - p)^{\max(x) - \#(x)}, \quad x \in S\] This follows from the general result above, since the product \(h(i_1) h(i_2 - i_1) \cdots h(i_n - i_{n - 1}) = p^n (1 - p)^{i_n - n}\) telescopes, where \(n = \#(x)\) and \(i_n = \max(x)\).
The distribution of \(X\) depends only on \(N = \#(X)\) and \(M = \max(X)\), so it's natural to give the joint distribution of these variables. Recall from Section 1 that \(S_{n, m} = \{x \in S: \#(x) = n, \max(x) = m\}\), defined for \(n = m = 0\) and for \(n \in \N_+\) and \(m \in \{n, n + 1, \ldots\}\). From the density function above, \[f(x) = g(n) p^n (1 - p)^{m - n}, \quad x \in S_{n, m}\]
The joint density of \((N, M)\) is given by \begin{align*} \P(N = 0, M = 0) & = g(0) \\ \P(N = n, M = m) & = g(n) p^n (1 - p)^{m - n} \binom{m - 1}{n - 1}, \quad n \in \N_+, \, m \in \{n, n + 1, \ldots\} \end{align*} Given \(N = n\) and \(M = m\) where \(n = m = 0\) or \(n \in \N_+\) and \(m \in \{n, n + 1, \ldots\}\), random variable \(X\) is uniformly distributed on \(S_{n, m}\).
This follows from the density function above, since \(f\) is constant on \(S_{n, m}\). Recall from Section 1 that \(\#(S_{n, m}) = \binom{m - 1}{n - 1}\) for \(n \in \N_+\) and \(m \in \{n, n + 1, \ldots\}\).
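For a concrete check, one can enumerate \(S_{n, m}\) directly and verify both the binomial count and the uniformity. A sketch, with hypothetical parameter values and a geometric choice of \(g\) purely for concreteness:

```python
from itertools import combinations
from math import comb

P, Q = 0.4, 0.3  # hypothetical values
g = lambda n: (1 - Q) ** n * Q  # arbitrary choice of g, for concreteness

def f(x):  # density in the geometric-V case
    if not x:
        return g(0)
    n, m = len(x), max(x)
    return g(n) * P ** n * (1 - P) ** (m - n)

def joint(n, m):  # P(N = n, M = m) by brute force over S_{n,m}
    # each x in S_{n,m} consists of m together with n-1 elements below m
    return sum(f(set(rest) | {m}) for rest in combinations(range(1, m), n - 1))

n, m = 3, 7
print(joint(n, m))  # brute-force sum; every term is identical (uniformity)
print(g(n) * P ** n * (1 - P) ** (m - n) * comb(m - 1, n - 1))  # closed form
```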
The conditional distribution of \(M\) given \(N\) is next.
The conditional distribution of \(M\) given \(N = n \in \N_+\) is negative binomial on \(\{n, n + 1, \ldots\}\) with stopping parameter \(n\) and success parameter \(p\): \[ \P(M = m \mid N = n) = \binom{m - 1}{n - 1} p^n (1 - p)^{m - n}, \quad m \in \{n, n + 1, \ldots\}\]
Of course, given \(N = 0\), random variable \(M = 0\) also (so we can think of this as a degenerate version of the negative binomial distribution). The last two results provide an alternative way of defining the distribution of \(X\):
Random variable \(X\) can be defined as follows: first \(N\) is given probability density function \(g\); next, given \(N = n \in \N_+\), \(M\) has the negative binomial distribution on \(\{n, n + 1, \ldots\}\) with parameters \(n\) and \(p\) (and \(M = 0\) given \(N = 0\)); finally, given \(N = n\) and \(M = m\), \(X\) is uniformly distributed on \(S_{n, m}\).
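Here is a sketch of this three-stage construction in Python. The geometric choice for \(N\) and the parameter values are hypothetical; the negative binomial variable is sampled as a sum of independent geometric variables.

```python
import random

random.seed(2)
P, Q = 0.4, 0.3  # hypothetical parameters

def sample_geo1(p):  # geometric on {1, 2, ...}
    v = 1
    while random.random() >= p:
        v += 1
    return v

def sample_X_alt():
    # stage 1: N, here geometric on {0, 1, ...} with parameter Q (arbitrary g)
    n = 0
    while random.random() >= Q:
        n += 1
    if n == 0:
        return set()
    # stage 2: M given N = n is negative binomial(n, P)
    m = sum(sample_geo1(P) for _ in range(n))
    # stage 3: X uniform on S_{n,m}: m plus n-1 elements chosen below m
    return set(random.sample(range(1, m), n - 1)) | {m}

print([sorted(sample_X_alt()) for _ in range(5)])
```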
The density of \(M\) is given by \begin{align*} \P(M = 0) & = g(0) \\ \P(M = m) & = \sum_{n = 1}^m g(n) \binom{m - 1}{n - 1} p^n (1 - p)^{m - n}, \quad m \in \N_+ \end{align*}
The following result gives some moments of \((M, N)\) in terms of moments of \(N\).
Assume that \(\E(N^2) \lt \infty\). Then \begin{align*} \E(M) & = \frac{1}{p} \E(N) \\ \var(M) & = \frac{1}{p^2}\left[\var(N) + (1 - p) \E(N)\right] \\ \cov(M, N) & = \frac{1}{p} \var(N) \\ \cor(M, N) & = \sqrt{\frac{\var(N)}{\var(N) + (1 - p) \E(N)}} \end{align*}
The results follow from the conditional distribution above and basic results for the negative binomial distribution: \(\E(M \mid N) = N / p\) and \(\var(M \mid N) = N (1 - p) / p^2\). Hence \(\E(M) = \E[\E(M \mid N)]\), \(\var(M) = \E[\var(M \mid N)] + \var[\E(M \mid N)]\), and \(\cov(M, N) = \cov[\E(M \mid N), N] = \var(N) / p\).
In particular, \(M\) and \(N\) are always positively correlated, regardless of the distribution of \(N\). Moreover, \(\cor(M, N) \to 1\) as \(p \to 1\), hardly surprising since in this case \(M = N\) with probability 1. On the other hand, \[\cor(M, N) \to \sqrt{\frac{\var(N)}{\var(N) + \E(N)}} \quad \text {as } p \to 0 \]
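A quick numeric look at the limit, using the moment results above with a hypothetical geometric choice for \(N\):

```python
from math import sqrt

q = 0.3  # hypothetical parameter for N geometric on {0, 1, ...}
EN, VN = (1 - q) / q, (1 - q) / q ** 2  # mean and variance of N

def cor_MN(p):  # cor(M, N) from the moment results above
    return sqrt(VN / (VN + (1 - p) * EN))

for p in (0.9, 0.5, 0.1, 0.001):
    print(p, cor_MN(p))
print("limit as p -> 0:", sqrt(VN / (VN + EN)))
```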
Next we give the reliability function of \(X\) for the covering graph \((S, \upa)\) of the partial order graph \((S, \subseteq)\).
Let \(F_1\) denote the reliability function of \(X\) for \((S, \upa)\). If \(x \in S\) with \(n = \#(x)\) and \(m = \max(x)\) (so that either \(n = m = 0\) or \(n \in \N_+\) and \(m \in \{n, n + 1, \ldots\}\)) then \[ F_1(x) = g(n + 1) p^{n} (1 - p)^{m - (n + 1)} [(m - n) p + (1 - p)] \]
Let \(x \in S\) with \(\#(x) = n\) and \(\max(x) = m\). Then \[F_1(x) = \sum_{i \notin x} f(x \cup\{i\}) = \sum\{f(x \cup\{i\}): i \notin x, \, i \lt m\} + \sum_{i = m + 1}^\infty f(x \cup\{i\}) \] If \(i \notin x\) then \(\#(x \cup \{i\}) = n + 1\). Moreover, if \(i \lt m\) then \(\max(x \cup\{i\}) = \max(x) = m\). Hence \(f(x \cup \{i\}) = g(n + 1) p^{n + 1} (1 - p)^{m - (n + 1)}\) and there are \(m - n\) such values of \(i\). If \(i \gt m\) then of course \(\max(x \cup \{i\}) = i\). Hence \[ F_1(x) = (m - n) g(n + 1) p^{n + 1} (1 - p)^{m - (n + 1)} + \sum_{i = m + 1} ^\infty g(n + 1) p^{n + 1} (1 - p)^{i - (n + 1)}\] Summing the geometric series and simplifying gives the result.
So the rate function \(r_1\) of \(X\) for \((S, \upa)\) is constant on \(S_{n, m} = \{x \in S: \#(x) = n, \, \max(x) = m\}\). That is, \[ r_1(x) = \frac{g(n)}{g(n + 1)} \frac{1 - p} {(m - n) p + (1 - p)}, \quad x \in S_{n, m} \]
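The formula for \(F_1\) is easy to verify numerically by summing \(f(x \cup \{i\})\) directly. A sketch with hypothetical parameters and a geometric choice of \(g\):

```python
P, Q = 0.4, 0.3  # hypothetical values
g = lambda n: (1 - Q) ** n * Q  # arbitrary g, for concreteness

def f(x):
    if not x:
        return g(0)
    n, m = len(x), max(x)
    return g(n) * P ** n * (1 - P) ** (m - n)

def F1_direct(x, cutoff=2000):  # sum over i not in x; the tail is negligible
    return sum(f(x | {i}) for i in range(1, cutoff) if i not in x)

def F1_formula(x):
    n, m = len(x), (max(x) if x else 0)
    return g(n + 1) * P ** n * (1 - P) ** (m - (n + 1)) * ((m - n) * P + (1 - P))

x = {2, 5, 6}
print(F1_direct(x), F1_formula(x))  # should agree
print("rate:", f(x) / F1_formula(x))  # r_1(x), constant on S_{n,m}
```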
We specialize the last subsection further by assuming that \(N = \#(X)\) has the geometric distribution on \(\N\) with success parameter \(q \in (0, 1)\), so that \(g(n) = (1 - q)^n q\) for \(n \in \N\). The mean and variance are \(\E(N) = (1 - q) / q\) and \(\var(N) = (1 - q) / q^2\).
Random variable \(X\) has probability density function \(f\) given as follows: \[ f(x) = q [p (1 - q)]^{\#(x)} (1 - p)^{\max(x) - \#(x)}, \quad x \in S\]
The joint density of \((N, M)\) is given by \begin{align*} \P(N = 0, M = 0) & = q \\ \P(N = n, M = m) & = q [p (1 - q)]^n (1 - p)^{m - n} \binom{m - 1}{n - 1}, \quad n \in \N_+, \, m \in \{n, n + 1, \ldots\} \end{align*}
For the next result, recall the modified geometric distribution defined in Section 4.2.
Random variable \(M\) has a modified geometric distribution on \(\N\) with success parameters \(q\) and \( p q \). The density function is given by \begin{align*} \P(M = 0) & = q \\ \P(M = m) & = (1 - q) p q (1 - p q)^{m - 1}, \quad m \in \N_+ \end{align*}
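The closed form for the density of \(M\) can be checked against the general sum from the previous subsection. A sketch with hypothetical parameter values:

```python
from math import comb

P, Q = 0.4, 0.3  # hypothetical values
g = lambda n: (1 - Q) ** n * Q

def density_M_sum(m):  # the general sum over n
    return sum(g(n) * comb(m - 1, n - 1) * P ** n * (1 - P) ** (m - n)
               for n in range(1, m + 1))

def density_M_closed(m):  # modified geometric form
    return (1 - Q) * P * Q * (1 - P * Q) ** (m - 1)

for m in (1, 2, 5, 10):
    print(m, density_M_sum(m), density_M_closed(m))  # should agree
```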
The moments of \((M, N)\) follow from the general results above: \begin{align*} \E(M) & = \frac{1 - q}{p q}, \quad \var(M) = \frac{(1 - q)(1 + q - p q)}{p^2 q^2} \\ \cov(M, N) & = \frac{1 - q}{p q^2}, \quad \cor(M, N) = \frac{1}{\sqrt{1 + q - p q}} \end{align*}
So \(\cor(M, N) \to 1\) as \(p \to 1\) for fixed \(q \in (0, 1)\) or as \(q \to 0\) for fixed \(p \in (0, 1)\). In addition, \(\cor(M, N) \to 1 / \sqrt{2 - p}\) as \(q \to 1\).
The app below is a simulation of the distribution of \(X\) on \((S, \cdot)\). The parameters \(p\) and \(q\) can be varied with scrollbars. The probability density functions of \(N = \#(X)\) and \(M = \max(X)\) are shown in the distribution plots and tables.
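In place of the app, here is a minimal stand-in simulation: it generates the counts \(N\) and \(M\) repeatedly and compares the empirical densities to the closed forms; the parameter values are hypothetical.

```python
import random
from collections import Counter

random.seed(3)
P, Q, RUNS = 0.4, 0.3, 100_000  # hypothetical parameter values

def sample_geo1(p):  # geometric on {1, 2, ...}
    v = 1
    while random.random() >= p:
        v += 1
    return v

n_counts, m_counts = Counter(), Counter()
for _ in range(RUNS):
    n = 0
    while random.random() >= Q:  # N geometric on {0, 1, ...} with parameter Q
        n += 1
    m = sum(sample_geo1(P) for _ in range(n))  # M = max(X); m = 0 when n = 0
    n_counts[n] += 1
    m_counts[m] += 1

print("P(N=2):", n_counts[2] / RUNS, "vs", (1 - Q) ** 2 * Q)
print("P(M=3):", m_counts[3] / RUNS, "vs", (1 - Q) * P * Q * (1 - P * Q) ** 2)
```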
Next we formulate an alternate model that is equivalent to the one we are studying here. This model leads to a surprisingly easy way to compute the reliability function for \((S, \subseteq)\).
Suppose that \(U\) has the geometric distribution on \(\N\) with success parameter \(\alpha \in (0, 1)\). Random variable \(\hat X\) in \(S\) is defined as follows: given \(U = n\), each \(i \in \{1, 2, \ldots, n\}\) is placed in \(\hat X\) independently, with probability \(\beta \in (0, 1)\).
So \(\{1, 2, \ldots, U\}\) can be thought of as a random population and then \(\hat X\) is obtained by sampling in an IID fashion from this population.
Random variable \(\hat X\) has probability density function \(\hat f\) given by \[ \hat f(x) = \frac{\alpha}{\alpha + \beta - \alpha \beta} [(1 - \alpha) \beta]^{\#(x)} [(1 - \alpha) (1 - \beta)]^{\max(x) - \#(x)}, \quad x \in S\]
From the model assumptions in the definition above, \begin{align*} \hat f(x) & = \P(\hat X = x) = \sum_{n = \max(x)}^\infty \P(U = n) \P(\hat X = x \mid U = n) = \sum_{n = \max(x)}^\infty (1 - \alpha)^n \alpha \beta^{\#(x)} (1 - \beta)^{n - \#(x)} \\ & = \alpha \left(\frac{\beta} {1 - \beta}\right)^{\#(x)} \sum_{n = \max(x)}^\infty [(1 - \alpha) (1 - \beta)]^n = \alpha \left(\frac{\beta} {1 - \beta}\right)^{\#(x)} \frac{[(1 - \alpha) (1 - \beta)]^{\max(x)}}{1 - (1 - \alpha) (1 - \beta)} \end{align*} Simplifying gives the result.
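The derivation can be mirrored numerically by truncating the sum over \(U\). A sketch with hypothetical values of \(\alpha\) and \(\beta\):

```python
ALPHA, BETA = 0.2, 0.5  # hypothetical parameters

def f_hat_direct(x, cutoff=500):  # condition on U = u >= max(x) and truncate
    n, m = len(x), (max(x) if x else 0)
    return sum((1 - ALPHA) ** u * ALPHA * BETA ** n * (1 - BETA) ** (u - n)
               for u in range(m, cutoff))

def f_hat_closed(x):  # the closed form from the proposition
    n, m = len(x), (max(x) if x else 0)
    c = ALPHA / (ALPHA + BETA - ALPHA * BETA)
    return c * ((1 - ALPHA) * BETA) ** n * ((1 - ALPHA) * (1 - BETA)) ** (m - n)

for x in (set(), {2}, {1, 4}, {2, 3, 7}):
    print(sorted(x), f_hat_direct(x), f_hat_closed(x))  # should agree
```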
Our main result is that this model leads to the same family of distributions.
The distribution of \(\hat X\) in the alternate model is the same as the distribution of \(X\) in the geometric model, with \(p = \alpha + \beta - \alpha \beta\) and \(q = \alpha / (\alpha + \beta - \alpha \beta)\). Equivalently, \(\alpha = p q\) and \(\beta = p (1 - q) / (1 - p q)\).
This follows by matching the two probability density functions: \(1 - p = (1 - \alpha)(1 - \beta)\), \(p (1 - q) = (1 - \alpha) \beta\), and \(q = \alpha / p\).
So as promised, the two parametric families of distributions are the same.
The reliability function \(\hat F\) of \(\hat X\) for \((S, \subseteq)\) is given by \[\hat F(x) = \beta^{\#(x)} (1 - \alpha)^{\max(x)}, \quad x \in S\] This is immediate from the construction: the event \(x \subseteq \hat X\) requires \(U \ge \max(x)\), which has probability \((1 - \alpha)^{\max(x)}\), and then each of the \(\#(x)\) elements of \(x\) must be sampled, independently with probability \(\beta\).
We can now return to our main modeling formulation for random variable \(X\).
The reliability function \(F\) of \(X\) for \((S, \subseteq)\) is given by \[F(x) = [p (1 - q)]^{\#(x)} (1 - p q)^{\max(x) - \#(x)}, \quad x \in S\] This follows from the equivalence above, substituting \(\alpha = p q\) and \(\beta = p (1 - q) / (1 - p q)\) into \(\hat F\).
The rate function \(r\) of \(X\) for \((S, \subseteq)\) is given by \[r(x) = q \left(\frac{1 - p}{1 - p q}\right)^{\max(x) - \#(x)}, \quad x \in S\] This follows from the density and reliability functions above, since \(r(x) = f(x) / F(x)\).
So \(X\) has constant rate \(q [(1 - p) / (1 - p q)]^k\) on the set \(S_k = \{x \in S: \max(x) - \#(x) = k\}\) for \(k \in \N\). In Section 5 we will see that \(X\) also has the memoryless property on \(S_k\), relative to a semigroup operation that generates the subset partial order.
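The reliability function is easy to check by simulation, and doing so through the alternate \((\alpha, \beta)\) model also exercises the equivalence. A sketch with hypothetical parameter values:

```python
import random

random.seed(4)
P, Q = 0.4, 0.3  # hypothetical values
ALPHA, BETA = P * Q, P * (1 - Q) / (1 - P * Q)  # matching parameters

def sample_X_hat():  # U geometric on {0, 1, ...}, then Bernoulli sampling
    u = 0
    while random.random() >= ALPHA:
        u += 1
    return {i for i in range(1, u + 1) if random.random() < BETA}

x = {2, 5}
n, m = len(x), max(x)
hits = sum(x <= sample_X_hat() for _ in range(200_000))
print(hits / 200_000)  # empirical P(X contains x)
print((P * (1 - Q)) ** n * (1 - P * Q) ** (m - n))  # F(x) from the proposition
```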
Next we explore the events \(\{i \in X\}\) for \(i \in \N_+\).
Suppose again that \(X\) has the distribution above with parameters \(p, \, q \in (0, 1)\). Then (a) \(\P(i \in X) = p (1 - q) (1 - p q)^{i - 1}\) for \(i \in \N_+\), and (b) \(\P(i + j \in X \mid i \in X) = \P(j \in X)\) for \(i, \, j \in \N_+\).
Property (b) is reminiscent of a homogeneous Markov property. This property implies that the events \(\{\{i \in X\}: i \in \N_+\}\) are positively correlated, but asymptotically uncorrelated. In fact the correlation decays exponentially since \[\P(i + j \in X \mid i \in X) = \P(j \in X) = p (1 - q)(1 - p q)^{j - 1} \to 0 \text{ as } j \to \infty\]
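Both properties, and the exponential decay, can be read off from \(F\) directly, since \(\P(i \in X, \, i + j \in X) = F(\{i, i + j\})\). A quick check with hypothetical parameter values:

```python
P, Q = 0.4, 0.3  # hypothetical values

def F(x):  # reliability function for the subset order
    n, m = len(x), max(x)
    return (P * (1 - Q)) ** n * (1 - P * Q) ** (m - n)

i, j = 4, 3
print(F({i, i + j}) / F({i}))  # P(i + j in X | i in X)
print(F({j}))                  # P(j in X): should match the line above
```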
Next, recall that the hitting probability function \(G\) of \(X\), introduced in Section 1, is defined by \[G(x) = \P(X \cap x \ne \emptyset), \quad x \subseteq \N_+\]
Suppose again that \(X\) has the distribution above with parameters \(p, \, q \in (0, 1)\). Then \[G(x) = \frac{p (1 - q)}{1 - p q} \sum_{i = 1}^{\#(x)} \left(\frac{1 - p} {1 - p q}\right)^{i - 1} (1 - p q)^{x(i)}, \quad x \subseteq \N_+\] where as usual, \(x(i)\) is the \(i\)th smallest element of \(x\).
It's easier to use the alternate formulation above with parameters \(\alpha, \, \beta \in (0, 1)\). Suppose first that \(x\) is finite (so that \(x \in S\)). Recall from Section 1 that \[G(x) = \sum_{k = 1}^{\#(x)} (-1)^{k - 1} \sum_{y \subseteq x, \#(y) = k} F(y)\] where \(F\) is the reliability function of \(X\) for \((S, \subseteq)\). Hence, using the formula for \(\hat F\) above, we have \begin{align*} G(x) &= \sum_{k = 1}^{\#(x)} (-1)^{k - 1} \sum_{y \subseteq x, \#(y) = k} \beta^{\#(y)} (1 - \alpha)^{\max(y)} \\ &= \sum_{k = 1}^{\#(x)} (-1)^{k - 1} \beta^k \sum_{i = k}^{\#(x)} \sum \{(1 - \alpha)^{x(i)}: y \subseteq x, \#(y) = k, \max(y) = x(i)\} \\ &= \sum_{k = 1}^{\#(x)} (-1)^{k - 1} \beta^k \sum_{i = k}^{\#(x)} \binom{i - 1}{k - 1} (1 - \alpha)^{x(i)} = \sum_{i = 1}^{\#(x)} (1 - \alpha)^{x(i)} \sum_{k = 1}^i \binom{i - 1}{k - 1} (-1)^{k - 1} \beta^k \\ &= \sum_{i = 1}^{\#(x)} \beta (1 - \beta)^{i - 1} (1 - \alpha)^{x(i)} \end{align*} Replacing \(\alpha\) with \(p q\) and \(\beta\) with \(p (1 - q) / (1 - p q)\) gives the result. For infinite \(x\), the formula holds by the continuity theorem.
As noted in the details, the alternate formulation with parameters \(\alpha, \, \beta \in (0, 1)\) gives a simpler result: \(G(x) = \sum_{i = 1}^{\#(x)} \beta (1 - \beta)^{i - 1} (1 - \alpha)^{x(i)}\). The following hitting probabilities provide a check on our work.
Suppose again that \(X\) has the distribution above with parameters \(p, \, q \in (0, 1)\). Then \(G(\{i\}) = p (1 - q) (1 - p q)^{i - 1} = \P(i \in X)\) for \(i \in \N_+\), and \(G(\N_+) = 1 - q = \P(X \ne \emptyset)\).
Suppose again that \(X\) has the distribution above with parameters \(p, \, q \in (0, 1)\). Find the probability that (a) \(X\) contains at least one even number, and (b) \(X\) contains at least one odd number.
Let \(e\) denote the set of even positive integers and \(o\) the set of odd positive integers. The computations involve standard geometric series.
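For a numeric companion, the series for \(G(e)\) and \(G(o)\) can be summed directly (truncation is harmless since the terms decay geometrically) and compared with the closed geometric-series forms; the parameter values are hypothetical.

```python
P, Q = 0.4, 0.3  # hypothetical values
ALPHA, BETA = P * Q, P * (1 - Q) / (1 - P * Q)

def G(xs):  # hitting probability via the simple alpha-beta form
    return sum(BETA * (1 - BETA) ** i * (1 - ALPHA) ** xi
               for i, xi in enumerate(sorted(xs)))

r = (1 - BETA) * (1 - ALPHA) ** 2  # common ratio for the even/odd series
print(G(range(2, 2001, 2)), BETA * (1 - ALPHA) ** 2 / (1 - r))  # G(e)
print(G(range(1, 2001, 2)), BETA * (1 - ALPHA) / (1 - r))       # G(o)
print(G(range(1, 2001)), 1 - Q)  # sanity check against P(X nonempty)
```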