\(\newcommand{\R}{\mathbb{R}}\)
\(\newcommand{\N}{\mathbb{N}}\)
\(\newcommand{\Z}{\mathbb{Z}}\)
\(\newcommand{\E}{\mathbb{E}}\)
\(\newcommand{\P}{\mathbb{P}}\)
\(\newcommand{\var}{\text{var}}\)
\(\newcommand{\sd}{\text{sd}}\)
\(\newcommand{\cov}{\text{cov}}\)
\(\newcommand{\cor}{\text{cor}}\)
\(\newcommand{\skw}{\text{skew}}\)
\(\newcommand{\kur}{\text{kurt}}\)

- Random
- 4. Special Distributions
- Discrete Uniform Distributions

Suppose that \( S \) is a finite set. A random variable \( X \) taking values in \( S \) has the uniform distribution on \( S \) if \[ \P(X \in A) = \frac{\#(A)}{\#(S)}, \quad A \subseteq S \]

The discrete uniform distribution is a special case of the general uniform distribution with respect to a measure, in this case counting measure. The distribution corresponds to picking an element of \( S \) *at random*. Most classical, combinatorial probability models are based on underlying discrete uniform distributions. The chapter on Finite Sampling Models explores a number of such models.

The probability density function \( f \) of \( X \) is given by \[ f(x) = \frac{1}{\#(S)}, \quad x \in S \]

This follows from the definition of (discrete) density: \( \P(X \in A) = \sum_{x \in A} f(x) \) for \( A \subseteq S \)

Like all uniform distributions, the discrete uniform distribution on a finite set is characterized by the property of constant density on the set. Another property that all uniform distributions share is invariance under conditioning on a subset.

Suppose that \( R \) is a nonempty subset of \( S \). Then the conditional distribution of \( X \) given \( X \in R \) is uniform on \( R \).

For \( A \subseteq R \), \[ \P(X \in A \mid X \in R) = \frac{\P(X \in A)}{\P(X \in R)} = \frac{\#(A) \big/ \#(S)}{\#(R) \big/ \#(S)} = \frac{\#(A)}{\#(R)} \]

If \( h: S \to \R \) then the expected value of \( h(X) \) is simply the arithmetic average of the values of \( h \): \[ \E[h(X)] = \frac{1}{\#(S)} \sum_{x \in S} h(x) \]

This follows from the change of variables theorem for expected value: \( \E[h(X)] = \sum_{x \in S} f(x) h(x) \).

The entropy of \( X \) depends only on the number of points in \( S \).

The entropy of \( X \) is \( H(X) = \ln[\#(S)] \).

Let \( n = \#(S) \). Then \[ H(X) = \E\{-\ln[f(X)]\} = \sum_{x \in S} -\ln\left(\frac{1}{n}\right) \frac{1}{n} = -\ln\left(\frac{1}{n}\right) = \ln(n) \]

Without some additional structure, not much more can be said about discrete uniform distributions. Thus, suppose that \( n \in \N_+ \) and that \( S = \{x_1, x_2, \ldots, x_n\} \) is a subset of \( \R \) with \( n \) points. We will assume that the points are indexed in order, so that \( x_1 \lt x_2 \lt \cdots \lt x_n \). Suppose that \( X \) has the uniform distribution on \( S \).

The probability density function \( f \) of \( X \) is given by \( f(x) = \frac{1}{n} \) for \( x \in S \).

The distribution function \( F \) of \( X \) is given by

- \( F(x) = 0 \) for \( x \lt x_1 \)
- \( F(x) = \frac{k}{n} \) for \( x_k \le x \lt x_{k+1}\) and \( k \in \{1, 2, \ldots n - 1 \} \)
- \( F(x) = 1 \) for \( x \gt x_n \)

This follows from the definition of the distribution function: \( F(x) = \P(X \le x) \) for \( x \in \R \).

The quantile function \( F^{-1} \) of \( X \) is given by \( F^{-1}(p) = x_{\lceil n p \rceil} \) for \( p \in (0, 1] \).

By definition, \( F^{-1}(p) = x_k \) for \(\frac{k - 1}{n} \lt p \le \frac{k}{n}\) and \(k \in \{1, 2, \ldots, n\} \). It follows that \( k = \lceil n p \rceil \) in this formulation.

The moments of \( X \) are ordinary arithmetic averages.

For \( n \in \N \) \[ \E\left(X^n\right) = \frac{1}{n} \sum_{i=1}^n x_i^n \]

In particular,

The mean and variance of \( X \) are

- \( \mu = \frac{1}{n} \sum_{i=1}^n x_i \)
- \( \sigma^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \mu)^2 \)

We specialize further to the case where the finite subset of \( \R \) is a discrete interval, that is, the points are uniformly spaced.

Suppose that \( n \in \N_+ \) and that \( Z \) has the discrete uniform distribution on \( S = \{0, 1, \ldots, n - 1 \} \). The distribution of \( Z \) is the standard discrete uniform distribution with \( n \) points.

Of course, the results in the previous subsection apply with \( x_i = i - 1 \) and \( i \in \{1, 2, \ldots, n\} \).

The probability density function \( g \) of \( Z \) is given by \( g(z) = \frac{1}{n} \) for \( z \in S \).

Open the Special Distribution Simulation and select the discrete uniform distribution. Vary the number of points, but keep the default values for the other parameters. Note the graph of the probability density function. Run the simulation 1000 times and compare the empirical density function to the probability density function.

The distribution function \( G \) of \( Z \) is given by \( G(z) = \frac{1}{n}\left(\lfloor z \rfloor + 1\right) \) for \( z \in [0, n - 1] \).

Note that \(G(z) = \frac{k}{n}\) for \( k - 1 \le z \lt k \) and \( k \in \{1, 2, \ldots n - 1\} \). Thus \( k - 1 = \lfloor z \rfloor \) in this formulation.

The quantile function \( G^{-1} \) of \( Z \) is given by \( G^{-1}(p) = \lceil n p \rceil - 1 \) for \( p \in (0, 1] \). In particular

- \( G^{-1}(1/4) = \lceil n/4 \rceil - 1 \) is the first quartile.
- \( G^{-1}(1/2) = \lceil n / 2 \rceil - 1 \) is the median.
- \( G^{-1}(3/4) = \lceil 3 n / 4 \rceil - 1 \) is the third quartile.

Note that \(G^{-1}(p) = k - 1\) for \( \frac{k - 1}{n} \lt p \le \frac{k}{n}\) and \(k \in \{1, 2, \ldots, n\} \). Thus \( k = \lceil n p \rceil \) in this formulation.

Open the special distribution calculator and select the discrete uniform distribution. Vary the number of points, but keep the default values for the other parameters. Note the graph of the distribution function. Compute a few values of the distribution function and the quantile function.

For the standard uniform distribution, results for the moments can be given in closed form.

The mean and variance of \( Z \) are

- \( \E(Z) = \frac{1}{2}(n - 1) \)
- \( \var(Z) = \frac{1}{12}(n^2 - 1) \)

Recall that \begin{align} \sum_{k=0}^{n-1} k & = \frac{1}{2}n (n - 1) \\ \sum_{k=0}^{n-1} k^2 & = \frac{1}{6} n (n - 1) (2 n - 1) \end{align} Hence \( \E(Z) = \frac{1}{2}(n - 1) \) and \( \E(Z^2) = \frac{1}{6}(n - 1)(2 n - 1) \). Part (b) follows from \( \var(Z) = \E(Z^2) - [\E(Z)]^2 \).

Open the Special Distribution Simulation and select the discrete uniform distribution. Vary the number of points, but keep the default values for the other parameters. Note the size and location of the mean\(\pm\)standard devation bar. Run the simulation 1000 times and compare the empirical mean and standard deviation to the true mean and standard deviation.

The skewness and kurtosis of \( Z \) are

- \( \skw(Z) = 0 \)
- \( \kur(Z) = \frac{3}{5} \frac{3 n^2 - 7}{n^2 - 1} \)

Recall that \begin{align} \sum_{k=1}^{n-1} k^3 & = \frac{1}{4}(n - 1)^2 n^2 \\ \sum_{k=1}^{n-1} k^4 & = \frac{1}{30} (n - 1) (2 n - 1)(3 n^2 - 3 n - 1) \end{align} Hence \( \E(Z^3) = \frac{1}{4}(n - 1)^2 n \) and \( \E(Z^4) = \frac{1}{30}(n - 1)(2 n - 1)(3 n^2 - 3 n - 1) \). The results now follow from the results on the mean and variance above and the standard formulas for skewness and kurtosis. Of course, the fact that \( \skw(Z) = 0 \) also follows from the symmetry of the distribution.

Note that \( \skw(Z) \to \frac{9}{5} \) as \( n \to \infty \). The limiting value is the skewness of the uniform distribution on an interval.

\( Z \) has probability generating function \( P \) given by \( P(1) = 1 \) and \[ P(t) = \frac{1}{n}\frac{1 - t^n}{1 - t}, \quad t \in \R \setminus \{1\} \]

We now generalize the standard discrete uniform distribution by adding location and scale parameters.

Suppose that \( Z \) has the standard discrete uniform distribution on \( n \) points, and that \( a \in \R \) and \( h \in (0, \infty) \). Then \( X = a + h Z \) has the uniform distribution on \( n \) points with location parameter \( a \) and scale parameter \( h \).

Note that \( X \) takes values in \[ S = \{a, a + h, a + 2 h, \ldots, a + (n - 1) h\} \] so that \( S \) has \( n \) elements, starting at \( a \), with step size \( h \), a discrete interval. In the further special case where \( a \in \Z \) and \( h = 1 \), we have an integer interval. Note that the last point is \( b = a + (n - 1) h \), so we can clearly also parameterize the distribution by the endpoints \( a \) and \( b \), and the step size \( h \). With this parametrization, the number of points is \( n = 1 + (b - a) / h \). The distribution of \( X \) really is uniform:

\( X \) has probability density function \( f \) given by \( f(x) = \frac{1}{n} \) for \( x \in S \)

Recall that \( f(x) = g\left(\frac{x - a}{h}\right) \) for \( x \in S \), where \( g \) is the PDF of \( Z \) given above.

Open the Special Distribution Simulation and select the discrete uniform distribution. Vary the parameters and note the graph of the probability density function. For various values of the parameters, run the simulation 1000 times and compare the empirical density function to the probability density function.

The distribution function \( F \) of \( x \) is given by \[ F(x) = \frac{1}{n}\left(\left\lfloor \frac{x - a}{h} \right\rfloor + 1\right), \quad x \in [a, b] \]

Recall that \( F(x) = G\left(\frac{x - a}{h}\right) \) for \( x \in S \), where \( G \) is the CDF of \( Z \) given above.

The quantile function \( F^{-1} \) of \( X \) is given by \( G^{-1}(p) = a + h \left( \lceil n p \rceil - 1 \right)\) for \( p \in (0, 1] \). In particular

- \( F^{-1}(1/4) = a + h \left(\lceil n/4 \rceil - 1\right) \) is the first quartile.
- \( F^{-1}(1/2) = a + h \left(\lceil n / 2 \rceil - 1\right) \) is the median.
- \( F^{-1}(3/4) = a + h \left(\lceil 3 n / 4 \rceil - 1\right) \) is the third quartile.

Recall that \( F^{-1}(p) = a + h G^{-1}(p) \) for \( p \in (0, 1] \), where \( G^{-1} \) is the quantile function of \( Z \) given above.

Open the special distribution calculator and select the discrete uniform distribution. Vary the parameters and note the graph of the distribution function. Compute a few values of the distribution function and the quantile function.

The mean and variance of \( X \) are

- \( \E(X) = a + \frac{1}{2}(n - 1) h = \frac{1}{2}(a + b) \)
- \( \var(X) = \frac{1}{12}(n^2 - 1) h^2 = \frac{1}{12}(b - a)(b - a + 2 h) \)

Recall that \( \E(X) = a + h \E(Z) \) and \( \var(X) = h^2 \var(Z) \), so the results follow from the corresponding results for the standard distribution, given above.

Note that the mean is the average of the endpoints (and so is the midpoint of the interval \( [a, b] \)) while the variance depends only on the number of points and the step size.

Open the Special Distribution Simulator and select the discrete uniform distribution. Vary the parameters and note the shape and location of the mean/standard deviation bar. For selected values of the parameters, run the simulation 1000 times and compare the empirical mean and standard deviation to the true mean and standard deviation.

The skewness and kurtosis of \( Z \) are

- \( \skw(X) = 0 \)
- \( \kur(X) = \frac{3}{5} \frac{3 n^2 - 7}{n^2 - 1} \)

Recall that skewness and kurtosis are defined in terms of the standard score, and hence are the same for \( X \) as for \( Z \) given above.

\( X \) has moment generating function \( M \) given by \( M(0) = 1 \) and \[ M(t) = \frac{1}{n} e^{t a} \frac{1 - e^{n t h}}{1 - e^{t h}}, \quad t \in \R \setminus \{0\} \]

Note that \( M(t) = \E\left(e^{t X}\right) = e^{t a} \E\left(e^{t h Z}\right) = e^{t a} P\left(e^{t h}\right) \) where \( P \) is the probability generating function of \( Z \) given above.

Since the discrete uniform distribution on a discrete interval is a location-scale family, it is trivially closed under location-scale transformations.

Suppose that \( X \) has the discrete uniform distribution with endpoints \( a \) and \( b \), and step size \( h \). Suppose also that \( c \in \R \) and \( w \in (0, \infty) \). Then \( Y = c + w X \) has the discrete uniform distribution with endpoints \( c + w a \) and \( c + w b \), and step size \( h w \). The number of points is the same for the two distributions.

The uniform distribution on a discrete interval converges to the continuous uniform distribution on the interval with the same endpoints, as the step size decreases to 0.

Suppose that \( X_n \) has the discrete uniform distribution with endpoints \( a \) and \( b \), and step size \( (b - a) / n \), for each \( n \in \N_+ \). Thus there are \( n + 1 \) points. Then the distribution of \( X_n \) converges to the uniform distribution on \( [a, b] \) as \( n \to \infty \).

The CDF \( F_n \) of \( X_n \) is given by \[ F_n(x) = \frac{1}{n} \left\lfloor n \frac{x - a}{b - a} \right\rfloor, \quad x \in [a, b] \] But \( n y - 1 \le \lfloor ny \rfloor \le n y \) for \( y \in \R \) so \( \lfloor n y \rfloor / n \to y \) as \( n \to \infty \). Hence \( F_n(x) \to (x - a) / (b - a) \) as \( n \to \infty \) for \( x \in [a, b] \), and this is the CDF of the uniform distribution on \( [a, b] \).