The zeta distribution is used to model the sizes or ranks of certain types of objects randomly chosen from certain types of populations. Typical examples include the frequency of occurrence of a word randomly chosen from a text, or the population rank of a city randomly chosen from a country. The zeta distribution is also known as the Zipf distribution, in honor of the American linguist George Zipf.
The Riemann zeta function \(\zeta\), named after Bernhard Riemann, is defined as follows: \[ \zeta(a) = \sum_{n=1}^\infty \frac{1}{n^a}, \quad a \in (1, \infty) \]
You might recall from calculus that the series in the zeta function converges for \(a \gt 1\) and diverges for \(a \le 1\).
The zeta function satisfies the following properties:
The zeta function is transcendental, and most of its values must be approximated. However, \(\zeta(a)\) can be given explicitly for even integer values of \(a\); in particular, \(\zeta(2) = \frac{\pi^2}{6}\) and \(\zeta(4) = \frac{\pi^4}{90}\).
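Since most values of \(\zeta\) must be approximated, a short numerical sketch may help. The function below is a hypothetical helper, not part of any library: it sums the first terms of the series directly and estimates the remaining tail with the leading Euler-Maclaurin terms. The name `zeta` and the `cutoff` parameter are assumptions made for illustration.

```python
import math

def zeta(a: float, cutoff: int = 1000) -> float:
    """Approximate the Riemann zeta function for real a > 1.

    Sums the terms n = 1, ..., cutoff - 1 directly and estimates the
    tail from n = cutoff onward with leading Euler-Maclaurin terms.
    """
    if a <= 1:
        raise ValueError("the series diverges for a <= 1")
    head = sum(n ** -a for n in range(1, cutoff))
    tail = (cutoff ** (1 - a) / (a - 1)     # integral of x^(-a) from cutoff to infinity
            + 0.5 * cutoff ** -a            # half of the term at n = cutoff
            + a * cutoff ** (-a - 1) / 12)  # first-derivative correction
    return head + tail

# Compare with the two explicit values quoted in the text.
print(zeta(2), math.pi ** 2 / 6)
print(zeta(4), math.pi ** 4 / 90)
```

The tail correction matters because the plain partial sum converges slowly when \(a\) is close to 1.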
The zeta distribution with shape parameter \( a \in (1, \infty) \) is a discrete distribution on \( \N_+ \) with probability density function \( f \) given by \[ f(n) = \frac{1}{\zeta(a) n^a}, \quad n \in \N_+ \]
Clearly \( f \) is a valid PDF, since by definition, \( \zeta(a) \) is the normalizing constant for the function \( n \mapsto \frac{1}{n^a} \) on \( \N_+ \). Part (a) is clear. For part (b), note that the function \( x \mapsto x^{-a} \) on \( [1, \infty) \) has a positive second derivative.
Open the special distribution simulator and select the zeta distribution. Vary the shape parameter and note the shape of the probability density function. For selected values of the parameter, run the simulation 1000 times and compare the empirical density function to the probability density function.
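For readers without access to the simulator, the same experiment can be sketched in code. This is only an illustration, not the simulator's actual method: it uses inverse-transform sampling, and the names `zeta`, `simulate`, and the truncation parameter `cap` are assumptions introduced here.

```python
import random
from collections import Counter

def zeta(a, cutoff=1000):
    """Approximate zeta(a) for a > 1: partial sum plus an Euler-Maclaurin tail estimate."""
    head = sum(n ** -a for n in range(1, cutoff))
    return head + cutoff ** (1 - a) / (a - 1) + 0.5 * cutoff ** -a + a * cutoff ** (-a - 1) / 12

def simulate(a, runs=1000, seed=0, cap=10**6):
    """Return the empirical density of `runs` simulated zeta variables.

    Each value is the smallest n whose cumulative probability reaches a
    uniform random number; `cap` truncates extremely large values (an
    assumption made to guarantee termination).
    """
    rng = random.Random(seed)
    z = zeta(a)
    counts = Counter()
    for _ in range(runs):
        u = rng.random()
        n, cumulative = 1, 1 / z
        while cumulative < u and n < cap:
            n += 1
            cumulative += n ** -a / z
        counts[n] += 1
    return {n: c / runs for n, c in sorted(counts.items())}

a = 3.0
empirical = simulate(a)
z = zeta(a)
for n in range(1, 6):
    # empirical relative frequency next to the true density f(n)
    print(n, empirical.get(n, 0.0), n ** -a / z)
```

For values of \(a\) near 1 the tail is very heavy, so occasional draws can be enormous and the loop correspondingly slow.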
The distribution function and quantile function do not have simple closed forms, except in terms of other special functions.
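Although there are no simple closed forms, both functions are easy to evaluate numerically by accumulating the density. The following is a minimal sketch; the helper names `zeta`, `cdf`, and `quantile`, and the search limit `cap`, are hypothetical.

```python
def zeta(a, cutoff=1000):
    """Approximate zeta(a) for a > 1: partial sum plus an Euler-Maclaurin tail estimate."""
    head = sum(n ** -a for n in range(1, cutoff))
    return head + cutoff ** (1 - a) / (a - 1) + 0.5 * cutoff ** -a + a * cutoff ** (-a - 1) / 12

def cdf(n, a):
    """Distribution function F(n) = P(N <= n) for a positive integer n."""
    return sum(k ** -a for k in range(1, n + 1)) / zeta(a)

def quantile(p, a, cap=10**7):
    """Smallest n with F(n) >= p, for p in (0, 1); the search stops at `cap`."""
    if not 0 < p < 1:
        raise ValueError("p must be in (0, 1)")
    z = zeta(a)
    n, cumulative = 1, 1 / z
    while cumulative < p and n < cap:
        n += 1
        cumulative += n ** -a / z
    return n

for a in (2, 3):
    # first quartile, median, third quartile
    print(a, quantile(0.25, a), quantile(0.5, a), quantile(0.75, a))
```

Because the distribution puts so much mass at 1, the lower quantiles are typically 1 unless \(a\) is close to 1.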
Open the quantile app and select the zeta distribution. Vary the parameter and note the shape of the distribution and probability density functions. For selected values of the parameter, compute the median and the first and third quartiles.
Suppose that \( N \) has the zeta distribution with shape parameter \( a \in (1, \infty) \). The moments of \( N \) can be expressed easily in terms of the zeta function.
If \( k \ge a - 1 \), \( \E\left(N^k\right) = \infty \). If \( k \lt a - 1 \), \[\E\left(N^k\right) = \frac{\zeta(a - k)}{\zeta(a)}\]
Note that \[ \E\left(N^k\right) = \sum_{n = 1}^\infty n^k \frac{1}{\zeta(a) n^a} = \frac{1}{\zeta(a)} \sum_{n = 1}^\infty \frac{1}{n^{a - k}}\] If \( a - k \le 1 \), the last sum diverges to \( \infty \). If \( a - k \gt 1 \), the sum converges to \( \zeta(a - k) \).
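The moment formula translates directly into code. The sketch below uses hypothetical helpers (`zeta`, `moment`, `mean_and_sd`) and obtains the mean and standard deviation from the first two moments in the usual way; it illustrates the result rather than providing a reference implementation.

```python
import math

def zeta(a, cutoff=1000):
    """Approximate zeta(a) for a > 1: partial sum plus an Euler-Maclaurin tail estimate."""
    head = sum(n ** -a for n in range(1, cutoff))
    return head + cutoff ** (1 - a) / (a - 1) + 0.5 * cutoff ** -a + a * cutoff ** (-a - 1) / 12

def moment(a, k):
    """E(N^k) for the zeta distribution: infinite if k >= a - 1, else zeta(a - k) / zeta(a)."""
    if k >= a - 1:
        return math.inf
    return zeta(a - k) / zeta(a)

def mean_and_sd(a):
    """Mean and standard deviation, computed from the first two moments."""
    mean = moment(a, 1)
    second = moment(a, 2)
    if math.isinf(second):
        return mean, math.inf
    return mean, math.sqrt(second - mean ** 2)

print(moment(4, 2), 15 / math.pi ** 2)  # zeta(2) / zeta(4) simplifies to 15 / pi^2
print(mean_and_sd(6))
print(mean_and_sd(2.5))                 # finite mean, infinite standard deviation
```

The case \(a = 4\), \(k = 2\) is a convenient check because both zeta values are known explicitly: \(\zeta(2) / \zeta(4) = 15 / \pi^2\).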
The mean and variance of \(N\) are as follows:
Open the special distribution simulator and select the zeta distribution. Vary the parameter and note the shape and location of the mean \( \pm \) standard deviation bar. For selected values of the parameter, run the simulation 1000 times and compare the empirical mean and standard deviation to the distribution mean and standard deviation.
The skewness and kurtosis of \(N\) are as follows:
The probability generating function of \( N \) can be expressed in terms of the polylogarithm function \( \Li \) that was introduced in the section on the exponential-logarithmic distribution. Recall that the polylogarithm of order \( s \in \R \) is defined by \[ \Li_s(x) = \sum_{k = 1}^\infty \frac{x^k}{k^s}, \quad x \in (-1, 1) \]
\( N \) has probability generating function \( P \) given by \[ P(t) = \E\left(t^N\right) = \frac{\Li_a(t)}{\zeta(a)}, \quad t \in (-1, 1) \]
Note that \[ \E\left(t^N\right) = \sum_{n=1}^\infty t^n \frac{1}{n^a \zeta(a)} = \frac{1}{\zeta(a)} \sum_{n=1}^\infty \frac{t^n}{n^a} \] The last sum is \( \Li_a(t) \).
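The generating function can be evaluated numerically by summing the polylogarithm series until the terms become negligible. This is a rough sketch under that assumption; `zeta`, `polylog`, and `pgf` are hypothetical helper names, and the series converges slowly when \( |t| \) is close to 1.

```python
import math

def zeta(a, cutoff=1000):
    """Approximate zeta(a) for a > 1: partial sum plus an Euler-Maclaurin tail estimate."""
    head = sum(n ** -a for n in range(1, cutoff))
    return head + cutoff ** (1 - a) / (a - 1) + 0.5 * cutoff ** -a + a * cutoff ** (-a - 1) / 12

def polylog(s, x, tol=1e-16, max_terms=1_000_000):
    """Sum the series x^k / k^s for x in (-1, 1), stopping once a term is below tol."""
    if not -1 < x < 1:
        raise ValueError("x must be in (-1, 1)")
    total, power = 0.0, x
    for k in range(1, max_terms + 1):
        term = power / k ** s
        total += term
        if abs(term) < tol:
            break
        power *= x
    return total

def pgf(t, a):
    """Probability generating function P(t) = Li_a(t) / zeta(a)."""
    return polylog(a, t) / zeta(a)

# For a = 2 and t = 1/2, the classical identity Li_2(1/2) = pi^2/12 - (ln 2)^2 / 2
# gives P(1/2) = 1/2 - 3 (ln 2)^2 / pi^2.
print(pgf(0.5, 2), 0.5 - 3 * math.log(2) ** 2 / math.pi ** 2)
```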
In an algebraic sense, the zeta distribution is a discrete version of the Pareto distribution. Recall that if \( a \gt 1 \), the Pareto distribution with shape parameter \( a - 1 \) is a continuous distribution on \( [1, \infty) \) with probability density function \[ f(x) = \frac{a - 1}{x^a}, \quad x \in [1, \infty) \]
Naturally, the limits of the zeta distribution with respect to the shape parameter \( a \) are of interest.
The zeta distribution with shape parameter \( a \in (1, \infty) \) converges to point mass at 1 as \( a \to \infty \).
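The convergence can be seen numerically: since \( \zeta(a) \to 1 \) as \( a \to \infty \), the probability \( f(1) = 1 / \zeta(a) \) increases to 1. A small sketch follows, in which `zeta` is a hypothetical approximation helper and the parameter values are chosen only for illustration.

```python
def zeta(a, cutoff=1000):
    """Approximate zeta(a) for a > 1: partial sum plus an Euler-Maclaurin tail estimate."""
    head = sum(n ** -a for n in range(1, cutoff))
    return head + cutoff ** (1 - a) / (a - 1) + 0.5 * cutoff ** -a + a * cutoff ** (-a - 1) / 12

shapes = [2, 5, 10, 20, 50]
mass_at_one = [1 / zeta(a) for a in shapes]  # f(1) for each shape parameter

for a, p in zip(shapes, mass_at_one):
    print(a, p)
```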
Finally, the zeta distribution is a member of the family of general exponential distributions.
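Suppose that \(N\) has the zeta distribution with parameter \(a\). Then the distribution is a one-parameter exponential family with natural parameter \(a\) and natural statistic \(-\ln N\). This follows from writing the probability density function as \[ f(n) = \frac{1}{\zeta(a)} \exp(-a \ln n), \quad n \in \N_+ \]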
Let \(N\) denote the frequency of occurrence of a word chosen at random from a certain text, and suppose that \(N\) has the zeta distribution with parameter \(a = 2\). Find \(\P(N \gt 4)\).
\(\P(N \gt 4) = 1 - \frac{205}{24 \pi^2} \approx 0.1345\)
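Tail probabilities for \(a = 2\) can be checked numerically using the explicit value \(\zeta(2) = \pi^2 / 6\). The helper `tail` below is a hypothetical name introduced for this sketch; it computes \(\P(N \gt m)\) as one minus the sum of the first \(m\) values of the probability density function.

```python
import math

ZETA_2 = math.pi ** 2 / 6  # explicit value of zeta(2)

def tail(m):
    """P(N > m) for the zeta distribution with a = 2."""
    return 1 - sum(1 / n ** 2 for n in range(1, m + 1)) / ZETA_2

print(tail(3))  # P(N > 3), which is the same as P(N >= 4)
print(tail(4))  # P(N > 4)
```

Here `tail(3)` equals \(1 - \frac{49}{6 \pi^2}\) and `tail(4)` equals \(1 - \frac{205}{24 \pi^2}\), since \(1 + \frac{1}{4} + \frac{1}{9} = \frac{49}{36}\) and \(1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} = \frac{205}{144}\).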
Suppose that \(N\) has the zeta distribution with parameter \(a = 6\). Approximate each of the following: