For our first discussion, we assume that the universal set \(S\) is finite. Recall the following definition of cardinality:
For \(A \subseteq S\), the cardinality of \(A\) is the number of elements in \(A\), and is denoted \(\#(A)\). The function \(\#\) on \(\ms P(S)\) is called counting measure.
Counting measure plays a fundamental role in discrete probability structures, and particularly those that involve sampling from a finite set. The set \(S\) is typically very large, hence efficient counting methods are essential. The first combinatorial problem is attributed to the Greek mathematician Xenocrates.
In many cases, a set of objects can be counted by establishing a one-to-one correspondence between the given set and some other set. Naturally, the two sets have the same number of elements, but for various reasons, the second set may be easier to count.
The addition rule of combinatorics is simply the additivity axiom of counting measure.
If \(n \in \N_+\) and \(\{A_1, A_2, \ldots, A_n\}\) is a collection of disjoint subsets of \(S\) then \[ \#\left( \bigcup_{i=1}^n A_i \right) = \sum_{i=1}^n \#(A_i) \]
The following counting rules are simple consequences of the addition rule in . Be sure to try the proofs yourself before reading the ones in the text.
If \(A \subseteq S\) then \(\#(A^c) = \#(S) - \#(A)\). This is the complement rule.
Note that \(A\) and \(A^c\) are disjoint and their union is \(S\). Hence \( \#(A) + \#(A^c) = \#(S) \).
If \(A, \, B \subseteq S\) then \(\#(B \setminus A) = \#(B) - \#(A \cap B)\). This is the difference rule.
Note that \(A \cap B\) and \(B \setminus A\) are disjoint and their union is \(B\). Hence \( \#(A \cap B) + \#(B \setminus A) = \#(B) \).
If \(A \subseteq B \subseteq S\) then \(\#(B \setminus A) = \#(B) - \#(A)\). This is the proper difference rule.
If \(A \subseteq B \subseteq S\) then \(\#(A) \le \#(B)\).
Thus, \(\#\) is an increasing function relative to the subset partial order \(\subseteq\) on \(\ms P(S)\), and the ordinary order \(\le\) on \(\N\).
Our next disucssion concerns two inequalities that are useful for obtaining bounds on the number of elements in a set. The first is Boole's inequality (named after George Boole) which gives an upper bound on the cardinality of a union.
If \(n \in \N_+\) and \(\{A_1, A_2, \ldots, A_n\}\) is a collection of subsets of \(S\) then \[ \#\left( \bigcup_{i=1}^n A_i \right) \le \sum_{i=1}^n \#(A_i) \]
Let \(B_1 = A_1\) and \(B_i = A_i \setminus (A_1 \cup \cdots A_{i-1})\) for \(i \in \{2, 3, \ldots, n\}\). Note that \(\{B_1, B_2, \ldots, B_n\}\) is a pairwise disjoint collection and has the same union as \(\{A_1, A_2, \ldots, A_n\}\). From the increasing property , \( \#(B_i) \le \#(A_i) \) for each \( i \in \{1, 2, \ldots, n\} \). Hence by the addition rule , \[ \#\left( \bigcup_{i=1}^n A_i \right) = \#\left(\bigcup_{i=1}^n B_i\right) \le \sum_{i=1}^n \#(A_i) \]
Intuitively, Boole's inequality holds because parts of the union have been counted more than once in the expression on the right. The second inequality is Bonferroni's inequality (named after Carlo Bonferroni), which gives a lower bound on the cardinality of an intersection.
If \(n \in \N_+\) and \(\{A_1, A_2, \ldots, A_n\}\) is a collection of subsets of \(S\) then \[ \#\left( \bigcap_{i=1}^n A_i \right) \ge \#(S) - \sum_{i=1}^n [\#(S) - \#(A_i)] \]
Using the complement rule , Boole's inequality , and De Morgan's law, \[ \#\left(\bigcap_{i=1}^n A_i\right) = \#(S) - \#\left(\bigcup_{i=1}^n A_i^c\right) \ge \#(S) - \sum_{i=1}^n \#(A_i^c) = \#(S) - \sum_{i=1}^n [\#(S) - \#(A_i)] \]
The inclusion-exclusion formula gives the cardinality of a union of sets in terms of the cardinality of the various intersections of the sets. The formula is useful because intersections are often easier to count. We start with the special cases of two sets and three sets. As usual, we assume that the sets are subsets of a finite universal set \( S \).
If \( A \) and \( B \) are subsets of \( S \) then \(\#(A \cup B) = \#(A) + \#(B) - \#(A \cap B)\).
If \( A \), \( B \), \( C \) are subsets of \( S \) then \(\#(A \cup B \cup C) = \#(A) + \#(B) + \#(C) - \#(A \cap B) - \#(A \cap C) - \#(B \cap C) + \#(A \cap B \cap C)\).
Note that \( A \cup B \cup C = (A \cup B) \cup [C \setminus (A \cup B)] \) and that \( A \cup B \) and \( C \setminus (A \cup B) \) are disjoint. Using the addition rule and the difference rule , \[ \#(A \cup B \cup C) = \#(A \cup B) + \#[C \setminus (A \cup B)] = \#(A \cup B) + \#(C) - \#[C \cap (A \cup B)] = \#(A \cup B) + \#(C) - \#[(A \cap C) \cup (B \cap C)]\] Now using inclusion-exclusion rule for two events in (twice) we have \[ \#(A \cup B \cup C) = \#(A) + \#(B) - \#(A \cap B) + \#(C) - \#(A \cap C) - \#(B \cap C) + \#(A \cap B \cap C) \]
The inclusion-exclusion rule for two and three sets can be generalized to a union of \(n\) sets; the generalization is known as the (general) inclusion-exclusion formula.
Suppose that \(\{A_i: i \in I\}\) is a collection of subsets of \(S\) where \(I\) is an index set with \(\#(I) = n \in \N_+\). Then \[ \# \left( \bigcup_{i \in I} A_i \right) = \sum_{k = 1}^n (-1)^{k - 1} \sum_{J \subseteq I, \; \#(J) = k} \# \left( \bigcap_{j \in J} A_j \right) \]
The proof is by induction on \(n\). The formula holds for \( n = 2 \) sets by . Suppose the formula holds for \( n \in \{2, 3, \ldots\} \), and suppose that \( \{A_1, A_2, \ldots, A_n, A_{n+1}\} \) is a collection of \( n + 1 \) subsets of \( S \). Then \[ \bigcup_{i=1}^{n+1} A_i = \left(\bigcup_{i=1}^n A_i\right) \cup \left[A_{n+1} \setminus \left(\bigcup_{i=1}^n A_i\right)\right] \] and the two sets connected by the central union are disjoint. Using the addition rule and the difference rule , \begin{align} \#\left(\bigcup_{i=1}^{n+1} A_i\right) & = \#\left(\bigcup_{i=1}^n A_i\right) + \#(A_{n+1}) - \#\left[A_{n+1} \cap \left(\bigcup_{i=1}^n A_i\right)\right]\\ & = \#\left(\bigcup_{i=1}^n A_i\right) + \#(A_{n+1}) - \#\left[\bigcup_{i=1}^n (A_i \cap A_{n+1})\right] \end{align} By the induction hypothesis, the formula holds for the two unions of \( n \) sets in the last expression. The result then follows by simplification.
The general Bonferroni inequalities, named again for Carlo Bonferroni, state that if sum on the right in is truncated after \(k\) terms (\(k \lt n\)), then the truncated sum is an upper bound for the cardinality of the union if \(k\) is odd (so that the last term has a positive sign) and is a lower bound for the cardinality of the union if \(k\) is even (so that the last terms has a negative sign).
The multiplication rule of combinatorics is based on the formulation of a procedure (or algorithm) that generates the objects to be counted. In this discussion, suppose that \(k \in \N_+\) and \(n_j \in \N_+\) for \(j \in \{1, 2, \ldots, k\}\).
Suppose that a procedure consists of \(k\) steps, performed sequentially, and that for each \(j \in \{1, 2, \ldots, k\}\), step \(j\) can be performed in \(n_j\) ways, regardless of the choices made on the previous steps. Then the number of ways to perform the entire procedure is \(n_1 n_2 \cdots n_k\).
The key to a successful application of the multiplication rule to a counting problem is the clear formulation of an algorithm that generates the objects being counted, so that each object is generated once and only once. That is, we must neither overcount nor undercount. It's also important to notice that the set of choices available at step \( j \) may well depend on the previous steps; the assumption is only that the number of choices available does not depend on the previous steps.
The first two results below give equivalent formulations of the multiplication principle.
Suppose that \(S\) is a set of sequences of length \(k\), and that we denote a generic element of \(S\) by \((x_1, x_2, \ldots, x_k)\). Suppose that for each \(j \in \{1, 2, \ldots, k\}\), \(x_j\) has \(n_j\) different values, regardless of the values of the previous coordinates. Then \(\#(S) = n_1 n_2 \cdots n_k\).
A procedure that generates the sequences in \( S \) consists of \( k \) steps. Step \( j \) is to select the \( j \)th coordinate.
Suppose that \(T\) is an ordered tree with depth \(k\) and that each vertex at level \(i - 1\) has \(n_i\) children for \(i \in \{1, 2, \ldots, k\}\). Then the number of endpoints of the tree is \(n_1 n_2 \cdots n_k\).
Suppose again that \(k \in \N_+\) and \(n_i \in \N_+\) for \(i \in \{1, 2, \ldots, k\}\). If \(S_i\) is a set with \(n_i\) elements for \(i \in \{1, 2, \ldots, k\}\) then \[ \#(S_1 \times S_2 \times \cdots \times S_k) = n_1 n_2 \cdots n_k \]
Suppose that \(n, \, k \in \N_+\). If \(S\) is a set with \(n\) elements, then \(S^k\) has \(n^k\) elements.
Note that the elements of \( S^k \) can be thought of as ordered samples of size \(k\) that can be chosen with replacement from a population of \(n\) objects. Elements of \(\{0, 1\}^n\) are sometimes called bit strings of length \(n\). Thus, there are \( 2^n \) bit strings of length \( n \).
Suppose that \(m, \, n \in \N_+\). The number of functions from a set \(A\) of \(m\) elements into a set \(B\) of \(n\) elements is \(n^m\).
An algorithm for constructing a function \(f: A \to B\) is to choose the value of \(f(x) \in B\) for each \(x \in A\). There are \( n \) choices for each of the \( m \) elements in the domain.
Recall that the set of functions from a set \(A\) into a set \(B\) (regardless of whether the sets are finite or infinite) is denoted \(B^A\). This theorem is motivation for the notation. Note also that if \(n, \, \in \N_+\) and \( S \) is a set with \( n \) elements, then the elements in the Cartesian power set \( S^k \) can be thought of as functions from \( \{1, 2, \ldots, k\} \) into \( S \). So the counting formula for sequences can be thought of as a corollary of counting formula for functions.
Suppose that \(S\) is a set with \(n\) elements, where \(n \in \N\). There are \(2^n\) subsets of \(S\).
Proof from the multiplication principle: An algorithm for constructing \(A \subseteq S\), is to decide whether \(x \in A\) or \(x \notin A\) for each \(x \in S\). There are 2 choices for each of the \( n \) elements of \( S \).
Proof using indicator functions: Recall that there is a one-to-one correspondence between subsets of \( S \) and indicator functions on \( S \). An indicator function is simply a function from \( S \) into \( \{0, 1\} \), and the number of such functions is \( 2^n \) by .
Suppose that \( \{A_1, A_2, \ldots A_k\} \) is a collection of \(k\) subsets of a set \(S\), where \(k \in \N_+\). There are \(2^{2^k}\) different (in general) sets that can be constructed from the \(k\) given sets, using the operations of union, intersection, and complement. These sets form the algebra generated by the given sets.
First note that there are \(2^k\) pairwise disjoint sets of the form \(B_1 \cap B_2 \cap \cdots \cap B_k\) where \(B_i = A_i\) or \(B_i = A_i^c\) for each \(i\). Next, note that every set that can be constructed from \(\{A_1, A_2, \ldots, A_n\}\) is a union of some (perhaps all, perhaps none) of these intersection sets.
Open the Venn diagram app.
Suppose that \(S\) is a set with \(n\) elements and that \(A\) is a subset of \(S\) with \(k\) elements, where \(n, \, k \in \N\) and \(k \le n\). The number of subsets of \(S\) that contain \(A\) is \(2^{n - k}\).
Our last result in this discussion generalizes .
Suppose that \(n, \, k \in \N\) and that \( S \) is a set with \(n\) elements. The number of sequences of subsets \((A_1, A_2, \ldots, A_k)\) with \(A_1 \subseteq A_2 \subseteq \cdots \subseteq A_k \subseteq S\) is \((k + 1)^n\).
To construct a sequence of the type in the theorem, we can use the following algorithm: For each \(x \in S\), either \(x\) is not in the sets, or \(x\) occurs for the first time in set \(A_i\) where \(i \in \{1, 2, \ldots, k\}\). (That is, \(x \notin A_j\) for \(j \in \{1, \ldots, i - 1\}\) and \(x \in A_j\) for \(j \in \{i, \ldots, k\}\).) So there are \(k + 1\) choices for each of the \(n\) elements of \(S\).
When \( k = 1 \) we get \( 2^n \) as the number of subsets of \( S \), as before.
A license number consists of two letters (uppercase) followed by five digits. How many different license numbers are there?
\(26^2 \cdot 10^5 = 67 \, 600 \, 000\)
Suppose that a Personal Identification Number (PIN) is a four-symbol code word in which each entry is either a letter (uppercase) or a digit. How many PINs are there?
\(36^4 = 1 \, 679 \, 616\)
In the board game Clue, Mr. Boddy has been murdered. There are 6 suspects, 6 possible weapons, and 9 possible rooms for the murder.
An experiment consists of rolling a standard die, drawing a card from a standard deck, and tossing a standard coin. How many outcomes are there?
\(6 \cdot 52 \cdot 2 = 624\)
A standard die is rolled 5 times and the sequence of scores recorded. How many outcomes are there?
\(6^5 = 7776\)
In the card game Set, each card has 4 properties: number (one, two, or three), shape (diamond, oval, or squiggle), color (red, blue, or green), and shading (solid, open, or stripped). The deck has one card of each (number, shape, color, shading) configuration. A set in the game is defined as a set of three cards which, for each property, the cards are either all the same or all different.
A coin is tossed 10 times and the sequence of scores recorded. How many sequences are there?
\(2^{10} = 1024\)
The die-coin experiment consists of rolling a die and then tossing a coin the number of times shown on the die. The sequence of coin results is recorded.
Run the die-coin experiment 100 times and observe the outcomes.
Consider a standard deck of cards as a set \(D\) with 52 elements.
Consider a group of 10 persons.
In the usual model of structural reliability, a system consists of components, each of which is either working or defective. The system as a whole is also either working or defective, depending on the states of the components and how the components are connected.
A string of lights has 20 bulbs, each of which may be good or defective. How many configurations are there?
\(2^{20} = 1 \, 048 \, 576\)
If the components are connected in series, then the system as a whole is working if and only if each component is working. If the components are connected parallel, then the system as a whole is working if and only if at least one component is working.
A system consists of three subsystems with 6, 5, and 4 components, respectively. Find the number of component states for which the system is working in each of the following cases:
Suppose that a sandwich at a restaurant consists of bread, meat, cheese, and various toppings. There are 4 choices for the bread, 3 choices for the meat, 5 choices for the cheese, and 10 different toppings (each of which may be chosen). How many sandwiches are there?
\(4 \cdot 3 \cdot 5 \cdot 2^{10} = 61 \, 440\)
At a wedding dinner, there are three choices for the entrée, four choices for the beverage, and two choices for the dessert.
Braille is a tactile writing system used by people who are visually impaired. The system is named for the French educator Louis Braille and uses raised dots in a \( 3 \times 2 \) grid to encode characters. How many meaningful Braille configurations are there?
\( 2^6 - 1 = 63 \). Note that the grid with no raised dots is not meaningful.
The Meyers-Briggs personality typing is based on four dichotomies: A person is typed as either extroversion (E) or introversion (I), either sensing (S) or intuition (I), either thinking (T) or feeling (F), and either judgement (J) or perception (P).
The Galton Board, named after Francis Galton, is a triangular array of pegs. Galton, apparently too modest to name the device after himself, called it a quincunx from the Latin word for five twelfths (go figure). The rows are numbered, from the top down, by \((0, 1, \ldots )\). Row \(n\) has \(n + 1\) pegs that are labeled, from left to right by \((0, 1, \ldots, n)\). Thus, a peg can be uniquely identified by an ordered pair \((n, k)\) where \(n\) is the row number and \(k\) is the peg number in that row.
A ball is dropped onto the top peg \((0, 0)\) of the Galton board. In general, when the ball hits peg \((n, k)\), it either bounces to the left to peg \((n + 1, k)\) or to the right to peg \((n + 1, k + 1)\). The sequence of pegs that the ball hits is a path in the Galton board.
There is a one-to-one correspondence between each pair of the following three collections:
Thus, each of these collections has \(2^n\) elements.
Open the Galton board app and set \(n = 20\) rows. Click on the left and right buttons to generate a path in the Galton board. Note the corresponding bit string and subset.