Modes of Convergence for Random Variables
Random variables can converge in several different ways. Here is a brief introduction, highlighting examples of the different behaviors that can occur. Many of the examples below are taken from [1].
There are at least six different notions of convergence for random variables.
- \(X_n\) converges to \(X\) pointwise if \(X_n(\omega)\to X(\omega)\) for all \(\omega\).
- \(X_n\) converges to \(X\) almost surely or almost everywhere if \(X_n(\omega)\to X(\omega)\) for almost all \(\omega\), i.e., for all \(\omega\) in a set of probability 1. Thus, almost sure convergence requires \(\mathsf{Pr}(\{\omega \mid X_n(\omega)\to X(\omega) \})=1\).
- \(X_n\) converges to \(X\) in probability if for any \(\epsilon>0\) we have \(\mathsf{Pr}(|X_n-X|>\epsilon)\to 0\) as \(n\to\infty\).
- \(X_n\) converges to \(X\) in distribution or law or weakly or in the weak* topology if \(F_n(x)\to F(x)\) for all \(x\) for which \(F(x)\) is continuous, where \(F_n,F\) are the distribution functions of \(X_n,X\).
- \(X_n\) converges to \(X\) in the \(L^1\)-norm if \(\int |X_n(\omega) - X(\omega)| \,\mathsf{Pr}(d\omega)\to 0\) as \(n\to\infty\). More generally, for \(p\ge 1\), \(X_n\) converges in the \(L^p\)-norm if \(\int |X_n(\omega) - X(\omega)|^p\, \mathsf{Pr}(d\omega)\to 0\).
- \(X_n\) converges to \(X\) in the sup-norm if \(\sup_\Omega |X_n - X| \to 0\).
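To make the quantitative definitions concrete, here is a minimal Monte Carlo sketch (not from the original text; the sequence, sample sizes, and seed are illustrative choices). It estimates \(\mathsf{Pr}(|X_n-X|>\epsilon)\) and the \(L^1\) distance \(\mathsf E|X_n-X|\) for \(X_n=\max(U_1,\dots,U_n)\) with \(U_i\) iid uniform on \([0,1]\), which converges to \(X=1\).

```python
# Monte Carlo sketch (illustrative): estimate the quantities appearing in the
# definitions of convergence in probability and in L^1 for X_n = max of n
# iid Uniform(0,1) draws, which converges to X = 1.
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, arbitrary choice
eps = 0.05
for n in [10, 100, 1000]:
    xn = rng.random((10_000, n)).max(axis=1)   # 10,000 samples of X_n
    p = np.mean(np.abs(xn - 1.0) > eps)        # estimates Pr(|X_n - X| > eps)
    l1 = np.mean(np.abs(xn - 1.0))             # estimates E|X_n - X|
    print(f"n={n:5d}  Pr(|X_n-1|>eps) ~ {p:.4f}   E|X_n-1| ~ {l1:.4f}")
```

Both columns shrink toward zero; the exact values are \((1-\epsilon)^n\) and \(1/(n+1)\), so the output can be checked against them.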
Pointwise convergence is the strongest notion, and it obviously implies almost sure convergence. Almost sure convergence implies convergence in probability, which in turn implies convergence in distribution. \(L^1\) convergence also implies convergence in probability, and \(L^p\) convergence implies \(L^r\) convergence for \(p\ge r\ge 1\); sup-norm convergence can be regarded as the limiting case of \(L^p\) convergence as \(p\to\infty\). The figure lays out the relationships schematically. Notice that since probability spaces have total probability (measure) 1, only large values of \(X\) matter for integrability: a random variable never fails to be integrable because of small values of \(X\). (On \([1,\infty)\) the function \(X(x)=1/x\) has a divergent integral despite taking only small values, but \([1,\infty)\) does not have finite measure, so it is not a probability space.)
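The \(L^1\) implication, for example, is a one-line application of Markov's inequality: for any \(\epsilon>0\),
\[
\mathsf{Pr}(|X_n-X|>\epsilon)\;\le\;\frac{1}{\epsilon}\int |X_n(\omega)-X(\omega)|\,\mathsf{Pr}(d\omega)\;\to\;0.
\]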
Convergence in distribution is special to probability theory. It is equivalent to a number of other conditions, spelled out in the Portmanteau theorem, [2]. In particular, on a standard probability space, convergence in distribution is equivalent to \(\mathsf{Pr}(X_n\in A)\to\mathsf{Pr}(X\in A)\) for all events \(A\) whose boundary has probability zero, and to \(\mathsf E[g(X_n)]\to \mathsf E[g(X)]\) for all bounded, continuous functions \(g\). The last condition partially explains why convergence in distribution can be checked with Fourier transforms (characteristic functions), since \(g(x)=e^{2\pi i x\theta}\) is bounded and continuous for fixed \(\theta\).
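As a quick sanity check of the bounded-continuous-function condition (an illustrative computation, not from the original text), take \(X_n\) uniform on \(\{0,1/n,\dots,(n-1)/n\}\), \(X\) uniform on \([0,1]\), and the bounded continuous test function \(g(x)=1/(1+x^2)\):

```python
# Deterministic check (illustrative): E[g(X_n)] -> E[g(X)] for a bounded,
# continuous g, where X_n is uniform on {0, 1/n, ..., (n-1)/n} and X is
# uniform on [0,1].  Exactly, E[g(X)] = integral_0^1 dx/(1+x^2) = pi/4.
import math

def g(x):
    return 1.0 / (1.0 + x * x)   # bounded and continuous on [0, 1]

exact = math.atan(1.0)           # E[g(X)] = pi/4
for n in [10, 100, 1000]:
    e_gn = sum(g(k / n) for k in range(n)) / n   # E[g(X_n)] is a finite average
    print(f"n={n:5d}  E[g(X_n)] = {e_gn:.6f}   E[g(X)] = {exact:.6f}")
```

The finite averages are Riemann sums, so they converge to the integral; this is the same pair of random variables used in the third example of (2) below.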
The relationships between the different modes of convergence are best understood by considering examples.
Examples
- Convergence in probability but not almost surely.
- \(X_n\) independent with \(\mathsf{Pr}(X_n=1)=1/n\) and \(\mathsf{Pr}(X_n=0)=1-1/n\); \(X=0\). \(X_n(\omega)\to 0\) requires that \(X_n=0\) for all \(n\ge N\), for some \(N\), which has probability \(\prod_{n\ge N}(1-\frac{1}{n})=0\) (take logs and use \(\log(1-1/n)<-1/n\) together with the fact that \(\sum_n 1/n\) diverges). A simulation sketch appears after this list.
- (Typewriter sequence.) Each integer \(n\ge 1\) can be written uniquely as \(n=2^m+k\) for \(0\le k < 2^m\). Let \(X_n(\omega)=1\) if \(\omega\in [k2^{-m}, (k+1)2^{-m}]\) and 0 otherwise, and let \(X=0\). Then \(X_n\) converges to \(X\) in probability but not almost surely: for any given \(\omega\), \(X_n(\omega)=1\) for at least one \(k\) at each level \(m\) and is zero otherwise, so \(X_n(\omega)\) takes the values 0 and 1 infinitely many times and does not converge for any \(\omega\). (See the second simulation after this list.)
- Convergence in distribution but not in probability.
- Let \(X_n=X\) be Bernoulli with \(p=1/2\) and \(Y=1-X\). Then \(X_n\) converges to \(Y\) in distribution (\(X\) and \(Y\) have the same distribution) but not in probability, because \(\mathsf{Pr}(X_n=Y)=\mathsf{Pr}(X=Y)=0\). Just as law-invariant risk measures do not see the actual events, convergence in distribution does not consider explicit events.
- The same example works if \(X\) is any non-trivial, symmetric random variable, and \(Y=-X\).
- Let \(X_n\) be uniform on \(\{k/n \mid k=0,1,\dots,n-1\}\) and \(X\) be uniform on \([0,1]\). Then \(X_n\) converges to \(X\) in distribution (the distribution function of \(X_n\) is a finer and finer stair-step converging to that of \(X\)) but not in probability (the distribution of \(X_n\) is supported on the rational numbers, which have probability zero under \(X\)).
- \(L^1\) convergence or almost sure convergence, but not both.
- \(X_n(\omega)=n\) if \(\omega<1/n\) and 0 otherwise converges to \(X=0\) almost surely but not in \(L^1\), since \(\int X_n=1\) for all \(n\) but \(\int X=0\). Note \(X_n\) is unbounded; if \(X_n\) is dominated by an integrable function then Lebesgue’s dominated convergence theorem ensures \(L^1\) convergence.
- The typewriter sequence converges in \(L^1\) but not almost surely, since \(\int X_n = 2^{-m}\to 0\). In fact, it converges in \(L^p\) for all \(p<\infty\). It does not converge in \(L^\infty\) (the sup-norm), since \(\sup_\Omega |X_n - X| = 1\) for all \(n\).
- Equivalent formulations for convergence in distribution.
- The test function \(g\) must be continuous. Let \(X_n=1/n\) with probability \(1-1/n\) and \(X_n=1\) otherwise. Then \(X_n\) converges to 0 in probability (for any \(\epsilon>0\), \(\mathsf{Pr}(X_n>\epsilon)=1/n\to 0\) once \(1/n\le\epsilon\)). Let \(g(x)=0\) for \(x\le 0\) and \(g(x)=1\) for \(x>0\). Then \(g(X_n)=1\) for all \(n\), so \(\mathsf E[g(X_n)]=1\not\to g(0)=0\).
- Test sets \(A\) must have a boundary of probability zero. Apply the third (uniform) example from (2) to \(A=\mathbb Q\cap [0,1]\), the rationals in \([0,1]\). \(\mathsf{Pr}(X_n\in A)=1\) for all \(n\), but \(\mathsf{Pr}(X\in A)=0\): the rationals have probability zero, since they can be covered by an open set of arbitrarily small probability by putting an open interval of width \(\epsilon/2^{n+1}\) around the \(n\)th rational. In this case the boundary of \(A\) is all of \([0,1]\) (its closure is \([0,1]\) and its interior is empty), which has probability 1.
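The first example above (convergence in probability but not almost surely) can be watched in simulation; this is a sketch with arbitrary seed and horizon, not part of the original text:

```python
# One sample path of independent X_n with Pr(X_n = 1) = 1/n.  The marginal
# probabilities 1/n -> 0 (convergence in probability), but because sum 1/n
# diverges the path keeps producing 1s forever, so it never settles at 0.
import numpy as np

rng = np.random.default_rng(1)               # arbitrary seed
N = 1_000_000
n = np.arange(1, N + 1)
x = (rng.random(N) < 1.0 / n).astype(int)    # one sample path X_1, ..., X_N
ones = np.flatnonzero(x) + 1                 # the indices n at which X_n = 1
print("number of 1s up to N:", len(ones))    # grows like log N
print("last few 1s at n =", ones[-5:])       # 1s persist at very large n
```

The count of 1s up to \(N\) is approximately \(\sum_{n\le N} 1/n \approx \log N\), so the 1s become rare but never stop.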
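The typewriter sequence can likewise be simulated. The sketch below (illustrative; the helper `typewriter` is defined only for this example) confirms that the \(L^1\) norms shrink to zero while any fixed \(\omega\) is hit at every level \(m\):

```python
# Typewriter sequence: n = 2**m + k (0 <= k < 2**m) indexes the interval
# [k/2**m, (k+1)/2**m].  Its L^1 norm is the interval length 2**-m -> 0, yet
# every omega lies in one interval at each level m, so X_n(omega) = 1
# infinitely often and the sample path does not converge.
def typewriter(n, omega):
    m = n.bit_length() - 1               # n = 2**m + k
    k = n - (1 << m)
    return 1 if k / 2**m <= omega <= (k + 1) / 2**m else 0

omega = 0.3                              # any fixed point works
hits = [n for n in range(1, 2**12) if typewriter(n, omega) == 1]
print("L^1 norms by level m:", [2.0**-m for m in range(5)])
print("n with X_n(omega) = 1:", hits[:8], "...")   # roughly one hit per level
```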
- The strong law of large numbers states that the sample mean converges to the true mean almost surely. For an iid sequence it holds if and only if \(\mathsf E[|X_1|]<\infty\).
- The weak law of large numbers states that the sample mean converges in probability, which holds under weaker conditions that do not require the mean to exist, see [3].
- The central limit theorem describes convergence in distribution: the standardized sample mean converges in distribution to a normal distribution as the sample size increases. A simulation sketch of both limit laws follows below.
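A minimal simulation of these limit theorems (the distribution, sample size, and seed are all arbitrary choices, not from the original text):

```python
# Sample means of Exponential(1) draws (true mean 1, variance 1).  The means
# concentrate near 1 (law of large numbers), and the standardized means
# sqrt(n) * (mean - 1) behave like a standard normal (central limit theorem).
import numpy as np

rng = np.random.default_rng(2)               # arbitrary seed
n, reps = 1_000, 10_000
means = rng.exponential(1.0, (reps, n)).mean(axis=1)

print("a sample mean:", means[0])            # LLN: close to the true mean 1
z = np.sqrt(n) * (means - 1.0)               # CLT: standardized sample means
print("P(Z <= 0) ~", np.mean(z <= 0.0))      # ~ 0.5 under the normal limit
print("std(Z)    ~", z.std())                # ~ 1
```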