Why Almost Sure Convergence is Not Topological
When studying sequences of random variables on a standard probability space \((\Omega, \mathcal{F}, \mathsf P)\), we frequently encounter different modes of convergence. Two prominent modes are almost sure (a.s.) convergence and convergence in probability.
A natural question arises from a functional analysis perspective: Can these modes of convergence be characterized by a topology? To answer this, we need to understand what it takes for sequential convergence to be considered “topological,” and how a fundamental rule about subsequences draws a hard line between a.s. convergence and convergence in probability.
1 Definitions
Let \((X_n)\) be a sequence of random variables and \(X\) be a random variable.
Almost Sure Convergence (\(X_n \xrightarrow{\text{a.s.}} X\)) \[ \mathsf P\left(\left\{\omega \in \Omega : \lim_{n \to \infty} X_n(\omega) = X(\omega)\right\}\right) = 1. \] The sequence converges pointwise everywhere, except possibly on a set of measure zero.
Convergence in Probability (\(X_n \xrightarrow{P} X\)) For every \(\epsilon > 0\), \[ \lim_{n \to \infty} \mathsf P(|X_n - X| > \epsilon) = 0. \]
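A quick numerical sketch can make the two definitions concrete before we formalize the gap between them. The snippet below (a sketch, assuming NumPy; it uses the "tall and thin" sequence \(X_n = n \mathbf{1}_{(0, 1/n)}\) that reappears in Section 6) estimates the probability in the second definition by Monte Carlo and checks the pointwise limit from the first at a few sampled \(\omega\).

```python
import numpy as np

rng = np.random.default_rng(0)

def X(n, omega):
    """X_n(omega) = n * 1_{(0, 1/n)}(omega) on ([0,1], Lebesgue)."""
    return n * ((0 < omega) & (omega < 1 / n))

# Convergence in probability: estimate P(|X_n - 0| > eps) by Monte Carlo.
eps = 0.5
omegas = rng.uniform(0.0, 1.0, size=100_000)
for n in [1, 10, 100, 1000]:
    p_hat = np.mean(np.abs(X(n, omegas)) > eps)
    print(f"n={n:5d}  P(|X_n| > {eps}) ~ {p_hat:.4f}")   # ~ 1/n -> 0

# Almost sure convergence: at any fixed omega > 0, X_n(omega) = 0
# once 1/n < omega, so the pointwise limit is 0.
for omega in rng.uniform(0.0, 1.0, size=3):
    tail = [X(n, omega) for n in range(1, 50)]
    print(f"omega={omega:.3f}  X_n(omega) for large n: {tail[-5:]}")
```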
It is a standard, relatively straightforward result that almost sure convergence implies convergence in probability.1 The converse, however, is false, and understanding why leads us directly into topology.
2 The Topological Rule (\(L\) vs \(L^*\) spaces)
When Maurice Fréchet attempted to axiomatize the concept of limits without relying on metrics or open sets, he defined an L-space (Limit space) as a set equipped with a notion of sequential convergence satisfying two fundamental axioms:
- If \(X_n = X\) for all \(n\), then the sequence converges to \(X\).
- If a sequence converges to \(X\), then every subsequence also converges to \(X\).
Almost sure convergence perfectly satisfies these two axioms, making it a valid L-space.
To bridge the gap between L-spaces and actual topologies, Fréchet defined an \(L^*\)-space. This is an L-space that additionally satisfies the Urysohn subsequence axiom:
- If every subsequence of \((X_n)\) contains a further sub-subsequence converging to \(X\), then the original sequence \((X_n)\) converges to \(X\).
Any convergence structure generated by a topology must satisfy all three axioms. If a mode of convergence fails the Urysohn rule, it is strictly non-topological; such structures can only be treated in the more general framework of convergence spaces.
3 The Failure of Almost Sure Convergence
To see if a.s. convergence is a topology, we must ask: what happens if we force the Urysohn property onto it? The answer is given by the following theorem, which links a.s. convergence directly to convergence in probability.
Theorem 1 \(X_n \xrightarrow{P} X\) if and only if every subsequence \((X_{n_k})\) contains a further sub-subsequence \((X_{n_{k_j}})\) such that \(X_{n_{k_j}} \xrightarrow{\text{a.s.}} X\).
Proof. (\(\implies\)): Assume \(X_n \xrightarrow{P} X\). Let \((X_{n_k})\) be an arbitrary subsequence; any subsequence of a sequence converging in probability still converges in probability. For each \(j \in \mathbb{N}\), choose an index \(k_j\) (strictly increasing in \(j\)) such that: \[ \mathsf P(|X_{n_{k_j}} - X| > 2^{-j}) < 2^{-j} \]
Define the events \(A_j = \{|X_{n_{k_j}} - X| > 2^{-j}\}\). Because \(\sum_{j=1}^\infty \mathsf P(A_j) < \sum_{j=1}^\infty 2^{-j} = 1 < \infty\), the first Borel-Cantelli lemma implies \(\mathsf P(\limsup_{j \to \infty} A_j) = 0\).
Thus, with probability 1, only finitely many \(A_j\) occur. For almost every \(\omega\), there exists an integer \(J(\omega)\) such that for all \(j \ge J(\omega)\), \(|X_{n_{k_j}}(\omega) - X(\omega)| \le 2^{-j}\). This forces \(X_{n_{k_j}}(\omega) \to X(\omega)\) as \(j \to \infty\), meaning \(X_{n_{k_j}} \xrightarrow{\text{a.s.}} X\).
(\(\impliedby\)): Assume every subsequence contains a further sub-subsequence converging almost surely to \(X\). We proceed by contradiction. Assume \(X_n \not\xrightarrow{P} X\). Then there exists an \(\epsilon > 0\) and \(\delta > 0\) such that \(\mathsf P(|X_n - X| > \epsilon) \ge \delta\) for infinitely many \(n\).
This yields a subsequence \((X_{n_k})\) with \(\mathsf P(|X_{n_k} - X| > \epsilon) \ge \delta\) for all \(k\). By hypothesis, \((X_{n_k})\) contains a sub-subsequence \((X_{n_{k_j}})\) that converges almost surely to \(X\). Since a.s. convergence implies convergence in probability, \(X_{n_{k_j}} \xrightarrow{P} X\).
This requires \(\lim_{j \to \infty} \mathsf P(|X_{n_{k_j}} - X| > \epsilon) = 0\), contradicting \(\mathsf P(|X_{n_{k_j}} - X| > \epsilon) \ge \delta > 0\).
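The (\(\implies\)) direction is constructive, and the index selection is simple enough to mechanize. Below is a minimal sketch: `tail_prob` is a hypothetical oracle for \(\mathsf P(|X_n - X| > \epsilon)\) (the bound \(1/(n\epsilon)\) used here is purely illustrative, as Markov's inequality would give if \(\mathsf P[|X_n - X|] = 1/n\)); the loop scans forward for indices whose tail probability drops below \(2^{-j}\).

```python
def tail_prob(n, eps):
    """Hypothetical oracle for P(|X_n - X| > eps).

    Illustrative stand-in bound 1/(n*eps); any function that
    decreases to 0 in n for each fixed eps would work.
    """
    return min(1.0, 1.0 / (n * eps))

def fast_subsequence(tail_prob, num_terms=8):
    """Pick strictly increasing indices k_j with
    P(|X_{k_j} - X| > 2^-j) < 2^-j, as in the proof of Theorem 1."""
    indices = []
    n = 1
    for j in range(1, num_terms + 1):
        eps = 2.0 ** (-j)
        while tail_prob(n, eps) >= eps:   # scan forward; terminates since
            n += 1                        # tail_prob -> 0 as n grows
        indices.append(n)
        n += 1                            # keep the indices strictly increasing
    return indices

print(fast_subsequence(tail_prob))  # [5, 17, 65, 257, ...]: roughly 4^j
```

For this oracle the chosen indices grow like \(4^j\), which is the price of forcing the Borel-Cantelli sum to converge.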
4 The Typewriter Counterexample
The theorem reveals a fatal flaw for topologizing a.s. convergence. The condition “every subsequence contains a further sub-subsequence that converges a.s. to \(X\)” is equivalent to convergence in probability by Theorem 1. So if a.s. convergence satisfied the Urysohn axiom, a.s. convergence and convergence in probability would be identical concepts.
To see that they are distinct, consider the “typewriter” sequence on \([0,1]\) with Lebesgue measure. The sequence sweeps through the dyadic intervals: the \(k\)-th block consists of the \(2^k\) indicators of intervals of length \(2^{-k}\), sliding left to right: \[X_1 = \mathbf{1}_{[0,1]}\] \[X_2 = \mathbf{1}_{[0, 1/2]}, \quad X_3 = \mathbf{1}_{[1/2, 1]}\] \[X_4 = \mathbf{1}_{[0, 1/4]}, \quad X_5 = \mathbf{1}_{[1/4, 2/4]}, \dots\]
- Convergence in Probability: \(X_n \xrightarrow{P} 0\) because the measure of the support, \(2^{-k}\), tends to \(0\) as \(n \to \infty\).
- Fails a.s. Convergence: For every \(\omega \in [0,1]\), the sliding interval passes over \(\omega\) once per block, hence infinitely many times. Thus \(X_n(\omega) = 1\) infinitely often and \(X_n(\omega) = 0\) infinitely often, so \(\limsup X_n = 1\) while \(\liminf X_n = 0\): the sequence converges at no point \(\omega\).
Because the two convergence modes are distinct, a.s. convergence fails the Urysohn rule. Almost sure convergence is an L-space, but not an \(L^*\)-space, and therefore cannot be generated by a topology.
As guaranteed by Theorem 1, we can still extract an a.s. convergent sub-subsequence from the typewriter sequence. If we select only the intervals anchored at \(0\), namely \(X_{2^k} = \mathbf{1}_{[0, 2^{-k}]}\), we strip away the sweeping motion. For any fixed \(\omega \in (0,1]\), the indicator eventually evaluates to \(0\) permanently, ensuring a.s. convergence to \(0\) (the only exceptional point is \(\omega = 0\), a null set).
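A short simulation (a sketch; plain Python suffices here) makes all three claims tangible: the support shrinks, every fixed \(\omega\) keeps getting revisited, and the anchored subsequence \(X_{2^k}\) settles down.

```python
def typewriter(n, omega):
    """Typewriter sequence on [0,1]: for n = 2**k + j, the n-th term is
    the indicator of the j-th dyadic interval of length 2**-k."""
    k = n.bit_length() - 1        # block index: 2**k <= n < 2**(k+1)
    j = n - 2**k                  # position of the interval in its block
    return 1.0 if j / 2**k <= omega <= (j + 1) / 2**k else 0.0

omega = 0.3

# Support length 2**-k -> 0, so X_n -> 0 in probability.
for n in [1, 2, 4, 16, 256, 4096]:
    print(f"n={n:5d}  support length = {2.0 ** -(n.bit_length() - 1):.6f}")

# Yet X_n(omega) = 1 once per block, i.e. infinitely often: no a.s. limit.
hits = [n for n in range(1, 2**12) if typewriter(n, omega) == 1.0]
print("indices with X_n(0.3) = 1:", hits[:8], "...")

# The anchored subsequence X_{2**k} = 1_{[0, 2**-k]} does converge at omega.
print([typewriter(2**k, omega) for k in range(10)])   # eventually all 0.0
```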
5 The Success of Convergence in Probability
Since convergence in probability acts as the Urysohn extension of a.s. convergence, it naturally satisfies Fréchet’s third axiom. It is an \(L^*\)-space. In fact, it is fully topological and metrizable.
The space \(L^0\) of all random variables (identifying those equal almost surely) can be equipped with a metric such that convergence in this metric coincides with convergence in probability: \[ d(X, Y) = \mathsf P\left[ \frac{|X - Y|}{1 + |X - Y|} \right] \] where \(\mathsf P[\cdot]\) denotes expectation with respect to \(\mathsf P\).
To see why \(d(X_n, X) \to 0\) if and only if \(X_n \xrightarrow{P} X\):
(\(\implies\)): Let \(\epsilon > 0\). The function \(f(x) = \frac{x}{1+x}\) is strictly increasing for \(x \ge 0\). On the event \(\{|X_n - X| > \epsilon\}\), we have \(\frac{|X_n - X|}{1 + |X_n - X|} > \frac{\epsilon}{1 + \epsilon}\). \[ \mathsf P\left[ \frac{|X_n - X|}{1 + |X_n - X|} \right] \ge \mathsf P\left[ \frac{\epsilon}{1 + \epsilon} \mathbf{1}_{\{|X_n - X| > \epsilon\}} \right] = \frac{\epsilon}{1 + \epsilon} \mathsf P(|X_n - X| > \epsilon) \] Since the left-hand side goes to \(0\), \(\mathsf P(|X_n - X| > \epsilon) \to 0\). Thus, \(X_n \xrightarrow{P} X\).
(\(\impliedby\)): Let \(\epsilon > 0\). We split the expectation over two disjoint sets, \(A_n = \{|X_n - X| \le \epsilon\}\) and its complement \(A_n^c\): \[ d(X_n, X) = \mathsf P\left[ \frac{|X_n - X|}{1 + |X_n - X|} \mathbf{1}_{A_n} \right] + \mathsf P\left[ \frac{|X_n - X|}{1 + |X_n - X|} \mathbf{1}_{A_n^c} \right] \] On \(A_n\), the integrand is at most \(\epsilon\), since \(\frac{x}{1+x} \le x\); on \(A_n^c\), it is bounded above by \(1\). \[ d(X_n, X) \le \epsilon \mathsf P(A_n) + 1 \cdot\mathsf P(A_n^c) \le \epsilon + \mathsf P(|X_n - X| > \epsilon) \] As \(n \to \infty\), \(\mathsf P(|X_n - X| > \epsilon) \to 0\). This leaves \(\limsup_n d(X_n, X) \le \epsilon\). Since \(\epsilon\) is arbitrary, \(d(X_n, X) \to 0\).
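As a numerical sanity check (a sketch, assuming NumPy), we can approximate \(d\) by averaging the capped integrand over uniform samples of \(\omega\) and watch it vanish along the typewriter sequence, which converges in probability but nowhere almost surely.

```python
import numpy as np

rng = np.random.default_rng(0)
omegas = rng.uniform(0.0, 1.0, size=200_000)   # omega ~ Lebesgue on [0,1]

def d(x, y):
    """Probability metric: d(X, Y) = E[ |X - Y| / (1 + |X - Y|) ]."""
    diff = np.abs(x - y)
    return np.mean(diff / (1.0 + diff))

def typewriter(n, omegas):
    """Indicator of the dyadic interval for index n = 2**k + j."""
    k = int(n).bit_length() - 1
    j = n - 2**k
    return ((j / 2**k <= omegas) & (omegas <= (j + 1) / 2**k)).astype(float)

for n in [1, 4, 16, 64, 256]:
    print(f"n={n:4d}  d(X_n, 0) ~ {d(typewriter(n, omegas), 0.0):.5f}")
# Exact value: the indicator contributes 1/(1+1) = 1/2 on a support of
# length 2**-k, so d(X_n, 0) = 2**-(k+1) -> 0.
```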
6 The Capping Metric vs. The \(L^1\) Norm
A natural follow-up is how the probability metric differs from the standard \(L^1\) norm, \(\|X - Y\|_{L^1} = \mathsf P[|X - Y|]\).
The difference lies in the numerator-denominator structure of the probability metric, which acts as a cap. It restricts the penalty for large deviations to a maximum of \(1\), whereas the \(L^1\) norm penalizes large deviations linearly and without bound. Convergence in probability requires only the measure of the set where variables differ to shrink to zero; it ignores how large the variables grow on that shrinking set.
To clearly see the difference, consider a tall and thin sequence that grows vertically while shrinking horizontally. On the space \([0,1]\) with Lebesgue measure, define \[ X_n = n \mathbf{1}_{(0, 1/n)}. \]
Under the probability metric we have \[ \begin{aligned} d(X_n, 0) &= \mathsf P\left[ \frac{n \mathbf{1}_{(0, 1/n)}}{1 + n} \right] \\ &= \frac{n}{1+n} \cdot \mathsf P((0, 1/n)) \\ &= \frac{n}{1+n} \cdot \frac{1}{n} \\ &= \frac{1}{n+1} \end{aligned} \] As \(n \to \infty\), \(d(X_n, 0) \to 0\). The capping function perfectly absorbed the explosive vertical growth of \(n\).
Under the \(L^1\) norm: \[ \begin{aligned} \|X_n\|_{L^1} &= \mathsf P[|X_n|] \\ &= n \cdot \mathsf P((0, 1/n)) \\ &= n \cdot \frac{1}{n} \\ &= 1 \end{aligned} \] As \(n \to \infty\), \(\|X_n\|_{L^1} \not\to 0\).
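Both computations are easy to confirm numerically. A minimal sketch (assuming NumPy) runs the same Monte Carlo average once through the capped integrand and once through the raw \(|X_n|\):

```python
import numpy as np

rng = np.random.default_rng(0)
omegas = rng.uniform(0.0, 1.0, size=1_000_000)

for n in [1, 10, 100, 1000]:
    x = n * ((0 < omegas) & (omegas < 1 / n))   # X_n = n * 1_{(0, 1/n)}
    d_n = np.mean(x / (1.0 + x))                # probability metric ~ 1/(n+1)
    l1_n = np.mean(np.abs(x))                   # L^1 norm           ~ 1
    print(f"n={n:5d}  d(X_n, 0) ~ {d_n:.5f}   ||X_n||_L1 ~ {l1_n:.3f}")
```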
For a sequence to converge in \(L^1\), it must converge in probability and be uniformly integrable. The probability metric drops the uniform integrability requirement entirely, caring only about the horizontal footprint of the deviation, not its vertical magnitude.
7 A Note on the Weak-* Topology
When dealing with \(L^\infty\), it is tempting to relate a.s. convergence of uniformly bounded sequences to the weak-* topology \(\sigma(L^\infty, L^1)\).
By the Dominated Convergence Theorem (the products \(X_n Y\) are dominated by a constant multiple of \(|Y| \in L^1\)), if a sequence \((X_n)\) is uniformly bounded and \(X_n \xrightarrow{\text{a.s.}} X\), then \(\int X_n Y \, d\mathsf P \to \int X Y \, d\mathsf P\) for all \(Y \in L^1\). Therefore, for uniformly bounded sequences, a.s. convergence implies weak-* convergence.
The converse fails. A sequence can converge in the weak-* topology without converging almost surely (or even in probability). Consider the Rademacher functions \(X_n(\omega) = \text{sgn}(\sin(2^n \pi \omega))\) on \([0,1]\). This sequence is uniformly bounded (\(\|X_n\|_\infty = 1\)) and converges weak-* to \(0\) by a Riemann-Lebesgue-type argument: the pairing \(\int X_n Y \, d\mathsf P\) vanishes in the limit for indicators of dyadic intervals, which are dense in \(L^1\). Yet for almost every \(\omega\) the sequence takes both values \(-1\) and \(1\) infinitely often, so it fails to converge at almost every point.
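A quick numerical check (a sketch, assuming NumPy; midpoint-grid quadrature rather than sampling, so the tiny integrals stay visible above Monte Carlo noise): pairing \(X_n\) against the test function \(Y(\omega) = \omega\), the integrals \(\int X_n Y \, d\mathsf P\) shrink geometrically while the pointwise values at \(\omega = 0.3\) keep flipping sign.

```python
import numpy as np

M = 2**18                               # fine enough to resolve n <= 10
w = (np.arange(M) + 0.5) / M            # midpoint grid on [0, 1]
Y = w                                   # a test function in L^1([0,1])

for n in range(1, 11):
    X_n = np.sign(np.sin(2**n * np.pi * w))
    integral = np.mean(X_n * Y)         # ~ int_0^1 X_n(w) Y(w) dw
    x_at = np.sign(np.sin(2**n * np.pi * 0.3))
    print(f"n={n:2d}  int X_n*Y dP ~ {integral:+.6f}   X_n(0.3) = {x_at:+.0f}")
# For this Y the pairing works out to -2**-(n+1): weak-* convergence to 0,
# even though the pointwise values never settle.
```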
Footnotes
Proof that almost sure convergence implies convergence in probability. Assume \(X_n \xrightarrow{\text{a.s.}} X\). Fix \(\epsilon > 0\). Let \(A_n = \{|X_n - X| > \epsilon\}\). We want to show \(\mathsf P(A_n) \to 0\). Define the tail events \(B_n = \bigcup_{m=n}^\infty A_m\). The sequence \(B_n\) is monotonically decreasing (\(B_{n+1} \subseteq B_n\)). If \(\omega \in \bigcap_{n=1}^\infty B_n\), then \(|X_n(\omega) - X(\omega)| > \epsilon\) for infinitely many \(n\), meaning the sequence fails to converge at \(\omega\). Since \(X_n \xrightarrow{\text{a.s.}} X\), the probability of this intersection must be \(0\). By continuity of the probability measure from above on decreasing sets, \(\lim_{n \to \infty} \mathsf P(B_n) = \mathsf P(\bigcap_{n=1}^\infty B_n) = 0\). Since \(A_n \subseteq B_n\) for all \(n\), it immediately follows that \(\lim_{n \to \infty} \mathsf P(A_n) = 0\). Thus, \(X_n \xrightarrow{P} X\).↩︎
