Representing Distortions: A Circle of Equivalences

notes
mathematics
probability
llm
Distortion functions correspond to measures on \([0,1]\). There are six equivalent ways to define the associated functional.
Author

Stephen J. Mildenhall

Published

2026-01-12

Modified

2026-01-14

Mathematics is full of wonderful circles of equivalences: concepts with multiple definitions which gradually unite as your understanding grows. The exponential function is a good example, and the exponential family of distributions is one of my favorites. In this post, I present another example: the link between distortion functions and integral representations of spectral risk measures.

We relate three equivalent ways to describe a loss spectral risk measure. The starting object is a concave distortion \(g:[0,1]\to[0,1]\) with \(g(0)=0\) and \(g(1)=1\). Equivalently, \(g\) is a mixture of the elementary tail-kernels \(\mathsf{tvar}_p(t)=1\wedge t/(1-p)\), \(p\in[0,1)\), with mixing measure \(\mu\) on \([0,1]\). Each description induces the same pricing/risk functional on integrable losses: it can be written as a Choquet (survival) integral, as an integral of quantiles against the distortion, as a mixture of \(\mathsf{TVaR}_p\), or as an ordinary expectation under the “distorted” distribution \(G_X(x)=1-g(S_X(x))\). The main theorem below collects these representations and records the mild regularity needed to pass between them cleanly.

Warning. It’s boring to start with notation, but you probably should, because some of it is non-standard.

This post was written with help from Gemini 3.0 Pro and GPT5.2.

1 Probability Measures, Distortions, and the Kusuoka Correspondence

Let \(\mathcal{M}\) be the set of Borel probability measures on \([0,1]\), and let \(\mathcal{D}_c\) be the set of concave distortion functions, i.e., concave \(g: [0,1] \to [0,1]\) with \(g(0)=0\) and \(g(1)=1\). Lebesgue measure on \([0,1]\) is denoted \(\mathsf P\).

We define the TVaR distortion kernel for \(p \in [0,1)\) as: \[ \mathsf{tvar}_p(t) = 1 \wedge \frac{t}{1-p} = \begin{cases} \dfrac{t}{1-p} & 0 \le t < 1-p \\ 1 & 1-p \le t \le 1. \end{cases} \] In the limiting case, \(\mathsf{tvar}_1(t)=\set{t>0}\).
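The kernel is easy to code for quick experiments; `tvar_kernel` is a hypothetical helper name, not from any library:

```python
import numpy as np

def tvar_kernel(p, t):
    """TVaR distortion kernel: min(1, t/(1-p)) for p in [0,1); the indicator {t>0} at p=1."""
    t = np.asarray(t, dtype=float)
    if p == 1.0:
        return (t > 0).astype(float)
    return np.minimum(1.0, t / (1.0 - p))

# linear with slope 1/(1-p) below t = 1-p, flat at 1 above
print(tvar_kernel(0.75, 0.125))  # 0.5 = 0.125/0.25
print(tvar_kernel(0.75, 0.5))    # 1.0 (flat region)
print(tvar_kernel(1.0, 0.0))     # 0.0 (limiting kernel)
```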

Proposition 1 (The Kusuoka Correspondence.) There exists a linear bijection \(\Psi: \mathcal{M} \to \mathcal{D}_c\) defined by: \[ g(t) = \Psi(\mu)(t) = \int_{[0,1]} \mathsf{tvar}_p(t) \, \mu(dp). \]

Proof (Linear). Linearity follows from linearity of the integral with respect to the measure. Let \(\mu_1, \mu_2 \in \mathcal{M}\) be two probability measures on \([0,1]\) and let \(\alpha, \beta \in \mathbb{R}\) such that \(\alpha \ge 0\), \(\beta \ge 0\) and \(\alpha+\beta=1\).

For any fixed \(t \in [0,1]\), the integrand \(\mathsf{tvar}_p(t)\) is a bounded, non-negative, measurable function of \(p\). By the definition of the integral with respect to a linear combination of measures: \[ \begin{aligned} \Psi(\alpha \mu_1 + \beta \mu_2)(t) &= \int_{[0,1]} \mathsf{tvar}_p(t) \, (\alpha \mu_1 + \beta \mu_2)(dp) \\ &= \alpha \int_{[0,1]} \mathsf{tvar}_p(t) \, \mu_1(dp) + \beta \int_{[0,1]} \mathsf{tvar}_p(t) \, \mu_2(dp) \\ &= \alpha \Psi(\mu_1)(t) + \beta \Psi(\mu_2)(t). \end{aligned} \] Since this holds for all \(t \in [0,1]\), we have \(\Psi(\alpha \mu_1 + \beta \mu_2) = \alpha \Psi(\mu_1) + \beta \Psi(\mu_2)\). Thus, \(\Psi\) is linear. \(\square\)

We prove the Kusuoka correspondence is a bijection using results from convex analysis, after some setup in the next section.

2 Notation

We use the notation conventions recommended by Pollard (2002), using \(\mathsf PX\) rather than \(\mathsf EX\) and identifying a set with its indicator function.

As an example of the Pollard approach, here is a handy notational trick that appears often in the literature. If \(m\) is a measure and \(A\) is an \(m\)-measurable set, then \[ \int A\,dm = m(A) \] where \(A\) is (Pollard) shorthand for the indicator function on \(A\). This trick is used to write the quantile function as \[ q(p) = \int_0^\infty \set{F(x) < p}\,dx. \tag{1}\] For clarity, the integrand in Equation 1 is the function \[ x\mapsto \set{F(x) < p} = \begin{cases} 1 & F(x) < p \\ 0 & F(x) \ge p \end{cases} = \begin{cases} 1 & x < q(p) \\ 0 & x \ge q(p), \end{cases} \] the indicator of the set \(\set{x : F(x) < p}\).
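Equation 1 is easy to check numerically; a sketch for the Exponential(1) distribution, where \(q(p)=-\ln(1-p)\) in closed form (`q_indicator` is an illustrative name):

```python
import numpy as np

F = lambda x: 1 - np.exp(-x)        # Exponential(1) CDF
q_exact = lambda p: -np.log(1 - p)  # its quantile function

def q_indicator(p, upper=50.0, n=2_000_000):
    """Approximate q(p) = integral of the indicator {F(x) < p} over [0, upper]."""
    x = np.linspace(0.0, upper, n)
    return (F(x) < p).mean() * upper

p = 0.9
print(q_indicator(p), q_exact(p))  # both approximately 2.3026
```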

3 Convex Sets and Extreme Points

Before characterizing the structure of distortion functions, we recall the standard definitions of convexity and extreme points in a vector space.

Definition 1 (Convex Sets and Extreme Points.) Let \(V\) be a vector space. A subset \(K \subseteq V\) is convex if the line segment connecting any two points in the set lies entirely within the set. That is, for all \(x, y \in K\) and \(\lambda \in [0,1]\): \[ \lambda x + (1-\lambda)y \in K. \]

An element \(e \in K\) is an extreme point if it cannot be decomposed as a non-trivial convex combination of other points in \(K\). Formally, \(e \in \mathsf{Ext}(K)\) if the equality \[ e = \lambda x + (1-\lambda)y \] with \(x, y \in K\) and \(\lambda \in (0,1)\) implies that \(x = y = e\).

Geometrically, extreme points correspond to the “corners” or “vertices” of the set. For example, the extreme points of a triangle are its three vertices, and the extreme points of a disk are the points of its boundary circle. In the triangle, points on the edges are convex combinations of the endpoint vertices, and interior points are combinations of all three vertices. In the disk, interior points are combinations of boundary points.

4 TVaR Distortions as Extreme Points

The set \(\mathcal{D}_c\) is convex: if \(g=\sum \lambda_ig_i\) with \(g_i\in\mathcal{D}_c\), \(\lambda_i\ge 0\), and \(\sum\lambda_i=1\), then \(g(0)=\sum \lambda_ig_i(0)=0\) and \(g(1)=\sum \lambda_ig_i(1)=\sum \lambda_i=1\), and similarly \(g\) is increasing and concave.

The extreme points of \(\mathcal{D}_c\) are the functions \(\mathsf{tvar}_p\). We can see this “manually” using a geometric proof as follows. Consider \(\mathsf{tvar}_p\) and \(t\) in two regions.

  1. For \(t \in [1-p, 1]\) (the flat region), \(\mathsf{tvar}_p(t)=1\). If \(\mathsf{tvar}_p = \lambda h_1 + (1-\lambda)h_2\) for concave distortions \(h_1, h_2\), then \(h_1(t)=h_2(t)=1\) on this interval, as 1 is the upper bound of any distortion.
  2. For \(t \in [0, 1-p]\) (the linear region), \(\mathsf{tvar}_p(t)\) is the chord connecting \((0,0)\) to \((1-p, 1)\). By concavity, any distortion \(h\) with \(h(1-p)=1\) must satisfy \(h(t) \ge \mathsf{tvar}_p(t)\) on this interval.
  3. Since \(h_1, h_2 \ge \mathsf{tvar}_p\) on the linear region while their weighted average equals this lower bound, we must have \(h_1(t) = h_2(t) = \mathsf{tvar}_p(t)\) everywhere.

Thus, \(\mathsf{tvar}_p\) cannot be decomposed.

5 Proof the Kusuoka Correspondence is Bijective

The correspondence \(\Psi: \mathcal{M} \to \mathcal{D}_c\) is illuminated by comparing the extreme points in each space. We identified the extreme points of \(\mathcal{D}_c\) in the previous section. The set \(\mathcal{M}\) is also convex. By Aliprantis and Border (2006) Theorem 15.9, its extreme points are precisely the Dirac measures: \[ \mathsf{Ext}(\mathcal{M}) = \set{ \delta_p : p \in [0,1] }. \] The next lemma shows that \(\Psi\) connects the extreme points of \(\mathcal{M}\) and \(\mathcal{D}_c\).

Lemma 1 (Mapping of Extreme Points.) Let \(\delta_q \in \mathcal{M}\) be the Dirac measure concentrated at \(q \in [0,1)\). Then \(\Psi(\delta_q) = \mathsf{tvar}_q\).

Proof. By the defining property of the Dirac measure, for any bounded measurable function \(f\), \(\int f(p) \, \delta_q(dp) = f(q)\). Thus, substituting \(\mu = \delta_q\) in the definition of \(\Psi\) gives \[ \Psi(\delta_q)(t) = \int \mathsf{tvar}_p(t)\,\delta_q(dp) = \mathsf{tvar}_q(t), \] the TVaR distortion kernel. \(\square\)

Since \(\Psi\) is linear, it preserves convex structure. Thus we can deduce that the \(\mathsf{tvar}_p\) are extreme points of \(\mathcal{D}_c\) from the fact that the \(\delta_p\) are extreme points of \(\mathcal{M}\).

By Choquet’s representation theorem, Theorem 3, every point of a compact, convex, metrizable set is the barycenter of a probability measure on its extreme points. Thus, every distortion is a weighted mixture (integral) of TVaRs.

\(\mathcal{M}\) and \(\mathcal{D}_c\) are compact convex sets, and \(\Psi\) is a continuous linear map that matches up their extreme points. Surjectivity follows by the Krein-Milman theorem: \(\mathcal{D}_c\) is the closed convex hull of its extreme points, and since \(\Psi\) is linear and continuous, its image is a compact convex set containing all extreme points of \(\mathcal{D}_c\); hence the image must be the whole space \(\mathcal{D}_c\). Injectivity follows using Shapiro (2012), which shows \(\mathcal{D}_c\) is a Choquet Simplex. This implies that for any \(g \in \mathcal{D}_c\), the representing measure \(\mu\) supported on the extreme points is unique. Therefore, \(\Psi\) is a bijection. See Section 10 for more.

6 The Spectrum of a Distortion

Let \(g(t) = \int_{[0,1)} \mathsf{tvar}_p(t) \, \mu(dp)\) be a typical distortion. Differentiating with respect to \(t\) yields the spectral function: \[ \begin{aligned} g'(t) &= \frac{d}{dt} \int_{[0,1)} 1\wedge \frac{t}{1-p} \, \mu(dp) \\ &= \int_{[0,1)} \frac{1}{1-p} \set{t < 1-p} \, \mu(dp) \\ &= \int_{[0, 1-t)} \frac{1}{1-p} \, \mu(dp). \end{aligned} \]

Note: The integral is restricted to \([0,1)\) because the term corresponding to \(p=1\) is \(\mathsf{tvar}_1(t) = \set{t>0}\). On the open interval \((0,1)\), this function is constant (equal to 1), and thus its derivative is zero. Excluding \(p=1\) also avoids the singularity of \(\dfrac{1}{1-p}\) at \(p=1\).

To align this with standard spectral representations, we perform a change of variables. Let \(s = 1-p\) represent the significance level (or tail probability). This transformation maps the confidence level \(p \in [0, 1-t)\) to the tail region \(s \in (t, 1]\).

Let \(\nu\) be the image measure of \(\mu\) under the map \(T(p) = 1-p\). That is, for any Borel set \(A\), \(\nu(A) = \mu\set{ p : 1-p \in A }\). (If \(\mu\) has a density \(f\), then \(\nu\) has density \(h(s)=f(1-s)\); standard change of variables.) Substituting \(s\) for \(1-p\) in the integral gives \[ g'(t) = \int_{(t,1]} \frac{1}{s} \, \nu(ds). \] If \(\mu\) has an atom at \(p=1\), \(g\) has a jump at \(t=0\), and the derivative contains a Dirac delta component. This expression now matches the spectral weight construction in Föllmer and Schied (2016), Prop 4.69. The weight \(\phi(t) := g'(1-t)\) at quantile level \(t\) accumulates the weights \(\frac{1}{s}\) for all components active in the tail (where the significance level \(s > t\)).
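For a discrete mixing measure the derivative formula can be verified by finite differences; the two-point \(\mu\) below is an assumed example:

```python
import numpy as np

# assumed example: mu = 0.3 * delta_{0.2} + 0.7 * delta_{0.6}
ps = np.array([0.2, 0.6])
ws = np.array([0.3, 0.7])

def g(t):
    # g(t) = integral of tvar_p(t) against mu(dp)
    return np.sum(ws * np.minimum(1.0, t / (1.0 - ps)))

def g_prime(t):
    # g'(t) = integral of 1/(1-p) over p in [0, 1-t) against mu(dp)
    mask = ps < 1.0 - t
    return np.sum(ws[mask] / (1.0 - ps[mask]))

for t in (0.3, 0.5):
    h = 1e-6
    fd = (g(t + h) - g(t - h)) / (2 * h)
    print(t, fd, g_prime(t))  # central difference matches the formula
```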

See also Simon (2011) Theorem 1.29.

6.1 Recovering the Measure: The Curvature Heuristic

The previous derivation constructs \(g\) from a known measure \(\mu\). In practice, however, we often start with a desired risk profile \(g\) and need to determine its constituent TVaR weights. This inverse problem highlights another link in our circle of equivalences: the measure \(\nu\) is proportional to the curvature of the distortion.

Since \(g'(t)\) is an integral over \((t, 1]\), the Fundamental Theorem of Calculus (generalized to measures) implies that the measure \(\nu\) is related to the negative derivative of \(g'\) \[ dg'(t) = -\frac{1}{t} \, \nu(dt). \] Rearranging this relates the mixing measure directly to the second distributional derivative of \(g\): \[ \nu(dt) = -t \, dg'(t). \tag{2}\] (Note: Since \(g\) is concave, \(g'\) is decreasing, so \(dg'\) is a negative measure. Thus \(\nu\) is a positive measure).

Equation 2 offers a powerful heuristic: highly curved regions of the distortion function correspond to heavy weighting of the TVaR parameters in that region. A pure TVaR is the extreme case: all the “curvature” at one point!
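For a discrete mixture the heuristic becomes exact: \(g'\) drops by \(w/(1-p)\) as \(t\) crosses \(1-p\), and multiplying the jump by \(t=1-p\) recovers the original weight \(w\). A sketch, reusing an assumed two-point \(\mu\):

```python
import numpy as np

# assumed example: mu = 0.3 * delta_{0.2} + 0.7 * delta_{0.6}
ps = np.array([0.2, 0.6])
ws = np.array([0.3, 0.7])

def g_prime(t):
    # g'(t) = integral of 1/(1-p) over p in [0, 1-t) against mu(dp)
    mask = ps < 1.0 - t
    return np.sum(ws[mask] / (1.0 - ps[mask]))

# nu{s} = s * (downward jump of g' at s), with s = 1-p: recovers each weight w
for p, w in zip(ps, ws):
    s = 1.0 - p
    eps = 1e-9
    jump = g_prime(s - eps) - g_prime(s + eps)
    print(p, s * jump, w)  # s * jump equals w
```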

6.2 Detecting Weight on the Mean

A subtle but important feature of this relationship arises at the endpoint \(t=1\). The standard Expected Value principle corresponds to \(\mathsf{TVaR}_0\), or \(s=1\). Does a given distortion \(g\) place any weight on the simple average?

We can detect this by inspecting the terminal slope \(g'(1)\). From the spectral integral: \[ \lim_{t \to 1} g'(t) = \nu(\{1\}). \] Because \(g\) is concave and \(g(t) \ge t\), the slope \(g'(1)\) is always between 0 and 1.

  • If \(g'(1) = 0\): The measure places no weight on the mean. The risk measure is entirely driven by tail events (e.g., \(\mathsf{TVaR}_{0.99}\)).
  • If \(g'(1) > 0\): The measure includes a discrete atom at \(s=1\) (the mean) with weight exactly equal to this final slope.

Example 1 (The Wang \(\alpha=0.5\) Distortion.) Consider the Wang distortion \(g(t) = \sqrt{t}\). This function is concave, distorting probabilities to be larger than they are (\(g(t) > t\)). The terminal slope is \(g'(t) = \dfrac{1}{2\sqrt{t}}\), so \(g'(1) = 0.5\). This immediately tells us that 50% of the risk measure is simply the expected value (\(\mathsf{TVaR}_0\)). The curvature is \(g''(t) = -\dfrac{1}{4} t^{-3/2}\). Using the curvature formula, the continuous density is \(\nu(dt) = -t [-\dfrac{1}{4} t^{-3/2}] dt = \dfrac{1}{4\sqrt{t}} dt\). Integrating this density over \([0,1]\) yields \(\int_0^1 \dfrac{1}{4\sqrt{t}} dt = 0.5\). As a result the spectral measure \(\nu\) consists of a continuous density \(\dfrac{1}{4\sqrt{t}}\) summing to 0.5, plus a Dirac mass of 0.5 at \(s=1\). The distortion is an equal mix of the mean and a curvature component.
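The Wang decomposition can be verified by reassembling \(g(t)=\sqrt t\) from its pieces. The sketch below (not library code) writes the kernel in \(s\)-coordinates as \(\min(1, t/s)\) and substitutes \(s=u^2\) to tame the \(1/\sqrt s\) singularity:

```python
import numpy as np

def g_mix(t, n=100_000):
    """Rebuild g(t) for the Wang sqrt distortion from its spectral measure:
    continuous density 1/(4*sqrt(s)) on (0,1] plus an atom of 0.5 at s=1 (the mean).
    The kernel in s-coordinates is min(1, t/s); substituting s = u**2 gives
    the integrand min(1, t/u**2)/2 on [0, 1]."""
    u = np.linspace(0.0, 1.0, n + 1)
    f = np.minimum(1.0, t / np.maximum(u * u, 1e-300)) / 2.0
    cont = (f[0] / 2 + f[1:-1].sum() + f[-1] / 2) / n  # trapezoid rule
    return cont + 0.5 * t  # atom at s=1 is tvar_0, the mean: contributes 0.5*t

for t in (0.04, 0.25, 0.81):
    print(g_mix(t), np.sqrt(t))  # the two columns agree closely
```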

Example 2 (The Cantor Distortion.) Cantor’s Devil’s Staircase provides a counterexample to the intuition that risk measures must be either smooth (have a density) or discrete (scenarios). It forces us to use the full generality of the spectral measure \(\nu\).

Let \(c(t)\) be the standard Cantor function. This function is continuous, non-decreasing, and constant on the intervals removed during the construction of the Cantor set (the “middle thirds”). It increases from 0 to 1, yet its derivative is zero almost everywhere. (It maps a point \(t\) of the form \(\sum_n 2a_n/3^n\), \(a_n\in\set{0,1}\), to \(\sum_n a_n/2^n\) and fills in the gaps to be flat and continuous.)

To construct a valid spectral weight function \(g'\) that is decreasing and integrates to 1, we define: \[ g'(t) = 2(1 - c(t)). \] The factor of 2 is necessary because the standard Cantor function integrates to \(0.5\).

Since \(g'\) is a “staircase” function composed of flat steps, the distortion \(g\) is piecewise linear. On every “middle third” interval (e.g., \((1/3, 2/3)\)), the weight \(g'(t)\) is constant. Consequently, \(g(t)\) is a straight line segment on these intervals. However, because the Cantor set is a dust of points with no length, these linear segments are stitched together continuously to form a concave shape that has no “corners” in the traditional sense, yet strictly changes slope uncountably many times.

What is the mixing measure \(\nu\) for this strange \(g\)? Recall that the spectral measure \(\nu\) relates to the derivative of the spectral weight: \(-dg'(t) = \dfrac{1}{t} \nu(dt)\). Since \(g'(t)\) is a linear transformation of the Cantor function \(c(t)\), the differential \(-dg'\) is proportional to the differential \(dc\) \[ -dg'(t) = 2 \, dc(t). \] By definition, \(c(t)\) is the cumulative distribution function of the uniform distribution on the Cantor set, \(\mathcal{K}\). Thus, the curvature of our distortion is entirely supported on the Cantor set. Substituting back into our structure equation gives \[ \frac{1}{t} \nu(dt) = 2 \, \mathcal{K}(dt) \implies \nu(dt) = 2t \, \mathcal{K}(dt). \]

The spectral measure \(\nu\) is not a set of discrete weights, nor is it a continuous density. It is the singular continuous uniform measure on the Cantor set, weighted linearly by \(t\). This confirms that the Circle of Equivalences accommodates every possible type of measure—discrete, absolutely continuous, and singular continuous. \(\square\)
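The Cantor distortion can be explored numerically. The digit algorithm below for \(c(t)\) is standard; we build \(g\) by integrating \(g'(t)=2(1-c(t))\) and check \(g(1)=1\), concavity, and the value \(g(1/2)=2/3\), which follows from the self-similarity identity \(\int_0^{1/2} c = 1/6\):

```python
import numpy as np

def cantor(x, depth=48):
    """Standard Cantor function via ternary digits: truncate after the first 1,
    replace 2s with 1s, read the result in base 2."""
    if x >= 1.0:
        return 1.0
    result, factor = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        d = int(x)
        x -= d
        if d == 1:
            return result + factor
        result += factor * (d // 2)  # digit 2 contributes a binary 1
        factor /= 2
    return result

# g(t) = integral of 2*(1 - c(u)) from 0 to t, by the cumulative trapezoid rule
n = 100_000
u = np.linspace(0.0, 1.0, n + 1)
gp = 2 * (1 - np.array([cantor(v) for v in u]))  # spectral weight g'
g = np.concatenate([[0.0], np.cumsum((gp[1:] + gp[:-1]) / 2) / n])

print(g[-1])                          # approximately 1: g(1) = 1
print(g[n // 2])                      # approximately 2/3: g(1/2)
print(np.all(np.diff(gp) <= 1e-12))   # g' decreasing, so g is concave
```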

Example 3 (Lebesgue’s Singular Function.) Consider the random variable \(Y\) defined by a binary expansion with biased probabilities: \[ Y = \sum_{n=1}^\infty \frac{B_n}{2^n} \] where \(B_n\) are i.i.d. Bernoulli trials with \(P(B_n=1) = p\) and \(p \neq 1/2\).

Let \(F_p(t)\) be the cumulative distribution function of \(Y\). If \(p=1/2\), \(Y\) is Uniform on \([0,1]\) and \(F_{1/2}(t) = t\). But if \(p \neq 1/2\), \(F_p(t)\) is a singular continuous function known as Lebesgue’s Singular Function. It is strictly increasing and continuous, yet its derivative is zero almost everywhere.

To build a concave distortion \(g\), we need a decreasing spectral weight function \(g'\). Since \(F_p(t)\) is increasing, we can define the spectral weights by reflecting the CDF: \[ g'(t) = c \cdot F_p(1-t), \] where \(c\) is a normalizing constant to ensure \(\int g' = 1\).

The spectral measure \(\nu\) associated with this distortion is precisely the probability distribution of \(Y\) (reflected). This example is distinct from the Cantor case in having full support. Unlike the Cantor measure, which lives on a sparse “dust,” the support of this Bernoulli measure is the entire interval \([0,1]\). Every possible open interval of risk levels carries some weight. Despite covering the whole interval, the measure is mutually singular with respect to the Uniform distribution. It concentrates all its mass on the “unlikely” (to Lebesgue measure) set of numbers whose binary expansion has a proportion of 1s equal to \(p\); Lebesgue-almost all numbers have digit proportion \(0.5\), by the normal number theorem, Billingsley (2017).

This provides a spectral representation \(\nu\) that is “everywhere and nowhere”—fully supported on \([0,1]\) yet orthogonal to the standard continuum.

See Mildenhall and Major (2022), Section 10.9, for more (mundane) examples computing \(\mu\).

7 The Functional \(g(X)\)

Given a distortion function \(g\), we define the risk functional \(g: L^\infty \to \mathbb{R}\) as the Choquet integral with respect to the capacity \(c = g \circ P\). For a non-negative random variable \(X\) this defines \[ g(X) := \int_0^\infty g(S_X(x)) \, dx, \] where \(S_X(x) = P(X > x)\). We use the same notation following Pollard’s recommendation. This avoids using \(\rho_g\) or \(\rho\) (hidden \(g\)). The payoff is that \(\mathsf{tvar}\leftrightarrow\mathsf{TVaR}\), explaining our choice of notation, though here we distinguish the two for clarity until their equivalence is established. The functional \(g\) can be extended to any \(X\) bounded below by writing \(X=(X+k)-k\) to obtain \[ g(X) := -\int_{-\infty}^0 \check g(F_X(x)) \, dx + \int_0^\infty g(S_X(x)) \, dx, \] where \(\check g(t) := 1-g(1-t)\) is the dual distortion. Going forward, we concentrate on \(X\ge 0\) for simplicity.

The next proposition confirms that Pollard’s notation works as expected for the TVaR distortion kernels. Define the usual TVaR (or expected shortfall) functional as \[ \mathsf{TVaR}_p(X) := \frac{1}{1-p}\int_p^1 q(s)\,ds \tag{3}\] where \(q(p) := \inf \{x : S_X(x) \le 1-p\}\) is the lower \(p\) quantile function of \(X\).

Proposition 2 (TVaR Functional.) The functional induced by the extreme TVaR distortion kernel \(\mathsf{tvar}_p\) is the TVaR functional \(\mathsf{TVaR}_p\).

Proof. We need to show the functional defined by \(\mathsf{tvar}_p(X)\) equals TVaR defined by Equation 3. This follows using the definition, notational trick, and Fubini’s theorem: \[ \begin{aligned} \mathsf{tvar}_p(X) &= \int_0^\infty \mathsf{tvar}_p(S_X(x))\, dx \\ &= \int_0^\infty 1 \wedge \frac{S_X(x)}{1-p}\, dx \\ &= \int_0^\infty \frac{1}{1-p} \int_p^1 \set{F(x)<t}\,dt\,dx \\ &= \int_0^\infty \frac{1}{1-p} \int_0^1 \set{t \ge p} \set{F(x)<t}\,dt\,dx \\ &= \int_0^1 \frac{\set{t \ge p} }{1-p} \int_0^\infty \set{F(x)<t}\,dx\,dt \\ &= \int_p^1 \frac{1}{1-p} \int_0^\infty \set{F(x)<t}\,dx\,dt \\ &= \frac{1}{1-p} \int_p^1 q(t)\,dt. \end{aligned} \] In detail, as a function of \(x\) and \(p\) we have: \[ \begin{aligned} \frac{1}{1-p} \int_p^1 \set{F(x)<t}\,dt &= \begin{cases} \dfrac{1 - F(x)}{1-p} & p \le F(x),\ x\ge q(p) \\ 1 & p > F(x),\ x < q(p). \end{cases} \end{aligned} \] \(\square\)
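The two sides of Proposition 2 can be compared numerically for \(X\sim\) Exponential(1), where \(\mathsf{TVaR}_p(X)=1-\ln(1-p)\) in closed form:

```python
import numpy as np

p = 0.9
S = lambda x: np.exp(-x)       # Exp(1) survival function
q = lambda s: -np.log(1 - s)   # Exp(1) quantile function

# Choquet form: integral of tvar_p(S(x)) = min(1, S(x)/(1-p)) over x
x = np.linspace(0.0, 60.0, 1_000_001)
choquet = np.minimum(1.0, S(x) / (1 - p)).mean() * 60.0

# Quantile form (3): average of q over (p, 1), midpoint rule
s = p + (1 - p) * (np.arange(1_000_000) + 0.5) / 1_000_000
quantile_form = q(s).mean()    # the 1/(1-p) factor cancels the interval length

exact = 1 - np.log(1 - p)      # closed form for Exp(1)
print(choquet, quantile_form, exact)  # all approximately 3.3026
```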

8 Behavior of \(g\) at \(s=0\)

This section proves an important technical lemma: a point mass of \(\mu\) at \(p=1\) puts weight on the worst outcome and produces a jump in \(g\) at \(0\).

Lemma 2 A distortion \(g\) is continuous at \(0\) if and only if \(\mu\set{1} = 0\).

Proof. As an increasing, concave function defined on a convex set, \(g\) is necessarily continuous on the interior of its domain \((0,1)\), Borwein and Vanderwerff (2010), Theorem 2.1.12. It is continuous at \(s=1\) by the Squeeze Theorem: since \(g\) is concave with \(g(0)=0\) and \(g(1)=1\), it must satisfy \(g(s) \ge s\) for all \(s \in [0,1]\), and combined with the upper bound \(g(s) \le 1\), we have \[ s \le g(s) \le 1 \implies \lim_{s \uparrow 1} g(s) = 1 = g(1). \] At zero, \(g\) can jump by \(g(0+) := \lim_{s\downarrow 0} g(s) = \mu\set{1}\), as the Dominated Convergence Theorem applied to a decomposition of the measure shows. Decompose \(\mu\) into a part on \([0,1)\) and the atom at \(p=1\): \[ g(s) = \int_{[0,1)} 1\wedge \frac{s}{1-p} \mu(dp) + \mu\set{1} \cdot \set{s>0}. \] Take the limit as \(s \downarrow 0\). For the integral term, the integrand converges to 0 pointwise for every \(p < 1\). Since it is bounded by 1, the integral vanishes by the Dominated Convergence Theorem. For the atom term, \(\set{s>0}\) is constantly 1 as \(s \downarrow 0\). Thus, \(g(0+) = 0 + \mu\set{1}\). \(\square\)
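A tiny numeric illustration of the lemma, for the assumed mixture \(\mu = 0.6\,\delta_{0.5} + 0.4\,\delta_1\):

```python
# mu = 0.6 * delta_{0.5} + 0.4 * delta_1
# so g(s) = 0.6 * min(1, s/0.5) + 0.4 * indicator{s > 0}
g = lambda s: 0.6 * min(1.0, s / 0.5) + 0.4 * (s > 0)

print(g(0.0))    # 0.0 : g(0) = 0 always
print(g(1e-12))  # about 0.4 : the jump g(0+) equals mu{1}
print(g(1.0))    # 1.0
```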

9 The Six Representations of \(g(X)\)

Here are the six equivalent ways to define the functional \(g(X)\), the main result reported in this post.

Theorem 1 Let \(g \in \mathcal{D}_c\) be a continuous distortion with associated measure \(\mu\) and spectral derivative \(g'\), and define \(G_X(x):=1-g(S_X(x))\). Then the following representations are equivalent: \[ \begin{aligned} g(X) &\stackrel{(a)}{=} \int_{[0,1)} \mathsf{TVaR}_p(X) \, \mu(dp) \\ &\stackrel{(b)}{=} \int_0^1 q_X(t) g'(1-t) \, dt \\ &\stackrel{(c)}{=} \int_0^1 q_X(\hat{g}(s)) \, ds \\ &\stackrel{(d)}{=} \int_0^1 G_X^{-1}(t) \, dt \\ &\stackrel{(e)}{=} \int_0^\infty g(S_X(x)) \, dx \\ &\stackrel{(f)}{=} \int X \, d(g \circ P) \end{aligned} \]

Proof. Throughout, \(X\ge 0\) is integrable, \(S_X(x)=\mathsf P(X>x)\), and \(q_X\) is the (left-continuous) quantile function. Because \(g\) is concave and increasing, it is absolutely continuous on compact subintervals of \((0,1)\), has a right-derivative \(g'_+\) a.e., and the Lebesgue–Stieltjes measure \(dg\) decomposes as \(dg = g'_+(u)\,du + dg_s\). To keep notation light, we write \(g'(u)\) for \(g'_+(u)\) and, when \(g\) has a singular part, interpret identities involving \(g'(u)\,du\) as the corresponding Stieltjes identities (replace \(g'(1-t)\,dt\) by \(d(g(1-t))\)). Under absolute continuity, the displayed formulas hold literally.

(b) \(\iff\) (e): Choquet/survival to spectral. For \(x\ge 0\), \[ g(S_X(x))=\int_0^{S_X(x)} g'(u)\,du =\int_0^1 \set{u<S_X(x)}\,g'(u)\,du. \] Insert into (e) and apply Tonelli/Fubini: \[ \int_0^\infty g(S_X(x))\,dx =\int_0^1 g'(u)\int_0^\infty \set{u<S_X(x)}\,dx\,du. \] Now \(u<S_X(x)\) is equivalent to \(P(X>x)>u\), i.e. \(F_X(x)<1-u\). Since \(X\ge 0\), \[ \int_0^\infty \set{F_X(x)<s}\,dx = q_X(s),\qquad s\in(0,1), \] because \(\{x\ge 0:F_X(x)<s\}=[0,q_X(s))\). With \(s=1-u\) this gives \[ \int_0^\infty \set{u<S_X(x)}\,dx = q_X(1-u), \] hence \[ \int_0^\infty g(S_X(x))\,dx =\int_0^1 g'(u)\,q_X(1-u)\,du =\int_0^1 q_X(t)\,g'(1-t)\,dt, \] after the substitution \(t=1-u\). This is (b).

(a) \(\iff\) (b): mixture to spectral. From the distortion-mixture representation \[ g(t)=\int_{[0,1)} \mathsf{tvar}_p(t)\,\mu(dp), \qquad \mathsf{tvar}_p(t)= 1\wedge \frac{t}{1-p}, \] differentiate under the integral (valid a.e.) to obtain, for a.e. \(u\in(0,1)\), \[ g'(u)=\int_{[0,1-u)} \frac{1}{1-p}\,\mu(dp), \] matching the derivation in Section 6, so that \(g'(1-t)=\int_{[0,t)} \frac{1}{1-p}\,\mu(dp)\). Substitute into (b) and apply Fubini, noting that \(p<t\) if and only if \(t\in(p,1]\): \[ \begin{aligned} \int_0^1 q_X(t)\,g'(1-t)\,dt &=\int_0^1 q_X(t)\int_{[0,t)}\frac{1}{1-p}\,\mu(dp)\,dt \\ &=\int_{[0,1)}\frac{1}{1-p}\int_p^1 q_X(t)\,dt\,\mu(dp) \\ &=\int_{[0,1)} \mathsf{TVaR}_p(X)\,\mu(dp), \end{aligned} \] which is (a).

(b) \(\iff\) (c): change of variables via the dual–inverse. Let \(\check g(t)=1-g(1-t)\) and let \(\hat g\) be its upper inverse: \[ \hat g(s)=\sup\set{t\in[0,1]:\check g(t)\le s}. \] Then \(\check g(\hat g(s))=s\) for a.e. \(s\), and where \(\check g\) is differentiable we have \[ \frac{d}{dt}\check g(t)=g'(1-t). \] Using the substitution \(s=\check g(t)\) (equivalently \(t=\hat g(s)\)) yields \[ \int_0^1 q_X(\hat g(s))\,ds =\int_0^1 q_X(t)\,d\check g(t) =\int_0^1 q_X(t)\,g'(1-t)\,dt, \] with the middle expression understood as a Stieltjes integral when \(\check g\) is not absolutely continuous. This is (b).

(d) \(\iff\) (e): expectation under the distorted distribution. Let \(Y\) have distribution function \(G_X\). Then \[ \int_0^1 G_X^{-1}(t)\,dt = \mathsf{P}Y. \] By the tail-sum formula for \(Y\ge 0\), \[ \mathsf{P}Y=\int_0^\infty P(Y>x)\,dx=\int_0^\infty (1-G_X(x))\,dx =\int_0^\infty g(S_X(x))\,dx, \] which is (e).

(f) \(\iff\) (e): Choquet notation. For the capacity \(c(A)=g(P(A))\), the Choquet integral of \(X\ge 0\) is defined by \[ \int X\,dc := \int_0^\infty c(\{X>x\})\,dx =\int_0^\infty g(P(X>x))\,dx =\int_0^\infty g(S_X(x))\,dx. \] This is exactly (e), written as (f). \(\square\)
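The equivalences can be spot-checked numerically. A sketch with the Wang distortion \(g(t)=\sqrt t\) and \(X\sim\) Exponential(1), where every representation should give \(g(X)=\int_0^\infty e^{-x/2}\,dx = 2\). The mixture in (a) uses the \(\mu\) computed in Example 1 (density \(1/(4\sqrt{1-p})\) plus an atom of \(0.5\) at \(p=0\)); tolerances are loose because of the integrable square-root singularities:

```python
import numpy as np

qX = lambda t: -np.log(1 - t)   # Exp(1) quantile
SX = lambda x: np.exp(-x)       # Exp(1) survival

n = 1_000_000
mid = (np.arange(n) + 0.5) / n  # midpoints of a uniform grid on (0,1)

# (e) survival/Choquet form: integral of g(S(x)) = exp(-x/2)
x = np.linspace(0.0, 80.0, n)
rep_e = np.sqrt(SX(x)).mean() * 80.0

# (b) spectral form: integral of q(t) g'(1-t), with g'(u) = 1/(2 sqrt(u))
rep_b = np.mean(qX(mid) / (2 * np.sqrt(1 - mid)))

# (c) dual-inverse form: g_check(t) = 1 - sqrt(1-t), g_hat(s) = 1 - (1-s)**2
rep_c = np.mean(qX(1 - (1 - mid) ** 2))

# (a) TVaR mixture: density 1/(4 sqrt(1-p)) plus atom 0.5 at p=0;
# TVaR_p for Exp(1) is 1 - log(1-p), and TVaR_0 = mean = 1
tvar = 1 - np.log(1 - mid)
rep_a = np.mean(tvar / (4 * np.sqrt(1 - mid))) + 0.5 * 1.0

print(rep_a, rep_b, rep_c, rep_e)  # all approximately 2
```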

Dhaene et al. (2012) provides a rigorous treatment of the ideas and equivalences we have discussed.

10 Theoretical Foundations: From Extreme Points to Integral Representations

The structural correspondence between the set of concave distortions \(\mathcal{D}_c\) and the mixtures of TVaR functions relies on two fundamental results from convex analysis and functional analysis: the Krein-Milman Theorem and Choquet’s Representation Theorem.

Theorem 2 (Krein-Milman) Let \(K\) be a compact, convex subset of a locally convex topological vector space (e.g., equipped with the weak* topology). Then \(K\) is the closed convex hull of its extreme points: \[ K = \overline{\mathsf{conv}(\mathsf{Ext}(K))}. \]

In our context, \(\mathcal{D}_c\) is a convex, compact set under a natural topology (e.g., the topology of pointwise convergence). Its extreme points are the TVaR distortions. The theorem guarantees that any concave distortion \(g \in \mathcal{D}_c\) can be approximated arbitrarily well by finite convex combinations (weighted averages) of TVaR functions. This establishes the existence of a representation of \(g\) in terms of TVaRs.

Theorem 3 (Choquet’s Representation Theorem) If \(K\) is a metrizable, compact, convex subset of a locally convex space, then for every \(x \in K\), there exists a probability measure \(\mu\) supported on the extreme points \(\mathsf{Ext}(K)\) such that \(x\) is the barycenter of \(\mu\). \[ x = \int_{\mathsf{Ext}(K)} e \, \mu(de). \]

This theorem refines Krein-Milman by replacing the “limit of finite sums” with a direct integral representation. It allows us to write any \(g\) as a continuous mixture of the extreme points: \[ g(t) = \int_{[0,1]} \mathsf{tvar}_p(t) \, \mu(dp). \]

While Choquet’s theorem guarantees the existence of the measure \(\mu\), it does not guarantee uniqueness in general. For that we must do more work. A compact convex set \(K\) is called a Choquet Simplex if the representing measure \(\mu\) is unique for every point in \(K\). Shapiro (2012) proves the set \(\mathcal{D}_c\) forms a Choquet Simplex. This uniqueness is crucial—it establishes the bijection (isomorphism) between the space of mixing measures \(\mathcal{M}\) and the space of concave distortions \(\mathcal{D}_c\), allowing us to treat the measure \(\mu\) as a unique “fingerprint” of the risk measure.

See Phelps (2002) and Simon (2011) for longer treatments of Choquet’s representation theorem.

11 Distortions as “Shrunken” Uniforms

Identifying \(g\) with a random variable on the unit interval provides insight into the nature of the \(g\) functional. Since any \(g \in \mathcal{D}_c\) is non-decreasing, \(g(0)=0\), and \(g(1)=1\), we can view it as the cumulative distribution function of a random variable \(G\) supported on \([0,1]\): \[ P(G \le t) = g(t). \]

In this framework the identity distortion \(g(t)=t\) corresponds to the standard uniform random variable \(U\) on \([0,1]\). The variable \(G\) acts as a re-weighting kernel. Its probability density function, \(\phi(t) = g'(t)\), provides the spectral weights applied to the loss quantiles.

The defining property of \(\mathcal{D}_c\) is concavity. This geometric property implies First-Order Stochastic Dominance (FSD). Because \(g\) is concave and fixes the endpoints, the graph of \(g\) must lie above the chord \(y=t\): \[ g(t) \ge t \quad \forall t \in [0,1]. \] In terms of distribution functions, this inequality means \(G\) is stochastically smaller than \(U\) (denoted \(G \preceq_{st} U\)). That is, \(G\) tends to take smaller values than the Uniform distribution; it concentrates probability mass closer to 0. (This works for the payoff sign convention; for the loss convention we need to swap to the right tail, which is why \(g'(1-t)\) appears.)

Concavity of \(g\) implies that its derivative \(g'\) (the density of \(G\)) is non-increasing. This is a stronger condition than simple dominance. It means the re-weighting scheme is monotonically conservative: it assigns the highest weight to the worst outcomes (lowest \(t\), representing the tail of the survival function) and decreasing weight thereafter.

The spectral representation (b) shows the functional can be interpreted as simulating with respect to the pessimistic variable \(G\) rather than an objective uniform: \(g(X) = \mathsf P q_X(1-G)\).

The identification \(G \sim g\) helps demystify the extreme points. Viewed as a CDF, the TVaR distortion describes a random variable that is Uniformly distributed on the sub-interval \([0, 1-p]\). Thus, the “basis vectors” of coherent risk measures are simply standard Uniform distributions that have been truncated and compressed into the worst-case region \([0, 1-p]\). Any coherent risk measure is constructed by taking a mixture of these compressed Uniforms.
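A quick simulation illustrates the compressed-uniform picture: draw \(G\) uniformly on \([0,1-p]\) and average \(q_X(1-G)\); the result matches \(\mathsf{TVaR}_p(X)\). A sketch for \(X\sim\) Exponential(1):

```python
import numpy as np

rng = np.random.default_rng(42)
p = 0.9
G = rng.uniform(0.0, 1.0 - p, size=1_000_000)  # tvar_p as a CDF: Uniform[0, 1-p]
qX = lambda t: -np.log(1 - t)                  # Exp(1) quantile

sim = qX(1.0 - G).mean()   # averages quantiles over the worst 10% tail
exact = 1 - np.log(1 - p)  # TVaR_0.9 for Exp(1), about 3.3026
print(sim, exact)
```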

References

Aliprantis, Charalambos D, and Kim Border, 2006, Infinite Dimensional Analysis: A Hitchhiker’s Guide (Springer Verlag).
Billingsley, Patrick, 2017, Probability and Measure (John Wiley & Sons).
Borwein, Jonathan M, and Jon D Vanderwerff, 2010, Convex Functions - Construction, Characterizations and Counterexamples (Cambridge University Press).
Dhaene, Jan, Alexander Kukush, Daniel Linders, and Qihe Tang, 2012, Remarks on quantiles and distortion risk measures, European Actuarial Journal 2, 319–328.
Föllmer, Hans, and Alexander Schied, 2016, Stochastic Finance: An Introduction in Discrete Time. Fourth edition (Walter de Gruyter, Berlin, Boston).
Mildenhall, Stephen J, and John A Major, 2022, Pricing Insurance Risk: Theory and Practice (John Wiley & Sons, Inc.).
Phelps, Robert R, 2002, Lectures on Choquet’s Theorem (Springer).
Pollard, David, 2002, A User’s Guide to Measure Theoretic Probability. 8 (Cambridge University Press).
Shapiro, Alexander, 2012, On Kusuoka Representation of Law Invariant Risk Measures, Mathematics of Operations Research 38, 142–152.
Simon, Barry, 2011, Convexity: An Analytic Viewpoint. Vol. 187 (Cambridge University Press).