In Praise of VaR Part II: The Re-Arrangement Algorithm
“Can he have everything louder than everything else?” Ian Gillan, Deep Purple, Made In Japan
Last month’s column introduced three situations where Value at Risk (VaR) fails to be sub-additive and respect diversification:
- When the dependence structure is of a special, highly asymmetric form
- When the marginals have a very skewed distribution
- When the marginals are very heavy-tailed
It showed that, given two non-trivial marginal distributions \(X_1\) and \(X_2\) and a confidence level \(\alpha\), it is always possible to find a form of dependence where sub-additivity fails. This is surprising: it shows that dependence trumps the characteristics of the marginal distributions. The column described the worst dependence structure. It takes the largest \(1-\alpha\) proportion of the observations from \(X_1\) and \(X_2\) and forms their crossed combination: the largest from \(X_1\) with the smallest from \(X_2\), the second largest with the second smallest, and so forth. There are no restrictions on the smallest \(\alpha\) proportion of values since they are irrelevant to the VaR computation.
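In numpy the crossed combination takes only a few lines. The sketch below is my own illustration with a hypothetical function name, not code from the original column; it pairs the sorted tails in opposite order:

```python
import numpy as np

def crossed_tail(x1, x2, alpha):
    """Cross the top (1 - alpha) share of two samples: largest of x1
    with the smallest of x2's top values, and so on down the ranks."""
    n = round((1 - alpha) * len(x1))
    top1 = np.sort(x1)[-n:][::-1]   # n largest of x1, descending
    top2 = np.sort(x2)[-n:]         # n largest of x2, ascending
    return top1 + top2              # sums of the crossed pairs

# VaR_alpha of the sum under this arrangement is min(crossed_tail(x1, x2, alpha)),
# since these n crossed sums are the n largest row sums.
```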
Now consider the same problem with \(d>2\) marginals. Specifically, what is the range of possible values of \(\text{VaR}_{\alpha}\) of \(A = X_1 + \cdots + X_d\) when only the distributions of the marginals \(X_j\) are known, and not their joint multivariate distribution? This is a question practitioners confront regularly.
It is common in capital modeling, operational risk modeling, and some kinds of catastrophe risk modeling that the univariate distributions \(X_j\) are understood quite well, but there is considerable uncertainty about their joint distribution. Joint distributions become very hard to quantify as \(d\) grows: there are \(d(d-1)/2\approx d^2/2\) bivariate relationships, and \(d^2\) quickly overwhelms 10 years of loss data. The problem goes beyond estimating association parameters, such as linear correlation or Kendall’s tau, because the marginal distributions can be combined using different copulas that yield the same association measures. As a result practitioners want to determine the range of \(\text{VaR}_{\alpha}(A)\) as the multivariate distribution of \((X_1, \dots, X_d)\) varies over all possible dependence structures.
Standard practice often selects a base dependence structure, perhaps using coarse correlation coefficients such as multiples of 0.25 and a normal or \(t\) copula, and uses it to derive a baseline \(\text{VaR}_{\alpha}(A)\). It then stress tests the result by increasing or decreasing the coefficients. The resulting VaRs can be compared to those computed assuming the marginals are either independent or perfectly correlated using the comonotonic copula. The comonotonic copula ranks all the marginals from largest to smallest and combines them in order, so VaR computed using the comonotonic copula is additive. However, as shown in last month’s column, the comonotonic copula does not give the worst possible outcome for \(d=2\) marginals, nor does it for \(d>2\). How much worse than additive can the VaR be for a given set of marginals?
Let \(X_j=(x_{1j}, x_{2j},\dots,x_{Mj})^t\), \(j=1,2, \dots, d\) be column vectors of equally-likely losses and let \(X=(x_{ij})\) be the \(M\times d\) matrix with columns given by \(X_j\). In cat model-speak the \(x_{ij}\) are samples from the yearly loss tables for \(d\) different peril-region combinations. \(\text{VaR}_{\alpha}(X_j)\) can be computed by sorting \(X_j\) from largest to smallest and selecting the \((1-\alpha)M\)th observation.
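In code this is a one-liner. The following sketch is again an illustration of mine, assuming \((1-\alpha)M\) is an integer, and makes the convention explicit:

```python
import numpy as np

def var_alpha(x, alpha):
    """VaR_alpha of a sample of M equally likely losses: the
    (1 - alpha) * M-th largest observation."""
    n = round((1 - alpha) * len(x))   # assume (1 - alpha) * M is an integer
    return np.sort(x)[::-1][n - 1]    # n-th largest observation
```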
How should the observations from \(X_j\) be combined so that the VaR of the sum is as large as possible? We want to rearrange each column of \(X\) so that the \((1-\alpha)M\)th largest observation of the row-sum \(A\) is as large as possible.
First observe that it is only necessary to combine the \((1-\alpha)M\) largest observations of each marginal. Any candidate combination that does not satisfy this condition can be made better, i.e. given a larger \((1-\alpha)M\)th largest observation, by swapping combinations that use observations outside the “top \((1-\alpha)M\)” with unused top-\((1-\alpha)M\) entries. From here on assume that \(X\) has been truncated to include only the \(N=(1-\alpha)M\) largest observations of each marginal.
When there are only two marginals it is plausible that the crossed arrangement is best: it does not “waste” any large observations by pairing them with other large values and hence maximizes the minimum sum. It is the perfectly negatively dependent arrangement of two marginal loss distributions. However, if there are \(d>2\) marginals we can’t combine largest with smallest in the same way. Having paired largest and smallest, what do we do for the third marginal? We can’t “make everything louder than everything else”.
The logic used in last month’s column suggests a two-part approach. First, any very large outlier observations should be combined with the smallest observations from all the other marginals; these values will lie above the resulting VaR. Second, the middle-sized observations should be grouped together so that their sums are as close in value as possible; the least such value will be the VaR. These groupings minimize the variance of the row sums. The Re-Arrangement Algorithm finds a near-optimal worst-VaR dependence structure using this approach.
The Re-Arrangement Algorithm was introduced in [1] and subsequently improved in [2]. The setup is as follows. Inputs are samples arranged in a matrix \(X = (x_{ij})\) with \(i=1,\dots, M\) rows corresponding to the simulations and \(j=1,\dots, d\) columns corresponding to the different marginals. These samples could be produced by a capital, catastrophe, or operational risk model, for example. We want to find the arrangement of the individual losses within each column producing the highest \(\text{VaR}_\alpha\) of the sum of losses, for \(0<\alpha< 1\). Start by sorting each marginal in descending order and select just the top \(N:=(1-\alpha)M\) observations; for simplicity assume \(N\) is an integer. Only these top \(N\) values need special treatment; all the smaller values can be combined arbitrarily. Finally, select a level of accuracy \(\epsilon>0\) for the algorithm.
Re-Arrangement Algorithm
- Randomly permute each column of \(X\), the \(N\times d\) matrix of the top \((1-\alpha)\) proportion of observations.
- Loop:
    - Create a new matrix \(Y\) as follows. For each column \(j=1,\dots,d\):
        - Create a temporary matrix \(V_j\) by deleting the \(j\)th column of \(X\).
        - Create a column vector \(v\) whose \(i\)th element equals the sum of the elements in the \(i\)th row of \(V_j\).
        - Set the \(j\)th column of \(Y\) equal to the \(j\)th column of \(X\) arranged in the opposite order to \(v\), i.e. the largest element in the \(j\)th column of \(X\) is placed in the row of \(Y\) corresponding to the smallest element of \(v\), the second largest with the second smallest, and so on.
    - Compute \(y\), the \(N\times 1\) vector whose \(i\)th element is the sum of the elements in the \(i\)th row of \(Y\); let \(y^*=\min(y)\) be the smallest element of \(y\) and compute \(x^*\) from \(X\) similarly.
    - If \(y^*-x^* \ge \epsilon\), set \(X=Y\) and repeat the loop.
    - If \(y^*-x^* < \epsilon\), break from the loop.
- The arrangement \(Y\) is an approximation to the worst \(\text{VaR}_\alpha\) arrangement of \(X\).
Given that \(X\) consists of the worst \(1-\alpha\) proportion of each marginal, the required estimated \(\text{VaR}_\alpha\) will be the least row sum of \(Y\), that is \(y^*\). In implementation \(x^*\) is carried forward from the previous iteration and not recomputed. The statistics \(x^*\) and \(y^*\) can be replaced with the variance of the row-sums of \(X\) and \(Y\) and yield essentially the same results.
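The whole algorithm is only a few lines of numpy. The sketch below is one rendering of the steps above; the function name and tolerance default are my own choices, and, as is common in practical implementations, each column is rearranged against the current versions of the other columns rather than a frozen copy:

```python
import numpy as np

def rearrangement(X, eps=1e-6, rng=None):
    """Approximate worst-VaR arrangement of the top-N observations.

    X : (N, d) array holding the N = (1 - alpha) * M largest
        observations of each marginal, one marginal per column.
    Returns the rearranged matrix Y; the estimated worst VaR_alpha
    is the smallest row sum of Y.
    """
    rng = np.random.default_rng(rng)
    # randomly permute each column to start
    Y = np.column_stack([rng.permutation(col) for col in X.T])
    x_star = Y.sum(axis=1).min()
    while True:
        for j in range(Y.shape[1]):
            v = Y.sum(axis=1) - Y[:, j]        # row sums of V_j (all columns but j)
            rank = np.argsort(np.argsort(v))   # rank 0 = row with the smallest v
            # oppositely order column j: its largest value goes to the row
            # where v is smallest, second largest to second smallest, ...
            Y[:, j] = np.sort(Y[:, j])[::-1][rank]
        y_star = Y.sum(axis=1).min()           # smallest row sum of the new arrangement
        if y_star - x_star < eps:              # no material improvement: stop
            return Y
        x_star = y_star                        # carry y* forward, not recomputed
```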
[3] report that, while there is no analytic proof that the algorithm always works, it performs very well in examples and tests where the answer can be computed analytically.
Here is an example. Compute the worst \(\text{VaR}_{0.99}\) of the sum of three lognormal distributions with mean 10 and coefficients of variation 1, 2 and 3. To solve, take a stratified sample of \(N=40\) observations at and above the 99th percentile of each marginal for the matrix \(X\). Table 1 shows the input and output of the Re-Arrangement Algorithm: columns \(x_1, x_2, x_3\) hold the sorted marginal samples and Sum their comonotonic row sum; columns \(s_1, s_2, s_3\) hold the rearranged samples and Max VaR their row sum.
\(x_1\) | \(x_2\) | \(x_3\) | Sum | \(s_1\) | \(s_2\) | \(s_3\) | Max VaR |
---|---|---|---|---|---|---|---|
49.0 | 85.6 | 107.9 | 242.5 | 87.1 | 124.6 | 141.1 | 352.8 |
49.4 | 86.6 | 109.5 | 245.6 | 70.8 | 113.6 | 169.3 | 353.7 |
49.9 | 87.7 | 111.2 | 248.8 | 98.8 | 127.9 | 127.4 | 354.1 |
50.3 | 88.9 | 112.9 | 252.1 | 79.9 | 118.8 | 155.5 | 354.1 |
50.7 | 90.0 | 114.7 | 255.5 | 83.1 | 107.1 | 164.3 | 354.5 |
51.2 | 91.3 | 116.6 | 259.1 | 92.1 | 139.7 | 122.8 | 354.6 |
51.6 | 92.6 | 118.6 | 262.8 | 67.7 | 135.4 | 151.5 | 354.7 |
52.1 | 93.9 | 120.6 | 266.6 | 108.8 | 116.1 | 129.8 | 354.7 |
52.6 | 95.3 | 122.8 | 270.7 | 62.8 | 105.1 | 186.9 | 354.8 |
53.2 | 96.7 | 125.0 | 274.9 | 63.9 | 170.6 | 120.6 | 355.0 |
53.7 | 98.3 | 127.4 | 279.3 | 69.2 | 111.3 | 174.6 | 355.1 |
54.3 | 99.9 | 129.8 | 284.0 | 72.7 | 144.5 | 138.1 | 355.3 |
54.9 | 101.5 | 132.4 | 288.8 | 59.9 | 101.5 | 194.1 | 355.5 |
55.5 | 103.3 | 135.2 | 293.9 | 127.5 | 103.3 | 125.0 | 355.8 |
56.1 | 105.1 | 138.1 | 299.3 | 60.8 | 162.6 | 132.4 | 355.9 |
56.8 | 107.1 | 141.1 | 305.0 | 66.3 | 109.1 | 180.5 | 355.9 |
57.5 | 109.1 | 144.4 | 311.1 | 61.8 | 149.8 | 144.4 | 356.0 |
58.3 | 111.3 | 147.9 | 317.5 | 65.0 | 155.8 | 135.2 | 356.0 |
59.1 | 113.6 | 151.5 | 324.3 | 74.8 | 121.6 | 159.7 | 356.1 |
59.9 | 116.1 | 155.5 | 331.5 | 77.1 | 131.5 | 147.9 | 356.5 |
60.8 | 118.8 | 159.7 | 339.3 | 59.1 | 179.9 | 118.6 | 357.5 |
61.8 | 121.6 | 164.3 | 347.7 | 58.3 | 99.9 | 202.0 | 360.1 |
62.8 | 124.6 | 169.3 | 356.7 | 57.5 | 191.1 | 116.6 | 365.3 |
63.9 | 127.9 | 174.6 | 366.4 | 56.8 | 98.3 | 210.9 | 366.0 |
65.0 | 131.5 | 180.5 | 377.0 | 56.1 | 96.7 | 221.0 | 373.9 |
66.3 | 135.4 | 186.9 | 388.7 | 55.5 | 205.1 | 114.7 | 375.4 |
67.7 | 139.7 | 194.1 | 401.5 | 54.9 | 95.3 | 232.7 | 382.9 |
69.2 | 144.5 | 202.0 | 415.7 | 54.3 | 223.3 | 112.9 | 390.5 |
70.8 | 149.8 | 210.9 | 431.6 | 53.7 | 93.9 | 246.3 | 393.9 |
72.7 | 155.8 | 221.0 | 449.5 | 53.2 | 92.6 | 262.5 | 408.2 |
74.8 | 162.6 | 232.7 | 470.1 | 52.6 | 248.7 | 111.2 | 412.5 |
77.1 | 170.6 | 246.3 | 494.0 | 52.1 | 91.3 | 282.3 | 425.7 |
79.9 | 179.9 | 262.5 | 522.3 | 51.2 | 288.1 | 109.5 | 448.8 |
83.1 | 191.1 | 282.3 | 556.5 | 51.6 | 90.0 | 307.2 | 448.9 |
87.1 | 205.1 | 307.2 | 599.4 | 50.7 | 88.9 | 340.0 | 479.6 |
92.1 | 223.3 | 340.0 | 655.5 | 50.3 | 87.7 | 386.6 | 524.6 |
98.8 | 248.7 | 386.6 | 734.1 | 49.9 | 366.9 | 107.9 | 524.7 |
108.8 | 288.1 | 461.1 | 858.0 | 49.4 | 86.6 | 461.1 | 597.2 |
127.5 | 366.9 | 615.7 | 1110.1 | 49.0 | 85.6 | 615.7 | 750.3 |
The table illustrates that the worst-case VaR may be substantially higher than when the marginals are perfectly correlated: here 45 percent higher, 352.8 vs. 242.5. The form of the output columns shows the two-part structure. There is a series of values up to about 356 involving moderate-sized losses from each marginal with approximately the same total. The larger values are then dominated by a single large value from one marginal combined with smaller values from the other two.
Performing the same calculation with \(N=1000\) samples from the largest 1 percent of each marginal produces an estimated worst VaR of 360.5. Figure 1 plots the marginals with the worst-VaR dependence structure, highlighting the same features shown in Table 1. The diagonal plots show histograms of the tail of each marginal. Working with even larger values of \(N\) does not change the result significantly. Figure 2 reveals the strange dependence between the marginals by plotting it in three dimensions.
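For readers who want to replicate the calculation, the \(N=1000\) case can be set up along the following lines, reusing the rearrangement sketch above and scipy for the lognormal quantiles. The stratification details are my own choices, so the answer will be close to, though not necessarily exactly, 360.5:

```python
import numpy as np
from scipy import stats

alpha, N = 0.99, 1000
# stratified tail probabilities at and above the 99th percentile
p = alpha + (1 - alpha) * (np.arange(N) + 0.5) / N
cols = []
for cv in (1, 2, 3):
    sigma = np.sqrt(np.log(1 + cv * cv))   # lognormal sigma from the CV
    mu = np.log(10) - sigma ** 2 / 2       # gives mean 10
    cols.append(stats.lognorm.ppf(p, sigma, scale=np.exp(mu)))
X = np.column_stack(cols)
Y = rearrangement(X)
print(Y.sum(axis=1).min())   # estimated worst VaR_0.99
```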
Just as for the case \(d=2\), there are several important points to note about the Re-Arrangement Algorithm output and the failure of sub-additivity it induces.
- The algorithm works for any non-trivial marginal distributions—it is universal.
- The output is tailored to a specific value of \(\alpha\) and does not work for other \(\alpha\)s. It will actually produce relatively thinner tails for higher values of \(\alpha\) than the comonotonic copula. In Table 1 the comonotonic sum exceeds the maximum VaR sum for the top 40 percent of observations, those above 366.4.
- The implied dependence structure only specifies how the larger values of each marginal are related; for values below \(\text{VaR}_\alpha\) any dependence structure can be used.
- The dependence structure does not have right tail dependence.
The Re-Arrangement Algorithm is easy to program and can be applied in cases with hundreds of thousands of simulations and hundreds or more marginals. It gives a definitive answer to the question “Just how bad could things get?”, and perhaps provides a better base against which to measure the diversification effect than either independence or the comonotonic copula. It is a valuable additional feature for any risk aggregation software. While the multivariate structure it reveals is odd, and specific to \(\alpha\), it can’t be dismissed as wholly improbable. It pinpoints a worst case driven by a combination of moderately severe, but not absolutely extreme, tail outcomes. Anyone who remembers watching their investment portfolio during the financial crisis has seen that behavior before!