The Class of Bar-Lev-Enis-Tweedie Family Distributions

The class of power-variance family distributions discovered independently by Bar-Lev and Enis, and by Tweedie.
Author

Stephen J. Mildenhall

Published

2025-02-06

Modified

2025-03-03

All severity curves!

[M]athematics is about a world that is not quite the same as this real one outside the window, but it is just as obstinate. You can’t make things do what you want them to. [T]he very fact that I can’t just boss these mathematical entities around is evidence that they really exist. John Conway quoted in .

On the name: Bar-Lev explains and provides supporting evidence that Bar-Lev and Enis was an “independent and identical result” to the much-quoted (Jørgensen ) but rarely seen paper An index which distinguishes between some important exponential families, published in Statistics: Applications and New Directions: Proc. Indian Statistical Institute Golden Jubilee International Conference (Vol. 579, pp. 579–604). Seems reasonable, hence the title.

1 Introduction

Everyone who’s modeled with GLMs has seen something like Table 1 describing the class of Tweedie Power Variance Family (PVF) distributions. This version is from the R statmod documentation.

Table 1: Exponential families with power variance functions.
| Distribution Name | Variance Function $V(\mu)$ |
|---|---|
| Gaussian | $\mu^0=1$ |
| Poisson | $\mu$ |
| Compound Poisson, non-negative with mass at 0 | $\mu^p$, $1<p<2$ |
| Gamma | $\mu^2$ |
| Inverse Gaussian | $\mu^3$ |
| Stable, support on $x>0$ | $\mu^p$, $p>2$ |

Table 1 is fascinating and raises several questions:

  • What is a variance function?
  • Why these distributions?
  • Why this order?
  • What about 0<p<1? Or p<0?

The Bar-Lev-Enis-Tweedie PVF (BLETPVF) turns out to be a large, three-parameter class of distributions including the Poisson, gamma, and Gaussian distributions. Table 2 shows the full story and this post will gradually explain all its entries. The theory of BLETPVFs is an example of Conway’s quote: you can control the starting definitions but not where things go from there! It is also an example of Mathematical Tourism. For me, this theory is like visiting and walking in the Lake District: beautiful views and unexpected connections. It’s like the crossing from Wast Water over the Sty Head Pass between Great Gable and Scafell Pike into the stunning valley of Borrowdale.

This post has eight main parts plus a supplement on the software.

  1. Introduction – you’re here.
  2. Setting the Stage
  3. Theoretical framework
  4. The NEF Nonet
  5. Identifying the BLETPVF distributions
  6. Parameter interpretation
  7. For Four/Five Esses
  8. Actuarial modeling reconsidered
  9. Software supplement

The post borrows heavily from my Four Part Series on Minimum Bias and Exponential Family Distributions.

Themes: the unreasonable effectiveness of mathematics; mapping the real world to the maths world; hiding in plain sight (looks different but is actually the same); the nature of actuarial modeling; what is a unit?

The post uses several tools from my aggregate software, Mildenhall , including the FourierTools and Tweedie classes. These leverage Fourier and fast Fourier transform methods to estimate densities that have no closed-form expression.

Terminology. The class of Bar-Lev-Enis-Tweedie power variance families consists of multiple natural exponential families. One family is the genus of Tweedie Poisson-gamma compound distributions, which we call Tweedie distributions.

Bent’s uber paper ref Jørgensen .

1.1 The Full Story

We are going to explain and illustrate all the entries in Table 2.

Table 2: Properties of the Bar-Lev-Enis-Tweedie Power Variance (BLETPV) family distributions.

Table 2 uses the following terms, all of which are explained in due course.

  • The parameters $p$ and $\alpha$ are related by $p=(\alpha-2)/(\alpha-1)$ and $\alpha=(p-2)/(p-1)$, so $(1-p)(1-\alpha)=(p-1)(\alpha-1)=-1$. The parameter $\alpha$ is restricted to the range $[-\infty,2]$.
  • The distributions all have Lévy measure proportional to $e^{\theta x}x^{-\alpha-1}$ when $\alpha<2$.
  • $\tau$ is the mean value mapping from the canonical parameter $\theta$ to the mean parameter $\mu$, the mean of the reproductive form.
  • $\Theta$ is the set of canonical parameters $\theta$ so that $\int e^{\theta y}c(y)\,dy<\infty$.
  • $\Omega$ is the mean domain, that is, the set of possible means.
  • $S$ is the support of the family. The trick here is distributions like the Poisson, where $\Omega$ is a strict subset of $S$ because $0$ is a legitimate observation but not an allowed mean.
  • $\Omega$ is a subset of $C=\operatorname{conv}(S)$, the closure of the convex hull of $S$.
  • A natural exponential family is regular if the canonical parameter domain $\Theta$ is open.
  • It is steep when $\Omega=\operatorname{int}C$, i.e., any value in the interior of the convex closure of the support is a possible mean.
  • The Lévy $1/2$-stable distribution is a limit case of the inverse Gaussian as $\theta\to0$.

1.2 Leitfaden

Overall plan of post.

1.3 Remember…

  • Change alpha-p diagram to say Landau not Cauchy
  • Have Tweedie class use Landau
  • Pitman and Pitman (Euler-Cauchy-Saalschütz integral): gamma with compensated exponential terms! Nice!
  • Bar-Lev and Enis talks about reproducibility
  • Bar-Lev, Bshouty, and Letac has a treatment of Tweedie family from perspective of self-decomposability.
  • Swap or symmetry reciprocal?
  • Letac discusses $V(\mu)=e^\mu$ as a variance function and reciprocity. States the NEF generated by a positive stable distribution whose parameter is in $(0,1)$ has no reciprocal.
  • Menn and Rachev covers the fancy Simpson’s method use of FFTs.
  • The papers by Thomas Simon on unimodality look interesting too, e.g., Simon .
  • Books (all in Mendeley)
  • Delong2021 - making more accessible
  • Bar-Lev Cauchy -> Landau distribution

2 Setting the Stage

static portfolios — loss or normalized loss, pure premium, loss ratio — defining a unit or line — heterogeneous units — GLM modeling

2.1 Volume and Loss Aggregation

In modeling insurance losses, volume is a central consideration that affects both properties of the chosen loss distribution and statistical parameter inference. Volume reflects two dimensions: the size of an individual risk and the time horizon over which losses are accumulated. The resulting expected volume of losses depends on the product of exposure size and time horizon.

Size is measured using an exposure base. Units for auto, value for property, revenue or area for general liability, contract value for surety, and payroll for workers compensation are examples. Each has issues. Ideally, size would be directly proportional to expected losses per unit of time, and we presume we have an ideal measure of size, despite essentially none actually existing. Call size $z$.

Time horizon is measured in some time unit, usually years. However, the unit is an arbitrary choice ‒ an uncomfortable situation. Two time units are canonical: an instant and forever. The latter being too long, we focus on the former. It turns out to tie naturally to modeling with a multiple-state model or a Lévy process, both of which are based on instantaneous transition rates @ REFs.

With volume in hand, modeling can look at total losses or losses per unit exposure (for a given time period), per time period (for a given exposure) or per unit per period. Losses are modeled by additive exponential dispersion models (EDMs) and pure premiums by reproductive EDMs. Some definitions are in order.

From Prob Models for Ins Research

Losses have a complicated, dynamic structure. An insurance portfolio’s loss depends on the expected loss rate and the length of time it is exposed, analogous to distance equals speed multiplied by time. Total losses $L=L(e,t)$ over a period $(0,t]$ are a function of exposure and time. Exposure is measured in homogeneous units, each with equal expected loss rates. What is the best way to model $L(e,t)$?

Is it reasonable to assume losses occur at a constant rate? Generally, no. Even for earthquake losses, where events appear perfectly random, the occurrence time influences losses. Did the loss occur during the commute, working hours, nighttime, or over a weekend? There are two ways to address this reality. One, over the time frame of an insurance policy the rate averages out and can be treated as constant. Two, use a time change to equalize the claim rate. The time change, called an operational time, runs time faster during high expected loss periods and slower during low. It provides the perfect clock against which to earn premium. As always, what matters is the deviation from expected. We proceed assuming a constant expected rate of claim occurrence, justified by either of these rationales.

If insured risks are independent, we could posit that losses over (0,t1+t2] should have the same distribution as the sum of independent evaluations over (0,t1] and (0,t2]. But there could be good reasons why this is not the case. Weather, for example, is strongly autoregressive: weather today is the best predictor of the weather tomorrow. Failure of independence manifests itself in an inter-insured and inter-temporal loss dependency or correlation. However, using common operational time and economic drivers can explain much of this correlation; see Avanzi, Taylor, and Wong and the work of Glenn Meyers on contagion.

Having stipulated caveats for expected loss rate and dependency, various loss models present themselves. A homogeneous model says $L(e,t)=eR_t$ where $R_t$ represents a common loss rate for all exposures. This model is appropriate for a stock investment portfolio, with position size $e$ and stock return $R_t$. It implies the CV is independent of volume.

An alternative model has $L(e,t)\sim D(et,\phi)$, a distribution with mean $m=et$ and additional parameters $\phi$. A GLM then models $m$ as a function of a linear combination of covariates, often with $e$ as an offset. We can select the time period so that the expected loss from one unit of exposure in one period of time equals 1. In this model, time and exposure are interchangeable. The loss distribution of a portfolio twice the size is the same as insuring for twice as long. Thus we can merge loss rate and time into one variable $\nu$ and consider $L=L(\nu)$. If $\nu$ is an integer then $L(\nu)=\sum_{1\le i\le\nu}X_i$ where $X_i\sim L(1)$ are independent. Therefore $\mathrm{Var}(L(\nu))=\nu\,\mathrm{Var}(L(1))$ and the CV of $L(\nu)$ is $\mathrm{CV}(L(1))/\sqrt{\nu}$. The CV of losses is not independent of volume, in contrast to the investment modeling paradigm. The actuary knows this: the shape of the loss distribution from a single policy (highly skewed, large probability of zero losses) is very different from the shape for a large portfolio (more symmetric, very low probability of zero losses).

As always, reality is more complex. Mildenhall discusses five alternative forms for $L(e,t)$ and shows that only a model with random operational time is consistent with NAIC Schedule P data, after adjusting for the underwriting cycle.

The focus of this section is to determine which distributions are appropriate for modeling homogeneous blocks with a constant loss rate relative to (operational) time. These distributions are used in predictive modeling for ratemaking and reserving, for example. Other applications, such as capital modeling and Enterprise Risk Management, model heterogeneous portfolios where understanding dependencies becomes critical. These are not our focus and are therefore ignored, despite their importance.

The ability to embed a static loss model L into a family, or stochastic process, of models L(ν) appropriate for modeling different volumes of business over different time frames is an important concept that is introduced in this section. Dynamic modeling leads to a compound Poisson process and, more generally, a Lévy process. We begin by recalling the aggregate loss model of collective risk theory.

2.2 The Aggregate Loss Model

An aggregate loss model random variable has the form $Y=X_1+\cdots+X_N$ where $N$ models claim count (frequency) and $X_i$ are independent, identically distributed (iid) severity variables. Aggregate loss models have an implicit time dimension: $N$ measures the number of claims over a set period of time, usually one year. Aggregate loss models are the basis of the collective risk model and are central to actuarial science.

Expected aggregate losses equal expected claim count times average severity, $E[Y]=E[N]E[X]$. There is a tricky formula for the variance of $Y$; here is an easy way to remember it. If $X=x$ is constant then $Y=xN$ has variance $x^2\mathrm{Var}(N)$. If $N=n$ is constant then $X_1+\cdots+X_n$ has variance $n\mathrm{Var}(X)$ because the $X_i$ are independent. The variance formula must specialize to these two cases and therefore, replacing $n$ with $E[N]$ and $x$ with $E[X]$, suggests $$\mathrm{Var}(Y)=E[X]^2\mathrm{Var}(N)+E[N]\mathrm{Var}(X)$$ is a good bet. By conditioning on $N$ you can check it is correct.
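A quick simulation makes the formula concrete. This is a minimal sketch; the Poisson frequency and lognormal severity are illustrative choices, not prescribed by the text.

```python
# Check Var(Y) = E[X]^2 Var(N) + E[N] Var(X) by simulation.
import numpy as np

rng = np.random.default_rng(42)
lam, sigma, n_sims = 5.0, 0.5, 100_000
N = rng.poisson(lam, n_sims)
Y = np.array([rng.lognormal(0.0, sigma, n).sum() for n in N])

EX = np.exp(sigma**2 / 2)              # lognormal mean
VarX = (np.exp(sigma**2) - 1) * EX**2  # lognormal variance
# For Poisson N, Var(N) = E[N] = lam
print(EX**2 * lam + lam * VarX)        # formula
print(Y.var())                         # simulation, should be close
```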

2.3 Compound Poisson Distributions

A compound Poisson (CP) random variable is an aggregate loss model where $N$ has a Poisson distribution. Let $CP(\lambda,X)=X_1+\cdots+X_N$ where $N$ is Poisson with mean $\lambda$ and $X_i\sim X$ are iid severities. We shall see that the Tweedie family is a CP where the $X_i$ have a gamma distribution.

By the results in the previous section, $E[CP(\lambda,X)]=\lambda E[X]$. Since $\mathrm{Var}(N)=E[N]$ for Poisson $N$ and $\mathrm{Var}(X)=E[X^2]-E[X]^2$, we get $\mathrm{Var}(CP(\lambda,X))=E[N]E[X^2]$.

CP distributions are a flexible and tractable class of distributions with many nice properties. They are the building block for almost all models of loss processes that occur at discrete times.

2.4 Diversifying and Non-Diversifying Insurance Growth

We want to understand the relationship between the variance of losses and the mean loss for an insurance portfolio as the mean varies, i.e., the variance function. Is risk diversifying, so larger portfolios have relatively lower risk, or are there dependencies which lower the effectiveness of diversification? Understanding how diversification scales has important implications for required capital levels, risk management, pricing, and the optimal structure of the insurance market. Let’s start by describing two extreme outcomes using simple CP models. Assume severity $X$ is normalized so $E[X]=1$ and let $x_2:=E[X^2]$.

Consider $CP_1(\mu,X)$. The variance function is $V(\mu)=\mathrm{Var}(CP_1)=\mu x_2$, which grows linearly with $\mu$. $CP_1$ models diversifying growth in a line with severity $X$. Each risk is independent: increasing expected loss increases the expected claim count but leaves severity unchanged. $CP_1$ is a good first approximation to growth for a non-catastrophe exposed line.

Alternatively, consider $CP_2(\lambda,(\mu/\lambda)X)$ for a fixed expected event count $\lambda$. Now, the variance function is $V(\mu)=\mathrm{Var}(CP_2)=\lambda(\mu/\lambda)^2x_2=\mu^2(x_2/\lambda)$, which grows quadratically with $\mu$. The distribution of the number of events is fixed, as is the case in a regional hurricane or earthquake cover. Increasing portfolio volume increases the expected severity for each event. $CP_2$ is a model for non-diversifying growth in a catastrophe-exposed line. It is the same as the position size-return stock investment model. Risk, measured by the coefficient of variation, is constant, equal to $\sqrt{x_2/\lambda}$, independent of volume. This is a homogeneous growth model.

We will see in Part IV that the Tweedie family of compound Poisson distributions interpolates between linear and quadratic variance functions.

A more realistic model adds uncertainty in μ, resulting in a variance function of the form μ(a+bμ) which is diversifying for small μ and homogeneous for larger μ.

2.5 The Moment Generating Function of a CP

Many useful properties of CP distributions are easy to see using MGFs. To compute the MGF of a CP we first compute the MGF of a Poisson distribution. We need two facts: the Poisson probability mass function $\Pr(N=n)=e^{-\lambda}\lambda^n/n!$ and the Taylor series expansion of the exponential function $e^x=\sum_{n\ge0}x^n/n!$. Then $$M_N(t)=E[e^{tN}]=\sum_{n\ge0}e^{tn}\,\frac{e^{-\lambda}\lambda^n}{n!}=e^{-\lambda}\sum_{n\ge0}\frac{(e^t\lambda)^n}{n!}=e^{-\lambda}e^{\lambda e^t}=\exp(\lambda(e^t-1)).$$ We can now compute the MGF of $CP(\lambda,X)$ using conditional expectations and the identity $x^n=e^{n\log(x)}$: $$M_{CP(\lambda,X)}(t)=E[e^{tCP(\lambda,X)}]=E_N[E_X[e^{t(X_1+\cdots+X_N)}\mid N]]=E_N[(E_X[e^{tX}])^N]=E_N[M_X(t)^N]=E_N[e^{N\log(M_X(t))}]=M_N(\log(M_X(t)))=\exp(\lambda(M_X(t)-1)).$$ A Poisson variable can be regarded as a CP with degenerate severity $X\equiv1$, which has MGF $e^t$. Thus the formula for the MGF of a CP naturally extends the MGF of a Poisson variable.
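The Poisson MGF derivation can be checked symbolically. A sketch using sympy (not part of the aggregate library):

```python
# Verify M_N(t) = exp(lambda*(exp(t) - 1)) by summing the series directly.
import sympy as sp

t = sp.symbols('t')
lam = sp.symbols('lambda', positive=True)
n = sp.symbols('n', integer=True, nonnegative=True)
# sum over n of (lam*e^t)^n / n! = exp(lam*e^t)
mgf = sp.exp(-lam) * sp.summation((lam * sp.exp(t))**n / sp.factorial(n),
                                  (n, 0, sp.oo))
print(sp.simplify(mgf))  # exp(lambda*(exp(t) - 1))
```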

An example showing the power of MGF methods comes from the problem of thinning. We want to count the number of claims excess $x$ where ground-up claims have a distribution $CP(\lambda,X)$. Take severity to be a Bernoulli random variable $B$ with $\Pr(B=1)=\Pr(X>x)=p$ and $\Pr(B=0)=1-p$. $B$ indicates if a claim exceeds $x$. $CP(\lambda,B)$ counts the number of excess claims. It is called a thinning of the original Poisson. It is relatively easy to see that $CP(\lambda,B)\sim\mathrm{Poisson}(\lambda p)$ using the conditional probability formula $\Pr(CP=n)=\sum_{k\ge n}\Pr(CP=n\mid N=k)\Pr(N=k)$ (exercise). But it is very easy to see the answer using MGFs. By definition $M_B(t)=(1-p)+pe^t$, and therefore $$M_{CP}(t)=\exp(\lambda(M_B(t)-1))=\exp(\lambda(1-p+pe^t-1))=\exp(\lambda p(e^t-1)).$$

The MGF of a mixture is the weighted average of the MGFs of the components. If $\bar X$ is a $p_1,p_2$, $p_1+p_2=1$, mixture of $X_1$ and $X_2$, so $F_{\bar X}(x)=p_1F_1(x)+p_2F_2(x)$, then $$M_{\bar X}(t)=\int e^{tx}(p_1f_{X_1}(x)+p_2f_{X_2}(x))\,dx=p_1\int e^{tx}f_{X_1}(x)\,dx+p_2\int e^{tx}f_{X_2}(x)\,dx=p_1M_{X_1}(t)+p_2M_{X_2}(t).$$ This fact allows us to create discrete approximation CPs and convert between a frequency-severity and a jump severity distribution view. It is helpful when working with catastrophe model output.

2.6 The Addition Formula

CP distributions satisfy a simple addition formula $$CP(\lambda_1,X_1)+CP(\lambda_2,X_2)\sim CP(\lambda,\bar X)$$ where $\lambda=\lambda_1+\lambda_2$ and $\bar X$ is a mixture of independent $X_1$ and $X_2$ with weights $\lambda_1/\lambda$ and $\lambda_2/\lambda$. The addition formula is intuitively obvious: think about simulating claims and remember two facts. First, that the sum of two Poisson variables is Poisson, and second, that to simulate from a mixture, you first simulate the mixture component based on the weights and then simulate the loss from the selected component.

It is easy to demonstrate the addition formula using MGFs: $$M_{CP(\lambda_1,X_1)+CP(\lambda_2,X_2)}(t)=M_{CP(\lambda_1,X_1)}(t)\,M_{CP(\lambda_2,X_2)}(t)=\exp(\lambda_1(M_{X_1}(t)-1))\exp(\lambda_2(M_{X_2}(t)-1))=\exp(\lambda([(\lambda_1/\lambda)M_{X_1}(t)+(\lambda_2/\lambda)M_{X_2}(t)]-1))=\exp(\lambda(M_{\bar X}(t)-1))=M_{CP(\lambda,\bar X)}(t).$$ The addition formula is the MGF for a mixture in reverse. To sum multiple CPs, you sum the frequencies and form the frequency-weighted mixture of the severities.

The addition formula is handy when working with catastrophe models. Catastrophe models output a set of events $i$, each with a frequency $\lambda_i$ and a loss amount $x_i$. They model the number of occurrences of each event using a Poisson distribution. Aggregate losses from event $i$ are given by $CP(\lambda_i,x_i)$ with a degenerate severity.

It is standard practice to normalize the $\lambda_i$, dividing by $\lambda=\sum_i\lambda_i$, and form the empirical severity distribution $$\hat F(x)=\sum_{i:x_i\le x}\lambda_i/\lambda.$$ $\hat F$ is the $\lambda_i$-weighted mixture of the degenerate distributions $X_i=x_i$. By the addition formula, aggregate losses across all events are $CP(\lambda,\hat X)$ where $\hat X$ has distribution $\hat F$.
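A minimal sketch of this normalization in code; the three-event table is made up for illustration:

```python
# Collapse an event-loss table into a single CP(lam, X_hat).
import numpy as np

lam_i = np.array([0.02, 0.05, 0.01])  # annual event frequencies lambda_i
x_i = np.array([10.0, 2.0, 50.0])     # loss amount per event

lam = lam_i.sum()                     # total expected event count
weights = lam_i / lam                 # severity mixture weights
mean_sev = weights @ x_i              # mean of the empirical severity F_hat
print(lam, mean_sev, lam * mean_sev)  # frequency, mean severity, aggregate mean
```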

2.7 Lévy Processes and the Jump Size Distribution

The CP model of insurance losses is very compelling. The split into frequency and severity mirrors the claims process. However, we could take a different approach. We could specify properties an insurance loss model must have, and then try to determine all distributions satisfying those properties. In the process, we might uncover a new way of modeling losses. The latter approach has been used very successfully to identify risk measures. For example, coherent risk measures emerge as those satisfying a set of axioms.

As we discussed in the introduction, in a basic insurance model losses occur homogeneously over time at a fixed expected rate. As a result, losses over a year equal the sum of twelve identically distributed monthly loss variables, or 52 weekly, or 365 daily variables, etc. If the random variable $Y_1$ represents losses from a fixed portfolio insured for one year, then for any $n$ there are iid variables $X_i$, $i=1,\dots,n$, so that $Y_1=X_1+\cdots+X_n$. Random variables with this divisibility property for every $n\ge1$ are called infinitely divisible (ID). The Poisson, normal, gamma, negative binomial, lognormal, and Pareto distributions are all ID. This is obvious for the first four from the form of their MGFs. For example, the Poisson MGF shows $e^{\lambda(e^t-1)}=\{e^{\frac{\lambda}{n}(e^t-1)}\}^n$ so the Poisson is a sum of $n$ Poisson variables for any $n$. The same applies to a CP. Proving the lognormal or Pareto is ID is much trickier. A distribution with finite support, such as the uniform or binomial distributions, is not ID. A mixture of ID distributions is ID if the mixture distribution is ID.

Any infinitely divisible distribution $Y$ can be embedded into a special type of stochastic process called a Lévy process so that $Y$ has the same distribution as $Y_1$. The process $Y_t$ shows how losses occur over time, or as volume increases, or both. $Y_t$ is a Lévy process if it has independent and stationary increments, meaning the distribution of $Y_{s+t}-Y_s$ depends only on $t$ and is independent of $Y_s$. Lévy processes are Markov processes: the future is independent of the past given the present. The Lévy process representation is built up by determining the distribution of $Y_\epsilon$ for a very small $\epsilon$ and then adding up independent copies to determine $Y_t$. The trick is to determine $Y_\epsilon$.

Lévy processes are very well understood. See Sato for a comprehensive reference. They are the sum of a deterministic drift, a Brownian motion, a compound Poisson process for large losses, and a process that is a limit of compound Poisson processes for small losses. Not all four components need be present. Insurance applications usually take $Y_t$ to be a compound Poisson process $Y_t=X_1+\cdots+X_{N(t)}$, where $X_i$ are iid severity variables and $N(t)$ is a Poisson variable with mean proportional to $t$.

When a normal, Poisson, gamma, inverse Gaussian, or negative binomial distribution is embedded into a Lévy process, all the increments have the same distribution, albeit with different parameters. This follows from the form of their MGFs, following the Poisson example above. But it is not required that the increments have the same distribution. For example, when a lognormal distribution is embedded the divided distributions are not lognormal—it is well known (and a considerable irritation) that the sum of two lognormals is not lognormal.

The structure of Lévy processes implies that the MGF of $X_t$ has the form $E[e^{sX_t}]=\exp(t\kappa(s))$ where $\kappa(s)=\log E[e^{sX_1}]$ is the cumulant generating function of $X_1$. To see this, suppose $t=m/n$ is rational. Then, by the additive property, $E[e^{sX_{m/n}}]=E[e^{sX_{1/n}}]^m$ and $E[e^{sX_1}]=E[e^{sX_{1/n}}]^n$, showing $E[e^{sX_1}]^{1/n}=E[e^{sX_{1/n}}]$. Thus $E[e^{sX_{m/n}}]=E[e^{sX_1}]^{m/n}$. The result follows for general $t$ by continuity. Therefore, the cumulant generating function of $X_t$ is $\kappa_t(s)=t\kappa(s)$.
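A numeric illustration of $\kappa_t(s)=t\kappa(s)$ using the gamma process, where $X_t$ has shape $ta$ and fixed rate $b$ so every cumulant is proportional to $t$ (a sketch; parameter values are arbitrary):

```python
# Mean and variance of a gamma Levy process scale linearly in t.
from scipy.stats import gamma

a, b = 2.0, 3.0                      # shape and rate at t = 1
for t in (0.5, 1.0, 2.0):
    X = gamma(a=t * a, scale=1 / b)  # X_t: shape t*a, rate b
    print(t, X.mean(), X.var())      # both proportional to t
```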

3 Theoretical Framework

dispersion models — DM and EDM — NEF — additive and reproductive

Figure 1 shows the relationship between dispersion models, regular proper DMs (not discussed), and reproductive and additive exponential DMs. When the index parameter is known an EDM becomes a NEF.

Figure 1: Dispersion models.

3.1 Dispersion Models

We start by defining some terminology around dispersion models (DMs). The idea is to generalize the normal with density $$p(y;\mu,\sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2\sigma^2}(y-\mu)^2\right).$$

Dispersion models have reproductive and additive parameterizations.

In the reproductive form, the natural parameter $\theta$ and the canonical statistic $T(X)$ are structured so that sums of independent observations preserve the distributional form. If $X_1,\dots,X_n$ are i.i.d. from an exponential family distribution, their sufficient statistic $T_n=\sum_{i=1}^nT(X_i)$ follows the same type of distribution with an updated natural parameter. The logic behind the name “reproductive” is that the sum of i.i.d. observations has the same distributional form, meaning the distribution is “reproduced” under summation.

Reproductive: The sum of independent observations retains the same family of distributions.

With the additive parameterization, the focus is on rewriting the exponential family form in a way that makes the natural parameter explicitly additive when combining independent observations. The key idea is that when summing independent variables, the natural parameters (or a transformation of them) add directly. The name “additive” comes from the fact that the parameter transformation aligns with addition when combining independent samples.

Additive: The natural parameters (or their transformation) combine via addition.

3.1.1 Reproductive dispersion models

A reproductive dispersion model with position parameter $\mu$ and dispersion parameter $\sigma^2$ is a family with density of the form $$p(y;\mu,\sigma^2)=a(y;\sigma^2)\exp\left(-\frac{1}{2\sigma^2}d(y;\mu)\right)$$ for $y\in C\subseteq\mathbb{R}$ and $\sigma^2>0$, where $d$ is a deviance function. Deviance functions generalize $(y-\mu)^2$: $d(y;\mu)>0$ for $y\ne\mu$ and $d(y;y)=0$, see Section 4.8. The model is called an exponential dispersion model $\mathrm{ED}(\mu,\sigma^2)$ if the deviance has the form $d(y;\mu)=yf(\mu)+g(\mu)+h(y)$. The parameter $\mu$ is called the mean value parameter and $\sigma^2$ the dispersion parameter.

3.1.2 Natural exponential families

A natural exponential family (NEF) $\mathrm{NE}(\mu)$ is a family of densities of the form $c(y)\exp(-\frac{1}{2}d(y;\mu))$. A reproductive EDM with $\sigma^2$ known is a NEF. In that case the density is $c_1(y)\exp(yf_1(\mu)+g_1(\mu))$ for suitable $f_1$, $g_1$, subsuming the $\exp(h(y))$ factor into $c(y)$.

3.1.3 Additive dispersion models

The additive form $Z=Y/\sigma^2=\lambda Y$, where $\lambda=1/\sigma^2$, is called an additive exponential dispersion model. This is written $Z\sim\mathrm{ED}(\theta,\lambda)$ where $\theta=f(\mu)$ and $\lambda=1/\sigma^2$. The parameter $\theta$ is called the canonical parameter and $\lambda>0$ the index parameter.

3.2 Volume and Loss Aggregation with EDMs

The choice of loss or pure premium aggregation determines whether losses are viewed in total or as per-policy-per-period averages (what health actuaries call pppy, per person per year), leading naturally to two related sets of models.

  1. Total Losses using an Additive Model. In many actuarial applications, the total loss across multiple policies or over an extended period is the primary object of interest. When losses are aggregated, the natural structure is additive, $\mathrm{ED}(\theta,\lambda)$. The sum should retain the same distributional form: $$\sum_i\mathrm{ED}(\theta,\lambda_i)\sim\mathrm{ED}\left(\theta,\sum_i\lambda_i\right).$$ The classic example is a compound Poisson (CP) model, where the $\lambda_i$ determine the expected claim count of different units and $\theta$ determines the per claim severity distribution. CP models with the same severity add by adding expected frequencies, $\lambda_i$.

  2. Average Loss per Period using a Reproductive Model. In other settings, particularly in experience rating or loss ratio analysis, the focus shifts to per-unit per-period loss measures, $\mathrm{ED}(\theta,\lambda)/\lambda$, calling for the reproductive structure, $Y=\mathrm{ED}(\mu,\sigma^2)$. The mean $\mu$ refers to the per-unit-per-period expected loss. The statistical properties of $Y$ vary with context. If the unit is all losses from Mega Corp. Inc. over a year then $\mu$ could be very large and have a tight distribution ($\mu$ large, $\sigma^2$ small). Conversely, for My Umbrella Policy with a $25M limit above $1M primary policies, losses per year have a very low mean but broad distribution ($\mu$ small, $\sigma^2$ large). However, in both cases actuaries know that the average becomes less dispersed as more units (measured in like policy-years) are combined. Pure premiums and loss ratios combine using a weighted average approach. If the distribution for one policy-year has parameters $\mu$ and $\sigma^2$ and distribution $\mathrm{ED}(\mu,\sigma^2)$, then the per-policy-per-year average of $n_i$ policy-years of identical exposures has distribution $\mathrm{ED}(\mu,\sigma^2/n_i)$, and this is an assumption! These should combine as $$\frac{1}{n}\sum_in_i\,\mathrm{ED}\left(\mu,\frac{\sigma^2}{n_i}\right)\sim\mathrm{ED}\left(\mu,\frac{\sigma^2}{n}\right)$$ where $n=\sum_in_i$ is the total exposure modeled. The classic example here is adding normals derived as averages from different sample sizes, where the variance of the average from $n_i$ observations equals $\sigma^2/n_i$; see the sketch following this list.
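The sketch below illustrates the reproductive combination rule for normals; the exposure counts are illustrative.

```python
# Weighted average of block averages Y_i ~ N(mu, sigma2/n_i), weights n_i/n,
# has variance sigma2/n where n = sum(n_i).
import numpy as np

rng = np.random.default_rng(7)
mu, sigma2 = 5.0, 4.0
n_i = np.array([10, 30, 60])
n, sims = n_i.sum(), 100_000

Yi = [rng.normal(mu, np.sqrt(sigma2 / ni), sims) for ni in n_i]
Y = sum(ni / n * y for ni, y in zip(n_i, Yi))
print(Y.var(), sigma2 / n)  # should agree
```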

Exponential dispersion models support both views—total and per-unit losses—through their underlying structural properties.

3.3 Exponential Dispersion Models

An exponential dispersion model (EDM) is a collection of NEFs whose generator densities form a Lévy process. There are two kinds of EDM.

Let $Z_\nu$ be a Lévy process with density $c(y;\nu)$. The variable $\nu$ is called the index parameter. $\nu$ is used in place of $t$ to be agnostic: it can represent expected loss volume or time or a combination. In the finite variance case, the distribution of $Z_\nu$ becomes less dispersed as $\nu$ increases. Therefore it is natural to define $1/\nu$ as a dispersion parameter. For Brownian motion the dispersion parameter is usually denoted $\sigma^2$. Throughout, $\sigma^2=1/\nu$ and the two notations are used interchangeably. $Z_\nu$ and $\nu Z_1$ have the same mean, but the former has varying coefficient of variation $\mathrm{CV}(Z_1)/\sqrt{\nu}$ and the latter a constant CV. $Z_\nu$ is more ($\nu<1$) or less ($\nu>1$) dispersed than $Z_1$. The difference between $Z_\nu$ and $\nu Z_1$ is illustrated in the lower two panels of Figure 11.

The first kind of EDM is an additive exponential dispersion model. It comprises the NEFs associated with the generator distribution $Z_\nu$. It has generator density $c(z;\nu)$ and cumulant generator $\kappa_\nu(\theta)=\nu\kappa(\theta)$. Its members are all exponential tilts of $c(z;\nu)$ as $\nu$ varies. A representative distribution $Z\sim\mathrm{DM}(\theta,\nu)$ has density $c(z;\nu)e^{\theta z-\nu\kappa(\theta)}$. As $\nu$ increases, the mean of the generator distribution increases and it becomes less dispersed. The same is true for $Z$. The cumulant generating function of $Z$ is $K(t;\theta,\nu)=\nu(\kappa(t+\theta)-\kappa(\theta))$. If $Z_i\sim\mathrm{DM}(\theta,\nu_i)$ and $Z=\sum Z_i$, then the cumulant generating function for $Z$ is $(\sum_i\nu_i)(\kappa(t+\theta)-\kappa(\theta))$, showing $Z$ has distribution $\mathrm{DM}(\theta,\sum_i\nu_i)$ within the same family, explaining why the family is called additive. Additive exponential dispersion models are used to model total losses.

The second kind of EDM is a reproductive exponential dispersion model. It comprises the NEFs associated with the generator distribution of $Y_\nu=Z_\nu/\nu$. $Y_\nu$ has a mean independent of $\nu$, but this independence is illusory since the mean can be adjusted by re-scaling $\nu$. In the finite variance case, $Y_\nu$ has decreasing CV as $\nu$ increases. The densities of $Y$ and $Z$ are related by $f_Y(y)=\nu f_Z(\nu y)$. Hence a representative has density $f_Y(y)=\nu c(\nu y;\nu)e^{\nu(\theta y-\kappa(\theta))}$. The cumulant generating function of $Y$ is $K(t;\theta,\nu)=\nu(\kappa(t/\nu+\theta)-\kappa(\theta))$.

To end this section, let’s work out the impact of tilting on the Lévy measure $J$. In the last section, we saw that the cumulant generator of $Z$ has the form $$\kappa(\theta)=a\theta+\int_0^1(e^{\theta x}-1-\theta x)j(x)\,dx+\int_1^\infty(e^{\theta x}-1)j(x)\,dx$$ where $a$ is a location (drift) parameter. Rather than split the integral in two, write it as $$\int_0^\infty(e^{\theta x}-1-\theta x1_{(0,1)}(x))j(x)\,dx,$$ where $1_{(0,1)}$ is the indicator function on $(0,1)$. The $\theta$-tilt of $Z$ has cumulant generating function $$K_\theta(s)=\kappa(s+\theta)-\kappa(\theta)=as+\int_0^\infty(e^{(s+\theta)x}-e^{\theta x}-sx1_{(0,1)}(x))j(x)\,dx=as+\int_0^\infty(e^{sx}-1-sxe^{-\theta x}1_{(0,1)}(x))e^{\theta x}j(x)\,dx.$$ The factor $e^{-\theta x}$ in the compensation term can be combined with $1_{(0,1)}$, resulting in an adjustment to the drift parameter. Thus the Lévy density of the tilted distribution is $e^{\theta x}j(x)$.

3.4 Connection to GLM modeling

We’re thinking the underlying units are characterized by a single size of loss distribution that is similar to a severity curve. DETAILS

4 The NEF Nonet

carrier or generating density — exponential tilting — cumulant generator — cumulant generating function — mean value mapping — variance function — log likelihood — unit deviance — density from deviance — Lévy distribution families

Material from The NEF Circle/Nonet.

This section describes the NEF Nonet, which gives nine ways to define a NEF (Section 3.1). The EDM index parameter is assumed known in a NEF.

Nine different ways of defining a NEF.

4.1 The Generating Density

A generator or carrier is a real valued function $c(y)\ge0$. (The argument is $y$ for historical reasons, where $y$ represented a response.) If $c$ is a probability density, $\int c(y)\,dy=1$, then it is called a generating density. However, $\int c(y)\,dy\ne1$ and even $\int c(y)\,dy=\infty$ are allowed. The generating density sits at the top of the circle, reflecting its fundamental importance.

How can we create a probability density from $c$? It must be normalized to have integral 1. Normalization is not possible when $\int c(y)\,dy=\infty$. However, it will be possible to normalize the adjusted generator $c(y)e^{\theta y}$ when its integral is finite. To that end, define $\Theta=\{\theta\mid\int c(y)e^{\theta y}\,dy<\infty\}$. The Natural Exponential Family generated by $c$ is the set of probability densities $$\mathrm{NEF}(c)=\left\{\frac{c(y)e^{\theta y}}{\int c(z)e^{\theta z}\,dz}\;\middle|\;\theta\in\Theta\right\}$$ proportional to $c(y)e^{\theta y}$. $\theta$ is called the natural or canonical parameter and $\Theta$ the natural parameter space. Naming the normalizing constant $k(\theta)=\left(\int c(y)e^{\theta y}\,dy\right)^{-1}$ shows all densities in $\mathrm{NEF}(c)$ have a factorization $c(y)k(\theta)e^{\theta y}$, as required by the original definition of an exponential family.

There are two technical requirements for a NEF.

First, a distribution in a NEF cannot be degenerate, i.e., it cannot take only one value. Excluding degenerate distributions ensures that the variance of every member of a NEF is strictly positive, which will be very important.

Second, the natural parameter space $\Theta$ must contain more than one point. If $c(y)=\frac{1}{\pi}\frac{1}{1+y^2}$ is the density of a Cauchy, then $\Theta=\{0\}$ because $c$ has fat left and right tails. By assumption, the Cauchy density does not generate a NEF.

The same NEF is generated by any element of $\mathrm{NEF}(c)$, although the set $\Theta$ varies with the generator chosen. Therefore, we can assume that $c$ is a probability density. When $c$ is a density, $\int c=1$, $\kappa(0)=\log(1)=0$ and $0\in\Theta$.

$\Theta$ is an interval because $\kappa$ is a convex function, Section 4.2. If $c$ is supported on the non-negative reals then $(-\infty,0)\subseteq\Theta$. If $\Theta$ is open then the NEF is called regular. In general, $\Theta$ might contain an endpoint.

All densities in a NEF have the same support, defined by $\{y\mid c(y)\ne0\}$, because $e^{\theta y}>0$ and $k(\theta)>0$ on $\Theta$.

Many common distributions belong to a NEF, including the normal, Poisson and gamma. The Cauchy distribution does not. The set of uniform distributions on [0,x] as x varies is not a NEF because the elements do not all have the same support.

4.2 Cumulant Generator

Instead of the generator we can work from the cumulant generator or log partition function defined as $$\kappa(\theta)=\log\int e^{\theta y}c(y)\,dy$$ for $\theta\in\Theta$. The cumulant generator is the log Laplace transform of $c$ at $\theta$, and so there is a one-to-one mapping between generators and cumulant generators. The cumulant generator sits in the center of the circle because it is directly linked to several other components. In terms of $\kappa$, a member of $\mathrm{NEF}(c)$ has density $c(y)e^{\theta y-\kappa(\theta)}$.

The cumulant generator is a convex function, and strictly convex if $c$ is not degenerate. Convexity follows from Hölder’s inequality. Let $\theta=s\theta_1+(1-s)\theta_2$. Then $$\int e^{\theta y}c(y)\,dy=\int(e^{\theta_1y})^s(e^{\theta_2y})^{1-s}c(y)\,dy\le\left(\int e^{\theta_1y}c(y)\,dy\right)^s\left(\int e^{\theta_2y}c(y)\,dy\right)^{1-s}.$$ Now take logs. As a result, $\Theta$ is an interval. Hölder’s inequality is an equality iff $e^{\theta_1y}$ is proportional to $e^{\theta_2y}$, which implies $\theta_1=\theta_2$. Thus provided $\Theta$ is not degenerate, $\kappa$ is strictly convex.

4.3 Exponential Tilting

The exponential tilt of $Y$ with parameter $\theta$ is a random variable with density $f(y;\theta)=c(y)e^{\theta y-\kappa(\theta)}$. It is denoted $Y_\theta$. The exponential tilt is defined for all $\theta\in\Theta$. Tilting, as its name implies, alters the mean and tail thickness of $c$. For example, when $\theta<0$, multiplying $c(y)$ by $e^{\theta y}$ decreases the probability of positive outcomes, increases that of negative ones, and therefore lowers the mean.

A NEF consists of all valid exponential tilts of a generator density $c$, and all distributions in a NEF family are exponential tilts of one another. They are parameterized by $\theta$. We have seen they all have the same support, since $e^{-\kappa(\theta)}>0$ for $\theta\in\Theta$. Therefore they are equivalent measures. In finance, equivalent measures are used to model different views of probabilities and to determine no-arbitrage prices.

An exponential tilt is also known as an Esscher transform.

Exercise: show that all Poisson distributions are exponential tilts of one another, as are all normal distributions with standard deviation 1. The tilt directly adjusts the mean. The cumulant generator of a standard normal is $\theta^2/2$ and for a Poisson($\lambda$) it is $\lambda(e^\theta-1)$. ∎

4.4 Cumulant Generating Functions

The moment generating function (MGF) of a random variable $Y$ is $M(t)=E[e^{tY}]=\int e^{ty}f(y)\,dy$. The MGF contains the same information about $Y$ as the distribution function, it is just an alternative representation. Think of distributions and MGFs as the random variable analog of Cartesian and polar coordinates for points in the plane.

The moment generating function owes its name to the fact that $E[Y^n]=\frac{d^n}{dt^n}M_Y(t)\big|_{t=0}$, provided $E[Y^n]$ exists. That is, the derivatives of $M$ evaluated at $t=0$ give the non-central moments of $Y$. The moment relationship follows by differentiating $E[e^{tY}]=E[\sum_n(tY)^n/n!]$ through the expectation integral.

The MGF of a sum of independent variables is the product of their MGFs: $$M_{X+Y}(t)=E[e^{t(X+Y)}]=E[e^{tX}e^{tY}]=E[e^{tX}]E[e^{tY}]=M_X(t)M_Y(t).$$ Independence is used to equate the expectation of the product and the product of the expectations. Similarly, the MGF of a sum $X_1+\cdots+X_n$ of iid variables is $M_X(t)^n$.

The cumulant generating function is the log of the MGF, $K(t)=\log M(t)$. The $n$th cumulant is defined as $\frac{d^n}{dt^n}K(t)\big|_{t=0}$. Cumulants are additive for independent variables because $$K_{X+Y}=\log M_{X+Y}=\log(M_XM_Y)=\log(M_X)+\log(M_Y)=K_X+K_Y.$$ Higher cumulants are translation invariant because $K_{k+X}(t)=kt+K_X(t)$. The first three cumulants are the mean, the variance and the third central moment, but thereafter they differ from both central and non-central moments.

Exercise: show that all cumulants of a Poisson distribution equal its mean. ∎
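A sympy sketch of the exercise, differentiating $K(t)=\lambda(e^t-1)$ repeatedly at $t=0$:

```python
# Every cumulant of a Poisson(lambda) equals lambda.
import sympy as sp

t = sp.symbols('t')
lam = sp.symbols('lambda', positive=True)
K = lam * (sp.exp(t) - 1)
for order in (1, 2, 3, 4):
    print(order, sp.diff(K, t, order).subs(t, 0))  # always lambda
```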

The MGF of the exponential tilt $Y_\theta$ in $\mathrm{NEF}(c)$ is $$M(t;\theta)=E[e^{tY_\theta}]=\int e^{ty}c(y)e^{\theta y-\kappa(\theta)}\,dy=e^{\kappa(\theta+t)-\kappa(\theta)}.$$ Therefore the cumulant generating function of $Y_\theta$ is simply $K(t;\theta)=\kappa(\theta+t)-\kappa(\theta)$.

4.5 The Mean Value Mapping

The mean of $Y_\theta$ is the first cumulant, computed by differentiating $K(t;\theta)$ with respect to $t$ and setting $t=0$. The second cumulant, the variance, is the second derivative. Thus $E[Y_\theta]=K'(0;\theta)=\kappa'(\theta)$ and $\mathrm{Var}(Y_\theta)=\kappa''(\theta)$. The mean value mapping (MVM) function is $\tau(\theta)=\kappa'(\theta)$. Since a NEF distribution is non-degenerate, $\tau'(\theta)=\kappa''(\theta)=\mathrm{Var}(Y_\theta)>0$, showing again that $\kappa$ is convex and that $\tau$ is monotonically increasing and therefore invertible. Thus $\theta=\tau^{-1}(\mu)$ is well defined. The function $\tau^{-1}$ is called the canonical link in a GLM. The link function, usually denoted $g$, bridges from the mean domain to the linear modeling domain.

The mean domain is $\Omega:=\tau(\operatorname{int}\Theta)$, the set of possible means. It is another interval. The NEF is called regular if $\Theta$ is open, and then the mean parameterization will return the whole family. But if $\Theta$ contains an endpoint the mean domain may need to be extended to include $\pm\infty$. The family is called steep if the mean domain equals the interior of the convex hull of the support. Regular implies steep. A NEF is steep iff $E[|X_\theta|]=\infty$ for all $\theta\in\Theta\setminus\operatorname{int}\Theta$.

When we model, the mean is the focus of attention. Using τ we can parameterize NEF(c) by the mean, rather than θ, which is usually more convenient.

4.6 The Variance Function

The variance function determines the relationship between the mean and variance of distributions in a NEF. It sits at the bottom of the circle, befitting its foundational role. In many cases the modeler will have prior knowledge of the form of the variance function. Part I explains how NEFs allow knowledge about V to be incorporated without adding any other assumptions.

Using the MVM we can express the variance of $Y_\theta$ in terms of its mean. Define the variance function by $$V(\mu)=\mathrm{Var}(Y_{\tau^{-1}(\mu)})=\tau'(\tau^{-1}(\mu))=\frac{1}{(\tau^{-1})'(\mu)}.$$ The last expression follows from differentiating $\tau(\tau^{-1}(\mu))=\mu$ with respect to $\mu$ using the chain rule. Integrating, we can recover $\theta$ from $V$: $$\theta=\theta(\mu)=\tau^{-1}(\mu)=\int_{\mu_0}^\mu(\tau^{-1})'(m)\,dm=\int_{\mu_0}^\mu\frac{dm}{V(m)}.$$ We can phrase this relationship as $$\frac{\partial\theta}{\partial\mu}=\frac{1}{V(\mu)},$$ meaning $\theta$ is a primitive or anti-derivative of $1/V(\mu)$. Similarly, $$\frac{\partial\kappa}{\partial\mu}=\frac{\kappa'(\theta(\mu))}{V(\mu)}=\frac{\mu}{V(\mu)},$$ meaning $\kappa(\theta(\mu))$ is a primitive of $\mu/V(\mu)$.

V and Θ uniquely characterize a NEF. It is necessary to specify Θ, for example, to distinguish a gamma family from its negative. (V,Θ) do not characterize a family within all distributions. For example, the family kX for X with E[X]=Var(X)=1 has variance proportional to the square of the mean, for any X. But the gamma is the only NEF family of distributions with square variance function.

4.7 Log Likelihood for the Mean

The NEF density factorization implies the sample mean is a sufficient statistic for $\theta$. The log likelihood for $\theta$ is $l(y;\theta)=\log(c(y))+y\theta-\kappa(\theta)$. Only the terms of the log likelihood involving $\theta$ are relevant for inference about the mean. The portion $y\theta-\kappa(\theta)$ is often called the quasi-likelihood.

Differentiating $l$ with respect to $\theta$ and setting equal to zero shows the maximum likelihood estimator (MLE) of $\theta$ given $y$ solves the score equation $y-\kappa'(\theta)=0$. Given a sample of independent observations $y_1,\dots,y_n$, the MLE solves $\bar y-\kappa'(\theta)=0$, where $\bar y$ is the sample mean.

Using the mean parameterization $f(y;\mu)=c(y)e^{y\tau^{-1}(\mu)-\kappa(\tau^{-1}(\mu))}$ the log likelihood of $\mu$ is $l(y;\mu)=\log(c(y))+y\tau^{-1}(\mu)-\kappa(\tau^{-1}(\mu))$. Differentiating with respect to $\mu$ shows the maximum likelihood value occurs when $$\frac{\partial l}{\partial\mu}=\frac{\partial l}{\partial\theta}\,\frac{\partial\theta}{\partial\mu}=\{y-\kappa'(\tau^{-1}(\mu))\}\frac{1}{V(\mu)}=\frac{y-\mu}{V(\mu)}=0$$ since $\kappa'(\tau^{-1}(\mu))=\tau(\tau^{-1}(\mu))=\mu$. Thus the most likely tilt given $y$ has parameter $\theta$ determined so that $E[Y_\theta]=y$. Recall, $\partial l/\partial\mu$ is called the score function.

In a NEF, the maximum likelihood estimator of the canonical parameter is unbiased. Given a sample from a uniform [0,x] distribution, the maximum likelihood estimator for x is the maximum of a sample, but it is biased low. The uniform family is not a NEF.

4.8 Unit Deviance

A statistical unit is an observation and a deviance is a measure of fit that generalizes the squared difference. A unit deviance is a measure of fit for a single observation.

Given an observation $y$ and an estimate (fitted value) $\mu$, a unit deviance measures how much we care about the absolute size of the residual $y-\mu$. A deviance is a function $d(y;\mu)$ with properties similar to $(y-\mu)^2$:

  1. d(y;y)=0 and
  2. $d(y;\mu)>0$ if $y\ne\mu$.

If $d$ is twice continuously differentiable in both arguments it is called a regular deviance. $d(y;\mu)=|y-\mu|$ is an example of a deviance that is not regular.

We can make a unit deviance from a likelihood function by defining $$d(y;\mu)=2\left(\sup_{\mu'\in\Omega}l(y;\mu')-l(y;\mu)\right)=2(l(y;y)-l(y;\mu)),$$ provided $y\in\Omega$. This is where we want steepness. It is also where we run into problems with Poisson modeling, since $y=0$ is a legitimate outcome but not a legitimate mean value. For a steep family we know the only such values occur on the boundary of the support, generally at $0$. The factor 2 is included to match squared differences. $l(y;y)$ is the likelihood of a saturated model, with one parameter for each observation; it is the best a distribution within the NEF can achieve. $d$ is a relative measure of likelihood, compared to the best achievable. It obviously satisfies the first condition to be a deviance. It satisfies the second because $\tau$ is strictly monotone (again, using the fact NEF distributions are non-degenerate and have positive variance). Finally, the nuisance term $\log(c(y))$ in $l$ disappears in $d$ because it is independent of $\theta$.

We can construct $d$ directly from the variance function. Since $$\frac{\partial d}{\partial\mu}=-2\frac{\partial l}{\partial\mu}=-2\,\frac{y-\mu}{V(\mu)}$$ it follows that $$d(y;\mu)=2\int_\mu^y\frac{y-t}{V(t)}\,dt.$$ The limits ensure $d(y;y)=0$ and that $d$ has the desired partial derivative with respect to $\mu$. The deviance averages how much we care about the difference between $y$ and the fitted value $\mu$. The variance function in the denominator allows the degree of care to vary with the fitted value.

Example. When $d(y;\mu)=(y-\mu)^2$, $\partial d/\partial\mu=-2(y-\mu)$ and hence $V(\mu)=1$. ∎
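As a cross-check of the integral formula, a sympy sketch recovering the gamma unit deviance from $V(t)=t^2$ (unit shape; see Section 4.13):

```python
# d(y; mu) = 2 * Integral_mu^y (y - t)/t^2 dt = 2*(y/mu - log(y/mu) - 1)
import sympy as sp

y, mu, t = sp.symbols('y mu t', positive=True)
d = 2 * sp.integrate((y - t) / t**2, (t, mu, y))
print(sp.simplify(d))  # 2*y/mu - 2*log(y) + 2*log(mu) - 2, the gamma deviance
```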

We can make a deviance function from a single-variable function $\tilde d$ via $d(y;\mu)=\tilde d(y-\mu)$, provided $\tilde d(0)=0$ and $\tilde d(x)>0$ for $x\ne0$. $\tilde d(x)=x^2$ shows square distance has this form. We can then distinguish scale vs. dispersion or shape via $$\tilde d\left(\frac{y-\mu}{\sigma}\right)\quad\text{vs.}\quad\frac{\tilde d(y-\mu)}{\sigma^2}.$$ Scale and shape are the same in a normal-square error model. Part III shows they are different for other distributions such as the gamma or inverse Gaussian. Densities with different shape cannot be shifted and scaled to one another.

4.9 Density From Deviance

Finally, we can write a NEF density in terms of the deviance rather than as an exponential tilt. This view further draws out connections with the normal. Starting with the tilt density, and parameterizing by the mean, we get $$f(y;\tau^{-1}(\mu))=c(y)e^{y\tau^{-1}(\mu)-\kappa(\tau^{-1}(\mu))}=e^{l(y;\mu)}=e^{-d(y;\mu)/2+l(y;y)}=c^*(y)e^{-d(y;\mu)/2}$$ where $c^*(y):=e^{l(y;y)}=c(y)e^{y\tau^{-1}(y)-\kappa(\tau^{-1}(y))}$.

Although it is easy to compute the deviance from $V$, it is not necessarily easy to compute $c^*$. It can be hard (or impossible) to identify a closed form expression for the density of a NEF member in terms of elementary functions. This will be the case for the Tweedie distribution.

4.10 Completing the NEF Nonet

Figure 2: Relationships between actors for a NEF defined by the generating density $c$. If the NEF is steep (e.g., if it is regular) then the condition $y\in\Omega$ in the unit deviance can be dropped. If the NEF is not steep then $l(y;y)$ may not be defined and must be replaced with $\sup_{\theta\in\Theta}\,y\theta-\kappa(\theta)$ in the unit deviance. $\operatorname{conv}(S)$ denotes the smallest convex set (i.e., interval) containing $S$. For the Poisson $S=\{0,1,2,\dots\}$ and $C=[0,\infty)$. $\mathring\Theta$ denotes the interior of $\Theta$.

Figure 2 summarizes the terms we have defined and their relationships. Now, let’s use the formulas to work out the details for two examples: the gamma and the general Bar-Lev-Enis-Tweedie Power Variance Family (PVF). Completed nonets are shown below for the normal Figure 3, Poisson Figure 4, gamma Figure 5, inverse Gaussian Figure 6, and (anticipating a little), the Bar-Lev-Enis-Tweedie PVF Figure 7.

This section presents an algorithm to compute each element of the NEF Circle from a variance function, as well as a less formulaic approach starting with the generator density. The formal approach is described in Jørgensen and Letac .

4.10.1 Starting From the Variance Function.

The variance function $V$ for a NEF variable $Y$ satisfies $\mathrm{Var}(Y)=V(E[Y])$. It is independent of the parameterization used for $Y$ because it only expresses how the variance behaves as a function of the mean. The mean is denoted $\mu$. There is a one-to-one relationship between values of the canonical parameter $\theta$ and values of $\mu$ given by $\tau(\theta):=\kappa'(\theta)=\mu$. Thus we can consider the family parameterized by $\theta$ or $\mu$. To complete the NEF Circle starting from $V$:

  1. Integrate $(\tau^{-1})'(\mu)=1/V(\mu)$ to determine the canonical parameter $\theta=\tau^{-1}(\mu)$ as a primitive of $1/V(\mu)$: $$\theta(\mu)=\tau^{-1}(\mu)=\int(\tau^{-1})'(\mu)\,d\mu=\int\frac{d\mu}{V(\mu)}.$$
  2. Rearrange to obtain $\mu=\kappa'(\theta)$ as a function of $\theta$.
  3. Integrate $\kappa'(\theta)$ to determine the cumulant generator $\kappa(\theta)$. Change variables $\mu=\kappa'(\theta)$, $d\mu=\kappa''(\theta)\,d\theta$, to see $\kappa(\theta)$ is a primitive of $\mu/V(\mu)$: $$\kappa(\theta)=\int\kappa'(\theta)\,d\theta=\int\frac{\mu}{V(\mu)}\,d\mu.$$
  4. The cumulant generating function is $K_\theta(t)=\kappa(\theta+t)-\kappa(\theta)$.
  5. The deviance can be computed directly as $$d(y;\mu)=2\int_\mu^y\frac{y-m}{V(m)}\,dm=2(l(y;y)-l(y;\mu)).$$ Notice that equally $d(y;\mu)=2\{y\theta(y)-\kappa(\theta(y))-(y\theta(\mu)-\kappa(\theta(\mu)))\}$ using the results of Steps 1 and 3. As a result, $l(y;\mu)=y\theta(\mu)-\kappa(\theta(\mu))$ up to irrelevant factors.

This algorithm can always be computed numerically. It can run into problems if the functions in Steps 1 and 3 are not integrable, or in Step 2 if θ cannot be inverted.

4.10.2 Starting from the Density.

  1. Starting with the density or probability mass function, find the factorization $c(y)e^{\theta y-\kappa(\theta)}$ in terms of the original parameterization.
  2. Identify θ as a function of the original parameters.

Working from the density is less algorithmic, but is easier if the form of the density is known. Note that you can then confirm $V(\mu)=\kappa''(\tau^{-1}(\mu))$.

We now present several examples of these techniques.

4.11 NEF Nonet for the Gaussian distribution

Figure 3: The normal NEF, with $\sigma^2=1$. The normal distribution has base measure given by a standard normal. It is the archetype, corresponding to squared distance deviance and the constant variance function. Scale and shape coincide.

Let $Y\sim N(\mu,\sigma^2)$, with density $$p(y;\mu,\sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2\sigma^2}(y-\mu)^2\right)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{y^2}{2\sigma^2}\right)\exp\left(\frac{1}{\sigma^2}\left\{y\mu-\frac{1}{2}\mu^2\right\}\right).$$ Take $\theta=\mu$ and $\lambda=1/\sigma^2$ to see this is a reproductive EDM with $f(\mu)=\mu$ and unit cumulant $\kappa(\theta)=\theta^2/2$. The mean value mapping is $\tau(\theta)=\theta$ and the unit variance is $V(\mu)=1$.

The additive form $Z=Y/\sigma^2=\lambda Y$, where $\lambda=1/\sigma^2$, has $Z\sim N(\theta\lambda,\lambda)$ since the variance of $Z$ equals $\lambda^2\sigma^2=\lambda$.

Additive convolution is $N(\theta\lambda_1,\lambda_1)+N(\theta\lambda_2,\lambda_2)=N(\theta(\lambda_1+\lambda_2),\lambda_1+\lambda_2)$.

Reproductive convolution: if $Y_i\sim N(\mu,\sigma^2/w_i)$ and $w=\sum_iw_i$, then $$\frac{1}{w}\sum_iw_iY_i\sim N\left(\mu,\frac{\sigma^2}{w}\right).$$

4.12 NEF Nonet for the Poisson distribution

Figure 4: The Poisson NEF. Counting base measure, $\kappa(\theta)=\log\sum_ne^{\theta n}/n!=e^\theta$. The Poisson, like the normal, has no shape parameter. But because it is defined on the non-negative integers, scaling changes the support and creates the over-dispersed Poisson model.

4.13 NEF Nonet for the gamma distribution

Figure 5: The Gamma NEF with shape parameter 1. $\tau^{-1}(\mu)=-\mu^{-1}$ and $V(\mu)=1/(\tau^{-1})'(\mu)=\mu^2$. Note $\theta>0$ for $y<0$ is another solution.

Let $Y\sim\mathrm{Ga}(\mu,\alpha)$, with known shape parameter $\alpha$. $E[Y]=\mu$ and $\mathrm{Var}(Y)=\mu^2/\alpha$, so $\alpha$ is $1/\mathrm{CV}^2$. The variance function is simply $V(\mu)=\mu^2/\alpha$. TODO: TIE TO EARLIER!!

The shape and rate parameters are $\alpha$ and $\beta=\alpha/\mu$. The density is $$\frac{\beta^\alpha}{\Gamma(\alpha)}y^{\alpha-1}e^{-\beta y}=\frac{\alpha^\alpha}{\mu^\alpha\Gamma(\alpha)}y^{\alpha-1}e^{-y\alpha/\mu}.$$

Gamma: Starting From the Variance Function.

  1. Integrate $1/V$ to determine $\theta$: $$\theta=\int\frac{d\mu}{V(\mu)}=\alpha\int\mu^{-2}\,d\mu=-\frac{\alpha}{\mu}.$$
  2. Invert to obtain $\mu=\tau(\theta)=-\dfrac{\alpha}{\theta}$.
  3. Integrate $\tau(\theta)$ to determine the cumulant generator $$\kappa(\theta)=\int\tau(\theta)\,d\theta=-\alpha\int\frac{d\theta}{\theta}=-\alpha\log(-\theta).$$ Beware: if $F(x)=\int f(x)\,dx$ then $\int f(-x)\,dx=-\int f(y)\,dy=-F(y)=-F(-x)$, substituting $y=-x$.
  4. The cumulant generating function is $$K_\theta(t)=\kappa(\theta+t)-\kappa(\theta)=-\alpha\{\log(-\theta-t)-\log(-\theta)\}=-\alpha\log\left(1+\frac{t}{\theta}\right).$$ Exponentiating yields the MGF of the gamma; note $\theta<0$.
  5. The deviance is $$d(y;\mu)=2\alpha\left\{\frac{y-\mu}{\mu}-\log\frac{y}{\mu}\right\}$$ and $l(y;\mu)=-\alpha\left(\frac{y}{\mu}+\log\mu\right)$ up to irrelevant factors.
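The same steps can be pushed through sympy as a check. A sketch: sympy works with $\theta>0$, so read $\log(\theta)$ below as $\log(-\theta)$ for the gamma's negative canonical parameter.

```python
# Steps 1-3 of the variance function algorithm for V(mu) = mu^2/alpha.
import sympy as sp

mu, theta, alpha = sp.symbols('mu theta alpha', positive=True)
V = mu**2 / alpha
theta_of_mu = sp.integrate(1 / V, mu)                     # step 1: -alpha/mu
mu_of_theta = sp.solve(sp.Eq(theta, theta_of_mu), mu)[0]  # step 2: -alpha/theta
kappa = sp.integrate(mu_of_theta, theta)                  # step 3: -alpha*log(theta)
print(theta_of_mu, mu_of_theta, kappa)
```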

Gamma: Starting from the Density.

  1. Factorizing the density as $$\frac{y^{\alpha-1}}{\Gamma(\alpha)}\left(\frac{\alpha}{\mu}\right)^\alpha e^{-y\alpha/\mu}=\frac{y^{\alpha-1}}{\Gamma(\alpha)}\exp\left(-\frac{y\alpha}{\mu}+\alpha\log\frac{\alpha}{\mu}\right)=\frac{y^{\alpha-1}}{\Gamma(\alpha)}\exp(y\theta-(-\alpha\log(-\theta)))$$ where $\theta=-\alpha/\mu$ is the canonical parameter and $\kappa(\theta)=-\alpha\log(-\theta)$.

4.13.1 Relationship with scipy.stats distribution

Start with the density $\frac{\psi^\lambda}{\Gamma(\lambda)}y^\lambda e^{-\psi y}\frac{1}{y}$ for positive $\psi$, $\lambda$. Defining $\mu=\lambda/\psi$ (the mean) and $\sigma^2=1/\lambda$ (the squared coefficient of variation) allows writing $$p(y;\mu,\sigma^2)=a(y;\sigma^2)\exp\left(-\frac{1}{2\sigma^2}\cdot2\left\{\frac{y}{\mu}-\log\frac{y}{\mu}-1\right\}\right)$$ where $a(y;1/\lambda)=\frac{\lambda^\lambda e^{-\lambda}}{\Gamma(\lambda)}\frac{1}{y}$. This shows that the gamma is an EDM with unit deviance $d(y;\mu)=2\left\{\frac{y}{\mu}-\log\frac{y}{\mu}-1\right\}$ ($f(\mu)=2/\mu$, $g(\mu)=2\log\mu$, $h(y)=-2\log y$, up to constants).

The behavior of the [gamma] densities is fairly typical for many dispersion models. For small values of $\sigma^2$, the density is clearly unimodal with mode point near $\mu$. … [T]he case where the dispersion is [so] large that the interpretation of $\mu$ as “position” becomes blurred, essentially because the large value of the dispersion parameter squeezes a lot of probability mass down towards the origin. What we mean by a “large” value of the dispersion parameter here is thus a value for which the non-normality of the distribution becomes evident.

The exponential (gamma with λ=1) is clearly “non-normal”.

$\mu$ is the mean and $\sigma^2=1/\lambda$ is the squared CV. The canonical parameter is $\theta=-1/\mu$ and the unit cumulant is $\kappa(\theta)=-\log(-\theta)$. The reproductive form $\mathrm{Ga}(\mu,\sigma^2)$ has density $$p(y;\mu,\sigma^2)=\frac{\lambda^\lambda}{\Gamma(\lambda)}\frac{y^\lambda}{y}\exp\left(-\lambda\left\{\frac{y}{\mu}+\log\mu\right\}\right).$$ The gamma satisfies an important scaling identity $c\,\mathrm{Ga}(\mu,\sigma^2)=\mathrm{Ga}(c\mu,\sigma^2)$. The duality identity $Z=Y/\sigma^2=\lambda Y$ gives $\mathrm{Ga}(\theta,\lambda)=\mathrm{Ga}(-\lambda/\theta,1/\lambda)$ (additive form on the left, reproductive on the right) and the additive density is $$p(z;\theta,\lambda)=\frac{z^{\lambda-1}}{\Gamma(\lambda)}\exp(z\theta+\lambda\log(-\theta))=\frac{(-\theta)^\lambda}{\Gamma(\lambda)}z^\lambda\frac{e^{z\theta}}{z}.$$

4.14 NEF Nonet for the inverse Gaussian distribution

Figure 6: The inverse Gaussian NEF with shape parameter 1. $\tau^{-1}(\mu)=-1/(2\mu^2)$ and $V(\mu)=1/(\tau^{-1})'(\mu)=\mu^3$. Compute $d$ by integrating $(y-t)/V(t)$.

The reproductive density is $$p(y;\theta,\lambda)=\sqrt{\frac{\lambda}{2\pi y^3}}\exp\left(-\frac{\lambda}{2y}+\lambda\left(\theta y+(-2\theta)^{1/2}\right)\right),$$ a reproductive EDM with $\kappa(\theta)=-\sqrt{-2\theta}$ and $\tau(\theta)=\mu=\frac{1}{\sqrt{-2\theta}}$. $\theta=0$ is valid, giving the density $p(y;0,\lambda)=\sqrt{\frac{\lambda}{2\pi y^3}}\exp\left(-\frac{\lambda}{2y}\right)$ of a scaled Lévy stable. The deviance is $d(y;\mu)=\frac{(y-\mu)^2}{\mu^2y}$.

5 Identifying the Bar-Lev-Enis-Tweedie Power Variance Family Distributions

aka The NEF Nonet for the Bar-Lev-Enis-Tweedie PVF Family, $p\ne0,1,2,3$

identifying BLETPVF members — hiding in plain sight - aggregation operator — tilting commutes with aggregation (ASTIN FFT paper)

From Tweedie post

Figure 7: The NEF circle for the PVF families, $p\notin\{1,2\}$, $\alpha=(p-2)/(p-1)$, $(1-\alpha)(p-1)=1$. Taking $\alpha=2$, $p=0$ produces the normal distribution, $\alpha=0$, $p=2$ the gamma, and $\alpha=-\infty$, $p=1$ the Poisson.

5.1 Solving for the Cumulant Generating Function

For any NEF we know $\kappa'(\theta)=\tau(\theta)=\mu$ and $\kappa''(\theta)=\tau'(\tau^{-1}(\mu))=1/(\tau^{-1})'(\mu)=V(\mu)$, by the results in Part II. For the power variance family, we can integrate $(\tau^{-1})'(\mu)=1/V(\mu)=\mu^{-p}$ to get $$\theta=\tau^{-1}(\mu)=\frac{\mu^{1-p}}{1-p},$$ provided $p\ne1$. Rearranging determines $$\kappa'(\theta)=\mu=\{(1-p)\theta\}^{\frac{1}{1-p}}.$$ We can integrate this expression for $\kappa'(\theta)$ and, for $p\ne2$, obtain $$\kappa(\theta)=(1-p)^{\frac{1}{1-p}}\,\frac{\theta^{\frac{2-p}{1-p}}}{\frac{2-p}{1-p}}=(1-p)^{\frac{2-p}{1-p}}\,\frac{\theta^{\frac{2-p}{1-p}}}{2-p}=\frac{1}{2-p}\,\{(1-p)\theta\}^{\frac{2-p}{1-p}}$$ since $\frac{1}{1-p}+1=\frac{2-p}{1-p}$. Substituting $\alpha=\frac{2-p}{1-p}$, and noting $p=\frac{\alpha-2}{\alpha-1}$, $1-p=\frac{1}{\alpha-1}$, and $\frac{1}{2-p}=\frac{\alpha-1}{\alpha}$, yields $$\kappa(\theta)=\frac{\alpha-1}{\alpha}\left(\frac{\theta}{\alpha-1}\right)^\alpha.$$ For reference, the most basic stable distribution with tail parameter $\alpha$ has cumulant generating function $\gamma|\theta|^\alpha$ for a scale constant $\gamma$, which has the same form.
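A symbolic check that this $\kappa$ reproduces the power variance function, i.e. $\kappa''=(\kappa')^p$ with $p=(\alpha-2)/(\alpha-1)$. The substitution $\alpha=1+b$, $b>0$, keeps sympy's power simplifications valid; this is a sketch, not a proof for all parameter ranges.

```python
# Check kappa''(theta) = kappa'(theta)**p for the PVF cumulant generator.
import sympy as sp

theta, b = sp.symbols('theta b', positive=True)
alpha = 1 + b                      # alpha > 1 so theta/(alpha - 1) > 0
kappa = (alpha - 1) / alpha * (theta / (alpha - 1))**alpha
p = (alpha - 2) / (alpha - 1)
check = sp.simplify(sp.diff(kappa, theta, 2) - sp.diff(kappa, theta)**p)
print(check)  # 0
```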

The two excluded cases, $p=1,2$, work out as follows. When $p=1$, $\alpha=-\infty$. The first integral yields $\theta=\log(\mu)$ and the second $\kappa(\theta)=e^\theta$, giving a Poisson. And when $p=2$, $\alpha=0$. The first integral yields $\theta=-\mu^{-1}$ and the second $\kappa(\theta)=-\log(-\theta)$, giving a gamma.

One further case will occur, which in a sense corresponds to $p=\infty$. It turns out the correct interpretation is $V(\mu)=e^\mu$. Then, the first integral yields $\theta=-e^{-\mu}$ and the second $\kappa(\theta)=\theta-\theta\log(-\theta)$. This will give an extreme stable distribution with $\alpha=1$ (an extreme Cauchy, also known as the Landau distribution).

To summarize: the cumulant generator for the power variance family is given by $$\kappa_p(\theta)=\begin{cases}\dfrac{\alpha-1}{\alpha}\left(\dfrac{\theta}{\alpha-1}\right)^\alpha & p\ne1,2;\ \alpha=\dfrac{2-p}{1-p}\\[1ex] e^\theta & p=1;\ \alpha=-\infty\\[1ex] -\log(-\theta) & p=2;\ \alpha=0\\[1ex] \theta-\theta\log(-\theta) & p=\infty;\ \alpha=1\end{cases}$$ where the subscript indicates dependence on $p$. The normal, $\alpha=2$, $p=0$, is a special case of the first row.

The canonical parameter domain $\Theta_p$ is the largest interval of real numbers for which $\kappa_p(\theta)$ is finite. Therefore $$\Theta_p=\begin{cases}(-\infty,\infty) & p=0,\ \alpha=2;\quad p=1,\ \alpha=-\infty\\[0.5ex] [0,\infty) & p<0,\ 1<\alpha<2;\quad 0<p<1,\ \alpha>2\\[0.5ex] (-\infty,0) & 1<p\le2,\ \alpha\le0\\[0.5ex] (-\infty,0] & 2<p<\infty,\ 0<\alpha<1\\[0.5ex] (-\infty,0] & p=\infty,\ \alpha=1.\end{cases}$$ In the last row we allow $\theta=0$ because $\lim_{\theta\to0^-}\theta\log(-\theta)=0$.

In general, a member of the NEF generated by $V$ has cumulant generating function $K_\theta(t)=\kappa(\theta+t)-\kappa(\theta)$. Assuming $p\ne1,2$ and substituting for $\kappa$ produces $$K_\theta(t)=\frac{\alpha-1}{\alpha}\left\{\left(\frac{\theta+t}{\alpha-1}\right)^\alpha-\left(\frac{\theta}{\alpha-1}\right)^\alpha\right\}=\frac{\alpha-1}{\alpha}\left(\frac{\theta}{\alpha-1}\right)^\alpha\left\{\left(1+\frac{t}{\theta}\right)^\alpha-1\right\}.$$ Which distributions correspond to these cumulant generating functions? We distinguish the following situations:

  1. when $0\notin\Theta_p$ we identify $K_\theta$, and
  2. when $0\in\Theta_p$ we identify $K_0(t)=\kappa(t)$, with sub-cases:
    1. the normal and Poisson, where $0$ is in the interior of $\Theta_p$, and
    2. the second, fourth and fifth rows, where $0$ is a boundary point of $\Theta_p$. These NEFs are not regular and $\theta=0$ corresponds to a distribution with infinite mean that is not part of the family.

Figure 8 provides a handy reference for the relationship between $p$ and $\alpha$ and the corresponding distributions. The analysis below is ordered by $\alpha$ (along the x-axis), starting with $\alpha=2$, the normal. The line colors show the range of $\alpha$ values for extreme stable ($1<\alpha<2$), positive extreme stable ($0<\alpha<1$), and Tweedie ($\alpha<0$) families. No distributions correspond to $0<p<1$, $\alpha>2$ on the extreme right.

Figure 8: The power variance families.

5.2 Identifying Bar-Lev-Enis-Tweedie PVF Distributions

  • When $\alpha=2$ and $p=0$, then $\theta=0\in\Theta_p$ and $K_0(t)=\kappa(t)=\frac{1}{2}t^2$, which is the cumulant function for the normal distribution. As expected, $V(\mu)=\mu^0$ is constant. This NEF is regular because $0$ is an interior point of $\Theta_p$.

  • When $1<\alpha<2$ and $p<0$, then $\theta=0\in\Theta_p$ and $K_0(t)=\kappa(t)=\frac{\alpha-1}{\alpha}\left(\frac{t}{\alpha-1}\right)^\alpha$, which is an $\alpha$-stable distribution with Lévy density $|x|^{-\alpha-1}$ on $x<0$. It falls into the compensated IACP, case 3 group, discussed in Part III, and takes any real value, positive or negative, despite only having negative jumps. It has a thick left tail and thin right tail. Its mean is zero, but the variance does not exist. A tilt with $e^{\theta x}$, $\theta>0$, makes the left tail thinner, the right tail thicker, and increases the mean. The effect on the right tail is manageable because it is thinner than a normal, Zolotarev , Carr and Wu . As $\theta\to0$ the mean decreases to zero and the variance increases, finally becoming infinite when $\theta=0$.

  • When $\alpha=1$ and $p=\infty$, then $\theta=0\in\Theta_p$. Here we interpret $V$ to be the exponential variance function and have seen $K_0(t)=\kappa(t)=t(1-\log(-t))$, corresponding to an extreme stable distribution with $\alpha=1$, Jørgensen and Eaton, Morris, and Rubin . The NEF exhibits a kind of super-contagion between risks, resulting in an exponential increase in variance with the mean. The distribution is still case 3 and is supported on the whole real line.

  • When $0<\alpha<1$ and $p>2$, then $0\in\Theta_p$ and $K_0(t)=\kappa(t)=-\frac{1-\alpha}{\alpha}\left(\frac{-t}{1-\alpha}\right)^\alpha$, which is an extreme stable with Lévy density proportional to $x^{-\alpha-1}$, $x>0$. The distribution is very thick tailed and does not have a mean ($\kappa(\theta)$ is not differentiable at $0$ because it has a cusp). In the case $\alpha=1/2$ we get the Lévy stable distribution and the inverse Gaussian NEF.

  • When $\alpha=0$ and $p=2$, then $0\notin\Theta_p$ and we analyze $$K_\theta(t)=\kappa(t+\theta)-\kappa(\theta)=-\log(-(t+\theta))+\log(-\theta)=-\log\left(1+\frac{t}{\theta}\right)=-\log\left(1-\frac{t}{-\theta}\right),$$ which is a gamma distribution with rate $-\theta$ and shape 1.

  • When $\alpha<0$ and $1<p<2$, then $0\notin\Theta_p$ and we analyze $$K_\theta(t)=\frac{\alpha-1}{\alpha}\left(\frac{\theta}{\alpha-1}\right)^\alpha\left\{\left(1+\frac{t}{\theta}\right)^\alpha-1\right\}$$ for $\theta<0$. The value $\theta/(\alpha-1)$ is positive and $(1-t/(-\theta))^\alpha$ is the MGF of a gamma with shape $-\alpha$ and rate $-\theta$. Thus $K_\theta(t)$ is the cumulant generating function for a CP with expected frequency $\frac{\alpha-1}{\alpha}\left(\frac{\theta}{\alpha-1}\right)^\alpha>0$ and gamma severity. Distributions with $1<p<2$ are called Tweedie distributions. Finally, we have demonstrated why Tweedie distributions are a compound Poisson with gamma severity! A simulation check appears after this list.

  • When $\alpha=-\infty$ and $p=1$, then $0$ is an interior point of $\Theta_p$ and the NEF is again regular, like the normal. Now $K_0(t)=\kappa(t)-\kappa(0)=e^t-1$ is the cumulant function for the Poisson distribution.

  • Finally, when $\alpha>2$ and $0<p<1$, there is no corresponding NEF. Arguing by contradiction, suppose there is a NEF with $V(\mu)=\mu^p$, $0<p<1$. Since $\theta=0\in\Theta_p$ the NEF has a generating distribution with variance $\kappa''(0)=\left.\left(\frac{\theta}{\alpha-1}\right)^{\alpha-2}\right|_{\theta=0}=0$. A distribution with zero variance is degenerate and therefore has a cumulant generating function $ct$ for some constant $c$. But this contradicts the known form of $\kappa$. Therefore, no such NEF can exist. This result was first proved in Jørgensen . Considering a possible NEF with variance function in this range reveals it would be very odd: its variance explodes, relative to the mean, as the mean increases from zero because $V$ is not differentiable at zero. It is hard to conceive how such a thing would occur, and Jørgensen’s result confirms that it cannot.
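The Tweedie identification is easy to check by simulation. A minimal sketch using the standard $(\mu,p,\phi)$ to compound Poisson conversion; the aggregate package's Tweedie class handles this more carefully.

```python
# Tweedie(mu, p, phi) as CP with gamma severity: mean mu, variance phi*mu^p.
import numpy as np

rng = np.random.default_rng(1)
mu, p, phi = 10.0, 1.5, 2.0
lam = mu**(2 - p) / (phi * (2 - p))  # Poisson frequency
shape = (2 - p) / (p - 1)            # gamma severity shape
scale = phi * (p - 1) * mu**(p - 1)  # gamma severity scale

N = rng.poisson(lam, 100_000)
Y = np.array([rng.gamma(shape, scale, n).sum() for n in N])
print(Y.mean(), mu)                  # should agree
print(Y.var(), phi * mu**p)          # should agree
```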

We have now covered all the possible values for p and α and identified all the distributions in the power variance family. The findings are summarized in Table 2.
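As a concrete check on the compound Poisson identification, here is a minimal simulation sketch. It uses plain numpy rather than the post's aggregate-based tooling, and the parameter values are illustrative assumptions. It builds a Tweedie variable from a Poisson number of gamma severities, using the standard reparameterization of $(\mu,\phi,p)$ into frequency and severity, and compares the sample mean, variance, and mass at zero with $\mu$, $\phi\mu^p$, and $e^{-\lambda}$.

Code
import numpy as np

rng = np.random.default_rng(12345)
mu, phi, p = 10.0, 2.0, 1.5            # illustrative Tweedie parameters, 1 < p < 2

# standard compound Poisson-gamma reparameterization
lam = mu**(2 - p) / (phi * (2 - p))    # expected claim count
a = (2 - p) / (p - 1)                  # gamma severity shape
g = phi * (p - 1) * mu**(p - 1)        # gamma severity scale

n = 1_000_000
counts = rng.poisson(lam, size=n)
total = np.zeros(n)
pos = counts > 0
# a sum of N independent gamma(a, scale=g) severities is gamma(N*a, scale=g)
total[pos] = rng.gamma(counts[pos] * a, scale=g)

print(total.mean(), mu)                    # sample mean vs mu
print(total.var(), phi * mu**p)            # sample variance vs phi * mu**p
print((total == 0).mean(), np.exp(-lam))   # mass at zero vs exp(-lam)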

We can identify the Lévy measure underlying power variance NEFs as a corollary of this analysis, Vinogradov . Under exponential tilting, a Lévy density j transforms into $e^{\theta x}j(x)$; using the smoothing function $1_{[0,1]}$, the tilted log characteristic function involves integrals of $e^{\theta x}(e^{itx}-1-itx)j(x)$. From the classification given in the first section, we see that all PVF exponential distributions except the normal and Poisson have a jump density proportional to $$j(x)=x^{-\alpha-1}e^{\theta x} \tag{1}$$ for α<2 and varying θ. When α=2, the normal case, j is interpreted as the Dirac $\delta_0$ measure: there are no jumps in a Brownian motion. When α=−∞, the Poisson case, j is interpreted as $\delta_\mu$, where θ grows with α so that $\mu=\lim_{\alpha\to-\infty}(2-\alpha)/(-\theta)$ is fixed. The implied tilting normalizes $x^{-\alpha-1}e^{\theta x}$ to have mean μ; see the Tweedie compound Poisson case above.

In a sense, the Equation 1 density can be regarded as a universal severity distribution. When α<0, the Tweedie range, θ<0 is required to ensure j has finite mass. When α>0, θ=0 defines a legitimate stable distribution. The common jump density binds all PVF distributions together even though their individual densities are quite different. It is striking that there was no a priori reason to expect the power variance function distributions to have power Lévy jump densities in this way.

Figure 7 shows the completed NEF Circle for the PVF for $p\notin\{0,1,2\}$. The deviance, log likelihood, and cumulant generator are also correct for p=0, α=2, but the density obviously becomes the normal distribution density. The Lévy measure is added at the top to unify the diagram. Series expansions are available for the carrier c and the density, see Jørgensen Section 4.2 or Sato Section 14.

5.3 Fact 1: MGF of a Poisson

If $N\sim\mathrm{Po}(\lambda)$ is Poisson, its MGF is $$M_N(t)=E[e^{tN}]=\sum_{n\ge0}e^{tn}\frac{\lambda^ne^{-\lambda}}{n!}=e^{-\lambda}\sum_{n\ge0}\frac{(e^t\lambda)^n}{n!}=e^{-\lambda}e^{\lambda e^t}=e^{\lambda(e^t-1)}.$$
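The formula is easy to confirm numerically; a quick sketch using scipy (the values of λ and t are arbitrary assumptions):

Code
import numpy as np
from scipy.stats import poisson

lam, t = 3.0, 0.4
n = np.arange(200)                        # truncate the series deep in the tail
series = np.sum(np.exp(t * n) * poisson.pmf(n, lam))
closed_form = np.exp(lam * (np.exp(t) - 1.0))
print(series, closed_form)                # agree to machine precision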

5.4 Fact 2: form of gamma Lévy measure

A shape α, rate β gamma has density $f(x)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}$, mean α/β, and variance α/β². If α is an integer it is the distribution of a sum of α independent exponentials with mean 1/β, which gives the waiting time for α particles when β particles are expected per unit time with a Poisson distribution. Obviously, the waiting time for one particle has mean 1/β. The MGF is $M(t)=\left(1-\frac{t}{\beta}\right)^{-\alpha}$, which shows that gammas with the same rate form an additive family.

The Lévy measure of the gamma has density $j(x)=\frac{\alpha e^{-\beta x}}{x}$ on (0,∞), which has the correct expectation $\int_0^\infty \alpha e^{-\beta x}\,dx=\alpha/\beta$. The infinite activity compound Poisson has log MGF given by $$\int_0^\infty(e^{tx}-1)j(x)\,dx=\alpha\int_0^\infty(e^{tx}-1)\frac{e^{-\beta x}}{x}\,dx.$$ That this equals the log MGF of the gamma follows from some trickery. First, $$\int_0^t\frac{du}{\beta-u}=\frac{1}{\beta}\int_0^t\frac{du}{1-u/\beta}=-\log(1-u/\beta)\Big|_0^t=-\log(1-t/\beta).$$ Next, $$\int_0^t\frac{du}{\beta-u}=\int_0^t\int_0^\infty e^{-x(\beta-u)}\,dx\,du=\int_0^\infty\int_0^t e^{-x(\beta-u)}\,du\,dx=\int_0^\infty e^{-x\beta}\int_0^t e^{xu}\,du\,dx=\int_0^\infty(e^{xt}-1)\frac{e^{-x\beta}}{x}\,dx.$$ Combining the two, multiplying by α, and comparing with the gamma MGF gives the desired result.
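The identity can be checked numerically; a minimal sketch with scipy.integrate.quad (α, β, and t<β are arbitrary picks):

Code
import numpy as np
from scipy.integrate import quad

alpha, beta, t = 2.5, 3.0, 1.2            # require t < beta

# Levy / compound Poisson form: the integrand tends to t as x -> 0, so no pole
levy_form, _ = quad(lambda x: (np.exp(t * x) - 1.0) * np.exp(-beta * x) / x,
                    0, np.inf)
gamma_log_mgf = -np.log(1.0 - t / beta)
print(alpha * levy_form, alpha * gamma_log_mgf)   # match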

5.5 Fact 3: about gamma functions

The gamma function is defined for $\operatorname{Re}(a)>0$ by $$\Gamma(a)=\int_0^\infty x^{a-1}e^{-x}\,dx=\int_0^\infty x^{a}e^{-x}\,\frac{dx}{x},$$ the second form using Haar measure on $\mathbb{R}_+$.

Factorial property: Γ(z+1)=zΓ(z) follows using integration by parts.

Let 0<a<1 be real. Then, applying the parts trick, noting that the term $e^{-x}-1$ removes the pole at zero, and counting the three minus signs in the second term, $$\int_0^\infty(e^{-x}-1)\frac{dx}{x^{a+1}}=-\frac{1}{a}\left.\frac{e^{-x}-1}{x^{a}}\right|_0^\infty-\frac{1}{a}\int_0^\infty x^{1-a}e^{-x}\,\frac{dx}{x}=-\frac{1}{a}\Gamma(1-a)=\Gamma(-a),$$ where the last equality uses $(-a)\Gamma(-a)=\Gamma(1-a)$ from the factorial property. This integral is negative.

Let 1<a<2 be real. Since 0<a−1<1, we can use the previous result to bootstrap $$\int_0^\infty(e^{-x}-1+x)\frac{dx}{x^{a+1}}=-\frac{1}{a}\int_0^\infty(e^{-x}-1)\frac{dx}{x^{(a-1)+1}}=\frac{1}{a}\,\frac{1}{a-1}\,\Gamma(2-a)=\Gamma(-a).$$ The adjusted exponential term cancels the higher-order pole. Note the sign change from differentiating the exponential term. This integral is positive.
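Both integrals are easy to confirm numerically; a sketch with scipy (the a values are arbitrary picks in each range, and the integrals are split at 1 to isolate the endpoint singularity):

Code
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma


def integral_0_inf(f):
    """Integrate f over (0, inf), splitting at 1."""
    v1, _ = quad(f, 0, 1)
    v2, _ = quad(f, 1, np.inf)
    return v1 + v2


a = 0.6   # 0 < a < 1: integral of (e^{-x}-1) x^{-a-1} is Gamma(-a) < 0
print(integral_0_inf(lambda x: (np.exp(-x) - 1.0) * x**(-a - 1.0)), gamma(-a))

a = 1.5   # 1 < a < 2: integral of (e^{-x}-1+x) x^{-a-1} is Gamma(-a) > 0
print(integral_0_inf(lambda x: (np.exp(-x) - 1.0 + x) * x**(-a - 1.0)), gamma(-a))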

5.6 Fact 4: form of ch f in Lévy decomposition

Let J be a jump exceedance distribution (a decreasing function): J(x) represents the frequency of jumps of size x or more. Cat model output typically gives J as the sum of the frequencies of events of size x or more. Suppose J has density j, meaning the frequency of jumps of sizes in $I_x=[x,x+dx]$ is $J(x)-J(x+dx)\approx j(x)\,dx$. Define A=A(J) to be the Poisson aggregate of J, meaning jumps appear with Poisson intensity. Taking each jump in $I_x$ to have size x, the total is distributed $\sum_x x\,\mathrm{Po}(j(x)dx)$. By assumption losses of different sizes are independent. Thus A has MGF $$M_A(t)=E\left[\exp\left(t\sum_x x\,\mathrm{Po}(j(x)dx)\right)\right]=\prod_x E\left[\exp\left(t\,x\,\mathrm{Po}(j(x)dx)\right)\right]=\prod_x e^{j(x)dx\,(e^{tx}-1)}.$$ At this point, it is easier to work with cumulant functions, taking logs to convert the product into a sum, $$\log M_A(t)=\sum_x(e^{tx}-1)j(x)\,dx\to\int_0^\infty(e^{tx}-1)j(x)\,dx$$ using Riemann sums, thus explaining the form of the cumulant function of aggregates with Poisson frequency that appears in the Lévy formula.

For stable variables the tail jumps are powers with density proportional to $j(x)=1/|x|^{a+1}$ in each tail. Specifically, the Lévy representation uses $$\log f(t)=i\gamma t-\frac{\sigma^2t^2}{2}+\int_{-\infty}^0\left(e^{itu}-1-\frac{itu}{1+u^2}\right)dM(u)+\int_0^\infty\left(e^{itu}-1-\frac{itu}{1+u^2}\right)dN(u)$$ where in the stable case $M(u)=-c_1/|u|^{\alpha}$ and $N(u)=-c_2/u^{\alpha}$, with $c_1,c_2\ge0$, $c_1+c_2>0$, and σ=0 unless α=2, in which case $c_1=c_2=0$ and σ>0 (see Gnedenko and Kolmogorov , table on p. 164; see also Lukacs , p. 132). For 1<α<2, the Lévy-Khinchine representation can be adjusted in the constant term to give the log ch f as $$\log f(t)=c_1\int_{-\infty}^0(e^{itx}-1-itx)\frac{dx}{|x|^{a+1}}+c_2\int_0^\infty(e^{itx}-1-itx)\frac{dx}{x^{a+1}},$$ where $c_1,c_2\ge0$ are the tail weights.

Look at the second integral, $I_2$, for t>0. We need to get rid of the i in the exponent. Use Cauchy's theorem on a contour from the origin to R>0, around a quarter circle to iR, and back to the origin. There are no poles in this region so the integral is zero. The integral along the arc tends to 0 as R→∞, as can be seen by looking at the absolute value of the integrand and remembering the integral is only over a quarter arc. Then, rotating the contour onto the imaginary axis, substituting v=−itx, and using the previous integral, we get $$I_2=\int_0^\infty(e^{itx}-1-itx)\frac{dx}{x^{a+1}}=t^a\,i^{-a}\int_0^\infty(e^{-v}-1+v)\frac{dv}{v^{a+1}}=t^a\,i^{-a}\,\Gamma(-a),\qquad\Gamma(-a)>0.$$ Now $i^{-a}=e^{-a\log i}=e^{-ia\pi/2}$, giving $I_2=t^ae^{-ia\pi/2}\Gamma(-a)$. The first integral is the conjugate of the second (substitute $x\mapsto-x$), giving $I_1=t^ae^{ia\pi/2}\Gamma(-a)$. Assembling the parts, $$\begin{aligned}\log f(t)&=t^a\Gamma(-a)\left(c_1e^{ia\pi/2}+c_2e^{-ia\pi/2}\right)\\&=t^a\Gamma(-a)\left((c_1+c_2)\cos(a\pi/2)+i(c_1-c_2)\sin(a\pi/2)\right)\\&=t^a\Gamma(-a)(c_1+c_2)\cos(a\pi/2)\left(1+i\,\frac{c_1-c_2}{c_1+c_2}\tan(a\pi/2)\right)\\&=-ct^a\left(1+i\beta\tan(a\pi/2)\right),\end{aligned}$$ where $c=-\Gamma(-a)(c_1+c_2)\cos(a\pi/2)$ and $\beta=\frac{c_1-c_2}{c_1+c_2}$. Note that c≥0 (because $\cos(a\pi/2)<0$ for 1<a<2) and −1≤β≤1. That's where all the weird bits come from! When t<0 use $f(-t)=\overline{f(t)}$. The term $t^a$ is replaced by $(-t)^a=|t|^a$. The tan term swaps sign (conjugate), which is effected by the factor $t/|t|$, which has no effect when t>0. Thus for all t we get $$\log f(t)=-c|t|^a\left(1+i\beta\frac{t}{|t|}\tan(a\pi/2)\right).$$ A similar argument works when 0<a<1, where there is no need for the linear compensation term. In that case the linear term can be subsumed into the constant (not tracked).
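The key formula $I_2=t^ae^{-ia\pi/2}\Gamma(-a)$ can be checked numerically, integrating the real and imaginary parts separately. A minimal sketch; a and t are arbitrary picks with 1<a<2 and t>0:

Code
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

a, t = 1.5, 1.0


def integral_0_inf(f):
    """Integrate f over (0, inf), splitting at 1 for the endpoint singularity."""
    v1, _ = quad(f, 0, 1, limit=200)
    v2, _ = quad(f, 1, np.inf, limit=200)
    return v1 + v2


re_part = integral_0_inf(lambda x: (np.cos(t * x) - 1.0) * x**(-a - 1.0))
im_part = integral_0_inf(lambda x: (np.sin(t * x) - t * x) * x**(-a - 1.0))
print(re_part + 1j * im_part)
print(t**a * np.exp(-1j * a * np.pi / 2) * gamma(-a))   # same value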

This argument fails when a=1 because Γ(−a) has a pole at a=1, and because we can't subsume the rest of the trend adjustment term into the constant (the integral does not exist). Use the truncation function $x/(1+x^2)$, which looks like the identity close to x=0. Split the integral into its real and imaginary parts. The real part is a well known integral and gives $-t\pi/2$. The imaginary part becomes t times $$\lim_{\epsilon\to0}\int_\epsilon^{t\epsilon}\frac{\sin v}{v^2}\,dv=\lim_{\epsilon\to0}\int_\epsilon^{t\epsilon}\frac{dv}{v}=\log t.\;(!)$$ From there, it's the same conjugation, combining the parts, and sloughing off the awkward linear-in-t term into the constant, to arrive at $$\log f(t)=-c|t|\left(1+i\beta\frac{t}{|t|}\frac{2}{\pi}\log|t|\right).$$ This treatment follows Gnedenko and Kolmogorov , which is a marvel of clarity thanks to this rock star mathematician.

Andrey Nikolayevich Kolmogorov.

5.7 Spectrally negative 1<α<2

When X is Lévy stable with 1<α<2 and β=−1 the variable can still take positive values because of the compensation term. The variable X can be thought of as split into independent small and large jump parts: $X=X_s+X_l$. The large jump part is a standard compound Poisson with jump density $j(x)=1/|x|^{\alpha+1}$. The small part is a sum of compensated over-dispersed Poisson variables of the form $x_n(\mathrm{Po}(\lambda_n)-\lambda_n)$ where $x_n\to0$ while $\lambda_n\to\infty$, details TBD. Since the counts are large the Poisson component is approximately normal, and so each term is approximately $N(0,\sigma_n^2)$ where $\sigma_n^2=x_n^2\lambda_n$. Thus the total effect of this part is approximately normal with variance $\sum_n x_n^2\lambda_n$. However, that doesn't quite work out! The actual result, Nolan page 100 (see also Zolotarev Section 2.5, Uchaikin and Zolotarev p. 127), works out to be, up to a constant, $$\log\Pr(X\ge x)\sim -c_2\,x^{\alpha/(\alpha-1)}$$ as x→∞, for $X\sim St(\alpha,\beta=-1,0,0;S1)$ in Nolan's notation. For α close to 1 the exponent is very large and as α→2 the exponent approaches 2, as expected.

Details: TBD another research project. Time to move on.

6 Parameter Interpretation

examples — roles for λ and μ — modeling/defining a unit — movies — pictures — Tweedie distribution special cases — cat and non-cat div/non-div growth — reasonable p — other (Hougaard) applications — pictures of all sevs of all V — split into Tweedie - gamma - stable domains — all severity curves

From Probability Models for Insurance post. Originally called Exponential Dispersion Models, New From Old, Volume.

6.1 All the severity curves

Code
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm

fig, ax = plt.subplots(1, 1, figsize=(3.0, 3.0),
    constrained_layout=True)
# log-spaced grid, increasing from about 1.4e-7 up to 0.9**-9 ≈ 2.58
xs = 0.9 ** np.linspace(-9, 150, 1001)[::-1]
ps = np.array([-3, -2, -1, -0.5, -0.25, -0.125, 0, 0.125, 0.25, 0.5,
               1, 2, 3, 5, 10, 25])
colors = cm.viridis(np.linspace(0, 1, len(ps)))
for p, color in zip(ps, colors):
    ax.plot(xs, xs**p, lw=1, color=color)
# set the axes once, after all curves are drawn
ax.set(ylim=[-0.05, 2.5], xlim=[-0.05, 2.5],
       xticks=[0, 1, 2.5], yticks=[0, 1, 2.5],
       xticklabels=['', '1', ''],
       yticklabels=['', '1', ''],
       aspect='equal');
Figure 9: A wide range of potential severity “densities”.

Split these up.

Code
fig, axs = plt.subplots(1, 3, figsize=(6.0, 2.0),
    constrained_layout=True)
for p, color in zip(ps, colors):
    if p < 0:
        ax = axs[0]
    elif p == 0:
        ax = axs[1]
    else:
        ax = axs[2]
    ax.plot(xs, xs**p, lw=1, color=color)
for ax in axs.flat:
    ax.set(ylim=[-0.05, 2.5], xlim=[-0.05, 2.5],
          xticks=[0,1,2.5], yticks=[0,1,2.5],
          xticklabels=['', '1', ''],
          yticklabels=['', '1', ''],
          aspect='equal')
axs[0].set(title='Downward sloping densities.')
axs[1].set(title='All "equally likely".')
axs[2].set(title='Increasing densities?!');
Figure 10: Three kinds of “densities”.

7 The Four (or Five) S Transformations

shift — scale — slant — size/shape — symmetry/swap

Figure 11: The impact of shifting, scaling, growth and shaping on a gamma density.

7.1 New from Old

In this section we describe four ways to make a new NEF from an old one:

  • shifting,
  • scaling,
  • shaping, and
  • reciprocity.

The effect of the first three of these transformations is illustrated in Figure 11. In every case the underlying distribution is a gamma.

By definition, exponential tilting operates within a NEF, c.f., the NEF Circle. It does not create a new NEF.

A distortion operator, such as the Wang or proportional hazard transform, is also familiar to actuaries. The impact of most distortions on NEFs is unclear. There are exceptions. For example, the Wang transform acts as an exponential tilt on normal variables.

This section describes the impact of each transformation on the variance function. It gives several examples of reciprocity, which turns out to be related to surplus growth or probability of default.

Throughout this section, X and X1 have means m and m1 and variance functions V and V1. They generate NEFs F and F1 with mean domains M and M1, the interiors of the sets of possible means. The NEF is uniquely specified by the distribution of X or by $M_F$ and V.

k is a constant, c is the carrier distribution, and V is the variance function.

| Method | Math | Examples |
|---|---|---|
| Shift | $X\mapsto X+k$ | Location |
| Scale | $X\mapsto kX$ | |
| Slant | $f(x)\mapsto k\,e^{\theta x}f(x)$ | Esscher transform, tilting |
| Shape | $c\mapsto c^{\star\lambda}$ | Additive and reproductive EDMs |
| Symmetry | $V(m)=m^3V_1(1/m)$ | Letac-Mora reciprocity |

Building complex models from simple ones is a time-honored method. It keeps the number of fundamentally different models to a minimum. For example, all normal distributions are shifted and scaled versions of a standard normal. Scaling and shaping are the same operation for a normal distribution, but in general they are not. Lognormal and gamma distributions have an additional shape parameter. Various programming languages exploit the shift/scale/shape paradigm to order the menagerie of probability distributions.
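For example, scipy.stats implements exactly this paradigm: every distribution accepts loc (shift) and scale arguments, plus shape parameters where the family needs them. A small illustrative sketch:

Code
from scipy import stats

# gamma with shape a=2, shifted by loc=1 and scaled by 3: X = 1 + 3*G, G ~ gamma(2)
X = stats.gamma(a=2, loc=1, scale=3)
print(X.mean(), X.var())    # 1 + 2*3 = 7 and 2*3**2 = 18

# the normal needs no shape parameter: shift and scale suffice
Z = stats.norm(loc=2, scale=0.5)
print(Z.mean(), Z.var())    # 2 and 0.25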

7.2 The SSSS EDM Transformations

A given base distribution X with density f can be adjusted in many ways to create a new distribution. Four relevant here are what I call the four-esses: shifting, scaling, slanting, and shaping.

  • Shifting (translation, location, position): $X\mapsto X+a$. The simplest transformation is a shift, translating the variable by a constant. This preserves the variance structure but alters location, effectively adjusting for deductibles, policy limits, or inflation-adjusted thresholds (for example, certain losses from a very large plan with a high deductible). In the context of EDMs, this corresponds to modifying the natural parameter without affecting the dispersion function. Shifting affects the mean but not higher central moments.

  • Scaling: $X\mapsto bX$, b>0. Scaling affects both the mean and variance, adjusting exposure by a multiplicative factor but leaving the coefficient of variation unchanged. This transformation is fundamental in loss modeling, where losses scale with insured amounts, though exact scaling is actually unusual (a reimbursement rate per day of hospitalization scales; property size of loss curves generally do not). Within the EDM framework, scaling interacts with the dispersion function in a predictable manner, maintaining the variance-mean power relationship. Scaling multiplies the nth cumulant by $b^n$ and so leaves normalized moments, such as the coefficient of variation and skewness, unchanged.

  • Slanting (exponential tilting or Esscher transform): $f(x)\mapsto k_\theta\,e^{\theta x}f(x)$, where $k_\theta$ normalizes the tilted density. Exponential tilting reweights the probability density function by adjusting the canonical parameter θ. This transformation is key in actuarial applications and to fitting with Generalized Linear Models. Each NEF corresponds to a deviance function d(y;μ) quantifying the discrepancy of an observation y from a fitted mean μ. The log likelihood of a unit is $-d(y;\mu)/2$ plus terms that vary within the EDM but not the NEF. Thus maximum likelihood estimation of μ is equivalent to minimum deviance estimation.

  • Shaping by adjusting the index parameter λ. The index parameter λ governs the stochastic variability of the process. Adjusting λ modifies the scale of fluctuations, influencing tail behavior and risk loading. In the Lévy process setting, this corresponds to altering the jump intensity, directly affecting claim frequency or severity modeling. In practice, shaping allows for parametric flexibility in capturing heterogeneous risk profiles.

These transformations provide a toolkit for adapting models to different actuarial settings and reinforce the connection between EDMs, Lévy processes, and the structural principles of risk aggregation.

In practical actuarial applications, these four transformations map naturally to real-world considerations.

  • Scaling adjusts for inflation, currency changes (what finance calls a change of numéraire), or unit conversions. These are essentially trivial changes that affect the mean but not the higher normalized moments of the distribution.
  • Slanting is central to exponential dispersion modeling, particularly in maximum likelihood estimation using Generalized Linear Models when the dispersion parameter is fixed, and in risk-adjusted pricing methods such as the Esscher transform.
  • Shaping reflects the size of the portfolio or the amount of exposure, as the dispersion parameter scales with the number of policies or time periods, stabilizing loss distributions in large samples.
  • Shifting is less immediately obvious but can be linked to policy deductibles, retentions, or regulatory thresholds, effectively translating losses relative to a reference point.

These four transformations offer both theoretical structure and practical flexibility in loss modeling.

The exposure severity curve, or size of loss distribution, may have infinite total mass: it is “unnormalizable”.

The “Four-Esses” provide a bridge between the mathematical framework of EDMs and practical insurance loss modeling. Scaling, as discussed, directly addresses inflation and different currencies. Slanting drives GLM fitting and risk-adjusted pricing. Shaping tailors the model to the specific characteristics of a unit in the portfolio, reflecting its size and diversity. And shifting, while requiring more nuanced interpretation, can account for deductibles in certain circumstances.

7.3 Affinities: Shift and Scale

Transformations such as $X_1=aX+b$ are called affinities.

For a shift, $X_1=X+b$ and $m_1=m+b$. Therefore $V_1(m_1)=V(m_1-b)$.

For a scale, $X_1=aX$, $m_1=am$ and $m=m_1/a$. Therefore $V_1(m_1)=a^2V(m_1/a)$, since $\operatorname{Var}(aX)=a^2\operatorname{Var}(X)$.

For a general affinity $X_1=aX+b$, $$V_1(m_1)=a^2V\left(\frac{m_1-b}{a}\right).$$

The mapping $X\mapsto aX+b$ gives a one-to-one mapping between F and F1.

7.4 Shape, Power, or Division

If $X_\nu$, ν≥0, is a Lévy process then $E[X_\nu]=\nu E[X_1]$ and $\operatorname{Var}(X_\nu)=\nu\operatorname{Var}(X_1)$. Therefore, if $V_\nu$ and $V_1$ are the respective variance functions of $X_\nu$ and $X_1$, then we get the shape transformation $$V_\nu(\mu)=\operatorname{Var}(X_\nu)=\nu\operatorname{Var}(X_1)=\nu V_1\!\left(\frac{\mu}{\nu}\right).$$

The shape transformation can be used in another way. Rather than consider $X_\nu$ as a family with increasing means we can consider $X_\nu/\nu$, which has a constant mean as ν varies. If $Y_\nu=X_\nu/\nu$ then $$V_\nu(\mu)=\operatorname{Var}(Y_\nu)=\frac{\operatorname{Var}(X_\nu)}{\nu^2}=\frac{\operatorname{Var}(X_1)}{\nu}=\frac{V_1(\mu)}{\nu}.$$ $X_\nu$ corresponds to the additive EDM that models total losses from a growing portfolio, and $Y_\nu$ to the reproductive EDM that models the pure premium or loss ratio. A quick numerical illustration follows below.

The power or division terminology arises by writing Xν as the ν-fold convolution of X1 using the ν power of the MGF. Division arises from the ability to “take roots” of an infinitely divisible distribution. The insurance meanings are losses over multiple periods and losses over sub-periods.
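The additive and reproductive readings can be illustrated with a quick Poisson sketch, where $V_1(m)=m$ (the λ and ν values are arbitrary assumptions):

Code
import numpy as np

rng = np.random.default_rng(42)
lam, nu, n = 2.0, 50, 200_000

# additive form: X_nu ~ Po(nu*lam) has mean mu = nu*lam and variance nu*V1(mu/nu)
X = rng.poisson(nu * lam, size=n)
print(X.mean(), X.var())      # both close to nu*lam = 100

# reproductive form: Y_nu = X_nu/nu has constant mean lam and variance V1(lam)/nu
Y = X / nu
print(Y.mean(), Y.var())      # close to lam = 2 and lam/nu = 0.04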

7.5 Symmetry or Reciprocity

Reciprocity is a more profound way to relate two NEFs. We define it first and then illustrate with several examples.

Given a NEF, define $\tilde\Theta$ to be those $\theta\in\Theta$ such that $\kappa'(\theta)>0$, i.e., canonical parameters corresponding to distributions with a positive mean, and define $M_F^+$ to be the image of $\tilde\Theta$ under the mean value mapping $\kappa'$. Thus $\kappa'$ is a bijection between $\tilde\Theta$ and $M_F^+$. NEFs F and F1 define a reciprocal pair if

  1. $\theta\mapsto-\kappa(\theta)$ maps $\tilde\Theta$ to $\tilde\Theta_1$,
  2. $\theta\mapsto-\kappa_1(\theta)$ maps $\tilde\Theta_1$ to $\tilde\Theta$, and
  3. $\kappa_1(-\kappa(\theta))=-\theta$ for all $\theta\in\tilde\Theta$.

For a reciprocal pair, the left-hand diagram in Figure 12 commutes. The meaning, starting from θ and θ1, is illustrated in the two right hand side diagrams.

Figure 12

Differentiating $\kappa_1(-\kappa(\theta))=-\theta$ shows $\kappa_1'(-\kappa(\theta))\,\kappa'(\theta)=1$, i.e., the diagram commutes.

The variance functions of a reciprocal pair satisfy the important reciprocity formula $$V(m)=m^3V_1\!\left(\frac{1}{m}\right),$$ which follows by differentiating $\kappa_1'(-\kappa(\theta))\,\kappa'(\theta)=1$ again and noting $m=\kappa'(\theta)$ and $\kappa_1'(-\kappa(\theta))=1/m$ to obtain $$\kappa_1''(-\kappa(\theta))\,(\kappa'(\theta))^2=\kappa_1'(-\kappa(\theta))\,\kappa''(\theta)\implies V_1\!\left(\frac{1}{m}\right)m^2=\frac{1}{m}V(m).$$ Unlike the formulas for affinities and shaping, the reciprocal variance transformation involves the mean outside $V(\cdot)$.

The variance function reciprocity formula shows that the set of NEFs with polynomial variance functions of order less than or equal to three is closed under reciprocity. This fact allowed such NEFs to be completely classified by Letac and Mora . Classifying all NEFs with a polynomial variance function is a difficult problem, though it is known that any polynomial with positive coefficients that vanishes at m=0 is the variance function of some NEF with positive mean domain , Corollary 3.3. If V(0)=0 then the NEF must be supported on a subset of the positive or negative reals because of the non-degenerate requirement.

If V(0)=0, let $a:=\lim_{m\downarrow0}V(m)/m$ be the right derivative of V at zero. If a=0 then the corresponding NEF is continuous on the positive reals, but can have a mass at zero (Tweedie). If a>0 then the NEF is discrete and a is the smallest positive value with positive probability. For standard counting distributions a=1. Thus polynomial variance functions of the form V(m)=mW(m), W(0)≠0, correspond to counting distributions (Poisson V(m)=m, negative binomial m(1+m), binomial m(1−m), etc.) and those of the form $m^2W(m)$ to continuous distributions (gamma $V(m)=m^2$, inverse Gaussian $V(m)=m^3$, and Kendall-Ressel $m^2(1+m)$). The Bar-Lev-Enis-Tweedie family shows that fractional powers are also possible.

In general, reciprocity is mysterious; Louati says:

Even if in some cases, one of the reciprocal distributions is interpreted as the first hitting time distribution for a Lévy stochastic process corresponding to the other, the notion of reciprocity remains mysterious and needs a better understanding.

The first hitting time distribution interpretation runs as follows. In order to use standard notation we use t in place of ν to index processes, and we change signs of jumps so they are negative rather than positive—premium and income are positive, losses are negative.

Let Xt be a spectrally negative Lévy process, with no positive jumps. The Lévy measure of Xt is supported on the negative reals. Xt could be any of the following plus a positive drift: Brownian motion, an infinite activity process, or a compound Poisson process. Xt models the surplus process: cumulative premium minus cumulative losses. Losses are negative jumps and premium is a continuous positive trend.

Define the first hitting time for level x to be $T_x=\inf\{t>0: X_t\ge x\}$. Since $X_t$ has no upward jumps, $X_{T_x}=x$: it can't jump over x, it can only jump downwards. Next, $T_x$ is infinitely divisible. For any n, $T_x$ is the sum of n independent copies of $T_{x/n}$, because the Markov property of $X_t$ applies at a stopping time. We can reset $X_t$ at the random time $T_{x/n}$. As a result, $T_x$, x≥0, is a strictly increasing Lévy process. Such processes are called subordinators.

We can identify the distribution of $T_x$. By assuming $E[X_1]\ge0$ (i.e., premium is at least expected claims, so the surplus process is increasing on average) it is guaranteed that $X_t$ will hit any positive level x in finite time, $\Pr(T_x<\infty)=1$. Since $X_t$ is a Lévy process, its moment generating function is $E[e^{\theta X_t}]=e^{t\kappa_X(\theta)}$ where $\kappa_X$ is the cumulant generating function of $X_1$, which is independent of t.

The exponential tilt of $X_t$ has density $e^{\theta x-t\kappa_X(\theta)}f_t(x)$, where $f_t$ is the density of $X_t$. Since a density integrates to 1, we have $$E[e^{\theta X_t-t\kappa_X(\theta)}]=\int e^{\theta x-t\kappa_X(\theta)}f_t(x)\,dx=1$$ for all t≥0. Therefore the process $e^{\theta X_t-t\kappa_X(\theta)}$, t≥0, is a martingale. Martingales are a generalization of a constant process: they have no drift and constant mean. By the optional stopping theorem, the lack of drift also holds if we stop the process at a random stopping time. Stopping $e^{\theta X_t-t\kappa_X(\theta)}$ at $T_x$ and remembering $X_{T_x}=x$ produces an expression for the MGF of $T_x$: $$E[e^{\theta x-T_x\kappa_X(\theta)}]=1\implies M_{T_x}(-\kappa_X(\theta))=E[e^{-\kappa_X(\theta)T_x}]=e^{-\theta x}.$$ Taking logs of both sides gives $\kappa_{T_x}(-\kappa_X(\theta))=-\theta x$, where $\kappa_{T_x}$ is the cumulant generating function of $T_x$. Since $T_x$ is infinitely divisible we know that $\kappa_{T_x}=x\kappa_T$, where $\kappa_T$ is the cumulant generating function of $T_1$. As a result $$\kappa_T(-\kappa_X(\theta))=-\theta,$$ showing $T_x$ and $X_t$ define reciprocal NEFs, and explaining the name. The formula relies on the fact that $X_t$ is a Lévy process with no positive jumps and $E[X_1]\ge0$.

Example 1 (Reciprocity Examples)  

  1. Poisson process hitting level n ↦ gamma.
  2. Normal with no drift hitting level x ↦ Lévy stable.
  3. Normal with positive drift hitting level x>0 ↦ inverse Gaussian.
  4. Spectrally negative extreme stable (1<α<2, p<0) with positive drift (mean) hitting level x>0 ↦ positive extreme stable ($0<\alpha_1=1/\alpha<1$, $p_1=3-p$).

Example 2 (Bar-Lev-Enis-Tweedie reciprocity, $p\ne1,2$, in detail) Reciprocity requires $V(\mu)=\mu^3V_1(1/\mu)$. In the Bar-Lev-Enis-Tweedie case $V(\mu)=\mu^p$, so the reciprocal pair is $V_1(\mu)=\mu^3V(1/\mu)=\mu^{3-p}$. All four cases in Example 1 are Bar-Lev-Enis-Tweedie pairs $p\mapsto p_1=3-p$. The corresponding α values are related by $$\alpha_1=\frac{2-p_1}{1-p_1}=\frac{p-1}{p-2}=\frac{1}{\alpha}.$$ Recall also $(1-p)(\alpha-1)=(p-1)(1-\alpha)=1$.

When $p\ne1,2$, we have $\kappa(\theta)=\frac{\alpha-1}{\alpha}\left(\frac{\theta}{\alpha-1}\right)^{\alpha}$. Since $\alpha_1=1/\alpha$ gives $\frac{\alpha_1-1}{\alpha_1}=1-\alpha$ and $\alpha_1-1=\frac{1-\alpha}{\alpha}$, the pair $\kappa_1$, κ satisfies the reciprocity equation: $$\kappa_1(-\kappa(\theta))=(1-\alpha)\left(\frac{-\kappa(\theta)}{(1-\alpha)/\alpha}\right)^{1/\alpha}=(1-\alpha)\left(\left(\frac{\theta}{\alpha-1}\right)^{\alpha}\right)^{1/\alpha}=(1-\alpha)\,\frac{\theta}{\alpha-1}=-\theta.$$
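The cancellation is easy to confirm numerically. A sketch checking $\kappa_1(-\kappa(\theta))=-\theta$ for the normal/inverse Gaussian pair (α=2, $\alpha_1=1/2$, θ>0) and the self-reciprocal Tweedie case (α=α₁=−1, i.e., p=p₁=3/2, θ<0):

Code
def kappa(theta, alpha):
    """PVF cumulant generator (alpha-1)/alpha * (theta/(alpha-1))**alpha."""
    return (alpha - 1) / alpha * (theta / (alpha - 1))**alpha


for alpha, theta in [(2.0, 0.7), (-1.0, -0.7)]:
    alpha1 = 1 / alpha
    print(kappa(-kappa(theta, alpha), alpha1), -theta)   # the two values match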

Example 3 (Normal-Inverse Gaussian) Here is the classic example of reciprocity worked out “by hand”. Let $X_t=ct+\sigma B_t$ be a Brownian motion with a positive trend, so $T_x$ is guaranteed to be finite for x≥0. $X_t$ is normal with mean ct and variance $t\sigma^2$, by definition of a Brownian motion. The cumulant generator satisfies $$t\kappa_X(\theta)=\log E[e^{\theta X_t}]=\log E[e^{\theta\sigma B_t+ct\theta}]=t\left(c\theta+\frac{\sigma^2\theta^2}{2}\right)$$ by the well-known formula for the mean of a lognormal distribution.

Reciprocity shows $\kappa_T(-\kappa_X(\theta))=\kappa_T\!\left(-c\theta-\frac{\sigma^2\theta^2}{2}\right)=-\theta$. Inverting, set $y=-c\theta-\sigma^2\theta^2/2$, so $0=\frac{\sigma^2\theta^2}{2}+c\theta+y$, and the quadratic formula gives $$\theta=\frac{-c+\sqrt{c^2-2y\sigma^2}}{\sigma^2},$$ since the argument of the Laplace transform must be positive and the other root is negative. Therefore the cumulant generating function of $T_1$ is $$\kappa_T(y)=-\theta=\frac{c-\sqrt{c^2-2y\sigma^2}}{\sigma^2}=\frac{c}{\sigma^2}\left(1-\sqrt{1-\frac{2\sigma^2y}{c^2}}\right)=\frac{1}{\mu\sigma^2}\left(1-\sqrt{1-2\mu^2\sigma^2y}\right)$$ where μ=1/c. Consulting a table of Laplace transforms reveals this is the cumulant generating function for an inverse Gaussian distribution with mean μ and variance $\mu^3\sigma^2$ (λ is used for $1/\sigma^2$ in ). This reciprocal relationship between cumulant generating functions is why Tweedie chose the name inverse Gaussian (op. cit. p. 250). The mean makes sense: if we are traveling at speed c then it should take time x/c to travel distance x.

If σ²=1 and c=0, so there is no drift, then $\kappa_T(y)=-\sqrt{-2y}$, which is the cumulant generating function for a Lévy stable distribution with α=1/2. In the absence of drift, the waiting time $T_1$ has infinite expectation: you are guaranteed to get to level 1, just not quickly!

How are these facts related to exponential family distributions? By the inverse relationship of cumulant generating functions, the normal and inverse Gaussian are reciprocal distributions. Therefore the variance functions are related by $V_1(m)=m^3V(1/m)=\sigma^2m^3$, as expected. ∎
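A final numerical sketch checks Example 3 against scipy, assuming scipy's invgauss(mu, scale) convention in which an inverse Gaussian with mean m and shape λ is invgauss(mu=m/λ, scale=λ). It samples $T_1$ and compares the empirical value of $E[e^{-\kappa_X(\theta)T_1}]$ with $e^{-\theta}$ (c, σ, and θ are illustrative):

Code
import numpy as np
from scipy import stats

c, sigma, theta = 1.5, 1.0, 0.3
kX = lambda th: c * th + 0.5 * sigma**2 * th**2

# T_1 ~ IG(mean 1/c, shape 1/sigma^2)
mean, shape = 1 / c, 1 / sigma**2
T = stats.invgauss(mu=mean / shape, scale=shape).rvs(size=1_000_000, random_state=7)

print(T.mean(), 1 / c)                      # mean 1/c
print(T.var(), sigma**2 / c**3)             # variance mu^3 sigma^2 with mu = 1/c
print(np.exp(-kX(theta) * T).mean(), np.exp(-theta))   # kappa_T(-kappa_X(theta)) = -theta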

Example 4 (Extreme stable reciprocity) Example 3 generalizes to a reciprocity between the spectrally negative extreme stable NEFs (1<α<2, p<0) with positive mean and the positive extreme stable NEFs ($0<\alpha_1=1/\alpha<1$, $p_1=3-p$), exactly case 4 of Example 1.

Example 5 (Bar-Lev-Enis-Tweedie distribution reciprocity) Example 3 also generalizes to a reciprocity between Bar-Lev-Enis-Tweedie $Tw_p(\mu,\sigma^2)$ and $Tw_{3-p}(\mu,\sigma^2)$ for 1≤p≤2. This includes the reciprocity between a gamma and a Poisson process, where the gamma gives the waiting time for the Poisson process to reach a given level, case 1 of Example 1.

8 Volume Reconsidered

modeling a dynamic portfolio — div/non-div growth — interpreting 1<p<2

9 Additional Source Code

reproducible research — reproducing examples — extending examples

Here is the Python code needed to run the examples. It is executed using

#| echo: false
#| label: set-up-code
# also provides a separator in qmd between the yaml and first text block
%run post_code.py

before anything else is run. The code relies on an up-to-date version of the aggregate package, available from GitHub and PyPI. It contains standard programming nonsense.

"""
Code to support the Tweedie post.

Read in to the environment with %run post_code.py.

Pulls ideas from Going Spectral post (on graphics)...that's a work in progress.
"""

# all imports
from aggregate import build, qd, Distortion, Portfolio
from greater_tables import GT
from IPython.display import HTML, display
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def set_mpl_options():
    # matplotlib options
    PLOT_FACE_COLOR = 'white'
    FIGURE_BG_COLOR = '#e9e9f2'  # matches beamer background blue
    plt.rcParams.update({
        "axes.edgecolor": "black",
        "axes.facecolor": PLOT_FACE_COLOR,
        "axes.labelcolor": "black",
        "figure.dpi": 300,
        "figure.facecolor": FIGURE_BG_COLOR,
        "font.family": "serif",
        "font.size": 9,
        "legend.edgecolor": "none",
        "legend.facecolor": PLOT_FACE_COLOR,
        "legend.labelcolor": "black",
        "text.color": 'black',
        "xtick.color": 'black',
        "ytick.color": 'black',
        # font lists consulted by the family settings above
        'font.monospace': ['Ubuntu Mono', 'QuickType II Mono', 'Cascadia Mono',
                'DejaVu Sans Mono'],
        'font.sans-serif': ['Myriad Pro', 'Segoe UI', 'DejaVu Sans'],
        'font.serif': ['STIX Two Text', 'Times New Roman', 'DejaVu Serif'],
        'mathtext.fontset': 'stixsans',
    })

set_mpl_options()

# pandas options
def my_ff(x):
    """My float formatter: integers with commas, small and large floats adaptively."""
    try:
        if x == int(x):
            return f'{int(x):,d}'
        elif abs(x) < 0.005:
            return f'{x:.5g}'
        elif abs(x) < 100:
            return f'{x:.3f}'
        else:
            return f'{x:,.0f}'
    except (TypeError, ValueError):
        return str(x)


pd.options.display.float_format = my_ff


# greater tables
__gt_global = GT()


def qd(x, **kwargs):
    """Quick-display x via greater_tables; overrides the qd imported from aggregate."""
    return display(HTML(GT(x, **kwargs)))


def qdagg(b):
    """qd an Aggregate or Portfolio object, dealing with NaNs."""
    MAGIC_NO = -99.1223334444
    bit = b.describe.fillna(MAGIC_NO)
    with pd.option_context('display.float_format', lambda x: '' if x==MAGIC_NO else f'{x:.5g}'):
        display(bit)

References

Avanzi, Benjamin, Greg Taylor, and Bernard Wong, 2016, Correlations between insurance lines of business: An illusion or a real phenomenon? some methodological considerations, ASTIN Bulletin 46, 225–263.
Bar-Lev, Shaul K., 2019, Independent, Tough Identical Results: The Class of Tweedie on Power Variance Functions and the Class of Bar-Lev and Enis on Reproducible Natural Exponential Families, International Journal of Statistics and Probability 9, 30.
Bar-Lev, Shaul K., 2023, The Exponential Dispersion Model Generated by the Landau Distribution—A Comprehensive Review and Further Developments, Mathematics 11.
Bar-Lev, Shaul K., Daoud Bshouty, and Gérard Letac, 1992, Natural exponential families and self-decomposability, Statistics and Probability Letters 13, 147–152.
Bar-Lev, Shaul K., and Peter Enis, 1986, Reproducibility and natural exponential families with power variance functions, The Annals of Statistics 14, 1507–1522.
Carr, Peter, and Liuren Wu, 2003, The Finite Moment Log Stable Process and Option Pricing, Journal of Finance 58, 753–778.
Dunn, Peter K., and Gordon K. Smyth, 2018, Generalized Linear Models With Examples in R (Springer, New York).
Eaton, Morris L, Carl Morris, and Herman Rubin, 1971, On extreme stable laws and some applications, Journal of Applied Probability 8, 794–801.
Gnedenko, B. V., and A. N. Kolmogorov, 1968, Limit Distributions for Sums of Independent Random Variables (Addison-Wesley).
Hougaard, Philip, 2000, Analysis of Multivariate Survival Data (Springer, New York).
Johnson, Norman L, Samuel Kotz, and N Balakrishnan, 1994, Continuous Univariate Distributions. Second edition. Vol. 1 (John Wiley & Sons).
Jørgensen, Bent, 1997, The theory of dispersion models (CRC Press).
Jørgensen, Bent, and José Raúl Martínez, 1996, The Levy-Khinchine Representation of the Tweedie Class, Brazilian Journal of Probability and Statistics 10, 225–233.
Jørgensen, Bent, 1987, Exponential Dispersion Models, Journal of the Royal Statistical Society Series B 49, 127–162.
Kuchler, Uwe, and Michael Sorensen, 1997, Exponential Families of Stochastic Processes (Springer-Verlag, New York).
Letac, Gerard, 2004, The Rovinj summer school on natural exponential families, 1–15.
Letac, Gérard, 2015, Associated natural exponential families and elliptic functions, The Fascination of Probability, Statistics and Their Applications: In Honour of Ole E. Barndorff-Nielsen.
Letac, Gerard, and Marianne Mora, 1990, Natural Real Exponential Families with Cubic Variance Functions, The Annals of Statistics 18, 1–37.
Louati, Mahdi, 2013, Mixture and reciprocity of exponential models, Statistics and Probability Letters 83, 452–458.
Lukacs, Eugene, 1970, Characteristic Functions. 2nd ed. (Griffin, London).
Menn, Christian, and Svetlozar T Rachev, 2006, Calibrated FFT-based density approximations for α-stable distributions, Computational Statistics and Data Analysis 50, 1891–1904.
Mildenhall, Stephen, 2024, Aggregate: fast, accurate, and flexible approximation of compound probability distributions, Annals of Actuarial Science, 1–40.
Mildenhall, Stephen J., 2017, Actuarial Geometry, Risks 5.
Nolan, John P., 2020, Univariate Stable Distributions: Models for Heavy Tailed Data (Springer Series in Operations Research and Financial Engineering, Springer).
Pitman, E. J. G., and Jim Pitman, 2016, A direct approach to the stable distributions, Advances in Applied Probability 48, 261–282.
Roberts, Siobhan, 2024, Genius at Play: The Curious Mind of John Horton Conway (Princeton University Press).
Sato, Ken-Iti, 1999, Levy Processes and Infinitely Divisible Distributions (Cambridge University Press, Cambridge).
Simon, Thomas, 2015, Positive stable densities and the bell-shape, Proceedings of the American Mathematical Society 143, 885–895.
Uchaikin, Vladimir V., and Vladimir M. Zolotarev, 1999, Chance and Stability: Stable Distributions and their Applications.
Vinogradov, Vladimir, 2004, Properties of certain Lévy and geometric Lévy processes, Communications on Stochastic Analysis 2.
Wüthrich, Mario V., and Michael Merz, 2023, Statistical Foundations of Actuarial Learning and its Applications (Springer).
Zolotarev, Vladimir M, 1986, One-Dimensional Stable Distributions, Translations of Mathematical Monographs, vol. 65. Vol. 20. 2 (American Mathematical Society).