What is Information?

Categories: notes, llm, insurance, probability, PMIR

The many disciplines and many descriptors of “Information” speak to its richness and importance.

Author: Stephen J. Mildenhall
Published: 2025-11-20
Modified: 2025-11-20

I am working on a monograph, Pricing Multi-Period Insurance Risk (PMIR), which presents ways to price insurance risk that emerges over multiple periods. As part of the chapter on Emergence (of losses) I have been thinking about information: how it differs from data, and from the elusive raw data. I realized there is a deep well of words used to describe information, crossing many disciplines. That led to some investigations with GPT5.1. Here are the results. First, my prompt:

What words can be used to describe information? To get you started, I’m thinking we have volume, salience, value, and granularity. What other things can you come up with? And then please begin to frame and categorize these qualifiers in light of whatever the literature is on information. Provide references.

I was expecting a long list, but was surprised by just how long it was.


Core quantitative descriptors

These are the traditional formal properties; a small numerical sketch follows the references below.

  • volume (amount, size)
  • entropy (Shannon uncertainty)
  • surprisal (informativeness of an event)
  • mutual information (shared content with another variable)
  • redundancy (repetition, compressibility)
  • granularity (resolution, fineness)
  • precision (statistical dispersion of estimates)
  • completeness (coverage of relevant states)
  • timeliness (latency, recency)

References:

  • Shannon (1948) “A Mathematical Theory of Communication”.
  • Cover and Thomas (1991) Elements of Information Theory.
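
To make the first three descriptors concrete, here is a minimal numpy sketch of entropy, surprisal, and mutual information for a discrete joint distribution. The 2×2 joint pmf is an illustrative assumption, not anything from PMIR.

```python
# Entropy, surprisal, and mutual information for a small discrete example.
# The joint pmf p_xy is made up purely for illustration.
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def surprisal(p_event):
    """Surprisal in bits of a single event with probability p_event."""
    return -np.log2(p_event)

def mutual_information(p_xy):
    """I(X;Y) in bits from a joint pmf given as a 2-D array."""
    return entropy(p_xy.sum(axis=1)) + entropy(p_xy.sum(axis=0)) - entropy(p_xy.ravel())

p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])   # rows = X, cols = Y

print(f"H(X)   = {entropy(p_xy.sum(axis=1)):.3f} bits")
print(f"I(X;Y) = {mutual_information(p_xy):.3f} bits")
print(f"surprisal of a p = 0.1 event = {surprisal(0.1):.3f} bits")
```

For this joint pmf, X carries one bit of uncertainty and Y resolves about 0.28 bits of it.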

Relevance-to-task descriptors

These relate information to a decision or utility context; a short value-of-information example follows the references below.

  • salience (attention-drawing relevance)
  • value (expected decision value; EVPI/EVSI)
  • signal-to-noise ratio (useful vs irrelevant fluctuation)
  • specificity (how targeted the content is)
  • actionability (directness of implication for a decision)
  • sufficiency (statistical sufficiency for a parameter)

References:

  • Howard (1966) “Information Value Theory”.
  • Raiffa and Schlaifer (1961) Applied Statistical Decision Theory.
  • Blackwell (1953) “Equivalent Comparisons of Experiments”.
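
The value descriptor has a precise decision-theoretic form. Here is a minimal sketch of the expected value of perfect information (EVPI) in the spirit of Howard (1966) and Raiffa and Schlaifer (1961); the two-state, two-action payoff matrix is a made-up illustration, not taken from those references.

```python
# EVPI for a toy underwriting decision: write the risk or decline, with two
# possible states of the world. Payoffs and probabilities are assumptions.
import numpy as np

p_state = np.array([0.7, 0.3])            # P(benign year), P(severe year)
payoff = np.array([[100.0, -400.0],       # action 0: write the risk
                   [  0.0,    0.0]])      # action 1: decline

# Best action using only the prior (no information).
value_no_info = (payoff @ p_state).max()

# With perfect information we pick the best action state by state.
value_perfect = (payoff.max(axis=0) * p_state).sum()

print(f"value without information      : {value_no_info:6.1f}")
print(f"value with perfect information : {value_perfect:6.1f}")
print(f"EVPI                           : {value_perfect - value_no_info:6.1f}")
```

Perfect information is worth 70 here because it lets the underwriter write the risk only in the benign state.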

Structural / representation descriptors

These describe how information is encoded; a quick compressibility experiment follows the references below.

  • structure (degree of organization)
  • hierarchy (levels, layers, nestedness)
  • format (symbolic, numeric, linguistic, spatial)
  • modularity (decomposability into subcomponents)
  • compressibility (ability to reduce without loss)
  • dimensionality (number of variables or features)

References:

  • Kolmogorov (1965) algorithmic complexity.
  • Chaitin (1977) Algorithmic Information Theory.
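
Compressibility can be probed directly: a lossless compressor gives an upper bound on description length in the Kolmogorov–Chaitin sense, though only an upper bound. A crude sketch using the standard-library zlib (the test strings are arbitrary):

```python
# Structured data compresses far more than noise; the compression ratio is a
# rough, upper-bound proxy for algorithmic complexity.
import os
import zlib

structured = b"0123456789" * 1000   # highly redundant, low complexity
noise = os.urandom(10_000)          # incompressible with high probability

for name, data in [("structured", structured), ("noise", noise)]:
    ratio = len(zlib.compress(data, level=9)) / len(data)
    print(f"{name:>10}: compressed to {ratio:.1%} of original size")
```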

Quality / credibility descriptors

These describe trust and reliability.

  • accuracy (closeness to truth)
  • reliability (consistency across measures)
  • validity (measures what it claims to measure)
  • verifiability (ability to check correctness)
  • provenance (source quality)
  • bias (systematic distortion)

References:

  • O’Neill (1986) “The Role of Accuracy, Precision, and Uncertainty in Ecological Modeling”.
  • Tufte (1997) Visual Explanations (on evidence quality).

Cognitive / perceptual descriptors

These describe how humans interact with information.

  • salience (again, psychological meaning)
  • clarity (ease of comprehension)
  • cognitive load (processing demand)
  • distinctiveness (contrast with background)
  • memorability (retention likelihood)

References:

  • Kahneman (2011) Thinking, Fast and Slow.
  • Tversky and Kahneman (1974) “Judgment under Uncertainty”.

Dynamical descriptors

Information changes over time.

  • volatility (rate of change)
  • decay (loss of relevance)
  • refresh rate (arrival frequency)
  • stability (robustness against new observations)

References:

  • Lindley (1956) “On a Measure of the Information Provided by an Experiment”.
  • Kelly (1956) “A New Interpretation of Information Rate”.

Semantic / interpretative descriptors

These go beyond Shannon into meaning.

  • meaning (semantic content)
  • context richness (how much framing is required)
  • ambiguity (degree of multiple interpretations)
  • coherence (internal consistency)
  • novelty (departure from prior understanding)

References:

  • Floridi (2010) The Philosophy of Information.
  • Bar-Hillel and Carnap (1953) semantic information.

Social / communicative descriptors

Information as a social object.

  • accessibility (ability to obtain it)
  • transparency (openness of method and data)
  • authority (recognized legitimacy)
  • consensus (agreement among interpreters)
  • diffusion (speed of spread)

References:

  • Shapiro and Varian (1998) Information Rules.
  • Benkler (2006) The Wealth of Networks.

Information in the context of sigma algebras

My follow-up prompt:

I want to talk about information in the context of the sigma algebra generated by knowing the information. This ties information directly to a particular random variable: obviously, if the sigma algebra is independent of the variable, it carries no information about it, and so on. Which books and papers think about information in this way? And what are the principal findings that they uncover?

1. Blackwell’s comparison of experiments

This is the central reference.

Blackwell, D. (1951, 1953). “Comparison of Experiments”; “Equivalent Comparisons of Experiments”.

Principal findings:

  • An “experiment” is a Markov kernel \(P(\cdot\mid X)\), which induces a sigma algebra of observable events.
  • Experiment \(\mathcal{E}_1\) is more informative than \(\mathcal{E}_2\) if every decision problem solvable with \(\mathcal{E}_2\) can be solved at least as well using \(\mathcal{E}_1\).
  • Equivalent characterization: \(\mathcal{E}_1\) dominates \(\mathcal{E}_2\) if the sigma algebra generated by \(\mathcal{E}_2\) is a garbling (Blackwell transform) of that of \(\mathcal{E}_1\).
  • Intuition: finer sigma algebra ⇒ finer partitions ⇒ better conditioning on \(X\). This is precisely your “information is the sigma algebra you condition on.” A small simulation below makes the garbling order concrete.
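
To illustrate Blackwell’s point numerically: a coarse signal obtained by randomly corrupting a finer one can only increase the Bayes risk of predicting \(X\). The gamma severity, the binning into four cells, and the 30% garbling probability are all illustrative assumptions.

```python
# Garbling demo: S2 is produced from S1 by a channel that never looks at X,
# so conditioning on S2 cannot beat conditioning on S1 (Blackwell order).
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

x = rng.gamma(shape=2.0, scale=10.0, size=n)      # quantity of interest
s1 = np.digitize(x, bins=[10, 20, 40])            # fine signal: 4 cells
noise = rng.integers(0, 4, size=n)                # pure noise, independent of x
s2 = np.where(rng.random(n) < 0.3, noise, s1)     # garbled signal

def bayes_mse(signal):
    """Mean squared error of the best predictor E[X | signal]."""
    pred = np.empty_like(x)
    for s in np.unique(signal):
        mask = signal == s
        pred[mask] = x[mask].mean()
    return np.mean((x - pred) ** 2)

print(f"MSE given fine signal S1    : {bayes_mse(s1):8.2f}")
print(f"MSE given garbled signal S2 : {bayes_mse(s2):8.2f}")
print(f"Var(X), no information      : {x.var():8.2f}")
```

The garbled signal’s error sits between the fine signal’s error and the no-information variance.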

2. Le Cam’s theory of statistical experiments

Le Cam, L. (1964, 1972) and Le Cam & Yang (1990). Asymptotics in Statistics.

Principal findings:

  • Experiments are ordered by deficiency distance.
  • Two experiments are asymptotically equivalent if the sigma algebras they generate converge (in a sense) to the same limiting information.
  • Information is formalized entirely through measurable partitions. Useful when thinking about progressive arrival of claim information as \(t \mapsto \mathcal{F}_t\).

3. Strassen’s theorem (martingale couplings)

Strassen, V. (1965). The existence of a martingale coupling links the ordering of two sigma algebras (or filtrations) to the convex order.

Principal findings:

  • \(\mathcal{G}\) is more informative than \(\mathcal{H}\) iff there exists a martingale coupling \(X \mapsto \mathrm{E}(X\mid \mathcal{G}) \mapsto \mathrm{E}(X\mid \mathcal{H})\).
  • This gives a purely probabilistic characterization of “information gains” as the ability to refine conditional expectations. Direct link to reserving: more information pushes the conditional reserve closer to the ultimate. A numerical check of the martingale structure follows below.
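
A quick numerical check of the tower/martingale structure behind this: with nested information \(\mathcal{H} \subseteq \mathcal{G}\), averaging \(\mathrm{E}(X\mid\mathcal{G})\) over the cells of \(\mathcal{H}\) reproduces \(\mathrm{E}(X\mid\mathcal{H})\), and the coarser conditional mean has (weakly) smaller variance. The eight-cell partition and lognormal severities are illustrative assumptions.

```python
# Tower property check: E[ E[X|G] | H ] == E[X|H] when sigma(H) ⊆ sigma(G).
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

g = rng.integers(0, 8, size=n)                       # fine information: 8 cells
h = g // 2                                           # coarse: merges pairs of G-cells
x = rng.lognormal(mean=0.2 * g, sigma=0.5, size=n)   # X depends on the fine cell

def cond_exp(values, signal):
    """Empirical E[values | signal], aligned with the sample."""
    out = np.empty_like(values)
    for s in np.unique(signal):
        mask = signal == s
        out[mask] = values[mask].mean()
    return out

e_x_g = cond_exp(x, g)                # E[X | G]
e_x_h = cond_exp(x, h)                # E[X | H]
tower = cond_exp(e_x_g, h)            # E[ E[X|G] | H ]

print(f"max |E[E[X|G]|H] - E[X|H]| = {np.abs(tower - e_x_h).max():.2e}")
print(f"Var(E[X|G]) = {e_x_g.var():.3f} >= Var(E[X|H]) = {e_x_h.var():.3f}")
```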

4. Classical sufficiency theory

Basu, Lehmann & Scheffé, Halmos & Savage.

Principal findings:

  • A statistic \(T(X)\) generates a sigma algebra \(\sigma(T)\) that is sufficient if conditioning on \(\sigma(T)\) gives full information for inference about parameters.
  • Completeness and sufficiency define when \(\sigma(T)\) is as informative as the full \(\sigma(X)\). Insurance analogue: early claim attributes that are “nearly sufficient” for tail severity. The sketch below illustrates sufficiency in the simplest Bernoulli case.
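
For an i.i.d. Bernoulli(\(\theta\)) sample the total \(T=\sum X_i\) is sufficient, so the conditional law of the sample given \(T\) does not depend on \(\theta\). A small simulation (the sample size, conditioning value, and the two \(\theta\) values are arbitrary choices):

```python
# Sufficiency demo: given the total T, every 0/1 pattern is equally likely,
# whatever theta is, so the pattern adds nothing beyond T for inference.
import numpy as np

rng = np.random.default_rng(2)
n, t = 5, 2                         # sample size and the total we condition on

def pattern_freqs_given_total(theta, n_sim=200_000):
    """Empirical distribution of the 0/1 pattern given sum(X) == t."""
    x = (rng.random((n_sim, n)) < theta).astype(int)
    keep = x[x.sum(axis=1) == t]
    _, counts = np.unique(keep, axis=0, return_counts=True)
    return counts / counts.sum()

for theta in (0.2, 0.7):
    freqs = pattern_freqs_given_total(theta)
    print(f"theta = {theta}: P(pattern | T = {t}) ≈ {np.round(freqs, 3)}")
# Both rows are (up to simulation noise) uniform over the C(5,2) = 10 patterns.
```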

5. Filtrations, conditional expectation, and prediction

Doob (1953) Stochastic Processes. Dellacherie & Meyer (1978–1982).

Principal findings:

  • Information evolution is a filtration \((\mathcal{F}_t)\); conditioning on \(\mathcal{F}_t\) is exactly “information at time \(t\)”.
  • Martingales capture the idea that updated predictions use all information in \(\mathcal{F}_t\).
  • Finer filtrations produce smaller expected conditional variances. This maps directly onto claim development and your InformationSimulator’s idea that more features refine the sigma algebra (see the toy development sketch below).
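
A toy version of this, not the InformationSimulator itself: the ultimate loss is a sum of eight development-period increments, \(\mathcal{F}_t\) reveals the first \(t\) increments, the conditional mean \(\mathrm{E}(U\mid\mathcal{F}_t)\) is a martingale, and the expected conditional variance falls to zero as development completes. The gamma increment model is an illustrative assumption.

```python
# Filtration demo: information arrives one development period at a time.
import numpy as np

rng = np.random.default_rng(3)
n_sims, n_dev = 100_000, 8
inc = rng.gamma(shape=2.0, scale=5.0, size=(n_sims, n_dev))   # period increments
ultimate = inc.sum(axis=1)
mu = 2.0 * 5.0                                                # mean of one increment

for t in range(n_dev + 1):
    cond_mean = inc[:, :t].sum(axis=1) + (n_dev - t) * mu     # E[U | F_t]
    resid_var = np.mean((ultimate - cond_mean) ** 2)          # E[ Var(U | F_t) ]
    print(f"t={t}: E[Var(U|F_t)] ≈ {resid_var:7.1f}   E[E[U|F_t]] ≈ {cond_mean.mean():6.1f}")
```

The residual variance drops linearly from eight increments’ worth to zero, while the expected conditional mean stays at the unconditional ultimate: exactly the martingale picture of claim development.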

6. Bayesian decision theory

Raiffa & Schlaifer (1961); DeGroot (1970).

Principal findings:

  • The value of an information structure is precisely the improvement in expected utility from conditioning on that sigma algebra.
  • More informative sigma algebras dominate less informative ones in Blackwell’s sense. Direct actuarial relevance: EVSI, EVPI for claims handling and reserving. An EVSI sketch follows below.
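
Here is a hedged sketch of EVSI in the Raiffa–Schlaifer preposterior style. All of the numbers are illustrative assumptions: the claim probability has a Beta(2, 18) prior, writing the risk pays \(120 - 1000\theta\) in expectation, declining pays 0, and the sample is 20 Bernoulli(\(\theta\)) trials.

```python
# EVSI: how much is it worth to observe 20 similar exposures before deciding?
import numpy as np
from scipy.stats import betabinom

a, b, n = 2.0, 18.0, 20
premium, severity = 120.0, 1000.0

def write_value(theta_mean):
    """Expected payoff of writing the risk given a (posterior) mean for theta."""
    return premium - severity * theta_mean

# Decision on the prior alone: write iff the prior-mean payoff is positive.
prior_value = max(write_value(a / (a + b)), 0.0)

# Preposterior analysis: decide after seeing k claims in n trials, weighting
# the outcomes by the prior predictive (beta-binomial) distribution of k.
k = np.arange(n + 1)
post_mean = (a + k) / (a + b + n)
sample_value = np.sum(betabinom.pmf(k, n, a, b) * np.maximum(write_value(post_mean), 0.0))

print(f"value acting on the prior : {prior_value:6.2f}")
print(f"value after sampling      : {sample_value:6.2f}")
print(f"EVSI                      : {sample_value - prior_value:6.2f}")
```

EVSI is nonnegative by construction: the preposterior value averages the best posterior decisions and can never fall below acting on the prior.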

7. Economic theory of information structures

Milgrom (1981), Aumann & Maschler, Kamenica & Gentzkow (2011).

Principal findings:

  • An “information structure” is a sigma algebra or partition.
  • Coarser partitions are strategically chosen (Bayesian persuasion).
  • Informativeness is ordered exactly via Blackwell dominance. This is useful when thinking of operational constraints: adjusters reveal coarse partitions first, finer partitions later.

Summary: principal findings across the tradition

  1. Information = sigma algebra. A more informative observation produces a finer sigma algebra (smaller atoms, more measurable sets).

  2. Informativeness is ordered (Blackwell order). One information structure dominates another if all decisions using the coarse one can be reproduced using the fine one.

  3. Conditional expectation respects the order. If \(\mathcal{G} \supseteq \mathcal{H}\) then \(\mathrm{E}[\mathrm{Var}(X\mid\mathcal{G})] \le \mathrm{E}[\mathrm{Var}(X\mid\mathcal{H})]\) and conditional means under \(\mathcal{G}\) are martingale refinements of those under \(\mathcal{H}\).

  4. Sufficient sigma algebras represent maximal compression. If \(\sigma(T)\) is sufficient for predicting \(X\), then conditioning on the full \(\sigma(X)\) rather than on \(\sigma(T)\) reduces uncertainty no further.

  5. Filtrations describe information arrival. Time-indexed information structures \((\mathcal{F}_t)\) formalize claim development; conditioning over time is a martingale.

  6. Decision-value of information is exactly EVSI/EVPI. This is the reduction in expected loss from conditioning on a finer sigma algebra.


References

Here is a bibliography of the main works mentioned above, formatted so each entry can be dropped directly into Google Scholar.

Aumann, R. J., and M. Maschler (1995). Repeated Games with Incomplete Information. MIT Press.

Bar-Hillel, Y., and R. Carnap (1953). “Semantic Information.” British Journal for the Philosophy of Science 4(14): 147–157.

Basu, D. (1955). “On Statistics Independent of a Complete Sufficient Statistic.” Sankhyā 15: 377–380.

Blackwell, D. (1951). “Comparison of Experiments.” In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 93–102.

Blackwell, D. (1953). “Equivalent Comparisons of Experiments.” Annals of Mathematical Statistics 24(2): 265–272.

Chaitin, G. J. (1977). “Algorithmic Information Theory.” IBM Journal of Research and Development 21(4): 350–359.

Cover, T. M., and J. A. Thomas (1991). Elements of Information Theory. Wiley.

DeGroot, M. H. (1970). Optimal Statistical Decisions. McGraw–Hill.

Dellacherie, C., and P.-A. Meyer (1978–1982). Probabilities and Potential (vols. I–III). North-Holland.

Doob, J. L. (1953). Stochastic Processes. Wiley.

Floridi, L. (2010). The Philosophy of Information. Oxford University Press.

Halmos, P. R., and L. J. Savage (1949). “Application of the Radon–Nikodym Theorem to the Theory of Sufficient Statistics.” Annals of Mathematical Statistics 20: 225–241.

Howard, R. A. (1966). “Information Value Theory.” IEEE Transactions on Systems Science and Cybernetics 2(1): 22–26.

Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.

Kamenica, E., and M. Gentzkow (2011). “Bayesian Persuasion.” American Economic Review 101(6): 2590–2615.

Kelly, J. L. (1956). “A New Interpretation of Information Rate.” Bell System Technical Journal 35: 917–926.

Kolmogorov, A. N. (1965). “Three Approaches to the Quantitative Definition of Information.” Problems of Information Transmission 1(1): 1–7.

Le Cam, L. (1964). “Sufficiency and Approximate Sufficiency.” Annals of Mathematical Statistics 35: 1419–1455.

Le Cam, L., and G. L. Yang (1990). Asymptotics in Statistics: Some Basic Concepts. Springer.

Lehmann, E. L., and H. Scheffé (1950). “Completeness, Similar Regions, and Unbiased Estimation.” Sankhyā 10: 305–340.

Lindley, D. V. (1956). “On a Measure of the Information Provided by an Experiment.” Annals of Mathematical Statistics 27(4): 986–1005.

Milgrom, P. (1981). “Good News and Bad News: Representation Theorems and Applications.” Bell Journal of Economics 12(2): 380–391.

Raiffa, H., and R. Schlaifer (1961). Applied Statistical Decision Theory. Wiley.

Shannon, C. E. (1948). “A Mathematical Theory of Communication.” Bell System Technical Journal 27: 379–423 and 623–656.

Strassen, V. (1965). “The Existence of Probability Measures with Given Marginals.” Annals of Mathematical Statistics 36: 423–439.

Tversky, A., and D. Kahneman (1974). “Judgment under Uncertainty: Heuristics and Biases.” Science 185: 1124–1131.