19. Etymology of Entropy#
This lecture describes and compares several notions of entropy.
Among the senses of entropy, we'll encounter these:
A measure of uncertainty of a random variable advanced by Claude Shannon [Shannon and Weaver, 1949]
A key object governing thermodynamics
Kullback and Leibler’s measure of the statistical divergence between two probability distributions
A measure of the volatility of stochastic discount factors that appear in asset pricing theory
Measures of unpredictability that occur in classical Wiener-Kolmogorov linear prediction theory
A frequency domain criterion for constructing robust decision rules
The concept of entropy plays an important role in the robust control formulations described in the lectures Risk and Model Uncertainty and Robustness.
19.1. Information Theory#
In information theory [Shannon and Weaver, 1949], entropy is a measure of the unpredictability of a random variable.
To illustrate things, let $X$ be a discrete random variable that takes values $x_1, \ldots, x_n$ with probabilities $p_i = \textrm{Prob}(X = x_i) \geq 0$, where $\sum_{i=1}^n p_i = 1$.
Claude Shannon's [Shannon and Weaver, 1949] definition of entropy is
$$H(p) = \sum_{i=1}^n p_i \log_b\left(\frac{1}{p_i}\right) = -\sum_{i=1}^n p_i \log_b p_i ,$$
where $\log_b$ denotes the logarithm with base $b$.
Inspired by the limit
$$\lim_{p \downarrow 0} p \log p = 0 ,$$
we set $p \log_b\left(\frac{1}{p}\right) = 0$ when $p = 0$.
Typical bases for the logarithm are $2$, $e$, and $10$.
In the information theory literature, logarithms of base $2$ are common; entropy is then measured in bits.
Shannon typically used base $2$.
19.2. A Measure of Unpredictability#
For a discrete random variable $X$ with probability distribution $p = \{p_i\}_{i=1}^n$, the surprisal for state $i$ is $\log\left(\frac{1}{p_i}\right)$.
The quantity $\log\left(\frac{1}{p_i}\right)$ is called the surprisal because it is inversely related to the probability that state $i$ occurs: an unlikely state is very surprising when it happens.
Note that entropy $H(p)$ equals the expected surprisal
$$H(p) = \sum_{i=1}^n p_i \log\left(\frac{1}{p_i}\right).$$
19.2.1. Example#
Take a possibly unfair coin, so that $X \in \{0, 1\}$ with $\pi = \textrm{Prob}(X = 1) \in [0, 1]$.
Then
$$H(\pi) = -(1 - \pi)\log(1 - \pi) - \pi \log \pi .$$
Evidently, $H'(\pi) = \log(1 - \pi) - \log \pi = 0$
at $\pi = \frac{1}{2}$, while $H''(\pi) = -\frac{1}{1-\pi} - \frac{1}{\pi} < 0$ for $\pi \in (0, 1)$.
So $\pi = \frac{1}{2}$ maximizes entropy, while entropy is minimized at $\pi = 0$ and $\pi = 1$, values at which the outcome of the flip is perfectly predictable.
Thus, among all coins, a fair coin is the most unpredictable.
See Fig. 19.1.

Fig. 19.1 Entropy as a function of $\pi$.
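As a quick numerical sketch (assuming NumPy and Matplotlib are available; the helper name coin_entropy is ours, chosen for illustration), the following code evaluates $H(\pi)$ on a grid and confirms that it peaks at $\pi = \frac{1}{2}$:

```python
import numpy as np
import matplotlib.pyplot as plt

def coin_entropy(pi):
    """Entropy (in nats) of a coin with Prob(X = 1) = pi, using the convention 0 log 0 = 0."""
    pi = np.asarray(pi, dtype=float)
    with np.errstate(divide='ignore', invalid='ignore'):
        heads = np.where(pi > 0, -pi * np.log(pi), 0.0)
        tails = np.where(pi < 1, -(1 - pi) * np.log(1 - pi), 0.0)
    return heads + tails

grid = np.linspace(0, 1, 401)
H = coin_entropy(grid)

print(f"entropy at pi = 0.5: {float(coin_entropy(0.5)):.4f}  (log 2 = {np.log(2):.4f})")
print(f"argmax on the grid:  {grid[np.argmax(H)]:.3f}")

plt.plot(grid, H)
plt.xlabel(r"$\pi$")
plt.ylabel("entropy")
plt.show()
```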
19.2.2. Example#
Take an $n$-sided possibly unfair die with a probability distribution $\{p_i\}_{i=1}^n$.
Among all dice with $n$ sides, a fair die maximizes entropy.
For a fair die,
entropy equals $H(p) = -\sum_{i=1}^n \frac{1}{n}\log\left(\frac{1}{n}\right) = \log n$.
To specify the expected number of bits needed to isolate the outcome of one roll of a fair $n$-sided die, take logarithms with base $2$, so that entropy equals $\log_2 n$.
For example,
if $n = 2$, one bit is needed, since $\log_2 2 = 1$; if $n = 4$, two bits are needed, since $\log_2 4 = 2$.
For a fair six-sided die, $\log_2 6 \approx 2.585$ bits are needed on average.
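Here is a similar sketch (NumPy only; the function name entropy_bits and the unfair example are illustrative choices) confirming that a fair $n$-sided die attains $\log_2 n$ bits while an unfair die attains less:

```python
import numpy as np

def entropy_bits(p):
    """Entropy in bits of a discrete distribution p, with 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

for n in (2, 4, 6, 8):
    fair = np.full(n, 1 / n)
    print(f"fair {n}-sided die: {entropy_bits(fair):.4f} bits  (log2 n = {np.log2(n):.4f})")

# an unfair six-sided die has strictly lower entropy
unfair = np.array([0.5, 0.1, 0.1, 0.1, 0.1, 0.1])
print(f"unfair 6-sided die: {entropy_bits(unfair):.4f} bits")
```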
19.3. Mathematical Properties of Entropy#
For a discrete random variable with probability vector $p = (p_1, \ldots, p_n)$, entropy $H(p)$ has the following properties:
$H(p)$ is continuous in $p$.
$H(p)$ is symmetric: $H(p) = H(\tilde p)$ for any permutation $\tilde p$ of $p$.
A uniform distribution maximizes $H(p)$: $H(p) \leq \log n$, with equality when $p_i = \frac{1}{n}$ for all $i$.
Maximum entropy increases with the number of states: $\log n \leq \log (n + 1)$.
Entropy is not affected by events of zero probability.
19.4. Conditional Entropy#
Let $(X, Y)$ be a pair of discrete random variables with outcomes $x_1, \ldots, x_n$ and $y_1, \ldots, y_m$, joint distribution $p(x_i, y_j)$, and marginal distributions $p(x_i) = \sum_j p(x_i, y_j)$ and $p(y_j) = \sum_i p(x_i, y_j)$.
Conditional entropy $H(X \mid Y)$ is defined as
$$H(X \mid Y) = \sum_{i,j} p(x_i, y_j) \log \frac{p(y_j)}{p(x_i, y_j)} .$$
Here $\frac{p(x_i, y_j)}{p(y_j)}$ is the probability that $X = x_i$ conditional on $Y = y_j$, so conditional entropy is the expected surprisal of $X$ after $Y$ has been observed.
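A small sketch of the definition above (assuming NumPy; the function name and the joint table are illustrative choices) that computes $H(X \mid Y)$ directly from a joint probability table:

```python
import numpy as np

def conditional_entropy(joint):
    """H(X | Y) for a joint table with rows indexing x_i and columns indexing y_j."""
    joint = np.asarray(joint, dtype=float)
    p_y = joint.sum(axis=0)              # marginal distribution of Y
    H = 0.0
    for i in range(joint.shape[0]):
        for j in range(joint.shape[1]):
            if joint[i, j] > 0:
                H += joint[i, j] * np.log(p_y[j] / joint[i, j])
    return H

# a joint distribution in which X and Y are dependent
joint = np.array([[0.3, 0.1],
                  [0.1, 0.5]])
print(f"H(X|Y) = {conditional_entropy(joint):.4f} nats")
```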
19.5. Independence as Maximum Conditional Entropy#
Let $f_i = \textrm{Prob}(X = x_i), i = 1, \ldots, n$ be a marginal distribution for $X$.
Let $g_j = \textrm{Prob}(Y = y_j), j = 1, \ldots, m$ be a marginal distribution for $Y$.
Thus, a joint distribution $p(x_i, y_j)$ is consistent with these marginals if $\sum_j p(x_i, y_j) = f_i$ for all $i$ and $\sum_i p(x_i, y_j) = g_j$ for all $j$.
Consider the following problem:
choose a joint distribution $p(x_i, y_j)$ satisfying these marginal restrictions to maximize conditional entropy $H(X \mid Y)$.
The conditional-entropy-maximizing joint distribution is $p(x_i, y_j) = f_i g_j$.
Thus, among all joint distributions with identical marginal distributions,
the conditional entropy maximizing joint distribution makes $X$ and $Y$ independent random variables.
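The following numerical sketch (NumPy only; the $2 \times 2$ marginals $f_1 = 0.4$ and $g_1 = 0.7$ are an arbitrary illustrative choice) searches over all $2 \times 2$ joint distributions with those marginals and finds that $H(X \mid Y)$ is maximized at the independent joint distribution $p(x_i, y_j) = f_i g_j$:

```python
import numpy as np

def cond_entropy(joint):
    """H(X | Y), rows = x, columns = y, with 0 log 0 = 0."""
    p_y = joint.sum(axis=0)
    out = 0.0
    for i in range(joint.shape[0]):
        for j in range(joint.shape[1]):
            if joint[i, j] > 0:
                out += joint[i, j] * np.log(p_y[j] / joint[i, j])
    return out

f1, g1 = 0.4, 0.7            # fixed marginals: Prob(X = x_1) and Prob(Y = y_1)
t_grid = np.linspace(max(0.0, f1 + g1 - 1), min(f1, g1), 201)

best_t, best_H = None, -np.inf
for t in t_grid:             # t parameterizes every joint with these marginals
    joint = np.array([[t,       f1 - t],
                      [g1 - t,  1 - f1 - g1 + t]])
    H = cond_entropy(joint)
    if H > best_H:
        best_t, best_H = t, H

print(f"maximizing Prob(X=x_1, Y=y_1) on the grid: {best_t:.4f}")
print(f"independent value f1 * g1:                 {f1 * g1:.4f}")
```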
19.6. Thermodynamics#
Josiah Willard Gibbs (see https://en.wikipedia.org/wiki/Josiah_Willard_Gibbs) defined entropy as
$$S = -k_B \sum_i p_i \ln p_i ,$$
where $p_i$ is the probability of microstate $i$ and $k_B$ is the Boltzmann constant.
The Boltzmann constant $k_B$
relates energy at the micro particle level to the temperature observed at the macro level. It equals the ideal gas constant divided by Avogadro's constant.
The second law of thermodynamics states that the entropy of a closed physical system increases until it attains a maximum, at which point the system is in equilibrium.
19.7. Statistical Divergence#
Let $p = \{p_i\}_{i=1}^n$ and $q = \{q_i\}_{i=1}^n$ be two discrete probability distributions defined over the same $n$ states.
Assume that $q_i > 0$ whenever $p_i > 0$.
Then the Kullback-Leibler statistical divergence, also called relative entropy, is defined as
$$D(p \mid q) = \sum_i p_i \log\left(\frac{p_i}{q_i}\right).$$
Evidently,
$$D(p \mid q) = E_p \log\left(\frac{p}{q}\right),$$
where $E_p$ denotes a mathematical expectation taken with respect to the distribution $p$.
It is easy to verify that $D(p \mid q) \geq 0$ and that $D(p \mid q) = 0$ if and only if $p = q$.
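A short sketch (assuming NumPy; the distributions are chosen purely for illustration) that computes $D(p \mid q)$ and checks its nonnegativity on randomly drawn distributions:

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence sum_i p_i log(p_i / q_i), with 0 log 0 = 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.5, 0.25, 0.25])
q = np.array([1/3, 1/3, 1/3])
print(f"D(p|q) = {kl_divergence(p, q):.4f}, D(q|p) = {kl_divergence(q, p):.4f}")
print(f"D(p|p) = {kl_divergence(p, p):.4f}")

# nonnegativity on random strictly positive distributions
rng = np.random.default_rng(0)
draws = rng.uniform(0.01, 1, size=(1000, 2, 5))
ps = draws[:, 0] / draws[:, 0].sum(axis=1, keepdims=True)
qs = draws[:, 1] / draws[:, 1].sum(axis=1, keepdims=True)
print("minimum D over 1000 random pairs:", min(kl_divergence(a, b) for a, b in zip(ps, qs)))
```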
19.8. Continuous distributions#
For continuous random variables, the Kullback-Leibler divergence between two densities $p(x)$ and $q(x)$ is defined as
$$D(p \mid q) = \int p(x) \log\left(\frac{p(x)}{q(x)}\right) dx .$$
19.9. Relative entropy and Gaussian distributions#
We want to compute relative entropy for two continuous densities $\phi$ and $\hat\phi$ when $\phi$ is $N(0, I)$ and $\hat\phi$ is $N(w, \Sigma)$, where the covariance matrix $\Sigma$ is nonsingular.
We seek a formula for
$$\textrm{ent} = \int \left(\log \hat\phi(\varepsilon) - \log \phi(\varepsilon)\right) \hat\phi(\varepsilon)\, d\varepsilon .$$
Claim
$$\textrm{ent} = -\frac{1}{2} \log \det \Sigma + \frac{1}{2} w' w + \frac{1}{2} \textrm{trace}(\Sigma - I) . \qquad (19.5)$$
Proof
The log likelihood ratio is
$$\log \hat\phi(\varepsilon) - \log \phi(\varepsilon) = \frac{1}{2}\left[ -(\varepsilon - w)' \Sigma^{-1} (\varepsilon - w) + \varepsilon' \varepsilon - \log \det \Sigma \right] .$$
Observe that
$$\int -\frac{1}{2} (\varepsilon - w)' \Sigma^{-1} (\varepsilon - w)\, \hat\phi(\varepsilon)\, d\varepsilon = -\frac{1}{2} \textrm{trace}(I) .$$
Applying the identity $\varepsilon = w + (\varepsilon - w)$ gives
$$\frac{1}{2} \varepsilon' \varepsilon = \frac{1}{2} w' w + w' (\varepsilon - w) + \frac{1}{2} (\varepsilon - w)' (\varepsilon - w) .$$
Taking mathematical expectations with respect to $\hat\phi$ (the cross term has expectation zero) gives
$$\int \frac{1}{2} \varepsilon' \varepsilon\, \hat\phi(\varepsilon)\, d\varepsilon = \frac{1}{2} w' w + \frac{1}{2} \textrm{trace}(\Sigma) .$$
Combining terms gives
$$\textrm{ent} = -\frac{1}{2} \log \det \Sigma - \frac{1}{2} \textrm{trace}(I) + \frac{1}{2} w' w + \frac{1}{2} \textrm{trace}(\Sigma) = -\frac{1}{2} \log \det \Sigma + \frac{1}{2} w' w + \frac{1}{2} \textrm{trace}(\Sigma - I) ,$$
which agrees with equation (19.5).
Notice the separate appearances of the mean distortion $w$ and the covariance distortion $\Sigma - I$ in formula (19.5).
Extension
Let $N_0 = N(\mu_0, \Sigma_0)$ and $N_1 = N(\mu_1, \Sigma_1)$ be two multivariate Gaussian distributions on $\mathbb{R}^k$ with nonsingular covariance matrices.
Then
$$D_{KL}(N_0 \mid N_1) = \frac{1}{2}\left[ \textrm{trace}\left(\Sigma_1^{-1} \Sigma_0\right) + (\mu_1 - \mu_0)' \Sigma_1^{-1} (\mu_1 - \mu_0) - k + \log\left(\frac{\det \Sigma_1}{\det \Sigma_0}\right) \right] .$$
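The following sketch (assuming NumPy and SciPy; the particular $w$ and $\Sigma$ are illustrative) checks formula (19.5) by Monte Carlo and also evaluates the general multivariate expression with $N_0 = N(w, \Sigma)$ and $N_1 = N(0, I)$:

```python
import numpy as np
from scipy.stats import multivariate_normal

k = 2
w = np.array([0.3, -0.2])                    # mean distortion
Sigma = np.array([[1.2, 0.3],
                  [0.3, 0.8]])               # covariance distortion

# closed form (19.5): ent = -0.5 log det Sigma + 0.5 w'w + 0.5 trace(Sigma - I)
ent_formula = (-0.5 * np.log(np.linalg.det(Sigma))
               + 0.5 * w @ w
               + 0.5 * np.trace(Sigma - np.eye(k)))

# Monte Carlo: ent = E under phi_hat of [log phi_hat - log phi]
phi = multivariate_normal(mean=np.zeros(k), cov=np.eye(k))
phi_hat = multivariate_normal(mean=w, cov=Sigma)
eps = phi_hat.rvs(size=200_000, random_state=123)
ent_mc = np.mean(phi_hat.logpdf(eps) - phi.logpdf(eps))

# general formula for D_KL(N(mu0, S0) | N(mu1, S1)) with N0 = N(w, Sigma), N1 = N(0, I)
ent_general = 0.5 * (np.trace(Sigma) + w @ w - k
                     + np.log(1.0 / np.linalg.det(Sigma)))

print(f"formula (19.5):  {ent_formula:.4f}")
print(f"Monte Carlo:     {ent_mc:.4f}")
print(f"general formula: {ent_general:.4f}")
```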
19.10. Von Neumann Entropy#
Let $P$ and $Q$ be two $n \times n$ positive-semidefinite symmetric matrices, each with trace equal to $1$, i.e., two density matrices.
A measure of the divergence between two such density matrices is
$$M(P, Q) = \textrm{trace}\left[ P \log P - P \log Q \right],$$
where the log of a matrix is defined here (https://en.wikipedia.org/wiki/Logarithm_of_a_matrix).
A density matrix has nonnegative eigenvalues that sum to one, so its eigenvalues can be interpreted as a probability distribution.
The von Neumann entropy of a density matrix $P$ is
$$S = -\textrm{trace}\left(P \log P\right),$$
which equals the Shannon entropy of the eigenvalues of $P$.
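A sketch (assuming NumPy and SciPy; the randomly generated density matrices are purely illustrative) that computes the divergence $M(P, Q)$ and the von Neumann entropy, and confirms that the latter equals the Shannon entropy of the eigenvalues of $P$:

```python
import numpy as np
from scipy.linalg import logm

def random_density_matrix(n, seed):
    """A trace-one positive-semidefinite symmetric matrix."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    P = A @ A.T                      # positive semidefinite
    return P / np.trace(P)           # normalize the trace to one

P = random_density_matrix(3, seed=1)
Q = random_density_matrix(3, seed=2)

divergence = np.trace(P @ logm(P) - P @ logm(Q)).real
von_neumann = -np.trace(P @ logm(P)).real

# von Neumann entropy as the Shannon entropy of the eigenvalues of P
eigvals = np.linalg.eigvalsh(P)
shannon_of_eigs = -np.sum(eigvals * np.log(eigvals))

print(f"divergence M(P, Q):     {divergence:.4f}")
print(f"von Neumann entropy:    {von_neumann:.4f}")
print(f"entropy of eigenvalues: {shannon_of_eigs:.4f}")
```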
19.11. Backus-Chernov-Zin Entropy#
After flipping signs, [Backus et al., 2014] use Kullback-Leibler relative entropy as a measure of volatility of stochastic discount factors that they assert is useful for characterizing features of both the data and various theoretical models of stochastic discount factors.
Where $p_{t+1}$ is the physical or true conditional density, $p^*_{t+1}$ is a risk-neutral conditional density, and $E_t$ denotes a conditional expectation taken with respect to the physical density, [Backus et al., 2014] define entropy as
$$L_t\left(\frac{p^*_{t+1}}{p_{t+1}}\right) = -E_t \log\left(\frac{p^*_{t+1}}{p_{t+1}}\right). \qquad (19.9)$$
Evidently, by virtue of the minus sign in equation (19.9),
$$L_t\left(\frac{p^*_{t+1}}{p_{t+1}}\right) = E_t \log\left(\frac{p_{t+1}}{p^*_{t+1}}\right) \geq 0 ,$$
where the inequality follows from Jensen's inequality, so that this measure is a nonnegative conditional relative entropy.
Let $m_{t+1}$ be a one-period stochastic discount factor that generates the risk-neutral density via $p^*_{t+1} = \frac{m_{t+1}\, p_{t+1}}{E_t m_{t+1}}$, so that
$$L_t(m_{t+1}) \equiv L_t\left(\frac{p^*_{t+1}}{p_{t+1}}\right) = \log E_t m_{t+1} - E_t \log m_{t+1};$$
let $r_{t+1}$ be a gross one-period return on a risky asset and $r^1_{t+1} = \frac{1}{E_t m_{t+1}}$ the gross return on a one-period risk-free bond.
[Backus et al., 2014] note that a stochastic discount factor satisfies
$$E_t\left(m_{t+1} r_{t+1}\right) = 1$$
for the gross return $r_{t+1}$ on any traded asset.
They derive the following entropy bound
$$E\, L_t(m_{t+1}) \geq E\left(\log r_{t+1} - \log r^1_{t+1}\right),$$
which they propose as a complement to a Hansen-Jagannathan [Hansen and Jagannathan, 1991] bound.
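A minimal sketch (NumPy only; the lognormal specification of the stochastic discount factor is an assumed example, not taken from [Backus et al., 2014]) that computes the unconditional, i.i.d. version $L(m) = \log E(m) - E(\log m)$ by simulation and compares it with the lognormal closed form $\frac{1}{2}\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative lognormal SDF: log m ~ N(mu, sigma^2)  -- an assumed example
mu, sigma = -0.05, 0.25
log_m = rng.normal(mu, sigma, size=1_000_000)
m = np.exp(log_m)

# entropy of the stochastic discount factor: L(m) = log E(m) - E(log m)
L_mc = np.log(m.mean()) - log_m.mean()
L_closed = 0.5 * sigma**2            # for lognormal m, log E(m) = mu + sigma^2 / 2

print(f"L(m) by simulation: {L_mc:.5f}")
print(f"sigma^2 / 2:        {L_closed:.5f}")
```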
19.12. Wiener-Kolmogorov Prediction Error Formula as Entropy#
Let $\{x_t\}$ be a univariate covariance stationary stochastic process with mean zero and spectral density $S_x(\omega)$.
The variance of $x_t$ is
$$\sigma_x^2 = \left(\frac{1}{2\pi}\right) \int_{-\pi}^{\pi} S_x(\omega)\, d\omega .$$
As described in chapter XIV of [Sargent, 1987], the Wiener-Kolmogorov formula for the one-period ahead prediction error is
$$\sigma_\epsilon^2 = \exp\left[ \left(\frac{1}{2\pi}\right) \int_{-\pi}^{\pi} \log S_x(\omega)\, d\omega \right] .$$
Occasionally the logarithm of the one-step-ahead prediction error variance $\sigma_\epsilon^2$ is called entropy because it measures unpredictability.
Consider the following problem reminiscent of one described earlier.
Problem:
Among all covariance stationary univariate processes with unconditional variance $\sigma_x^2$, find a process with maximal one-step-ahead prediction error variance.
The maximizer is a process with a flat spectral density
$$S_x(\omega) = \sigma_x^2 \quad \textrm{for all } \omega \in [-\pi, \pi],$$
i.e., a serially uncorrelated (white noise) process.
Thus, among
all univariate covariance stationary processes with variance $\sigma_x^2$, a white noise process is the most unpredictable: its one-step-ahead prediction error variance equals its unconditional variance $\sigma_x^2$.
This no-patterns-across-time outcome for a temporally dependent process resembles the no-pattern-across-states outcome for the static entropy maximizing coin or die in the classic information theoretic analysis described above.
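A numerical sketch of the Wiener-Kolmogorov formula (assuming NumPy; the AR(1) example is an illustrative choice): for $x_t = \rho x_{t-1} + \epsilon_t$ with $E\epsilon_t^2 = \sigma_\epsilon^2$, the spectral density is $S_x(\omega) = \frac{\sigma_\epsilon^2}{|1 - \rho e^{-i\omega}|^2}$, and the formula should recover $\sigma_\epsilon^2$:

```python
import numpy as np

rho, sigma_eps = 0.8, 0.5
omegas = np.linspace(-np.pi, np.pi, 100_001)

# spectral density of the AR(1) process x_t = rho x_{t-1} + eps_t
S_x = sigma_eps**2 / np.abs(1 - rho * np.exp(-1j * omegas))**2

# on a uniform grid over [-pi, pi], averaging approximates (1/2 pi) * integral
variance = S_x.mean()
one_step_error = np.exp(np.log(S_x).mean())

print(f"unconditional variance:    {variance:.4f}  (theory: {sigma_eps**2 / (1 - rho**2):.4f})")
print(f"one-step prediction error: {one_step_error:.4f}  (theory: {sigma_eps**2:.4f})")
```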
19.13. Multivariate Processes#
Let $y_t$ be an $n \times 1$ covariance stationary stochastic process with mean $0$ and spectral density matrix $S_y(\omega)$.
Let
$$y_t = D(L) \epsilon_t \equiv \sum_{j=0}^{\infty} D_j \epsilon_{t-j}$$
be a Wold representation for $y_t$, where $D(0)\epsilon_t$ is the vector of one-step-ahead errors in predicting $y_t$ from its own infinite past and $\epsilon_t$ is an $n \times 1$ vector of serially uncorrelated shocks with mean $0$ and contemporaneous covariance matrix $E \epsilon_t \epsilon_t' = I$.
Linear-least-squares predictors have one-step-ahead prediction error covariance matrix $D(0) D(0)'$ that satisfies
$$\log \det \left[ D(0) D(0)' \right] = \left(\frac{1}{2\pi}\right) \int_{-\pi}^{\pi} \log \det \left[ S_y(\omega) \right] d\omega .$$
Being a measure of the unpredictability of an $n \times 1$ vector covariance stationary stochastic process, the object on the right side is sometimes called entropy.
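A companion sketch (assuming NumPy; the bivariate, invertible VMA(1) below is an illustrative choice) checking that $\log \det\left[D(0)D(0)'\right]$ equals the average of $\log \det S_y(\omega)$ over frequencies:

```python
import numpy as np

# a two-variable, invertible VMA(1): y_t = D0 eps_t + D1 eps_{t-1}, E eps eps' = I
D0 = np.array([[1.0, 0.0],
               [0.5, 2.0]])
D1 = np.array([[0.2, -0.1],
               [0.1,  0.3]])

omegas = np.linspace(-np.pi, np.pi, 20_001)
log_det_S = np.empty_like(omegas)
for k, w in enumerate(omegas):
    D = D0 + D1 * np.exp(-1j * w)
    S = D @ D.conj().T                      # spectral density matrix S_y(omega)
    log_det_S[k] = np.log(np.linalg.det(S).real)

lhs = np.log(np.linalg.det(D0 @ D0.T))      # log det [D(0) D(0)']
rhs = log_det_S.mean()                      # approximates (1/2 pi) * integral of log det S_y

print(f"log det D(0)D(0)': {lhs:.4f}")
print(f"frequency average: {rhs:.4f}")
```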
19.14. Frequency Domain Robust Control#
Chapter 8 of [Hansen and Sargent, 2008] adapts work in the control theory literature to define a frequency domain entropy criterion for robust control as
$$\int_{-\pi}^{\pi} \log \det \left[ \theta I - G_F(\zeta)' G_F(\zeta) \right] d\omega , \qquad (19.13)$$
where $\theta$ is a positive robustness parameter and $G_F(\zeta)$, with $\zeta = e^{-i\omega}$, is the transfer function from shocks to targets induced by a candidate decision rule $F$.
Hansen and Sargent [Hansen and Sargent, 2008] show that criterion (19.13) can be represented as
$$\textrm{constant} + \int_{-\pi}^{\pi} \log \det \left[ S_r(\omega) \right] d\omega$$
for an appropriate covariance stationary stochastic process $\{r_t\}$ derived from $\theta$ and $G_F(\zeta)$.
This explains the
moniker maximum entropy robust control for decision rules $F$ designed to maximize criterion (19.13).
19.15. Relative Entropy for a Continuous Random Variable#
Let $x$ be a continuous random variable with density $f(x)$, and let $m(x) \geq 0$ be a likelihood ratio satisfying $\int m(x) f(x)\, dx = 1$, so that $\hat f(x) = m(x) f(x)$ is another (distorted) density.
The relative entropy of the distorted density $\hat f = m f$ with respect to $f$ is
$$\textrm{ent} = \int m(x) \log m(x)\, f(x)\, dx .$$
Fig. 19.2 plots the functions $m \log m$ and $m - 1$ against $m$.
That relative entropy $\textrm{ent} \geq 0$ follows from the inequality $m \log m \geq m - 1$ for $m \geq 0$, visible in Fig. 19.2, together with the restriction $\int m(x) f(x)\, dx = 1$.
Fig. 19.3 and Fig. 19.4 display aspects of relative entropy visually for a continuous random variable $x$ and two densities for $x$,
where the numerator density in the likelihood ratio $m(x)$ is a distortion of the baseline denominator density $f(x)$.

Fig. 19.2 The function $m \log m$ and the function $m - 1$.

Fig. 19.3 Graphs of the two densities.

Fig. 19.4
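As a final sketch (assuming NumPy and SciPy; the distorted density $\hat f = N(w, \sigma^2)$ is an illustrative choice), the code below evaluates $\textrm{ent} = \int m(x) \log m(x)\, f(x)\, dx$ numerically for a standard normal baseline $f$ and compares it with the scalar version of formula (19.5):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# baseline density f = N(0, 1); distorted density fhat = N(w, sigma^2) (illustrative choice)
w, sigma = 0.5, 1.3
f = norm(0, 1).pdf
fhat = norm(w, sigma).pdf

def integrand(x):
    m = fhat(x) / f(x)               # likelihood ratio m(x)
    return m * np.log(m) * f(x)      # m log m weighted by the baseline density

ent, _ = quad(integrand, -10, 10)

# scalar version of formula (19.5): -0.5 log sigma^2 + 0.5 w^2 + 0.5 (sigma^2 - 1)
ent_formula = -0.5 * np.log(sigma**2) + 0.5 * w**2 + 0.5 * (sigma**2 - 1)

print(f"numerical integral: {ent:.5f}")
print(f"formula (19.5):     {ent_formula:.5f}")
```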