**A quick intro to the main interpretations of probability**

Since probability calculus has been
axiomatized, Kolmogorov’s axiomatization being the standard one, and the one we
briefly considered in this course, one might simply say that probability is
whatever satisfies the axioms of probability, much in the same way in which,
say, Euclidean objects (points, lines, planes) are whatever satisfies Hilbert’s axiomatization
of geometry. Many quantities, such as
normalized length, satisfy the axioms of probability. However, such quantities do not provide an
interpretation of probability in the sense of an *analysis* of the notion of probability, which, presumably, is what
one has in mind when one asks what probability is. Hence, assuming that the question is not ill-posed,
one may feel the need to engage in some mathematical/philosophical
considerations.

The main interpretations of probability are best divided into two groups:

- Epistemological interpretations, according to which probability is primarily related to human knowledge or belief.
- Objective interpretations, according to which probability is about a feature of reality independent of human knowledge or belief. Sometimes reality is taken to be the physical world; at times it is taken to include a sort of Platonic realm of mathematical and logical entities.

*The Classical interpretation* (Bernoulli, Laplace, and nearly everyone up to the 1800s)

This interpretation was developed first in the late seventeenth century, especially by Jacob Bernoulli (*Ars Conjectandi*, 1713), and codified by Laplace (*Philosophical Essay on Probabilities*, 1814). For the classical interpretation:

- Determinism obtains in the natural world. Hence, probability is epistemic.
- To determine the probability of, say, getting a 2 when tossing a fair die, one constructs the ratio between favorable cases and *possible* cases. Probability *is* such a ratio.
- Ratios between favorable and possible cases can easily be shown to obey the axioms of probability calculus.
- Passage to the limit then allows the construction of probabilities not expressible as rational numbers (fractions having integers as numerator and denominator).

NOTE: This is required by the fact that in science probabilities can be expressed by irrational numbers.
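The classical recipe is easy to make concrete: given equiprobable cases, the probability of an event is just a ratio of counts. A minimal sketch (the function name is ours, for illustration):

```python
from fractions import Fraction

def classical_probability(favorable, possible):
    """Classical probability: favorable cases over equiprobable possible cases."""
    return Fraction(len(favorable), len(possible))

die = [1, 2, 3, 4, 5, 6]                # six equiprobable possible cases
two = [x for x in die if x == 2]        # one favorable case
print(classical_probability(two, die))  # prints 1/6
```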

What possible cases? For example, in tossing a fair die, one
could have a sample universe of 2 and not-2 and claim that consequently
Pr(2) = Pr(not-2) = 1/2, which will not do.
The answer is to require that the possible cases be *equiprobable*. To avoid circularity (defining probability by
appealing to equiprobability), one defines equiprobable cases as those for which there are no relevant
rational grounds for choosing among them (*Principle
of Indifference*). Hence, the case
not-2 subdivides into 5 equiprobable cases. So, Pr(2) = 1/6.

NOTE: for the objective interpretation this is nonsense: if a coin is loaded, then Pr(H) ≠ Pr(T), and neither equals 1/2.

Problems:

- The probability of a single event (e.g., the murder of Caesar) cannot be determined by constructing a ratio, as there seem to be no relevant equiprobable cases.
- Bertrand’s paradox. The Principle of Indifference gives inconsistent results, because the same situation can be described in different but equivalent ways. A factory produces cubes with side between 0 and 1. If the production output is uniformly distributed along side length, Pr(side is less than 1/2) = 1/2. But suppose the same cubes are instead described as produced with a uniform distribution of face area between 0 and 1. By analogy we could then say that Pr(face area is less than 1/4) = 1/4, and yet by necessity a cube with side less than 1/2 has faces with area less than 1/4, and vice versa.
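The inconsistency can be checked numerically: applying Indifference to side length and to face area yields different probabilities for the very same event (side < 1/2 exactly when face area < 1/4). A sketch by simulation:

```python
import random

random.seed(0)
N = 100_000

# Indifference over side length: side uniform on (0, 1)
p_side = sum(random.random() < 0.5 for _ in range(N)) / N    # about 1/2

# Indifference over face area: area uniform on (0, 1);
# side < 1/2 exactly when area < 1/4
p_area = sum(random.random() < 0.25 for _ in range(N)) / N   # about 1/4

print(p_side, p_area)   # one event, two incompatible answers
```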


*The frequency interpretation* (Venn, Reichenbach, von Mises)

Probability theory is taken to be a mathematical science
dealing with mass random events, which are unpredictable in detail but whose
*numerical proportions in the long run* with respect to a given set
of events (the reference class) are predictable.

Example: the proportion of heads when flipping a coin many times; births (deaths) of, say, males in a population; raindrop distributions, etc.

NOTE: analogy with, say, dynamics, whose subject matter is force.

Gamblers and statisticians have long known of the intimate
relation between probability and frequency: if Pr(2)=1/6, then in the long run
the frequency of 2 within the class of all the outcomes tends towards 1/6. The frequency interpretation holds that the
probability of an event or property M in a reference class B *is* (perhaps in an idealized way) the
frequency of M within B.
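The tendency of relative frequencies towards the probability can be illustrated by simulation (an illustration only: for the frequentist the probability is the limiting frequency itself, not the output of any finite run):

```python
import random

random.seed(1)

def relative_frequency(n):
    """Relative frequency of the outcome 2 in n rolls of a fair die."""
    return sum(random.randint(1, 6) == 2 for _ in range(n)) / n

for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency(n))   # tends towards 1/6 = 0.1666...
```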

NOTE:

- As frequencies are taken to be objective features of reality, this is an objective interpretation.
- The frequency interpretation satisfies the axioms of probability theory, with some caveats, however, with respect to countable additivity.

Problems:

- If the frequency is taken to be finite, then only probabilities expressed by rational numbers are allowed, which is problematic. The answer is to allow infinite (limiting) frequencies. The idea here is that one can introduce limits as in physics.
- M may not occur. For example, a coin that is never flipped lacks a probability for tails on this interpretation.
- M may be a single event, e.g., the civil war or Caesar’s murder, in which case no frequency can be provided.

NOTE: von Mises did not consider this a serious objection: as the mechanical definition of work does not apply to the everyday notion of work, so this interpretation does not agree with our everyday notion of probability.

- Since M must be given with respect to B, an individual will have M only qua B, not absolutely. So, Pr(I live to 90) must be understood of me *qua* male, or *qua* philosopher, or *qua* white, and so on.

NOTE: this may be a problem that affects other interpretations as well, however.

*The Logical Interpretation* (Keynes, Jeffreys, Carnap)

The basic idea of the logical
interpretation is that probability is *the
measurement of partial entailment* (with probabilities 1 and 0 as limiting
cases), that is, the measurement of the evidential link between evidence E and
the hypothesis H supported by E. As such, the logical interpretation tries to
provide a framework for inductive logic.
We have already seen this in our discussion of entailment in terms of
conditional probability.

There are several versions of this interpretation, but the
most famous is by Carnap (*Logical Foundations of Probability*, 1950).

Consider a language with 3 names, a, b, c, and a predicate
F. This language has 8 *state descriptions*, that is, statements
saying for each individual whether it has F or not:

1. Fa&Fb&Fc
2. -Fa&Fb&Fc
3. Fa&-Fb&Fc
4. Fa&Fb&-Fc
5. -Fa&-Fb&Fc
6. -Fa&Fb&-Fc
7. Fa&-Fb&-Fc
8. -Fa&-Fb&-Fc

When we look at the state descriptions, we note that some
differ only by a permutation of names. For
example, (2), (3), and (4) all have two things with F and one with -F. Together, (2), (3), and (4) constitute a *structure description*. There are four structure descriptions:

- {1}: every individual is an F
- {2,3,4}: two individuals are F and one is not
- {5,6,7}: one individual is F and two are not
- {8}: no individual is F.

Now one defines a function m* that assigns weights to structure and state descriptions in two
steps:

- All structure descriptions get the same weight; hence, in our case each gets weight 1/4.
- Each state description within a given structure description gets the same weight. So, since {1} contains only one state description, (1) gets weight 1/4; by contrast, since {2,3,4} contains three state descriptions, (2), (3), and (4) each get weight 1/12, that is, 1/4 divided by 3.

Note that:

- Such assignments are a priori, much as in the classical theory.
- It turns out that m* satisfies the axioms of probability.
- The logical interpretation is an objective interpretation. (For Carnap, however, probability is essentially tied to a language, in this case to the very simple one we considered.)
- Since any statement in the language is expressible in terms of state descriptions, m* can be extended to any statement.
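The two-step definition of m* can be reproduced mechanically for our toy language (the encoding of state descriptions as truth-value triples is ours, for illustration):

```python
from fractions import Fraction
from itertools import product

# A state description assigns F or -F to each of the names a, b, c.
states = list(product([True, False], repeat=3))      # 8 state descriptions

# A structure description groups states differing only by a permutation
# of names, i.e. states with the same number of F's.
structures = {}
for s in states:
    structures.setdefault(sum(s), []).append(s)      # 4 structure descriptions

# Step 1: each structure description gets the same weight (here 1/4).
# Step 2: that weight is split evenly among its state descriptions.
m_star = {}
for members in structures.values():
    for s in members:
        m_star[s] = Fraction(1, len(structures)) / len(members)

print(m_star[(True, True, True)])     # Fa&Fb&Fc  -> 1/4
print(m_star[(False, True, True)])    # -Fa&Fb&Fc -> 1/12
print(sum(m_star.values()))           # weights sum to 1
```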

At this point, given any two statements *h* and *e*, one can
introduce a *confirmation function* c* such that

c*(*h*,*e*) = m*(*h*&*e*) / m*(*e*).

Clearly, c*(*h*,*e*) does the job of Pr(*h*|*e*). c*
is introduced expressly to account for our
ability to learn from experience. c*
can be generalized to a family of functions, but considering that is beyond
our goals here.
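That c* allows learning from experience can be checked concretely in our toy language: learning that a and b are F raises the probability that c is F. A sketch (same hypothetical encoding of state descriptions as truth-value triples):

```python
from fractions import Fraction
from itertools import product

# Rebuild Carnap's m* for three names and one predicate F.
states = list(product([True, False], repeat=3))
structures = {}
for s in states:
    structures.setdefault(sum(s), []).append(s)
m_star = {s: Fraction(1, len(structures)) / len(members)
          for members in structures.values() for s in members}

def m(statement):
    """m* of a statement, given as a test on the triple (Fa, Fb, Fc)."""
    return sum(m_star[s] for s in states if statement(s))

def c_star(h, e):
    """Confirmation function: c*(h, e) = m*(h & e) / m*(e)."""
    return m(lambda s: h(s) and e(s)) / m(e)

Fc = lambda s: s[2]                   # "c is F"
FaFb = lambda s: s[0] and s[1]        # evidence: "a is F and b is F"
print(m(Fc))                          # prior: 1/2
print(c_star(Fc, FaFb))               # posterior: 3/4 -- learning from experience
```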

Most of the problems of the logical theory center on the attempt to provide a framework for inductive logic:

- It is unclear why one should pick c* as *the* confirmation function, as it is not the only one that allows learning from experience. In other words, how does one decide a priori what the calibration of the confirmation function should be?
- It is unclear what *e* amounts to in specific cases. For example, if I toss a die and get 5, is my evidence that I got 5, or that the die made a noise when it landed, or that it had a certain trajectory, or…?

*Propensity interpretation* (Popper)

In this view, probability is a *physical disposition* to produce outcomes
of a certain kind.

NOTE:

- Presumably, such dispositions are causally effective.
- This interpretation allows one to make sense of single-case probabilities, such as the probability that this atom will be observed at position *a* is 2/3.
- This is an objective interpretation, as propensities are taken to be features of the world.

For some, the outcomes are *long run* (but not infinite) frequencies: a fair coin has a
propensity to land with T half the time in the long run. Note that 1/2 does not measure this tendency,
whose strength, as it were, is close to 1.

For others, the outcomes are *single* outcomes: the propensity of a fair coin to come up with T is
1/2.

Problems:

- It is unclear what such propensities are, and therefore it is hard to see how this interpretation clarifies what probability is.
- It is unclear whether Bayes’ Theorem, which ties a conditional probability to its inverse, can be couched in propensity terms, because propensities seem tied to causation, which is asymmetric in such a way that at times, while it makes sense to say that B causes A, it makes little sense to say that A causes B. So, if Pr(P|D) measures the propensity of disease D to produce a positive test result, then Pr(D|P) seems to make little sense if understood as the propensity of the positive test result to produce the disease.

*The Subjectivist interpretation* (de Finetti, Jeffrey)

Probability is *degree of belief held by a rational agent*,
that is, an agent whose degrees of belief (minimally):

- Satisfy the axioms of probability
- Are
updated by
*conditioning*: Pr(A) becomes Pr(A|E) in the face of new evidence E.

NOTE:

- Most people violate probability calculus, especially with respect to conditional probability; therefore a *normative* component (the rationality of the agent) is necessary.
- Probability is then presented as the logic of partial belief, with classical logic as a limiting case.

Many subjectivists (e.g., de Finetti) analyze degrees of belief (probabilities) in terms of (possible) betting behavior. Consider a bet where one wins W = 1 if A is true and loses L if it is false. The probability you attribute to A is fixed by the value of L (expressed in units of W) that you consider fair, that is, the value of L you would accept if you did not know which side of the bet you would have to take. For example, suppose you consider fair the arrangement whereby one wins $1 if A is true and loses $1/3 if A is false. Then you believe that Pr(A) = 1/4. In fact, the arrangement is fair when

1 · Pr(A) = (1/3)(1 - Pr(A)),

that is,

(4/3)Pr(A) = 1/3,

or

Pr(A) = 1/4.

Here probability is understood in terms of utility (in the example, dollars) and rational preference. (Since you don’t know which side of the bet you’ll get, you’ll settle for a fair bet.)
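The fairness condition generalizes: a bet winning W if A is true and losing L otherwise is fair exactly when W · Pr(A) = L · (1 - Pr(A)), i.e. Pr(A) = L/(W + L). A sketch (the function name is ours):

```python
from fractions import Fraction

def probability_from_fair_bet(win, lose):
    """Degree of belief implied by regarding as fair a bet that wins
    `win` if A is true and loses `lose` if A is false:
    win * p = lose * (1 - p)  =>  p = lose / (win + lose)."""
    return Fraction(lose) / (Fraction(win) + Fraction(lose))

print(probability_from_fair_bet(1, Fraction(1, 3)))   # prints 1/4, as in the text
```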

Others (e.g., Ramsey) try to obtain both probability and utility from rational preference in a two-step procedure: first one obtains probability from rational preference, and then utility from probability and rational preference. Roughly, here is the procedure. First, Ramsey introduces ‘ethically neutral’ statements, namely statements which per se are indifferent to you, so that their only significance is their association with the outcomes of gambles. Suppose now that you prefer A over B, that statement P is ethically neutral, and that you are indifferent between the gambles *get A if P is true and B if P is false* and *get A if P is false and B if P is true*. Then, by definition, Pr(P) = 1/2. Note that now P can be used to set up a lottery just as a fair coin can. The probabilities of many other ethically neutral statements can be obtained analogously. Once the set of ethically neutral statements for which we know the probability is large enough, they can be used in place of lotteries in determining utilities. Hence, one determines utilities and then the probabilities of the remaining ethically neutral statements. Finally, knowing utilities, one can obtain the probabilities of non-ethically neutral statements by appealing to the expected values of bets.

Other constraints beyond the two above have been proposed, but dealing with them would take us too far afield.