SOA Exam P Notes
Dargscisyhp
July 5, 2016
1 Mathematical identities

• Series (a quick numerical check of these sums appears after the list)
  – Finite geometric series: $\sum_{i=0}^{n-1} ar^i = a\frac{r^n-1}{r-1} = a\frac{1-r^n}{1-r}$
  – If $|r| < 1$ then $\sum_{i=0}^{\infty} ar^i = \frac{a}{1-r}$
  – $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$
  – $\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}$
  – $\sum_{i=1}^{n} i^3 = \frac{n^2(n+1)^2}{4}$
  – $\sum_{x=0}^{\infty} \frac{a^x}{x!} = e^a$
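These identities are easy to sanity-check numerically. Below is a minimal sketch (assuming Python 3, standard library only; the values of a, r, n are arbitrary test inputs) comparing each closed form against a brute-force sum.

    import math

    a, r, n = 2.0, 0.5, 10

    # finite geometric series: sum_{i=0}^{n-1} a r^i = a (1 - r^n) / (1 - r)
    assert abs(sum(a * r**i for i in range(n)) - a * (1 - r**n) / (1 - r)) < 1e-12

    # infinite geometric series for |r| < 1, truncated far out
    assert abs(sum(a * r**i for i in range(200)) - a / (1 - r)) < 1e-12

    # power sums
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
    assert sum(i**2 for i in range(1, n + 1)) == n * (n + 1) * (2 * n + 1) // 6
    assert sum(i**3 for i in range(1, n + 1)) == n**2 * (n + 1)**2 // 4

    # exponential series: sum_{x>=0} a^x / x! = e^a
    assert abs(sum(a**x / math.factorial(x) for x in range(60)) - math.exp(a)) < 1e-12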
• Integrals
  – $\int xe^{ax}\,dx = \frac{xe^{ax}}{a} - \frac{e^{ax}}{a^2} + C$
    ∗ Proof: $\frac{d}{da}\int e^{ax}\,dx = \int xe^{ax}\,dx$. But we also have $\frac{d}{da}\int e^{ax}\,dx = \frac{d}{da}\frac{e^{ax}}{a} = \frac{axe^{ax} - e^{ax}}{a^2}$. Equating the last expressions of both equations proves the identity.
• Gamma Function
  – $\Gamma(\alpha) = \int_0^{\infty} y^{\alpha-1}e^{-y}\,dy$
  – If $n$ is a positive integer, $\Gamma(n) = (n-1)!$ (a quick check appears below).
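A small sketch (assuming Python 3; math.gamma implements Γ, and the choice α = 2.5 is only an illustrative test value) verifying both the factorial relation and the integral definition:

    import math

    # Gamma(n) = (n-1)! for positive integers n
    for n in range(1, 8):
        assert math.isclose(math.gamma(n), math.factorial(n - 1))

    # crude midpoint-rule check of the integral definition for alpha = 2.5
    alpha, dy, total = 2.5, 1e-3, 0.0
    y = dy / 2
    while y < 50:               # the integrand is negligible beyond y = 50 here
        total += y**(alpha - 1) * math.exp(-y) * dy
        y += dy
    print(total, math.gamma(alpha))   # both ≈ 1.3293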
2 Basic Probability

• De Morgan's laws
  – $(A \cup B)' = A' \cap B'$
  – $\left(\bigcup_{i=1}^{n} A_i\right)' = \bigcap_{i=1}^{n} A_i'$
  – $(A \cap B)' = A' \cup B'$
  – $\left(\bigcap_{i=1}^{n} A_i\right)' = \bigcup_{i=1}^{n} A_i'$
• $A \cap \left(\bigcup_i B_i\right) = \bigcup_i (A \cap B_i)$
• $A \cup \left(\bigcap_i B_i\right) = \bigcap_i (A \cup B_i)$
• If $B_1, \ldots, B_n$ are mutually exclusive and exhaustive then $A = \bigcup_i (A \cap B_i)$
• $A = A \cap (B \cup B') = (A \cap B) \cup (A \cap B')$
• $P[B'] = 1 - P[B]$
• $P[A \cup B] = P[A] + P[B] - P[A \cap B]$
• $P[A \cup B \cup C] = P[A] + P[B] + P[C] - P[A \cap B] - P[A \cap C] - P[B \cap C] + P[A \cap B \cap C]$
• For mutually exclusive $A_i$ we have $P\left[\bigcup_{i=1}^{n} A_i\right] = \sum_{i=1}^{n} P[A_i]$
• If $\{B_i\}_{i=1}^{n}$ forms a partition then $P[A] = \sum_{i=1}^{n} P[A \cap B_i]$ (a quick set-based check appears below)
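A small sketch (assuming Python 3) checking De Morgan's laws, inclusion-exclusion, and the partition rule on a toy finite sample space with equally likely outcomes; the specific events A and B are arbitrary examples.

    from fractions import Fraction

    omega = set(range(1, 13))                # toy sample space: 12 equally likely outcomes
    A = {x for x in omega if x % 2 == 0}     # "even"
    B = {x for x in omega if x % 3 == 0}     # "multiple of 3"

    def P(event):
        return Fraction(len(event), len(omega))

    # De Morgan: (A u B)' = A' n B'
    assert omega - (A | B) == (omega - A) & (omega - B)

    # inclusion-exclusion: P[A u B] = P[A] + P[B] - P[A n B]
    assert P(A | B) == P(A) + P(B) - P(A & B)

    # partition {B, B'}: P[A] = P[A n B] + P[A n B']
    assert P(A) == P(A & B) + P(A & (omega - B))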
3 Conditional probability

$P[B|A]$ is read as the probability of B given A.
• $P[B|A] = \frac{P[B \cap A]}{P[A]}$
• $P[B] = P[B \cap A] + P[B \cap A'] = P[B|A]P[A] + P[B|A']P[A']$
• $P[A'|B] = 1 - P[A|B]$
• $P[A \cup B|C] = P[A|C] + P[B|C] - P[A \cap B|C]$
• Bayes rule: reversing conditionality (a numeric example appears below)
  – Simple version of Bayes rule:
    ∗ $P[A|B] = \frac{P[B|A]\,P[A]}{P[B|A]\,P[A] + P[B|A']\,P[A']}$
    ∗ Bottom is just $P[B \cap A] + P[B \cap A'] = P[B]$. Top is $P[B \cap A]$.
  – Bayes rule can be extended using the same justification as above:
    $P[A_j|B] = \frac{P[B \cap A_j]}{P[B]} = \frac{P[B|A_j]\,P[A_j]}{\sum_{i=1}^{n} P[B|A_i]\,P[A_i]}$
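A worked sketch of the simple version (the numbers are hypothetical, chosen only for illustration; assuming Python 3): a test detects a condition with $P[B|A] = 0.95$, has false-positive rate $P[B|A'] = 0.02$, and the condition has prevalence $P[A] = 0.01$.

    # hypothetical numbers, chosen only to illustrate Bayes' rule
    p_A = 0.01           # P[A],  prevalence
    p_B_given_A = 0.95   # P[B|A], probability the test flags a true case
    p_B_given_Ac = 0.02  # P[B|A'], false-positive rate

    p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)   # law of total probability
    p_A_given_B = p_B_given_A * p_A / p_B                # Bayes' rule

    print(round(p_A_given_B, 4))   # ≈ 0.3242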
• If $P\left[\bigcap_{i=1}^{n-1} A_i\right] > 0$ then
  $P\left[\bigcap_{i=1}^{n} A_i\right] = P[A_1]\,P[A_2|A_1]\,P[A_3|A_2 \cap A_1] \cdots P[A_n|A_1 \cap A_2 \cap \ldots \cap A_{n-1}]$
• A and B are considered independent events if any of the following equivalent statements hold
  – $P[A \cap B] = P[A]P[B]$
  – $P[A|B] = P[A]$
  – $P[B|A] = P[B]$
– Keep in mind that independent events are not the same thing as disjoint events.
4 Combinatorics

• Permutations
  – The number of ways n distinct objects can be arranged is $n!$.
  – The number of ways we can choose k objects from n in an ordered manner is
    $P(n,k) = \frac{n!}{(n-k)!}$
  – If you have $n_1$ objects of type 1, $n_2$ objects of type 2, etc., such that $\sum_i n_i = n$, then the number of ways of arranging these objects in an ordered manner is
    $\frac{n!}{\prod_i n_i!}$
• Combinations
  – Combinations give you the number of ways you can choose k objects from a set of n where the order is irrelevant. This number is
    $\binom{n}{k} = \frac{n!}{(n-k)!\,k!}$
  – If you have $n_1$ objects of type 1, $n_2$ objects of type 2, etc., then the number of ways we can combine $k_1$ objects of type 1, $k_2$ objects of type 2, etc. is
    $\prod_i \binom{n_i}{k_i}$ (a quick check of these counts appears below)
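These counts map directly onto the standard library (sketch assuming Python 3.8+, where math.perm and math.comb exist; the example objects are hypothetical):

    import math

    n, k = 10, 3
    assert math.perm(n, k) == math.factorial(n) // math.factorial(n - k)   # P(n, k)
    assert math.comb(n, k) == math.factorial(n) // (math.factorial(n - k) * math.factorial(k))

    # arrangements of the letters of "BANANA": 6!/(3! 2! 1!) for 3 A's, 2 N's, 1 B
    assert math.factorial(6) // (math.factorial(3) * math.factorial(2) * math.factorial(1)) == 60

    # choose 2 of 4 type-1 objects and 1 of 5 type-2 objects
    assert math.comb(4, 2) * math.comb(5, 1) == 30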
5 Random Variables

• Cumulative distribution function:
  – $F(x) = \sum_{\omega \le x} p(\omega)$ for a discrete variable
  – $F(x) = \int_{-\infty}^{x} f(t)\,dt$ for a continuous variable.
• Survival function: $S(x) = 1 - F(x)$
• $\frac{d}{dx}F(x) = F'(x) = -S'(x) = f(x)$
• $P[a \le X \le b] = F(b) - F(a)$ (for a continuous variable)
• Expectation value (X is a random variable, p(x) and f(x) are its distribution functions, and h(x) is a function of a random variable)
  – Discrete: $E[X] = \sum_x x\,p(x)$
  – Continuous: $E[X] = \int_{-\infty}^{\infty} xf(x)\,dx$
  – $E[h(X)] = \sum_x h(x)\,p(x)$ (discrete) $= \int_{-\infty}^{\infty} h(x)f(x)\,dx$ (continuous)
  – If X is defined on $[a, \infty)$ (f(x) = 0 for x < a) then $E[X] = a + \int_a^{\infty} [1 - F(x)]\,dx$
  – If X is defined on $[a, b]$ then $E[X] = a + \int_a^{b} [1 - F(x)]\,dx$ (a quick numerical check appears below)
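A minimal numerical sketch (assuming Python 3) of the survival-function formula for a continuous X on [a, b], using a uniform distribution on [2, 6] where E[X] = 4; the interval is an arbitrary example.

    a, b = 2.0, 6.0

    def F(x):                                # CDF of the uniform distribution on [a, b]
        return (x - a) / (b - a)

    # E[X] = a + integral over [a, b] of (1 - F(x)) dx, via a midpoint Riemann sum
    steps = 100_000
    dx = (b - a) / steps
    integral = sum((1 - F(a + (i + 0.5) * dx)) * dx for i in range(steps))
    print(a + integral)                      # ≈ 4.0 = (a + b)/2 for the uniform case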
  – Jensen's inequality states that if $h''(X) \ge 0$ where X is a random variable then $E[h(X)] \ge h(E[X])$. If $h''(X) > 0$ then $E[h(X)] > h(E[X])$. The inequality reverses for $h'' < 0$.
• The nth moment of X is $E[X^n]$.
• The nth central moment of X, where $E[X] = \mu$, is $E[(X - \mu)^n]$.
• Variance
  – Variance measures dispersion about the mean.
  – $Var[X] = \sigma^2 = E[(X - \mu)^2] = E[X^2] - (E[X])^2$.
  – $\sigma$ is the standard deviation.
  – If $a, b \in \mathbb{R}$ then $Var[aX + b] = a^2\,Var[X]$
• Coefficient of variation: $\sigma/\mu$
• $Skew[X] = \frac{1}{\sigma^3}E[(X - \mu)^3]$
• Moment generating function
  – $M_X(t) = E[e^{tX}]$
    ∗ $M_X(t) = \sum_x e^{tx}p(x)$ (discrete)
    ∗ $M_X(t) = \int_{-\infty}^{\infty} f(x)e^{tx}\,dx$ (continuous)
  – $M_X(0) = 1$ because $M_X(0) = \int_{-\infty}^{\infty} e^{0 \cdot x}f(x)\,dx = 1$
  – Moments of X can be found by successive derivatives of the moment generating function, hence the name (a numerical check appears below).
    ∗ $M_X'(0) = E[X]$
    ∗ $M_X''(0) = E[X^2]$
    ∗ $M_X^{(n)}(0) = E[X^n]$
  – $\frac{d^2}{dt^2}\left[\ln(M_X(t))\right]_{t=0} = Var(X)$
  – $M_X(t) = \sum_{k=0}^{\infty} \frac{t^k E[X^k]}{k!}$
  – If X has a discrete distribution $x_1, x_2, \ldots, x_n$ with probabilities $p_1, p_2, \ldots, p_n$, the moment generating function is $M_X(t) = \sum_i p_i e^{tx_i}$.
  – If Z = X + Y with X and Y independent, then $M_Z(t) = M_X(t)M_Y(t)$
    ∗ Proof: $M_Z(t) = E[e^{tZ}] = E[e^{t(X+Y)}] = E[e^{tX}e^{tY}] = E[e^{tX}]E[e^{tY}] = M_X(t)M_Y(t)$, where the last factorization uses independence.
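A sketch (assuming Python 3) that checks $M'(0) = E[X]$ and $M''(0) = E[X^2]$ by numerically differentiating the MGF of a small discrete distribution; the support points and probabilities are hypothetical.

    import math

    xs = [0, 1, 2, 5]
    ps = [0.1, 0.4, 0.3, 0.2]                      # hypothetical pmf, sums to 1

    def M(t):                                      # M_X(t) = sum_i p_i e^{t x_i}
        return sum(p * math.exp(t * x) for p, x in zip(ps, xs))

    h = 1e-4
    m1 = (M(h) - M(-h)) / (2 * h)                  # central difference ~ M'(0)
    m2 = (M(h) - 2 * M(0) + M(-h)) / h**2          # second difference  ~ M''(0)

    ex = sum(p * x for p, x in zip(ps, xs))        # E[X]   = 2.0
    ex2 = sum(p * x**2 for p, x in zip(ps, xs))    # E[X^2] = 6.6
    print(m1, ex)                                  # both ≈ 2.0
    print(m2, ex2)                                 # both ≈ 6.6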
r >0, X random variable, P | • Chebyshev’s inequality: For r> |X − − µ| > rσ] rσ ] ≤ 1 . f (X ) is probability function of x then the conditional probability distribution of X given • If f the occurrence of event A is (( )) if X X ∈ ∈ A and 0 otherwise. • Simple definition of independent variables – P [( P [(a a < X < b) b) ∩ (c < Y < d)] d)] = P [ P [a < X < b] b] · P [ P [c < Y < d] d] r2
X
f X P A
4
– If X and Y are independent variables then if A is an event involving only X and B is
an event involving only Y then A and B are independent events.
• Hazard rate of failure for continu continuous ous random variable variable (normal notation: f(x)= probability probability distribution, F(X) = CDF)
h(x) =
f ( f (x) = 1 F ( F (X )
−dxd ln [1 − F ( F (x)]
−
• Mixture of distributions: given random variables $\{X_i\}_{i=1}^{k}$ with density functions $f_i(x)$ and weights $\alpha_i < 1$ such that $\sum_{i=1}^{k}\alpha_i = 1$, we can construct a random variable X with the following density function (a short sketch appears below)
  – $f(x) = \sum_{i=1}^{k}\alpha_i f_i(x)$
  – $E[X^n] = \sum_{i=1}^{k}\alpha_i E[X_i^n]$
  – $M_X(t) = \sum_{i=1}^{k}\alpha_i M_{X_i}(t)$
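A sketch (assuming Python 3) of a two-component mixture; the weights and the choice of exponential components with rates 1 and 2 are hypothetical. The mixture mean should equal $\alpha_1 \cdot 1 + \alpha_2 \cdot \tfrac{1}{2}$.

    import math

    alphas = [0.3, 0.7]
    rates = [1.0, 2.0]                  # hypothetical exponential components

    def f(x):                           # mixture density: f(x) = sum_i alpha_i f_i(x)
        return sum(a * lam * math.exp(-lam * x) for a, lam in zip(alphas, rates))

    # E[X] by numerically integrating x f(x); compare with sum_i alpha_i E[X_i]
    dx, total, x = 1e-3, 0.0, 1e-3 / 2
    while x < 40:                       # the tail beyond 40 contributes negligibly
        total += x * f(x) * dx
        x += dx

    exact = sum(a / lam for a, lam in zip(alphas, rates))   # 0.3/1 + 0.7/2 = 0.65
    print(total, exact)                 # both ≈ 0.65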
6 Discrete Distributions

• Uniform Distribution
  – Uniform distribution on N points
  – $p(x) = \frac{1}{N}$
  – $E[X] = \frac{N+1}{2}$
  – $Var[X] = \frac{N^2-1}{12}$
  – $M_X(t) = \sum_{j=1}^{N}\frac{e^{jt}}{N} = \frac{e^t(e^{Nt}-1)}{N(e^t-1)}$
• Binomial Distribution (a quick check appears below)
  – If a single trial of an experiment has success probability p and failure probability 1-p, then if n is the number of trials and X the number of successes, X is binomially distributed.
  – $P(X = x) = \binom{n}{x}p^x(1-p)^{n-x}$
  – $E[X] = np$
  – $Var[X] = np(1-p)$
  – $M_X(t) = (1 - p + pe^t)^n$
  – $\frac{p_k}{p_{k-1}} = -\frac{p}{1-p} + \frac{(n+1)p}{k(1-p)} = \frac{(n+1)p - kp}{k(1-p)}$
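A sketch (assuming Python 3.8+, for math.comb) checking the binomial pmf, its mean and variance, and the successive-ratio formula; n = 10 and p = 0.3 are arbitrary test values.

    import math

    n, p = 10, 0.3
    pmf = [math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

    assert math.isclose(sum(pmf), 1.0)
    mean = sum(x * q for x, q in enumerate(pmf))
    var = sum(x**2 * q for x, q in enumerate(pmf)) - mean**2
    assert math.isclose(mean, n * p)                  # E[X]   = np      = 3.0
    assert math.isclose(var, n * p * (1 - p))         # Var[X] = np(1-p) = 2.1

    # successive-ratio formula p_k / p_{k-1} = ((n+1)p - kp) / (k(1-p))
    for k in range(1, n + 1):
        assert math.isclose(pmf[k] / pmf[k - 1], ((n + 1) * p - k * p) / (k * (1 - p)))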
• Poisson distribution
  – Often used as a model for counting the number of events in a certain period of time.
  – Example: the number of customers arriving for service at a bank over a 1 hour period is X.
  – The Poisson parameter $\lambda > 0$.
  – $p(x) = \frac{e^{-\lambda}\lambda^x}{x!}$
  – $E[X] = Var[X] = \lambda$
  – $M_X(t) = e^{\lambda(e^t - 1)}$
  – $\frac{p_k}{p_{k-1}} = \frac{\lambda}{k}$
• Geometric distribution
  – In a series of independent experiments with success probability p and failure probability q = 1-p, if X represents the number of failures until the first success, then X has a geometric distribution with parameter p.
  – $p(x) = (1-p)^x p$ for x = 0, 1, 2, 3, ...
  – $E[X] = \frac{1-p}{p} = \frac{q}{p}$
  – $Var[X] = \frac{1-p}{p^2} = \frac{q}{p^2}$
  – $M_X(t) = \frac{p}{1-(1-p)e^t}$
• Negative binomial distribution with parameters r and p
  – If X is the number of failures until the rth success occurs, where each trial has success probability p, then X has a negative binomial distribution.
  – $p(x) = \binom{r+x-1}{x}p^r(1-p)^x$ for x = 0, 1, 2, 3, ..., where $\binom{r+x-1}{x} = \frac{(r+x-1)(r+x-2)\cdots(r+1)(r)}{x!}$
  – $E[X] = \frac{r(1-p)}{p}$
  – $Var[X] = \frac{r(1-p)}{p^2}$
  – $M_X(t) = \left[\frac{p}{1-(1-p)e^t}\right]^r$
  – $\frac{p_k}{p_{k-1}} = 1 - p + \frac{(r-1)(1-p)}{k}$
• Multinomial Distribution
  – Parameters: $n, p_1, p_2, \ldots, p_k$ ($n > 0$, $0 \le p_i \le 1$, $\sum_i p_i = 1$)
  – Suppose an experiment has k possible outcomes with probabilities $p_i$. Let $X_i$ denote the number of experiments with outcome i, so that $\sum_i X_i = n$.
  – $P[X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k] = p(x_1, x_2, \ldots, x_k) = \frac{n!}{x_1!x_2!\cdots x_k!}p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}$
  – $E[X_i] = np_i$
  – $Var[X_i] = np_i(1-p_i)$
  – $Cov[X_i, X_j] = -np_ip_j$
• Hypergeometric distribution
  – Rarely used
  – If there are M total objects with k objects of type I and M-k objects of type II, and if n objects are chosen at random, then if X denotes the number of type I objects chosen, X has a hypergeometric distribution.
  – $X \le n$, $X \le k$, $0 \le X$, $n - (M-k) \le X$
  – $p(x) = \frac{\binom{k}{x}\binom{M-k}{n-x}}{\binom{M}{n}}$
  – $E[X] = \frac{nk}{M}$
  – $Var[X] = \frac{nk(M-k)(M-n)}{M^2(M-1)}$
7 Continuous Distributions

• The probabilities for the following boundary conditions are equivalent for continuous distributions: $P[a < X < b] = P[a < X \le b] = P[a \le X < b] = P[a \le X \le b]$.
• Uniform distribution
  – $f(x) = \frac{1}{b-a}$ for $a < x < b$.
  – $E[X] = \frac{a+b}{2}$ = median
  – $Var[X] = \frac{(b-a)^2}{12}$
  – $M_X(t) = \frac{e^{bt}-e^{at}}{(b-a)\cdot t}$
  – Symmetric about the mean
  – $P[c < x < d] = \frac{d-c}{b-a}$
  – Since $\int_a^x f(t)\,dt = \frac{x-a}{b-a}$, the cumulative distribution function is
    $F(x) = 0$ for $x < a$, $\quad F(x) = \frac{x-a}{b-a}$ for $a \le x \le b$, $\quad F(x) = 1$ for $x > b$.
• Normal distribution N(0,1)
  – Mean of 0, variance of 1.
  – Density function: $\phi(z) = \frac{1}{\sqrt{2\pi}}e^{-z^2/2}$
  – $M_Z(t) = \exp\left(\frac{t^2}{2}\right)$
• z-tables
  – A z-table gives $P[Z < z] = \Phi(z)$ for the normal distribution N(0,1).
  – When using the table, use the symmetry of the distribution for negative values, i.e. $\Phi(-1) = P[Z \le -1] = P[Z \ge 1] = 1 - \Phi(1)$.
• General normal distribution $N(\mu, \sigma^2)$
  – Mean and mode: $\mu$; variance: $\sigma^2$
  – $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
  – $M_X(t) = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)$
  – To find $P[r < X < s]$, first standardize the distribution by putting things in terms of $Z = \frac{X-\mu}{\sigma}$ and then use the z-table as follows (a short sketch appears below): $P[r < X < s] = P\left[\frac{r-\mu}{\sigma} < \frac{X-\mu}{\sigma} < \frac{s-\mu}{\sigma}\right] = \Phi\left(\frac{s-\mu}{\sigma}\right) - \Phi\left(\frac{r-\mu}{\sigma}\right)$
  – If $X = X_1 + X_2$ where $E[X_1] = \mu_1$, $Var[X_1] = \sigma_1^2$, $E[X_2] = \mu_2$, $Var[X_2] = \sigma_2^2$, and $X_1$ and $X_2$ are independent normal random variables, then X is normal with $E[X] = \mu_1 + \mu_2$ and $Var[X] = \sigma_1^2 + \sigma_2^2$.
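A minimal sketch of the standardization step (assuming Python 3, using math.erf in place of a printed z-table; µ = 100, σ = 15, and the interval (85, 130) are hypothetical example values):

    import math

    def Phi(z):                        # standard normal CDF, built from the error function
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    mu, sigma = 100.0, 15.0            # hypothetical example values
    r, s = 85.0, 130.0

    prob = Phi((s - mu) / sigma) - Phi((r - mu) / sigma)
    print(round(prob, 4))              # Phi(2) - Phi(-1) ≈ 0.9772 - 0.1587 ≈ 0.8186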
• Given a random variable X with mean $\mu$ and variance $\sigma^2$, the distribution can be approximated by the normal distribution $N(\mu, \sigma^2)$. If X takes discrete integer values then you can improve on your normal approximation Y by using the "integer correction": $P[n \le X \le m] \approx P[n - \frac{1}{2} \le Y \le m + \frac{1}{2}]$.
• Exponential distribution
  – Used as a model for the time until some specific event occurs.
  – $f(x) = \lambda e^{-\lambda x}$ for $x > 0$, $f(x) = 0$ otherwise.
  – $F(x) = 1 - e^{-\lambda x}$
  – $S(x) = 1 - F(x) = P[X > x] = e^{-\lambda x}$
  – $E[X] = \frac{1}{\lambda}$
  – $Var[X] = \frac{1}{\lambda^2}$
  – $M_X(t) = \frac{\lambda}{\lambda - t}$ for $t < \lambda$.
  – $E[X^k] = \int_0^{\infty} x^k\lambda e^{-\lambda x}\,dx = \frac{k!}{\lambda^k}$ for k = 1, 2, 3, ...
  – $P[X > x+y \mid X > x] = \frac{P[X > x+y \cap X > x]}{P[X > x]} = \frac{e^{-\lambda(x+y)}}{e^{-\lambda x}} = e^{-\lambda y} = P[X > y]$. In other words, based on the definition of the survival function S(x), $P[X > x+y \mid X > x] = P[X > y]$. This result is interpreted as showing that an exponential process is memoryless (a short simulation appears below). (See page 205 in the manual for details.)
  – If X is the time between successive events and is exponential with mean $\lambda^{-1}$, and N is the number of events per unit time, then N is a Poisson random variable with mean $\lambda$.
  – If we have a set of independent exponential random variables $\{Y_i\}$ with means $\lambda_i^{-1}$ and $Y = \min\{Y_i\}$, then Y is exponential with mean $\left(\sum_i \lambda_i\right)^{-1}$.
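A simulation sketch of the memoryless property (assuming Python 3; λ = 0.5, x = 1, y = 2 are arbitrary test values): the conditional probability $P[X > x+y \mid X > x]$ should match $P[X > y] = e^{-\lambda y}$.

    import math, random

    random.seed(1)
    lam, x, y = 0.5, 1.0, 2.0
    draws = [random.expovariate(lam) for _ in range(500_000)]

    survived_x = [d for d in draws if d > x]
    cond_prob = sum(d > x + y for d in survived_x) / len(survived_x)   # P[X > x+y | X > x]
    uncond_prob = math.exp(-lam * y)                                   # P[X > y] = e^{-lambda y}
    print(round(cond_prob, 3), round(uncond_prob, 3))                  # both ≈ 0.368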
• Gamma distribution
  – Parameters $\alpha > 0$, $\beta > 0$
  – For $x > 0$: $f(x) = \frac{\beta^\alpha x^{\alpha-1}e^{-\beta x}}{\Gamma(\alpha)}$
  – $E[X] = \frac{\alpha}{\beta}$
  – $Var[X] = \frac{\alpha}{\beta^2}$
  – $M_X(t) = \left(\frac{\beta}{\beta - t}\right)^\alpha$
  – If this shows up on the exam then most likely $\alpha = n$ where n is an integer, so the density function becomes $f(x) = \frac{\beta^n x^{n-1}e^{-\beta x}}{(n-1)!}$
8 Joint, Marginal, Conditional distributions and Independence

• Joint distribution
  – The probability of a joint distribution is given by $f(x, y)$, which must be non-negative (and, for a discrete distribution, at most 1 at any value) and must sum or integrate to 1 over the probability space.
  – If $f(x, y)$ is appropriately defined as specified above, $P[(x, y) \in A]$ is the summation or double integral of the density function over A.
  – The cumulative distribution function is defined as
    $F(x, y) = P[(X \le x) \cap (Y \le y)] = \int_{-\infty}^{x}\int_{-\infty}^{y} f(s, t)\,dt\,ds$ (continuous) $= \sum_{s=-\infty}^{x}\sum_{t=-\infty}^{y} f(s, t)$ (discrete)
  – $E[h(x, y)] = \sum_x\sum_y h(x, y)f(x, y)$ (discrete) $= \iint_{\mathbb{R}^2} h(x, y)f(x, y)\,dy\,dx$ (continuous)
  – If X and Y are jointly distributed with a uniform density on R and 0 outside, then the pdf is $\frac{1}{M(R)}$ where $M(R)$ represents the area of R. The probability of event A is then $\frac{M(A)}{M(R)}$.
  – $E[h_1(x, y) + h_2(x, y)] = E[h_1(x, y)] + E[h_2(x, y)]$.
  – $E\left[\sum_i x_i\right] = \sum_i E[x_i]$.
  – $\lim_{x \to -\infty} F(x, y) = \lim_{y \to -\infty} F(x, y) = 0$
  – $P[(x_1 < X \le x_2) \cap (y_1 < Y \le y_2)] = F(x_2, y_2) - F(x_2, y_1) - F(x_1, y_2) + F(x_1, y_1)$
• Marginal distribution
  – Derived for a single variable from a joint distribution.
  – The marginal distribution of X is denoted $f_X(x)$ and is $f_X(x) = \sum_y f(x, y)$ in the discrete case and $f_X(x) = \int_{\mathbb{R}} f(x, y)\,dy$ in the continuous case.
  – The cumulative distribution function is $F_X(x) = \lim_{y \to \infty} F(x, y)$.
  – Since X is a dummy variable, all of this discussion carries over to the other variables in the joint distribution.
  – Marginal probability and density functions must satisfy all requirements of probability and density functions.
• Conditional distribution
  – Gives the distribution of one random variable with a condition imposed on another random variable.
  – Must satisfy the conditions of a distribution.
  – Conditional mean, variance, etc. can all be found using the usual methods.
  – The conditional distribution of X given Y = y is $f_{X|Y}(x \mid Y = y) = \frac{f(x, y)}{f_Y(y)}$.
  – If X and Y are independent then $f_{Y|X}(y \mid X = x) = f_Y(y)$ and $f_{X|Y}(x \mid Y = y) = f_X(x)$.
  – If the marginal and conditional distributions are known, then the joint distribution can be found: $f(x, y) = f_{Y|X}(y \mid X = x) \cdot f_X(x)$.
  – $E[E[X|Y]] = E[X]$
  – $Var[X] = E[Var[X|Y]] + Var[E[X|Y]]$
• Independence of random variables
  – X and Y are independent if the probability space is rectangular and $f(x, y) = f_X(x)f_Y(y)$.
  – Independence is also equivalent to the factorization of the cumulative distribution function: $F(x, y) = F_X(x)F_Y(y)$ for all (x, y).
  – If X and Y are independent then
    ∗ $E[g(X)h(Y)] = E[g(X)]E[h(Y)]$
    ∗ $Var[X + Y] = Var[X] + Var[Y]$
• Covariance
  – A high covariance indicates that larger values of Y occur when larger values of X occur. A negative covariance indicates that smaller values of Y occur when larger values of X occur. A covariance of 0 indicates that X is not (linearly) related to the Y values it is paired with.
  – $Cov[X, Y] = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$
  – $Cov[X, X] = Var[X]$
  – If $a, b, c \in \mathbb{R}$ then $Var[aX + bY + c] = a^2\,Var[X] + b^2\,Var[Y] + 2ab\,Cov[X, Y]$ (a simulation check appears below)
  – $Cov[X, Y] = Cov[Y, X]$.
  – If $a, b, c, d, e, f \in \mathbb{R}$ and X, Y, Z, W are random variables, then $Cov[aX + bY + c,\ dZ + eW + f] = ad\,Cov[X, Z] + ae\,Cov[X, W] + bd\,Cov[Y, Z] + be\,Cov[Y, W]$.
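A simulation sketch (assuming Python 3) of the $Var[aX + bY + c]$ identity for a dependent pair built from a shared shock; the construction and the constants a, b, c are hypothetical.

    import random

    random.seed(2)
    N = 200_000
    xs, ys = [], []
    for _ in range(N):                        # X = Z + U and Y = Z + V share the shock Z,
        z = random.gauss(0, 1)                # so Var[X] = Var[Y] = 2 and Cov[X, Y] = 1
        xs.append(z + random.gauss(0, 1))
        ys.append(z + random.gauss(0, 1))

    def cov(u, v):
        mu, mv = sum(u) / len(u), sum(v) / len(v)
        return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / len(u)

    a, b, c = 2.0, -3.0, 5.0
    w = [a * x + b * y + c for x, y in zip(xs, ys)]
    lhs = cov(w, w)                                                   # Var[aX + bY + c]
    rhs = a**2 * cov(xs, xs) + b**2 * cov(ys, ys) + 2 * a * b * cov(xs, ys)
    print(round(lhs, 2), round(rhs, 2))       # both ≈ 2a^2 + 2b^2 + 2ab = 14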
• Coefficient of correlation
  – $\rho(X, Y) = \rho_{X,Y} = \frac{Cov[X, Y]}{\sigma_X\sigma_Y}$
  – $-1 \le \rho_{X,Y} \le 1$.
• Moment generating function for jointly distributed random variables
  – $M_{XY}(t_1, t_2) = E[e^{t_1 X + t_2 Y}]$.
  – $E[X^n Y^m] = \frac{\partial^{n+m}}{\partial t_1^n\,\partial t_2^m}M_{XY}(t_1, t_2)\Big|_{t_1 = t_2 = 0}$
  – $M_{XY}(t_1, 0) = E[e^{t_1 X}] = M_X(t_1)$, and you can do this with Y too.
  – If $M(t_1, t_2) = M(t_1, 0) \cdot M(0, t_2)$ in a region about $t_1 = t_2 = 0$, then X and Y are independent.
  – If $Y = aX + b$ then $M_Y(t) = e^{bt}M_X(at)$.
• Bivariate Normal Distribution
  – Occurs infrequently on exams
  – If X and Y are normal random variables with $E[X] = \mu_X$, $Var[X] = \sigma_X^2$, $E[Y] = \mu_Y$, $Var[Y] = \sigma_Y^2$, with correlation coefficient $\rho_{XY}$, then X and Y are said to have a bivariate normal distribution.
  – Conditional mean of Y given X = x: $E[Y \mid X = x] = \mu_Y + \rho_{XY}\frac{\sigma_Y}{\sigma_X}(x - \mu_X) = \mu_Y + \frac{Cov(X, Y)}{\sigma_X^2}(x - \mu_X)$
  – Conditional variance of Y given X = x: $Var[Y \mid X = x] = \sigma_Y^2(1 - \rho_{XY}^2)$
  – If X and Y are independent, $\rho_{XY} = 0$.
  – $f(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right) + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right]\right\}$
    (A simulation check of the conditional-mean and conditional-variance formulas appears below.)
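A simulation sketch (assuming Python 3) of the conditional-mean and conditional-variance formulas: it builds a standard bivariate normal pair via the standard construction $Y = \rho X + \sqrt{1-\rho^2}\,Z$ with X, Z independent N(0,1), then compares $E[Y \mid X \approx x_0]$ and $Var[Y \mid X \approx x_0]$ against the formulas. Here $\mu_X = \mu_Y = 0$, $\sigma_X = \sigma_Y = 1$, and ρ = 0.6, $x_0$ = 1 are hypothetical example values.

    import math, random

    random.seed(3)
    rho, x0, band = 0.6, 1.0, 0.1
    ys = []
    for _ in range(1_000_000):
        x = random.gauss(0, 1)
        y = rho * x + math.sqrt(1 - rho**2) * random.gauss(0, 1)   # (X, Y) standard bivariate normal
        if abs(x - x0) < band:                                     # keep draws with X near x0
            ys.append(y)

    m = sum(ys) / len(ys)
    v = sum((yy - m)**2 for yy in ys) / len(ys)
    print(round(m, 2), rho * x0)              # conditional mean     ≈ rho * x0  = 0.6
    print(round(v, 2), 1 - rho**2)            # conditional variance ≈ 1 - rho^2 = 0.64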