CEE 202: Engineering Risk and Uncertainty


457.212 Statistics for Civil & Environmental Engineers In-Class Material: Class 26 (end-of-semester)

Regression Analysis (2) (A&T: 8.4-8.7) & Extreme Value Distribution (A&T 4.2.3)

1. Multiple Linear Regression

“Linear regression of $Y$ on $X_1, \ldots, X_m$”

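(a) Define the sum of squared errors $\Delta^2$ by assuming $\sigma^2_{Y|x} = \sigma^2$ (constant) or $\sigma^2_{Y|x} = \sigma^2 g^2(x_1, \ldots, x_m)$ (non-constant):

$$\Delta^2 = \sum_{i=1}^{n} w_i \,(y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_m x_{im})^2$$

with $w_i = 1$ for constant variance or $w_i = 1/g^2(x_{i1}, \ldots, x_{im})$ for non-constant variance.

(b) Find

$$E[Y \mid x_1, \ldots, x_m] = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m$$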

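(c) Estimate $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_m$ by solving

$$\frac{\partial \Delta^2}{\partial \beta_0} = \frac{\partial \Delta^2}{\partial \beta_1} = \cdots = \frac{\partial \Delta^2}{\partial \beta_m} = 0$$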

(d) $$s^2_{Y|x_1, \ldots, x_m} = \frac{\Delta^2}{n - m - 1}$$

(Note: m = 1 for simple linear regression.)
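Steps (a) to (d) can be sketched numerically with NumPy. This is a minimal illustration under stated assumptions: setting the partial derivatives in (c) to zero yields a linear system (the "normal equations"), which the sketch solves directly, and the optional `g` argument assumes weights $w_i = 1/g^2(x_{i1}, \ldots, x_{im})$ for the non-constant-variance case. The function name `fit_multiple_linear_regression` is hypothetical.

```python
import numpy as np


def fit_multiple_linear_regression(X, y, g=None):
    """Least-squares beta_0..beta_m, minimized Delta^2, and s^2 = Delta^2 / (n - m - 1).

    X : (n, m) predictor values x_i1..x_im
    y : (n,) observations y_i
    g : optional (n,) values g(x_i); weights w_i = 1 / g(x_i)^2 are assumed
        for the non-constant-variance model sigma^2_{Y|x} = sigma^2 g^2(x).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = X.shape
    A = np.column_stack([np.ones(n), X])  # leading column of 1s for beta_0
    w = np.ones(n) if g is None else 1.0 / np.asarray(g, dtype=float) ** 2

    # dDelta^2/dbeta_j = 0 for j = 0..m gives (A^T W A) beta = A^T W y, W = diag(w).
    AtW = A.T * w  # broadcasting scales column i of A^T by w_i
    beta = np.linalg.solve(AtW @ A, AtW @ y)

    residuals = y - A @ beta
    delta2 = float(np.sum(w * residuals**2))  # Delta^2 at the estimates
    s2 = delta2 / (n - m - 1)  # m = 1 recovers simple linear regression
    return beta, delta2, s2


# Usage: points lying exactly on y = 1 + 2 x1 + 3 x2 should recover (1, 2, 3).
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 4]]
y = [1 + 2 * a + 3 * b for a, b in X]
beta, delta2, s2 = fit_multiple_linear_regression(X, y)
print(beta, delta2, s2)
```

Solving the normal equations keeps the link to step (c) visible; for badly conditioned data, `np.linalg.lstsq` on the (weighted) design matrix is usually the safer numerical route.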

2. Nonlinear Regression & Applications of Regression Analysis (Read A&T 8.6-8.7)

3. Correlation Analysis

(a) (True or theoretical) correlation coefficient

$$\rho_{XY} = \frac{\mathrm{Cov}[X, Y]}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y} = \frac{E[XY] - \mu_X \mu_Y}{\sigma_X \sigma_Y}$$
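A small numerical check that the forms above agree, using a joint PMF whose probabilities are made up purely for illustration (the function name `correlation_from_joint_pmf` is hypothetical):

```python
import numpy as np

# Hypothetical joint PMF p[i, j] = P(X = x[i], Y = y[j]); numbers are illustrative only.
x = np.array([0.0, 1.0])
y = np.array([0.0, 1.0, 2.0])
p = np.array([[0.20, 0.15, 0.05],
              [0.05, 0.15, 0.40]])


def correlation_from_joint_pmf(x, y, p):
    """rho_XY = (E[XY] - mu_X mu_Y) / (sigma_X sigma_Y) for a discrete joint PMF."""
    px = p.sum(axis=1)  # marginal PMF of X
    py = p.sum(axis=0)  # marginal PMF of Y
    mu_x, mu_y = px @ x, py @ y
    sigma_x = np.sqrt(px @ (x - mu_x) ** 2)
    sigma_y = np.sqrt(py @ (y - mu_y) ** 2)
    e_xy = x @ p @ y  # E[XY] = sum_i sum_j x_i y_j p_ij
    return (e_xy - mu_x * mu_y) / (sigma_x * sigma_y)


print(correlation_from_joint_pmf(x, y, p))
```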

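(b) Sample estimator of $\rho_{XY}$, $\hat\rho$ (built from the unbiased, $n - 1$ normalized sample covariance; $\hat\rho$ itself is not exactly unbiased):

$$\hat\rho = \frac{1}{n - 1} \, \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{s_X s_Y}$$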
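The estimator can be coded directly. This sketch assumes $s_X$ and $s_Y$ are the $n - 1$ sample standard deviations (`ddof=1` in NumPy), which is what the $1/(n-1)$ factor requires; `sample_correlation` is a hypothetical name.

```python
import numpy as np


def sample_correlation(x, y):
    """rho_hat = (sum x_i y_i - n xbar ybar) / ((n - 1) s_X s_Y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    s_x = np.std(x, ddof=1)  # sample standard deviation with n - 1
    s_y = np.std(y, ddof=1)
    return (np.sum(x * y) - n * x.mean() * y.mean()) / ((n - 1) * s_x * s_y)


print(sample_correlation([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]))
```

Because the $n - 1$ factors cancel, the result matches `np.corrcoef(x, y)[0, 1]`; mixing an $n$-normalized covariance with $n - 1$ standard deviations would not.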

(c) $\hat\rho$ and $\hat\beta_1$

$$\hat\rho = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_X s_Y} = \frac{s_X}{s_Y} \, \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{s_X}{s_Y} \, \hat\beta_1$$
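The identity $\hat\rho = (s_X/s_Y)\hat\beta_1$ follows from $\sum_i (x_i - \bar{x})^2 = (n - 1) s_X^2$; the sketch below checks it on illustrative data (not from the source), with $\hat\beta_1$ taken as the least-squares slope.

```python
import numpy as np

# Illustrative data (not from the source).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 2.9, 4.1, 4.8, 6.2, 6.8])
n = len(x)

sxy = np.sum((x - x.mean()) * (y - y.mean()))  # sum (x_i - xbar)(y_i - ybar)
beta1_hat = sxy / np.sum((x - x.mean()) ** 2)  # least-squares slope
s_x, s_y = np.std(x, ddof=1), np.std(y, ddof=1)  # n - 1 sample standard deviations
rho_hat = sxy / ((n - 1) * s_x * s_y)

print(rho_hat, (s_x / s_y) * beta1_hat)  # equal up to floating-point rounding
```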


(d) $\hat\rho^2$ and $r^2$, where $r^2 = 1 - s^2_{Y|x} / s^2_Y$

$$\hat\rho^2 = 1 - \frac{n - 2}{n - 1} \, \frac{s^2_{Y|x}}{s^2_Y}$$

As $n \to \infty$, $\hat\rho^2 \to 1 - s^2_{Y|x} / s^2_Y = r^2$.

4. “Model-based” vs “Data-based” prediction

(a) Model-based prediction: assumes a smooth model and fits it to the data

e.g. Linear regression

$$Y = \beta_0 + \sum_{i=1}^{N} \beta_i x_i$$

- May be inaccurate, but stable

(b) Data-based prediction (interpolation): does not assume a model; it interpolates from adjacent data points

e.g. k-nearest neighbor model, where $N_k(x)$ is the set of the $k$ data points nearest to $x$

$$Y(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i$$

- Accurate, but may be unstable
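The contrast between the two approaches can be sketched in one dimension. This is a minimal illustration under assumptions: distance is $|x_i - x|$, ties are broken by data order, and the names `knn_predict` and `linear_predict` are hypothetical.

```python
import numpy as np


def knn_predict(x_train, y_train, x_new, k):
    """Data-based prediction: average of y_i over the k nearest x_i to x_new."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    dist = np.abs(x_train - x_new)  # 1-D distance |x_i - x|
    nearest = np.argsort(dist, kind="stable")[:k]  # indices of N_k(x)
    return y_train[nearest].mean()


def linear_predict(x_train, y_train, x_new):
    """Model-based prediction: least-squares line beta_0 + beta_1 x."""
    beta1, beta0 = np.polyfit(x_train, y_train, 1)
    return beta0 + beta1 * x_new


# Usage on data from y = x^2, which a straight line cannot follow.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]
print(knn_predict(xs, ys, 2.1, k=3), linear_predict(xs, ys, 2.1))
```

Here the fitted line is biased by its straight-line assumption, while the k-NN average follows the local data; in turn, a small $k$ makes the k-NN estimate sensitive to individual noisy points, which is the instability noted above.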

5. Statistical/Machine Learning

Build a prediction model by (1) Clustering, (2) Classification, and (3) Regression


6. Extreme Value Distributions

Given: The distribution model of a random quantity X, i.e. PDF or CDF

Question: From a sample of size $n$, what is the distribution of the minimum or maximum?

→ Deriving an extreme value distribution

e.g. maximum flood (or drought) in the next 100 years, maximum traffic load on a bridge in the next 50 years

(More generally, the distribution of the k-th largest or smallest value in a sample → “Order Statistics”)

(a) Deriving “Exact Distributions”

• Maximum: $Y_n = \max(X_1, X_2, \ldots, X_n)$

Under the assumption that $X_1, X_2, \ldots, X_n$ are statistically independent and identically distributed,

$$F_{Y_n}(y) = P(X_1 \le y, X_2 \le y, \ldots, X_n \le y) = [F_X(y)]^n$$

The corresponding PDF is therefore,

$$f_{Y_n}(y) = \frac{dF_{Y_n}(y)}{dy} = n \, [F_X(y)]^{n-1} f_X(y)$$
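The two formulas for the maximum can be checked by simulation. This sketch assumes a Uniform(0, 1) parent (not in the original), so $F_X(y) = y$ and $f_X(y) = 1$ on $[0, 1]$; the helper names are hypothetical.

```python
import numpy as np


def max_cdf(F_x, y, n):
    """Exact CDF of Y_n = max(X_1, ..., X_n) for iid X: [F_X(y)]^n."""
    return F_x(y) ** n


def max_pdf(F_x, f_x, y, n):
    """Exact PDF of Y_n: n [F_X(y)]^(n-1) f_X(y)."""
    return n * F_x(y) ** (n - 1) * f_x(y)


# Assumed parent for the check: X ~ Uniform(0, 1).
def F_unif(y):
    return np.clip(y, 0.0, 1.0)


def f_unif(y):
    return np.where((y >= 0.0) & (y <= 1.0), 1.0, 0.0)


rng = np.random.default_rng(0)
n, trials, y0 = 5, 200_000, 0.8
y_n = rng.uniform(0.0, 1.0, size=(trials, n)).max(axis=1)  # simulated maxima
print("simulated:", np.mean(y_n <= y0), "exact:", max_cdf(F_unif, y0, n))
```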

• Minimum: $Y_1 = \min(X_1, X_2, \ldots, X_n)$

$$1 - F_{Y_1}(y) = P(X_1 > y, X_2 > y, \ldots, X_n > y) = [1 - F_X(y)]^n$$

Therefore, the CDF of Y₁ is

$$F_{Y_1}(y) = 1 - [1 - F_X(y)]^n$$

The corresponding PDF is

$$f_{Y_1}(y) = \frac{dF_{Y_1}(y)}{dy} = n \, [1 - F_X(y)]^{n-1} f_X(y)$$
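For a concrete check of the minimum, assume an exponential parent with rate $\lambda$ (an illustration not in the original). Then $1 - F_X(y) = e^{-\lambda y}$, so $F_{Y_1}(y) = 1 - e^{-n\lambda y}$: the minimum is again exponential, with rate $n\lambda$. The helper names below are hypothetical.

```python
import numpy as np


def min_cdf(F_x, y, n):
    """Exact CDF of Y_1 = min(X_1, ..., X_n) for iid X: 1 - [1 - F_X(y)]^n."""
    return 1.0 - (1.0 - F_x(y)) ** n


def min_pdf(F_x, f_x, y, n):
    """Exact PDF of Y_1: n [1 - F_X(y)]^(n-1) f_X(y)."""
    return n * (1.0 - F_x(y)) ** (n - 1) * f_x(y)


# Assumed parent for illustration: X ~ Exponential with rate lam.
lam, n = 0.5, 4


def F_exp(y):
    return 1.0 - np.exp(-lam * y)


def f_exp(y):
    return lam * np.exp(-lam * y)


y = np.linspace(0.0, 10.0, 11)
closed_form = 1.0 - np.exp(-n * lam * y)  # exponential CDF with rate n * lam

# Monte Carlo: the mean of simulated minima should be near 1 / (n * lam) = 0.5.
rng = np.random.default_rng(2)
y1 = rng.exponential(scale=1.0 / lam, size=(200_000, n)).min(axis=1)
print(y1.mean())
```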


Example 1: Suppose the PDF of a random variable X is given as below.

$$f_X(x) = \frac{1}{x^2}, \quad x \ge 1$$

For a sample of size $n$, derive the CDF and PDF of the largest value in the sample, i.e. $Y_n = \max(X_1, X_2, \ldots, X_n)$.
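One way to sanity-check a hand-derived answer is simulation. This sketch assumes inverse-transform sampling: integrating $f_X$ from 1 to $x$ gives $F_X(x) = 1 - 1/x$, so $X = 1/(1 - U)$ with $U$ uniform on $[0, 1)$. The function names are hypothetical.

```python
import numpy as np


def sample_example1(rng, size):
    """Draw X with f_X(x) = 1/x^2 (x >= 1) by inverse-transform sampling.

    F_X(x) = 1 - 1/x, so X = 1 / (1 - U) with U ~ Uniform[0, 1).
    """
    u = rng.uniform(0.0, 1.0, size=size)
    return 1.0 / (1.0 - u)


def simulated_max_cdf(rng, n, y, trials=100_000):
    """Monte Carlo estimate of P(Y_n <= y), where Y_n = max(X_1, ..., X_n)."""
    y_n = sample_example1(rng, (trials, n)).max(axis=1)
    return np.mean(y_n <= y)


rng = np.random.default_rng(1)
for y in (5.0, 20.0, 100.0):
    print(y, simulated_max_cdf(rng, n=10, y=y))
```

Comparing these estimates with the derived $F_{Y_n}(y)$ at the same $y$ and $n$ should show agreement to within Monte Carlo error.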


(b) Asymptotic Distributions

An asymptotic distribution can be derived for large samples, i.e. $n \to \infty$, using Cramér’s method (1946). For the example above, the exact (i.e. derived) and asymptotic distributions can then be compared.
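As a sketch of the comparison for Example 1, assume the standard large-$n$ approximation $[F_X(y)]^n = [1 - (1 - F_X(y))]^n \approx \exp[-n(1 - F_X(y))]$. With $F_X(y) = 1 - 1/y$, the exact CDF is $(1 - 1/y)^n$ and the approximation is $\exp(-n/y)$, which has the Type II form with $A(n) = n$ and $k = 1$. The code measures the largest difference at a few points $y$ proportional to $n$; the helper names are hypothetical.

```python
import numpy as np


def exact_max_cdf(y, n):
    """Exact CDF of Y_n for Example 1: [F_X(y)]^n with F_X(y) = 1 - 1/y."""
    return (1.0 - 1.0 / y) ** n


def asymptotic_max_cdf(y, n):
    """Large-n approximation exp[-n (1 - F_X(y))] = exp(-n / y)."""
    return np.exp(-n / y)


gaps = []
for n in (5, 50, 500):
    y = np.array([1.0, 2.0, 5.0, 10.0]) * n  # evaluate where Y_n typically falls
    gaps.append(np.max(np.abs(exact_max_cdf(y, n) - asymptotic_max_cdf(y, n))))
    print(n, gaps[-1])
```

The gap shrinks roughly in proportion to $1/n$, which is the sense in which the asymptotic form is a large-sample result.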

The asymptotic distributions of the extremes tend to converge on certain limiting forms (Gumbel 1958):

• Type I: The double exponential form, $\exp[-e^{-A(n)\, y}]$ - Gumbel distribution (largest)

• Type II: The exponential form, $\exp[-A(n) / y^k]$ - Fréchet (Fisher-Tippett Type II) distribution (largest)

• Type III: The exponential form with an upper/lower bound, e.g. $\exp[-A(n)(\omega - y)^k]$ for a largest value bounded above by $\omega$ - Weibull distribution (commonly used for the smallest)

The type is determined by the tail behavior of the original probability density function.

• Exponentially decaying tail (e.g. Normal) → Type I

• Polynomial tail (e.g. Example 3) → Type II

• Polynomial tail with a limited (bounded) extreme value → Type III
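To illustrate the first rule with an assumed parent not in the original: for $X \sim$ Exp(1), an exponentially decaying tail, $[F_X(y)]^n = (1 - e^{-y})^n \approx \exp(-n e^{-y}) = \exp[-e^{-(y - \ln n)}]$ for large $n$. This is a Type I (Gumbel) form with location $\ln n$. The sketch measures the largest CDF difference on a grid around $\ln n$; the helper names are hypothetical.

```python
import numpy as np


def exact_max_cdf_exp(y, n):
    """Exact CDF of the largest of n iid Exp(1) values: (1 - e^{-y})^n."""
    return (1.0 - np.exp(-y)) ** n


def gumbel_cdf(y, u, alpha=1.0):
    """Type I (Gumbel) CDF: exp[-exp(-alpha (y - u))]."""
    return np.exp(-np.exp(-alpha * (y - u)))


gaps = []
for n in (10, 100, 1000):
    u_n = np.log(n)  # (1 - e^{-y})^n ~ exp(-n e^{-y}) = exp(-e^{-(y - ln n)})
    y = u_n + np.linspace(-2.0, 5.0, 71)
    gaps.append(np.max(np.abs(exact_max_cdf_exp(y, n) - gumbel_cdf(y, u_n))))
    print(n, gaps[-1])
```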