ON THE SMOOTHING PARAMETER IN THE CASE OF DATA
FROM MULTIPLE SOURCES
by
Breaz Nicoleta
Abstract. In this paper we focus on data smoothing by spline functions. The smoothing parameter λ involved in this kind of modeling is obtained here from the generalized cross validation (GCV) procedure. For data from two sources with different weights, a GCV formula for the parameter λ is already known, given in the particular case when the smoothing function is a function on the circle. We extend this formula to a more general case and, at the same time, to more than two sources.
1. INTRODUCTION
We consider a regression model with $n$ observational data,
$$y_i = f(x_i) + \varepsilon_i, \quad i = \overline{1,n},$$
where $x_i \in [0,1]$ and $\varepsilon = (\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n)' \sim N(0, \sigma^2 I)$ is a Gaussian $n$-dimensional vector with zero mean and covariance matrix $\sigma^2 I$. About the regression function we only know that $f$ belongs to a space $W_m$ defined as
$$W_m = \left\{ f : f, f', \dots, f^{(m-1)} \text{ absolutely continuous}, \; f^{(m)} \in L^2[0,1] \right\}.$$
Then, we can speak about a spline smoothing problem. So we obtain an estimate of $f$ by finding $f_\lambda \in W_m$ to minimize
$$\frac{1}{n}\sum_{i=1}^{n}\left(y_i - f(x_i)\right)^2 + \lambda \int_0^1 \left(f^{(m)}(u)\right)^2 du, \qquad \lambda > 0. \tag{1}$$
It is known that the solution of this problem is the natural polynomial spline of degree $2m-1$ with knots $x_i$, $i = \overline{1,n}$.
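For $m = 2$ this estimator is the classical cubic smoothing spline; a minimal sketch, assuming SciPy ≥ 1.10, whose `make_smoothing_spline` selects the smoothing parameter by the GCV criterion discussed below (the data here are an illustrative choice, not from the paper):

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 50))
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 50)   # noisy observations

spl = make_smoothing_spline(x, y)   # lam=None: lambda chosen automatically by GCV
f_hat = spl(x)                      # fitted values of the cubic smoothing spline
```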
To begin with, we consider the smoothing parameter λ fixed. Then we can search for a solution of (1) in a certain subspace of $W_m$, spanned by $n$ appropriately chosen basis functions. According to [2], such basis functions are related to B-splines, but this is not our concern here. We simply consider $f$ of the form
$$f = \sum_{k=1}^{n} c_k B_k.$$
Now we rewrite the problem in matrix form, using the following notations:
$$y = \left(y_1, y_2, \dots, y_n\right)', \qquad c = \left(c_1, c_2, \dots, c_n\right)', \qquad B = \left(B_{ij}\right)_{1 \le i \le n,\, 1 \le j \le n}, \quad B_{ij} = B_j(x_i).$$
Also, from [2] we know that we can write the seminorm
$$J_m(f) = \int_0^1 \left(f^{(m)}(u)\right)^2 du$$
in matrix form as $c'\Sigma c$ for some matrix $\Sigma$. Then the problem is to find $c$ to minimize $\left\|y - Bc\right\|^2 + n\lambda\, c'\Sigma c$. The solution of this problem is
$$c = \left(B'B + n\lambda\Sigma\right)^{-1}B'y.$$
When the smoothing parameter λ is too small, we obtain a function $f$ which is close to the data at the expense of its smoothness; when λ is too large, we obtain a function $f$ which is very smooth but not sufficiently close to the data.
Among the methods which provide an optimal λ from the data are the cross validation (CV) and generalized cross validation (GCV) procedures described in [2].
According to CV method, λ is the minimizer of the expression
( )
∑
(
[ ]( )
)
= −
= n
k
k k
k f x
y n CV
1
2 1
λ λ
with fλ[ ]k the spline estimate using all data but the k-th data point of y. As a generalization, GCV procedure use a more general function
$$GCV(\lambda) = \frac{\frac{1}{n}\left\|\left(I - A(\lambda)\right)y\right\|^2}{\left[\frac{1}{n}\mathrm{Tr}\left(I - A(\lambda)\right)\right]^2},$$
where $A(\lambda)$ is the influence or hat matrix, given by the relation
$$\left(f_\lambda(x_1), f_\lambda(x_2), \dots, f_\lambda(x_n)\right)' = A(\lambda)\,y.$$
From [2] we know that $A(\lambda)$ has the form $A(\lambda) = B\left(B'B + n\lambda\Sigma\right)^{-1}B'$. Also, we recall that an estimate for $\sigma^2$ is given by
$$\hat{\sigma}^2 = \frac{\left\|\left(I - A(\lambda)\right)y\right\|^2}{\mathrm{Tr}\left(I - A(\lambda)\right)}.$$
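Collecting the formulas so far, the whole single-source procedure fits in a short numerical sketch. The cubic B-spline basis ($m = 2$), the crude quadrature used for $\Sigma$, and the synthetic data below are illustrative assumptions, not the exact construction of [2]:

```python
import numpy as np
from scipy.interpolate import BSpline

# --- synthetic single-source data: y_i = f(x_i) + eps_i (illustrative) ---
rng = np.random.default_rng(0)
n = 50
x = np.sort(rng.uniform(0.0, 1.0, n))
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)

# --- n cubic B-spline basis functions on [0, 1] (degree k = 3, i.e. m = 2) ---
k = 3
t = np.r_[[0.0] * k, np.linspace(0.0, 1.0, n - k + 1), [1.0] * k]  # clamped knots
basis = [BSpline(t, np.eye(n)[j], k) for j in range(n)]
B = np.column_stack([bj(x) for bj in basis])            # B[i, j] = B_j(x_i)

# --- penalty matrix Sigma_jl ~ int_0^1 B_j''(u) B_l''(u) du (crude quadrature) ---
ug = np.linspace(0.0, 1.0, 2001)
D2 = np.column_stack([bj.derivative(2)(ug) for bj in basis])
Sigma = (D2.T @ D2) * (ug[1] - ug[0])

def gcv(lam):
    """GCV(lambda) = (1/n)||(I - A(lambda))y||^2 / [(1/n) Tr(I - A(lambda))]^2."""
    A = B @ np.linalg.solve(B.T @ B + n * lam * Sigma, B.T)
    resid = y - A @ y
    return (resid @ resid / n) / ((n - np.trace(A)) / n) ** 2

lam = min(10.0 ** np.arange(-10.0, 1.0), key=gcv)        # grid search over lambda
c = np.linalg.solve(B.T @ B + n * lam * Sigma, B.T @ y)  # c = (B'B + n*lam*Sigma)^{-1} B'y
A = B @ np.linalg.solve(B.T @ B + n * lam * Sigma, B.T)
sigma2_hat = (y - A @ y) @ (y - A @ y) / (n - np.trace(A))  # estimate of sigma^2
```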
In the problem formulated here the data come from a single source. In [1], Feng Gao formulates this problem for two sources and gives a similar GCV method for λ. Gao considers the particular case when $f$ is a function on the circle. In this paper we extend the results of Gao to more than two sources and, at the same time, we consider a more general domain for the function $f$.
2. MAIN RESULTS
We consider the case when the data come from $l$ sources with unknown weights, as in
$$y_{1i} = f(x_{1i}) + \varepsilon_{1i}, \quad i = \overline{1, N_1},$$
$$y_{2i} = f(x_{2i}) + \varepsilon_{2i}, \quad i = \overline{1, N_2},$$
$$\dots$$
$$y_{li} = f(x_{li}) + \varepsilon_{li}, \quad i = \overline{1, N_l},$$
$$N_1 + N_2 + \dots + N_l = n,$$
with residual Gaussian vectors defined as
$$\varepsilon_1 = \left(\varepsilon_{11}, \varepsilon_{12}, \dots, \varepsilon_{1N_1}\right)' \sim N\left(0, \sigma_1^2 I_{N_1}\right),$$
$$\varepsilon_2 = \left(\varepsilon_{21}, \varepsilon_{22}, \dots, \varepsilon_{2N_2}\right)' \sim N\left(0, \sigma_2^2 I_{N_2}\right),$$
$$\dots$$
$$\varepsilon_l = \left(\varepsilon_{l1}, \varepsilon_{l2}, \dots, \varepsilon_{lN_l}\right)' \sim N\left(0, \sigma_l^2 I_{N_l}\right).$$
We assume that $\sigma_1^2, \sigma_2^2, \dots, \sigma_l^2$ are unknown and that $\varepsilon_1, \varepsilon_2, \dots, \varepsilon_l$ are independent.
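For concreteness, multi-source data of this form can be simulated as follows; the regression function, the group sizes $N_i$, and the standard deviations $\sigma_i$ are arbitrary illustrative choices (the $\sigma_i$ are used only to generate the data and are treated as unknown by the estimation procedure):

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda u: np.sin(2 * np.pi * u)      # true regression function (illustrative)
N = [30, 20, 10]                         # N_1, N_2, N_3; here l = 3 sources
sigma = [0.1, 0.3, 0.6]                  # sigma_1, sigma_2, sigma_3

xs = [np.sort(rng.uniform(0.0, 1.0, Ni)) for Ni in N]
ys = [f(xi) + rng.normal(0.0, si, Ni) for xi, si, Ni in zip(xs, sigma, N)]
n = sum(N)                               # N_1 + ... + N_l = n
```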
The estimate of $f$ is now obtained from the variational problem
$$\min_{f \in W_m} \frac{1}{n}\left[\frac{1}{\sigma_1^2}\sum_{i=1}^{N_1}\left(y_{1i} - f(x_{1i})\right)^2 + \frac{1}{\sigma_2^2}\sum_{i=1}^{N_2}\left(y_{2i} - f(x_{2i})\right)^2 + \dots + \frac{1}{\sigma_l^2}\sum_{i=1}^{N_l}\left(y_{li} - f(x_{li})\right)^2\right] + \lambda J_m(f).$$
We search for $f \in W_m$ of the form
$$f = \sum_{k=1}^{n} c_k B_k$$
and we use the notations
$$y_1 = \left(y_{11}, y_{12}, \dots, y_{1N_1}\right)', \quad y_2 = \left(y_{21}, y_{22}, \dots, y_{2N_2}\right)', \quad \dots, \quad y_l = \left(y_{l1}, y_{l2}, \dots, y_{lN_l}\right)', \quad c = \left(c_1, c_2, \dots, c_n\right)',$$
$$B_1 = \left(B_{ij}^{(1)}\right)_{1 \le i \le N_1,\, 1 \le j \le n}, \; B_{ij}^{(1)} = B_j(x_{1i}), \qquad B_2 = \left(B_{ij}^{(2)}\right)_{1 \le i \le N_2,\, 1 \le j \le n}, \; B_{ij}^{(2)} = B_j(x_{2i}), \qquad \dots,$$
$$B_l = \left(B_{ij}^{(l)}\right)_{1 \le i \le N_l,\, 1 \le j \le n}, \; B_{ij}^{(l)} = B_j(x_{li}).$$
Now we have $l$ matrices of type $B$. The model becomes
$$y_1 = B_1c + \varepsilon_1, \quad \dots, \quad y_l = B_lc + \varepsilon_l, \qquad \varepsilon_i \sim N\left(0, \sigma_i^2 I_{N_i}\right), \quad i = \overline{1,l},$$
and the variational problem is now equivalent to find c the minimizer of
(
r y Bc r y B c r y Bc c c)
n − + − + + l l − l + ′Σ
⋅ α
θ
2 2
2 2 2 2 1 1
1 ...
1
l
σ σ σ θ = 1 2...
(
... 1)
, 1 , ... ...
2 1 1
1 2
1 = =
= − +
l i
l i i
i i l rr r
r
σ σ σ σ
σ σ
n
l⋅ ⋅
=σσ σ λ α 1 2... . We can prove the following proposition:
Proposition 1. For fixed $r = \left(r_1, r_2, \dots, r_l\right)$ and $\alpha$, the solution of the variational problem (2) is
$$\hat{c}_{r,\alpha} = \left(r_1B_1'B_1 + r_2B_2'B_2 + \dots + r_lB_l'B_l + \alpha\Sigma\right)^{-1}\left(r_1B_1'y_1 + r_2B_2'y_2 + \dots + r_lB_l'y_l\right). \tag{3}$$
Proof. We denote by $E$ the expression to be minimized, so we have the condition $\frac{\partial E}{\partial c} = 0$. This condition is equivalent to
$$-r_1B_1'y_1 + r_1B_1'B_1c - \dots - r_lB_l'y_l + r_lB_l'B_lc + \alpha\Sigma c = 0,$$
and the solution of this matrix equation is
$$c = \left(r_1B_1'B_1 + r_2B_2'B_2 + \dots + r_lB_l'B_l + \alpha\Sigma\right)^{-1}\left(r_1B_1'y_1 + r_2B_2'y_2 + \dots + r_lB_l'y_l\right). \;\square$$
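Formula (3) is a single weighted ridge-type solve; a minimal sketch, assuming the design matrices $B_i$ (built from the common basis, as in the single-source sketch above) and the penalty matrix $\Sigma$ are given:

```python
import numpy as np

def c_hat(Bs, ys, r, alpha, Sigma):
    """Solution (3): (sum_i r_i B_i'B_i + alpha*Sigma)^{-1} (sum_i r_i B_i'y_i)."""
    M = alpha * Sigma + sum(ri * Bi.T @ Bi for ri, Bi in zip(r, Bs))
    rhs = sum(ri * Bi.T @ yi for ri, Bi, yi in zip(r, Bs, ys))
    return np.linalg.solve(M, rhs)
```

For $l = 1$, $r_1 = 1$ and $\alpha = n\lambda$ this reduces to the single-source solution $c = \left(B'B + n\lambda\Sigma\right)^{-1}B'y$.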
According to formula (3), it is important to obtain estimates for
$$r = \left(r_1, r_2, \dots, r_{l-1}, \frac{1}{r_1r_2\cdots r_{l-1}}\right)$$
and $\alpha$. Using methods similar to those in [1], we give a GCV function that can be used for estimating $r$ and $\alpha$ at the same time. We introduce the notations
$$y = \left(y_1', y_2', \dots, y_l'\right)', \qquad B = \left(B_1', B_2', \dots, B_l'\right)', \tag{4}$$
$$I(r) = \begin{pmatrix} \dfrac{1}{\sqrt{r_1}}\,I_{N_1} & 0 & \cdots & 0 \\ 0 & \dfrac{1}{\sqrt{r_2}}\,I_{N_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{r_1r_2\cdots r_{l-1}}\,I_{N_l} \end{pmatrix}. \tag{5}$$
Also, we use the notations
$$M = r_1B_1'B_1 + r_2B_2'B_2 + \dots + r_lB_l'B_l + \alpha\Sigma, \qquad y_r = I^{-1}(r)\,y, \qquad B_r = I^{-1}(r)\,B. \tag{6}$$
We can prove the following proposition:
Proposition 2. The influence matrix $A_r(r,\alpha)$, defined by
$$\hat{y}_r = A_r(r,\alpha)\,y_r, \tag{7}$$
has the form
$$A_r(r,\alpha) = \begin{pmatrix} r_1B_1M^{-1}B_1' & \sqrt{r_1r_2}\,B_1M^{-1}B_2' & \cdots & \sqrt{r_1r_l}\,B_1M^{-1}B_l' \\ \sqrt{r_2r_1}\,B_2M^{-1}B_1' & r_2B_2M^{-1}B_2' & \cdots & \sqrt{r_2r_l}\,B_2M^{-1}B_l' \\ \vdots & \vdots & \ddots & \vdots \\ \sqrt{r_lr_1}\,B_lM^{-1}B_1' & \sqrt{r_lr_2}\,B_lM^{-1}B_2' & \cdots & r_lB_lM^{-1}B_l' \end{pmatrix},$$
or
$$A_r(r,\alpha) = B_rM^{-1}B_r'.$$
Proof. From (3) and (6) we have
$$\hat{y}_r = B_r\hat{c}_{r,\alpha} = B_rM^{-1}B_r'\,y_r,$$
and further
$$A_r(r,\alpha) = B_rM^{-1}B_r'. \;\square$$
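Proposition 2 and relation (7) can be checked numerically; a minimal sketch, under the convention of (5) and (6), in which $I^{-1}(r)$ scales the rows of source $i$ by $\sqrt{r_i}$:

```python
import numpy as np

def influence_matrix(Bs, r, alpha, Sigma):
    """A_r(r, alpha) = B_r M^{-1} B_r', with B_r = I^{-1}(r) B and
    M = sum_i r_i B_i'B_i + alpha * Sigma."""
    M = alpha * Sigma + sum(ri * Bi.T @ Bi for ri, Bi in zip(r, Bs))
    Br = np.vstack([np.sqrt(ri) * Bi for ri, Bi in zip(r, Bs)])
    return Br @ np.linalg.solve(M, Br.T)
```

Relation (7) can then be verified by checking that `Br @ c_hat(Bs, ys, r, alpha, Sigma)` coincides with `influence_matrix(Bs, r, alpha, Sigma) @ yr`, where `yr = np.concatenate([np.sqrt(ri) * yi for ri, yi in zip(r, ys)])`.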
In order to get a GCV formula for $r$ and $\alpha$, we first construct, for our case, a CV-like formula. We denote by $c_{r,\alpha}^{[k]}$ the spline estimate of $c$ using all but the $k$-th data point of $y$; also, we denote by $y_k$ the $k$-th data point and by $B_k$ and $B_{k,r}$ the $k$-th rows of $B$ and $B_r$, respectively.
Then the cross validation (CV) function for our model would be
$$CV(r,\alpha) = \frac{1}{n}\sum_{k=1}^{n}\left(y_k - B_kc_{r,\alpha}^{[k]}\right)^2.$$
Next we prove the corresponding leaving-out-one lemma for our case, that is:
Lemma 3. Let $h_{r,\alpha}[k, z]$ be the solution of the variational problem
$$\min_c \frac{\theta}{n}\left(r_1\left\|y_1 - B_1c\right\|^2 + \dots + r_l\left\|y_l - B_lc\right\|^2 + \alpha\, c'\Sigma c\right) = \min_c \frac{\theta}{n}\left(\sum_{i=1}^{n}\left(y_{i,r} - B_{i,r}c\right)^2 + \alpha\, c'\Sigma c\right),$$
where $y_{i,r}$ denotes the $i$-th entry of $y_r$ and $B_{i,r}$ the $i$-th row of $B_r$, with the $k$-th data point $y_{k,r}$ replaced by $z$. Then we have
$$h_{r,\alpha}\left[k, B_{k,r}c_{r,\alpha}^{[k]}\right] = c_{r,\alpha}^{[k]}.$$
Proof. For all $c \in \mathbb{R}^n$ we have
$$\left(B_{k,r}c_{r,\alpha}^{[k]} - B_{k,r}c_{r,\alpha}^{[k]}\right)^2 + \sum_{\substack{i=1 \\ i \ne k}}^{n}\left(y_{i,r} - B_{i,r}c_{r,\alpha}^{[k]}\right)^2 + \alpha\, c_{r,\alpha}^{[k]\,\prime}\,\Sigma\, c_{r,\alpha}^{[k]} = \sum_{\substack{i=1 \\ i \ne k}}^{n}\left(y_{i,r} - B_{i,r}c_{r,\alpha}^{[k]}\right)^2 + \alpha\, c_{r,\alpha}^{[k]\,\prime}\,\Sigma\, c_{r,\alpha}^{[k]} \le$$
$$\le \sum_{\substack{i=1 \\ i \ne k}}^{n}\left(y_{i,r} - B_{i,r}c\right)^2 + \alpha\, c'\Sigma c \le \left(B_{k,r}c_{r,\alpha}^{[k]} - B_{k,r}c\right)^2 + \sum_{\substack{i=1 \\ i \ne k}}^{n}\left(y_{i,r} - B_{i,r}c\right)^2 + \alpha\, c'\Sigma c, \quad \forall c \in \mathbb{R}^n,$$
where the first inequality holds because $c_{r,\alpha}^{[k]}$ minimizes the leave-one-out functional. Hence $c_{r,\alpha}^{[k]}$ minimizes the functional with $y_{k,r}$ replaced by $B_{k,r}c_{r,\alpha}^{[k]}$, that is, $h_{r,\alpha}\left[k, B_{k,r}c_{r,\alpha}^{[k]}\right] = c_{r,\alpha}^{[k]}$. $\square$
Based on the leaving-out-one lemma, we can now prove the following theorem:
Theorem 4. We have the identity
$$CV(r,\alpha) = \frac{1}{n}\sum_{k=1}^{n}\left(\frac{I_k(r)\left[I - A_r(r,\alpha)\right]I^{-1}(r)\,y}{\left(I(r)\left[I - A_r(r,\alpha)\right]I^{-1}(r)\right)_{kk}}\right)^2,$$
with $I_k(r)$ the $k$-th row of the matrix $I(r)$ and $\left(\cdot\right)_{kk}$ the $kk$-th entry of a matrix.
Proof. We consider the following identity:
$$y_k - B_kc_{r,\alpha}^{[k]} = \left(y_k - B_kc_{r,\alpha}\right)\cdot\frac{y_{k,r} - B_{k,r}c_{r,\alpha}^{[k]}}{y_{k,r} - B_{k,r}c_{r,\alpha}}. \tag{8}$$
This identity holds because $y_{k,r} = \sqrt{r_{s_k}}\,y_k$ and $B_{k,r} = \sqrt{r_{s_k}}\,B_k$, where
$$s_k = \begin{cases} 1, & \text{if } k \in \left\{1, 2, \dots, N_1\right\} \\ 2, & \text{if } k \in \left\{N_1 + 1, \dots, N_1 + N_2\right\} \\ \dots & \\ l, & \text{if } k \in \left\{N_1 + \dots + N_{l-1} + 1, \dots, n\right\} \end{cases}$$
Moreover, we can write
$$y_{k,r} - B_{k,r}c_{r,\alpha} = \left(y_{k,r} - B_{k,r}c_{r,\alpha}^{[k]}\right) - \left(B_{k,r}c_{r,\alpha} - B_{k,r}c_{r,\alpha}^{[k]}\right) \tag{9}$$
and, by Lemma 3,
$$B_{k,r}c_{r,\alpha} - B_{k,r}c_{r,\alpha}^{[k]} = B_{k,r}h_{r,\alpha}\left[k, y_{k,r}\right] - B_{k,r}h_{r,\alpha}\left[k, B_{k,r}c_{r,\alpha}^{[k]}\right].$$
Because $B_{k,r}c_{r,\alpha}$ is linear in each data point, we can replace the divided difference below by a derivative. So we can write
$$\frac{B_{k,r}c_{r,\alpha} - B_{k,r}c_{r,\alpha}^{[k]}}{y_{k,r} - B_{k,r}c_{r,\alpha}^{[k]}} = \frac{\partial\left(B_{k,r}c_{r,\alpha}\right)}{\partial y_{k,r}}. \tag{10}$$
Moreover, we have
$$\frac{\partial\left(B_{k,r}c_{r,\alpha}\right)}{\partial y_{k,r}} = a_{kk}, \tag{11}$$
r,α matrix.So, according to (8), (9), (10) and (11) we can write [ ]
(
a) ( )
I rc B y c
B y
kk kk
r r k r k k r k
k 1
, ,
,
1− ⋅ −
− =
− α α .
Moreover,
$$y_{k,r} - B_{k,r}c_{r,\alpha} = y_{k,r} - A_{k,r}(r,\alpha)\,y_r = I_k\left[I - A_r(r,\alpha)\right]I^{-1}(r)\,y,$$
where $A_{k,r}(r,\alpha)$ and $I_k$ are the $k$-th rows of the matrices $A_r(r,\alpha)$ and $I$, respectively.
So we have
$$y_k - B_kc_{r,\alpha}^{[k]} = \frac{I_{kk}(r)\,I_k\left[I - A_r(r,\alpha)\right]I^{-1}(r)\,y}{1 - a_{kk}}.$$
Since $I(r)$ is diagonal, the numerator can be written as $I_k(r)\left[I - A_r(r,\alpha)\right]I^{-1}(r)\,y$ and the denominator as $1 - a_{kk} = \left(I(r)\left[I - A_r(r,\alpha)\right]I^{-1}(r)\right)_{kk}$, which proves the theorem. $\square$
Finally, replacing each diagonal entry $\left(I(r)\left[I - A_r(r,\alpha)\right]I^{-1}(r)\right)_{kk}$ by the average value
$$\frac{1}{n}\mathrm{Tr}\left(I(r)\left[I - A_r(r,\alpha)\right]I^{-1}(r)\right) = \frac{1}{n}\mathrm{Tr}\left(I - A_r(r,\alpha)\right),$$
we obtain a GCV-like function
$$GCV(r,\alpha) = \frac{\frac{1}{n}\left\|I(r)\left[I - A_r(r,\alpha)\right]I^{-1}(r)\,y\right\|^2}{\left[\frac{1}{n}\mathrm{Tr}\left(I(r)\left[I - A_r(r,\alpha)\right]I^{-1}(r)\right)\right]^2}.$$
This function is a generalization of the $GCV(\lambda)$ function from [2] (one source) and of the $GCV(r,\alpha)$ function from [1] (two sources).
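As a closing illustration, the $GCV(r,\alpha)$ criterion can be evaluated and minimized numerically. The sketch below uses a log-parametrization that enforces the constraint $r_1r_2\cdots r_l = 1$ by construction; the Nelder-Mead optimizer and the starting point are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

def gcv_multi(Bs, ys, r, alpha, Sigma):
    """GCV(r, alpha): squared residuals in the original scale over the
    squared average trace of I - A_r(r, alpha)."""
    n = sum(len(yi) for yi in ys)
    M = alpha * Sigma + sum(ri * Bi.T @ Bi for ri, Bi in zip(r, Bs))
    Br = np.vstack([np.sqrt(ri) * Bi for ri, Bi in zip(r, Bs)])
    yr = np.concatenate([np.sqrt(ri) * yi for ri, yi in zip(r, ys)])
    Ar = Br @ np.linalg.solve(M, Br.T)
    resid_r = yr - Ar @ yr                     # (I - A_r) y_r
    cuts = np.cumsum([len(yi) for yi in ys])[:-1]
    # back to the original scale: block i is divided by sqrt(r_i), i.e. I(r) is applied
    resid = np.concatenate([blk / np.sqrt(ri)
                            for ri, blk in zip(r, np.split(resid_r, cuts))])
    return (resid @ resid / n) / ((n - np.trace(Ar)) / n) ** 2

def estimate_r_alpha(Bs, ys, Sigma):
    """Minimize GCV over (r, alpha); p = (log r_1, ..., log r_{l-1}, log alpha)."""
    l = len(Bs)
    def obj(p):
        r = np.exp(np.append(p[:-1], -np.sum(p[:-1])))   # r_l = 1/(r_1...r_{l-1})
        return gcv_multi(Bs, ys, r, np.exp(p[-1]), Sigma)
    res = minimize(obj, x0=np.zeros(l), method="Nelder-Mead")
    r = np.exp(np.append(res.x[:-1], -np.sum(res.x[:-1])))
    return r, np.exp(res.x[-1])
```

The pair $(\hat{r}, \hat{\alpha})$ returned by `estimate_r_alpha` can then be plugged into formula (3) to obtain the final estimate $\hat{c}_{\hat{r},\hat{\alpha}}$.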
REFERENCES
[1] Gao, Feng, On combining data from multiple sources with unknown relative weights, Technical Report no. 894/1993, University of Wisconsin, Madison.
[2] Wahba, Grace, Spline Models for Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 59, SIAM, Philadelphia, 1990.