Karhunen–Loève theorem
In the theory of stochastic processes, the Karhunen–Loève theorem (named after Kari Karhunen and Michel Loève), also known as the Kosambi–Karhunen–Loève theorem[1][2] is a representation of a stochastic process as an infinite linear combination of orthogonal functions, analogous to a Fourier series representation of a function on a bounded interval. The transformation is also known as Hotelling transform and eigenvector transform, and is closely related to principal component analysis (PCA) technique widely used in image processing and in data analysis in many fields.[3]
Stochastic processes given by infinite series of this form were first considered by Damodar Dharmananda Kosambi.[4][5] There exist many such expansions of a stochastic process: if the process is indexed over [a, b], any orthonormal basis of L2([a, b]) yields an expansion thereof in that form. The importance of the Karhunen–Loève theorem is that it yields the best such basis in the sense that it minimizes the total mean squared error.
In contrast to a Fourier series where the coefficients are fixed numbers and the expansion basis consists of sinusoidal functions (that is, sine and cosine functions), the coefficients in the Karhunen–Loève theorem are random variables and the expansion basis depends on the process. In fact, the orthogonal basis functions used in this representation are determined by the covariance function of the process. One can think that the Karhunen–Loève transform adapts to the process in order to produce the best possible basis for its expansion.
In the case of a centered stochastic process {Xt}t ∈ [a, b] (centered means E[Xt] = 0 for all t ∈ [a, b]) satisfying a technical continuity condition, Xt admits a decomposition
- Xt=∑k=1∞Zkek(t){displaystyle X_{t}=sum _{k=1}^{infty }Z_{k}e_{k}(t)}
where Zk are pairwise uncorrelated random variables and the functions ek are continuous real-valued functions on [a, b] that are pairwise orthogonal in L2([a, b]). It is therefore sometimes said that the expansion is bi-orthogonal since the random coefficients Zk are orthogonal in the probability space while the deterministic functions ek are orthogonal in the time domain. The general case of a process Xt that is not centered can be brought back to the case of a centered process by considering Xt − E[Xt] which is a centered process.
Moreover, if the process is Gaussian, then the random variables Zk are Gaussian and stochastically independent. This result generalizes the Karhunen–Loève transform. An important example of a centered real stochastic process on [0, 1] is the Wiener process; the Karhunen–Loève theorem can be used to provide a canonical orthogonal representation for it. In this case the expansion consists of sinusoidal functions.
The above expansion into uncorrelated random variables is also known as the Karhunen–Loève expansion or Karhunen–Loève decomposition. The empirical version (i.e., with the coefficients computed from a sample) is known as the Karhunen–Loève transform (KLT), principal component analysis, proper orthogonal decomposition (POD), empirical orthogonal functions (a term used in meteorology and geophysics), or the Hotelling transform.
Contents
1 Formulation
2 Statement of the theorem
3 Proof
4 Properties of the Karhunen–Loève transform
4.1 Special case: Gaussian distribution
4.2 The Karhunen–Loève transform decorrelates the process
4.3 The Karhunen–Loève expansion minimizes the total mean square error
4.4 Explained variance
4.5 The Karhunen–Loève expansion has the minimum representation entropy property
5 Linear Karhunen–Loève approximations
6 Non-Linear approximation in bases
6.1 Non-optimality of Karhunen–Loève bases
7 Principal component analysis
7.1 Covariance matrix
7.2 Principal component transform
8 Examples
8.1 The Wiener process
8.2 The Brownian bridge
9 Applications
9.1 Applications in signal estimation and detection
9.1.1 Detection of a known continuous signal S(t)
9.1.2 Signal detection in white noise
9.1.3 Signal detection in colored noise
9.1.3.1 How to find k(t)
9.1.3.2 Test threshold for Neyman–Pearson detector
9.1.3.3 Prewhitening
9.1.4 Detection of a Gaussian random signal in Additive white Gaussian noise (AWGN)
10 See also
11 Notes
12 References
13 External links
Formulation
- Throughout this article, we will consider a square-integrable zero-mean random process Xt defined over a probability space (Ω, F, P) and indexed over a closed interval [a, b], with covariance function KX(s, t). We thus have:
- ∀t∈[a,b]Xt∈L2(Ω,F,P),{displaystyle forall tin [a,b]qquad X_{t}in L^{2}(Omega ,F,mathbf {P} ),}
- ∀t∈[a,b]E[Xt]=0,{displaystyle forall tin [a,b]qquad mathbf {E} [X_{t}]=0,}
- ∀t,s∈[a,b]KX(s,t)=E[XsXt].{displaystyle forall t,sin [a,b]qquad K_{X}(s,t)=mathbf {E} [X_{s}X_{t}].}
- We associate to KX a linear operator TKX defined in the following way:
- TKX:L2([a,b])→L2([a,b]):f↦TKXf=∫abKX(s,⋅)f(s)ds{displaystyle T_{K_{X}}:L^{2}([a,b])to L^{2}([a,b]):fmapsto T_{K_{X}}f=int _{a}^{b}K_{X}(s,cdot )f(s),ds}
- Since TKX is a linear operator, it makes sense to talk about its eigenvalues λk and eigenfunctions ek, which are found solving the homogeneous Fredholm integral equation of the second kind
- ∫abKX(s,t)ek(s)ds=λkek(t){displaystyle int _{a}^{b}K_{X}(s,t)e_{k}(s),ds=lambda _{k}e_{k}(t)}
Statement of the theorem
Theorem. Let Xt be a zero-mean square-integrable stochastic process defined over a probability space (Ω, F, P) and indexed over a closed and bounded interval [a, b], with continuous covariance function KX(s, t).
Then KX(s,t) is a Mercer kernel and letting ek be an orthonormal basis on L2([a, b]) formed by the eigenfunctions of TKX with respective eigenvalues λk, Xt admits the following representation
- Xt=∑k=1∞Zkek(t){displaystyle X_{t}=sum _{k=1}^{infty }Z_{k}e_{k}(t)}
where the convergence is in L2, uniform in t and
- Zk=∫abXtek(t)dt{displaystyle Z_{k}=int _{a}^{b}X_{t}e_{k}(t),dt}
Furthermore, the random variables Zk have zero-mean, are uncorrelated and have variance λk
- E[Zk]=0, ∀k∈NandE[ZiZj]=δijλj, ∀i,j∈N{displaystyle mathbf {E} [Z_{k}]=0,~forall kin mathbb {N} qquad {mbox{and}}qquad mathbf {E} [Z_{i}Z_{j}]=delta _{ij}lambda _{j},~forall i,jin mathbb {N} }
Note that by generalizations of Mercer's theorem we can replace the interval [a, b] with other compact spaces C and the Lebesgue measure on [a, b] with a Borel measure whose support is C.
Proof
- The covariance function KX satisfies the definition of a Mercer kernel. By Mercer's theorem, there consequently exists a set {λk, ek(t)} of eigenvalues and eigenfunctions of TKX forming an orthonormal basis of L2([a,b]), and KX can be expressed as
- KX(s,t)=∑k=1∞λkek(s)ek(t){displaystyle K_{X}(s,t)=sum _{k=1}^{infty }lambda _{k}e_{k}(s)e_{k}(t)}
- The process Xt can be expanded in terms of the eigenfunctions ek as:
- Xt=∑k=1∞Zkek(t){displaystyle X_{t}=sum _{k=1}^{infty }Z_{k}e_{k}(t)}
- where the coefficients (random variables) Zk are given by the projection of Xt on the respective eigenfunctions
- Zk=∫abXtek(t)dt{displaystyle Z_{k}=int _{a}^{b}X_{t}e_{k}(t),dt}
- We may then derive
- E[Zk]=E[∫abXtek(t)dt]=∫abE[Xt]ek(t)dt=0E[ZiZj]=E[∫ab∫abXtXsej(t)ei(s)dtds]=∫ab∫abE[XtXs]ej(t)ei(s)dtds=∫ab∫abKX(s,t)ej(t)ei(s)dtds=∫abei(s)(∫abKX(s,t)ej(t)dt)ds=λj∫abei(s)ej(s)ds=δijλj{displaystyle {begin{aligned}mathbf {E} [Z_{k}]&=mathbf {E} left[int _{a}^{b}X_{t}e_{k}(t),dtright]=int _{a}^{b}mathbf {E} [X_{t}]e_{k}(t)dt=0\[8pt]mathbf {E} [Z_{i}Z_{j}]&=mathbf {E} left[int _{a}^{b}int _{a}^{b}X_{t}X_{s}e_{j}(t)e_{i}(s),dt,dsright]\&=int _{a}^{b}int _{a}^{b}mathbf {E} left[X_{t}X_{s}right]e_{j}(t)e_{i}(s),dt,ds\&=int _{a}^{b}int _{a}^{b}K_{X}(s,t)e_{j}(t)e_{i}(s),dt,ds\&=int _{a}^{b}e_{i}(s)left(int _{a}^{b}K_{X}(s,t)e_{j}(t),dtright),ds\&=lambda _{j}int _{a}^{b}e_{i}(s)e_{j}(s),ds\&=delta _{ij}lambda _{j}end{aligned}}}
- where we have used the fact that the ek are eigenfunctions of TKX and are orthonormal.
- Let us now show that the convergence is in L2. Let
- SN=∑k=1NZkek(t).{displaystyle S_{N}=sum _{k=1}^{N}Z_{k}e_{k}(t).}
- Then:
- E[|Xt−SN|2]=E[Xt2]+E[SN2]−2E[XtSN]=KX(t,t)+E[∑k=1N∑l=1NZkZℓek(t)eℓ(t)]−2E[Xt∑k=1NZkek(t)]=KX(t,t)+∑k=1Nλkek(t)2−2E[∑k=1N∫abXtXsek(s)ek(t)ds]=KX(t,t)−∑k=1Nλkek(t)2{displaystyle {begin{aligned}mathbf {E} left[left|X_{t}-S_{N}right|^{2}right]&=mathbf {E} left[X_{t}^{2}right]+mathbf {E} left[S_{N}^{2}right]-2mathbf {E} left[X_{t}S_{N}right]\&=K_{X}(t,t)+mathbf {E} left[sum _{k=1}^{N}sum _{l=1}^{N}Z_{k}Z_{ell }e_{k}(t)e_{ell }(t)right]-2mathbf {E} left[X_{t}sum _{k=1}^{N}Z_{k}e_{k}(t)right]\&=K_{X}(t,t)+sum _{k=1}^{N}lambda _{k}e_{k}(t)^{2}-2mathbf {E} left[sum _{k=1}^{N}int _{a}^{b}X_{t}X_{s}e_{k}(s)e_{k}(t),dsright]\&=K_{X}(t,t)-sum _{k=1}^{N}lambda _{k}e_{k}(t)^{2}end{aligned}}}
- which goes to 0 by Mercer's theorem.
Properties of the Karhunen–Loève transform
Special case: Gaussian distribution
Since the limit in the mean of jointly Gaussian random variables is jointly Gaussian, and jointly Gaussian random (centered) variables are independent if and only if they are orthogonal, we can also conclude:
Theorem. The variables Zi have a joint Gaussian distribution and are stochastically independent if the original process {Xt}t is Gaussian.
In the Gaussian case, since the variables Zi are independent, we can say more:
- limN→∞∑i=1Nei(t)Zi(ω)=Xt(ω){displaystyle lim _{Nto infty }sum _{i=1}^{N}e_{i}(t)Z_{i}(omega )=X_{t}(omega )}
almost surely.
The Karhunen–Loève transform decorrelates the process
This is a consequence of the independence of the Zk.
The Karhunen–Loève expansion minimizes the total mean square error
In the introduction, we mentioned that the truncated Karhunen–Loeve expansion was the best approximation of the original process in the sense that it reduces the total mean-square error resulting of its truncation. Because of this property, it is often said that the KL transform optimally compacts the energy.
More specifically, given any orthonormal basis {fk} of L2([a, b]), we may decompose the process Xt as:
- Xt(ω)=∑k=1∞Ak(ω)fk(t){displaystyle X_{t}(omega )=sum _{k=1}^{infty }A_{k}(omega )f_{k}(t)}
where
- Ak(ω)=∫abXt(ω)fk(t)dt{displaystyle A_{k}(omega )=int _{a}^{b}X_{t}(omega )f_{k}(t),dt}
and we may approximate Xt by the finite sum
- X^t(ω)=∑k=1NAk(ω)fk(t){displaystyle {hat {X}}_{t}(omega )=sum _{k=1}^{N}A_{k}(omega )f_{k}(t)}
for some integer N.
Claim. Of all such approximations, the KL approximation is the one that minimizes the total mean square error (provided we have arranged the eigenvalues in decreasing order).
Consider the error resulting from the truncation at the N-th term in the following orthonormal expansion:
- εN(t)=∑k=N+1∞Ak(ω)fk(t){displaystyle varepsilon _{N}(t)=sum _{k=N+1}^{infty }A_{k}(omega )f_{k}(t)}
The mean-square error εN2(t) can be written as:
- εN2(t)=E[∑i=N+1∞∑j=N+1∞Ai(ω)Aj(ω)fi(t)fj(t)]=∑i=N+1∞∑j=N+1∞E[∫ab∫abXtXsfi(t)fj(s)dsdt]fi(t)fj(t)=∑i=N+1∞∑j=N+1∞fi(t)fj(t)∫ab∫abKX(s,t)fi(t)fj(s)dsdt{displaystyle {begin{aligned}varepsilon _{N}^{2}(t)&=mathbf {E} left[sum _{i=N+1}^{infty }sum _{j=N+1}^{infty }A_{i}(omega )A_{j}(omega )f_{i}(t)f_{j}(t)right]\&=sum _{i=N+1}^{infty }sum _{j=N+1}^{infty }mathbf {E} left[int _{a}^{b}int _{a}^{b}X_{t}X_{s}f_{i}(t)f_{j}(s),ds,dtright]f_{i}(t)f_{j}(t)\&=sum _{i=N+1}^{infty }sum _{j=N+1}^{infty }f_{i}(t)f_{j}(t)int _{a}^{b}int _{a}^{b}K_{X}(s,t)f_{i}(t)f_{j}(s),ds,dtend{aligned}}}
We then integrate this last equality over [a, b]. The orthonormality of the fk yields:
- ∫abεN2(t)dt=∑k=N+1∞∫ab∫abKX(s,t)fk(t)fk(s)dsdt{displaystyle int _{a}^{b}varepsilon _{N}^{2}(t),dt=sum _{k=N+1}^{infty }int _{a}^{b}int _{a}^{b}K_{X}(s,t)f_{k}(t)f_{k}(s),ds,dt}
The problem of minimizing the total mean-square error thus comes down to minimizing the right hand side of this equality subject to the constraint that the fk be normalized. We hence introduce βk, the Lagrangian multipliers associated with these constraints, and aim at minimizing the following function:
- Er[fk(t),k∈{N+1,…}]=∑k=N+1∞∫ab∫abKX(s,t)fk(t)fk(s)dsdt−βk(∫abfk(t)fk(t)dt−1){displaystyle Er[f_{k}(t),kin {N+1,ldots }]=sum _{k=N+1}^{infty }int _{a}^{b}int _{a}^{b}K_{X}(s,t)f_{k}(t)f_{k}(s),ds,dt-beta _{k}left(int _{a}^{b}f_{k}(t)f_{k}(t),dt-1right)}
Differentiating with respect to fi(t) (this is a functional derivative) and setting the derivative to 0 yields:
- ∂Er∂fi(t)=∫ab(∫abKX(s,t)fi(s)ds−βifi(t))dt=0{displaystyle {frac {partial Er}{partial f_{i}(t)}}=int _{a}^{b}left(int _{a}^{b}K_{X}(s,t)f_{i}(s),ds-beta _{i}f_{i}(t)right),dt=0}
which is satisfied in particular when
- ∫abKX(s,t)fi(s)ds=βifi(t).{displaystyle int _{a}^{b}K_{X}(s,t)f_{i}(s),ds=beta _{i}f_{i}(t).}
In other words, when the fk are chosen to be the eigenfunctions of TKX, hence resulting in the KL expansion.
Explained variance
An important observation is that since the random coefficients Zk of the KL expansion are uncorrelated, the Bienaymé formula asserts that the variance of Xt is simply the sum of the variances of the individual components of the sum:
- var[Xt]=∑k=0∞ek(t)2var[Zk]=∑k=1∞λkek(t)2{displaystyle operatorname {var} [X_{t}]=sum _{k=0}^{infty }e_{k}(t)^{2}operatorname {var} [Z_{k}]=sum _{k=1}^{infty }lambda _{k}e_{k}(t)^{2}}
Integrating over [a, b] and using the orthonormality of the ek, we obtain that the total variance of the process is:
- ∫abvar[Xt]dt=∑k=1∞λk{displaystyle int _{a}^{b}operatorname {var} [X_{t}],dt=sum _{k=1}^{infty }lambda _{k}}
In particular, the total variance of the N-truncated approximation is
- ∑k=1Nλk.{displaystyle sum _{k=1}^{N}lambda _{k}.}
As a result, the N-truncated expansion explains
- ∑k=1Nλk∑k=1∞λk{displaystyle {frac {sum _{k=1}^{N}lambda _{k}}{sum _{k=1}^{infty }lambda _{k}}}}
of the variance; and if we are content with an approximation that explains, say, 95% of the variance, then we just have to determine an N∈N{displaystyle Nin mathbb {N} } such that
- ∑k=1Nλk∑k=1∞λk≥0.95.{displaystyle {frac {sum _{k=1}^{N}lambda _{k}}{sum _{k=1}^{infty }lambda _{k}}}geq 0.95.}
The Karhunen–Loève expansion has the minimum representation entropy property
Given a representation of Xt=∑k=1∞Wkφk(t){displaystyle X_{t}=sum _{k=1}^{infty }W_{k}varphi _{k}(t)}, for some orthonormal basis φk(t){displaystyle varphi _{k}(t)} and random Wk{displaystyle W_{k}}, we let pk=E[|Wk|2]/E[|Xt|L22]{displaystyle p_{k}=mathbb {E} [|W_{k}|^{2}]/mathbb {E} [|X_{t}|_{L^{2}}^{2}]}, so that ∑k=1∞pk=1{displaystyle sum _{k=1}^{infty }p_{k}=1}. We may then define the representation entropy to be H({φk})=∑ipklog(pk){displaystyle H({varphi _{k}})=sum _{i}p_{k}log(p_{k})}. Then we have H({φk})≥H({ek}){displaystyle H({varphi _{k}})geq H({e_{k}})}, for all choices of φk{displaystyle varphi _{k}}. That is, the KL-expansion has minimal representation entropy.
Proof:
Denote the coefficients obtained for the basis ek(t){displaystyle e_{k}(t)} as pk{displaystyle p_{k}}, and for φk(t){displaystyle varphi _{k}(t)} as qk{displaystyle q_{k}}.
Choose N≥1{displaystyle Ngeq 1}. Note that since ek{displaystyle e_{k}} minimizes the mean squared error, we have that
- E|∑k=1NZkek(t)−Xt|L22≤E|∑k=1NWkφk(t)−Xt|L22{displaystyle mathbb {E} left|sum _{k=1}^{N}Z_{k}e_{k}(t)-X_{t}right|_{L^{2}}^{2}leq mathbb {E} left|sum _{k=1}^{N}W_{k}varphi _{k}(t)-X_{t}right|_{L^{2}}^{2}}
Expanding the right hand size, we get:
- E|∑k=1NWkφk(t)−Xt|L22=E|Xt2|L2+∑k=1N∑ℓ=1NE[Wℓφℓ(t)Wk∗φk∗(t)]L2−∑k=1NE[WkφkXt∗]L2−∑k=1NE[XtWk∗φk∗(t)]L2{displaystyle mathbb {E} left|sum _{k=1}^{N}W_{k}varphi _{k}(t)-X_{t}right|_{L^{2}}^{2}=mathbb {E} |X_{t}^{2}|_{L^{2}}+sum _{k=1}^{N}sum _{ell =1}^{N}mathbb {E} [W_{ell }varphi _{ell }(t)W_{k}^{*}varphi _{k}^{*}(t)]_{L^{2}}-sum _{k=1}^{N}mathbb {E} [W_{k}varphi _{k}X_{t}^{*}]_{L^{2}}-sum _{k=1}^{N}mathbb {E} [X_{t}W_{k}^{*}varphi _{k}^{*}(t)]_{L^{2}}}
Using the orthonormality of φk(t){displaystyle varphi _{k}(t)}, and expanding Xt{displaystyle X_{t}} in the φk(t){displaystyle varphi _{k}(t)} basis, we get that the right hand size is equal to:
- E[Xt]L22−∑k=1NE[|Wk|2]{displaystyle mathbb {E} [X_{t}]_{L^{2}}^{2}-sum _{k=1}^{N}mathbb {E} [|W_{k}|^{2}]}
We may perform indentitcal analysis for the ek(t){displaystyle e_{k}(t)}, and so rewrite the above inequality as:
- E[Xt]L22−∑k=1NE[|Zk|2]≤E[Xt]L22−∑k=1NE[|Wk|2]{displaystyle {displaystyle mathbb {E} [X_{t}]_{L^{2}}^{2}-sum _{k=1}^{N}mathbb {E} [|Z_{k}|^{2}]}leq {displaystyle mathbb {E} [X_{t}]_{L^{2}}^{2}-sum _{k=1}^{N}mathbb {E} [|W_{k}|^{2}]}}
Subtracting the common first term, and dividing by E[|Xt|L22]{displaystyle mathbb {E} [|X_{t}|_{L^{2}}^{2}]}, we obtain that:
- ∑k=1Npk≤∑k=1Nqk{displaystyle sum _{k=1}^{N}p_{k}leq sum _{k=1}^{N}q_{k}}
This implies that:
- −∑k=1∞pklog(pk)≤−∑k=1∞qklog(qk){displaystyle -sum _{k=1}^{infty }p_{k}log(p_{k})leq -sum _{k=1}^{infty }q_{k}log(q_{k})}
Linear Karhunen–Loève approximations
Let us consider a whole class of signals we want to approximate over the first M vectors of a basis. These signals are modeled as realizations of a random vector Y[n] of size N. To optimize the approximation we design a basis that minimizes the average approximation error. This section proves that optimal bases are Karhunen–Loeve bases that diagonalize the covariance matrix of Y. The random vector Y can be decomposed in an orthogonal basis
- {gm}0≤m≤N{displaystyle left{g_{m}right}_{0leq mleq N}}
as follows:
- Y=∑m=0N−1⟨Y,gm⟩gm,{displaystyle Y=sum _{m=0}^{N-1}leftlangle Y,g_{m}rightrangle g_{m},}
where each
- ⟨Y,gm⟩=∑n=0N−1Y[n]gm∗[n]{displaystyle leftlangle Y,g_{m}rightrangle =sum _{n=0}^{N-1}{Y[n]}g_{m}^{*}[n]}
is a random variable. The approximation from the first M ≤ N vectors of the basis is
- YM=∑m=0M−1⟨Y,gm⟩gm{displaystyle Y_{M}=sum _{m=0}^{M-1}leftlangle Y,g_{m}rightrangle g_{m}}
The energy conservation in an orthogonal basis implies
- ε[M]=E{‖Y−YM‖2}=∑m=MN−1E{|⟨Y,gm⟩|2}{displaystyle varepsilon [M]=mathbf {E} left{left|Y-Y_{M}right|^{2}right}=sum _{m=M}^{N-1}mathbf {E} left{left|leftlangle Y,g_{m}rightrangle right|^{2}right}}
This error is related to the covariance of Y defined by
- R[n,m]=E{Y[n]Y∗[m]}{displaystyle R[n,m]=mathbf {E} left{Y[n]Y^{*}[m]right}}
For any vector x[n] we denote by K the covariance operator represented by this matrix,
- E{|⟨Y,x⟩|2}=⟨Kx,x⟩=∑n=0N−1∑m=0N−1R[n,m]x[n]x∗[m]{displaystyle mathbf {E} left{left|langle Y,xrangle right|^{2}right}=langle Kx,xrangle =sum _{n=0}^{N-1}sum _{m=0}^{N-1}R[n,m]x[n]x^{*}[m]}
The error ε[M] is therefore a sum of the last N − M coefficients of the covariance operator
- ε[M]=∑m=MN−1⟨Kgm,gm⟩{displaystyle varepsilon [M]=sum _{m=M}^{N-1}{leftlangle Kg_{m},g_{m}rightrangle }}
The covariance operator K is Hermitian and Positive and is thus diagonalized in an orthogonal basis called a Karhunen–Loève basis. The following theorem states that a Karhunen–Loève basis is optimal for linear approximations.
Theorem (Optimality of Karhunen–Loève basis). Let K be a covariance operator. For all M ≥ 1, the approximation error
- ε[M]=∑m=MN−1⟨Kgm,gm⟩{displaystyle varepsilon [M]=sum _{m=M}^{N-1}leftlangle Kg_{m},g_{m}rightrangle }
is minimum if and only if
- {gm}0≤m<N{displaystyle left{g_{m}right}_{0leq m<N}}
is a Karhunen–Loeve basis ordered by decreasing eigenvalues.
- ⟨Kgm,gm⟩≥⟨Kgm+1,gm+1⟩,0≤m<N−1.{displaystyle leftlangle Kg_{m},g_{m}rightrangle geq leftlangle Kg_{m+1},g_{m+1}rightrangle ,qquad 0leq m<N-1.}
Non-Linear approximation in bases
Linear approximations project the signal on M vectors a priori. The approximation can be made more precise by choosing the M orthogonal vectors depending on the signal properties. This section analyzes the general performance of these non-linear approximations. A signal f∈H{displaystyle fin mathrm {H} } is approximated with M vectors selected adaptively in an orthonormal basis for H{displaystyle mathrm {H} }
- B={gm}m∈N{displaystyle mathrm {B} =left{g_{m}right}_{min mathbb {N} }}
Let fM{displaystyle f_{M}} be the projection of f over M vectors whose indices are in IM:
- fM=∑m∈IM⟨f,gm⟩gm{displaystyle f_{M}=sum _{min I_{M}}leftlangle f,g_{m}rightrangle g_{m}}
The approximation error is the sum of the remaining coefficients
- ε[M]={‖f−fM‖2}=∑m∉IMN−1{|⟨f,gm⟩|2}{displaystyle varepsilon [M]=left{left|f-f_{M}right|^{2}right}=sum _{mnotin I_{M}}^{N-1}left{left|leftlangle f,g_{m}rightrangle right|^{2}right}}
To minimize this error, the indices in IM must correspond to the M vectors having the largest inner product amplitude
- |⟨f,gm⟩|.{displaystyle left|leftlangle f,g_{m}rightrangle right|.}
These are the vectors that best correlate f. They can thus be interpreted as the main features of f. The resulting error is necessarily smaller than the error of a linear approximation which selects the M approximation vectors independently of f. Let us sort
- {|⟨f,gm⟩|}m∈N{displaystyle left{left|leftlangle f,g_{m}rightrangle right|right}_{min mathbb {N} }}
in decreasing order
- |⟨f,gmk⟩|≥|⟨f,gmk+1⟩|.{displaystyle left|leftlangle f,g_{m_{k}}rightrangle right|geq left|leftlangle f,g_{m_{k+1}}rightrangle right|.}
The best non-linear approximation is
- fM=∑k=1M⟨f,gmk⟩gmk{displaystyle f_{M}=sum _{k=1}^{M}leftlangle f,g_{m_{k}}rightrangle g_{m_{k}}}
It can also be written as inner product thresholding:
- fM=∑m=0∞θT(⟨f,gm⟩)gm{displaystyle f_{M}=sum _{m=0}^{infty }theta _{T}left(leftlangle f,g_{m}rightrangle right)g_{m}}
with
- T=|⟨f,gmM⟩|,θT(x)={x|x|≥T0|x|<T{displaystyle T=left|leftlangle f,g_{m_{M}}rightrangle right|,qquad theta _{T}(x)={begin{cases}x&|x|geq T\0&|x|<Tend{cases}}}
The non-linear error is
- ε[M]={‖f−fM‖2}=∑k=M+1∞{|⟨f,gmk⟩|2}{displaystyle varepsilon [M]=left{left|f-f_{M}right|^{2}right}=sum _{k=M+1}^{infty }left{left|leftlangle f,g_{m_{k}}rightrangle right|^{2}right}}
this error goes quickly to zero as M increases, if the sorted values of |⟨f,gmk⟩|{displaystyle left|leftlangle f,g_{m_{k}}rightrangle right|} have a fast decay as k increases. This decay is quantified by computing the IP{displaystyle mathrm {I} ^{mathrm {P} }} norm of the signal inner products in B:
- ‖f‖B,p=(∑m=0∞|⟨f,gm⟩|p)1p{displaystyle |f|_{mathrm {B} ,p}=left(sum _{m=0}^{infty }left|leftlangle f,g_{m}rightrangle right|^{p}right)^{frac {1}{p}}}
The following theorem relates the decay of ε[M] to ‖f‖B,p{displaystyle |f|_{mathrm {B} ,p}}
Theorem (decay of error). If ‖f‖B,p<∞{displaystyle |f|_{mathrm {B} ,p}<infty } with p < 2 then
- ε[M]≤‖f‖B,p22p−1M1−2p{displaystyle varepsilon [M]leq {frac {|f|_{mathrm {B} ,p}^{2}}{{frac {2}{p}}-1}}M^{1-{frac {2}{p}}}}
and
- ε[M]=o(M1−2p).{displaystyle varepsilon [M]=oleft(M^{1-{frac {2}{p}}}right).}
Conversely, if ε[M]=o(M1−2p){displaystyle varepsilon [M]=oleft(M^{1-{frac {2}{p}}}right)} then
‖f‖B,q<∞{displaystyle |f|_{mathrm {B} ,q}<infty } for any q > p.
Non-optimality of Karhunen–Loève bases
To further illustrate the differences between linear and non-linear approximations, we study the decomposition of a simple non-Gaussian random vector in a Karhunen–Loève basis. Processes whose realizations have a random translation are stationary. The Karhunen–Loève basis is then a Fourier basis and we study its performance. To simplify the analysis, consider a random vector Y[n] of size N that is random shift modulo N of a deterministic signal f[n] of zero mean
- ∑n=0N−1f[n]=0{displaystyle sum _{n=0}^{N-1}f[n]=0}
- Y[n]=f[(n−p)modN]{displaystyle Y[n]=f[(n-p){bmod {N}}]}
The random shift P is uniformly distributed on [0, N − 1]:
- Pr(P=p)=1N,0≤p<N{displaystyle Pr(P=p)={frac {1}{N}},qquad 0leq p<N}
Clearly
- E{Y[n]}=1N∑p=0N−1f[(n−p)modN]=0{displaystyle mathbf {E} {Y[n]}={frac {1}{N}}sum _{p=0}^{N-1}f[(n-p){bmod {N}}]=0}
and
- R[n,k]=E{Y[n]Y[k]}=1N∑p=0N−1f[(n−p)modN]f[(k−p)modN]=1NfΘf¯[n−k],f¯[n]=f[−n]{displaystyle R[n,k]=mathbf {E} {Y[n]Y[k]}={frac {1}{N}}sum _{p=0}^{N-1}f[(n-p){bmod {N}}]f[(k-p){bmod {N}}]={frac {1}{N}}fTheta {bar {f}}[n-k],quad {bar {f}}[n]=f[-n]}
Hence
- R[n,k]=RY[n−k],RY[k]=1NfΘf¯[k]{displaystyle R[n,k]=R_{Y}[n-k],qquad R_{Y}[k]={frac {1}{N}}fTheta {bar {f}}[k]}
Since RY is N periodic, Y is a circular stationary random vector. The covariance operator is a circular convolution with RY and is therefore diagonalized in the discrete Fourier Karhunen–Loève basis
- {1Nei2πmn/N}0≤m<N.{displaystyle left{{frac {1}{sqrt {N}}}e^{i2pi mn/N}right}_{0leq m<N}.}
The power spectrum is Fourier transform of RY:
- PY[m]=R^Y[m]=1N|f^[m]|2{displaystyle P_{Y}[m]={hat {R}}_{Y}[m]={frac {1}{N}}left|{hat {f}}[m]right|^{2}}
Example: Consider an extreme case where f[n]=δ[n]−δ[n−1]{displaystyle f[n]=delta [n]-delta [n-1]}. A theorem stated above guarantees that the Fourier Karhunen–Loève basis produces a smaller expected approximation error than a canonical basis of Diracs {gm[n]=δ[n−m]}0≤m<N{displaystyle left{g_{m}[n]=delta [n-m]right}_{0leq m<N}}. Indeed, we do not know a priori the abscissa of the non-zero coefficients of Y, so there is no particular Dirac that is better adapted to perform the approximation. But the Fourier vectors cover the whole support of Y and thus absorb a part of the signal energy.
- E{|⟨Y[n],1Nei2πmn/N⟩|2}=PY[m]=4Nsin2(πkN){displaystyle mathbf {E} left{left|leftlangle Y[n],{frac {1}{sqrt {N}}}e^{i2pi mn/N}rightrangle right|^{2}right}=P_{Y}[m]={frac {4}{N}}sin ^{2}left({frac {pi k}{N}}right)}
Selecting higher frequency Fourier coefficients yields a better mean-square approximation than choosing a priori a few Dirac vectors to perform the approximation. The situation is totally different for non-linear approximations. If f[n]=δ[n]−δ[n−1]{displaystyle f[n]=delta [n]-delta [n-1]} then the discrete Fourier basis is extremely inefficient because f and hence Y have an energy that is almost uniformly spread among all Fourier vectors. In contrast, since f has only two non-zero coefficients in the Dirac basis, a non-linear approximation of Y with M ≥ 2 gives zero error.[6]
Principal component analysis
We have established the Karhunen–Loève theorem and derived a few properties thereof. We also noted that one hurdle in its application was the numerical cost of determining the eigenvalues and eigenfunctions of its covariance operator through the Fredholm integral equation of the second kind
- ∫abKX(s,t)ek(s)ds=λkek(t).{displaystyle int _{a}^{b}K_{X}(s,t)e_{k}(s),ds=lambda _{k}e_{k}(t).}
However, when applied to a discrete and finite process (Xn)n∈{1,…,N}{displaystyle left(X_{n}right)_{nin {1,ldots ,N}}}, the problem takes a much simpler form and standard algebra can be used to carry out the calculations.
Note that a continuous process can also be sampled at N points in time in order to reduce the problem to a finite version.
We henceforth consider a random N-dimensional vector X=(X1 X2 … XN)T{displaystyle X=left(X_{1}~X_{2}~ldots ~X_{N}right)^{T}}. As mentioned above, X could contain N samples of a signal but it can hold many more representations depending on the field of application. For instance it could be the answers to a survey or economic data in an econometrics analysis.
As in the continuous version, we assume that X is centered, otherwise we can let X:=X−μX{displaystyle X:=X-mu _{X}} (where μX{displaystyle mu _{X}} is the mean vector of X) which is centered.
Let us adapt the procedure to the discrete case.
Covariance matrix
Recall that the main implication and difficulty of the KL transformation is computing the eigenvectors of the linear operator associated to the covariance function, which are given by the solutions to the integral equation written above.
Define Σ, the covariance matrix of X, as an N × N matrix whose elements are given by:
- Σij=E[XiXj],∀i,j∈{1,…,N}{displaystyle Sigma _{ij}=mathbf {E} [X_{i}X_{j}],qquad forall i,jin {1,ldots ,N}}
Rewriting the above integral equation to suit the discrete case, we observe that it turns into:
- ∑i=1NΣijej=λej⇔Σe=λe{displaystyle sum _{i=1}^{N}Sigma _{ij}e_{j}=lambda e_{j}quad Leftrightarrow quad Sigma e=lambda e}
where e=(e1 e2 … eN)T{displaystyle e=(e_{1}~e_{2}~ldots ~e_{N})^{T}} is an N-dimensional vector.
The integral equation thus reduces to a simple matrix eigenvalue problem, which explains why the PCA has such a broad domain of applications.
Since Σ is a positive definite symmetric matrix, it possesses a set of orthonormal eigenvectors forming a basis of RN{displaystyle mathbb {R} ^{N}}, and we write {λi,φi}i∈{1,…,N}{displaystyle {lambda _{i},varphi _{i}}_{iin {1,ldots ,N}}} this set of eigenvalues and corresponding eigenvectors, listed in decreasing values of λi. Let also Φ be the orthonormal matrix consisting of these eigenvectors:
- Φ:=(φ1 φ2 … φN)TΦTΦ=I{displaystyle {begin{aligned}Phi &:=left(varphi _{1}~varphi _{2}~ldots ~varphi _{N}right)^{T}\Phi ^{T}Phi &=Iend{aligned}}}
Principal component transform
It remains to perform the actual KL transformation, called the principal component transform in this case. Recall that the transform was found by expanding the process with respect to the basis spanned by the eigenvectors of the covariance function. In this case, we hence have:
- X=∑i=1N⟨φi,X⟩φi=∑i=1NφiTXφi{displaystyle X=sum _{i=1}^{N}langle varphi _{i},Xrangle varphi _{i}=sum _{i=1}^{N}varphi _{i}^{T}Xvarphi _{i}}
In a more compact form, the principal component transform of X is defined by:
- {Y=ΦTXX=ΦY{displaystyle {begin{cases}Y=Phi ^{T}X\X=Phi Yend{cases}}}
The i-th component of Y is Yi=φiTX{displaystyle Y_{i}=varphi _{i}^{T}X}, the projection of X on φi{displaystyle varphi _{i}} and the inverse transform X = ΦY yields the expansion of X on the space spanned by the φi{displaystyle varphi _{i}}:
- X=∑i=1NYiφi=∑i=1N⟨φi,X⟩φi{displaystyle X=sum _{i=1}^{N}Y_{i}varphi _{i}=sum _{i=1}^{N}langle varphi _{i},Xrangle varphi _{i}}
As in the continuous case, we may reduce the dimensionality of the problem by truncating the sum at some K∈{1,…,N}{displaystyle Kin {1,ldots ,N}} such that
- ∑i=1Kλi∑i=1Nλi≥α{displaystyle {frac {sum _{i=1}^{K}lambda _{i}}{sum _{i=1}^{N}lambda _{i}}}geq alpha }
where α is the explained variance threshold we wish to set.
We can also reduce the dimensionality through the use of multilevel dominant eigenvector estimation (MDEE).[7]
Examples
The Wiener process
There are numerous equivalent characterizations of the Wiener process which is a mathematical formalization of Brownian motion. Here we regard it as the centered standard Gaussian process Wt with covariance function
- KW(t,s)=cov(Wt,Ws)=min(s,t).{displaystyle K_{W}(t,s)=operatorname {cov} (W_{t},W_{s})=min(s,t).}
We restrict the time domain to [a, b]=[0,1] without loss of generality.
The eigenvectors of the covariance kernel are easily determined. These are
- ek(t)=2sin((k−12)πt){displaystyle e_{k}(t)={sqrt {2}}sin left(left(k-{tfrac {1}{2}}right)pi tright)}
and the corresponding eigenvalues are
- λk=1(k−12)2π2.{displaystyle lambda _{k}={frac {1}{(k-{frac {1}{2}})^{2}pi ^{2}}}.}
In order to find the eigenvalues and eigenvectors, we need to solve the integral equation:
- ∫abKW(s,t)e(s)ds=λe(t)∀t,0≤t≤1∫01min(s,t)e(s)ds=λe(t)∀t,0≤t≤1∫0tse(s)ds+t∫t1e(s)ds=λe(t)∀t,0≤t≤1{displaystyle {begin{aligned}int _{a}^{b}K_{W}(s,t)e(s),ds&=lambda e(t)qquad forall t,0leq tleq 1\int _{0}^{1}min(s,t)e(s),ds&=lambda e(t)qquad forall t,0leq tleq 1\int _{0}^{t}se(s),ds+tint _{t}^{1}e(s),ds&=lambda e(t)qquad forall t,0leq tleq 1end{aligned}}}
differentiating once with respect to t yields:
- ∫t1e(s)ds=λe′(t){displaystyle int _{t}^{1}e(s),ds=lambda e'(t)}
a second differentiation produces the following differential equation:
- −e(t)=λe″(t){displaystyle -e(t)=lambda e''(t)}
The general solution of which has the form:
- e(t)=Asin(tλ)+Bcos(tλ){displaystyle e(t)=Asin left({frac {t}{sqrt {lambda }}}right)+Bcos left({frac {t}{sqrt {lambda }}}right)}
where A and B are two constants to be determined with the boundary conditions. Setting t = 0 in the initial integral equation gives e(0) = 0 which implies that B = 0 and similarly, setting t = 1 in the first differentiation yields e' (1) = 0, whence:
- cos(1λ)=0{displaystyle cos left({frac {1}{sqrt {lambda }}}right)=0}
which in turn implies that eigenvalues of TKX are:
- λk=(1(k−12)π)2,k≥1{displaystyle lambda _{k}=left({frac {1}{(k-{frac {1}{2}})pi }}right)^{2},qquad kgeq 1}
The corresponding eigenfunctions are thus of the form:
- ek(t)=Asin((k−12)πt),k≥1{displaystyle e_{k}(t)=Asin left((k-{frac {1}{2}})pi tright),qquad kgeq 1}
A is then chosen so as to normalize ek:
- ∫01ek2(t)dt=1⟹A=2{displaystyle int _{0}^{1}e_{k}^{2}(t),dt=1quad implies quad A={sqrt {2}}}
This gives the following representation of the Wiener process:
Theorem. There is a sequence {Zi}i of independent Gaussian random variables with mean zero and variance 1 such that
- Wt=2∑k=1∞Zksin((k−12)πt)(k−12)π.{displaystyle W_{t}={sqrt {2}}sum _{k=1}^{infty }Z_{k}{frac {sin left(left(k-{frac {1}{2}}right)pi tright)}{left(k-{frac {1}{2}}right)pi }}.}
Note that this representation is only valid for t∈[0,1].{displaystyle tin [0,1].} On larger intervals, the increments are not independent. As stated in the theorem, convergence is in the L2 norm and uniform in t.
The Brownian bridge
Similarly the Brownian bridge Bt=Wt−tW1{displaystyle B_{t}=W_{t}-tW_{1}} which is a stochastic process with covariance function
- KB(t,s)=min(t,s)−ts{displaystyle K_{B}(t,s)=min(t,s)-ts}
can be represented as the series
- Bt=∑k=1∞Zk2sin(kπt)kπ{displaystyle B_{t}=sum _{k=1}^{infty }Z_{k}{frac {{sqrt {2}}sin(kpi t)}{kpi }}}
Applications
Adaptive optics systems sometimes use K–L functions to reconstruct wave-front phase information (Dai 1996, JOSA A).
Karhunen–Loève expansion is closely related to the Singular Value Decomposition. The latter has myriad applications in image processing, radar, seismology, and the like. If one has independent vector observations from a vector valued stochastic process then the left singular vectors are maximum likelihood estimates of the ensemble KL expansion.
Applications in signal estimation and detection
Detection of a known continuous signal S(t)
In communication, we usually have to decide whether a signal from a noisy channel contains valuable information. The following hypothesis testing is used for detecting continuous signal s(t) from channel output X(t), N(t) is the channel noise, which is usually assumed zero mean Gaussian process with correlation function RN(t,s)=E[N(t)N(s)]{displaystyle R_{N}(t,s)=E[N(t)N(s)]}
- H:X(t)=N(t),{displaystyle H:X(t)=N(t),}
- K:X(t)=N(t)+s(t),t∈(0,T){displaystyle K:X(t)=N(t)+s(t),quad tin (0,T)}
Signal detection in white noise
When the channel noise is white, its correlation function is
- RN(t)=12N0δ(t),{displaystyle R_{N}(t)={tfrac {1}{2}}N_{0}delta (t),}
and it has constant power spectrum density. In physically practical channel, the noise power is finite, so:
- SN(f)={N02|f|<w0|f|>w{displaystyle S_{N}(f)={begin{cases}{frac {N_{0}}{2}}&|f|<w\0&|f|>wend{cases}}}
Then the noise correlation function is sinc function with zeros at n2ω,n∈Z.{displaystyle {frac {n}{2omega }},nin mathbf {Z} .} Since are uncorrelated and gaussian, they are independent. Thus we can take samples from X(t) with time spacing
- Δt=n2ω within (0,″T″).{displaystyle Delta t={frac {n}{2omega }}{text{ within }}(0,''T'').}
Let Xi=X(iΔt){displaystyle X_{i}=X(i,Delta t)}. We have a total of n=TΔt=T(2ω)=2ωT{displaystyle n={frac {T}{Delta t}}=T(2omega )=2omega T} i.i.d observations {X1,X2,…,Xn}{displaystyle {X_{1},X_{2},ldots ,X_{n}}} to develop the likelihood-ratio test. Define signal Si=S(iΔt){displaystyle S_{i}=S(i,Delta t)}, the problem becomes,
- H:Xi=Ni,{displaystyle H:X_{i}=N_{i},}
- K:Xi=Ni+Si,i=1,2,…,n.{displaystyle K:X_{i}=N_{i}+S_{i},i=1,2,ldots ,n.}
The log-likelihood ratio
- L(x_)=log∑i=1n(2Sixi−Si2)2σ2⇔Δt∑i=1nSixi=∑i=1nS(iΔt)x(iΔt)Δt≷λ⋅2{displaystyle {mathcal {L}}({underline {x}})=log {frac {sum _{i=1}^{n}(2S_{i}x_{i}-S_{i}^{2})}{2sigma ^{2}}}Leftrightarrow Delta tsum _{i=1}^{n}S_{i}x_{i}=sum _{i=1}^{n}S(i,Delta t)x(i,Delta t),Delta tgtrless lambda _{cdot }2}
As t → 0, let:
- G=∫0TS(t)x(t)dt.{displaystyle G=int _{0}^{T}S(t)x(t),dt.}
Then G is the test statistics and the Neyman–Pearson optimum detector is
- G(x_)>G0⇒K<G0⇒H.{displaystyle G({underline {x}})>G_{0}Rightarrow K<G_{0}Rightarrow H.}
As G is Gaussian, we can characterize it by finding its mean and variances. Then we get
- H:G∼N(0,12N0E){displaystyle H:Gsim Nleft(0,{tfrac {1}{2}}N_{0}Eright)}
- K:G∼N(E,12N0E){displaystyle K:Gsim Nleft(E,{tfrac {1}{2}}N_{0}Eright)}
where
- E=∫0TS2(t)dt{displaystyle mathbf {E} =int _{0}^{T}S^{2}(t),dt}
is the signal energy.
The false alarm error
- α=∫G0∞N(0,12N0E)dG⇒G0=12N0EΦ−1(1−α){displaystyle alpha =int _{G_{0}}^{infty }Nleft(0,{tfrac {1}{2}}N_{0}Eright),dGRightarrow G_{0}={sqrt {{tfrac {1}{2}}N_{0}E}}Phi ^{-1}(1-alpha )}
And the probability of detection:
- β=∫G0∞N(E,12N0E)dG=1−Φ(G0−E12N0E)=Φ(2EN0−Φ−1(1−α)),{displaystyle beta =int _{G_{0}}^{infty }Nleft(E,{tfrac {1}{2}}N_{0}Eright),dG=1-Phi left({frac {G_{0}-E}{sqrt {{tfrac {1}{2}}N_{0}E}}}right)=Phi left({sqrt {frac {2E}{N_{0}}}}-Phi ^{-1}(1-alpha )right),}
where Φ is the cdf of standard normal, or Gaussian, variable.
Signal detection in colored noise
When N(t) is colored (correlated in time) Gaussian noise with zero mean and covariance function RN(t,s)=E[N(t)N(s)],{displaystyle R_{N}(t,s)=E[N(t)N(s)],} we cannot sample independent discrete observations by evenly spacing the time. Instead, we can use K–L expansion to uncorrelate the noise process and get independent Gaussian observation 'samples'. The K–L expansion of N(t):
- N(t)=∑i=1∞NiΦi(t),0<t<T,{displaystyle N(t)=sum _{i=1}^{infty }N_{i}Phi _{i}(t),quad 0<t<T,}
where Ni=∫N(t)Φi(t)dt{displaystyle N_{i}=int N(t)Phi _{i}(t),dt} and the orthonormal bases {Φit}{displaystyle {Phi _{i}{t}}} are generated by kernel RN(t,s){displaystyle R_{N}(t,s)}, i.e., solution to
- ∫0TRN(t,s)Φi(s)ds=λiΦi(t),var[Ni]=λi.{displaystyle int _{0}^{T}R_{N}(t,s)Phi _{i}(s),ds=lambda _{i}Phi _{i}(t),quad operatorname {var} [N_{i}]=lambda _{i}.}
Do the expansion:
- S(t)=∑i=1∞SiΦi(t),{displaystyle S(t)=sum _{i=1}^{infty }S_{i}Phi _{i}(t),}
where Si=∫0TS(t)Φi(t)dt{displaystyle S_{i}=int _{0}^{T}S(t)Phi _{i}(t),dt}, then
- Xi=∫0TX(t)Φi(t)dt=Ni{displaystyle X_{i}=int _{0}^{T}X(t)Phi _{i}(t),dt=N_{i}}
under H and Ni+Si{displaystyle N_{i}+S_{i}} under K. Let X¯={X1,X2,…}{displaystyle {overline {X}}={X_{1},X_{2},dots }}, we have
Ni{displaystyle N_{i}} are independent Gaussian r.v's with variance λi{displaystyle lambda _{i}}
- under H: {Xi}{displaystyle {X_{i}}} are independent Gaussian r.v's.
- fH[x(t)|0<t<T]=fH(x_)=∏i=1∞12πλiexp(−xi22λi){displaystyle f_{H}[x(t)|0<t<T]=f_{H}({underline {x}})=prod _{i=1}^{infty }{frac {1}{sqrt {2pi lambda _{i}}}}exp left(-{frac {x_{i}^{2}}{2lambda _{i}}}right)}
- under K: {Xi−Si}{displaystyle {X_{i}-S_{i}}} are independent Gaussian r.v's.
- fK[x(t)∣0<t<T]=fK(x_)=∏i=1∞12πλiexp(−(xi−Si)22λi){displaystyle f_{K}[x(t)mid 0<t<T]=f_{K}({underline {x}})=prod _{i=1}^{infty }{frac {1}{sqrt {2pi lambda _{i}}}}exp left(-{frac {(x_{i}-S_{i})^{2}}{2lambda _{i}}}right)}
Hence, the log-LR is given by
- L(x_)=∑i=1∞2Sixi−Si22λi{displaystyle {mathcal {L}}({underline {x}})=sum _{i=1}^{infty }{frac {2S_{i}x_{i}-S_{i}^{2}}{2lambda _{i}}}}
and the optimum detector is
- G=∑i=1∞Sixiλi>G0⇒K,<G0⇒H.{displaystyle G=sum _{i=1}^{infty }S_{i}x_{i}lambda _{i}>G_{0}Rightarrow K,<G_{0}Rightarrow H.}
Define
- k(t)=∑i=1∞λiSiΦi(t),0<t<T,{displaystyle k(t)=sum _{i=1}^{infty }lambda _{i}S_{i}Phi _{i}(t),0<t<T,}
then G=∫0Tk(t)x(t)dt.{displaystyle G=int _{0}^{T}k(t)x(t),dt.}
How to find k(t)
Since
- ∫0TRN(t,s)k(s)ds=∑i=1∞λiSi∫0TRN(t,s)Φi(s)ds=∑i=1∞SiΦi(t)=S(t),{displaystyle int _{0}^{T}R_{N}(t,s)k(s),ds=sum _{i=1}^{infty }lambda _{i}S_{i}int _{0}^{T}R_{N}(t,s)Phi _{i}(s),ds=sum _{i=1}^{infty }S_{i}Phi _{i}(t)=S(t),}
k(t) is the solution to
- ∫0TRN(t,s)k(s)ds=S(t).{displaystyle int _{0}^{T}R_{N}(t,s)k(s),ds=S(t).}
If N(t)is wide-sense stationary,
- ∫0TRN(t−s)k(s)ds=S(t),{displaystyle int _{0}^{T}R_{N}(t-s)k(s),ds=S(t),}
which is known as the Wiener–Hopf equation. The equation can be solved by taking fourier transform, but not practically realizable since infinite spectrum needs spatial factorization. A special case which is easy to calculate k(t) is white Gaussian noise.
- ∫0TN02δ(t−s)k(s)ds=S(t)⇒k(t)=CS(t),0<t<T.{displaystyle int _{0}^{T}{frac {N_{0}}{2}}delta (t-s)k(s),ds=S(t)Rightarrow k(t)=CS(t),quad 0<t<T.}
The corresponding impulse response is h(t) = k(T − t) = CS(T − t). Let C = 1, this is just the result we arrived at in previous section for detecting of signal in white noise.
Test threshold for Neyman–Pearson detector
Since X(t) is a Gaussian process,
- G=∫0Tk(t)x(t)dt,{displaystyle G=int _{0}^{T}k(t)x(t),dt,}
is a Gaussian random variable that can be characterized by its mean and variance.
- E[G∣H]=∫0Tk(t)E[x(t)∣H]dt=0E[G∣K]=∫0Tk(t)E[x(t)∣K]dt=∫0Tk(t)S(t)dt≡ρE[G2∣H]=∫0T∫0Tk(t)k(s)RN(t,s)dtds=∫0Tk(t)(∫0Tk(s)RN(t,s)ds)=∫0Tk(t)S(t)dt=ρvar[G∣H]=E[G2∣H]−(E[G∣H])2=ρE[G2∣K]=∫0T∫0Tk(t)k(s)E[x(t)x(s)]dtds=∫0T∫0Tk(t)k(s)(RN(t,s)+S(t)S(s))dtds=ρ+ρ2var[G∣K]=E[G2|K]−(E[G|K])2=ρ+ρ2−ρ2=ρ{displaystyle {begin{aligned}mathbf {E} [Gmid H]&=int _{0}^{T}k(t)mathbf {E} [x(t)mid H],dt=0\mathbf {E} [Gmid K]&=int _{0}^{T}k(t)mathbf {E} [x(t)mid K],dt=int _{0}^{T}k(t)S(t),dtequiv rho \mathbf {E} [G^{2}mid H]&=int _{0}^{T}int _{0}^{T}k(t)k(s)R_{N}(t,s),dt,ds=int _{0}^{T}k(t)left(int _{0}^{T}k(s)R_{N}(t,s),dsright)=int _{0}^{T}k(t)S(t),dt=rho \operatorname {var} [Gmid H]&=mathbf {E} [G^{2}mid H]-(mathbf {E} [Gmid H])^{2}=rho \mathbf {E} [G^{2}mid K]&=int _{0}^{T}int _{0}^{T}k(t)k(s)mathbf {E} [x(t)x(s)],dt,ds=int _{0}^{T}int _{0}^{T}k(t)k(s)(R_{N}(t,s)+S(t)S(s)),dt,ds=rho +rho ^{2}\operatorname {var} [Gmid K]&=mathbf {E} [G^{2}|K]-(mathbf {E} [G|K])^{2}=rho +rho ^{2}-rho ^{2}=rho end{aligned}}}
Hence, we obtain the distributions of H and K:
- H:G∼N(0,ρ){displaystyle H:Gsim N(0,rho )}
- K:G∼N(ρ,ρ){displaystyle K:Gsim N(rho ,rho )}
The false alarm error is
- α=∫G0∞N(0,ρ)dG=1−Φ(G0ρ).{displaystyle alpha =int _{G_{0}}^{infty }N(0,rho ),dG=1-Phi left({frac {G_{0}}{sqrt {rho }}}right).}
So the test threshold for the Neyman–Pearson optimum detector is
- G0=ρΦ−1(1−α).{displaystyle G_{0}={sqrt {rho }}Phi ^{-1}(1-alpha ).}
Its power of detection is
- β=∫G0∞N(ρ,ρ)dG=Φ(ρ−Φ−1(1−α)){displaystyle beta =int _{G_{0}}^{infty }N(rho ,rho ),dG=Phi left({sqrt {rho }}-Phi ^{-1}(1-alpha )right)}
When the noise is white Gaussian process, the signal power is
- ρ=∫0Tk(t)S(t)dt=∫0TS(t)2dt=E.{displaystyle rho =int _{0}^{T}k(t)S(t),dt=int _{0}^{T}S(t)^{2},dt=E.}
Prewhitening
For some type of colored noise, a typical practise is to add a prewhitening filter before the matched filter to transform the colored noise into white noise. For example, N(t) is a wide-sense stationary colored noise with correlation function
- RN(τ)=BN04e−B|τ|{displaystyle R_{N}(tau )={frac {BN_{0}}{4}}e^{-B|tau |}}
- SN(f)=N02(1+(wB)2){displaystyle S_{N}(f)={frac {N_{0}}{2(1+({frac {w}{B}})^{2})}}}
The transfer function of prewhitening filter is
- H(f)=1+jwB.{displaystyle H(f)=1+j{frac {w}{B}}.}
Detection of a Gaussian random signal in Additive white Gaussian noise (AWGN)
When the signal we want to detect from the noisy channel is also random, for example, a white Gaussian process X(t), we can still implement K–L expansion to get independent sequence of observation. In this case, the detection problem is described as follows:
- H0:Y(t)=N(t){displaystyle H_{0}:Y(t)=N(t)}
- H1:Y(t)=N(t)+X(t),0<t<T.{displaystyle H_{1}:Y(t)=N(t)+X(t),quad 0<t<T.}
X(t) is a random process with correlation function RX(t,s)=E{X(t)X(s)}{displaystyle R_{X}(t,s)=E{X(t)X(s)}}
The K–L expansion of X(t) is
- X(t)=∑i=1∞XiΦi(t),{displaystyle X(t)=sum _{i=1}^{infty }X_{i}Phi _{i}(t),}
where
- Xi=∫0TX(t)Φi(t)dt{displaystyle X_{i}=int _{0}^{T}X(t)Phi _{i}(t),dt}
and Φi(t){displaystyle Phi _{i}(t)} are solutions to
- ∫0TRX(t,s)Φi(s)ds=λiΦi(t).{displaystyle int _{0}^{T}R_{X}(t,s)Phi _{i}(s)ds=lambda _{i}Phi _{i}(t).}
So Xi{displaystyle X_{i}}'s are independent sequence of r.v's with zero mean and variance λi{displaystyle lambda _{i}}. Expanding Y(t) and N(t) by Φi(t){displaystyle Phi _{i}(t)}, we get
- Yi=∫0TY(t)Φi(t)dt=∫0T[N(t)+X(t)]Φi(t)=Ni+Xi,{displaystyle Y_{i}=int _{0}^{T}Y(t)Phi _{i}(t),dt=int _{0}^{T}[N(t)+X(t)]Phi _{i}(t)=N_{i}+X_{i},}
where
- Ni=∫0TN(t)Φi(t)dt.{displaystyle N_{i}=int _{0}^{T}N(t)Phi _{i}(t),dt.}
As N(t) is Gaussian white noise, Ni{displaystyle N_{i}}'s are i.i.d sequence of r.v with zero mean and variance 12N0{displaystyle {tfrac {1}{2}}N_{0}}, then the problem is simplified as follows,
- H0:Yi=Ni{displaystyle H_{0}:Y_{i}=N_{i}}
- H1:Yi=Ni+Xi{displaystyle H_{1}:Y_{i}=N_{i}+X_{i}}
The Neyman–Pearson optimal test:
- Λ=fY∣H1fY∣H0=Ce−∑i=1∞yi22λi12N0(12N0+λi),{displaystyle Lambda ={frac {f_{Y}mid H_{1}}{f_{Y}mid H_{0}}}=Ce^{-sum _{i=1}^{infty }{frac {y_{i}^{2}}{2}}{frac {lambda _{i}}{{tfrac {1}{2}}N_{0}({tfrac {1}{2}}N_{0}+lambda _{i})}}},}
so the log-likelihood ratio is
- L=ln(Λ)=K−∑i=1∞12yi2λiN02(N02+λi).{displaystyle {mathcal {L}}=ln(Lambda )=K-sum _{i=1}^{infty }{tfrac {1}{2}}y_{i}^{2}{frac {lambda _{i}}{{frac {N_{0}}{2}}left({frac {N_{0}}{2}}+lambda _{i}right)}}.}
Since
- X^i=λiN02(N02+λi){displaystyle {widehat {X}}_{i}={frac {lambda _{i}}{{frac {N_{0}}{2}}left({frac {N_{0}}{2}}+lambda _{i}right)}}}
is just the minimum-mean-square estimate of Xi{displaystyle X_{i}} given Yi{displaystyle Y_{i}}'s,
- L=K+1N0∑i=1∞YiX^i.{displaystyle {mathcal {L}}=K+{frac {1}{N_{0}}}sum _{i=1}^{infty }Y_{i}{widehat {X}}_{i}.}
K–L expansion has the following property: If
- f(t)=∑fiΦi(t),g(t)=∑giΦi(t),{displaystyle f(t)=sum f_{i}Phi _{i}(t),g(t)=sum g_{i}Phi _{i}(t),}
where
- fi=∫0Tf(t)Φi(t)dt,gi=∫0Tg(t)Φi(t)dt.{displaystyle f_{i}=int _{0}^{T}f(t)Phi _{i}(t),dt,quad g_{i}=int _{0}^{T}g(t)Phi _{i}(t),dt.}
then
- ∑i=1∞figi=∫0Tg(t)f(t)dt.{displaystyle sum _{i=1}^{infty }f_{i}g_{i}=int _{0}^{T}g(t)f(t),dt.}
So let
- X^(t∣T)=∑i=1∞X^iΦi(t),L=K+1N0∫0TY(t)X^(t∣T)dt.{displaystyle {widehat {X}}(tmid T)=sum _{i=1}^{infty }{widehat {X}}_{i}Phi _{i}(t),quad {mathcal {L}}=K+{frac {1}{N_{0}}}int _{0}^{T}Y(t){widehat {X}}(tmid T),dt.}
Noncausal filter Q(t,s) can be used to get the estimate through
- X^(t∣T)=∫0TQ(t,s)Y(s)ds.{displaystyle {widehat {X}}(tmid T)=int _{0}^{T}Q(t,s)Y(s),ds.}
By orthogonality principle, Q(t,s) satisfies
- ∫0TQ(t,s)RX(s,t)ds+N02Q(t,λ)=RX(t,λ),0<λ<T,0<t<T.{displaystyle int _{0}^{T}Q(t,s)R_{X}(s,t),ds+{tfrac {N_{0}}{2}}Q(t,lambda )=R_{X}(t,lambda ),0<lambda <T,0<t<T.}
However, for practical reasons, it's necessary to further derive the causal filter h(t,s), where h(t,s) = 0 for s > t, to get estimate X^(t∣t){displaystyle {widehat {X}}(tmid t)}. Specifically,
- Q(t,s)=h(t,s)+h(s,t)−∫0Th(λ,t)h(s,λ)dλ{displaystyle Q(t,s)=h(t,s)+h(s,t)-int _{0}^{T}h(lambda ,t)h(s,lambda ),dlambda }
See also
- Principal component analysis
- Proper orthogonal decomposition
- Polynomial chaos
Notes
^ Sapatnekar, Sachin (2011), "Overcoming variations in nanometer-scale technologies", IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 1 (1): 5–18, Bibcode:2011IJEST...1....5S, CiteSeerX 10.1.1.300.5659, doi:10.1109/jetcas.2011.2138250.mw-parser-output cite.citation{font-style:inherit}.mw-parser-output .citation q{quotes:"""""""'""'"}.mw-parser-output .citation .cs1-lock-free a{background:url("//upload.wikimedia.org/wikipedia/commons/thumb/6/65/Lock-green.svg/9px-Lock-green.svg.png")no-repeat;background-position:right .1em center}.mw-parser-output .citation .cs1-lock-limited a,.mw-parser-output .citation .cs1-lock-registration a{background:url("//upload.wikimedia.org/wikipedia/commons/thumb/d/d6/Lock-gray-alt-2.svg/9px-Lock-gray-alt-2.svg.png")no-repeat;background-position:right .1em center}.mw-parser-output .citation .cs1-lock-subscription a{background:url("//upload.wikimedia.org/wikipedia/commons/thumb/a/aa/Lock-red-alt-2.svg/9px-Lock-red-alt-2.svg.png")no-repeat;background-position:right .1em center}.mw-parser-output .cs1-subscription,.mw-parser-output .cs1-registration{color:#555}.mw-parser-output .cs1-subscription span,.mw-parser-output .cs1-registration span{border-bottom:1px dotted;cursor:help}.mw-parser-output .cs1-ws-icon a{background:url("//upload.wikimedia.org/wikipedia/commons/thumb/4/4c/Wikisource-logo.svg/12px-Wikisource-logo.svg.png")no-repeat;background-position:right .1em center}.mw-parser-output code.cs1-code{color:inherit;background:inherit;border:inherit;padding:inherit}.mw-parser-output .cs1-hidden-error{display:none;font-size:100%}.mw-parser-output .cs1-visible-error{font-size:100%}.mw-parser-output .cs1-maint{display:none;color:#33aa33;margin-left:0.3em}.mw-parser-output .cs1-subscription,.mw-parser-output .cs1-registration,.mw-parser-output .cs1-format{font-size:95%}.mw-parser-output .cs1-kern-left,.mw-parser-output .cs1-kern-wl-left{padding-left:0.2em}.mw-parser-output .cs1-kern-right,.mw-parser-output .cs1-kern-wl-right{padding-right:0.2em}
^ Ghoman, Satyajit; Wang, Zhicun; Chen, PC; Kapania, Rakesh (2012), A POD-based Reduced Order Design Scheme for Shape Optimization of Air Vehicles
^ Karhunen–Loeve transform (KLT), Computer Image Processing and Analysis (E161) lectures, Harvey Mudd College
^ Raju, C.K. (2009), "Kosambi the Mathematician", Economic and Political Weekly, 44 (20): 33–45
^ Kosambi, D. D. (1943), "Statistics in Function Space", Journal of the Indian Mathematical Society, 7: 76–88, MR 0009816.
^ A wavelet tour of signal processing-Stéphane Mallat
^ X. Tang, “Texture information in run-length matrices,” IEEE Transactions on Image Processing, vol. 7, No. 11, pp. 1602–1609, Nov. 1998
References
Stark, Henry; Woods, John W. (1986). Probability, Random Processes, and Estimation Theory for Engineers. Prentice-Hall, Inc. ISBN 978-0-13-711706-2.
Ghanem, Roger; Spanos, Pol (1991). Stochastic finite elements: a spectral approach. Springer-Verlag. ISBN 978-0-387-97456-9.
Guikhman, I.; Skorokhod, A. (1977). Introduction a la Théorie des Processus Aléatoires. Éditions MIR.
Simon, B. (1979). Functional Integration and Quantum Physics. Academic Press.
Karhunen, Kari (1947). "Über lineare Methoden in der Wahrscheinlichkeitsrechnung". Ann. Acad. Sci. Fennicae. Ser. A. I. Math.-Phys. 37: 1–79.
Loève, M. (1978). Probability theory. Vol. II, 4th ed. Graduate Texts in Mathematics. 46. Springer-Verlag. ISBN 978-0-387-90262-3.
Dai, G. (1996). "Modal wave-front reconstruction with Zernike polynomials and Karhunen–Loeve functions". JOSA A. 13 (6): 1218. Bibcode:1996JOSAA..13.1218D. doi:10.1364/JOSAA.13.001218.
- Wu B., Zhu J., Najm F.(2005) "A Non-parametric Approach for Dynamic Range Estimation of Nonlinear Systems". In Proceedings of Design Automation Conference(841-844) 2005
- Wu B., Zhu J., Najm F.(2006) "Dynamic Range Estimation". IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25 Issue:9 (1618–1636) 2006
Jorgensen, Palle E. T.; Song, Myung-Sin (2007). "Entropy Encoding, Hilbert Space and Karhunen–Loeve Transforms". Journal of Mathematical Physics. 48 (10): 103503. arXiv:math-ph/0701056. Bibcode:2007JMP....48j3503J. doi:10.1063/1.2793569.
External links
Mathematica KarhunenLoeveDecomposition function.
E161: Computer Image Processing and Analysis notes by Pr. Ruye Wang at Harvey Mudd College [1]