(For new readers, and those who send friend requests [好友请求], please read my announcement board [公告栏] first.)
Because of software problems, there are difficulties displaying equations and figures in this blog article. Until they are fixed, you can always open the original WORD document attached here.
Filtering Tutorial (Final).docx
Explaining Filtering (Estimation) In One Hour, Ten Minutes, One Minute, And One Sentence.
By
Yu-Chi Ho
Harvard and Tsinghua University
Kalman filtering is arguably the crowning achievement of modern system science and control (for his contribution, Kalman received what might be considered the Nobel Prize of engineering, the Draper Prize, in 2008). More generally, estimation in the presence of noise and disturbance is a key ingredient in any scheme for the control and optimization of complex systems. In the past, I have touched on and explained various aspects of this general topic in my blog articles (http://blog.sciencenet.cn/blog-1565-14253.html , http://blog.sciencenet.cn/blog-1565-426323.html ) as well as in references published years ago. However, these were done scatter-shot and some time ago. It seems worthwhile to collect all this material in one place and present it in a holistic manner with the latest references. At the same time, this is an attempt to explain the material to the average scientific reader without specialization.
For most of this article, only a high school algebra background and no probability knowledge are needed for understanding. This includes nonlinear filtering and estimation. As for probabilistic concepts and interpretations, the material contained in my five-part tutorial on probability and stochastic processes (http://blog.sciencenet.cn/blog-1565-664051.html , http://blog.sciencenet.cn/blog-1565-665359.html , http://blog.sciencenet.cn/blog-1565-666599.html , http://blog.sciencenet.cn/blog-1565-669532.html , http://blog.sciencenet.cn/blog-1565-671318.html ) in earlier blog articles is MORE THAN enough. While an appreciation of the elementary Bayes rule in probability theory will be useful, it will also be explained as part of this article.
We start with a non-probabilistic and intuitive explanation of the Kalman filter that requires no previous knowledge or background. Consider the following simplest problem of estimation. We are given a series of measurements $z_1, z_2, \ldots, z_k$ (often denoted collectively by a shorthand symbol) of an unknown constant $x$. We assume the additive model
$$z_i = x + v_i, \qquad i = 1, 2, \ldots, k \qquad\qquad (1)$$
where the $v_i$ are measurement noises. If nothing else is known, then everyone will agree that a reasonable estimate of $x$ given the $k$ measurements is
$$\hat{x}_k = \frac{1}{k}\sum_{i=1}^{k} z_i \qquad\qquad (2)$$
in the sense that we try to average out the unknown measurement noises, which can be either positive or negative in value. (Note that if the noises have an unknown mean or average value, then this mean is no different from the unknown $x$ and cannot be distinguished from $x$. We have what is technically called an unobservable situation. Exercise 1: can you explain the above remark to another reader? What will Eq.(2) determine if $k \to \infty$?)
Now we can re-write Eq.(2) by simple algebraic manipulation to get
$$\hat{x}_k = \hat{x}_{k-1} + \frac{1}{k}\left(z_k - \hat{x}_{k-1}\right) \qquad\qquad (3)$$
Eq.(3), which is simply Eq.(2) expressed in recursive form, has an interesting interpretation. It says that the best estimate of $x$ after $k$ measurements is the best estimate of $x$ after $k-1$ measurements plus a correction term. The correction term is the difference between what you expect to measure based on $k-1$ measurements, i.e., $\hat{x}_{k-1}$, and what you actually measure, $z_k$. The difference can be due to two sources:
(i) $\hat{x}_{k-1}$ is different from $x$, in which case you should use the difference to modify your new estimate;
(ii) $\hat{x}_{k-1} = x$ actually, in which case you should ignore the difference, which is due to the noise $v_k$ only.
The weighting factor, $1/k$, for the correction term expresses the proper tradeoff between these two considerations. At the beginning, when $k$ is small, one does not have a good estimate of $x$ and thus should pay attention to the correction term. As $k$ becomes large, one has more and more confidence in the estimate and less attention should be paid to the correction (remember the Law of Large Numbers if you are already familiar with probability). In the limit when $k \to \infty$ and case (ii) is operative, we completely ignore the correction term.
If we label the correction gain $1/k$ as $P_k$, then again simple algebraic manipulation lets us write the recursive form of $P_k$ as
$$P_k = \frac{P_{k-1}}{1 + P_{k-1}}, \qquad P_1 = 1 \qquad\qquad (4)$$
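(For readers who like to check things numerically, here is a minimal Python sketch, with a made-up true value and made-up noise values, showing that the recursion of Eqs.(3)-(4) reproduces the batch average of Eq.(2) exactly.)

```python
# Minimal sketch: the recursion of Eqs.(3)-(4) reproduces the batch average of Eq.(2).
# The true constant x and the noise values are made up purely for illustration.
x_true = 5.0
noise = [0.8, -1.1, 0.3, -0.4, 0.9, -0.6]        # measurement noises v_i
z = [x_true + v for v in noise]                  # measurements z_i = x + v_i, Eq.(1)

x_hat, P = z[0], 1.0                             # after one measurement: estimate z_1, P_1 = 1
for zk in z[1:]:
    P = P / (1.0 + P)                            # Eq.(4): P_k = P_{k-1}/(1 + P_{k-1}) = 1/k
    x_hat = x_hat + P * (zk - x_hat)             # Eq.(3): old estimate + gain * correction

print(x_hat, sum(z) / len(z))                    # recursive estimate equals the batch average, Eq.(2)
```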
Believe it or not, Eqs.(3-4) can be recognized as the "Kalman filtering" equations for this simple case. For those familiar with the literature on "stochastic approximation", Eq.(3) is simply the formula for successive approximation of the root of the function f(x), which in this case is f(x) = x, using the idea of gradient descent. Eq.(2) can also be recognized as the solution of the simple problem of least square fit
$$\min_{x} \sum_{i=1}^{k}\left(z_i - x\right)^2 \qquad\qquad (5)$$
We can generalize this idea of least square (or stochastic approximation, or law of large numbers, or Kalman filter) in many sophisticated and useful ways. Let us see how this can be done.
First of all, suppose $x$ is a vector; then Eqs.(3-4) need not be changed at all. In fact, we can generalize Eq.(1) to
$$z_i = Hx + v_i, \qquad i = 1, \ldots, k \qquad\qquad (1')$$
where $H$ is a matrix of scale factors. Note here $x$ and $z$ need not be of the same dimension. For example, $x$ can be a six-vector of position and velocity of a vehicle in space, and $z$ a one-dimensional angular measurement of the vehicle with respect to a point on the surface of the earth (in this case $H$ is a 1x6 row vector). Eqs.(3-4) now become
$$\hat{x}_k = \hat{x}_{k-1} + P_{k-1}H^T\left(HP_{k-1}H^T + 1\right)^{-1}\left(z_k - H\hat{x}_{k-1}\right) \qquad\qquad (3')$$
$$P_k = P_{k-1} - P_{k-1}H^T\left(HP_{k-1}H^T + 1\right)^{-1}HP_{k-1} \qquad\qquad (4')$$
where $P_k$ is now an $n \times n$ matrix and the superscript "T" means "transpose" (note the validity of (3'-4') can be easily seen if we treat everything as scalars).
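(To make Eqs.(3')-(4') concrete, here is a small Python/numpy sketch for a constant two-dimensional $x$ observed through a matrix $H$. The dimensions, the particular $H$, the noise level, and the initial $P$ are all illustrative assumptions of mine, not values from the text; the gain is written in one standard arrangement.)

```python
import numpy as np

# Sketch of Eqs.(3')-(4') for a constant vector x observed through a matrix H.
# The dimensions, H, and noise level are illustrative assumptions, not taken from the article.
rng = np.random.default_rng(0)
x_true = np.array([2.0, -1.0])                 # unknown constant 2-vector
H = np.array([[1.0, 0.5],
              [0.0, 1.0]])                     # matrix of "scale factors"

x_hat = np.zeros(2)                            # initial estimate of x
P = 100.0 * np.eye(2)                          # large initial P: little confidence in x_hat

for _ in range(200):
    z = H @ x_true + rng.normal(scale=0.3, size=2)   # Eq.(1'): z_i = H x + v_i
    S = H @ P @ H.T + np.eye(2)                      # innovation scale (the "1" of the scalar case)
    K = P @ H.T @ np.linalg.inv(S)                   # gain, the matrix analog of 1/k
    x_hat = x_hat + K @ (z - H @ x_hat)              # Eq.(3'): old estimate + gain * correction
    P = P - K @ H @ P                                # Eq.(4'): confidence grows, gain shrinks

print(x_hat)                                         # settles near x_true as data accumulate
```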
Eqs.(3'-4') can be further generalized by introducing dynamics. Suppose the unknown $x$ evolves according to
$$x_i = \Phi x_{i-1}, \qquad i = 1, 2, \ldots, k, \quad \text{with } x_1 = x \qquad\qquad (5)$$
We want to know how Eqs.(3'-4') should be modified. Here the algebraic manipulation becomes a little messy, but the principle (i.e., to do a least square fit) is still the same. We wish to find a $\hat{x}$ that is the least square fit to the single unknown $x$ based on all the measurements. To simplify the manipulation we shall just solve the case when $k=2$, i.e., we wish to solve
$$\min_{x}\left[\left\|z_1 - Hx_1\right\|^2 + \left\|z_2 - Hx_2\right\|^2\right] \quad \text{subject to } x_2 = \Phi x_1,\; x_1 = x \qquad\qquad (6)$$
Directly minimizing (6) for the optimal $x$, we get
$$\bar{x}_2 = \Phi\hat{x}_1; \qquad \hat{x}_2 = \bar{x}_2 + M_2H^T\left(HM_2H^T + 1\right)^{-1}\left(z_2 - H\bar{x}_2\right) \qquad\qquad (3'')$$
and
$$M_2 = \Phi P_1\Phi^T; \qquad P_2 = M_2 - M_2H^T\left(HM_2H^T + 1\right)^{-1}HM_2 \qquad\qquad (4'')$$
Thus, with dynamics present, Eqs.(3'-4') become Eqs.(3''-4'').
Eq.(5) can also be further generalized to
$$x_i = \Phi x_{i-1} + w_{i-1} \qquad\qquad (5')$$
where the $w_i$ are disturbance noises in the dynamics equation. Finally, let $x$ be an $n$-vector, and let the noises and disturbances carry different weights $R$ and $Q$; then we merely need to add transposes to some of the matrices, change the number 1 to the identity matrix, and insert the $Q$ and $R$ weight matrices to get
$$\bar{x}_k = \Phi\hat{x}_{k-1}; \qquad \hat{x}_k = \bar{x}_k + M_kH^T\left(HM_kH^T + R\right)^{-1}\left(z_k - H\bar{x}_k\right) \qquad\qquad (3^*)$$
$$M_k = \Phi P_{k-1}\Phi^T + Q; \qquad P_k = M_k - M_kH^T\left(HM_kH^T + R\right)^{-1}HM_k \qquad\qquad (4^*)$$
Eqs.(3*-4*) are of course the full-blown Kalman filter equations. I cannot emphasize too strongly that the truth of the above statement requires no matrix theory but only messy matrix algebraic manipulations. Eqs.(3*-4*) can also be derived directly, again with messy matrix manipulation, by minimizing the general least square criterion
$$\min \sum_{i=1}^{k}\left[\left(z_i - Hx_i\right)^T R^{-1}\left(z_i - Hx_i\right) + w_{i-1}^T Q^{-1} w_{i-1}\right] \quad \text{subject to } x_i = \Phi x_{i-1} + w_{i-1} \qquad\qquad (6^*)$$
where the matrices (or scale factors) $Q$ and $R$ represent the relative weighting of the noises $w$ and $v$.
Solving this minimization problem (which involves matrix algebraic manipulation but no new concepts) again directly yields the well-known general Kalman filtering formulas of Eqs.(3*-4*).
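(Here is a compact Python/numpy sketch of the full recursion of Eqs.(3*)-(4*), written in the predict-then-update arrangement used above. The tracking problem itself, a constant-velocity target with position-only measurements, and all the numbers are my own illustrative assumptions.)

```python
import numpy as np

# Sketch of the full Kalman recursion of Eqs.(3*)-(4*), in one standard arrangement.
# The model (constant-velocity target, position-only measurements) and all numbers
# are illustrative assumptions, not taken from the article.
rng = np.random.default_rng(1)
dt = 1.0
Phi = np.array([[1.0, dt],
                [0.0, 1.0]])        # dynamics matrix of Eq.(5')
H = np.array([[1.0, 0.0]])          # we only measure position, Eq.(1')
Q = 0.01 * np.eye(2)                # disturbance weight
R = np.array([[0.25]])              # measurement-noise weight

x_true = np.array([0.0, 1.0])
x_hat = np.array([0.0, 0.0])
P = 10.0 * np.eye(2)

for _ in range(50):
    # simulate the real system
    x_true = Phi @ x_true + rng.multivariate_normal(np.zeros(2), Q)
    z = H @ x_true + rng.multivariate_normal(np.zeros(1), R)

    # predict ("propagate"): subtract out the known effect of the dynamics
    x_bar = Phi @ x_hat
    M = Phi @ P @ Phi.T + Q
    # update: correct the prediction by a weighted version of what was not expected
    K = M @ H.T @ np.linalg.inv(H @ M @ H.T + R)
    x_hat = x_bar + K @ (z - H @ x_bar)
    P = M - K @ H @ M

print(x_hat, x_true)                # the estimate tracks the true state
```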
A block diagram of the Kalman filtering implementation is shown below:
BLOCK DIAGRAM OF KALMAN FILTER
In this sense the essence of the Kalman theory can be captured by the sentence: "subtract out every known effect and average the leftover." This is the only thing and the best thing that can be done. The Kalman filter is simply a mathematical implementation of the above sentence, a glorified least square fit, and an ultra-sophisticated application of the law of large numbers.
On a probabilistic basis, a very satisfying Bayesian decision-theoretic interpretation [Ho & Lee 1964, Bryson and Ho 1969] is also possible. We shall do this below after extending the concept of least square fit to nonlinear cases.
(see Deterministic Nonlinear Filtering, http://blog.sciencenet.cn/blog-1565-426323.html)
Now, the Kalman filter strictly speaking is applicable only to linear systems with Gaussian disturbances and measurement noises. Since many real-world problems are nonlinear and non-Gaussian, the question of nonlinear filtering and estimation naturally arises, along with the related question "what shall we do in practice?"
The most popular approach to nonlinear estimation is what is known as
EXTENDED KALMAN FILTERING - Here you simply start with a guessed estimate/trajectory of the nonlinear system, linearize around this trajectory, and apply the usual Kalman filtering formula to the linearized system. You can also recursively update the estimated trajectory with new data and re-linearize. All very efficient. The only problem is that Kalman filtering is based on a unimodal (Gaussian) distribution. If your problem actually involves or results in a multimodal distribution, and/or you have very bad nonlinearities that cannot be easily approximated by successive linearization, then this approach will often break down or lead to very bad estimates.
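(A minimal Python/numpy sketch of one extended Kalman filter update step: linearize the nonlinear measurement function around the current estimate, then apply the usual Kalman formula. The particular measurement function, a range measurement to a 2-D position, and all the numbers are illustrative assumptions of mine.)

```python
import numpy as np

# Minimal sketch of one extended-Kalman-filter update: linearize the nonlinear
# measurement h around the current estimate, then apply the usual Kalman formula.
# The choice of h (range from a sensor at the origin) and all numbers are illustrative.
def h(x):                      # nonlinear measurement: range to a 2-D position x
    return np.sqrt(x[0]**2 + x[1]**2)

def H_jacobian(x):             # 1x2 Jacobian of h, evaluated at the current estimate
    r = h(x)
    return np.array([[x[0] / r, x[1] / r]])

x_hat = np.array([3.0, 4.0])   # current estimate (the linearization point)
P = np.eye(2)
R = np.array([[0.1]])
z = 5.2                        # a new range measurement

Hk = H_jacobian(x_hat)                                   # re-linearize at the estimate
K = P @ Hk.T @ np.linalg.inv(Hk @ P @ Hk.T + R)          # Kalman gain on the linearized model
x_hat = x_hat + K @ np.atleast_1d(z - h(x_hat))          # correct by the unexpected part of z
P = P - K @ Hk @ P
print(x_hat)
```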
The second approach is
NONLINEAR LEAST SQUARE FIT OF THE DATA - This approach goes back to our original thesis that optimal filtering is basically least square fit. Thus this approach is quite stable and works. The only problem is that it is not real time. For each new measurement you need to solve a new two-point boundary value problem. For problems where you only have a few measurements and long intervals between observations, this approach is quite workable (e.g., on long space voyages). Otherwise, unless you update very infrequently, the computational load is quite high and not practical in real time.
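(A sketch of this batch approach in Python, using scipy's general-purpose least-squares solver: re-fit the initial state of a known nonlinear system to the whole batch of measurements each time new data arrive. The toy dynamics, measurement model, and numbers are illustrative assumptions of mine.)

```python
import numpy as np
from scipy.optimize import least_squares

# Sketch of the "fit all the data by nonlinear least squares" approach: estimate the
# initial state of a known nonlinear system from a batch of noisy position measurements.
# The toy dynamics and all numbers are illustrative assumptions, not from the article.
def propagate(x0, steps):
    x, traj = np.array(x0, dtype=float), []
    for _ in range(steps):
        x = np.array([x[0] + 0.1 * x[1], x[1] - 0.1 * np.sin(x[0])])   # nonlinear dynamics
        traj.append(x)
    return np.array(traj)

def residuals(x0, z):
    return propagate(x0, len(z))[:, 0] - z        # predicted minus measured positions

x0_true = np.array([1.0, 0.5])
rng = np.random.default_rng(2)
z = propagate(x0_true, 30)[:, 0] + rng.normal(scale=0.05, size=30)   # one batch of measurements

fit = least_squares(residuals, x0=np.zeros(2), args=(z,))    # re-solved whenever new data arrive
print(fit.x)          # should land near x0_true for this mildly nonlinear toy problem
```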
The third approach is PARTICLE FILTERING, whose discussion we shall postpone to the probabilistic part of this tutorial.
OK. Now, in my opinion, you know what is worth knowing about practical filtering using no probabilistic knowledge.
Now we shall re-derive the formulas above using probabilistic considerations:
When we say a quantity $x$ is a random variable, we mean that the value it actually takes on is unpredictable before the quantity is sampled (or looked at). We characterize this variability using the probability density function, or pdf, $p(x)$: $p(x)dx$ is the probability that the sample value lies between $x$ and $x+dx$. They say probability is a precise way to deal with uncertainty. The pdf $p(x)$ is this precise way, since it is a deterministic function. Now, a multivariable function is computationally cumbersome, if not impossible, to work with
http://blog.sciencenet.cn/blog-1565-26889.html (this is a dirty secret of applied mathematics that is not often mentioned in papers and textbooks). To simplify the description of a random variable, we often use the rough characterization of a pdf by its mean and variance (the average value and the spread of values around the mean). These basic concepts have all been previously explained in my probability tutorial (see the five tutorial articles mentioned earlier in part one).
In system studies, we are interested in what might be called the input-output relationship. If we have a random input to a system, then the output will be random also. Given the input pdf we want to know the output pdf. In particular, we are interested in $p(y)$ given $p(x)$ when $y=f(x)$. There are two probabilistic analogs of Eqs.(3'-4') and Eqs.(5 or 5').
(i) The propagation equation. Suppose $x_k = f(x_{k-1}, w_{k-1})$, which is a nonlinear input/output generalization of (5). Then given $p(x_{k-1})$ and $p(w_{k-1})$, we get
$$p(x_k) = \int p(x_k \mid x_{k-1})\, p(x_{k-1})\, dx_{k-1} \qquad\qquad (5'')$$
where $p(x_k \mid x_{k-1})$ is in principle known given $p(w_{k-1})$ and the relationship $x_k = f(x_{k-1}, w_{k-1})$.
(ii) The updating equation. Suppose $z_k = h(x_k, v_k)$, which is a nonlinear generalization of Eq.(1'); then the Bayes rule says that the probabilistic analog of Eq.(4, or 4', or 4*) is
$$p(x \mid z) = \frac{p(z \mid x)}{p(z)}\, p(x) \qquad\qquad (4'')$$
where $p(x)$ is known as the prior pdf of $x$ and $p(x \mid z)$ is the posterior pdf of $x$, i.e., $x$ after, or conditioned on, the measurement $z$. Once again, knowing $p(v_k)$ and the relationship $z_k = h(x_k, v_k)$, we can in principle determine $p(z \mid x)$ and $p(z) = \int p(z \mid x)\, p(x)\, dx$.
Note we say "in principle" in connection with Eqs.(4'' and 5''). It is worthwhile to emphasize once more the dirty secret of applied mathematics http://blog.sciencenet.cn/blog-1565-26889.html , namely that it is basically computationally impossible to deal with multi-variable functions in general. Eqs.(4''-5'') are deceptively simple looking, but infeasible computationally if $x$, $z$, $w$ and $v$ are n-vectors. Thus, we look for a parametric description of a pdf to enable the computation of Eqs.(4''-5''). We introduce the concept of conjugate or reproducing pdfs. For example, if $p(x_{k-1})$ and $p(x_k \mid x_{k-1})$ in Eq.(5'') are Gaussian distributed, then $p(x_k)$ in (5'') will also be Gaussian. We can then express Eq.(5'') in terms of a finite-dimensional equation relating the posterior pdf parameters to those of the prior pdf. In fact, if $p(x_{k-1})$ is Gaussian with mean $\hat{x}_{k-1}$ and covariance $P_{k-1}$, and $x_k$ and $x_{k-1}$ are related linearly via Eq.(5'), then it can be verified via messy algebraic manipulation that $p(x_k)$ is again Gaussian with mean and covariance given by Eqs.(3*-4*). Similarly, we can show that the $p(x)$ and $p(x \mid z)$ of Eq.(4'') are also a reproducing or conjugate pair. If $p(x)$ is Gaussian, then $p(x \mid z)$ will also be Gaussian, with the prior and posterior mean and covariance related by Eqs.(3*-4*). In other words, the Kalman filter is simply the propagation and updating equations (4''-5'') specialized to the Gaussian reproducing case. In our mind's eye we can visualize the propagation and updating of the pdf of the state of a dynamic system as illustrated in the figure below:
A more pictorial diagram of the above figure can be found on pages 322-328 of my 45-year-old textbook, and in general in Chapter 12 of Applied Optimal Control.
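(For the skeptical reader, here is a scalar Python sketch of the reproducing-Gaussian claim: carry out the Bayes update of Eq.(4'') by brute force on a grid and check that the posterior is again Gaussian, with mean and variance given by the Kalman formulas. All the numbers are illustrative assumptions of mine.)

```python
import numpy as np

# Sketch (scalar case) of the "reproducing Gaussian" claim: do the Bayes update of
# Eq.(4") numerically on a grid and check that the posterior has the mean and variance
# given by the Kalman formulas. All numbers are illustrative.
m, P = 1.0, 4.0            # prior p(x): Gaussian with mean m, variance P
R = 1.0                    # measurement z = x + v, with v ~ N(0, R)
z = 3.0                    # an observed value

x = np.linspace(-10, 10, 20001)
prior = np.exp(-0.5 * (x - m)**2 / P)
likelihood = np.exp(-0.5 * (z - x)**2 / R)       # p(z|x)
posterior = prior * likelihood
posterior /= np.trapz(posterior, x)              # Bayes rule, Eq.(4"), normalized numerically

grid_mean = np.trapz(x * posterior, x)
grid_var = np.trapz((x - grid_mean)**2 * posterior, x)

kalman_mean = m + P / (P + R) * (z - m)          # Eq.(3*) specialized to this scalar case
kalman_var = P - P**2 / (P + R)                  # Eq.(4*) specialized to this scalar case
print(grid_mean, kalman_mean)                    # agree
print(grid_var, kalman_var)                      # agree
```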
In general, our dynamic system and observation equation will not be linear as in the strict Kalman filtering case. Then how do we deal with Eqs.(4''-5'')? One obvious approach is to approximate the pdfs by discrete histograms (i.e., a discrete number of samples) and then try to implement (4''-5'') by brute force. This is known as
PARTICLE FILTERING - One of the leading practitioners of this approach is Dr. Fred Daum of the Raytheon company, with over two decades of real experience and success stories to his credit. So what is Particle Filtering (PF)? In one sentence - you try to propagate and update the multidimensional density function of the system state directly via samples (i.e., particles). Of course, you approximate the multidimensional density function by histograms made up of actual samples (i.e., particles) of the state vector. Propagating a histogram (i.e., the particles making up the histogram) is not that hard computationally (think about it: it is a simple Monte Carlo run involving the particles). But the problem of updating a histogram using a new measurement is not simple. First of all, to adequately represent the histogram you may need an exponentially large number of particles if the state dimension is high. This is a universal dirty secret applied mathematicians do not like to talk about (http://bbs.sciencenet.cn/home.php?mod=space&uid=1565&do=blog&id=26889 ). To make matters worse, the Bayes rule used in the updating formula requires you to multiply the prior density by the likelihood function, which falls off rapidly around the actual measurement value. Thus, only a small percentage of the particles will get significant updating. Computationally this is rather inefficient. Daum's expertise and contribution are the tricks he has developed to ameliorate this problem (in some recent work with Huang published in SPIE proceedings they have detailed their most successful techniques; see reference below). These still have their limits, namely, the curse of dimensionality. Fortunately, many real problems have only a six-dimensional state vector (3 position and 3 velocity components in aerospace guidance and control applications), which helps with the computational load. In fact, by hindsight, the nonlinear filtering example on page 381 of my half-century-old textbook http://www.amazon.com/Applied-Optimal-Control-Optimization-Estimation/dp/0891162283 is an extremely simple problem of particle filtering (it has only two possible states), except the problem/solution was not labeled as such since the name PF had not been invented yet some 50 years ago.
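(Here is a minimal Python sketch of the simplest, "bootstrap", variety of particle filtering for a scalar toy problem: propagate the particles through the dynamics, then weight them by the likelihood of the new measurement and resample. The toy model and all numbers are my own illustrative assumptions; Daum's methods are considerably more sophisticated than this.)

```python
import numpy as np

# Minimal sketch of a bootstrap particle filter for a scalar nonlinear system:
# propagate the particles through the dynamics (the easy Monte Carlo step), then
# reweight them by the likelihood of the new measurement and resample (the Bayes
# update). The toy model and all numbers are illustrative assumptions.
rng = np.random.default_rng(3)
N = 2000                                          # number of particles
particles = rng.normal(0.0, 2.0, size=N)          # samples from the initial pdf
x_true = 0.5

for _ in range(30):
    # true system and a nonlinear measurement of it (toy model)
    x_true = 0.9 * x_true + 0.5 * np.sin(x_true) + rng.normal(scale=0.1)
    z = x_true**3 + rng.normal(scale=0.2)

    # propagation step, Eq.(5"): push every particle through the dynamics
    particles = 0.9 * particles + 0.5 * np.sin(particles) + rng.normal(scale=0.1, size=N)

    # updating step, Eq.(4"): weight by the likelihood p(z|x), then resample;
    # note how few particles get significant weight when the likelihood is narrow
    weights = np.exp(-0.5 * (z - particles**3) ** 2 / 0.2**2)
    weights /= weights.sum()
    particles = rng.choice(particles, size=N, p=weights, replace=True)

print(particles.mean(), x_true)      # the particle cloud concentrates near the true state
```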
Thus, here is the complete conceptual story of filtering or estimation as I know it:
1. It is nothing more than a simple least square fit carried out in a sophisticated manner.
2. It is based on two simple probabilistic formulas for the propagation and updating of pdfs.
3. Kalman filtering is but cases 1 or 2 specialized to linear systems and/or reproducing Gaussian pdfs.
Depending on the audience and the time available, you can use selected parts of this article to explain the crowning achievement and Draper-prize-winning results of system control for the past 50+ years.
Theoretically, to derive rigorously the nonlinear filtering equations in continuous time requires carefully applying measure-theoretic tools not easily accessible to the engineering public. But for practical use, discrete-time computation makes such knowledge unnecessary, since we do digital computation anyway. The fact that we do not reference these works here does not diminish their theoretical contribution.
References
Y.C. Ho, "On Stochastic Approximation Methods and Optimal Filtering," Journal of Mathematical Analysis and Applications, Vol. 6, No. 1, Feb. 1963, pp. 152-154.

R.E. Kalman, Y.C. Ho, and K.S. Narendra, "Controllability of Linear Dynamical Systems," Contributions to Differential Equations, Vol. I, Interscience, 1963, pp. 189-213.

Y.C. Ho and R.C.K. Lee, "A Bayesian Approach to Problems in Stochastic Estimation and Control," IEEE Transactions on Automatic Control, Vol. 9, Oct. 1964, pp. 333-339.

R.E. Kalman, "A New Approach to Linear Filtering and Prediction Problems," Transactions of the ASME - Journal of Basic Engineering, 82 (Series D), 1960, pp. 35-45.

R.E. Kalman, "Discovery and Invention: The Newtonian Revolution in System Technology," Journal of Guidance, Control, and Dynamics, Vol. 26, No. 6, Nov.-Dec. 2003, pp. 1-7.

Fred Daum and Jim Huang, "How to Avoid Normalization of Particle Flow for Nonlinear Filters, Bayesian Decisions and Transport," Proceedings of SPIE, 2014.