Non-parametric filters - particle filter and histogram filter summary

2021-10-04

Particle filter intro

State estimation filtering can be performed with the KF, the PF, and membership filters.


The PF can be applied to both linear and non-linear systems. For a linear system, the usual requirement is a linear state-transition and observation model with (typically Gaussian) noise, as sketched below.

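As a hedged sketch, the linear-Gaussian model usually assumed in this case can be written as follows (the symbols $A$, $B$, $H$, $Q$, $R$ are my own notation, not from the original post):

$$
\begin{aligned}
x_k &= A x_{k-1} + B u_k + w_k, \qquad w_k \sim \mathcal{N}(0, Q) \\
z_k &= H x_k + v_k, \qquad v_k \sim \mathcal{N}(0, R)
\end{aligned}
$$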

Why PF? Most systems are non-linear, and the Gaussian-noise assumption is sometimes violated. The PF uses Monte Carlo methods to simulate samples; by the law of large numbers, as the number of particles grows, the empirical distribution gets close to the true distribution.

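As a rough illustration of this law-of-large-numbers argument (a sketch of mine, not from the original post), the empirical mean of samples from a non-Gaussian mixture converges to the true mean as the sample count grows:

```python
import numpy as np

# True (non-Gaussian) distribution: a two-component Gaussian mixture.
def sample_true(n, rng):
    comp = rng.random(n) < 0.3
    return np.where(comp, rng.normal(-2.0, 0.5, n), rng.normal(1.0, 1.0, n))

rng = np.random.default_rng(0)
true_mean = 0.3 * (-2.0) + 0.7 * 1.0  # analytic mean of the mixture

for n in [10, 100, 1000, 100000]:
    particles = sample_true(n, rng)
    # The error of the empirical mean shrinks as the number of particles grows.
    print(n, abs(particles.mean() - true_mean))
```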

Steps

x: state variable; u: inputs; z: observations; d: data.

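In this notation, the quantity the PF approximates is the belief over the state given all inputs and observations; a sketch using the common weighted-particle notation (my assumption, not the original formula):

$$
\mathrm{bel}(x_t) = p(x_t \mid z_{1:t}, u_{1:t}) \approx \left\{ \left\langle x_t^{[i]}, w_t^{[i]} \right\rangle \right\}_{i=1}^{N}
$$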

  • Prediction: sample each particle from the motion model (see the equations after this list).


  • Update: weight each particle by the observation likelihood.


  • Resample: sampling importance resampling. Without it the particle set degenerates: low-weight particles are wasted, the high-probability (MAP) region is not represented well, and the particle density no longer represents the real pdf. Importance resampling re-draws particles in proportion to their weights, i.e. posterior over prior, as given by the observation model.


  • Output the estimated state (e.g. the weighted mean of the particles).

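A hedged sketch of the standard equations behind the four steps above, in my own notation ($i$ indexes particles, $N$ is the particle count):

$$
\begin{aligned}
\text{Prediction:} \quad & x_t^{[i]} \sim p(x_t \mid u_t, x_{t-1}^{[i]}) \\
\text{Update:} \quad & w_t^{[i]} \propto w_{t-1}^{[i]} \, p(z_t \mid x_t^{[i]}) \\
\text{Resample:} \quad & \text{draw } N \text{ particles with probability} \propto w_t^{[i]}, \text{ then reset } w_t^{[i]} = 1/N \\
\text{Estimate:} \quad & \hat{x}_t = \sum_{i=1}^{N} w_t^{[i]} \, x_t^{[i]}
\end{aligned}
$$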

This process can be written as a single predict, update, resample loop per time step.

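A minimal Python sketch of this loop for a 1-D state (the motion and observation models here are made up for illustration and are not the system from the original post):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000  # number of particles

def predict(particles, u):
    # Propagate each particle through a hypothetical motion model with noise.
    return particles + u + rng.normal(0.0, 0.1, size=particles.shape)

def update(particles, z):
    # Weight each particle by a hypothetical Gaussian observation likelihood.
    w = np.exp(-0.5 * ((z - particles) / 0.5) ** 2)
    return w / np.sum(w)

def resample(particles, weights):
    # Importance resampling: draw particles in proportion to their weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

# One filter cycle: predict -> update -> resample -> estimate.
particles = rng.normal(0.0, 1.0, N)  # initial belief
u, z = 0.5, 0.7                      # example input and observation
particles = predict(particles, u)
weights = update(particles, z)
particles = resample(particles, weights)
x_hat = particles.mean()             # estimated state after resampling
print(x_hat)
```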

Improvement: Rao-Blackwellised Particle Filter

The aim of Rao-Blackwellised Particle Filtering is to find an estimator of the conditional distribution $p(y_t \mid z_t)$ such that fewer particles are required to reach the same accuracy as a typical particle filter.

Split the posterior probability into two parts: one can be computed in closed form (the marginal probability, accumulated analytically), and the other is estimated by the PF.

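A sketch of the factorization this describes, writing $x$ for the part sampled by the PF and $y$ for the part with a closed-form solution (the exact split and symbols are my assumption):

$$
p(x_{1:t}, y_{1:t} \mid z_{1:t}) = \underbrace{p(y_{1:t} \mid x_{1:t}, z_{1:t})}_{\text{closed form, e.g. a KF}} \; \underbrace{p(x_{1:t} \mid z_{1:t})}_{\text{particle filter}}
$$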

Application

Summary

  1. Why PF? Advantages & disadvantages.
  2. How does the PF work?
  3. How to improve the PF?

Histogram filter

Another non-parametric method, which uses a grid to represent the state. The formulation is very similar to the PF.

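A hedged sketch of the discrete Bayes (histogram) filter recursion over grid cells $x_k$, in my own notation, with prediction and update steps mirroring the PF:

$$
\begin{aligned}
\bar{p}_{k,t} &= \sum_{i} p(X_t = x_k \mid u_t, X_{t-1} = x_i) \, p_{i,t-1} \\
p_{k,t} &= \eta \, p(z_t \mid X_t = x_k) \, \bar{p}_{k,t}
\end{aligned}
$$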

More state estimation with parametric filters is covered in:

KF & EKF and summary

KL divergence: Consider two probability distributions P and Q. Usually, P represents the data, the observations, or a probability distribution precisely measured, while Q represents a theory, a model, a description, or an approximation of P. The Kullback–Leibler divergence is then interpreted as the average difference in the number of bits required for encoding samples of P using a code optimized for Q rather than one optimized for P. In other words, it measures how well the model Q matches the observed distribution P, and it is closely related to the cross entropy (see the relationship below and the Wikipedia article).

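The standard discrete form of the definition:

$$
D_{\mathrm{KL}}(P \parallel Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}
$$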

A sample KLD computation:

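A small Python sketch of such a computation for two hypothetical discrete distributions (the numbers are made up for illustration):

```python
import numpy as np

# Hypothetical observed distribution P and model distribution Q over three outcomes.
P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.4, 0.4, 0.2])

# KL divergence in bits: the average extra code length paid for encoding
# samples of P with a code optimized for Q.
kld = np.sum(P * np.log2(P / Q))
print(kld)  # a small positive number; it is 0 only when P == Q
```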

Its relationship with cross-entropy:

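The standard identity, with $H(P)$ the entropy of $P$ and $H(P, Q)$ the cross-entropy:

$$
D_{\mathrm{KL}}(P \parallel Q) = H(P, Q) - H(P), \qquad H(P, Q) = -\sum_{x} P(x) \log Q(x)
$$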

Ref:

  1. State Estimation Summary

  2. Linear System

  3. MIT particle filter PF and application

  4. Monte Carlo Localization

  5. Real-time PF

  6. Rao-Blackwellised PF

  7. Histogram filter

  8. HF2