SpamFilter

Spam filtering is a practical modern application of several areas of mathematics, notably probability.

How do you identify a given email as spam without wrongly rejecting good email?

One technique is to use Bayes' Theorem. Take a large number of known spam messages and a large number of known good messages ("ham"). For each word that occurs, measure how often it appears in spam and how often it appears in ham. That gives, for each word, an estimate of the probability that an email containing it is spam.
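
As a rough illustration of the counting step, here is a minimal Python sketch. The function name word_spam_probabilities, the whitespace tokenising, the add-one smoothing and the equal prior weighting of spam and ham are all assumptions made for the example, not part of any particular filter.

```python
from collections import Counter


def word_spam_probabilities(spam_messages, ham_messages):
    """Estimate P(spam | word) for every word seen in either corpus."""
    # Count the number of messages of each kind that contain each word.
    spam_counts = Counter(
        w for m in spam_messages for w in set(m.lower().split())
    )
    ham_counts = Counter(
        w for m in ham_messages for w in set(m.lower().split())
    )
    n_spam, n_ham = len(spam_messages), len(ham_messages)

    probs = {}
    for word in set(spam_counts) | set(ham_counts):
        # Assumption: add-one smoothing, so no estimate is exactly 0 or 1.
        p_word_given_spam = (spam_counts[word] + 1) / (n_spam + 2)
        p_word_given_ham = (ham_counts[word] + 1) / (n_ham + 2)
        # Bayes' Theorem, assuming equal priors P(spam) = P(ham) = 0.5.
        probs[word] = p_word_given_spam / (p_word_given_spam + p_word_given_ham)
    return probs


if __name__ == "__main__":
    spam = ["buy cheap pills", "cheap pills now"]
    ham = ["lunch at noon", "meeting now"]
    for word, p in sorted(word_spam_probabilities(spam, ham).items()):
        print(f"{word}: {p:.2f}")
```

A word seen only in spam ends up well above 0.5, a word seen only in ham ends up well below it, and a word equally common in both stays at 0.5.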

So given a single email, look at the words in it and find those that are the best predictors, that is, those whose probabilities are furthest from 50%. Bayes' Theorem then lets us combine their probabilities to estimate the probability that the email is spam.
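
The combining step might look something like the following Python sketch. It assumes a dictionary word_probs mapping each word to its estimated spam probability, treats the words as independent (the "naive" Bayes assumption), and weights spam and ham equally beforehand. The function name spam_probability, the top_n cutoff of 15 and the clamping to the range 0.01 to 0.99 are illustrative choices only.

```python
import math


def spam_probability(message, word_probs, top_n=15):
    """Combine per-word spam probabilities into one estimate for a message."""
    words = set(message.lower().split())
    known = [word_probs[w] for w in words if w in word_probs]

    # The best predictors are the words furthest from 0.5 in either direction.
    best = sorted(known, key=lambda p: abs(p - 0.5), reverse=True)[:top_n]

    log_odds = 0.0
    for p in best:
        # Assumption: clamp so that no single word is treated as certain.
        p = min(max(p, 0.01), 0.99)
        # Adding log-odds is the same as multiplying odds, but avoids underflow.
        log_odds += math.log(p) - math.log(1 - p)

    # Convert the combined log-odds back into a probability.
    return 1 / (1 + math.exp(-log_odds))


if __name__ == "__main__":
    word_probs = {"cheap": 0.9, "pills": 0.8, "lunch": 0.2}
    print(spam_probability("cheap pills", word_probs))
    print(spam_probability("lunch", word_probs))
```

A message with no known words comes out at exactly 0.5, meaning the filter has no evidence either way. Where to set the threshold for rejecting a message is a separate decision, and setting it high helps avoid rejecting good email.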

Another technique is to use Markov Chains. I'm not sure how that works, but a Google search should find something.

http://www.google.co.uk/search?q=spam+filter+markov+chain
http://www.google.co.uk/search?q=spam+filter+bayes+theorem
