Saturday, April 26, 2008

Bayesian Filter: Technology And Advantages

Not long ago, most anti-spam products simply used a list of keywords to identify spam. A good set of keywords could catch a lot of spam. However, a keyword-based anti-spam filter requires manual updating and can easily be fooled by tweaking the message a little. Spammers simply study the latest anti-spam techniques and find ways to circumvent them. The result was a high number of false positives.

A new technique was needed to combat spam effectively. Experience has shown that such a method must be able to adapt as spammers' tactics change over time.

Bayesian filtering technology is based on the principle that most events are dependent, and that the probability of an event occurring in the future can be inferred from the previous occurrences of that event. This approach can be used to identify spam. If a piece of text occurs mainly in spam e-mails but not in legitimate mail, it is reasonable to assume that an e-mail containing it is probably spam.

To filter mail using Bayesian technology, you must first create a database of words collected from samples of spam and legitimate mail. A probability value is then assigned to each word; the probability is based on calculations that take into account how often the word occurs in spam as opposed to legitimate mail.
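As a rough illustration of how such a word database could be built, here is a minimal Python sketch. The function names, the tokenizer, the add-one smoothing and the assumption that spam and legitimate mail are equally likely beforehand are all choices made for this example, not the method of any particular product.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def count_words(messages):
    """Count in how many messages each word appears at least once."""
    counts = Counter()
    for message in messages:
        counts.update(set(tokenize(message)))
    return counts


def word_probabilities(spam_messages, ham_messages):
    """Estimate the probability that a message containing a word is spam.

    Assumptions of this sketch: add-one smoothing, so that a word seen in
    only one kind of mail never gets exactly 0 or 1, and equal prior
    weight for spam and legitimate mail.
    """
    spam_counts = count_words(spam_messages)
    ham_counts = count_words(ham_messages)
    probabilities = {}
    for word in set(spam_counts) | set(ham_counts):
        spam_rate = (spam_counts[word] + 1) / (len(spam_messages) + 2)
        ham_rate = (ham_counts[word] + 1) / (len(ham_messages) + 2)
        probabilities[word] = spam_rate / (spam_rate + ham_rate)
    return probabilities


# Tiny made-up training sets, for illustration only.
spam = ["cheap pills online", "cheap mortgage offer"]
ham = ["meeting about the mortgage", "lunch meeting tomorrow"]
probs = word_probabilities(spam, ham)
```

With these toy samples, a word seen only in spam ("cheap") scores above 0.5, a word seen only in legitimate mail ("meeting") scores below 0.5, and a word seen equally in both ("mortgage") stays neutral.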

Once the legitimate and spam databases have been built during an initial training period, the word probabilities can be calculated and the Bayesian filter is ready for use. When a new mail arrives, it is broken down into words and the most significant words are singled out. From these words, the Bayesian filter calculates the probability that the new message is spam. If that probability is greater than a spam threshold, say 0.9, the message is classified as spam.
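The classification step described above might look roughly like the following Python sketch. The 0.9 threshold comes from the text; everything else (the function names, the value used for unknown words, the limit of 15 significant words, and the sample word probabilities) is an assumption made for illustration.

```python
import math
import re

SPAM_THRESHOLD = 0.9             # example threshold from the text
UNKNOWN_WORD_PROBABILITY = 0.4   # assumption: unseen words lean slightly legitimate
MAX_SIGNIFICANT_WORDS = 15       # assumption: how many words to keep


def spam_probability(message, word_probs):
    """Combine per-word spam probabilities into one score for the message."""
    words = set(re.findall(r"[a-z0-9']+", message.lower()))
    probs = [word_probs.get(word, UNKNOWN_WORD_PROBABILITY) for word in words]
    # Clamp so that no single word can force the result to exactly 0 or 1.
    probs = [min(max(p, 0.01), 0.99) for p in probs]
    # The most significant words are those farthest from the neutral 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    probs = probs[:MAX_SIGNIFICANT_WORDS]
    # Naive Bayes combination, computed with logarithms for stability.
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


def is_spam(message, word_probs, threshold=SPAM_THRESHOLD):
    """Classify the message as spam if its score exceeds the threshold."""
    return spam_probability(message, word_probs) > threshold


# Hypothetical word probabilities, as a trained filter might hold them.
example_probs = {"cheap": 0.95, "pills": 0.99, "meeting": 0.05, "agenda": 0.02}
```

Here `is_spam("Cheap pills", example_probs)` comes out true and `is_spam("Meeting agenda", example_probs)` comes out false. A message with no words at all scores a neutral 0.5 and is therefore not classified as spam.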

Tip! In G-Lock SpamCombat you can assign hotkeys to common operations. For example, you can use F8 for the Mark message as spam function and F9 for Mark message as clean. The next time you train the Bayesian filter, you can do it simply with two keys on your keyboard, F8 and F9.

It is important to note that the analysis of spam and legitimate e-mail is performed on the mail of a particular user (or organization, company, etc.), so the Bayesian filter is tailored to that particular person, firm or organization. For example, a financial institution may receive a lot of e-mails containing the word "mortgage" and would get a lot of false positives with an outdated keyword-based anti-spam filter. A Bayesian filter analyzes the entire message containing the word "mortgage" and decides whether the e-mail is spam or legitimate on the basis of more than the single keyword "mortgage". The Bayesian approach to spam filtering is very effective - spam detection rates of over 99.7% can be achieved with a very small number of false positives!
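To make the "mortgage" example concrete, here is a small Python sketch that compares a single-keyword check with a Bayesian combination of word probabilities. The word probabilities are invented for a hypothetical financial institution, and the function names are assumptions of this example.

```python
from math import prod


def keyword_filter(message, keywords=("mortgage",)):
    """Old-style check: flag the message if any keyword appears in it."""
    text = message.lower()
    return any(keyword in text for keyword in keywords)


def bayes_score(message, word_probs):
    """Combine the probabilities of all known words in the message."""
    probs = [word_probs[w] for w in message.lower().split() if w in word_probs]
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)


# Hypothetical values: at this institution "mortgage" is as common in
# legitimate mail as in spam, so on its own it carries no evidence.
bank_probs = {
    "mortgage": 0.5,
    "application": 0.2,
    "appraisal": 0.1,
    "lowest": 0.9,
    "act": 0.85,
}

legit = "your mortgage application and appraisal are attached"
junk = "lowest mortgage rates act now"

legit_flagged = keyword_filter(legit)         # the keyword filter flags it
legit_score = bayes_score(legit, bank_probs)  # the Bayesian score stays low
junk_score = bayes_score(junk, bank_probs)    # the Bayesian score is high
```

The keyword filter flags both messages because both contain "mortgage". The Bayesian score is low for the legitimate message and above the 0.9 threshold for the junk one, because the surrounding words decide the outcome.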

Let's summarize the advantages of using Bayesian filters to catch spam:

1) A much more intelligent approach - it examines all aspects of a message, in contrast to keyword checking, which classifies an e-mail as spam on the basis of a single word.

2) Self-adjusting - by constantly learning from new spam and new valid incoming mail, the Bayesian filter evolves and adapts to new spam techniques.

3) Sensitive to the user - it learns the e-mail habits of the company and understands, for example, that e-mails containing the word "mortgage" are not always spam.

4) Multi-lingual and international - being adaptive, it can be used with any language. A Bayesian filter also takes into account language variations and different uses of certain words in different regions, even where the same language is spoken.

5) Difficult to fool, as opposed to a keyword filter - an advanced spammer who wants to trick a Bayesian filter can either use fewer words that usually indicate spam, or more words that generally indicate valid mail (such as a valid contact name, etc). Doing the latter is practically impossible, because the spammer would have to know the e-mail profile of each recipient - and a spammer can never hope to collect this kind of information about every recipient.
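The self-adjusting and user-sensitive behaviour from points 2 and 3 can be sketched as a filter that updates its word counts every time the user marks a message. This is a minimal Python illustration: the class name, method names and smoothing are assumptions of this example, not the interface of any real product.

```python
import re
from collections import Counter


class TrainableFilter:
    """Keeps per-word counts and updates them as the user marks messages."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.spam_total = 0
        self.ham_total = 0

    @staticmethod
    def _words(message):
        """Return the distinct lowercase word tokens in the message."""
        return set(re.findall(r"[a-z0-9']+", message.lower()))

    def mark_spam(self, message):
        """Learn from a message the user marked as spam."""
        self.spam_counts.update(self._words(message))
        self.spam_total += 1

    def mark_clean(self, message):
        """Learn from a message the user marked as legitimate."""
        self.ham_counts.update(self._words(message))
        self.ham_total += 1

    def word_probability(self, word):
        """Current spam probability of a word, with add-one smoothing."""
        spam_rate = (self.spam_counts[word] + 1) / (self.spam_total + 2)
        ham_rate = (self.ham_counts[word] + 1) / (self.ham_total + 2)
        return spam_rate / (spam_rate + ham_rate)


mail_filter = TrainableFilter()
mail_filter.mark_spam("mortgage deal")
after_spam = mail_filter.word_probability("mortgage")

# The user later marks several legitimate mortgage e-mails as clean.
for _ in range(3):
    mail_filter.mark_clean("mortgage statement")
after_clean = mail_filter.word_probability("mortgage")
```

After the first spam message, "mortgage" leans towards spam. Once the user has marked a few legitimate mortgage e-mails as clean, the same word leans towards legitimate mail, with no manual rule changes.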


