A Fuzzy Similarity Approach for Automated Spam Filtering and Naïve Bayes Classifier
E-mail communication is indispensable nowadays, but the e-mail spam problem continues to grow drastically. In recent years, collaborative spam detection systems that combine an e-mail abstraction scheme with near-duplicate matching have been widely discussed. The primary idea of similarity matching for spam detection is to maintain a known spam database against which incoming e-mails are compared. To achieve efficient similarity matching and reduce storage utilization, prior works mainly represent each e-mail by a succinct abstraction derived from the text of its content. However, these abstractions cannot fully capture the evolving nature of spam, and are thus not effective enough for near-duplicate detection. In this paper, we propose a novel e-mail abstraction scheme that uses the layout structure of an e-mail to represent it. We present a procedure to generate the e-mail abstraction from the HTML content of e-mails (IMAP, POP3), and this newly devised abstraction can more effectively capture the near-duplicate phenomenon of spams....
Authors: CH. N. V. V. Sivakumar.
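The abstract does not spell out the abstraction procedure, so the following is only a minimal sketch of the general idea, under stated assumptions: the abstraction is taken to be the ordered sequence of HTML tag names in the message body, and near-duplicate matching is approximated with a similarity ratio from Python's standard `difflib`. The names `layout_abstraction` and `is_near_duplicate` and the `0.9` threshold are hypothetical and are not taken from the paper.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagSequenceParser(HTMLParser):
    """Collects the ordered names of opening tags and ignores all text."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Attributes and text are dropped: only the layout skeleton is kept.
        self.tags.append(tag)


def layout_abstraction(html_body):
    """Return the tag-name sequence of an HTML body (assumed abstraction)."""
    parser = TagSequenceParser()
    parser.feed(html_body)
    parser.close()
    return tuple(parser.tags)


def is_near_duplicate(abstraction, known_spam, threshold=0.9):
    """True if the abstraction resembles any entry in the known spam database.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    for spam_abstraction in known_spam:
        ratio = SequenceMatcher(None, abstraction, spam_abstraction).ratio()
        if ratio >= threshold:
            return True
    return False


# Hypothetical messages: a reported spam, a reworded variant, and a normal mail.
spam = "<html><body><table><tr><td><a href='x'><img src='a.gif'></a></td></tr></table></body></html>"
variant = "<html><body><table><tr><td><a href='y'><img src='b.gif'></a></td></tr></table><p>random words</p></body></html>"
ham = "<html><body><p>Hi</p><p>See you at noon.</p></body></html>"

known_spam = [layout_abstraction(spam)]
print(is_near_duplicate(layout_abstraction(variant), known_spam))
print(is_near_duplicate(layout_abstraction(ham), known_spam))
```

Because the text and attribute values are discarded, a spammer who changes links, images, or wording while reusing the same template still produces a nearly identical tag sequence, which is the intuition behind a layout-based abstraction.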