
Spam Archive Project Seeks to Halt Flood of Unwanted E-mail

By Brian Krebs
washingtonpost.com Staff Writer
Tuesday, November 26, 2002; 1:39 PM

CipherTrust wants garbage -- e-mail garbage to be exact. It wants every e-mail from Nigeria promising millions and every one of those e-mail solicitations for "free" pornography, Viagra and adult services.

The folks at CipherTrust aren't e-mail masochists. They're out to build the Dewey Decimal System of "spam."

The Alpharetta, Ga.-based security company will use the detritus of offensive e-mail marketing to open an online library -- www.spamarchive.org -- that programmers and researchers can use in the never-ending fight against spam.

The company hopes to amass at least 10 million spam samples within a year, said Paul Judge, the company's director of research and development. The project is already well on its way there, he said, thanks to dozens of anti-spam activists who have donated junk e-mails from their in-boxes -- from a few hundred to a few hundred thousand.

Unsolicited bulk e-mail is at an all-time high, according to firms that track it. Spam now accounts for roughly 40 percent of all e-mail, up from less than 10 percent early last year, according to anti-spam service provider Brightmail.

A public spam library could be a huge boon to the anti-spam community, not just to commercial software vendors; most spam-fighting tools are developed by independent programmers who give away their wares.

"This should help eliminate one of the big bottlenecks for people who want to make anti-spam tools," said Paul Graham, a computer programmer who has developed open-source mail filtering programs. "You can write all the code you want, but it won't do a whole lot of good unless you have a large amount of spam to test your algorithms on."

Graham is leading the "statistical filtering" approach to trapping spam, a method whose accuracy depends on the amount of spam used in the testing process.

Traditional anti-spam programs flag messages by searching for specific words or catchphrases commonly found in junk e-mail, such as "teen," "click here" and "Dear Friend." The text-based approach usually stops at least 80 percent of spam.

Yet attempts to increase spam software's level of accuracy by adding more words to the watch list often result in "false positives," where innocent e-mail is treated like spam and sent to the virtual trash can, Graham said.
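
The keyword approach, and the false-positive problem that comes with it, can be illustrated with a minimal Python sketch. The watch list holds only the three phrases quoted above, and the function name is hypothetical; no actual product is described here.

```python
# Hypothetical watch list: the three phrases are the examples quoted in the article.
WATCH_LIST = ("teen", "click here", "dear friend")


def looks_like_spam(message, watch_list=WATCH_LIST):
    """Return True if any watch-list phrase appears anywhere in the message."""
    text = message.lower()
    return any(phrase in text for phrase in watch_list)


# A typical junk message is caught...
print(looks_like_spam("Dear Friend, click here to claim your prize"))
# ...but so is an innocent one, because "fourteen" contains "teen".
print(looks_like_spam("Lunch at fourteen hundred hours?"))
```

The second call shows how a plain substring match can send legitimate mail to the trash, and why lengthening the watch list tends to make the problem worse.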

"For most users, missing legitimate email is far worse than receiving spam," he said.

With statistical filtering, a mathematical formula determines the prevalence of common spam terms within a collection of junk e-mail, and examines how frequently those same terms appear in a body of legitimate messages.
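
As a rough illustration of that comparison, the Python sketch below measures how often each word turns up in a pile of spam and in a pile of legitimate mail, then turns the two figures into a score between 0 and 1. It is not Graham's actual code: the function names, the equal weighting of the two piles and the small `floor` value used for words missing from one pile are all assumptions made for the example.

```python
from collections import Counter


def document_frequency(messages):
    """Fraction of messages in which each word appears at least once."""
    counts = Counter()
    for message in messages:
        counts.update(set(message.lower().split()))
    total = len(messages)
    return {word: n / total for word, n in counts.items()}


def word_spam_score(word, spam_freq, good_freq, floor=1e-6):
    """Score from 0.0 (signals good mail) to 1.0 (signals spam).

    Assumes spam and good mail are weighted equally. `floor` is an
    assumed stand-in frequency for words never seen in one collection,
    which also avoids dividing by zero.
    """
    s = spam_freq.get(word, floor)
    g = good_freq.get(word, floor)
    return s / (s + g)


# A word found in 20 percent of spam and 0.0001 percent of good mail.
score = word_spam_score("viagra", {"viagra": 0.20}, {"viagra": 0.000001})
print(score > 0.99)
```

The larger and more varied the two collections, the more trustworthy the frequencies become, which is why a big public archive matters to this approach.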

"So, if you notice that a particular word shows up in 20 percent of spam and .0001 percent of good e-mail, it means when you find that term in a newly-arrived e-mail there's a good bet it's spam," Graham said.

Eric S. Raymond, a programmer and unofficial spokesman for the open-source software movement, called the spam archive a good idea, but questioned the need for such a huge database, given the limited vocabulary of the average junk e-mail.

"You can build an effective (statistical) spam filter with a few thousand spam samples because the language spammers use is very stereotyped," Raymond said.

Still, a huge spam archive would yield interesting and useful patterns, such as the Web sites, P.O. Boxes and 1-800 numbers spammers use to ply their trade -- all of which can be used to locate spammers in the real world, Graham said.
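
One way such patterns might be pulled out of a collection is sketched below in Python, using regular expressions to tally the Web hosts and 1-800 numbers that recur across messages. The patterns are deliberately simple and the function name is hypothetical; nothing here reflects how the archive itself will be analyzed.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Simplified patterns, assumed for illustration only.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
TOLL_FREE_RE = re.compile(r"\b1-800-\d{3}-\d{4}\b")


def tally_contact_points(messages):
    """Count how often each Web host and 1-800 number appears."""
    hosts, phones = Counter(), Counter()
    for message in messages:
        for url in URL_RE.findall(message):
            try:
                host = urlparse(url).hostname
            except ValueError:
                continue  # skip malformed URLs
            if host:
                hosts[host] += 1
        phones.update(TOLL_FREE_RE.findall(message))
    return hosts, phones


sample = [
    "Buy now http://pills.example/buy or call 1-800-555-0100",
    "Visit http://pills.example/offer today",
]
hosts, phones = tally_contact_points(sample)
print(hosts.most_common(1), phones.most_common(1))
```

Hosts or phone numbers that surface in thousands of otherwise unrelated messages are the kind of lead the paragraph above has in mind.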

Many companies have solicited spam from the public, but most don't share their collections.

The Federal Trade Commission has compiled about 23 million junk e-mail messages, which it uses for fraud investigations and consumer education campaigns. The commission has refused requests to open the record to the public, saying that it is difficult to remove private data from the messages, such as the e-mail addresses of people who sent their spam e-mails to the FTC.

The commission cannot take action against most people who send unsolicited junk e-mail, because sending it is illegal only if the message defrauds consumers or solicits illegal activity. Twenty-six states have laws that curb junk e-mail by outlawing bogus return addresses and requiring marketers to identify advertisements with labels such as "ADV:" in a message's subject line.

But even the strongest laws or the best junk e-mail filters won't stop the most ardent spammers, said Raymond.

"It's like a whack-a-mole game: you shut down (spam e-mail) servers in one place, and the same spammers pop up again in another place running a shoestring operation out of their basement," Raymond said. "But in a weird way, that sort of highlights one of the Internet's strengths, that it's very hard to lock someone out of communication or suppress speech."

Lest spammers try to harvest new victims from the database, CipherTrust will "scrub" all messages forwarded to the archive to remove any e-mail addresses, Judge said. Each spam specimen will be tagged with a reference number and assigned to a category based on the message's content.
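
CipherTrust has not said how the scrubbing or tagging will work. The Python sketch below is only a guess at the general shape: the address pattern, the category names, the keyword lists and the function names are all assumptions made for illustration.

```python
import itertools
import re

# Simplified address pattern; real-world addresses can be more varied.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical categories and trigger words, not CipherTrust's scheme.
CATEGORY_KEYWORDS = {
    "financial": ("million", "loan"),
    "pharmaceutical": ("viagra", "pills"),
}

_reference_numbers = itertools.count(1)


def scrub(message):
    """Replace anything that looks like an e-mail address."""
    return EMAIL_RE.sub("[address removed]", message)


def archive_entry(message):
    """Scrub a message, then tag it with a reference number and category."""
    text = scrub(message)
    lowered = text.lower()
    category = next(
        (name for name, words in CATEGORY_KEYWORDS.items()
         if any(word in lowered for word in words)),
        "uncategorized",
    )
    return {"ref": next(_reference_numbers), "category": category, "text": text}


entry = archive_entry("To: jane@example.com\nClaim your 5 million dollars")
print(entry["category"], entry["text"])
```

Scrubbing before storage means that someone who downloads the whole archive gets the sales pitches but not a fresh mailing list.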

CipherTrust said spamarchive.org will be a free service, but added that a huge archive of spam will also help its technicians improve "IronMail," the company's proprietary anti-spam product.


© 2002 TechNews.com
