Re: [gutvol-d] Spam on PG lists?

JBuck814366460＠aol.com

22 Mar 2005

10:27 p.m.

...

Isn't the list open to subscribers-only? If not, I suggest moving it to that model.

I agree, if it isn't subscriber-only, it should be as soon as possible. The spam is very annoying and doesn't belong on the list. Jared


David A. Desrosiers

22 Mar 2005

10:55 p.m.

New subject: Spam on PG lists?

...

...
Isn't the list open to subscribers-only? If not, I suggest moving it to that model.

...

I agree, if it isn't subscriber-only, it should be as soon as possible. The spam is very annoying and doesn't belong on the list.

Honestly, I haven't seen a single spam on either list since I've been a subscriber (a year?). Then again, I run dspam on my MTA, and it's probably catching and quarantining them so I never even see them.

David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com

Pauline

11:03 p.m.


David A. Desrosiers wrote:

...

Honestly, I haven't seen a single spam on either list since I've been a subscriber (a year?). Then again, I run dspam on my MTA, and it's probably catching and quarantining them so I never even see them.

From my quick peek - it's only the posted list (posted@pglaf.org) archive which is visible. So anyone submitting projects to PG will have a visible email address to email harvesters. The gutvol* lists are OK.

I hope this helps,
P

--
Help digitise public domain books:
Distributed Proofreaders: http://www.pgdp.net
"Preserving history one page at a time."
Set free dead-tree books: http://bookcrossing.com/referral/servalan

Tony Baechler

23 Mar 2005

7:42 a.m.


Hello. What is dspam? How hard is it to set up? Is it similar to Spam Assassin? I'm running qmail under Linux and had an extremely hard time setting up spam filtering, so I eventually gave up. I have not heard of that antispam package before. More information would be appreciated. Thanks. To stay on topic, I have received no spam from the pglaf.org lists and I do not run a spam filter locally.

Greg Newby

7:22 p.m.


On Tue, Mar 22, 2005 at 11:42:11PM -0800, Tony Baechler wrote:

...

Hello. What is dspam? How hard is it to set up? Is it similar to Spam Assassin? I'm running qmail under Linux and had an extremely hard time setting up spam filtering, so I eventually gave up. I have not heard of that antispam package before. More information would be appreciated.

I did a very informal comparison of dspam to Spam Assassin, and found them to be about the same. They have some different features, but basically both "learn" based on your mail patterns. dspam takes a little longer to get trained, and is tuned to have a very low portion of false positives (that is, it very seldom flags non-spam as spam). With any spam filter, though, it's important to periodically check the logs or spam folders, to see what messages were misidentified as spam.
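To make the terms in this comparison concrete, here is a minimal sketch of how a false-positive rate and an overall accuracy figure can be computed from a hand-checked sample of mail. The class name, field names, and the sample counts are hypothetical and are not taken from dspam or SpamAssassin.

```python
from dataclasses import dataclass


@dataclass
class FilterCounts:
    """Hand-checked tallies for one filter over a sample of mail (hypothetical)."""
    spam_caught: int    # spam correctly flagged as spam
    spam_missed: int    # spam delivered to the inbox (false negatives)
    ham_delivered: int  # legitimate mail delivered normally
    ham_flagged: int    # legitimate mail flagged as spam (false positives)


def false_positive_rate(c: FilterCounts) -> float:
    """Fraction of legitimate mail that was wrongly flagged as spam."""
    ham_total = c.ham_delivered + c.ham_flagged
    return c.ham_flagged / ham_total if ham_total else 0.0


def accuracy(c: FilterCounts) -> float:
    """Fraction of all messages that were classified correctly."""
    total = c.spam_caught + c.spam_missed + c.ham_delivered + c.ham_flagged
    return (c.spam_caught + c.ham_delivered) / total if total else 0.0


if __name__ == "__main__":
    sample = FilterCounts(spam_caught=900, spam_missed=100,
                          ham_delivered=990, ham_flagged=10)
    print(false_positive_rate(sample), accuracy(sample))
```

A filter tuned for few false positives keeps `ham_flagged` small, usually at the cost of a larger `spam_missed`, which is why checking the spam folder from time to time still matters.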

...

To stay on topic, I have received no spam from the pglaf.org lists and I do not run a spam filter locally.

If people could forward spam items to me that were distributed via the lists.pglaf.org server, I can look into how they got to the list.

I'll also look into obfuscating email addresses in the logs (via transforming the @ or similar techniques). This is sometimes done automatically with Pipermail (which manages our Mailman archives, I believe), but doesn't seem to be happening. Sorry about that....

I'm still looking for a volunteer to manage the mailing lists, by the way. It takes just a few minutes per day (every day).

-- Greg
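The kind of "@" transformation mentioned above can be sketched in a few lines. This is only an illustration of the idea, not how Pipermail or Mailman implement it; the regular expression is a deliberately simple approximation of an email address, and the function name and the " at " replacement are assumptions.

```python
import re

# Simplified address pattern (an assumption for illustration; real address
# syntax is considerably more permissive than this).
EMAIL_RE = re.compile(r"\b([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})\b")


def obfuscate_addresses(text: str, replacement: str = " at ") -> str:
    """Replace the '@' in anything that looks like an email address."""
    return EMAIL_RE.sub(
        lambda m: f"{m.group(1)}{replacement}{m.group(2)}", text
    )


if __name__ == "__main__":
    print(obfuscate_addresses("Submissions go to posted@pglaf.org today."))
```

Run over archived message bodies before they are published, a filter like this makes addresses harder for naive harvesters to collect, although a determined harvester can reverse such a simple substitution.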

Carlo Traverso

7:43 p.m.


I don't filter the lists (I apply the filters after accepting pglaf lists), and I don't receive any spam on the lists (a lot outside). Consider the possibility of a forged sender address.

Carlo

Chuck MATTSEN

8:35 p.m.


On Wed, 23 Mar 2005 11:22:52 -0800 Greg Newby <gbnewby@pglaf.org> typed:

...

On Tue, Mar 22, 2005 at 11:42:11PM -0800, Tony Baechler wrote:

...
Hello. What is dspam? How hard is it to set up? Is it similar to Spam Assassin? I'm running qmail under Linux and had an extremely hard time setting up spam filtering, so I eventually gave up. I have not heard of that antispam package before. More information would be appreciated.

I did a very informal comparison of dspam to Spam Assassin, and found them to be about the same. They have some different features, but basically both "learn" based on your mail patterns. dspam takes a little longer to get trained, and is tuned to have a very low portion of false positives (that is, it very seldom flags non-spam as spam). With any spam filter, though, it's important to periodically check the logs or spam folders, to see what messages were misidentified as spam.

Another alternative tool is POPFile (or any of the other Bayesian filters) ... http://popfile.sourceforge.net/ ... also free, open source, cross-platform. It has the advantage of being very fast in its processing of incoming mail (POP3 included), and it "learns" very quickly what the user considers spam and "not spam" ... actually, one could set up any number of different categories and, with time, it would learn to sort things however one wished. I get about 10,000 e-mails per month and POPFile has been running at about 99.81% accuracy for me with respect to false-positives, etc.
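The idea of a Bayesian filter that learns any number of categories can be sketched as a small naive Bayes classifier. This is a generic textbook version with Laplace smoothing and whitespace tokenization, not POPFile's actual code; the class name, bucket names, and training messages are hypothetical.

```python
import math
from collections import Counter, defaultdict


class BucketClassifier:
    """Tiny multi-category naive Bayes text classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.message_counts = Counter()          # bucket -> messages trained

    @staticmethod
    def tokenize(text):
        return text.lower().split()

    def train(self, bucket, text):
        """Record one message that the user filed under `bucket`."""
        self.message_counts[bucket] += 1
        self.word_counts[bucket].update(self.tokenize(text))

    def classify(self, text):
        """Return the bucket with the highest log-probability for `text`."""
        if not self.message_counts:
            raise ValueError("classifier has not been trained")
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        vocab_size = max(len(vocab), 1)
        total_messages = sum(self.message_counts.values())
        words = self.tokenize(text)
        best_bucket, best_score = None, -math.inf
        for bucket, n_messages in self.message_counts.items():
            counts = self.word_counts[bucket]
            total_words = sum(counts.values())
            # Prior: how often the user files mail in this bucket.
            score = math.log(n_messages / total_messages)
            for word in words:
                # Laplace smoothing so unseen words never give log(0).
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_bucket, best_score = bucket, score
        return best_bucket


if __name__ == "__main__":
    clf = BucketClassifier()
    clf.train("spam", "cheap pills buy now")
    clf.train("lists", "new ebook posted today")
    print(clf.classify("buy cheap pills"))
```

Each correction the user makes is just another `train` call, which is why such filters adapt to whatever sorting scheme the user has in mind.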

...

...
To stay on topic, I have received no spam from the pglaf.org lists and I do not run a spam filter locally.

Nor have I received any....

--
Chuck MATTSEN / mattsen at arvig dot net / Mahnomen, MN, USA
Mandrakelinux release 10.2 (Cooker) for i586 kernel 2.6.10-3.mm.5mdk
RLU #346519 / MT Lookup: http://eot.com/~mattsen/mtsearch.htm
Random Thought/Quote for this Message:
From listening comes wisdom, from speaking, repentance.

Jared Buck

11:35 p.m.


Hi Greg,

Sure, I wouldn't mind managing the lists for a couple minutes a day. I can't promise it will be as soon as I get up (I tend to sleep more than the average person) but it will be once a day.

I'll forward you copies of the spam I'm getting on the list as I receive them, then you can figure out how to ban the senders' IPs to keep that mail from getting on the list and interfering with perfectly good discussions.

Jared

David A. Desrosiers

24 Mar 2005

12:08 a.m.


...

I did a very informal comparison of dspam to Spam Assassin, and found them to be about the same.

They are so dramatically different, I can't believe you even would suggest they're "about the same".

SpamAssassin is written in Perl, and is significantly slower than dspam. SpamAssassin also relies on static rulesets, not the "quality" of the mail received. You can't do per-user filtering with SA. With dspam, if one user prefers seeing lots of HTML advertisements, they can. Another user on the same system can reject those as spam.

In my case, I was using SpamAssassin for about 2 years, trained down to a threshold of 2, with 13 RBLs in place, and my users were still getting 20-30 spams per week. SpamAssassin's accuracy under that configuration after 2 years was about 90%. In 1 month of using dspam, we were over 98% accuracy, AND I no longer had to manage mail. The users get their own quarantine and they can manage their own mail "quality" themselves, I don't _ever_ have to get involved.

...

They have some different features, but basically both "learn" based on your mail patterns. dspam takes a little longer to get trained, and is tuned to have a very low portion of false positives (that is, it very seldom flags non-spam as spam).

You probably didn't read the docs. Did you load it with the SA corpus first? Did you train it with that corpus? It took about an hour for me to train it to a level where it was accurately catching and quarantining mail.

Getting dspam configured properly is no small task, and you have to be _very_ careful about using conflicting algorithms when you configure and build it. Also, were you using TOE? TEFT? TUM? Each of these has VERY different usages and specific conditions where it works well, or horribly.

...

With any spam filter, though, it's important to periodically check the logs or spam folders, to see what messages were misidentified as spam.

And with dspam, this is all handled completely seamlessly, no need to "check logs" or "spam folders" at all. Users simply forward their false positives to spam-$USER@domain.com, and it gets marked as spam. When more emails come in that match similar tokens, those are marked as spam also.
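The retrain-by-forwarding workflow described here can be illustrated with a small sketch. Only the `spam-$USER@domain.com` address pattern comes from the post; the function names, the in-memory token store, and the whitespace tokenization are hypothetical and are not dspam's actual implementation.

```python
from collections import Counter, defaultdict


def parse_retrain_address(recipient):
    """Return the user name for an address like 'spam-alice@domain.com', else None."""
    local_part = recipient.split("@", 1)[0]
    prefix, sep, user = local_part.partition("-")
    if sep and prefix == "spam" and user:
        return user
    return None


def handle_forwarded_message(recipient, body, spam_tokens):
    """Add the tokens of a forwarded message to that user's own spam counts.

    `spam_tokens` maps user -> Counter of tokens reported as spam.
    Returns True if the message was used for retraining.
    """
    user = parse_retrain_address(recipient)
    if user is None:
        return False
    spam_tokens[user].update(body.lower().split())
    return True


if __name__ == "__main__":
    store = defaultdict(Counter)
    handle_forwarded_message("spam-alice@domain.com", "Buy cheap pills", store)
    print(store["alice"])
```

Because the counts are keyed by user, one person's report changes only that person's filter, which matches the per-user behaviour described earlier in the thread.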

...

I'm still looking for a volunteer to manage the mailing lists, by the way. It takes just a few minutes per day (every day).

I host quite a few mailing lists here for SourceFubar.Net, and I'd be happy to take over management of the lists for you, if you wish. We don't have any spam on the lists we host, and everything works as it should.

David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com

Bruce Albrecht

3:54 p.m.


David A. Desrosiers writes:

...

...
I did a very informal comparison of dspam to Spam Assassin, and found them to be about the same.

They are so dramatically different, I can't believe you even would suggest they're "about the same".

SpamAssassin is written in Perl, and is significantly slower than dspam. SpamAssassin also relies on static rulesets, not the "quality" of the mail received. You can't do per-user filtering with SA. With dspam, if one user prefers seeing lots of HTML advertisements, they can. Another user on the same system can reject those as spam.

I don't want to turn this mailing list into a dspam vs Spam Assassin war, but I think your information about SA is out of date. SA v3 supports multi-tiered (e.g., global, domain, user) configurations, and has Bayesian filtering as one of several rules for determining spam.

I'd also like to point out that being written in Perl does not imply that something is always much slower than C, especially when large amounts of regular expression pattern matching are involved. Perl developers have spent a lot of time optimizing its pattern matching. The SA Wiki suggests that if you find that SA is slow, you should examine the rule set you're using, and disable inappropriate rules (for example, ones requiring DNS lookups).

Bruce
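As a rough illustration of what a multi-tiered (global, domain, user) configuration means, the sketch below resolves each setting from the most specific tier that defines it. The setting names and values are hypothetical and do not correspond to real SpamAssassin options or file formats.

```python
from collections import ChainMap


def effective_settings(global_cfg, domain_cfg, user_cfg):
    """Merge three tiers; the most specific tier that sets a key wins."""
    # ChainMap looks keys up left to right, so user overrides domain,
    # and domain overrides global.
    return dict(ChainMap(user_cfg, domain_cfg, global_cfg))


if __name__ == "__main__":
    # Hypothetical settings, for illustration only.
    global_cfg = {"spam_threshold": 5.0, "use_bayes": True}
    domain_cfg = {"spam_threshold": 4.0}
    user_cfg = {"use_bayes": False}
    print(effective_settings(global_cfg, domain_cfg, user_cfg))
```

With a layout like this, an administrator sets sensible site-wide defaults once, and individual domains or users change only the values they care about.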

David A. Desrosiers

4:15 p.m.


...

I don't want to turn this mailing list into a dspam vs Spam Assassin war, but I think your information about SA is out of date.

You're right, my information is a bit out of date: dspam is quite a bit ahead of SA now, further than I originally surmised (see further down). But I agree, let's not turn this into a religious war.

...

SA v3 supports multi-tiered (e.g., global, domain, user) configurations, and has Bayesian filtering as one of several rules for determining spam.

Does SA support allowing the user to configure their own mail preferences via a simple web interface? Does it support adding and revoking tokens by simply sending the false-positives back through email, without involving a mail administrator? Sure, those things can be written, but do they come as part of the core package? Does that capability exist in the base engine?

Incidentally, dspam supports the following, out of the box:

- Bayesian filtering
- Graham Bayes
- Burton Bayes
- Noise Reduction
- Robinson Geometric Mean calculation
- Fisher-Robinson Inverse Chi-Square calculation
- Robinson Combined P-Values
- Chained Tokens
- Neural Networking
- Message Inoculation

..and quite a bit more for filtering mail. Does SpamAssassin v3?

I'm glad that SA is beginning to incorporate some of these things now, and they've got a good base project to learn from. I've been very disappointed with SA, and dspam has already trounced it in our case, so we have no need to de-evolve to something that doesn't suit our needs.

Fewer than 10 spam messages total in any user's mailbox in over a year now (that we've been told about), and only a small handful of innocent messages were caught as spam, but were really ham. With the web interface, the user just sends them on to their normal account, and dspam scores them lower, so future versions aren't caught. Works great, and I don't have to be involved in the mail management process _at all_ anymore.
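Of the techniques named in this message, the inverse chi-square calculation is easy to show in isolation. The sketch below follows the commonly published form of the combining scheme attributed to Gary Robinson, in which per-token spam probabilities are merged into one score between 0 (ham) and 1 (spam). It is a generic illustration, not dspam's code; the function names and the clamping constant are assumptions.

```python
import math


def chi2q(x2, dof):
    """Probability that a chi-square variable with even `dof` exceeds `x2`."""
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    # Rounding can push the sum slightly above 1.
    return min(total, 1.0)


def combined_score(token_probs, eps=1e-6):
    """Combine per-token spam probabilities into one score in [0, 1]."""
    if not token_probs:
        return 0.5  # no evidence either way
    # Clamp so that log(0) can never occur (eps is an arbitrary choice).
    probs = [min(max(p, eps), 1.0 - eps) for p in token_probs]
    dof = 2 * len(probs)
    spam_stat = -2.0 * sum(math.log(1.0 - p) for p in probs)
    ham_stat = -2.0 * sum(math.log(p) for p in probs)
    spamminess = 1.0 - chi2q(spam_stat, dof)
    hamminess = 1.0 - chi2q(ham_stat, dof)
    return (1.0 + spamminess - hamminess) / 2.0


if __name__ == "__main__":
    print(combined_score([0.99, 0.95, 0.90]))
    print(combined_score([0.05, 0.10, 0.02]))
```

A useful property of this scheme is that conflicting or uninformative evidence lands near 0.5, so a filter can treat the middle of the range as "unsure" instead of forcing a decision. For very long messages a production implementation would need to guard against floating-point underflow, which this sketch does not do.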

...

I'd also like to point out that being written in Perl does not imply that something is always much slower than C, especially when large amounts of regular expression pattern matching are involved.

True, poorly-written C can definitely be worse than Perl, but well-written C is ALWAYS going to be faster than equivalently written Perl. I don't think I've ever seen SA process 100 messages/sec., but dspam has no problem doing the same thing, every day.

...

Perl developers have spent a lot of time optimizing its pattern matching. The SA Wiki suggests that if you find that SA is slow, you should examine the rule set you're using, and disable inappropriate rules (for example, ones requiring DNS lookups).

You're preaching to the choir here, I'm a very heavy user and supporter of Perl, and I use it for 99% of my tasks... but there are some cases where an interpreted language just can't compete with natively compiled object code.

Anyway, good discussions all around. Use whatever tool fits your needs. In my case (heavy mail use from very disparate sources), dspam easily beat what SA could do, hands-down in terms of quality and speed and flexibility. The added benefit is that now I don't have to micro-manage mail, whitelists, or rulesets anymore.

David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
