Allen Brunson
2005-10-05 01:51:37 UTC
so, who wants to have a go at the filtering stuff i've designed so far?
here are some sample rules, as they will be found in the FilterRules.txt file,
or whatever i decide to call it:

if inreplyto "***@serious.net"
then color {6,50,30} score +10 download
account "Grumpy,Hog Tiller"
group "alt.internet.talk.haven,welcher.talk.net"
expires Jan 6, 2007

if header from matches "Alistair Cooke <***@stupid.com>"
group "group.alistair.likes.to.troll"
then kill

if header path regex "<pretend i wrote a regex here>spamsite.com"
if NOT header from contains "***@okay.com"
then kill

filters can be applied to any header in a message. before downloading a
group, i'll make the filter list report all headers that it wants to examine
for that group. if all examined headers can be found in the overview data,
i.e., the stuff you can get via XOVER, then the filter process begins by
downloading a bunch of XOVER data, and the program filters against that. if
the user wants to filter against something exotic that isn't in the overview
database, then we'll have to do a HEAD for every message we're thinking about
downloading. that's an order of magnitude slower, but if people really want
to filter against that stuff, i can't see a way around it. i asked the news
admins in news.software.nntp, and they seem to think that XHDR and XPAT are
pretty much worthless.
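
a rough sketch of that XOVER-or-HEAD decision, in python. this is an illustration, not the program's actual code: the name fetch_plan and the return shape are made up, and the header set is the standard overview field list (a given server may carry more, e.g. xref).

```python
# fields carried in the standard overview format; "bytes" and "lines"
# are the article size and line count. servers may add others.
OVERVIEW_HEADERS = {
    "subject", "from", "date", "message-id", "references", "bytes", "lines",
}


def fetch_plan(wanted_headers, overview_headers=OVERVIEW_HEADERS):
    """Return ("XOVER", []) if every wanted header is in the overview data,
    else ("HEAD", [missing headers]) to signal the slow per-message path."""
    wanted = {name.lower() for name in wanted_headers}
    missing = wanted - set(overview_headers)
    if missing:
        return ("HEAD", sorted(missing))
    return ("XOVER", [])


# the third sample rule looks at path, which is not in the overview data
print(fetch_plan(["From", "Subject"]))
print(fetch_plan(["From", "Path"]))
```

the idea is that the filter list reports its headers once per group, so the slow path is only taken for groups whose rules actually need it.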
a rule can contain any number of "if" clauses. the usual form is:
if (NOT) header <name> <matches|contains|startswith|endswith|regex> "string"
and there's one more form, for a special case:
if (NOT) inreplyto <email-address>
which means i'll have to look at the references: header, then go get the
parent message of this one, and look for that e-mail address in the from:
header. boy, is that going to be a pain in the butt.
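
here's one way the "if header" form could be evaluated, as a python sketch. several things are assumptions and not settled by the post: that "matches" means the whole string is equal, that comparisons are case-sensitive, that a missing header counts as an empty string, and that multiple "if" clauses are ANDed together. Clause, clause_hits and rule_hits are made-up names. the inreplyto form is left out, since it needs the parent message.

```python
import re
from dataclasses import dataclass


@dataclass
class Clause:
    header: str
    op: str            # matches | contains | startswith | endswith | regex
    text: str
    negate: bool = False   # True when the clause was written with NOT


def clause_hits(clause, headers):
    """headers is a dict of lowercased header name -> value."""
    value = headers.get(clause.header.lower(), "")
    if clause.op == "matches":
        hit = value == clause.text          # assumption: exact equality
    elif clause.op == "contains":
        hit = clause.text in value
    elif clause.op == "startswith":
        hit = value.startswith(clause.text)
    elif clause.op == "endswith":
        hit = value.endswith(clause.text)
    elif clause.op == "regex":
        hit = re.search(clause.text, value) is not None
    else:
        raise ValueError(f"unknown operator: {clause.op}")
    return hit != clause.negate             # NOT flips the result


def rule_hits(clauses, headers):
    # assumption: every clause has to hit for the rule to fire
    return all(clause_hits(c, headers) for c in clauses)


# shaped like the third sample rule; addresses are placeholders
spam_rule = [
    Clause("path", "regex", r"spamsite\.com"),
    Clause("from", "contains", "@okay.example", negate=True),
]
msg = {"path": "news.example!spamsite.com!not-for-mail",
       "from": "somebody <x@bad.example>"}
print(rule_hits(spam_rule, msg))
```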
after that, you've got the "then" line. a filter rule will allow you to:
apply a color; apply a positive or negative score, which will make this
message sort to the top or the bottom of the headers view, relative to other
scores; go ahead and download the message, if this group is set for
headers-only; "kill" the message, i.e., don't download it.
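
a small python sketch of reading a "then" line into its actions. the keywords (color, score, download, kill) come from the sample rules above; the function name parse_then, the dict layout, and the strictness about unknown words are assumptions.

```python
import re


def parse_then(line):
    """Turn e.g. 'then color {6,50,30} score +10 download' into a dict."""
    words = iter(line.split())
    if next(words, None) != "then":
        raise ValueError("not a 'then' line")
    actions = {"color": None, "score": 0, "download": False, "kill": False}
    for word in words:
        if word == "color":
            # the color is written as {n,n,n} with no spaces
            m = re.fullmatch(r"\{(\d+),(\d+),(\d+)\}", next(words, ""))
            if m is None:
                raise ValueError("color needs a {n,n,n} value")
            actions["color"] = tuple(int(n) for n in m.groups())
        elif word == "score":
            # int() accepts a leading + or -, and rejects a missing value
            actions["score"] = int(next(words, ""))
        elif word in ("download", "kill"):
            actions[word] = True
        else:
            raise ValueError(f"unknown action: {word}")
    return actions


print(parse_then("then color {6,50,30} score +10 download"))
print(parse_then("then kill"))
```

with the score in hand, sorting the headers view is just a sort on that number, highest first.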
then you've got some optional parameters. you can specify a list of one or
more accounts that this rule will apply to; without it, the rule applies to
all accounts. and there's an optional list of newsgroups to apply the rule
to; otherwise the rule applies to all groups. finally, there's an "expires"
parameter, so that this rule can time out after a while. this will be useful
when you decide to kill a thread; the associated rule will probably expire in
two weeks, by default.
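
the optional parameters could be checked along these lines. this is a python sketch with made-up names (split_list, parse_expires, rule_applies), and it assumes a few things the post doesn't pin down: that list entries are separated by commas, that a rule is still live on its expiry date, and that dates are always written like "Jan 6, 2007" with english month names.

```python
from datetime import date, datetime, timedelta


def split_list(text):
    """'Grumpy,Hog Tiller' -> ['Grumpy', 'Hog Tiller']"""
    return [item.strip() for item in text.split(",") if item.strip()]


def parse_expires(text):
    # %b depends on the locale; this assumes english month abbreviations
    return datetime.strptime(text, "%b %d, %Y").date()


def default_thread_kill_expiry(today):
    # two weeks out, per the default mentioned above
    return today + timedelta(days=14)


def rule_applies(rule, account, group, today):
    """rule is a dict; a missing key means 'no restriction'."""
    accounts = rule.get("account")
    groups = rule.get("group")
    expires = rule.get("expires")
    if accounts and account not in accounts:
        return False
    if groups and group not in groups:
        return False
    if expires and today > expires:   # assumption: live through expiry day
        return False
    return True


rule = {
    "account": split_list("Grumpy,Hog Tiller"),
    "group": split_list("alt.internet.talk.haven,welcher.talk.net"),
    "expires": parse_expires("Jan 6, 2007"),
}
print(rule_applies(rule, "Hog Tiller", "welcher.talk.net", date(2006, 12, 1)))
print(rule_applies(rule, "Hog Tiller", "welcher.talk.net", date(2007, 1, 7)))
```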
special cases. when a rule is applied to the subject header, it will match as
if the "Re: " part wasn't there, if this is a reply. i'll have at least two
"pseudo-headers": author and e-mail address, since it would be damned
inconvenient to have to write a regex to rip up the from header. and if
there's a need, i'll probably interpret the date header and let you do
comparisons against it as if it were an integer representing the number of
microseconds since 1970.
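
those special cases might look something like this in python, using the standard library's email.utils to pull the from header apart and read the date. the function names are made up, and a couple of details are guesses: that only one leading "Re: " is stripped, and that the check ignores case.

```python
from email.utils import parseaddr, parsedate_to_datetime


def normalized_subject(subject):
    """Match replies as if the leading 'Re: ' weren't there."""
    if subject[:4].lower() == "re: ":
        return subject[4:]
    return subject


def pseudo_headers(from_value):
    """Split a From: value into the two pseudo-headers."""
    author, address = parseaddr(from_value)
    return {"author": author, "email": address}


def date_as_microseconds(date_value):
    """Date: header -> integer microseconds since 1970.
    Assumes the header carries a numeric zone such as +0000; a date
    without usable zone info would be read as local time."""
    when = parsedate_to_datetime(date_value)
    return int(when.timestamp()) * 1_000_000


print(normalized_subject("Re: filter rules"))
print(pseudo_headers("Some Body <somebody@example.com>"))
print(date_as_microseconds("Thu, 01 Jan 1970 00:00:01 +0000"))
```

once the date is an integer, "older than" and "newer than" rules are plain numeric comparisons.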
can anybody think of any cases this doesn't cover? speak now, before i get
too much stuff written to change my mind very much.
--
Maybe in a perfect world, Springsteen's idea of a good time was to drink a
few pints and listen to all six sides of Sandinista!, while avoiding
Clarence Clemmons via caller I.D. -- Joshyboy of "Johnny Socko"