Discussion:
filtering
Allen Brunson
2005-10-05 01:51:37 UTC
so, who wants to have a go at the filtering stuff i've designed so far?
here are some sample rules, as they will be found in the FilterRules.txt file,
or whatever i decide to call it:

if inreplyto "***@serious.net"
  then color {6,50,30} score +10 download
  account "Grumpy,Hog Tiller"
  group "alt.internet.talk.haven,welcher.talk.net"
  expires Jan 6, 2007

if header from matches "Alistair Cooke <***@stupid.com>"
  group "group.alistair.likes.to.troll"
  then kill

if header path regex "<pretend i wrote a regex here>spamsite.com"
  if NOT header from contains "***@okay.com"
  then kill

filters can be applied to any header in a message. before downloading a
group, i'll make the filter list report all headers that it wants to examine
for that group. if all examined headers can be found in the overview data,
i.e., the stuff you can get via XOVER, then the filter process begins by
downloading a bunch of XOVER data, and the program filters against that. if
the user wants to filter against something exotic that isn't in the overview
database, then we'll have to do a HEAD for every message we're thinking about
downloading. that's an order of magnitude slower, but if people really want
to filter against that stuff, i can't see a way around it. i asked the news
admins in news.software.nntp, and they seem to think that XHDR and XPAT are
pretty much worthless.
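A minimal sketch of that XOVER-or-HEAD decision, in Python. It assumes the client has already asked the server which fields its overview database holds (the LIST OVERVIEW.FMT command from RFC 2980) and that header names compare case-insensitively; `overview_headers` and `choose_fetch_command` are hypothetical names, not pnews code.

```python
def overview_headers(fmt_lines):
    """Turn LIST OVERVIEW.FMT output into a set of lowercase header names.

    Lines look like "Subject:" or "Xref:full"; the ":full" suffix only says
    the header name is repeated inside the field, so it is dropped. RFC 3977
    metadata lines such as ":bytes" have no header name and are skipped.
    """
    names = set()
    for line in fmt_lines:
        name = line.strip().split(":", 1)[0]
        if name:
            names.add(name.lower())
    return names


def choose_fetch_command(wanted_headers, fmt_lines):
    """Return "XOVER" if every header the rules examine is in the overview
    database, otherwise "HEAD" (one request per candidate article)."""
    available = overview_headers(fmt_lines)
    wanted = {h.lower() for h in wanted_headers}
    return "XOVER" if wanted <= available else "HEAD"


fmt = ["Subject:", "From:", "Date:", "Message-ID:", "References:",
       "Bytes:", "Lines:", "Xref:full"]
print(choose_fetch_command({"From", "Subject"}, fmt))  # XOVER
print(choose_fetch_command({"From", "Path"}, fmt))     # HEAD
```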

a rule can contain any number of "if" clauses. the usual form is:

if (NOT) header <name> <matches|contains|startswith|endswith|regex> "string"

and there's one more form, for a special case:

if (NOT) inreplyto <email-address>

which means i'll have to look at the references: header, then go get the
parent message of this one, and look for that e-mail address in the from:
header. boy, is that going to be a pain in the butt.
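One way the `inreplyto` test could be sketched, assuming already-downloaded messages are kept in a dictionary keyed by Message-ID. By convention the last entry in References: is the direct parent (the first is the thread root, when the replying newsreader gets it right). `parent_id`, `is_reply_to`, the dict layout and the example addresses are all hypothetical, and the address comparison is case-insensitive as a simplification.

```python
from email.utils import parseaddr


def parent_id(headers):
    """Return the direct parent's Message-ID: the last entry in References:."""
    refs = headers.get("references", "").split()
    return refs[-1] if refs else None


def is_reply_to(headers, by_message_id, address):
    """True if the parent message is cached and its From: address is
    `address` (compared case-insensitively)."""
    pid = parent_id(headers)
    parent = by_message_id.get(pid) if pid else None
    if parent is None:
        return False  # parent never downloaded or already expired: can't tell
    _, sender = parseaddr(parent.get("from", ""))
    return sender.lower() == address.lower()


cache = {"<a1@x>": {"from": "Some One <someone@example.net>"}}
reply = {"references": "<root@x> <a1@x>"}
print(is_reply_to(reply, cache, "someone@example.net"))  # True
```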

after that, you've got the "then" line. a filter rule will allow you to:
apply a color; apply a positive or negative score, which will make this
message sort to the top or the bottom of the headers view, relative to other
scores; go ahead and download the message, if this group is set for
headers-only; "kill" the message, i.e., don't download it.

then you've got some optional parameters. you can specify a list of one or
more accounts that this rule will apply to; without it, the rule applies to
all accounts. and there's an optional list of newsgroups to apply the rule
to; otherwise, the rule applies to all groups. finally, there's an "expires"
parameter, so that this rule can time out after a while. this will be useful
when you decide to kill a thread; the associated rule will probably expire in
two weeks, by default.
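A sketch of how the optional account/group lists and the expiry date could narrow the rule list before a group is downloaded. It assumes an empty list means "all", that the expiry day itself still counts as live, and that rules are held in a small record type; the `Rule` fields and function names are illustrative only.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Rule:
    """One filter rule; empty account/group lists mean "all"."""
    conditions: list
    actions: list
    accounts: List[str] = field(default_factory=list)
    groups: List[str] = field(default_factory=list)
    expires: Optional[date] = None  # None: the rule never times out


def rules_for(rules, account, group, today):
    """Sub-list of rules that apply to one account/group download."""
    return [r for r in rules
            if (not r.accounts or account in r.accounts)
            and (not r.groups or group in r.groups)
            and (r.expires is None or today <= r.expires)]


def kill_thread_expiry(today):
    """Default lifetime for a "kill this thread" rule: two weeks."""
    return today + timedelta(weeks=2)


rules = [
    Rule([], ["kill"], groups=["group.alistair.likes.to.troll"]),
    Rule([], ["color"], accounts=["Grumpy", "Hog Tiller"],
         expires=date(2007, 1, 6)),
]
active = rules_for(rules, "Grumpy", "alt.internet.talk.haven", date(2005, 10, 5))
print(len(active))  # 1: only the account-scoped color rule
```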

special cases. when a rule is applied to the subject header, it will match as
if the "Re: " part wasn't there, if this is a reply. i'll have at least two
"pseudo-headers": author and e-mail address, since it would be damned
inconvenient to have to write a regex to rip up the from header. and if
there's a need, i'll probably interpret the date header and let you do
comparisons against it as if it were an integer representing the number of
microseconds since 1970.
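A sketch of the subject special case and the two pseudo-headers, using the standard library's `email.utils.parseaddr` to split From:. The pseudo-header names "author" and "email", the helper names, and the example address are assumptions. Only one leading "Re:" is removed here; a poster with no address at all simply gets an empty "email" value.

```python
import re
from email.utils import parseaddr

# A single leading "Re:" in any case, plus the whitespace after it.
_REPLY_PREFIX = re.compile(r"^\s*re:\s*", re.IGNORECASE)


def subject_for_matching(subject):
    """The subject as rules see it: one leading "Re: " removed."""
    return _REPLY_PREFIX.sub("", subject, count=1)


def pseudo_headers(from_header):
    """Split From: into hypothetical "author" and "email" pseudo-headers."""
    author, address = parseaddr(from_header)
    return {"author": author, "email": address}


print(subject_for_matching("Re: filtering"))  # filtering
print(pseudo_headers("Some Poster <poster@example.com>"))
# {'author': 'Some Poster', 'email': 'poster@example.com'}
```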

can anybody think of any cases this doesn't cover? speak now, before i get
too much stuff written to change my mind very much.
--
Maybe in a perfect world, Springsteen's idea of a good time was to drink a
few pints and listen to all six sides of Sandinista!, while avoiding
Clarence Clemons via caller I.D. -- Joshyboy of "Johnny Socko"
Alexander G. M. Smith
2005-10-08 17:29:12 UTC
if the user wants to filter against something exotic that isn't in the
overview database, then we'll have to do a HEAD for every message we're
thinking about downloading.
How about filtering by body content? Sometimes you want to do that for
generic spam detection. Maybe also have a plug-in for filters, kind of
like MDR in BeOS, so that the filter can get the whole message and
evaluate it. Or just the headers - MDR did that too for POP mailboxes by
only reading the header unless the filter tried to read past it into the
body. It used a nifty trick by making a custom BPositionIO
http://www.beunited.org/bebook/The%20Support%20Kit/PositionIO.html to turn
the message into a file stream, sort of like a Unix pipe, to feed the
filter.
special cases. when a rule is applied to the subject header, it will
match as if the "Re: " part wasn't there, if this is a reply. i'll have
at least two "pseudo-headers": author and e-mail address, since it would
be damned inconvenient to have to write a regex to rip up the from
header.
MDR had a pseudo-header just like you, called Thread, which had the
subject without Re:, Fw:, [MailingListID] and a dozen other things. You
could have one for real threads - would the thread root MessageID work?
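A sketch of an MDR-style Thread pseudo-header: strip any run of reply/forward prefixes and bracketed mailing-list tags, then collapse whitespace and lowercase. The prefix set below (Re, Fw, Fwd and the German Aw) is a guess at "a dozen other things", not MDR's actual list.

```python
import re

# One leading "Re:", "Fw:", "Fwd:", "Aw:" or "[list-id]" tag, any case.
_PREFIX = re.compile(r"^\s*(?:(?:re|fwd?|aw)\s*:|\[[^\]]*\])\s*",
                     re.IGNORECASE)


def thread_key(subject):
    """Normalize a subject so replies and forwards share one key."""
    previous = None
    while previous != subject:          # peel prefixes until none are left
        previous = subject
        subject = _PREFIX.sub("", subject, count=1)
    return " ".join(subject.split()).lower()


print(thread_key("Re: [beusers] Fw: RE:  Filtering"))  # filtering
```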

- Alex
Allen Brunson
2005-10-08 21:04:48 UTC
hi alex! haven't seen you around in a while. i was reading through the
comments on bebits the other day, and some guy who has made a firefox variant
puts a program of yours into the system boot script. i think it's making a
compressed ram disk or something?
Post by Alexander G. M. Smith
How about filtering by body content? Sometimes you want to do that for
generic spam detection.
if i decided to go that far, i'd probably bolt on some external anti-spam
thing, and let it have a go at incoming messages. that's probably outside the
scope of the project, though.
Post by Alexander G. M. Smith
Maybe also have a plug-in for filters, kind of like MDR in BeOS, so that
the filter can get the whole message and evaluate it.
WAY outside the scope of the project. heh!
Post by Alexander G. M. Smith
MDR had a pseudo-header just like you, called Thread, which had the
subject without Re:, Fw:, [MailingListID] and a dozen other things. You
could have one for real threads - would the thread root MessageID work?
what would it be good for? is anybody really going to write a rule that way?

perhaps i should do it that way when i implement my "kill this thread"
feature? then i could say, "any message with this root message-id is killed."
but i'm using that method to group subthreads together in the headers view,
and it doesn't always work, because a lot of newsreaders don't generate proper
references headers, i.e., the first reference is not always the root
reference. so i had planned to implement "kill this thread" by dumping
messages based on the subject line.
Alexander G. M. Smith
2005-10-08 23:27:51 UTC
Post by Allen Brunson
hi alex! haven't seen you around in a while.
Animation festival two weeks ago - 5 days of watching cartoons. Then the
next weekend was spent writing up the memories. And reading Iron Sunrise.
By the way, speaking of animation, I enjoyed Wallace and Gromit - The Curse
of the Were Rabbit last night. Yet more time away from the computer. Then
an unexpected tea this afternoon took away a few more hours. And there's
this interesting full time day job...
Post by Allen Brunson
i think it's making a compressed ram disk or something?
Mozilla/SeaMonkey and Pineapple run better from a RAM disk. I do something
similar too - create a RAM file system and then unzip the saved disk
contents into it (Pineapple News database included). There's also the older
AGMSRAMDiskDevice which implements a block device with automatic load and
save on boot / shutdown, including compression. I suspect that's what
they're using. But it allocates whole blocks for files so it isn't as
memory efficient as the RAM file system.
Post by Allen Brunson
WAY outside the scope of the project. heh!
Yup, better get that thing polished and out the door first.
Post by Allen Brunson
perhaps i should do it that way when i implement my "kill this thread"
feature? then i could say, "any message with this root message-id is killed."
That's what I was thinking - if the rules system was also your killfile
system.
Post by Allen Brunson
but i'm using that method to group subthreads together in the headers
view, and it doesn't always work, because a lot of newsreaders don't
generate proper references headers, i.e., the first reference is not
always the root reference.
Oh. Well, so much for that idea, unless you put in a lot of work making a
separate thread tracker system.
Post by Allen Brunson
so i had planned to implement "kill this thread" by dumping
messages based on the subject line.
It usually works well enough in MDR using hacked apart e-mail subjects as
the thread ID.

- Alex
Michael Koenig
2005-10-09 14:07:45 UTC
Post by Allen Brunson
so, who wants to have a go at the filtering stuff i've designed so far?
Sorry for the delay, I was really shot in the last few days...
Do I interpret the translation further down correctly that this doesn't mean
that the email address was found in the Reply-To field but rather that the
filter matches a reply to a posting from that address?
Ah, I guess this is the feature you planned to mark replies to ones own
postings, right?
Post by Allen Brunson
then color {6,50,30} score +10 download
Is the indentation obligatory? In that case you could drop the 'then' as
Python does.
Colour might be a useful tool to quickly spot interesting postings.
As I understand it, scoring affects the header view sorting. Does that apply
only to the scored message or also to the thread it is part of?
Do color and score have to be on the same line or is it just for easier
formatting of the example?
(Sorry, that's just me as a programmer trying to understand your rule
parser...)
Post by Allen Brunson
account "Grumpy,Hog Tiller"
group "alt.internet.talk.haven,welcher.talk.net"
expires Jan 6, 2007
I guess that's clear.
Post by Allen Brunson
group "group.alistair.likes.to.troll"
then kill
Do you plan on filtering more than just header lines and replies in the
future? I'm just asking, because if that's not the case the 'header' keyword
isn't really necessary, is it?
Is there a specific order for the "arguments" of the 'if' statement? I'm just
asking, because in the example above you had 'then' first and 'group' later,
here it's the other way round.
Post by Allen Brunson
if header path regex "<pretend i wrote a regex here>spamsite.com"
then kill
The nesting is basically an AND operation. I guess OR would demand a new rule
with the current syntax, right?
Post by Allen Brunson
if the user wants to filter against something exotic that isn't in the
overview database, then we'll have to do a HEAD for every message we're
thinking about downloading.
I have no idea how news servers are organized, so what actually *is* in the
overview database? Is that consistent for all news servers?
Post by Allen Brunson
a rule can contain any number of "if" clauses.
I guess a blank line marks the end of a filter rule, right?
Post by Allen Brunson
if (NOT) header <name> <matches|contains|startswith|endswith|regex> "string"
I think you cover the typical comparison criteria. The only thing I'd probably
drop would be the 'header' keyword if it isn't really needed.
Post by Allen Brunson
apply a color; apply a positive or negative score, which will make this
message sort to the top or the bottom of the headers view, relative to other
scores; go ahead and download the message, if this group is set for
headers-only; "kill" the message, i.e., don't download it.
Sounds good to me, but I have to admit that I didn't give filters too much
thought in that respect.
I do use some filters on my mail server though. The biggest limitations there
were that they did have starts-with but no ends-with (but they do now), and
that you can only filter on a few selected header lines, not all.
Since you already have all that, I guess there wouldn't be a limitation from
my point of view.
Post by Allen Brunson
special cases. when a rule is applied to the subject header, it will match
as if the "Re: " part wasn't there, if this is a reply.
Good idea.
Post by Allen Brunson
i'll have at least two
"pseudo-headers": author and e-mail address, since it would be damned
inconvenient to have to write a regex to rip up the from header.
Right, that's a good idea as well.
Post by Allen Brunson
and if there's a need, i'll probably interpret the date header and let you
do comparisons against it as if it were an integer representing the number
of microseconds since 1970.
Don't you mean seconds since 1970? At least that's what Unix is using, and
that wraps around in 2038 when stored in a signed 32-bit integer.
Post by Allen Brunson
can anybody think of any cases this doesn't cover? speak now, before i get
too much stuff written to change my mind very much.
My main use for filters would be to eliminate threads of stupid cross-posters
(most likely a regex comparison of the Newsgroups line to check for multiple
commas), posters without email addresses (Would that work with 'email contains
"<None>"'?), and answers to regular spam I killfiled ages ago (Probably just
comparing the subject line...).
For all of the other annoying posters the killfile should be sufficient. I'd
use the filters only where the killfile doesn't work, and in my case that's at
least one jerk who cross-posts with changing names, but most of the time he
doesn't have an email-address (thus I want to filter on that), and a filter
for extreme cross-postings (he normally posts to 5 groups at least) would take
care of answers to those postings as well.

I haven't really thought of using filters for sorting (scoring) yet, maybe I
get more ideas if I try to think of more uses in that field.

I guess having a filter to automatically copy a message to a folder doesn't
make too much sense, a) because unlike the other filters it would have to
happen after the download, b) you cannot reply to a message that was copied to
a folder.
--
M.I.K.e
Allen Brunson
2005-10-09 21:01:11 UTC
Post by Michael Koenig
Do I interpret the translation further down correctly that this doesn't mean
that the email address was found in the Reply-To field but rather that the
filter matches a reply to a posting from that address?
that is correct, yes. which means this rule will be harder to implement,
because i have to go looking through current messages to see which one this is
in reply to.
Post by Michael Koenig
Ah, I guess this is the feature you planned to mark replies to ones own
postings, right?
yep. and also to get rid of replies to people you don't like.
Post by Michael Koenig
Is the indentation obligatory? In that case you could drop the 'then' as
Python does.
the indentation is obligatory, and yes, i could drop the 'then', but that
would make parsing harder. i parse stuff line-by-line, deciding what type of
thing it is by the first word on the line.

i'm writing discrete code to parse rules, which i suppose is kind of LAYM. i
really should learn yacc or lex or something, but i don't think i will.
Post by Michael Koenig
Colour might be a useful tool to quickly spot interesting postings.
you can set a color as the result of a rule. is that enough?
Post by Michael Koenig
As I understand it, scoring affects the header view sorting. Does that apply
only to the scored message or also to the thread it is part of?
which do you think? i think the whole thread ought to get hoisted to the
score of its highest member.

threading should be more or less inviolable. if the user doesn't want that,
then there's the option in the view menu to turn it off.
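A sketch of that hoisting: each thread sorts by its highest-scoring member, which is also the optimistic answer for threads that mix positive and negative scores. Threads are modeled here as a plain mapping from a name to a list of member scores, which is an assumption for illustration.

```python
def thread_score(member_scores):
    """A thread ranks by its highest-scoring message."""
    return max(member_scores)


def sort_threads(threads):
    """Order threads (name -> list of member scores) best first.
    sorted() stays stable with reverse=True, so ties keep their order."""
    return sorted(threads, key=lambda name: thread_score(threads[name]),
                  reverse=True)


threads = {"alpha": [0, 0], "beta": [-20, 10], "gamma": [5]}
print(sort_threads(threads))  # ['beta', 'gamma', 'alpha']
```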
Post by Michael Koenig
Do color and score have to be on the same line or is it just for easier
formatting of the example? (Sorry, that's just me as a programmer
trying to understand your rule parser...)
yes, they have to be on the same line. and in my case, "the rule of the
parser" is "it's LAYM."
Post by Michael Koenig
Post by Allen Brunson
group "group.alistair.likes.to.troll"
then kill
Do you plan on filtering more than just header lines and replies in the
future? I'm just asking, because if that's not the case the 'header'
keyword isn't really necessary, is it?
maybe not, but it makes parsing easier. heh!
Post by Michael Koenig
Is there a specific order for the "arguments" of the 'if' statement? I'm
just asking, because in the example above you had 'then' first and 'group'
later, here it's the other way round.
'group' isn't part of the 'if' clause. 'group' and 'account' mean that the
rule only gets applied to the listed accounts and groups.

okay, technically, i guess that just means it's another form of 'if'. but for
me, i'm going to use it at a different phase of the process. like, just
before selecting a particular group, i'll scan through the rules list to see
which ones apply to this particular download, and make a sub-list of those
only.
Post by Michael Koenig
The nesting is basically an AND operation. I guess OR would demand a new
rule with the current syntax, right?
yep. OR would be hard to express, syntactically. the workaround would be to
make two rules.
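A sketch of evaluating one `if` clause with the five comparison operators, where every clause in a rule must match (AND) and OR comes from listing separate rules. Case-insensitive comparison, `re.search` semantics for `regex`, lowercase header keys, and the example addresses are all assumptions.

```python
import re


def clause_matches(op, value, pattern, negate=False):
    """Evaluate one 'if' clause against a header value."""
    v, p = value.lower(), pattern.lower()
    if op == "matches":
        hit = v == p
    elif op == "contains":
        hit = p in v
    elif op == "startswith":
        hit = v.startswith(p)
    elif op == "endswith":
        hit = v.endswith(p)
    elif op == "regex":
        hit = re.search(pattern, value, re.IGNORECASE) is not None
    else:
        raise ValueError("unknown operator: " + op)
    return hit != negate  # NOT flips the result


def rule_matches(clauses, headers):
    """A rule fires only if every clause matches (AND)."""
    return all(clause_matches(op, headers.get(name, ""), pat, neg)
               for name, op, pat, neg in clauses)


def any_rule_matches(rules, headers):
    """OR is written as separate rules: any one firing is enough."""
    return any(rule_matches(clauses, headers) for clauses in rules)


spam_rule = [("path", "regex", r"spamsite\.com", False),
             ("from", "contains", "friend@example.com", True)]
msg = {"path": "news.example!spamsite.com!not-for-mail",
       "from": "X <x@spamsite.com>"}
print(rule_matches(spam_rule, msg))  # True
```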
Post by Michael Koenig
I have no idea how news servers are organized, so what actually *is* in the
overview database?
the main commands you use to get article data are HEAD, which gets the headers
for a particular message, and ARTICLE, which gets both the headers and the
body. and there's also BODY, which will get just the body, assuming you've
already got the headers. that's not really enough information to do
filtering, though. if you have to do a HEAD for every message as part of the
filtering stage, it would take a long time.

therefore, news servers have this separate-but-parallel database of only a few
headers from every message, called "the overview database." there are three
commands to access its data, but according to one of the more knowledgeable
news admins i asked, two of them are LAYM. that leaves just one command,
XOVER. i suppose it will eventually move out of X-land, and be renamed just
OVER. but at the rate the NNTP protocol evolves, i don't think it's
reasonable to expect that to happen before the heat death of the universe.

you can request overview data for one article or a range of articles. the
server sends you the overview data in the format of one line per message, with
each header separated by a tab character. it doesn't send the header names,
i.e., the Date: and From: parts, just the meat. therefore, you have to know
the order that the headers are in, which is fixed.
Post by Michael Koenig
Is that consistent for all news servers?
and therein lies the rub. no, it is NOT consistent for all news servers.
they are all required to have, i think, the first three headers, which must
always be the same and in the same order. after that, the overview data can
contain zero or more additional headers. there's a command you can issue to
determine what those headers are, and you *have* to do it, because as i said,
not every news server will have the same headers in its overview database.
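For reference, RFC 2980 fixes more than three leading fields: after the article number come Subject, From, Date, Message-ID, References, byte count and line count, in that order, and only the fields after those vary by server. A sketch of pairing one XOVER line with the LIST OVERVIEW.FMT names, assuming that layout; fields marked ":full" (typically Xref) carry their own header name, which is stripped, and empty fields show up as adjacent tabs. The sample line is made up.

```python
def parse_overview_line(line, fmt_lines):
    """Map one XOVER response line to {lowercase-name: value}."""
    fields = line.rstrip("\r\n").split("\t")
    result = {"article": int(fields[0])}
    for spec, value in zip(fmt_lines, fields[1:]):
        name, _, suffix = spec.strip().partition(":")
        name = (name or suffix).lower()      # RFC 3977 writes ":bytes"
        if suffix.lower() == "full" and value.lower().startswith(name + ":"):
            value = value[len(name) + 1:].lstrip()
        result[name] = value
    return result


fmt = ["Subject:", "From:", "Date:", "Message-ID:", "References:",
       "Bytes:", "Lines:", "Xref:full"]
line = ("3000234\tfiltering\tAllen <a@example.com>\t"
        "Wed, 05 Oct 2005 01:51:37 GMT\t<m1@example.com>\t\t"
        "2500\t60\tXref: news.example.com alt.test:3000234")
ov = parse_overview_line(line, fmt)
print(ov["subject"], "|", ov["xref"])  # filtering | news.example.com alt.test:3000234
```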
Post by Michael Koenig
I guess a blank line marks the end of a filter rule, right?
i'm forcing the file format to be even stricter than that. a new rule must
begin at column zero. secondary lines must be indented by at least one
whitespace char. then the rule ends with a blank line.
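A sketch of a line-by-line reader for that layout: a rule starts at column zero, continuation lines are indented, a blank line ends it, and each line's first word says what kind of line it is. Rules come back as lists of (keyword, rest-of-line) pairs; turning those into conditions and actions is left out, and treating a column-zero line without a preceding blank line as an error is an assumed strictness choice. `parse_rules` is a hypothetical name.

```python
def parse_rules(text):
    """Split a rules file into rules made of (keyword, argument) pairs."""
    rules, current = [], None
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():               # blank line ends a rule
            if current:
                rules.append(current)
            current = None
            continue
        keyword, _, rest = line.strip().partition(" ")
        if line[0].isspace():              # indented: continues a rule
            if current is None:
                raise ValueError(f"line {lineno}: indented line outside a rule")
            current.append((keyword, rest.strip()))
        else:                              # column zero: starts a new rule
            if current:
                raise ValueError(f"line {lineno}: missing blank line")
            current = [(keyword, rest.strip())]
    if current:
        rules.append(current)
    return rules


sample = '''if header from matches "Some Poster"
  group "group.alistair.likes.to.troll"
  then kill

if header path regex "spamsite.com"
  if NOT header from contains "friend@example.com"
  then kill
'''
for rule in parse_rules(sample):
    print([kw for kw, _ in rule])
# ['if', 'group', 'then']
# ['if', 'if', 'then']
```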
Post by Michael Koenig
Post by Allen Brunson
and if there's a need, i'll probably interpret the date header and let you
do comparisons against it as if it were an integer representing the number
of microseconds since 1970.
Don't you mean seconds since 1970? At least that's what Unix is using, and
that wraps around in 2038 when stored in a signed 32-bit integer.
internally, pnews keeps time as microseconds-since-1970, stored in 64-bit
integers. that was pretty much the natural time format in beos, and i grew to
like it. so now i've imposed it on macosx as well (heh).
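The arithmetic behind both positions, as a quick check: a signed 32-bit count of seconds runs out in January 2038, while a signed 64-bit count of microseconds lasts roughly 292,000 years. The helper converting a Date: header to that integer uses the standard library's `parsedate_to_datetime`; it is an illustration, not pnews code, and treating a "-0000" (unknown) zone as UTC is an assumption.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_header_to_usecs(value):
    """Date: header -> integer microseconds since 1970 (UTC)."""
    dt = parsedate_to_datetime(value)
    if dt.tzinfo is None:                 # "-0000": zone unknown, assume UTC
        dt = dt.replace(tzinfo=timezone.utc)
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


# Last moment a signed 32-bit seconds counter can represent:
print(datetime.fromtimestamp(2**31 - 1, tz=timezone.utc))  # 2038-01-19 03:14:07+00:00
# Years a signed 64-bit microseconds counter covers (average Gregorian year):
print(round((2**63 - 1) / 1_000_000 / (365.2425 * 86_400)))  # 292277
```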
Post by Michael Koenig
posters without email addresses (Would that work with 'email contains
"<None>"'?),
more like "email equals <None>", but yeah.
Post by Michael Koenig
and answers to regular spam I killfiled ages ago (Probably just
comparing the subject line...).
i think "kill thread" will work for that. i'll make a few such rules
auto-generated from menu items.
Post by Michael Koenig
For all of the other annoying posters the killfile should be sufficient. I'd
use the filters only where the killfile doesn't work, and in my case that's
at least one jerk who cross-posts with changing names, but most of the time
he doesn't have an email-address (thus I want to filter on that),
there is probably some other header you could use. some isps insert a header
that contains the poster's IP address. some others have a unique identity
header. i notice yours does, for example:

X-Trace: individual.net BljPJqh6x2ss1v2xgOrdKwDBq63aXBDX3HWvszjBFhFnql/Gc=

i'm pretty sure that is supposed to uniquely identify a poster.
Post by Michael Koenig
I guess having a filter to automatically copy a message to a folder doesn't
make too much sense, a) because unlike the other filters it would have to
happen after the download, b) you cannot reply to a message that was copied
to a folder.
that wouldn't be too hard, actually. would you really want to do that? under
what circumstances?

i assumed i'd have to add a rule like that when i get pmail working, for
mailing lists and so on. i can't see how it would be of much use for news,
though.
Alexander G. M. Smith
2005-10-10 15:42:26 UTC
Post by Allen Brunson
Post by Michael Koenig
I guess having a filter to automatically copy a message to a folder
doesn't make too much sense, a) because unlike the other filters it
would have to happen after the download, b) you cannot reply to a
message that was copied to a folder.
that wouldn't be too hard, actually. would you really want to do that?
under what circumstances?
Sounds a bit like the MDR (BeOS Mail Daemon Replacement) filtering system,
which has filters that let you redirect messages to different folders.
Unfortunately the code to do that is combined with the code to do the test.
So every time you think up a new kind of test (regular expression, Bayesian
spam filter, etc), you have to duplicate the code that moves a message to a
folder, and the GUI to configure the folder name etc. Would have been
better to have a fancier pipeline system where messages could be marked with
attributes and then have a mover command at the end of the filter chain to
look for those attributes and do the move. Hint, hint...
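Alex's pipeline suggestion, sketched under assumptions: test stages only attach attributes to a message, and a single mover stage at the end of the chain acts on them, so a new kind of test never needs its own copy of the move code or folder settings. The attribute name "folder", the stage functions and the `move_to` callback are hypothetical; when two stages set the same attribute, the later one wins.

```python
def list_filter(msg):
    """A test stage: tags mailing-list traffic, moves nothing itself."""
    if msg["headers"].get("list-id"):
        msg["attrs"]["folder"] = "lists"


def spam_filter(msg):
    """Another test stage; could equally be a regex or Bayesian check."""
    if "spamsite.com" in msg["headers"].get("path", ""):
        msg["attrs"]["folder"] = "junk"


def mover(msg, move_to):
    """The only stage that knows how to move a message."""
    folder = msg["attrs"].get("folder")
    if folder:
        move_to(msg, folder)


def run_chain(msg, stages, move_to):
    for stage in stages:
        stage(msg)
    mover(msg, move_to)


moved = []
msg = {"headers": {"list-id": "<beusers.example.org>"}, "attrs": {}}
run_chain(msg, [list_filter, spam_filter], lambda m, f: moved.append(f))
print(moved)  # ['lists']
```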

- Alex
Allen Brunson
2005-10-10 21:02:35 UTC
Would have been better to have a fancier pipeline system where messages
could be marked with attributes and then have a mover command at the end
of the filter chain to look for those attributes and do the move. Hint,
hint...
d00d. i would no more connect two things together like that than i would take
an elephant to a bowling alley. because everybody KNOWS elephants don't like
bowling. DUH. also, those places never have bowling shoes big enough for
them.

one of my new beta testers was pretty happy to sign up for testing duty. he
said he had done the same thing for the halime guy, back when that program was
still being developed. i'm pretty sure he thought "beta testing" would mean
"find some scenario where the program crashes, then report it." it's been
about two weeks now, and despite his moving to pnews full time, he just
e-mailed me that he has STILL not seen it crash even once. his attitude was
that this must be some weird statistical anomaly, like he'd hit a hole in one
and won the lottery on the same day.

it's not that there's *never* a crash. michael has seen a couple. my other
new guy, ciprol, got a big one recently, but that's because he was the first
to use a new feature that he'd just requested. but the deal is, crash
scenarios get reported, then fixed, then never heard from again.

but that means my coding style suffers in other areas. i over-engineer
everything, so i add new features more slowly than most. i've got two guys
who are on powerbooks, and their hard disk subsystems are slower and
relatively louder than for powermacs, so both those guys think that my bulk
operations (marking all messages read, for instance) are too slow. they can
hear their hard disks grinding away and it makes them gnash their teeth.
those of us on powermacs can't hear our hard disks, because they are drowned
out by the fan. heh! in my world, slow-and-fully-encapsulated always wins
over fast-and-potentially-dangerous, which makes the situation a little worse.
the powerbook guys would probably be willing to trade the occasional crash for
faster disk operations.

i really wish i had time to back-port all this cool stuff i've written to
beos. in the last month or so, i've gotten three registrations. two of them
were for the beos version, i'm sure, because they came in before macosx/pnews
went public. that was after a long drought, so i have to assume zeta is
actually promoting beos usage. the other was shortly after the public
release, so i have to assume that one was for macosx. so even with no real
updates for years, the beos version is out-selling the mac version. oy vey.

small wonder. beos/pnews may well be the best gui newsreader on the platform.
on macosx, there's lots of credible alternatives, most with way more features
than i've got now. the one area where i can claim superiority is language
support, but that's only an issue for non-english-speakers, which make up an
abysmally small percentage of the world's (paying) software consumers.
Steve Hodgson
2005-10-11 19:14:57 UTC
The one area where i can claim superiority is language
support, but that's only an issue for non-english-speakers, which make up an
abysmally small percentage of the world's (paying) software consumers.
And people who care about getting these things right - an even smaller
percentage!

Apologies for keeping my head down of late. Things are all just too frantic
and I haven't even got round to 0.8.2 yet.

Cheers,

Steve
Allen Brunson
2005-10-11 22:09:12 UTC
Post by Steve Hodgson
The one area where i can claim superiority is language support, but
that's only an issue for non-english-speakers, which make up an
abysmally small percentage of the world's (paying) software consumers.
And people who care about getting these things right - an even smaller
percentage!
i'd guess you're underestimating the potential for havoc. you're probably
thinking about what happens when a program improperly decodes a message in,
say, french. in that case, maybe one character out of 30 will be wrong, like
all the ones with diacritical marks. but if a program improperly decodes a
message in chinese, the result is completely unusable. every single character
will be wrong. the message is as useless as if it were encrypted.
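A quick demonstration of that asymmetry, assuming the common case of UTF-8 bytes decoded as Latin-1 (ISO-8859-1); the same thing happens with GB2312 or Big5 text read with the wrong charset. In French only the accented letters turn to junk, while in Chinese not a single character survives.

```python
french = "café crème"
chinese = "中文新闻组"

# Encode correctly as UTF-8, then decode with the wrong charset.
bad_french = french.encode("utf-8").decode("latin-1")
bad_chinese = chinese.encode("utf-8").decode("latin-1")

print(bad_french)                                # cafÃ© crÃ¨me
print(sum(c in bad_chinese for c in chinese))    # 0 of 5 characters survive
```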

other programs' decoding mistakes are my opportunity to shine.
Post by Steve Hodgson
Apologies for keeping my head down of late. Things are all just too frantic
and I haven't even got round to 0.8.2 yet.
eh, it's just as well. i don't expect the program will get interesting for
you until i've got filtering implemented, and i have to admit, i'm putting
that farther and farther back on the schedule, because i prefer to work on
language-oriented features, which i find more interesting. the more i look at
all the stuff i'll have to change to make the program suitable for
localizations, the more horrified i get.
Steve Hodgson
2005-10-11 19:24:23 UTC
Post by Allen Brunson
both those guys think that my bulk
operations (marking all messages read, for instance) are too slow.
4.5 minutes to set 132 visible headers as read. It does set unread state for
7000+ messages before the headers change state.

Cheers,

Steve
Allen Brunson
2005-10-11 22:02:01 UTC
Post by Steve Hodgson
4.5 minutes to set 132 visible headers as read. It does set unread state for
7000+ messages before the headers change state.
you know what, i fixed that, last night. the program now uses the cache
database to determine which messages need to be changed, and only operates on
those. so in your case, where only 132 need to be marked read out of 7,000
messages, the speed-up may well be around two orders of magnitude.
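A sketch of the idea: consult a cache of read flags first and write only the articles whose state actually changes. Scanning the in-memory cache is cheap; the savings come from skipping the slow per-article disk writes. The dict-as-cache and the `write_flag` callback are assumptions standing in for the real cache database.

```python
def mark_read(article_ids, read_cache, write_flag):
    """Mark articles read, writing only those still unread.

    read_cache maps article id -> bool (True = read); write_flag stands in
    for the slow on-disk update. Returns the number of writes performed.
    """
    to_change = [a for a in article_ids if not read_cache.get(a, False)]
    for article in to_change:
        write_flag(article, True)
        read_cache[article] = True
    return len(to_change)


cache = {n: True for n in range(7000)}               # already read
cache.update({n: False for n in range(7000, 7132)})  # 132 still unread
writes = []
n = mark_read(range(7132), cache, lambda a, v: writes.append(a))
print(n)  # 132
```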

this will be in 0.8.3, which i will probably have to release prematurely,
because i just discovered a near-fatal bug in my encoding conversion routines.
ciprol is trying to pimp the program to his chinese friends, so it's no good
if it doesn't work for a lot of their groups.
Alexander G. M. Smith
2005-10-12 02:58:38 UTC
Post by Allen Brunson
d00d. i would no more connect two things together like that than i would
take an elephant to a bowling alley. because everybody KNOWS elephants
don't like bowling.
Nice to hear. You have become wise in your trials. Plus a lack of crashing
should cut down on customer support.
Post by Allen Brunson
i really wish i had time to back-port all this cool stuff i've written to
beos.
I wish I had time too. Still working on updating the spam checker
tokenization to something more modern - handling those HTML tricks spammers
use. Plus some more international stuff, including truncating long Asian
"words" down to a single character since they're likely a whole sentence,
not a word.
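A sketch of that tokenizer tweak. Chinese and Japanese text has no spaces, so a whitespace tokenizer hands back whole sentences as single "words". The threshold of three characters, keeping the first character, and testing only the CJK Unified Ideographs block (U+4E00 to U+9FFF) are all guesses, not the spam checker's actual rules.

```python
MAX_CJK_WORD = 3  # hypothetical threshold


def is_cjk(ch):
    """CJK Unified Ideographs block only; real code needs more ranges."""
    return "\u4e00" <= ch <= "\u9fff"


def normalize_token(token):
    """Cut long all-CJK runs down to their first character."""
    if len(token) > MAX_CJK_WORD and all(is_cjk(c) for c in token):
        return token[0]
    return token


print(normalize_token("中文新闻组"))  # 中
print(normalize_token("新闻"))        # 新闻
```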
Post by Allen Brunson
the one area where i can claim superiority is language support, but
that's only an issue for non-english-speakers, which make up an
abysmally small percentage of the world's (paying) software consumers.
A spot on the small dot, you are. The dot being Apple in the PC world.
Still, you have a few thousand potential eager customers in China for a
good news reader. More than BeOS users, you'd think. Maybe they also have
money to pay for your software!

- Alex
Michael Koenig
2005-10-19 17:57:36 UTC
Allen Brunson wrote:
[replyto]
Post by Allen Brunson
Post by Michael Koenig
Ah, I guess this is the feature you planned to mark replies to ones own
postings, right?
yep. and also to get rid of replies to people you don't like.
Isn't that just direct replies or will it take care of the whole thread some
troll starts? (The latter case would probably be mind-bogglingly difficult...)
Post by Allen Brunson
the indentation is obligatory, and yes, i could drop the 'then', but that
would make parsing harder.
It was a silly question anyway, because the other example had the 'then' line
in a different position which would make it nearly impossible to identify
without the leading keyword.
Post by Allen Brunson
i parse stuff line-by-line
Out of interest, when is that done? Just during the application start?
Post by Allen Brunson
really should learn yacc or lex or something, but i don't think i will.
I'm pretty sure lex and yacc are pretty cool (I don't know how to use them
either), but the scanners and parsers are hardly the most efficient. This is
one of the reasons why the LCC compiler with hand-coded scanner and parser is
so fast (the fact that it lacks a lot of code optimization is another).

[score effects on threads]
Post by Allen Brunson
which do you think? i think the whole thread ought to get hoisted to the
score of its highest member.
I'd say that threads containing messages with a higher score should be sorted
to the beginning of the headers view, and it probably should work the same way
for sub-threads.
But how to deal with threads that contain positive and negative scores?
Post by Allen Brunson
'group' isn't part of the 'if' clause. 'group' and 'account' mean that the
rule only gets applied to the listed accounts and groups.
I was just wondering because in one example the 'then' line didn't come
directly after the 'if' clause.

[overview database]
Post by Allen Brunson
Post by Michael Koenig
Is that consistent for all news servers?
and therein lies the rub. no, it is NOT consistent for all news servers.
they are all required to have, i think, the first three headers, which must
always be the same and in the same order.
So filtering on From and Newsgroups (to eliminate cross-posters) seems like
the most efficient option supported by all servers, right?
Post by Allen Brunson
internally, pnews keeps time as microseconds-since-1970, stored in 64-bit
integers.
OK, with 64-bit it's a bit different.
Post by Allen Brunson
that was pretty much the natural time format in beos, and i grew
to like it. so now i've imposed it on macosx as well (heh).
Sneaky bastard ;-)
Post by Allen Brunson
Post by Michael Koenig
posters without email addresses (Would that work with 'email contains
"<None>"'?),
more like "email equals <None>", but yeah.
I guess we both got it wrong, and the keyword should be 'matches' ;-)
Post by Allen Brunson
i'll make a few such rules auto-generated from menu items.
Good idea.
I guess with your love for GUI you probably don't want to make a graphical
rule builder ;-)
Post by Allen Brunson
Post by Michael Koenig
I guess having a filter to automatically copy a message to a folder doesn't
make too much sense, a) because unlike the other filters it would have to
happen after the download, b) you cannot reply to a message that was copied
to a folder.
that wouldn't be too hard, actually. would you really want to do that?
under what circumstances?
i assumed i'd have to add a rule like that when i get pmail working, for
mailing lists and so on. i can't see how it would be of much use for news,
though.
You got me there, I use such a rule for filtering mailing lists, and while
trying to think of possible rules I fell back on that one.
--
M.I.K.e
Allen Brunson
2005-10-20 04:56:56 UTC
Post by Michael Koenig
Post by Allen Brunson
Post by Michael Koenig
Ah, I guess this is the feature you planned to mark replies to ones own
postings, right?
yep. and also to get rid of replies to people you don't like.
Isn't that just direct replies or will it take care of the whole thread some
troll starts? (The latter case would probably be mind-bogglingly difficult...)
just direct replies. if you want to kill a whole thread, i'll do that by
subject.
Post by Michael Koenig
Post by Allen Brunson
i parse stuff line-by-line
Out of interest, when is that done? Just during the application start?
yes, just when it reads the rules file from disk at startup. after that,
rules are stored in memory in some binary state. in stl containers, most
likely.
Post by Michael Koenig
[score effects on threads]
Post by Allen Brunson
which do you think? i think the whole thread ought to get hoisted to the
score of its highest member.
I'd say that threads containing messages with a higher score should be
sorted to the beginning of the headers view, and it probably should work
the same way for sub-threads.
But how to deal with threads that contain positive and negative scores?
i think i have to go optimistic rather than pessimistic, and always score a
thread according to its highest-scoring member.
Post by Michael Koenig
[overview database]
Post by Allen Brunson
Post by Michael Koenig
Is that consistent for all news servers?
and therein lies the rub. no, it is NOT consistent for all news servers.
they are all required to have, i think, the first three headers, which must
always be the same and in the same order.
So filtering on From and Newsgroups (to eliminate cross-posters) seems like
the most efficient option supported by all servers, right?
that's one way to look at it, yeah. filtering on stuff found in the overview
database is definitely faster than having to do a HEAD for every article.
Post by Michael Koenig
I guess with your love for GUI you probably don't want to make a graphical
rule builder ;-)
i'll probably have to, but dang, it's sure not going to be fun. the first
version of the program that has filtering definitely won't have it. i'll make
sure i've got everything else more or less worked out, then write the
graphical filter editor window last. i guess i'll look at how mail.app does
it, for inspiration.
