(Use DNews 5.2b2 or later for these rules)
New builtin filtering system
This filter system should give enough flexibility that most common requirements for
blocking individual spammers can be met without resorting to Perl scripts (like
cleanfeed). It is NOT intended as a replacement for such scripts: if you really want that
level of flexibility, this version of DNews does support using them.
Design Goals:
Fast processing of incoming messages.
A simple clear syntax so rules can easily be understood and modified
Enough power/flexibility to implement 99% of a filter like cleanfeed.
Incorporate regular expression matching.
Not to create a new elaborate language if possible.
There are 5 valid statements in a rule file:
Assignment
Action
if (Conditional_Expression) [and (Conditional_Expression)...] Action
else
end if
Assignment
$variable_name = "quoted string" [+ "quoted string" [+ $variable ...]]
Action
accept "reason" | reject "reason" | then
| setflag("flagname") | clearflag("flagname")
Conditional_Expression
Any pre defined function, e.g. isbinary(),
isin("subject","free pictures")
Numeric comparisons, e.g. lines()>100
Simple NOT operator, e.g. if (!isbinary()) reject "Only binaries allowed here mate!"
Calculations are NOT permitted, e.g. lines()+10 would fail
Line continuation
Lines can be continued by ending the line with a '\' character.
Quoting strings
All strings and header names should be within double quotes. Sometimes you may get away
without doing this, but we don't guarantee this will work in future,
e.g. use: exists("Supersedes") not exists(Supersedes). Quotes can
be escaped in the usual way, e.g. "This \"Word\" has quotes around it"
Assignments are processed at compile time, variables DO NOT exist at run time.
Don't think of this as a programming language, but rather as a list of rules that are
processed with each incoming message. Real run-time variables only exist in the form of
the ifflag("xxx") function and the setflag("xxx") action.
For example, the following is NOT VALID, because the assignment is processed before the
rules are run. The rejection would always read "big message".
$fred = "small message"
if (lines()>100) then
$fred = "big message"     (this will not work as expected)
end if
reject $fred
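The effect of compile-time assignment can be illustrated with a toy model in Python. This is only a sketch of the idea, not how DNews is implemented: the function name compile_rules and the simple text substitution are assumptions made for the illustration. Every assignment is consumed while the file is read, whether or not it sits inside an if block, so by the time the reject line is reached $fred already holds the last value assigned.

```python
def compile_rules(lines):
    """Toy model (hypothetical): resolve $variables while reading the file."""
    variables = {}
    compiled = []
    for line in lines:
        text = line.strip()
        # Assignments are consumed here, regardless of any enclosing 'if'.
        if text.startswith("$") and "=" in text:
            name, value = text.split("=", 1)
            variables[name.strip()] = value.strip()
            continue
        # Remaining lines get the values known so far substituted in.
        for name, value in variables.items():
            text = text.replace(name, value)
        compiled.append(text)
    return compiled


rules = [
    '$fred = "small message"',
    'if (lines()>100) then',
    '$fred = "big message"',
    'end if',
    'reject $fred',
]
compiled = compile_rules(rules)
for line in compiled:
    print(line)
```

In this model no variable survives into the compiled rules, and the final line is always reject "big message", which is the behaviour described above.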
Builtin Functions - details below
isin("header","string-not-case-sensitive")
isinc("header","string-not-case-sensitive")
rexp("header","regular-expression")
rexp_case("header","regular-expression")
match("header","wildcard")
size()
matchall("header","wildcardlist")
matchone("header","wildcardlist")
ifflag("flag-name")
exists("header")
isbinary()
ishtml()
isencodedhtml()
isencodedtext()
isencodedurl()
isbase64()
isimage()
islocalpost()
lines()
allmod()
strcmp()
New tellnews commands:
tellnews rules_reload
tellnews rules_trace rulefile.rul testmessages.txt
In newsfeeds.conf you specify a rule file like this:
site me
groups *
rules d:\dnews\me.rul
...
Sample Rules File
A sample rules file is available here. We make no claims with regard to its correctness/completeness but it should demonstrate the sort of thing that is possible with the new rules system. If you choose to base your rules file on it then we recommend you customise it to make sure it meets your specific needs.
You may also be interested in NoCem support or External Filters
attach("wildcardlist")
This function is 'true' if the message contains any attachments which match the wild card list, e.g.
if (attach("*.exe,*.com,*.vbs,*.bat,*.jav*")) reject "Message contains executable programs"
This was added in DNews 5.4j7.
isin("header","string-not-case-sensitive")
isinc("header","string-not-case-sensitive")
isin() is a simple 'content' searching function: it is true if the named header contains the string (the match is not case sensitive), e.g.
if (isin("Subject","Free")) reject "Probably a spammer selling something"
This would reject a message with a subject of "Get your Free pictures here". It would also reject a message with a subject of "Is there any real freedom in the world?", so it's probably not a good rule :-)
isinc(), found in DNews 5.5d1 and later, works like isin() except that it first 'cleans' the string, removing any characters that are neither alphanumeric nor spaces, before the comparison. In the above example, it would match both "Free" and "F~R~E~E".
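The difference between the two functions can be sketched in Python. This is a rough model under stated assumptions, not the DNews code: it assumes only the header value is cleaned (not the search string), and the helper name clean is hypothetical.

```python
def isin(header_value, needle):
    """Case-insensitive substring test, as described for isin()."""
    return needle.lower() in header_value.lower()


def clean(text):
    """Keep only alphanumeric characters and spaces (assumed cleaning rule)."""
    return "".join(ch for ch in text if ch.isalnum() or ch == " ")


def isinc(header_value, needle):
    """Like isin(), but the header value is cleaned first."""
    return needle.lower() in clean(header_value).lower()


print(isin("Get your Free pictures here", "Free"))
print(isin("F~R~E~E pictures", "Free"))
print(isinc("F~R~E~E pictures", "Free"))
```

In this model the obfuscated subject slips past isin() but is caught by isinc(). Both still match "freedom", so the caveat about over-broad rules applies to either function.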
rexp("header","regular-expression")
rexp_case("header","regular-expression")
These functions search the specified header for a regular expression. Use rexp_case() for case-sensitive regular expressions.
match("header","wildcard")
This function applies a simple wild card matching algorithm, as is typically used to match file names, e.g. match("From","*@netwin.co.nz*") would match a message from that domain.
size()
Returns the size in bytes of the current message. It can be used with the > and < operators.
matchall("header","wildcardlist")
Used for matching a single wild card against a header which contains a list of values, like Newsgroups:, Path:, etc. The match is TRUE only if all entries in the list match, e.g.
if (matchall("Newsgroups","news.filters.*")) accept "It is only in the filters list so we will accept it"
matchone("header","wildcardlist")
Identical to the above function, but returns TRUE if any entry matches.
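The three wild card functions can be approximated in Python with the standard fnmatch module. This is a sketch only: it assumes the header list is comma separated (as in Newsgroups:), that matching is case sensitive, and that one wild card is passed, as the description states. None of these details are confirmed for DNews itself, and split_list is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def match(header_value, pattern):
    """File-name style wild card match against the whole header value."""
    return fnmatchcase(header_value, pattern)


def split_list(header_value):
    """Split a comma-separated header into entries (assumed separator)."""
    return [item.strip() for item in header_value.split(",") if item.strip()]


def matchall(header_value, pattern):
    """True only if every entry in the list matches the wild card."""
    items = split_list(header_value)
    return bool(items) and all(fnmatchcase(item, pattern) for item in items)


def matchone(header_value, pattern):
    """True if at least one entry in the list matches the wild card."""
    return any(fnmatchcase(item, pattern) for item in split_list(header_value))


print(match("Fred <fred@netwin.co.nz>", "*@netwin.co.nz*"))
print(matchall("news.filters.spam,alt.test", "news.filters.*"))
print(matchone("news.filters.spam,alt.test", "news.filters.*"))
```

A message cross-posted outside news.filters.* fails matchall() but still passes matchone(), which is the distinction to keep in mind when choosing between them.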
ifflag("flag-name")
Used to check if a flag variable has been set to true, which can be done with the setflag("flag-name") action, e.g.
if (size()>100000) setflag("bigitem")
if (isimage()) setflag("bigitem")
if (ifflag("bigitem")) reject "It was a big item or had a picture in it"
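Flags are the only state that exists while a message is being processed. The "bigitem" example can be modelled in Python as a per-message set of flag names. This is an illustrative sketch: the Message class, its fields and run_rules are hypothetical, and returning None when no rule fires is an assumption, since the default outcome is not described here.

```python
class Message:
    """Hypothetical stand-in for an incoming article."""

    def __init__(self, size, has_image):
        self.size = size
        self.has_image = has_image
        self.flags = set()  # flags start out unset (false)


def run_rules(msg):
    # if (size()>100000) setflag("bigitem")
    if msg.size > 100000:
        msg.flags.add("bigitem")
    # if (isimage()) setflag("bigitem")
    if msg.has_image:
        msg.flags.add("bigitem")
    # the final rule tests the flag, not the size
    if "bigitem" in msg.flags:
        return ("reject", "It was a big item or had a picture in it")
    return None  # no rule fired in this model


print(run_rules(Message(size=200000, has_image=False)))
print(run_rules(Message(size=500, has_image=True)))
print(run_rules(Message(size=500, has_image=False)))
```

Either condition sets the same flag, so a single later rule can act on both cases.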
exists("header")
This is true if the header exists in the message and is non zero in length, e.g. if (exists("supersedes")) then reject "We don't like supersedes headers"
isbinary()
This is true if the message has binary data, either base64 encoding or uuencoded data.
ishtml()
This is true if the message appears to contain HTML instead of plain text data.
isencodedhtml()
This is true if the message appears to contain mime or uuencoded HTML instead of plain text data.
isencodedtext()
This is true if the message appears to contain mime or uuencoded text data. This will always be true if isencodedhtml() returns true.
isencodedurl()
This is true if the message appears to contain a uuencoded URL reference.
isbase64()
This is true if the message appears to contain base64 binary encoded data.
isimage()
This is true if the message appears to contain a picture (either mime or uuencoded).
islocalpost()
This is true if the message is a local post.
lines()
This returns the number of lines in the message.
allmod()
This returns true if all the newsgroups in the specified header are moderated.
strcmp()
Compares the value of the specified header with a given string.
Actions
accept "reason"
Accepts the current article reporting the "reason" specified in the log files.
reject "reason"
Rejects the current article reporting the "reason" specified in the log files.
setflag("flag-name")
Used to set the specified flag variable to the true state.
clearflag("flag-name")
Used to set the specified flag variable to the false state.
Special Flags
There are three special internal flags that can be used from within a rules file to bypass other spam filter routines internal to DNews.
Special Flag | Default Value | Description |
skip_filter | false | Don't check body for words in filter.dat file |
skip_from | false | Don't apply duplicate FROM filter spam rules |
skip_dup | false | Don't apply duplicate body filter |
In the rules file you can use actions like setflag("skip_from") to stop DNews from checking the From header. For example, if you don't want it to apply the from filter to messages from anyone at 'netwin.co.nz' then:
if (isin("From","@netwin.co.nz")) setflag("skip_from")
Specials:
\s = white space
\S = not white space
\d = digit
\D = not digit
\b = word boundary
\B = not word boundary
\x00 = Hex character
. (period) represents any one character.
[] (brackets) contain a set of characters from which a match can be made. It corresponds
to one character in the search string.
\ (backslash) is an escape character which means that the next character will not have a
special meaning.
* (asterisk) is a multiplier. It will match zero or more of the previous character. (Note
that it's not a wildcard character as in file names.)
? (question mark) is a multiplier. It will match zero or one of the previous character.
(Note that it's not a wildcard character as in file names.)
+ (plus) is a multiplier. It will match one or more of the previous character.
{} (squiggly brackets) contain a number which specifies an exact number of the previous
character, or a range, e.g. {2,3}.
[^] (brackets containing caret and other characters) means any character except the
character(s) after the caret symbol in the brackets.
^ (caret) is the start of the line.
$ (dollar) is the end of the line.
"\<" "\>" (beginning and end of word) are not implemented, use \b instead.
[:alpha:] represents any alphabetic letter.
[:digit:] represents any single-digit number.
[:blank:] represents a space or tab.
| (pipe) is OR. It requires that the joined expressions have parentheses around them.
Examples:
e.a matches eta, eda, e1a, but not Eta
[eE].a matches eta and Eta
E.*a matches Eudora, Etcetera, Ea
ho+p matches hop, hoop, hoooop, but not hp
etc\. matches etc. but not etc
Lookahead operator
Free(?!dom|bsd) matches freesex but not freedom or freebsd
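The examples above, including the lookahead, can be tried out with Python's re module. This is only a stand-in for experimenting with patterns: DNews uses its own regular expression engine, and the two may differ in detail (for instance, Python's re does not support the [:alpha:] style classes). The helper name check is hypothetical, and the lookahead test assumes a case-insensitive search.

```python
import re


def check(pattern, should_match, should_not_match, flags=0):
    """True if the pattern is found in every 'should' string and in no other."""
    rx = re.compile(pattern, flags)
    return (all(rx.search(s) for s in should_match)
            and not any(rx.search(s) for s in should_not_match))


examples = [
    (r"e.a", ["eta", "eda", "e1a"], ["Eta"]),
    (r"[eE].a", ["eta", "Eta"], []),
    (r"E.*a", ["Eudora", "Etcetera", "Ea"], []),
    (r"ho+p", ["hop", "hoop", "hoooop"], ["hp"]),
    (r"etc\.", ["etc."], ["etc"]),
]
results = [check(*example) for example in examples]

# Lookahead: "free" not immediately followed by "dom" or "bsd".
lookahead_ok = check(r"Free(?!dom|bsd)", ["free pictures"],
                     ["freedom", "freebsd"], re.IGNORECASE)

print(results, lookahead_ok)
```

The lookahead avoids the problem seen with isin("Subject","Free"), where "freedom" was caught by accident.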