Mfilter Rule Syntax

Design Goals:

  • Fast processing of incoming messages.
  • A simple, clear syntax so that rules can easily be understood and modified
  • Enough power/flexibility
  • Incorporate regular expression matching to give real power.
  • Not to create a new elaborate language if possible.

How to configure rules:

  • Create a file called mfilter.rul in the SurgeMail home area (as defined in SurgeMail's config).
  • Use the test command in the SurgeMail admin interface to check that your filter works as expected.
  • Please note that scoring for ASPAM should be added to "local.rul", not mfilter.rul.

Tracing problems

If you have problems getting mfilter to run, you can add these two settings to surgemail.ini. They provide logging that shows exactly what is going on.

g_mfilter_trace "true"
g_mfilter_noisey "true"

Then examine 'mail.log' after sending in a test message.

Syntax Of mfilter.rul File

There are 6 valid statements in a rule file:

Assignment
Action
if (Conditional_Expression) [and (Conditional_Expression)...] Action
else
end if
call built_in_function()

Assignment

$variable_name = "quoted string" [+ "quoted string" [+ $variable ...]]
$variable_name = function()

Action

accept "reason" | bounce "reason" | drop "reason" | forward "user@domain" | then | setflag("flagname") | clearflag("flagname")

Conditional Expression (if, else, end if)

Any pre-defined function, e.g. isbinary()
isin("subject","free pictures")
Numeric comparisons, e.g. lines()>100
Simple NOT operator, e.g. if (!isbinary()) reject "Only binaries allowed here mate!"
Calculations are NOT permitted, e.g. lines()+10 would fail

Recipients block for processing individual recipients

A single mail message may have many recipients, and in many cases the actions of your spam filter should vary depending on the recipients (you might, for example, want all messages to your account to get through even if the same message would be blocked if sent to any other user).

The recipient block (recipients...end recipients) is processed once for each recipient of the message.

Inside the 'recipients' block there is a dummy variable defined 'recipient' which is the specific recipient in question.

All the actions (accept, bounce, drop, etc.) refer to the recipient only, not to the entire message. When one of the actions that normally terminates message processing is encountered, it is instead applied only to that recipient, and the recipient block is restarted with the next recipient.

(Example of mfilter rule to do processing 'per recipient')

recipients
       if (isin("recipient","manager@this.domain")) accept "Always accept for me so spammers can talk to me"
       if (isin("recipient","sales@your.domain")) then
       	if (isin("subject","order")) then
       		# Make a Duplicate of sale order
       		call forward_cc("sales_copy@your.domain")
       	end if
       end if
end recipients 

Miscellaneous

Line Continuation

Lines can be continued by ending the line in a '\' character

Quoting Strings

All strings and header names should be within double quotes. Sometimes you may get away without doing this, but we don't guarantee this will work in future, e.g. use exists("Supersedes"), not exists(Supersedes). Quotes can be escaped in the usual way, e.g. "This \"Word\" has quotes around it".

Assignments

Assignments are processed at compile time; variables DO NOT exist at run time. Do not think of this as a programming language, but rather as a list of rules that are processed with each incoming message. Real run-time variables only exist in the form of the isflag("xxx") function and the setflag("xxx") action.

For example, the following is NOT VALID, as the assignment is processed before the rules are run. The rejection would always read "big message"

$fred = "small message"
if (lines()>100) then
   # this assignment will not work as expected
   $fred = "big message"
end if
reject $fred
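
One way to picture this behaviour is as a two-pass process: every assignment is collected first, and only then are the remaining rules expanded. The Python sketch below illustrates that idea only; it is not mfilter's actual implementation. The helper name expand_assignments is hypothetical, and the assumption that a later assignment overwrites an earlier one is inferred from the statement that the rejection would always read "big message".

```python
import re

def expand_assignments(rule_lines):
    """Pass 1: collect every $name = "value" line, wherever it appears.
    Pass 2: substitute the final values into the remaining lines."""
    assign = re.compile(r'^\s*\$(\w+)\s*=\s*"([^"]*)"')
    variables = {}
    remaining = []
    for line in rule_lines:
        found = assign.match(line)
        if found:
            # A later assignment overwrites an earlier one (assumption).
            variables[found.group(1)] = found.group(2)
        else:
            remaining.append(line)

    def substitute(line):
        # Replace each $name with its quoted compile-time value.
        return re.sub(r'\$(\w+)',
                      lambda ref: '"%s"' % variables.get(ref.group(1), ""),
                      line)

    return [substitute(line) for line in remaining]

rules = [
    '$fred = "small message"',
    'if (lines()>100) then',
    '   $fred = "big message"',
    'end if',
    'reject $fred',
]
expanded = expand_assignments(rules)
```

In this sketch the if block is left empty and the final line always carries the last value assigned, regardless of the message size.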

Odd stuff

The statement 'do_bounce_fast' should appear at the end of your mfilter.rul file; it is used by the rexp_fast() rules. rexp_fast() acts just like rexp() but is much faster, because it searches the message once for all of the rules in question. Each rule must start with two simple (non regular expression) characters. This enables mfilter to generate a hash table of all the regular expressions it is going to search for, so that it can efficiently apply only the ones that appear to match as it runs through the message. rexp_fast() also includes the score to apply if the message matches the rule.

Actions & Commands

Actions

accept "reason" (Terminates processing)
bounce "reason" (Terminates processing)
reject "reason" (same as bounce)
forward "user@domain" (Terminates processing) (redirect is a synonym for this action)
print "reason" (Prints debugging line to log file mail.log)
setflag("flagname") "reason"
clearflag("flagname") "reason"

Functions that have actions must be preceded by the 'call' action, as they are really functions, and must be on a line of their own (not on the end of an if statement):

call forward_cc("new@email.address")
call replace("header_name","wildcard_match_pattern","replacement_pattern")
call report("manager@email.address","subject of message")

Builtin Functions

call add_header("Header: header information")
allmod()
exists("header")
head_len("header")
isbase64()
isbinary()
isencodedhtml()
isencodedtext()
isencodedurl()
isflag("flag-name")
ishtml()
isimage()
isin("header","string-not-case-sensitive")
lines()
match("header","wildcard")
matchall("header","wildcardlist")
matchone("header","wildcardlist")
rexp("header","regular-expression")
size()
call spamdetect(n,"reason")
call spawn("d:/surge/filter.exe $FILE$")

 

New Functions

time_hour() - returns the 'hours' 0-23, useful for rules that apply at different times of day
time_min() - returns the minutes
isimage() - True if message contains an image
isjpg() - True if message contains a jpeg image
ispdf() - True if message contains a pdf file
image_size() - Approx size of image in bytes
nimage() - Approx number of images found in message
islocal() - Message is to a local user not an outgoing message
isloggedin() - Message is from a logged in local user
is_dayofweek("monday,tuesday") - True on those days of the week.

Notes

The "header" parameter can be any normal header, such as "Subject", "From" or "To". However, there are some additional pseudo-headers that can also be used as parameters in any function which takes a "header" parameter:

"head": refers to the entire message header.
"body": refers only to the message body (after any necessary decoding)
"urls": refers to any urls found in the body

Function Descriptions

call add_header("Header: header information")

Used to add a header to a message, e.g.

if (isin("x-spamdetect","****")) then
call add_header("X-MailScanner-SpamCheck: LEVEL=****")
end if

NOTE: This will cause bounces if used in local.rul or simple.rul; it can only be used in mfilter.rul.

Requires Version 3.8 or later.

allmod("header")

This returns true if all the newsgroups in the specified header are moderated.

exists("header")

This is true if the header exists in the message and is non zero in length, eg: if (exists("supersedes")) then reject "We don't like supersedes headers"

head_len("header")

Returns the length of the named header, e.g.
if (head_len("date")>60) bounce "Naughty message"

isbase64()

This is true if the message appears to contain base64 binary encoded data.

isbinary()

This is true if the message has binary data, either base64 encoded or uuencoded.

isencodedhtml()

This is true if the message appears to contain MIME or uuencoded HTML instead of plain text data.

isencodedtext()

This is true if the message appears to contain MIME or uuencoded text data.  This will always be true if isencodedhtml() returns true.

isencodedurl()

This is true if the message appears to contain a uuencoded URL reference.

isflag("flag-name")

Used to check whether a flag variable has been defined as true. This can be done with the setflag("flag-name") action, e.g.
if (size()>100000) setflag("bigitem")
if (isimage()) setflag("bigitem")
if (isflag("bigitem")) reject "It was a big item or had a picture in it"

ishtml()

This is true if the message appears to contain HTML instead of plain text data.

isimage()

This is true if the message appears to contain a picture (either MIME or uuencoded)

isin("header","string-not-case-sensitive")

This is a simple 'content' searching function. It is true if the named header contains the string (a non case sensitive match is used), e.g.
if (isin("Subject","Free")) reject "Probably a spammer selling something"
This would reject a message with a subject of "Get your Free pictures here". It would also reject a message with a subject of "Is there any real freedom in the world?", so it's probably not a good rule :-)

lines()

This returns the number of lines in the message.

match("header","wildcard")

This function applies a simple wild card matching algorithm as is typically used to match file names, eg:
match("From","*@netwin.co.nz*")
would match against a message from that domain.
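
For readers unfamiliar with file-name style wildcards, the Python sketch below uses the standard fnmatch module as a stand-in. It is only an approximation: fnmatch also gives special meaning to ? and [ ], and whether mfilter's match() is case sensitive or anchored to the whole header value is not stated here, so both behaviours in the sketch are assumptions.

```python
from fnmatch import fnmatchcase

def match(header_value, wildcard):
    """Return True if the whole header value fits the wildcard pattern."""
    return fnmatchcase(header_value, wildcard)

from_header = "Joe Bloggs <joe@netwin.co.nz>"

# The leading and trailing * absorb the display name and the closing bracket.
hit = match(from_header, "*@netwin.co.nz*")

# A sender from another domain does not match.
miss = match("someone@example.com", "*@netwin.co.nz*")

# Without the trailing *, the closing ">" prevents a whole-value match.
no_trailing_star = match(from_header, "*@netwin.co.nz")
```

Under whole-value matching, the trailing * in the pattern matters because a From header usually carries more text than the bare address.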

matchall("header","wildcardlist")

Used for matching a single wild card against a header which contains a list of values, such as Newsgroups: or Path:. The match is TRUE only if all entries in the list match, e.g.

if (matchall("Newsgroups","news.filters.*")) accept "It is only in the filters list so we will accept it"

matchone("header","wildcardlist")

Identical to the above function but returns 'TRUE' if any match occurs.
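
The difference between the two functions is the same as the difference between "all" and "any". The Python sketch below illustrates it under several assumptions: the list is taken to be comma separated (as in a Newsgroups header), an empty list is taken to match nothing, and the standard fnmatch module stands in for mfilter's own wildcard matcher. The helper name split_list is hypothetical.

```python
from fnmatch import fnmatchcase

def split_list(header_value):
    """Split a comma-separated header value into trimmed, non-empty entries."""
    return [item.strip() for item in header_value.split(",") if item.strip()]

def matchall(header_value, wildcard):
    """True only if every entry in the list matches the wildcard."""
    items = split_list(header_value)
    return bool(items) and all(fnmatchcase(item, wildcard) for item in items)

def matchone(header_value, wildcard):
    """True if at least one entry in the list matches the wildcard."""
    return any(fnmatchcase(item, wildcard) for item in split_list(header_value))

only_filters = "news.filters.spam,news.filters.test"
crossposted = "news.filters.spam,alt.test"

all_only = matchall(only_filters, "news.filters.*")
all_cross = matchall(crossposted, "news.filters.*")
one_cross = matchone(crossposted, "news.filters.*")
```

A cross-posted message fails matchall but passes matchone, which is why matchall suits "only in these groups" rules.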

rexp("header","regular-expression")

This function searches the named header for a regular expression. The matching is not case sensitive; use rexp_case() for a case sensitive version.

rexp_fast(spamdetect_score,"regular expression ","comment for spam header")

This is just like rexp(), but it does the search more efficiently. The first 2 characters of the regular expression must be plain ASCII (not regular expression syntax). If the expression is found in the body of the message then the score is added to the X-SpamDetect header.
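
The two-character requirement makes more sense with a picture of how such an index might work. The Python sketch below is a guess at the general technique (a hash table keyed on the literal two-character prefix), not SurgeMail's actual code. The names build_index and scan are hypothetical, and the choices to ignore case and to count each rule at most once per message are assumptions.

```python
import re
from collections import defaultdict

def build_index(rules):
    """rules is a list of (score, pattern, comment) tuples.
    Each rule is filed under the first two literal characters of its pattern."""
    index = defaultdict(list)
    for rule_id, (score, pattern, comment) in enumerate(rules):
        regex = re.compile(pattern, re.IGNORECASE)
        index[pattern[:2].lower()].append((rule_id, score, regex, comment))
    return index

def scan(body, index):
    """Walk the body once. At each position, try only the rules whose
    two-character prefix occurs there."""
    total, reasons, fired = 0.0, [], set()
    for pos in range(len(body) - 1):
        key = body[pos:pos + 2].lower()
        for rule_id, score, regex, comment in index.get(key, ()):
            if rule_id not in fired and regex.match(body, pos):
                fired.add(rule_id)
                total += score
                reasons.append(comment)
    return total, reasons

rules = [
    (2.0, r"free\s+pictures", "free pics"),
    (1.5, r"viagra", "pills"),
]
body = "Get your FREE   pictures here. Free pictures!"
total, reasons = scan(body, build_index(rules))
```

Most positions in the body hit no bucket at all, so the full regular expressions run only at the few places where a rule could possibly start.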

size()

Returns the size in bytes of the current message. It can be used with the > and < operators.

call spamdetect(n,"reason")

This function can be used to mark a message as possible spam. The 'n' is a (floating-point) number; each time this function is called for a message the total is increased, and finally a header is added to the message:

X-SpamDetect: <stars>: <score> <reason1> [reason2 [reason3 ... ]]

<stars> is a string of n stars, where n is the total score (capped at 20)
<score> is the total spam score

The idea is that users can then set their mail clients to filter messages based on this pseudo header. For instance, filtering any message with "******" in its X-SpamDetect header will throw out any message with a score of 6 or more.
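
To make the header layout concrete, the Python sketch below assembles one from a list of spamdetect() calls. It is an illustration only: the function name spamdetect_header is hypothetical, and rounding the total down to a whole number of stars and the exact formatting of the score are assumptions, since only the general layout and the cap of 20 are given above.

```python
def spamdetect_header(calls):
    """calls is a list of (n, reason) pairs, one per spamdetect() call."""
    total = sum(n for n, _ in calls)
    # One star per whole point (assumption), never more than 20.
    stars = "*" * max(0, min(int(total), 20))
    reasons = " ".join(reason for _, reason in calls)
    return "X-SpamDetect: %s: %s %s" % (stars, total, reasons)

header = spamdetect_header([(2.5, "free-pics"), (4.0, "bad-sender")])

# A client-side filter looking for six stars would catch this message.
caught_by_six_star_filter = "******" in header
```

Because a longer run of stars always contains every shorter run, a filter on "******" catches a score of 6 and anything above it.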

Please note that "local.rul" should be used for adding scoring for ASPAM not mfilter.rul.

call spawn("program.exe $FILE$")

This function runs a program on each message. The $FILE$ macro is replaced by the name of a temporary file containing the actual mail message. The return value of the program (return n; in its main() function) is returned by this 'spawn' function, so it can be used to filter the message or allow it to continue, e.g.
if (spawn("d:/path/xfilter.exe $FILE$")) reject "That was spam according to xfilter"

NOTE: mfilter is only passed the first 14k of each message, so the spawned program also gets only the first 14k, not the entire message.
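
As a rough idea of what a spawned program could look like, here is a minimal sketch in Python. Everything in it is illustrative: the phrase list is made up, and treating a non-zero return value as "spam" is an assumption based on the example rule above. How the script would be launched from the spawn() command line is also left open.

```python
# Hypothetical phrases; a real filter would use its own tests.
SUSPICIOUS = ("free pictures", "sweepstake lottery")

def is_spam(path):
    """Read the temporary message file and look for any suspicious phrase."""
    with open(path, "rb") as handle:
        # latin-1 never fails to decode, whatever bytes the message holds.
        text = handle.read().decode("latin-1").lower()
    return any(phrase in text for phrase in SUSPICIOUS)

def main(argv):
    """argv[1] is the file name that replaced $FILE$.
    Returns 1 for spam and 0 otherwise."""
    if len(argv) != 2:
        return 0  # no file given: let the message through
    return 1 if is_spam(argv[1]) else 0

# Saved as a script, the file would end with:
#     import sys; sys.exit(main(sys.argv))
```

Remember that the file holds only the first part of the message, so a test like this cannot see text that appears late in a long message.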

Actions

accept "reason"

Accepts the current article reporting the "reason" specified in the log files.

clearflag("flag-name")

Used to set the specified flag variable to the false state.

forward "remote@address.com"

Forwards the message to the specified address and terminates processing.

call forward_cc("new@email.address")

Sends the current message to this new Email address in addition to any existing destination users.

reject "reason" (or bounce "reason")

Rejects the current article, reporting the "reason" specified in the log files and to the user.

call replace("header_name","wildcard_match_pattern","replacement_pattern")

If the named header matches the 'wildcard_match_pattern' then the replacement pattern is applied, e.g.

call replace("from","*@*.domain.name","BOB_%1@%2.other.name")

From: "joe@this.domain.name"
Would be translated to:
From: "BOB_joe@this.other.name"
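
The %1 and %2 markers in the replacement pattern stand for the text matched by the first and second * in the match pattern. The Python sketch below reproduces that substitution with regular expressions. It is an illustration, not mfilter's implementation: the function name wildcard_replace is hypothetical, and greedy, case sensitive, whole-value matching is an assumption.

```python
import re

def wildcard_replace(value, pattern, replacement):
    """Apply a wildcard replacement to a header value.
    Each * in the pattern becomes a capture group; all else is literal."""
    regex = "(.*)".join(re.escape(part) for part in pattern.split("*"))
    found = re.fullmatch(regex, value)
    if found is None:
        return value  # no match: header left unchanged
    # %1, %2, ... refer to the text matched by the 1st, 2nd, ... wildcard.
    return re.sub(r"%(\d)",
                  lambda ref: found.group(int(ref.group(1))),
                  replacement)

rewritten = wildcard_replace("joe@this.domain.name",
                             "*@*.domain.name",
                             "BOB_%1@%2.other.name")

untouched = wildcard_replace("joe@elsewhere.org",
                             "*@*.domain.name",
                             "BOB_%1@%2.other.name")
```

Here the first wildcard captures "joe" and the second captures "this", so the rewritten value is "BOB_joe@this.other.name".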

call report("manager@email.address","subject of message")

Sends an Email, including the top part of the offending message, to the specified person, with the specified subject. This is intended for when you want to be alerted to something but don't want to simply forward the message itself, which may be 'confusing' as it would look like the message had been sent to the manager directly.

setflag("flag-name")

Used to set the specified flag variable to the true state.

Regular Expression Syntax - In Brief

Please note you need to escape spaces in this implementation. For example, suppose you want to match any of these three variants:

sweepstake lottery / international program
sweepstake lottery/ international program
sweepstake lottery /international program

So what you want is this pattern:

sweepstake lottery( / | /|/ )international program

Then put backslashes in front of the spaces, as in this rule:

if (rexp("subject","sweepstake lottery(\ /\ |\ /|/\ )international program")) bounce "a"
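
The alternation in that rule can be checked outside SurgeMail. The Python sketch below runs the same pattern through Python's re module, which accepts the backslash-escaped spaces even though it does not require them. It only demonstrates which subject lines the group covers; it cannot show how mfilter's own regular expression engine treats unescaped spaces.

```python
import re

# Same pattern as in the rule; rexp() is not case sensitive, hence IGNORECASE.
pattern = re.compile(
    r"sweepstake lottery(\ /\ |\ /|/\ )international program",
    re.IGNORECASE,
)

subjects = [
    "sweepstake lottery / international program",   # space on both sides
    "sweepstake lottery/ international program",    # space after the slash
    "sweepstake lottery /international program",    # space before the slash
    "sweepstake lottery/international program",     # no spaces at all
]
results = [bool(pattern.search(subject)) for subject in subjects]
```

The first three subjects match and the last one does not, because every alternative in the group requires at least one space next to the slash.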


\s = white space
\S = not white space
\d = digit
\D = not digit
\b = word boundary
\B = not word boundary
\x00 = Hex character

. (period) represents any one character.
[] (brackets) contain a set of characters from which a match can be made. It corresponds to one character in the search string.
\ (backslash) is an escape character which means that the next character will not have a special meaning.
* (asterisk) is a multiplier. It will match zero or more of the previous character. (Note: it is not a wildcard character as in file names.)
? (question mark) is a multiplier. It will match zero or one of the previous character. (Note: it is not a wildcard character as in file names.)
+ (plus) is a multiplier. It will match one or more of the previous character.
{} (squiggly brackets) contain a number which specifies an exact number of the previous character, or range {2,3}
[^] (brackets containing caret and other characters) means any characters except the character(s) after the caret symbol in the brackets.
^ (caret) is the start of the line.
$ (dollar) is the end of the line.
(Note the following \< \> (begin and end word) are not implemented, use \b instead)

[:alpha:] represents any alphabetic letter.
[:digit:] represents any single-digit number.
[:blank:] represents a space or tab.

Lookahead operator
Free(?!dom|bsd) matches freesex but not freedom or FreeBSD
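
Python's re module supports the same (?!...) negative lookahead syntax, so the example can be tried out there. This is only a sketch of the operator's behaviour; the re.IGNORECASE flag is used on the assumption that the expression is applied through the case-insensitive rexp().

```python
import re

# "free" not immediately followed by "dom" or "bsd".
free = re.compile(r"free(?!dom|bsd)", re.IGNORECASE)

words = ["freesex", "freedom", "FreeBSD", "free pictures"]
matched = [word for word in words if free.search(word)]
```

The lookahead checks what follows "free" without consuming it, so "freedom" and "FreeBSD" are skipped while the other two words match.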

OR operator
| (pipe) is OR. It requires that the joined expressions have parentheses around them.

Examples:

e.a matches eta, eda, e1a, but not Eta
[eE].a matches eta and Eta
E.*a matches Eudora, Etcetera, Ea
ho+p matches hop, hoop, hoooop, but not hp
etc\. matches etc. but not etc

Example rule file:

$sex = "fuck|xxx|sex"
$free = "free(?!dom|bsd|nix|serve)"
$pics = "pi[cx]"
$free_pictures = $free + $pics
$bad_guys = $sex + "|freepictures|jus.?.?\.doi.?.?\.to|great\.site|webbinaries" \
          + "|yad.?.?.?\.ion.?.?\.org|freehidden|joy.?.?\.to.?.?\.al|from.?behind" \
          + "|love(youhon|ergirl|chatting|stofuck)|forever\.yours|\@ju.?.?\.sex|town\.girl|beachbums"
# Do some processing which is specific to individual recipients
recipients
        if (isin("recipient","manager@this.domain")) accept "Always accept for me so spammers can talk to me"
        if (isin("recipient","sales@your.domain")) then
                if (isin("subject","order")) then
                        # Make a Duplicate of sale order
                        call forward_cc("sales_copy@your.domain")
                end if
        end if
end recipients
# Check for some known spammers and naughty subjects
if (rexp("subject",$free_pictures)) bounce "No emails about free pictures"
if (rexp("from",$bad_guys)) bounce "No emails from black listed people thanks"

# Strip local node names from from addresses:
call replace("From","*@*.parts.co.nz","%1@parts.co.nz")
accept "Great, we liked the message"

Example 2:

We want to block any message that has been found in the SURBL database. We will use the exists function to check whether the X-Surbl header exists.

if (exists("X-Surbl")) reject "Your SPAM is not wanted here."

You can easily change that to drop the message silently if you prefer:

if (exists("X-Surbl")) drop "SURBL SPAM is not wanted here."

(The reason will still be logged, so it is still important to include it.)

Example 3:

We want to block any message with no subject header. SurgeMail adds a subject header if it is missing so we have to match on the text that SurgeMail adds.

if (isin("Subject","(No subject header)")) bounce "No Subject header"

Example 4:

We want to block any message with an empty subject header.

if (head_len("Subject")<1) bounce "Empty Subject header"

Example 5:

I have a user fred in one of my local domains, localdomain.com. I only want him to be able to send to other users at localdomain.com and not to any other domains.

recipients
if (isin("from","fred@localdomain.com")) then
     if (!isin("recipient", "localdomain.com")) bounce "Sorry you can only send to localdomain.com"
end if
end recipients