==============================================================================
                                                                        README
==============================================================================

Contents:

	1.	What is this?
	2.	Why should I use it?
	3.	How it works
	   3.1	RBL history
	   3.2	Why RBL's are good
	   3.3	The setup
	4.	config.h
	5.	rblsd.conf
	   5.1	A short overview
	   5.2	Numeric values
	   5.3	Constants
	   5.4	Strings
	   5.5	IP network masks
	   5.6	A word on 'Received:' headers
	   5.7	The 'on' statement
	6.	spamc and spamc2
	7.	Running rblsd
	8.	Concluding remarks
	9.	Legal



1. What is this?

rblsd is a small, fast, SpamAssassin-compatible spam filter.  It filters
messages by running a series of RBL lookups on the path of each message.
These lookups are fast, accurate, and take up minimal CPU time.  For
more information on the merits of RBL lookups, see section 3.2.



2. Why should I use it?

- rblsd is very efficient.  rblsd was written from scratch in pure, tight C.
  As a result, there is no interpreter overhead and messages are processed
  as quickly as possible.
- rblsd can process many messages in parallel.  All lookups are performed
  asynchronously and consume negligible processing power.  Because of this
  and other factors, rblsd filters mail at a much higher rate than traditional
  heuristic-based spam filters.
- rblsd can be easily integrated into any system that already uses
  SpamAssassin.  Using the typical setup with SA can dramatically improve the
  mail throughput and performance of that system.
- RBL's are maintained in real time, whereas rule-based spam filters become
  obsolete over time.  The accuracy provided by rblsd is not dependent on how
  recent your rblsd installation is.  RBL's are provided by third parties whose
  goal is to reduce the circulation of spam on the Internet.



3. How it works

This section is designed to give you insight on how everything fits
together.  Understanding this section is important to the installation and
troubleshooting of any mail system with rblsd.


3.1 RBL history

We'll begin by answering the question "What, exactly, is an RBL?"

Originally, the term 'RBL' referred to the "Realtime Blackhole List", which
was a free-to-use database of IP addresses known to be sources of
unsolicited commercial e-mail, or 'spam'.  One could check to see if an IP
address was listed in this database by performing a DNS lookup on the
address with its octets reversed, followed by 'rbl.mail-abuse.org' (e.g., to
check 127.0.0.1, one would look up 1.0.0.127.rbl.mail-abuse.org).
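
As a rough illustration of that naming scheme, here is a minimal Python
sketch.  The function name is made up for this example, and it is not taken
from the rblsd source (which is written in C); it only builds the name to
query and performs no DNS lookup.

```python
def dnsbl_query_name(ip: str, zone: str = "rbl.mail-abuse.org") -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(
        o.isdigit() and 0 <= int(o) <= 255 for o in octets
    ):
        raise ValueError(f"not a dotted IPv4 address: {ip!r}")
    # Reverse the octets and append the zone: 127.0.0.1 -> 1.0.0.127.<zone>
    return ".".join(reversed(octets)) + "." + zone


if __name__ == "__main__":
    print(dnsbl_query_name("127.0.0.1"))  # 1.0.0.127.rbl.mail-abuse.org
```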

This database was designed to be used by mailservers.  Upon receiving an
incoming SMTP connection, the mailserver performs this lookup, and if
the lookup succeeds, the mailserver terminates the connection.

Over time, similar services sprang up, and 'RBL' was used to refer to any
such service.  Today, DNSBL is the proper term to describe such services,
but I still prefer the old nomenclature.

There are a variety of DNSBL services available (you can find many of
them at http://rbls.org/).

The traditional RBL usage is not feasible in every mail setup, for every
user who wants such spam protection.  More thorough checks are possible by
examining the 'Received:' headers (at the top of every email message) for
RBL blacklisted IP addresses, and this is exactly how rblsd filters mail.  
When using these headers as the basis for filtering spam, precautions must
be taken by the spam filter to minimize false positives and to prevent
header forgery from having a negative impact on the filter.  This is the
primary function of the configuration file; issues and possible concerns
are addressed there.


3.2 Why RBL's are good

The primary advantage of RBL's is that they are updated in real time.
The people who maintain RBL's spend considerable time collecting data on
known spammers.  When the IP blocks that spammers use are discovered, they
are immediately blacklisted.  Traditional rule-based filters will become
obsolete over time.

The processing time required to check an RBL is negligible.  Lookups can be
performed in parallel.  Unfortunately, in most implementations, they are
checked synchronously and have funky timeout rules.
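
To make the idea of parallel lookups concrete, here is a small Python sketch.
It is not how rblsd is implemented (rblsd is written in C); fake_lookup is a
stand-in that only pretends to query DNS, and the single overall timeout is
an assumption chosen to keep the example simple.

```python
import asyncio


async def fake_lookup(name: str) -> bool:
    """Stand-in for a DNS query; pretends that one test name is listed."""
    await asyncio.sleep(0.01)  # simulate network latency
    return name.startswith("2.0.0.127.")


async def check_all(names, lookup=fake_lookup, timeout=5.0):
    """Start every lookup at once and wait for all of them together."""
    pending = [lookup(name) for name in names]
    # gather() runs the lookups concurrently; wait_for() applies one
    # overall deadline instead of a separate timeout per lookup.
    results = await asyncio.wait_for(asyncio.gather(*pending), timeout)
    return dict(zip(names, results))


def run_checks(names):
    """Synchronous wrapper so the sketch can be called from plain code."""
    return asyncio.run(check_all(names))
```

With this approach the total wait is roughly that of the slowest lookup,
rather than the sum of all of them.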


3.3 The setup

rblsd runs as a daemon and accepts incoming connections (don't worry -- you
can assign strict access rules to it via the configuration file).  Recent
versions of SpamAssassin are set up to do the exact same thing, which avoids
the overhead of loading the perl interpreter and interpreting the whole mess
of scripts involved.  Mail is piped into a small C program called 'spamc'.
This establishes a connection to the server and sends the mail to the server
to be processed.  The server processes the message and makes a response to
the client, which pipes out the processed message.

Generally, both programs (spamc and spamd) run on the same machine, so there
are not many security concerns.  Messages can be processed remotely over SSL,
but rblsd does not yet support secure connections.  See the section on spamc2
for more information on using spamc with rblsd, and possibly with SpamAssassin.

When rblsd detects that a message is spam it can do a number of things to
identify it as such.  The behavior is decided in the configuration file.
rblsd will always add the header "X-Spam-Flag: YES" to mail that it identifies
as spam.  It can also be configured to do any combination of the following:

- Prepend a string (such as "**SPAM**") to the subject of the message.
- Prepend several lines of text to the beginning of the message body,
  explaining why the message was marked as spam.
- Add a set of headers to the message that identify (generally for user-side
  email filters) the message as spam, and provide more information, such as the
  score.
- Neutralize message text (change the content-type from text/html to
  text/plain).

After messages are tagged by rblsd, the server or the client's software must be
configured to filter out messages that bear the markings left by rblsd.
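
The following Python sketch shows what the first two kinds of tagging might
look like on a raw message.  It uses Python's standard email module rather
than anything from rblsd itself, the function name is hypothetical, and the
single space placed between the prefix and the old subject is an assumption.

```python
from email import message_from_string
from typing import Optional


def tag_as_spam(raw: str, subject_prefix: Optional[str] = "**SPAM**") -> str:
    """Return the message with spam markings added to its headers."""
    msg = message_from_string(raw)
    # This header is always added to mail identified as spam.
    msg["X-Spam-Flag"] = "YES"
    if subject_prefix is not None:
        old_subject = msg.get("Subject", "")
        del msg["Subject"]  # does nothing if there was no Subject header
        msg["Subject"] = f"{subject_prefix} {old_subject}".strip()
    return msg.as_string()
```

A client-side filter can then match on either the X-Spam-Flag header or the
subject prefix.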



4. config.h

The entire configuration was previously localized to this file, but this is no
longer the case.  Only two things are set here now.

The first is an option to determine the location of the configuration file.
If you want to put rblsd.conf somewhere other than the /etc/ directory, set
the path here.  Note that absolute paths are strongly recommended, and you
cannot use tildes ('~').  The default configuration file can be overridden
with the -f flag on the command line.

The second is an option to use syslog.  By default, rblsd logs its messages
to standard error.  To change this so that it logs to syslog, change the line
from
	//#define USE_SYSLOG
to
	#define USE_SYSLOG

and set the syslog facility that you want to use (just change it from LOG_MAIL).

After modifying these options, you can continue to the compile step of
the installation (see INSTALL for full instructions).



5. rblsd.conf

This section provides specific details about the syntax of the configuration
file.  It should also clear up a few of the hairy points on how rblsd
works (and in turn, what some of those configuration parameters actually
mean).  Most of the configuration parameters are documented in the version
of rblsd.conf distributed with this file.

The first five sections are dedicated to the configuration file syntax.
Section 5.6 explains how IP addresses are collected, LevelOfTrust, OmitLast,
and CheckAtLeast.  Section 5.7 extends this discussion to the 'on' statement.

Note that although this section is meant to be technical, the configuration
file is designed to be quite easy to use.  If you haven't already looked at
it, you should check it out -- chances are you'll pick it up immediately.  
If you find it confusing, come back here, read the documentation once, and
try it again.  If you still need help, go to Section 8 and ask for help.


5.1 A short overview

First things first:  You can only enter one command per line.  Comments begin
with either a hash ('#') or a semicolon (';') and extend to the end of the
line.  Whitespace (outside of quotes) and comments are ignored.

There are two types of commands: assignments and statements.  An assignment
sets the value of a single parameter.  The format of an assignment is:

	ParameterName = value

or, equivalently:

	parametername=value	# This is a comment.
	  PARAMETERNAME =value	# Note that spacing and case do not matter
... and so on

Statements begin the line with a keyword ("server", "accept", "deny", "rbl",
"on", or "listfile"), followed by a space, followed by a comma-separated list
of arguments.  The details of each one can be found in the 'rblsd.conf' file
distributed.

The rest of this section explains the use of the types of values available
to you (numbers, constants, and strings).


Here are some examples of commands:

	MaxClients = 15			# Numeric value
	ResolveTimeout = 20 seconds	# Numeric value
	RestartPeriod = 1 fortnight	# Numeric value
	RunAsDaemon = true		# Constant value
	RunAsDaemon = no		# Constant value
	SpamSubjectPrefix = null	# Constant value
	SpamSubjectPrefix = "**SPAM**"	# String value
	SpamSubjectPrefix = **SPAM**	# String value
	SomeParameter = 13.2		# Numeric value
	SomeParameter = 13.2.0.0	# String value

Note that most of the parameters that can be set by assignment expect a certain
type.  If you provide the wrong one, it will be treated as if you had entered
nothing.
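
For readers who think better in code, here is a rough Python sketch of how
an assignment line could be picked apart.  It is not rblsd's actual parser;
the function names are invented for this example, it handles one line at a
time (so quoted strings that span several lines are out of scope), and it
returns the value as raw text without deciding its type.

```python
def strip_comment(line: str) -> str:
    """Drop a '#' or ';' comment, ignoring those that appear inside quotes."""
    quote = None
    for i, ch in enumerate(line):
        if quote:
            if ch == quote:
                quote = None  # closing quote found
        elif ch in "\"'":
            quote = ch  # opening quote; remember which kind
        elif ch in "#;":
            return line[:i]
    return line


def parse_assignment(line: str):
    """Return (lowercased name, raw value text), or None for other lines."""
    body = strip_comment(line).strip()
    if "=" not in body:
        return None  # blank line, comment, or a statement such as 'on'
    name, value = body.split("=", 1)
    # Parameter names are case-insensitive and spacing does not matter.
    return name.strip().lower(), value.strip()
```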


5.2 Numeric values

The most basic form of a numeric value is a number by itself.  All numbers
used in rblsd are stored as integers and are input in decimal (no hex or
octal available).  One luxury is the use of units.  A number, followed by a
space and then by one of several unit names, will be multiplied by the
predetermined value given to that unit.  For example:

	1 week

is the same as:

	7 days
or,	168 hours
or,	604800 seconds
or,	0.5 fortnights	# Know any other software where you can specify time
			# Shakespeareanly?

Keep in mind that there must be a space between the number and the unit, or
the entire thing will be interpreted as a string.  Decimal points are
allowed, but after the number is multiplied by its unit, digits to the right
of the decimal will be truncated.

Available units are as follows:
	- byte, bytes, b
	- kilobyte, kilobytes, kb
	- megabyte, megabytes, mb, meg, megs
	- gigabyte, gigabytes, gb, gig, gigs
	- second, seconds, s, sec, secs
	- minute, minutes, m, min, mins
	- hour, hours, h, hr, hrs
	- day, days, d
	- week, weeks, w, wk, wks
	- fortnight, fortnights
	- month, months, m, mon, mons
	- year, years, y, yr, yrs

Note that the names of units are case-insensitive.

If you want to specify a quantity with mixed units, you can add them together
like so:

	1 hour + 30 minutes
	1 hour + 30		# = 1 hour + 30 seconds
	1 day + 2 hours + 3 minutes + 2 seconds
	0.5 years + 2 months + 36 days

and so on.
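
The arithmetic can be sketched in a few lines of Python.  This is only an
illustration, not rblsd's parser: the names are invented, only the time
units whose values follow from the '1 week' example are included (months and
years are left out because their lengths are not stated here, and 'm' is
left out because it is listed for both minutes and months), and truncating
each term separately is an assumption.

```python
from decimal import Decimal

# Seconds per unit, derived from: 1 week = 7 days = 168 hours = 604800 s.
UNIT_SECONDS = {
    "second": 1, "seconds": 1, "s": 1, "sec": 1, "secs": 1,
    "minute": 60, "minutes": 60, "min": 60, "mins": 60,
    "hour": 3600, "hours": 3600, "h": 3600, "hr": 3600, "hrs": 3600,
    "day": 86400, "days": 86400, "d": 86400,
    "week": 604800, "weeks": 604800, "w": 604800, "wk": 604800, "wks": 604800,
    "fortnight": 1209600, "fortnights": 1209600,
}


def parse_duration(text: str) -> int:
    """Add up 'number [unit]' terms separated by '+', in seconds."""
    total = 0
    for term in text.split("+"):
        parts = term.split()
        if len(parts) == 1:
            number, factor = parts[0], 1  # bare number: no multiplier
        elif len(parts) == 2 and parts[1].lower() in UNIT_SECONDS:
            number, factor = parts[0], UNIT_SECONDS[parts[1].lower()]
        else:
            raise ValueError(f"cannot parse term: {term!r}")
        # Multiply first, then drop any digits after the decimal point.
        total += int(Decimal(number) * factor)
    return total
```

For instance, parse_duration("0.5 fortnights") and parse_duration("1 week")
give the same result.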


5.3 Constants

The next type that will be covered is called a constant.  Constants consist
of a word, and they stand for a value.  There are only a few constants, and
only a few places in which they can be used.  The constants are:

	- yes, true, y, t
	- no, false, n, f
	- null, nil, none

yes/true confirms something, no/false denies something, and null/nil/none
indicates the absence of any value (it is as if there had not been a value
there).  For example:

	SpamSubjectPrefix = ""

is different from:

	SpamSubjectPrefix = null

Many parameters are null by default, so SpamSubjectPrefix=null is equivalent
to omitting it from the configuration file entirely.


5.4 Strings

Strings are generally enclosed in quotes, such as:

	"Hello"
or:	'Hello'

which have the same value.  Additionally, if a bare word is found and it is
not a number or a constant, it is treated as a string (although it cannot
contain spaces).  If a string is enclosed in quotes, it can contain any
character (including newlines, spaces, and the other type of quote).  Here
are three examples of legal strings:

	Hello
	"Hi, how are ya'?"
	"I'm fine.
	How are you?"

Strings can also be concatenated together using +'s, like so:

	"I'm great, are you really " + '"fine"?'
	Would + " you like some medications?"
	'Perhaps you would like to refinance your home?'


5.5 IP network masks

While IP masks are not a type in their own right, they become very important
in a few locations in the configuration file.  They can be considered a
subtype of strings, which is how they are read.  The basic form is as
follows:

	address/mask

IP masks are always one word long (no spaces); the address and the mask are
separated by a slash.

The address side is an IPv4 address in dotted notation (IPv6 support is
planned for the future, but not currently implemented).  You can leave off
as many of the trailing octets as you want; they will be set to zero.  For
example:

	address		=>	equivalent to:
	127.0.0.1		127.0.0.1
	127.0			127.0.0.0
	127			127.0.0.0
	127.			127.0.0.0

The mask side is either a bitmask (which must contain at least one dot), or
the number of significant bits of the address side.  Examples help:

	IP Mask		=>	equivalent to:
	127/8			127.0.0.0/255.0.0.0	or	127/255.
	172.16/12		172.16.0.0/255.240.0.0	or	172.16/255.240
	192.168/16		192.168.0.0/255.255.0.0	or	192.168/255.255
	0/0			0.0.0.0/0.0.0.0		or	0./0.
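
The equivalences in the two tables above can be checked with a short Python
sketch.  The function names are invented for illustration and this is not
rblsd's own code; rejecting an IP mask that has no slash is an assumption,
since this section does not say what happens in that case.

```python
def parse_dotted(text: str) -> int:
    """Turn '127', '127.', '172.16' or '10.1.2.3' into a 32-bit integer."""
    if text.endswith("."):
        text = text[:-1]  # a single trailing dot is allowed
    octets = [int(part) for part in text.split(".")]
    if not 1 <= len(octets) <= 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"bad dotted value: {text!r}")
    octets += [0] * (4 - len(octets))  # missing trailing octets are zero
    value = 0
    for octet in octets:
        value = (value << 8) | octet
    return value


def parse_ip_mask(text: str):
    """Return (address, bitmask) as integers for an 'address/mask' word."""
    address, slash, mask = text.partition("/")
    if not slash:
        raise ValueError(f"missing '/': {text!r}")
    if "." in mask:
        bits = parse_dotted(mask)  # dotted bitmask such as 255.240
    else:
        count = int(mask)  # number of significant bits
        if not 0 <= count <= 32:
            raise ValueError(f"bad bit count: {mask!r}")
        bits = (0xFFFFFFFF << (32 - count)) & 0xFFFFFFFF
    return parse_dotted(address), bits


def matches(ip: str, ip_mask: str) -> bool:
    """True if the address falls inside the given IP mask."""
    address, bits = parse_ip_mask(ip_mask)
    return (parse_dotted(ip) & bits) == (address & bits)
```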


5.6 A word on 'Received:' headers

Some of the most critical sections of the configuration file involve how
rblsd collects IP addresses from message headers.  This section gives a
brief overview of received headers and how rblsd chooses which ones to look
up.  If you're wondering what LevelOfTrust, OmitLast, CheckAtLeast, and
friends do, then you've come to the right place.

On its journey from sender to destination, an e-mail message may pass
through several e-mail servers.  Each time one of them processes a message,
it adds a 'Received:' header to the top of the message, identifying the IP
address from which the message was sent.  By the time a message reaches its
destination (which is when rblsd sees it), the received headers at the top
of the message are more recent, and more trustworthy.

So why are forged 'Received:' headers bad?  Spammers will not be able to get
better scores, but other people can easily forge them to force rblsd to
perform a lot of lookups and slow down your mail.  Preventing this while
still checking mail effectively is the job of the following parameters.

We'll start with LevelOfTrust.  This basically says "read IP addresses from
this many Received: headers".  Note that up to two IP addresses will be read
from each "Received:" header, and duplicates will be ignored.  Four is the
default, and three to four is recommended.  If you enter 0 or null, then
there will be no limit.


Consumer IP addresses are often listed in RBL's, as receiving direct SMTP
traffic from them usually means spam.  This works well in the traditional
RBL setup, but not as well when dealing with "Received:" headers which
usually contain such IP addresses.  In legitimate e-mail, these IP addresses
will appear at the bottom of the Received headers, since they are usually
the origin of the message.  OmitLast specifies the number of "Received:"
headers at the bottom of the message to be omitted from RBL lookups.  This
prevents legitimate e-mail from getting tagged by RBL's that mark consumer
blocks.

CheckAtLeast is meant to counteract OmitLast, LevelOfTrust, and IP rules
from "on" statements (see the next section) in situations that would reduce
the number of IP addresses checked.  CheckAtLeast will prevent these factors
from dropping the number of IP addresses checked below its value.  Note that
even though OmitLast and LevelOfTrust specify numbers of "Received:"
headers, CheckAtLeast specifies numbers of IP addresses.

Examples:

# of Received: headers   LevelOfTrust   OmitLast   CheckAtLeast   # checked
5                        3              1          1              3 Rcvd headers
1                        3              1          1              1 IP address
2                        3              1          1              1 Rcvd header
2                        3              1          2              2 IP addresses

In the first example, the first three received headers are trusted by
LevelOfTrust, so all three are examined.  In the second example, OmitLast
wants to stop the one and only "Received:" header from being read, but
CheckAtLeast forces one to be checked (if your SMTP server places its IP
address in the received header for some reason, you may use the 'on'
statement to ignore it).  The third example has OmitLast taking away one
from LevelOfTrust, and the fourth example reiterates the second.

Simple, right?
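
If not, the four examples can be reproduced with a few lines of Python.
This is a simplified model, not rblsd's code: the function name is invented,
it assumes exactly one IP address per "Received:" header (so headers and
addresses can be counted the same way), and it ignores 'on' statements.

```python
def select_ips(received_ips, level_of_trust=4, omit_last=0, check_at_least=0):
    """Pick which addresses to look up.

    received_ips holds one address per Received: header, ordered from the
    top of the message (most recent) to the bottom (origin).
    """
    # OmitLast: drop headers from the bottom of the list.
    keep = max(len(received_ips) - omit_last, 0)
    chosen = received_ips[:keep]
    # LevelOfTrust: read only this many headers from the top (0 = no limit).
    if level_of_trust:
        chosen = chosen[:level_of_trust]
    # CheckAtLeast: never fall below this many, if the message has them.
    if len(chosen) < check_at_least:
        chosen = received_ips[:check_at_least]
    return chosen
```

Calling it with the values from the four rows of the table returns three,
one, one, and two addresses, respectively.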


5.7 The 'on' statement

The 'on' statement allows you to further modify the way rblsd reads in
addresses from the headers.  The format is as follows:

	on ipmask, action [number] [, action [number] ...]

The first argument is the IP mask (see section 5.5).  The second argument
is the name of the action to perform when an IP address matching the mask
in the first argument is found.  Each action takes in a number as a
parameter (wait for the description of actions to see what this means).
By default, the number is 1, but it can be specified by following the
action name with a space and the number to use.  You can optionally place
additional actions on the same line.  Here are some examples of the on
statement:

	on 127/8, omit
	on 128.32/16, skip, hit -1
	on 128.32.61.103/32, skip 3

If the above three lines were present, in sequence, in an actual configuration
file, the third line would never have any effect.  Why?  Because 128.32/16 also
matches 128.32.61.103, and it would be skipped and assigned a hit of -1.  Only
one 'on' statement will be activated by any IP address, and priority will be
given to the first 'on' statement found that matches the IP address.
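
The first-match behavior can be sketched in Python as follows.  This is an
illustration only: the rule table and function name are invented, and the
masks are written out in full CIDR form because Python's ipaddress module
does not accept the abbreviated forms from section 5.5.

```python
import ipaddress

# The three example 'on' statements, in the order they appear in the file.
RULES = [
    ("127.0.0.0/8", [("omit", 1)]),
    ("128.32.0.0/16", [("skip", 1), ("hit", -1)]),
    ("128.32.61.103/32", [("skip", 3)]),
]


def first_matching_rule(ip, rules=RULES):
    """Return the first (mask, actions) pair whose mask contains the IP."""
    address = ipaddress.ip_address(ip)
    for mask, actions in rules:
        if address in ipaddress.ip_network(mask):
            return mask, actions  # later rules are never consulted
    return None
```

Looking up 128.32.61.103 returns the 128.32.0.0/16 rule, so the more
specific third rule is never reached.  Moving the third line above the
second would let it take effect.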

Here are the actions that can be taken:

skip: This will prevent the next n IP addresses from being read (including
the IP address that triggered the rule).  It will be as if the IP addresses
had never been there in the first place, so they will not count against the
LevelOfTrust.  For example, if a message has four Received headers with one
IP in each, with LevelOfTrust = 2, and the 2nd IP matches a rule that takes
the action 'skip 1', then the first and third IP's will be read.

omit: This will prevent lookup of the next n IP addresses.  This
has the same basic functionality as skip, except the addresses do count against
the LevelOfTrust.

hit: Simulates an RBL hit with a score of n.

check: Extends the level of trust so that it collects at least the next n
IP addresses (does not include the IP that matched the rule).
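
The difference between skip and omit is easiest to see side by side.  The
Python sketch below is a simplified model and not rblsd's code: the function
name is invented, rules are keyed by exact address instead of by IP mask,
only skip and omit are modeled, and it assumes that addresses which are
being skipped or omitted do not trigger rules of their own.

```python
def ips_to_look_up(ips, rules, level_of_trust):
    """Walk the addresses top-down and return those that get looked up.

    rules maps an address to an (action, n) pair, with action being
    'skip' or 'omit'.
    """
    lookups = []
    counted = 0        # addresses counted against LevelOfTrust
    skip_left = 0      # remaining addresses to skip
    omit_left = 0      # remaining addresses to omit
    for ip in ips:
        if level_of_trust and counted >= level_of_trust:
            break
        if skip_left == 0 and omit_left == 0 and ip in rules:
            action, n = rules[ip]
            if action == "skip":
                skip_left = n  # includes the address that matched
            elif action == "omit":
                omit_left = n
        if skip_left > 0:
            skip_left -= 1     # as if the address were never there
            continue
        counted += 1
        if omit_left > 0:
            omit_left -= 1     # counted, but not looked up
            continue
        lookups.append(ip)
    return lookups
```

With four addresses, LevelOfTrust = 2, and a rule on the second address,
'skip 1' leaves the first and third to be looked up, while 'omit 1' leaves
only the first.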

If you have a backup e-mail server, it's a good idea to use skip on that
host's most specific netmask (if it has a dynamic IP address -- if it's
static then just use address/32).

The default on statements omit the standard private networks and reserved
netmasks.



6. spamc and spamc2

SpamAssassin listens on port 783 for incoming connections by default. 
Accordingly, spamc was written to connect to port 783 on localhost by
default.  The port and remote host can be changed by using the command line
options -p and -d, respectively.  For a verbose list of command line
options, compile and run 'spamc2' with the -h option.  rblsd listens on port
784 by default.  In order to filter a message through rblsd, the command
line would be:

	spamc -p 784

To filter a file 'message.in' through it and save the result to
'message.out', you would use:

	spamc -p 784 < message.in > message.out

rblsd comes with a program called 'spamc2', which is exactly like
SpamAssassin's spamc, but is capable of connecting to a series of filters. 
This is useful when you wish to run rblsd alongside SpamAssassin.  The
typical setup uses rblsd as the first filter, and if rblsd doesn't mark it
as spam, then the message will be filtered through SpamAssassin.  If rblsd
does mark the message as spam, the result will be immediately returned, and
will not continue to SpamAssassin.  This setup will minimize the number of
messages that reach SpamAssassin (and thus will reduce the load on the
machine).

Say that SpamAssassin is running on its default port 783, and rblsd is
running on port 784.  The command line would look like this:

	spamc2 -p 784 -and- -p 783

or, since spamc uses port 783 by default:

	spamc2 -p 784 -and-

You can chain together as many sets of command line arguments as you like,
using '-and-' to separate them from each other.  If you like, you can also
use spamc2 to connect to a single server.  It will act exactly the same as
spamc if there are no '-and-' options present in the command line.
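
The chaining logic described above amounts to a short loop.  Here it is as a
Python sketch; this is not the spamc2 source (which is C), the function
names are invented, each filter is represented by a plain function instead
of a network connection, and passing each filter's output on to the next
filter is an assumption.

```python
def is_flagged(message: str) -> bool:
    """True if the header block contains 'X-Spam-Flag: YES'."""
    headers = message.split("\n\n", 1)[0]
    return any(
        line.lower().startswith("x-spam-flag: yes")
        for line in headers.splitlines()
    )


def run_chain(message: str, filters) -> str:
    """Send the message through each filter until one marks it as spam."""
    for spam_filter in filters:
        message = spam_filter(message)
        if is_flagged(message):
            return message  # stop early; later filters never see it
    return message
```

Putting the cheapest filter first means the more expensive ones only see the
mail that the earlier ones let through.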

There is a version of qmail-spamc2 (that passes all of its arguments to
spamc2) available in the spamc/ directory.



7. Running rblsd

Most of the documentation has discussed setting up rblsd before it is
actually used.  Now that you've done this, find the rblsd binary, and type
the following at the command prompt:

	./rblsd

And that's it.  If you set RunAsDaemon=true, then rblsd will serve in the
background.  If you have rblsd running on a port number less than 1024, then
you will need to be root to run rblsd (rblsd will alert you if you attempt
to run it with no root access).

If you want to use a configuration file other than the one specified in
config.h (see section 4), then run rblsd with the -f flag, like so:

	./rblsd -f ~/.rblsd.conf

This will load .rblsd.conf from the current user's home directory.  If that
file cannot be loaded for some reason, rblsd will revert to the default
configuration file specified in config.h.

In order to check your configuration, or to inspect the options that rblsd
is using, use the -c flag like so:

	./rblsd -c

This will attempt to load the default configuration file, and if successful,
will print out all options that would have been used to run the daemon. 
Note that the -c option will not start the server.  Also note that the -c
flag can be used in conjunction with the -f flag.

The last flag to mention is the -s flag.  -s stands for silent; naturally,
this means that rblsd will produce no output during the course of its
execution.

Equally important to running the daemon is stopping the daemon, and the
remainder of this section is dedicated to the details of doing so.

Sending the rblsd daemon process a SIGTERM (e.g., kill -TERM pid-of-rblsd)
will cause it to finish processing any clients that are currently connected,
shut down, and exit peacefully.  Using SIGKILL (-KILL or -9) will rudely
disconnect clients, and spam may slip through!

Sending the daemon a SIGHUP will cause it to perform a complete restart. 
This involves re-launching the executable.  If rblsd is setuid root, then
this is a big, gaping security hole.  Do not allow users to run rblsd as
root.  Note that this is the only potential security hole currently known,
and it can be completely avoided by not setuid'ing rblsd to root.  For
those of you skimming the documentation:

                        DO NOT SETUID RBLSD TO ROOT.

Besides being a potential security hole, the restart mechanism will provide a
smooth transition for the new daemon: no clients will be lost, and all pending
requests will be fulfilled before the original daemon exits.  Since the binary
is re-run when rblsd is restarted, future upgrades will go smoothly (0.0.3 will
not be able to restart to 0.0.4, but 0.0.4 will restart to 0.0.5, etc.).

If you make changes to the configuration file, you will need to restart
rblsd in this manner for them to take effect.  Note that if there is an
error in the configuration file, the restarted daemon will probably fail and
you will not know about it (unless you watch your logs constantly).  It is
highly recommended that you do a configuration check, using the -c flag (as
described above) before restarting rblsd.  Once you have confirmed that the
new configuration is clean, you can safely killall -HUP rblsd.

Restarting rblsd has one side-effect that may be considered negative: the
local cache of spammers' IP addresses is reset.  Since it's likely that you
will get more than one connection from a blacklisted IP address, rblsd's
local cache is highly effective.  rblsd performs at its peak when the cache
is in full form; resetting the cache isn't much of a setback, but it can be
considered a small loss.  In short, keep restarts to a minimum.



8. Concluding remarks

0.0.4 is more than two months overdue and I apologize for that.  School and
summer research have dominated my time for the last six months, but I think
that this release will be worth the wait.

rblsd was meant to be a small side-project, but recently I've been devoting
a large portion of my time to it.  I don't expect anything in return for its
use, and I feel that anyone who wants to use it should be able to do so without
anything required of them.

I enjoy working on rblsd, but I'm not inclined to continue unless I know
there is an audience.  I can't emphasize enough how much I enjoy receiving
feedback on rblsd, so if you have any comments or suggestions, please write
to me.

I've also set up a few e-mail lists to support the software after the release
and, hopefully, to establish a community of sorts.  Your participation in any
of these lists is also considered support, and is a good way to let me know
you're out there.  Here are the rblsd lists:

- rblsd-announce: Major announcements for rblsd only.  Posts made only by the
  author.  Low traffic.
  http://lists.sourceforge.net/mailman/listinfo/rblsd-announce
- rblsd-troubleshoot: Interactive forum intended to help new users get started
  with rblsd.  Daily traffic expected.
  http://lists.sourceforge.net/mailman/listinfo/rblsd-troubleshoot
- rblsd-discuss: General rblsd discussion forum intended for feature requests
  and miscellaneous other topics.
  http://lists.sourceforge.net/mailman/listinfo/rblsd-discuss

Finally, if you find this software useful and would like to see the project
continued, please consider making a donation.  Donations will help ensure
continued development and will go directly towards my UC Berkeley
out-of-state tuition (books+fees are around $26,400).  Donations can be made
from the rblsd sourceforge page.

Links:

- rblsd home page
  http://rblsd.sourceforge.net/
- rblsd sourceforge page
  http://sourceforge.net/projects/rblsd/
- author's contact e-mail:
  jjjordan@XXX.sourceforge.net (XXX = users)

Thanks for your interest in rblsd.



9. Legal

The Realtime Blocking List Spam Daemon is
Copyright (C) 2003-2004 John J. Jordan

rblsd is released under the GNU General Public License (GPL).  A copy of this
license is included with this distribution in the 'COPYING' file.

SpamAssassin is
Copyright (C) 2000-2004 Justin Mason

spamc is
Copyright (C) 2001 Craig Hughes
Portions Copyright (C) 2002 Brad Jorsch

The spamc source from the SpamAssassin package is included in this
distribution under the terms of the GPL.  A copy of the SpamAssassin license
is included along with the software in the 'spamc/License' file.
  
==============================================================================
RBL Spam Daemon                                  http://rblsd.sourceforge.net/
Copyright 2003-2004, John Jordan                 jjjordan@XXX.sourceforge.net
==============================================================================
                                                 (XXX = users)