#!perl -w # Pop3proxy - a SpamAssassin enabled POP3 proxy designed for Win32 # users. use strict; # Set this to zero to turn off all debugging statements. Set to 1 for # basic debugging, which is pretty verbose, set it to 2 to add a dump # of key data structs on connect, set it to 3 to add a dump of every # read/write we do. (Oy) use constant DEBUGGING => 1; # Seems that SpamAssassin wants to remove the dependency on # Time::HiRes. I only need it for measuring performance, so I'll only # include it if it's available. Have to eval the "use constant" # statements to avoid redefinition warnings. # # I use constants for debugging switches because I believe they get # optimized out by the compiler if they're false. I could be wrong. BEGIN { eval "use Time::HiRes"; if ($@) { eval "use constant TIMERS => 0"; } else { eval "use constant TIMERS => 1"; } } # A set of enumerated reasons why we're snarfing a multiline response # for a socket. use constant RETR => 1; use constant TOP => 2; use constant CAPA => 3; use IO::Socket; use IO::Select; use FindBin; use Mail::SpamAssassin; use Getopt::Long; ######################### # A BUNCH OF EVIL GLOBALS ######################### # Set this to be a file that will contain the debug log. Set to an # empty string to debug to STDOUT. --logfile command line arg sets. my $logfile = 'pop3proxy.log'; # Hostmap - keys are ports to listen to on localhost, values are # hostname:port to proxy connections on the key port to. Set up by # the command line --host arg or by the hostmap.txt config file. # # A simplest case - you get your mail from a server server named # pophost.isp.com, on the standard POP3 port (110): # # my %hostmap = ( 110 => 'pophost.isp.com:110' ); # # ...And you change your mail client to get mail from localhost. # # Fancier case - you pop mail off of two hosts, pophost.isp.com and # mail.yetanother.org: # # my %hostmap = ( # 817 => 'pophost.isp.com:110', # 818 => 'mail.yetanother.org:110', # ) # # In that case, the proxy listens to TWO sockets on localhost - 817, # and 818, proxying off to two separate remote hosts as indicated. # # Note that for this to work, you need to be able to tell your mail # client to connect to two different ports on localhost to find the # proxy - namely, 817 for pophost.isp.com and 818 for # mail.yetanother.org. Some mail clients, like Netscape 4.5's, won't # let you specify the port to use for a pop3 connection. Oops. # Others, like Mozilla 1.0, will let you set the port, but won't allow # two servers to be on the same host (localhost in this case). You # can work around THAT by creating another alias for localhost in your # C:\Windows\Hosts file: # # 127.0.0.1 localhost MyHostName # # ...and then configuring one account for localhost:817 and the other # for MyHostName:818 my %hostmap = (); # Respect_byte_count - If TRUE, then we do not alter the byte count of # the message when marking it as spam - instead, we overwrite portions # of the headers, such as changing the first five characters of the # Subject: line to "*SPAM*" (a shortened form of SpamAssassin's famous # subject prefix). Set by the command line --nopad arguement. # # This, because under certain conditions the POP3 protocol indicates # message and mailbox sizes, and the safe thing is not to enlarge # those sizes while marking a message as spam. # # If there is no Subject: line in the mail headers (there doesn't have # to be, after all) or if it's less than 5 bytes, then we use the # first Received: line we find instead. # # Setting this value to FALSE (0) seems to work with most mail # clients, and it causes us to proxy back the mail as it's been # modified by SpamAssassin, which gives you a wonderful great lot of # info about WHY it's labeled as spam, and also labels it clearly and # beyond doubt, and defangs the MIME contents, etc, etc - but it # *could* break the mail client. Harumph. my $respect_byte_count = 0; # If true, we let the POP3 "TOP" command go thru to the server, # otherwise, we don't proxy the TOP command and return an error back # to the client. Set by the command line --allowtop arguement. # # TOP is specified as an optional command, it shows you the headers of # a mail message and a configurable number of lines of the body. The # idea is that you can sort of "screen" what you choose to download or # not before you do. All well and good, but our spam filtering can # cause this to break when we scan the actual message during retrieval # and potentially modify or add to the headers, such as changing the # subject line to start with *****SPAM***** or something. # # This breaks the protocol a little and could have unusual or possibly # even destructive consequences. Since it's an optional part of the # protocol, most mail clients should be coded to work without it, # hence, by default, we avoid the problem by turning it off. my $allow_top = 0; # Here's the problem with using SpamAssassin in this way - given a # large enough message, he will take a LONG time to scan it, where # long is like sixteen minutes on a P-II 350 running Linux for a 3MB # text message. Maybe that was a degenerate case of some sort, but # there it is. If SpamAssassin takes long enough to scan a message, # the mail client (who's not getting any data in response to his RETR # command during all this) will eventually time out. Sockets close, # data is lost, etc, etc. Very bad, very difficult to fix and get on # with your life if you have a large mail message on the server that # keeps causing this. # # Hence, this config parameter. If a message exceeds this size while # we're snarfing it, we'll abandon the snarf, start passing the data # back to the client, and no scan of the message by SpamAssassin will # be performed. # # Setting this to zero turns this behavior off - all messages will be # scanned, regardless of size. # # I chose a 250K default for this value after analyzing a few months # worth of spam - 1500 messages. The average size was about 9K, the # largest was 110K. I figured double the largest would allow most of # the spam we see today to get scanned, without trouble. # # This has the added side effect of keeping our memory usage down - # that scan of a 3MB message took 86MB worth of memory. That's not # such a hot idea for a daemon. my $max_scan_size = 250000; # If we're invoked with a logfile for output using ActiveState's # wperl.exe, we can effectively hum along in the background. Nice. I # don't want to send the user to Task Manager to shut us down, and # under Win98 at least you get the nasty "application not responding" # dialog box because I'm busy waiting for to select a socket, so # instead we have this - a port that we listen on for the purposes of # exiting. Any connection to it from localhost, and I'll get out of # town. # # The default is 9625 (which is otherwise unused). Set this to zero # to disable this behavior. my $exit_port = 9625; # Note CRLF == \015\012 my $no_top = "-ERR Not supported by proxy\015\012"; # %peer - mapping of client socket => server socket, and vice versa. # # Keys are stringified references to IO::Socket objects, values are # actual references to the same. It's a little ugly to contemplate, # but it works just dandy. # # The Peer mapping is removed when the peer is closed. Thus, if # you're reading data on $socket: # # The destination of this data is $reading_buf{$peer{$socket}}, and, # If there is no destination any more, there's no point in reading the # data, so shut down, and, # If you read some data, add the $peer{$socket} to the Writeable set, # because now you want to write something to him. # # And, if you're writing to $socket, # # The data is in $writing_buf{$socket}, and, # Once all the data is written, you should close $socket if # $peer{$socket} is missing. my %peer; # %is_client - stringified IO::Socket references for keys, true or # false values based on whether that socket is connected to a client # or the server. my %is_client; ################# # Buffers galore. ################# # The general flow of data is: # # data from $socket -> $peer = $peer{$socket} -> read data into # $reading_buf{$peer} -> hook protocol, snarfing to $message{$peer} if # needed -> move data into $writing_buf{$peer} -> write data to $peer # %reading_buf - keys are sockets, value is buffer of data read from that # socket's peer, waiting to be proxy'd to the socket. my %reading_buf; # %writing_buf - keys are still sockets, value is data from the # %reading_buf buffer which is now ready for writing to the socket. my %writing_buf; # Hash of socket => buffer, buffer is filled up with the message being # snarfed. Then the buffer is scanned and modified, then copied into # $writing_buf{$socket} and flushed back to the client. my %message; # Hash of socket => enums, set to the reason we're snarfing a # multiline response into %message_for array for this socket. Set to # zero (false) if we're NOT snarfing this data. my %snarfing; # Hash of Client socket => queue of commands the client has requested. my %client_commands; # Hash of listening sockets - keys are stringified socket object refs, # values are the host:port we should proxy connections on that socket # to. my %proxyto; # Flags - toggled on and off to indicate if we're reading a multiline # response or not. Keys are sockets. my %reading_multiline_response; # Hash - keys are sockets, values are HiRes timer floats. Used to # time downloads. my %snarf_start; ######## # "Main" ######## # Get in your directory chdir "$FindBin::RealBin"; read_config() if -s "./hostmap.txt"; my $cl_proxyto; my $helpflag = 0; usage() unless GetOptions("logfile:s" => \$logfile, "nopad" => \$respect_byte_count, "allowtop" => \$allow_top, "maxscan=i" => \$max_scan_size, "exitport=i" => \$exit_port, "host=s" => \$cl_proxyto, "help" => \$helpflag, ); usage() if $helpflag; if ($cl_proxyto) { warn "WARNING: $cl_proxyto overrides hostmap.txt entry: $hostmap{110}\n" if exists $hostmap{110}; # We're nice to command line users. If you tag a :port onto your # hostname, that's cool, otherwise, you get :110 for free. $cl_proxyto .= ':110' unless $cl_proxyto =~ /:\d+$/; $hostmap{110} = $cl_proxyto; } die "No proxy host! Use --host or hostmap.txt\n" unless keys %hostmap; # Prevent concurrent proxies - kill any previous instance if (IO::Socket::INET->new(PeerAddr => 'localhost', PeerPort => $exit_port, Proto => "tcp", Type => SOCK_STREAM)) { warn "WARNING: Existing proxy killed\n"; } if ($logfile) { # Redirect stdout and stderr to logfile if specified. # Windows strangeness - you can't reopen STDOUT/STDERR successfully # under wperl.exe unless you've already closed it. Go figure. close STDOUT; close STDERR; open(STDOUT, "> $logfile") or die "Can't redirect stdout: $!"; open(STDERR, ">&STDOUT") or die "Can't dup stdout: $!"; } $| = 1; # The SpamAssassin scanner. # # We tell it to use ./user_prefs, and not to try to copy in a default # if it's not there (because he goes looking for a template file in # all the usual places to copy over - all the usual *UNIX* places). # We tell it to run only local tests, because otherwise, you'll get # complaints and timeouts when it can't find a dcc app to run or it # can't find a DNS server or the network is the wrong color or # whatever. In theory, you could get away with DNS RBL checks, but # I've had a hard time making Net::DNS work on my Win32 setup. # Hackers welcome, best of luck. See notes at end of file. my $spamtest = Mail::SpamAssassin->new({ userprefs_filename => './user_prefs', dont_copy_prefs => 1, local_tests_only => 1, }); my $readable = IO::Select->new; my $writeable = IO::Select->new; # Create sockets to listen on. foreach my $port (keys %hostmap) { my $listener = IO::Socket::INET->new(LocalPort => $port, Listen => 5, Reuse => 1); die "Can't create socket for listening: $!" unless $listener; print "Listening for connections on port $port (proxy $hostmap{$port})\n" if DEBUGGING; $readable->add($listener); $proxyto{$listener} = $hostmap{$port}; } # Create the "exit socket" - any connection on this socket from # localhost will cause us to exit. my $exit_socket; if ($exit_port) { $exit_socket = IO::Socket::INET->new(LocalPort => $exit_port, Listen => 1, Reuse => 1); $readable->add($exit_socket); } while(1) { my ($toread, $towrite) = IO::Select->select($readable, $writeable); foreach my $socket (@$toread) { if ($socket == $exit_socket) { all_done($socket); next; # Just in case it wasn't from localhost } # Is it a new connection? if (exists $proxyto{$socket}) { dump_data_structs() if (DEBUGGING > 1); # Open connection to remote, add to readable set, map it # to this new client connection. my $remote = IO::Socket::INET->new(PeerAddr=>$proxyto{$socket}); $readable->add($remote) if $remote; if (not $remote) { # Break the incoming new client off, create a new # listener to try again. print "Connect to remote: $proxyto{$socket} FAILED: $@\n" if DEBUGGING; my $port = $socket->sockport; $socket->close; $readable->remove($socket); my $listener = IO::Socket::INET->new(LocalPort => $port, Listen => 5, Reuse => 1); die "Can't create socket for listening: $!" unless $listener; $readable->add($listener); $proxyto{$listener} = $hostmap{$port}; next; } # Accept the connection and add it to our readable list. my $new_sock = $socket->accept; $readable->add($new_sock) if $new_sock; die "Can't create new socket for incoming connection: $!" unless $new_sock; # Create proxy/peer mapping, set client/server indicators, # create buffers, etc. $peer{$new_sock} = $remote; $peer{$remote} = $new_sock; $is_client{$new_sock} = 1; $is_client{$remote} = 0; $message{$new_sock} = ''; $snarfing{$new_sock} = 0; # The first thing we'll see is a response to no command at # all - "+OK Welcome to foobar.com" - so we seed the # command queue with a dummy command to eleminate warnings # later on. $client_commands{$new_sock} = [('none')]; foreach ($new_sock, $remote) { $reading_buf{$_} = ''; $writing_buf{$_} = ''; } if (DEBUGGING) { print "\nNew connection:\n"; print "From: ", $new_sock->peerhost, ':', $new_sock->peerport,"\n"; print "To: ", $remote->peerhost, ':', $remote->peerport, "\n"; } } else { # It's an established connection my $key; if (DEBUGGING) { if ($socket->connected) { $key = $socket->peerhost . ':' . $socket->peerport; } else { $key = "$socket"; } } my $proxy; # Which socket we're going to proxy this data to if (exists $peer{$socket}) { $proxy = $peer{$socket}; } else { # No peer. print "\n$key - peer gone on read" if DEBUGGING; # No need to keep hearing about how it's ready to be # read - we've got no use for subsequent data. $readable->remove($socket); # Tear down connection, unless there's data waiting to # be written to it - in that case, we'll catch it in # writeables and close it when we're done. if (! data_waiting($socket)) { print ", nothing to write, closing socket" if DEBUGGING; clean_up($socket); } print "\n" if DEBUGGING; next; } # Why 4096 bytes? I dunno. You got a better buffer size? unless (my $n = sysread($socket, $reading_buf{$proxy}, 4096, length($reading_buf{$proxy}))) { warn "sysread: $!\n" if not defined $n; # Shut down the socket print "\n$key - socket close on read" if DEBUGGING; clean_up($socket); # Remove the proxy map delete $peer{$socket}; delete $peer{$proxy}; if (! data_waiting($proxy)) { # No pending data - tear down the peer as well. print ", closing peer too" if DEBUGGING; clean_up($proxy); } print "\n" if DEBUGGING; next; } if (DEBUGGING > 2) { $is_client{$socket} ? print "C< " : print "S< "; print "\n"; } # Got data from a socket. Go do something clever with it. run_hooks($proxy); } } # End of readables # Next, do something with each socket ready to write. Like, write # to it. foreach my $socket (@$towrite) { my $key; if (DEBUGGING) { if ($socket->connected) { $key = $socket->peerhost . ':' . $socket->peerport; } else { $key = "$socket"; } } my $wrote = syswrite($socket, $writing_buf{$socket}) or do { warn "syswrite: $!\n"; print "\n$key - socket close on write" if DEBUGGING; clean_up($socket); # Remove the proxy map if (exists $peer{$socket}) { my $proxy = $peer{$socket}; delete $peer{$proxy}; delete $peer{$socket}; if (! data_waiting($proxy)) { print ", closing peer too" if DEBUGGING; clean_up($proxy); } } print "\n" if DEBUGGING; next; }; if (DEBUGGING > 2) { $is_client{$socket} ? print "C> " : print "S> "; print "\n"; } # Scrub the just-written data from the buffer substr($writing_buf{$socket}, 0, $wrote, ""); # All done writing? if (! length($writing_buf{$socket})) { $writeable->remove($socket); if (! exists $peer{$socket}) { # No peer? Tear down connection. print "\n$key - peer gone after write, closing\n" if DEBUGGING; clean_up($socket); next; } } } # end of writeables } # data_waiting($socket) # # Returns true if there's any data waiting to be proxy'd to this socket. # # Reason this works - we only check data_waiting() on a socket *after* # we've closed it's peer. Closing the peer in clean_up(), below, will # have the effect of flushing any pending %message buffers (and # %reading_buf, for that matter) to %writing_buf, and hence, all the # data which is "waiting" is, in fact, guaranteed to now be waiting. sub data_waiting { my $socket = shift; return (length($reading_buf{$socket}) or length($writing_buf{$socket})); } # clean_up($socket) # # Given a socket, close it, stop selecting it for anything, clean up # all our structs that refer to it, set the peer if any to flush # buffers. sub clean_up { my $socket = shift; # This socket is history. If there's a peer, then that peer # currently has all the data it's ever gonna get. Flush that data # into the writing_buf and add it to the writeable set. # # Ok, technically, this *could* burn you if what you were caching # away in %message was a multiline TOP response that you were # going to discard anyway, and now I'm going to flush it to the # client, instead. Look, the client is going to get an error # condition *anyway* because the darn socket is GONE, man, just # like that, in the middle of a multiline response! I will # venture to say that no harm will come of this - but if it does, # we can always make this behave a lot more like a "last ditch" # run_hooks() session. if (exists $peer{$socket}) { my $proxy = $peer{$socket}; $writing_buf{$proxy} .= $message{$proxy} if exists ($message{$proxy}); $writing_buf{$proxy} .= $reading_buf{$proxy}; $reading_buf{$proxy} = ''; $message{$proxy} = ''; $snarfing{$proxy} = 0; if (length ($writing_buf{$proxy})) { $writeable->add($proxy); print "\nFlushing peer on close\n" if DEBUGGING; } } # Note that you can apparently remove a socket more than once from # an IO::Select set. Also you can delete a key/value pair from a # hash that doesn't exist. Love Perl. DWIM. $readable->remove($socket); $writeable->remove($socket); $socket->close; delete $reading_buf{$socket}; delete $writing_buf{$socket}; delete $is_client{$socket}; delete $snarfing{$socket}; delete $message{$socket}; delete $client_commands{$socket}; delete $reading_multiline_response{$socket}; delete $snarf_start{$socket}; } # run_hooks($socket) # # This is where we hook the POP3 protocol. Called whenever a socket # gets new data in it's buffer, we can do whatever you want here. The # default is to wait until there's a \n in the %reading_buf buffer, then (in # a loop) move all those bytes into the %writing_buf buffer (giving us the # window to look at a full line of I/O), then add the socket to the # writeable set, thereby causing the contents of %writing_buf to get # flushed to the socket. # # Under certain conditions, though, we'll want to intercept the # protocol, at which point we snarf the data off into %message until # it's done, then we look at it or replace it or something, and THEN # we ship it off to %writing_buf for flushing to the client. # # Client commands are pushed onto a queue of commands, server # responses shift commands off that queue. This way we can support # pipelining client/servers, per rfc 2449 # # Note - logically, the %peer mapping must be intact when you get # here. The main loop enforces this. You may assume that # $peer{$socket} will exist and be valid in this routine. my $pos; sub run_hooks { my $socket = shift; # This loop looks for the first occurance of a \n in a string, # then MOVES all of the string up to and including the \n into the # output buffer and adds the socket to the set of sockets we'd # like to write to. Then it loops looking for another \n. # # Just before the move, you can examine the beginning of # $reading_buf{$socket} to see what kinds of interesting thingies might # be in there, in the confidence that it's a real full line of # data from the protocol. You can say things like: # # $reading_buf{$socket} =~ /^(.*)$/m # /m lets $ match next to embedded \n $pos = -1; while (($pos = index($reading_buf{$socket}, "\012", 0)) > -1) { # Right here you can examine $reading_buf{$socket} if ($is_client{$socket}) { # Hooks here for data from the server to the client # Responses from the server are interesting. They can be # single line, in which case they MUST start with "+OK" or # "-ERR", or else they're part of a multiline response, # such as a LIST or RETR command, in which case they MUST # end with a CRLF.CRLF. if ($reading_buf{$socket} =~ /^(\+OK|-ERR)/i and not $reading_multiline_response{$socket}) { # Response to a command my $command = shift @{$client_commands{$socket}}; print $peer{$socket}->peerhost . ':' . $peer{$socket}->peerport . " (Server) said $1 to $command\n" if DEBUGGING; # Always include the greeting line in the log. if (DEBUGGING and $command eq 'none') { print $reading_buf{$socket}; } die "Assertion failed: snarfing outside multiline response" if ($snarfing{$socket}); # Only interested in snarfing successful response - # none of the error responses are multiline. if (substr ($1, 0, 1) eq '+') { if ($command =~ /^TOP$/i and not $allow_top) { print "Snarfing TOP response\n" if DEBUGGING; $snarfing{$socket} = TOP; } if ($command =~ /RETR/i) { print "Snarfing RETR response\n" if DEBUGGING; $snarf_start{$socket} = Time::HiRes::gettimeofday if TIMERS; $snarfing{$socket} = RETR; } if ($command =~ /CAPA/i) { print "Snarfing CAPA response\n" if DEBUGGING; $snarfing{$socket} = CAPA; } } } elsif ($reading_buf{$socket} =~ m|^\.\015?\012|) { # End of a multiline response $reading_multiline_response{$socket} = 0; if ($snarfing{$socket}) { print "Detected end of snarfed multiline\n" if DEBUGGING; printf "Download took %.8f seconds\n", Time::HiRes::gettimeofday - $snarf_start{$socket} if (DEBUGGING and TIMERS); # At this point, $message{$socket} contains the # full multiline response, +OK up to but not # including this trailing ".CRLF". if ($snarfing{$socket} == RETR) { # Right here, $message{$socket} is ripe for # scanning. scan_mail(\$message{$socket}); $writing_buf{$socket} .= $message{$socket}; } elsif ($snarfing{$socket} == TOP) { # Eat the .CRLF, add the error message to the # output buffer, flush said output buffer, # clean up your structs and move on. substr($reading_buf{$socket}, 0, $pos+1, ""); $writing_buf{$socket} .= $no_top; $message{$socket} = ''; $snarfing{$socket} = 0; $writeable->add($socket); next; } elsif ($snarfing{$socket} == CAPA) { # Strips out the TOP response, if any. $message{$socket} =~ s/\012TOP[^\012]*\012/\012/ig if not $allow_top; # Strips out the SASL response, if any. $message{$socket} =~ s/\012SASL[^\012]*\012/\012/ig; $writing_buf{$socket} .= $message{$socket}; } $message{$socket} = ''; $snarfing{$socket} = 0; } } else { # Part of a multiline response. Flip the ready flag, # you won't be ready to see another response until you # see your CRLF.CRLF $reading_multiline_response{$socket} = 1; } # At this point, snarf data into %message if snarfing and # move along. if ($snarfing{$socket}) { $message{$socket} .= substr($reading_buf{$socket}, 0, $pos+1, ""); # Check size of snarfed message and stop snarfing if it's # getting too big - see notes at $max_scan_size. if ($max_scan_size != 0 and length($message{$socket}) > $max_scan_size) { print "Message exceeding max scan size, abandoning snarf\n" if DEBUGGING; $writing_buf{$socket} .= $message{$socket}; $message{$socket} = ''; $snarfing{$socket} = 0; $writeable->add($socket); } next; } } else { # Hooks here for data from the client to the server # Spot the client's command, add to the queue. my ($command) = $reading_buf{$socket} =~ /^(\S+)\s/; print $peer{$socket}->peerhost . ':' . $peer{$socket}->peerport . " (Client) said $command\n" if DEBUGGING and $command; # AUTH is a special case, see discussion elsewhere. Must # not have any commands in the queue, and we reply back to # the socket immediately with an error. if ($command and $command =~ /^AUTH$/i) { if (scalar(@{$client_commands{$peer{$socket}}})) { die "I so can't cope with AUTH commands while pipelining"; } print "AUTH Rejected\n" if DEBUGGING; substr($reading_buf{$socket}, 0, $pos+1, ""); # Note - $no_top is a generic -ERR response, works fine. $writing_buf{$peer{$socket}} .= $no_top; $writeable->add($peer{$socket}); next; } push (@{$client_commands{$peer{$socket}}}, $command) if $command; } # Default action after all your shots at hooking and magic, # etc.: Move the data to the writing buffer, and set it up to # get written. $writing_buf{$socket} .= substr($reading_buf{$socket}, 0, $pos+1, ""); $writeable->add($socket); } } sub dump_data_structs { # Dump your current key per-connection data structs print "\nExisting proxy/peer mappings:\n"; print map "$_ => $peer{$_}\n", keys %peer; print "\nExisting is_client flags:\n"; print map "$_ => $is_client{$_}\n", keys %is_client; print "Existing socket reading_buf buffers:\n"; print map "$_ => $reading_buf{$_}\n", keys %reading_buf; print "Existing socket writing_buf buffers:\n"; print map "$_ => $writing_buf{$_}\n", keys %writing_buf; print "Existing message buffers:\n"; print map "$_ => $message{$_}\n", keys %message; print "Existing snarfing flags:\n"; print map "$_ => $snarfing{$_}\n", keys %snarfing; print "Existing command queues:\n"; print map "$_ => @{$client_commands{$_}}\n", keys %client_commands; print "Existing reading_multiline_response flags:\n"; print map "$_ => $reading_multiline_response{$_}\n", keys %reading_multiline_response; print "Existing snarf_start values:\n"; print map "$_ => $snarf_start{$_}\n", keys %snarf_start; } # @mail - array of lines of a mail message. Some notes on memory # usage here: # # Big mail messages getting copied about will chew up memory right # quick. I start with one copy of the message built up in a scalar # buffer, then I need a second copy, broken out into an array of # lines, for Mail::SpamAssassin::NoMailAudit to chew on. That's two # copies. # # I can save a copy's worth of memory by MOVING the lines from the # scalar buffer into the array - but then, once SpamAssassin is done # chewing on them, I have to put them BACK into the scalar buffer. If # I'm not removing them from the SpamAssassin::NoMailAudit object as I # do that, I'm going to wind up with a second copy of the mail # *anyway*. And that kind of removal is nasty and creeps inside of # the objects encapsulation, where I really ought not go. # # NoMailAudit::as_string() returns a copy of the mail as a string, but # to do so, it creates a big ol' scalar on the stack to return. # Simple, but it costs a THIRD chunk of memory the size of the # message. my @mail; sub scan_mail { my $mailref = shift; my $bytecount = length $$mailref; $$mailref =~ s/\012\.\./\012\./g; # un-byte-stuff @mail = split /^/, $$mailref; my $response = shift @mail; # SpamAssassin::NoMailAudit adds a Unix mbox From_ line, unless # you construct your NoMailAudit message with the (ahem, # undocumented) add_From_line param set to false. That From_ # kinda breaks the protocol - the client isn't expecting mbox, # he's expecting raw 822 mail - so we leave it out. my $message = $spamtest->parse(\@mail); my $start; $start = Time::HiRes::gettimeofday if TIMERS; my $status = $spamtest->check($message); printf "Spam check took %.8f seconds\n", Time::HiRes::gettimeofday - $start if (DEBUGGING and TIMERS); my $id = $message->get_pristine_header('Message-id') || '*none*'; print "$bytecount bytes, ", $status->is_spam() ? 'SPAM' : 'NOT spam', ", Message-id: $id\n" if DEBUGGING; print "mailref is $$mailref"; print $status->get_report() if DEBUGGING and $respect_byte_count; my $headers_body=$status->rewrite_mail() unless $respect_byte_count; print "headersbody is $headers_body"; if ($status->is_spam ()) { if ($respect_byte_count) { # DAN - danger, you don't know if you're in the headers or not. $$mailref =~ s/\012Subject: [^\012]{6}/\012Subject: *SPAM*/i or $$mailref =~ s/\012Received: [^\012]{6}/\012Received: *SPAM*/i; } else { # What as_string() does as of SpamAssassin v2.31: # return join ('', $self->get_all_headers()) . "\n" . # join ('', @{$self->get_body()}); $$mailref = $response; #$$mailref .= $message->get_all_headers(); #$$mailref .= "\015\012"; #foreach my $line (@{$message->get_body()}) { #$$mailref .= $line; # } $$mailref .= $headers_body; # SA's markups end with \n instead of CRLF's. Gotta # change those here. $$mailref =~ s|(?get_all_headers(); #$$mailref .= "\015\012"; #foreach my $line (@{$message->get_body()}) { # $$mailref .= $line; # } $$mailref .= $headers_body; # SA's markups end with \n instead of CRLF's. Gotta # change those here. $$mailref =~ s|(?finish(); print "mailref is $$mailref"; $$mailref =~ s/\012\./\012\.\./g; # byte-stuff } sub all_done { my $socket = shift; my $new_sock = $socket->accept; if ($new_sock->peerhost eq '127.0.0.1') { print "Connection on exit socket, exiting\n" if DEBUGGING; exit; } else { print "Connection on exit socket from non-local host!\n" if DEBUGGING; $new_sock->close; } } sub read_config { open (CONFIG, "./hostmap.txt") or die "Can't read hostmap.txt: $!\n"; # Straight from the cookbook 8.16 while () { chomp; # no newline s/#.*//; # no comments s/^\s+//; # no leading white s/\s+$//; # no trailing white next unless length; # anything left? my ($port, $proxyto) = split(/\s*=\s*/, $_, 2); $hostmap{$port} = $proxyto; } } sub usage { print < C =head1 DESCRIPTION =head2 OVERVIEW If you read email on a Win32 platform with a POP3 mail client, it can be a little challenging to filter your mail. One of the best spam detection solutions available today is SpamAssassin, which is (unfortunately for Windows users) a little biased towards Perl and Unix. This proxy is designed to allow a Win32 POP3 mail client to use SpamAssassin to filter their mail. It does so by standing between the user's mail client and the POP3 server, seamlessly proxying data back and forth between them. When a mail message is retrieved, the proxy waits to read the full message before handing it to the client. After it has the full message, it uses SpamAssassin to check to see if it's spam, marks up the message accordingly, and then returns it to the client. The client can then easily filter the mail based on SpamAssassin's markups, such as the inclusion of "*****SPAM*****" at the beginning of the subject line. Installing this proxy is not for the faint of heart - SpamAssassin is not supported on Win32 platforms, and you'll need to be comfortable with downloading, uncompressing, installing, and copying files, as well as editing text configuration files. You might need to know a little about networking, and if your client is not one we've tested with, you might need to learn a little about the POP3 protocol. But generally speaking, anyone comfortable with installing Perl on their Win32 platform will be able to make use of this proxy. See L<"INSTALLING">, below. If all this is a little daunting, Pop3proxy isn't for you. Consider one of the fine alternatives available from Deersoft inc., instead, for your Windows based SpamAssassin solution. You can find them at http://www.deersoft.com. Pop3proxy supports most of the POP3 protocol as per RFCs 1939 (POP3 protocol) and 2449 (Extensions to POP3). Since marking up a message in transit between the POP3 server and the mail client can "break" the protocol, there are a variety of options to help your client deal with this. Pop3proxy was tested with a number of popular Win32 mail clients, which were found to be pretty good at dealing with the protocol being "broken" in this way, but your mileage may vary. Notably, Pop3proxy does not support RFC 2222, Simple Authentication and Security Layer (SASL), nor RFC 1734, POP3 AUTHentication command. By default, the optional POP3 C command is not supported, but you can override this (see B<--allowtop> in L<"OPTIONS">, below). C and C are also removed from the results of an RFC 2449 C command as well. RFC 2449 Pipelining is supported. SpamAssassin is a very full featured mail analysis package, but due to the constraints of the Win32 platform and the nature of being a network proxy, Pop3proxy provides ONLY basic SpamAssassin text scanning and markup. Razor, DCC, and the auto_whitelist feature are not supported. DNS based RBL checking is not supported. Pop3proxy also does not report any detected spam - if you want to complain about spam, that's up to you, we just try to spot it. =head2 INSTALLING =over 4 =item * Download Pop3proxy Download the C script and save it in its own directory, for example, C. (The rest of this document will refer to C, but naturally, you can install it anywhere you'd like.) =item * Install Perl You need Perl for your Win32 box. Here you should acquaint yourself with the good people at www.ActiveState.com if you're not already familiar with them. Pop3proxy was written against ActiveState Perl v5.6.1 build 631, and probably requires a pretty modern Perl. Download and install their Perl if you don't already have it. =item * Install Time::HiRes SpamAssassin, as of this writing, requires the Time::HiRes module. (Pop3proxy itself will also use it if it's available.) Using ActiveState's C package manager to install Time::HiRes worked fine for me. See the documentation that comes with ActiveState for more details on using C. =item * Download SpamAssassin Download SpamAssassin from http://spamassassin.org/. As of this writing, it was available as both a .tar.gz or a .zip, you can use whichever you're more comfortable working with on Win32. Pop3proxy was written using SpamAssassin v2.31. =item * Install SpamAssassin Manually Unpack the Mail::SpamAssassin distribution. Since trying to get the module to build via the traditional "make" method on a Win32 platform can be difficult, we try to get away with a "manual" installation. From the unpacked distribution, copy C to C, assuming you've installed Perl into C. Since the bits of the SpamAssassin package that Pop3proxy uses are pure Perl (as of this writing), that should do it. =item * Install SpamAssassin Rules Next, also from the SpamAssassin distribution, copy the C<\Rules> directory and all it's contents to C. =item * Install SpamAssassin user_prefs Finally, copy C to C. This is your SpamAssassin prefs file, you can use it to configure the SpamAssassin scanner in Pop3proxy. See the SpamAssassin docs for more details. =item * Configure Shortcuts At this point, you might create a C file if you need to proxy to more than one server. See L<"CONFIGURING CLIENTS AND SERVERS">, below, for details. Otherwise, create a shortcut to launch the proxy. ActiveState Perl installs the undocumented C in the C<\perl\bin> directory. This is a version of the Perl interpreter for Windows that does not require a "console" or "DOS Box" window to run, which makes it nice for running processes like Pop3proxy without leaving an extra window lying around. You can create a shortcut to the proxy that invokes it like this: wperl.exe c:\pop3proxy\pop3proxy.pl --host my.pop3.host ...replacing C with your POP3 server's name, of course. This shortcut can be launched any way you like - from a desktop icon, or a start menu entry, or even from your StartUp folder, if you want to run it all the time. The important thing, of course, is to make sure it's running before you attempt to fetch your mail. You can stop the proxy process a number of ways. The cleanest is to use the C script, which comes with the proxy. If run on the same host as the proxy, it will cause the proxy to exit. You should also be able to use the Win32 task manager to stop the proxy, which will show up as WPERL or PERL, depending on how you ran it. If you ran it from a DOS prompt using C, you should be able to use C to kill the proxy, although some users have reported that this doesn't work all the time. =item * Configure Mail Client All done! You must configure your mail client to use the proxy instead of your POP3 server. See L<"CONFIGURING CLIENTS AND SERVERS">, below, for details. =back =head2 CONFIGURING CLIENTS AND SERVERS The default port for POP3 is 110. If you specify a host on the Pop3proxy command line using the C<--host> flag, then the proxy will proxy connections on port 110 on your local machine to port 110 on the host specified. This is the simplest case. For example, if you normally get your mail from a host called C, you can invoke Pop3proxy like this: wperl.exe c:\pop3proxy\pop3proxy.pl --host mailbox.isp.com ...and then change where your mail client thinks the server is. Where you had configured C, change it instead to use the host named C. The mail client will connect to the proxy, and the proxy will pass the data along to C for you. If you have more than one account on different POP3 servers, you'll need to create a C file in the C directory which will describe which local ports the proxy should map to which remote hosts and ports. The format of this file is one mapping per line of the form: localport = remote.host.name:remoteport Leading and trailing whitespace is ignored, and you can use the Perl commenting convention of "C<#>" to end-of-line for comments. So, for example: # My POP3 servers, how I love them. 110 = mailbox.isp.com:110 818 = mail.another.place.org:110 This will cause connections on the local port 110 to proxy to C on port 110, and connections on port 818 to proxy to host C, also on port 110. Note that the command line C<--host> flag overrides the C entries for port 110, if any. Pop3proxy will gladly listen on any local port you configure, but I suggest choosing one not already in use by someone else. That's increasingly difficult, but these guys might help: http://www.portsdb.org/ Of course, you also need to be able to configure your mail client to talk POP3 on a port other than 110. Most modern mail clients can be configured this way, see L<"TESTED MAIL CLIENTS">. A bigger issue, though, is trying to convince some clients that "C" and "C" are I servers with different accounts on them, and some mail clients are deeply confused by this - again, see the table. If your mailer is confused, the following workaround should do the trick: Create an alias for localhost in your C file. For example, you might configure: 127.0.0.1 localhost jester which would allow you to configure your mail client to get mail from C, and C, both of which are actually Pop3proxy running locally. This simple subterfuge seems to make even persnickety mail clients happy. =head1 OPTIONS =over 4 =item B<--host>=I Specify the host to proxy a single connection to. Pop3proxy will proxy local port 110 to this host. You can optionally specify a remote port, or the default of 110 will be used on the remote as well. This option is in addition to any hosts configured in C, and will override any mapping configured in that file to listen on port 110. =item B<--logfile>=I Specify the name of the file that Pop3proxy should use for logging. The default is C. This file will be created in the same directory as the script, and is overwritten each time the proxy starts. If you use this option but omit the filename, Pop3proxy will log to STDOUT, which is useful for watching things happen in a console window. By default, Pop3proxy doesn't log much. If you'd like more debugging information in your log, you can edit the C script and change the value of the constant C near the top from the default of zero to 1, 2, or 3. See the comments in the code for debugging levels. =item B<--nopad> Specifying this option tells Pop3proxy not to change the size of a mail message after scanning it. The default is to increase the size of the message by adding headers describing the results of the SpamAssassin scan. Some mail clients may break without this. The POP3 protocol has some elements where the server indicates the size of the mailbox and the size of individual messages. The RFC warns against depending on these counts when writing a POP3 client, but some clients might try to anyway. In that case, adding bytes to the message to mark the message as spam, indicate what tests hit, etc., would cause the message size to increase past what the client was expecting. Using this flag disables this behavior. So far, we haven't seen any mail clients that require this flag, but YMMV. If this flag is set, Pop3proxy will flag mail as spam by I the first six bytes of the C header with the string C<*SPAM*>. So: Subject: Fabulous Nekkid Chix For You! becomes Subject: *SPAM*us Nekkid Chix For You! Normal behavior is to I the default SpamAssassin subject flag to the beginning of the C header (along with other headers and indicators). If there is no C line in the mail headers (there doesn't have to be, after all) or if it's less than 6 bytes, then we overwrite the first six bytes of the first C line we find instead. C headers aren't required, either, but you can almost always count on one, especially if the mail is from the outside world. (And if you're getting spammed by someone on localhost, your problems are beyond the scope of this little program to address. :-) Not all mail clients can filter mail based on the contents of the C headers. If this flag is set and the C constant is true (see details under B<--logfile> in L<"OPTIONS"> above) then the results of the SpamAssassin scan will be dumped to the logfile for review. =item B<--allowtop> This flag will cause Pop3proxy to proxy the optional POP3 C command to the server. The default is to decline to proxy it by returning an error response back to the client. The POP3 protocol implements an optional command called C, which returns the top of a mail message including the headers. Servers that don't support it are to return an error, but some mail clients may require it. The danger of proxying it is that the contents of the headers as returned by the C command may be different from the headers after Pop3proxy has scanned the full mail and marked it as spam. This may or may not cause problems for the client, but just to be safe, by default we pretend it's not supported by the server. If your mail client isn't working with Pop3proxy, enable debugging as outlined under B<--logfile>, above, and check your logs for indications that your client is demanding the C command. This will probably look something like the client exiting after the message C<(Client) said TOP>. In that case, this option might allow you to use Pop3proxy - or you may experience weird or destructive behavior. =item B<--maxscan>=I This option lets you set the maximum size of a message which will be scanned for spam. The default is 250000 bytes. SpamAssassin has some well-known performance problems triggered by very large mail messages. Most users work around this by selecting a message size over which they won't feed mail to SpamAssassin. In the context of a network proxy like this, the problem is particularly acute. If we spend too much time scanning a message, the client will timeout waiting for their network read to complete and will probably abort the operation in all kinds of unpleasant ways. As a bonus, the message will likely remain on the server and cause the problem I if you retry. It's just not pretty, hence this limit. As an added bonus, this keeps the proxy's memory usage down, since memory use is tied to the size of the message you're buffering and scanning. This is polite behavior for a long-lived process like a network proxy. The default was selected after examining 1500 spam messages. They averaged 9K bytes, with a max of 110K bytes. Setting this value to zero disables this feature, and is not advised. =item B<--exitport>=I This option lets you specify what port Pop3proxy will listen on for the purposes of exiting. The default is 9625. Any connection from C on this port will cause Pop3proxy to exit cleanly. (This is what C does.) Pop3proxy is a pure Perl program, with few if any hooks into the Win32 environment - in fact, he'll probably run just fine on any Perl supported platform. His main loop is blocked on C, waiting for a socket to be ready to have data read or written, which unfortunately isn't how Win32 apps are supposed to behave - their main loops should be processing Windows events. The "exit port" feature allows us to get an exit "event" of sorts cleanly without a true Win32 event loop. If you change this setting, you'll probably want to modify C as well. Setting this value to zero disables this feature. =back =head1 DIAGNOSTICS All Pop3proxy messages are written to the logfile, or to STDOUT (see B<--logfile> in L<"OPTIONS">, above). The logfile is overwritten each time Pop3proxy starts. Setting the debugging flag to 1 will result in messages about clients connecting and leaving, as well as some information about the protocol exchange. Each client command detected is logged as: 127.0.0.1:1246 (Client) said RETR and each server response is logged as: 204.127.5.31:110 (Server) said +OK to RETR The act of intercepting a multiline server response (such as a mail message) before passing it back to the client is called "snarfing" in Pop3proxy. Many log messages will indicate what is being done with a snarfed response - beginning, detecting the end (a C sequence), aborting if it exceeds the max configured size, etc. If the Time::HiRes module is available, Pop3proxy will use it to time message downloads and SpamAssassin scans, so that you can see where the delay is if you're experiencing network timeouts on the client. The results of a spam scan are logged, along with the Message-Id header and size of the message as downloaded from the server. If the B<--nopad> option is enabled, the results of all SpamAssassin scans are printed to the log. If you bump DEBUGGING up, Pop3proxy will log dumps of its internal data structures each time a client connects. If you bump it up further, you'll get logs of every read and write from every client and server. For hardcore debugging only, see the comments in the code. =head1 FILES All of these files are found in the Pop3proxy install directory, for example, C, following installation and configuration. F, F - The Perl scripts that make up the proxy and a program to kill it. F - An optional configuration file used to map local ports to the remote host and remote port that they should be proxy'd to. See L<"CONFIGURING CLIENTS AND SERVERS">, above, for details and examples. F, a sample hostmap file, is provided with the distribution. F - The SpamAssassin C file, which must be installed by hand. See L<"INSTALLING">, above. F - Default log file, created by Pop3proxy. See B<--logfile> under L<"OPTIONS">, above to change this. See L<"DIAGNOSTICS">, above, for notes on contents. =head1 CAVEATS You may need to tinker with the default settings to get Pop3proxy to work with your mail client, see L<"OPTIONS"> B<--nopad> and B<--allowtop> in particular. Mail may be lost while tinkering, so be careful if you're venturing into unknown territory. Consult L<"TESTED MAIL CLIENTS">, below, to see what kind of territory you're in. SpamAssassin is not perfect, but it's pretty darn good. Adjust your expectations accordingly. =head1 BUGS If using B<--nopad> and the mail does not contain a C header but it DOES contain the string C somewhere in the body, the spam marking is incorrectly placed in the body rather than in a C header. The C<$respect_byte_counts> regex will need to be fixed someday to detect end of headers to fix this. C won't always kill the proxy if it's running in a DOS box. =head1 RESTRICTIONS A client that tries to use the AUTH command while pipelining will cause Pop3proxy to die with a message. This is unlikely (and irrational) behavior that we just can't deal with. =head1 TESTED MAIL CLIENTS Client: |Netscape|Mozilla|Outlook|Express|Eudora|Pegasus Version: | 4.5 | 1.0 | 2002 | 6 |4.2.3 | 4.01 Platform: | Win98 | Win98 | WinXP | WinXP | NT4 | Win98 _____________________|________|_______|_______|_______|______|_______ Filter on: | | | | | | Subject header? | YES | YES | YES | YES |YES | YES Received header? | YES | YES | NO | NO |YES[2]| NO Require TOP command? | NO | NO | NO | NO | NO | NO Use Pipelining? | NO | NO | NO | NO | NO | NO Configure POP port? | NO | YES | YES | YES | NO | YES Multiple POP servers?| NO | YES[1]| YES[6]| YES[6]| NO | YES[5] Handles padded mail? | YES | YES | YES | YES |YES[3]| YES[4] [1] Cannot be same host, must use alias in HOSTS file [2] Technically, on [3] Eudora seems to use byte count to limit what messages get downloaded, if you want, but doesn't seem affected by having the count increased in flight. [4] Pegasus reports size based on actual size downloaded, suggesting that it's not relying on server supplied sizes. [5] Pegasus must have separate hosts names for each identity to use multiple accounts, must use alias in HOSTS as in [1] [6] Outlook/Express can be configured to use the same host on two ports without difficulty. =head1 SEE ALSO http://spamassassin.org/ SpamAssassin homepage http://www.activestate.com Providers of Perl on Win32 platforms http://www.openhandhome.com/howtosa.html How To Use SpamAssassin on Win32 http://nickdafish.com/SAPP.htm An earlier POP3 proxy for SpamAssassin. =head2 RFCs http://www.faqs.org/rfcs/rfc1939.html RFC 1939, POP3 Protocol http://www.faqs.org/rfcs/rfc2449.html RFC 2449, POP3 Extension Mechanism http://www.faqs.org/rfcs/rfc1734.html RFC 1734, POP3 AUTHentication command http://www.faqs.org/rfcs/rfc2222.html RFC 2222, Simple Authentication and Security Layer http://www.faqs.org/rfcs/rfc3206.html RFC 3206, The SYS and AUTH POP Response Codes =head1 AVAILABILITY This document can be found at http://mcd.perlmonk.org/pop3proxy/ The package distribution can be found at: http://mcd.perlmonk.org/pop3proxy/pop3proxy.zip =head1 CHANGES 10 August 2002 - 1.0 Initial release =head1 AUTHOR Dan McDonald CMcD@att.netE> Dedicated to Lisa, who put up with me hacking on this for many lonely nights. Thanks to: The SpamAssassin creators and contributors, All the helpful coders on PerlMonks and Usenet, Larry Wall for being an all around decent human. Special thanks to Marc and Alan for a bundle of help testing this beast, and to Klar for numerous (great|evil) ideas. =head1 COPYRIGHT Copyright (c) 2002, Dan McDonald. All Rights Reserved. This program is free software. It may be used, redistributed and/or modified under the terms of the Perl Artistic License (see http://www.perl.com/perl/misc/Artistic.html) =cut