Atozed Forums

Full Version: TIdSmtpServer not freeing threads
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Pages: 1 2
Hi,

I have an extremely simple SmtpServer set up to simply receive emails and write them to disk.  My problem is that over several hours the number of active threads just keeps growing, leading me to believe they are not being freed correctly.  I further have to conclude it is quite likely as a result of something I am doing or not doing that is causing this, however for the life of me I cant seem to identify the problem.

I have attached my sources, they do contain an external reference to a text file logging routine that is thread safe (file protected by critical section) that has been used without issue for many years.

In my current setup I am binding to 20 IP addresses and they all seem to receive and handle the incomming email as expected.  The load on the system is between 1 and 5 messages per minute.  after 24 hours there are well over 1,000 threads active, this would not be all the threads created (fewer than messages received) in that period.

By my speculation my error should be in my configuration of the component
    with WwolSmtpServ do
    begin
      ReuseSocket := rsTrue;
      AllowPipelining := True;
      ServerName := 'WWOL SMTP server';
      DefaultPort := 25;
      ListenQueue := 50;
      MaxConnections := 0;
      MaxMsgSize := 0;
      TerminateWaitTime := 500;
      UseNagle := True;
      UseTLS := utNoTLSSupport;
    end;

OR in my component exception handlers
procedure TdSmtpServ.WwolSmtpServException(AContext: TIdContext; AException: Exception);
begin
  try
    if ((AException is EIdConnClosedGracefully) or (AException is EIdNotConnected)) then
    begin
      Exit;
    end
    else
    begin

      if AContext.Connection.Connected then
      begin
        Log(Now, 0, 1, AContext.Binding.IP, AContext.Binding.PeerIP, AException.ClassName, AException.message + ' - ' + ' SmtpServException still connected');
        AContext.Connection.Disconnect;
      end
      else
        Log(Now, 0, 1, AContext.Binding.IP, AContext.Binding.PeerIP, AException.ClassName, AException.message + ' - ' + ' SmtpServException disconnected');
      // AContext.Connection.Disconnect(False);
    end;
  except
    on E: Exception do
    begin
      dmMain.TextLog('WwolSmtpServException : ' + E.Message);
      dmMain.TextLog('WwolSmtpServException : ' + AContext.Binding.IP + ' : ' + AContext.Binding.PeerIP + ' : ' + AException.ClassName + ' : ' + AException.message);
    end;
  end;
end;

procedure TdSmtpServ.WwolSmtpServListenException(AThread: TIdListenerThread; AException: Exception);
begin
  // Log(logModerate, 0, 'TdSmtpServ', 'WwolSmtpServListenException', '', '', AException.Message);
  Log(Now, 0, -3, AThread.Binding.IP, AThread.Binding.PeerIP, AException.ClassName, AException.message + ' - WwolSmtpServListenException');
  // AThread.Connection.Disconnect;
end;

OR in the component disconnect event
procedure TdSmtpServ.WwolSmtpServConnect(AContext: TIdContext);
begin
  Log(Now, 0, -4, AContext.Binding.IP, AContext.Binding.PeerIP, 'WwolSmtpServConnect',
    AContext.Binding.DisplayName + ' - ' + ' WwolSmtpServConnect');

end;

I am hoping I have missed something stupid and some fresh eyes might see the obvious,

Thanks for any assistance
-Allen
(10-24-2018, 05:04 PM)bluewwol Wrote: [ -> ]after 24 hours there are well over 1,000 threads active, this would not be all the threads created (fewer than messages received) in that period.

That typically happens when either:
  • your server command event handlers are catching and discarding Indy exceptions without letting the server handle them internally, so it doesn't know when clients are no longer valid so it can stop their threads.  At the very least, if you catch exceptions in your code, you should re-raise any EIdException-derived exception you catch (except in the OnException and OnListenException events, of course).
  • you encounter a deadlock within an event handler, causing the calling thread to not fully terminate.

(10-24-2018, 05:04 PM)bluewwol Wrote: [ -> ]By my speculation my error should be in my configuration of the component

No.  The config is fine.

(10-24-2018, 05:04 PM)bluewwol Wrote: [ -> ]OR in my component exception handlers

The handlers you have shown are fine in general.  Though, you don't need to call Disconnect() in the OnException event, the server will do that for you after the event handler exits.  And the PeerIP is not valid in the OnListenException event, as that event is for a listening socket, not a connected client socket.

(10-24-2018, 05:04 PM)bluewwol Wrote: [ -> ]OR in the component disconnect event

That handler is fine.

(10-24-2018, 05:04 PM)bluewwol Wrote: [ -> ]I am hoping I have missed something stupid and some fresh eyes might see the obvious,

Nothing obvious that I see. You are just going to have to debug your code to find out what is actually happening. For instance, are you getting an OnDisconnect event triggered for every client that connects? Under normal conditions, you should be.
Remy;

These seem most likely, I will review my code carefully and make certain I am not leaving anything hanging.

As always thanks for your kind assistance,

-Allen

(10-24-2018, 06:48 PM)rlebea u Wrote: [ -> ]That typically happens when either:
  • your server command event handlers are catching and discarding Indy exceptions without letting the server handle them internally, so it doesn't know when clients are no longer valid so it can stop their threads.  At the very least, if you catch exceptions in your code, you should re-raise any EIdException-derived exception you catch (except in the OnException and OnListenException events, of course).
  • you encounter a deadlock within an event handler, causing the calling thread to not fully terminate.
Remy;

The problem appears practical, not technical. The issue in reality is my server not keeping up with probing attempts from the outside. My initial steps have greatly reduced the impact of the problem.

The two changes I have made are to
procedure TdSmtpServ.WwolSmtpServMailFrom(ASender: TIdSMTPServerContext; const AAddress: string; AParams: TStrings; var VAction: TIdMailFromReply);
begin
if Trim(AAddress) = '' then
begin
VAction := mReject;
end
else
begin
VAction := mAccept;
end;
end;

and
TerminateWaitTime := 30;

These are simplistic and incomplete, however have largely eliminated the current problem. The issue really is not properly understanding how my app is interacting with the world.

Thanks again for the assistance.

-Allen
Remy;

I do still have some threads not being freed, but will focus some attention on those in the next few days.  

Threads associated with  VAction := mReject from OnMailFrom event and VAction := rInvalid from OnRcptTo event and threads where the client disconnected are currently freed after five minutes.  There are a few threads the remain active well beyond this that I have not yet determined a cause.  I had also assumed that the TerminateWaitTime would impact the time interval for these to be cleaned up, however it seems TerminateWaitTime has no impact on this.

In the mean time I have adjusted the OnMailFrom event to include a ASender.Connection.Disconnect which swams to be making a big improvement, however is this a foolish solution?  
Code:
procedure TdSmtpServ.WwolSmtpServMailFrom(ASender: TIdSMTPServerContext; const AAddress: string; AParams: TStrings; var VAction: TIdMailFromReply);
var
 s1                    : string;
begin
 if Trim(AAddress) = '' then
 begin
   VAction := mReject;
   s1 := 'mReject';
   TmySmtpItem(ASender.Data).LastEvent := 'MailFrom mReject';
   ASender.Connection.Disconnect;
 end
 else
 begin
   VAction := mAccept;
   s1 := 'mAccept';
   TmySmtpItem(ASender.Data).LastEvent := 'MailFrom mAccept';
 end;
end;

I have also added a procedure on a 30 minute timer that cleans up other threads (a heavy handed remedy of the symptom)

Code:
procedure TdSmtpServ.TerminateThreads;
var
 i                     : Integer;
 LContext              : TIdContext;
 LList                 : TIdContextList;
 TermTime              : TDateTime;
begin
 // try terminate extra threads...
 TermTime := Now - (1 / 48);                     // half an hour ago
 if WwolSmtpServ.Contexts <> nil then
 begin
   LList := WwolSmtpServ.Contexts.LockList;
   try
     // for i := 0 to LList.Count - 10 do
     for i := LList.Count - 1 downto 0 do
     begin
       LContext := TIdContext(LList.Items[i]);
       Assert(LContext <> nil);
       Assert(LContext.Connection <> nil);

       if (TmySmtpItem(LContext.Data).Start < TermTime) then
         LContext.Connection.Disconnect;
     end;
   finally
     WwolSmtpServ.Contexts.UnLockList;
   end;
 end;
end;


While maybe technically inept my app now frees threads immediately after a message is received, after the sender is determent bad, and after a recipient is determined to be bad.
After five minutes the threads Indy believes to be complete are cleaned up.
After thirty minutes I brute force clean up the remaining threads.


Thanks again for the help,
(10-26-2018, 12:36 AM)bluewwol Wrote: [ -> ]I do still have some threads not being freed

And what are those threads actually doing?  And do you get an OnDisconnect event fired for them?

(10-26-2018, 12:36 AM)bluewwol Wrote: [ -> ]Threads associated with  VAction := mReject from OnMailFrom event and VAction := rInvalid from OnRcptTo event and threads where the client disconnected are currently freed after five minutes.

If a client disconnects, its thread should be stopping itself immediately.  If it is not, then there is a serious problem with your server.

(10-26-2018, 12:36 AM)bluewwol Wrote: [ -> ]I had also assumed that the TerminateWaitTime would impact the time interval for these to be cleaned up, however it seems TerminateWaitTime has no impact on this.

No, it does not.  In fact, it has NO EFFECT WHATSOEVER in Indy 10 at all.  It is a legacy property that is NO LONGER IMPLEMENTED.

However, even in Indy 9 and earlier, where it USED to be implemented, it had NO EFFECT on running client threads during normal server usage.  The property applied ONLY when the server was being deactivated.  The property specifies the total amount of time the server waits for all active threads to terminate.  In Indy 8, if a timeout occurred, the deactivation would simply exit immediately.  In Indy 9, the deactivation would raise an EIdTerminateThreadTimeout exception instead.  In Indy 10, the property is ignored altogether, deactivation waits as long as it takes for all threads to terminate.

(10-26-2018, 12:36 AM)bluewwol Wrote: [ -> ]In the mean time I have adjusted the OnMailFrom event to include a ASender.Connection.Disconnect which swams to be making a big improvement, however is this a foolish solution?

Yes, it is foolish, not to mention a violation of the SMTP protocol.  Just because a client sends a MAIL FROM command with a blank address should not doom that client to an immediate disconnection.  In fact, MAIL FROM:<> is a valid command (see RFC 5321 Section 3.6.3 regarding "undeliverable mail" notifications).

Now, if a client sends multiple commands that repeatedly fail, that would be a different case to consider.

You might also consider using the TIdSMTPServer.OnSPFCheck event to make sure clients are coming from IPs that are even authorized to access your server in the first place.

(10-26-2018, 12:36 AM)bluewwol Wrote: [ -> ]I have also added a procedure on a 30 minute timer that cleans up other threads (a heavy handed remedy of the symptom)

Very heavy handed.  I would opt for a different solution that does not rely on a looping timer, or even accessing the server's Contexts list.

First, start with assigning a non-infinite ReadTimeout for each connected client, such as in the OnConnect event.  If a client connects and then doesn't send a command for awhile, an exception will be raised that will close the connection.  You might also go as far as using AContext.Connection.Socket.SetSockOpt() to set a non-infinite SO_SNDTIMEO writing timeout, too (Indy currently does not have any kind of WriteTimeout property), to handle cases where a client connects and does not read anything the server sends, causing a thread blockage if the client's inbound buffer fills up.

Beyond that, TIdSMTPServer is derived from TIdCmdTCPServer, which has virtual DoExecute() and ReadCommandLine() methods.  I would override one of those to handle the TmySmtpItem.StartStart check.  And I would not use a TDateTime for that check, since Now() is sensitive to local clock changes.  I would use Indy's Ticks64() and ElapsedTicks() functions instead, which do not care about the clock at all.
Remy;

I dropped a log intercept on the server and trapped the following:
Code:
0.0.0.0:0 Stat Disconnected.
aaa.85.217.53:46820 Stat Connected.
aaa.85.217.53:46820 Sent 10/26/2018 1:56:43 AM: 220 Welcome to WWOL Server<EOL>
aaa.85.217.53:46820 Recv 10/26/2018 1:56:44 AM: EHLO mail-vs1-f53.google.com<EOL>
aaa.85.217.53:46820 Sent 10/26/2018 1:56:44 AM: 250-Hello mail-vs1-f53.google.com<EOL>250-AUTH LOGIN<EOL>250-ENHANCEDSTATUSCODES<EOL>250-PIPELINING<EOL>250-SIZE 0<EOL>250-CHUNKING<EOL>250-8BITMIME<EOL>250 BINARYMIME<EOL>
aaa.85.217.53:46820 Recv 10/26/2018 1:56:44 AM: MAIL FROM:<addr@gmail.com> SIZE=3215<EOL>
aaa.85.217.53:46820 Sent 10/26/2018 1:56:44 AM: 250 2.1.0 addr@gmail.com Address Okay<EOL>
aaa.85.217.53:46820 Recv 10/26/2018 1:56:44 AM: RCPT TO:<info@testdomain.com><EOL>
aaa.85.217.53:46820 Sent 10/26/2018 1:56:44 AM: 250 2.1.5 info@testdomain.com Address Okay<EOL>


0.0.0.0:0 Stat Disconnected.
bbb.ccc.104.133:56531 Stat Connected.
bbb.ccc.104.133:56531 Sent 10/26/2018 2:01:19 AM: 220 Welcome to WWOL Server<EOL>
bbb.ccc.104.133:56531 Recv 10/26/2018 2:01:19 AM: EHLO smtp.domain.com<EOL>
bbb.ccc.104.133:56531 Sent 10/26/2018 2:01:19 AM: 250-Hello smtp.domain.com<EOL>250-AUTH LOGIN<EOL>250-ENHANCEDSTATUSCODES<EOL>250-PIPELINING<EOL>250-SIZE 0<EOL>250-CHUNKING<EOL>250-8BITMIME<EOL>250 BINARYMIME<EOL>
bbb.ccc.104.133:56531 Recv 10/26/2018 2:01:19 AM: MAIL From:<allen@domain.com> SIZE=1120<EOL>
bbb.ccc.104.133:56531 Sent 10/26/2018 2:01:19 AM: 250 2.1.0 allen@domain.com Address Okay<EOL>
bbb.ccc.104.133:56531 Recv 10/26/2018 2:01:19 AM: RCPT To:<info@testdomain.com><EOL>
bbb.ccc.104.133:56531 Sent 10/26/2018 2:01:19 AM: 250 2.1.5 info@testdomain.com Address Okay<EOL>
bbb.ccc.104.133:56531 Recv 10/26/2018 2:01:20 AM: DATA<EOL>
bbb.ccc.104.133:56531 Sent 10/26/2018 2:01:20 AM: 354 Start mail input; end with <CRLF>.<CRLF><EOL>
bbb.ccc.104.133:56531 Recv 10/26/2018 2:01:20 AM: DKIM-Signature: a=rsa-sha256; t=1540519286; x=1541124086; s=201610251434; d=domain.com; c=relaxed/relaxed; v=1; bh=rEllRzBne5UlxrjY79VFG07bCTHyrRDex8JGOe7OWfY=; h=From:Subject:Date:Message-ID:To:MIME-Version:Content-Type:Content-Transfer-Encoding;<EOL>   b=BLQX/jZgUw2FQq7eAIGTln132rtibPDHc29iALlggBd/CAH3w8Dxa8gN5BjULnyB0g0Qm5Qq+au8WxSc1pOUAg==<EOL>Return-Path: <allen@testdomain.com><EOL>Received: from [192.168.1.7] (c-76-110-182-244.hsd1.fl.comcast.net [76.110.182.244])<EOL>        by smtp.testdomain.com (testdomain.com) with ASMTP (SSL) id 201810260201250069<EOL>        for <info@testdomain.com>; Fri, 26 Oct 2018 02:01:25 -0000<EOL>To: info@testdomain.com<EOL>From: Allen <allen@testdomain.com><EOL>Subject: info@testdomain.com<EOL>Message-ID: <6ac22db3-8f00-27da-97a2-61ec8d8a444a@testdomain.com><EOL>Date: Thu, 25 Oct 2018 22:01:18 -0400<EOL>User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:60.0) Gecko/20100101<EOL> Thunderbird/60.0<EOL>MIME-Version: 1.0<EOL>Content-Type: text/plain; charset=utf-8; format=flowed<EOL>Content-Transfer-Encoding: 7bit<EOL>Content-Language: en-US<EOL><EOL>info@testdomain.com<EOL>
bbb.ccc.104.133:56531 Recv 10/26/2018 2:01:20 AM: <EOL>.<EOL>
bbb.ccc.104.133:56531 Sent 10/26/2018 2:01:20 AM: 250 Ok<EOL>
bbb.ccc.104.133:56531 Recv 10/26/2018 2:01:20 AM: QUIT<EOL>
bbb.ccc.104.133:56531 Sent 10/26/2018 2:01:20 AM: 221 Signing Off<EOL>
bbb.ccc.104.133:56531 Stat Disconnected.
0.0.0.0:0 Stat Disconnected.
What I observe here in that the threads that are hanging originate from gmail.  while from some others succeeds just fine.  The GMail thread sends the reply back to google that the address is ok, google (hotmail too) never respond after that.  they do not disconnect either.

I will dig into this further tomorrow.

-Allen
Remy,
I have opened a ticket with gmail and my network provider. Perhaps the issue lies with them. If there is anything else I might need to run down I am open to suggestion.
Thanks again for your diligent assistance,
-Allen
Reny;

Quick note, ran this piece of software on an unrelated network for a short time this morning and all behaved as expected. So the problems are somehow related to the network and not the software. The use case we have run into is unuseual but interesting. I will circle back to your prior comments and figure out a more elegant solution to cleaning up these "abandoned" threads.

Thanks for the help, and I will come back add in the resolution to this in case it can help someone else out in future.

-Allen
(10-26-2018, 02:29 AM)bluewwol Wrote: [ -> ]I dropped a log intercept on the server and trapped the following:
...
What I observe here in that the threads that are hanging originate from gmail.  while from some others succeeds just fine.  The GMail thread sends the reply back to google that the address is ok, google (hotmail too) never respond after that.  they do not disconnect either.

Setting a reasonable ReadTimeout for each client connection would handle that scenario. I seriously doubt that a high-profile email provider like GMail or Hotmail would exhibit behavior like this, though.

Did you verify the hung clients are REALLY who they claim to be?  You can perform a reverse DNS lookup on the client's IP address to get its actual hostname so you can check its domain name.

Optionally, also use the OnSPFCheck event to check if that hostname is authentic.  Just because a client claims to be Google or Hotmail does not mean it REALLY is.

For example:

Code:
type
 TmySmtpItem = class
   ...
   HostName: string;
 end;

procedure TdSmtpServ.WwolSmtpServConnect(AContext: TContext);
begin
 // FYI, I would suggest deriving a new class from TIdSMTPServerContext
 // and assign it to the TIdSMTPServer.ContextClass property, instead of
 // using the TIdContext.Data property...
 ASender.Data := TmySmtpItem.Create;
 ...
 with AContext.Binding do
   TmySmtpItem(ASender.Data).HostName := GStack.HostByAddress(PeerIP, IPVersion);
end;

procedure TdSmtpServ.WwolSmtpServSPFCheck(ASender: TIdSMTPServerContext;
 const AIP, ADomain, AIdentity: String; var VAction: TIdSPFReply);
begin
 // perform a DNS query, such as with TIdDNSResolver, on the client's hostname
 // to check for the existence of valid SPF records that validate the client really
 // does belong to the domain that it claims.  Read RFCs 4405-4408 for more details.
 //
 // AIP is the client's actual IP address.
 // ADomain is the domain to query.
 // AIdentity is the identity to check that domain for.
 // ASender.HeloString is the identity the client claims to be (AIdentity is this string for HELO/EHLO commands).
 // TmySmtpItem(ASender.Data).HostName is the client's actual hostname

 VAction := ...;
 // set to the result of the validation. spfNeutral is the default.
 // spfNone, spfNeutral, spfPass, and spfSoftFail are success actions.
 // spfFail, spfTempError, and spfPermErrorand are failure actions.
end;
Pages: 1 2