Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Disconnected TCPServer thread won't terminate
#1
I'm having an issue trying to terminate Indy threads from rogue connections on my game server's TCP port which I use for a WebSocket connection. In the OnConnect event I read headers required to establish a valid websocket and I have ReadTimeout set to 10000 so it doesn't get stuck there from bots that are just probing the port. If I get valid headers I assign an object to AContext.Data, otherwise I just Exit and disconnect them in the OnExecute event. I use madExcept's NameThread command along to way to track the progress of the connection. Here is my OnExecute event:

Code:
class procedure WSServerEvents.OnExecute(AContext: TIdContext);
var st: string;
begin
  Sleep(10);  // prevent CPU max
  try
    if Assigned(AContext.Data) = False then
    begin
      MadExcept.NameThread(GetCurrentThreadID, 'WebSocket Rogue (' + AContext.Binding.PeerIP + ')');
      AContext.Connection.Disconnect;  // rogue connection
    end else
    begin
      if TConnectRec(AContext.Data).isAdmin then AdminSessionExecute(AContext)
      else PlayerSessionExecute(AContext);
    end;
  except
    on E: Exception do
    begin
      if E is EIdException then raise else
      begin
        LogData.AddError('WSServerExecute error: ' + E.Message);
        st := StringReplace(madExcept.GetCrashStackTrace, #13#10, ' - ', [rfReplaceAll]);
        LogData.AddError('WSServerExecute stack trace: ' + st);
        raise;
      end;
    end;
  end;
end;

Notice I update the thread name right before disconnecting.  And they do disconnect because when I run TCPView on my server, no one is connected to me. But those rogue threads never terminate and they build up over time and never stop unless I take the game server offline. And I know they don't terminate because my app has a function that retrieves all the live threads assigned to my app. I then use madExcept's GetThreadName command to get the names I assigned to them. Here's what that looks like:

   

Most of those IPs are from Russia. Why are these threads not terminating after the disconnect?

By the way, this is my OnDisconnect event, which should just Exit immediately for these rogue connections:

Code:
class procedure WSServerEvents.OnDisconnect(AContext: TIdContext);
begin
  if Assigned(AContext.Data) = False then Exit;
  try
    if TConnectRec(AContext.Data).isAdmin then AdminSessionDisconnect(AContext)
    else PlayerSessionDisconnect(AContext);
  except
    on E: Exception do
    begin
      if E is EIdException then raise else
      begin
        LogData.AddError('WSServerDisconnect error: ' + E.Message);
        raise;
      end;
    end;
  end;
end;
Reply
#2
(04-17-2022, 02:37 AM)kbriggs Wrote: In the OnConnect event I read headers required to establish a valid websocket and I have ReadTimeout set to 10000 so it doesn't get stuck there from bots that are just probing the port. If I get valid headers I assign an object to AContext.Data, otherwise I just Exit and disconnect them in the OnExecute event.

Why are you waiting for the OnExecute event? You should disconnect them in the OnConnect event if the WebSocket handshake is not valid.

(04-17-2022, 02:37 AM)kbriggs Wrote: I use madExcept's NameThread command along to way to track the progress of the connection.

What is the point of doing that? First off, Indy threads are already named by Indy. But more importantly, having MadExcept name threads only makes sense if an error report is generated, and you shouldn't be getting an error just for disconnecting clients normally.

(04-17-2022, 02:37 AM)kbriggs Wrote: Notice I update the thread name right before disconnecting.  And they do disconnect because when I run TCPView on my server, no one is connected to me. But those rogue threads never terminate and they build up over time and never stop unless I take the game server offline.

Do you have thread pooling enabled on your server?

(04-17-2022, 02:37 AM)kbriggs Wrote: Most of those IPs are from Russia. Why are these threads not terminating after the disconnect?

There is no way to diagnose your problem with the limited information you have provided so far.

(04-17-2022, 02:37 AM)kbriggs Wrote: By the way, this is my OnDisconnect event, which should just Exit immediately for these rogue connections:

What do AdminSessionDisconnect() and PlayerSessionDisconnect() look like? If I had to guess, maybe they are causing deadlocks you are not diagnosing, which are then affecting the rogue threads from cleaning up correctly. For instance, if there is a thread that is holding a lock to the server's Contexts list, then other connection threads can't fully terminate when they try to remove themselves from that list after their OnDisconnect events exit.

Reply
#3
(04-18-2022, 04:13 PM)rlebeau Wrote: Why are you waiting for the OnExecute event?  You should disconnect them in the OnConnect event if the WebSocket handshake is not valid.

I originally performed the disconnects in OnConnect but I saw a post from you in StackOverFlow telling someone else to do it in OnExecute. Perhaps I misunderstood the context. Any in case, I get the same results either way.

Quote:What is the point of doing that? First off, Indy threads are already named by Indy.

Just for debugging purposes to see how far the thread progresses. And to tie them to IP addresses to see if they are rogue or from players on the game server.

Quote:But more importantly, having MadExcept name threads only makes sense if an error report is generated, and you shouldn't be getting an error just for disconnecting clients normally.

It's too late to name them after an error occurs. Like I mentioned, I have a report that generates a list of all running threads to help debug the very type of situation I'm dealing with now: zombie threads using up system resources.

Quote:Do you have thread pooling enabled on your server?

I don't know what that is so probably not unless it's on by default.

Quote:What do AdminSessionDisconnect() and PlayerSessionDisconnect() look like?

I've not had any issues with normal connections terminating. None of those leave zombie threads behind. But here are those OnDisconnect events:

Code:
procedure AdminSessionDisconnect(AContext: TIdContext);
var
  ID, IP, User: string;
  Admin: TAdminSession;
begin
  ID := TConnectRec(AContext.Data).ID;
  IP := '?';
  User := '?';
  Admin := TAdminSession(AdminSessionList.GetKeyObject(ID));
  if Admin <> Nil then
  begin
    IP := Admin.IP;
    User := Admin.UserName;
    if Admin.ReleaseAndFree then FreeAndNil(Admin);
  end;
  AdminSessionList.RemoveKey(ID);
  AdminSessionUpdate.QueueAdd('Del', ID);
  LogData.AddEvent('Admin', User + ' logged out session ' + ID + ' from IP ' + IP);
end;

procedure PlayerSessionDisconnect(AContext: TIdContext);
var
  PS: TPlayerSession;
  ID, p: string;
begin
  if gvar.Running = False then Exit;
  if Assigned(AContext.Data) then ID := TConnectRec(AContext.Data).ID else ID := '';
  if ID = '' then Exit;
  PS := TPlayerSession(PlayerSessionList.GetKeyObject(ID));
  if PlayerSessionList.OpenObject(PS) then // make sure session object not already destroyed
  begin
    PS.ReleaseAndFree;  // decrement AccessCount from GetKeyObject
    PS.PacketCS.Enter;
    try
      PS.Status := 'Discon';
      PS.Disconnected := Now;
      PS.Connection := Nil;
      SessionUpdate.QueueAdd('Ref', PS.ID);
      p := PS.Player;
      if p = '' then p := 'IP ' + PS.IP;
      LogData.AddEvent('Connect', p + ' disconnects session ' + PS.ID + ', PC ' + PS.PC);
    finally
      PS.PacketCS.Leave;
      if PS.ReleaseAndFree then FreeAndNil(PS);
    end;
  end;
end;

FYI, this was the StackOverFlow thread I was referring to:
https://stackoverflow.com/questions/4500...the-server

And if helps, here is my OnConnect event:

Code:
class procedure WSServerEvents.OnConnect(AContext: TIdContext);
var
  IO: TIdIOHandler;
  Key, ProxyHeader, Header, LCHeader, RawHost, Host, RawOrigin, Origin, IP, ProxyIP: string;
  Admin: boolean;
  c: integer;
begin
  try
    IP := AContext.Binding.PeerIP;
    MadExcept.NameThread(GetCurrentThreadID, 'WebSocket Connect (' + IP + ')');
    AContext.Connection.Intercept := TIdConnectionIntercept.Create(AContext.Connection);  // track total bytes in and out
    AContext.Connection.Intercept.OnReceive := InterceptEvents.OnReceive;
    AContext.Connection.Intercept.OnSend := InterceptEvents.OnSend;
    if (AContext.Connection.IOHandler is TIdSSLIOHandlerSocketBase) then
      TIdSSLIOHandlerSocketBase(AContext.Connection.IOHandler).PassThrough:= false;
    IO := AContext.Connection.IOHandler;
    IO.ReadTimeout := 10000;  // 10 second readln timeout
    ProxyIP := IP;
    Key := '';
    ProxyHeader := LowerCase(Trim(sysProxyIPHeader.sValue));
    RawHost := '';
    RawOrigin := '';
    Host := '';
    Origin := '';
    IP := '';
    Admin := False;
    c := 0;

    MadExcept.NameThread(GetCurrentThreadID, 'WebSocket Header Start (' + ProxyIP + ')');

    repeat // read headers until a blank line received
      Sleep(10);
      Header := IO.ReadLn();
      LCHeader := LowerCase(Header);
      if Pos('get /admin', LCHeader) = 1 then Admin := True else
      if Pos('host:', LCHeader) = 1 then
      begin
        RawHost := Trim(Copy(Header, 6, 255));
        Host := URLToHost('http://' + RawHost);
      end else
      if Pos('origin:', LCHeader) = 1 then
      begin
        RawOrigin := Trim(Copy(Header, 8, 255));
        Origin := URLToHost(RawOrigin);
      end else
      if Pos('sec-websocket-key:', LCHeader) = 1 then Key := Trim(Copy(Header, 19, 99))
      else if (ProxyHeader <> '') and (Pos(ProxyHeader + ':', LCHeader) = 1) then
        IP := Trim(Copy(Header, Length(ProxyHeader) + 2, 99));
      Inc(c);
    until (Header = '') or (c > 50);  // 50 header lines max

    MadExcept.NameThread(GetCurrentThreadID, 'WebSocket Header End (' + ProxyIP + ')');

    if Key = '' then Exit; // invalid websocket header, disconnect in OnExecute
    if not(Admin) and not(gvar.Running) then Exit;  // offline for players, disconnect in OnExecute

    MadExcept.NameThread(GetCurrentThreadID, 'WebSocket Key Ok (' + ProxyIP + ')');

    // send response headers and hashed key in accordance with RFC 6455
    IO.WriteLn('HTTP/1.1 101 Switching Protocols');
    IO.WriteLn('Upgrade: websocket');
    IO.WriteLn('Connection: Upgrade');
    IO.WriteLn('Sec-WebSocket-Accept: ' + HashKey(Key));
    IO.WriteLn('');
    if sysSameOriginPolicy.bValue and (Host <> Origin) then
    begin
      SendCloseFrame(AContext, 4000);
      Sleep(1000);  // enough time for writes to go out
      Exit;  // disconnect in OnExecute
    end;
    if IP = '' then
    begin
      IP := ProxyIP;
      ProxyIP := '';
    end;
    if Admin then AdminSessionConnect(AContext, IP, ProxyIP)
    else PlayerSessionConnect(AContext, IP, ProxyIP);
  except
    on E: Exception do
    begin
      if E is EIdException then raise else
      begin
        LogData.AddError('WSServerConnect error: ' + E.Message);
        raise;
      end;
    end;
  end;
end;
Reply
#4
(04-18-2022, 05:48 PM)kbriggs Wrote: I originally performed the disconnects in OnConnect but I saw a post from you in StackOverFlow telling someone else to do it in OnExecute. Perhaps I misunderstood the context.

That person was wanting to disconnect clients from other threads outside of the server, not from within the server threads that own each client. OnConnect, OnDisconnect, and OnExecute are all fired in the context of the same thread.

(04-18-2022, 05:48 PM)kbriggs Wrote:
Quote:Do you have thread pooling enabled on your server?

I don't know what that is so probably not unless it's on by default.

Pooling is not enabled by default. You would have to manually assign a TIdSchedulerOfThreadPool component to the server in order to use pooling. What it does is holds finished threads in memory so they can be re-used for new connections later.

(04-18-2022, 05:48 PM)kbriggs Wrote:
Quote:What do AdminSessionDisconnect() and PlayerSessionDisconnect() look like?

I've not had any issues with normal connections terminating. None of those leave zombie threads behind. But here are those OnDisconnect events:

And, what do AdminSessionList, AdminSessionUpdate, PlayerSessionList, SessionUpdate, and LogData look like? You are accessing a lot of shared data across thread boundaries, so I hope you are adequately and safely synchronizing access to your shared data.

(04-18-2022, 05:48 PM)kbriggs Wrote:
Code:
    if Key = '' then Exit; // invalid websocket header, disconnect in OnExecute
    if not(Admin) and not(gvar.Running) then Exit;  // offline for players, disconnect in OnExecute

That would be an ideal spot to disconnect, eg:

Code:
if (Key = '') or // invalid websocket header
  ((not Admin) and (not gvar.Running)) then  // offline for players
begin
  AContext.Connection.Disconnect;
  Exit;
end;

(04-18-2022, 05:48 PM)kbriggs Wrote:
Code:
if sysSameOriginPolicy.bValue and (Host <> Origin) then
    begin
      SendCloseFrame(AContext, 4000);
      Sleep(1000);  // enough time for writes to go out
      Exit;  // disconnect in OnExecute
    end;

Likewise:

Code:
if sysSameOriginPolicy.bValue and (Host <> Origin) then
begin
  SendCloseFrame(AContext, 4000);
  AContext.Connection.Disconnect;
  Exit;
end;

Reply
#5
(04-18-2022, 08:15 PM)rlebeau Wrote: And, what do AdminSessionList, AdminSessionUpdate, PlayerSessionList, SessionUpdate, and LogData look like?  You are accessing a lot of shared data across thread boundaries, so I hope you are adequately and safely synchronizing access to your shared data.

Yes, those lists are TObjects with properties protected by getters, setters, and functions wrapped in TMultiReadExclusiveWriteSynchronizer blocks. I'm pretty confident those are solid as they've been working for years.


Quote:That would be an ideal spot to disconnect, eg:

Code:
if (Key = '') or // invalid websocket header
  ((not Admin) and (not gvar.Running)) then  // offline for players
begin
  AContext.Connection.Disconnect;
  Exit;
end;

That's exactly how it was before when I discovered the un-terminated threads. I only moved the disconnects to OnExecute to see if it would make any difference but it did not.

What would happen if I tried to terminate the thread myself like this:

Code:
AContext.Connection.Disconnect;  // rogue connection
TIdYarnofThread(AContext.Yarn).Thread.Terminate;  // uses IdSchedulerofThread

I saw that in another StackOverFlow post.  I would of course prefer to figure out the real problem.
Reply
#6
(04-18-2022, 08:58 PM)kbriggs Wrote: Yes, those lists are TObjects with properties protected by getters, setters, and functions wrapped in TMultiReadExclusiveWriteSynchronizer blocks. I'm pretty confident those are solid as they've been working for years.

That doesn't rule out the possibility of a deadlock/mismanaged lock, though.

(04-18-2022, 08:58 PM)kbriggs Wrote: What would happen if I tried to terminate the thread myself like this:

You can try, but it shouldn't make a difference in this case. When the socket is disconnected, the TIdContext.OnRun event (which is called in a loop for the lifetime of the thread, and is responsible for firing the server's OnExecute event on each iteration) should return False, which will cause the thread to call Stop() on itself, and the default behavior of Stop() in a non-pooled server is to call Terminate().

(04-18-2022, 08:58 PM)kbriggs Wrote: I would of course prefer to figure out the real problem.

Then you are just going to have to debug into Indy's code for yourself to see what is really happening.

Reply
#7
(04-18-2022, 10:36 PM)rlebeau Wrote:
(04-18-2022, 08:58 PM)kbriggs Wrote: What would happen if I tried to terminate the thread myself like this:

You can try, but it shouldn't make a difference in this case.

It did make a difference. I changed that part of my OnExecute code to this:

Code:
    if Assigned(AContext.Data) = False then
    begin
      st := 'WebSocket Rogue (' + AContext.Binding.PeerIP + ')';
      MadExcept.NameThread(GetCurrentThreadID, st);
      st := st + ' ' + IntToStr(GetCurrentThreadID);
      LogData.Debug(st);
      AContext.Connection.Disconnect;  // rogue connection
      Sleep(1000);
      TIdYarnofThread(AContext.Yarn).Thread.Terminate;  // uses IdSchedulerofThread
    end else

I put that online and checked back 3 hours later. My log had 127 rogue entries recorded (most from Brazil this time) but not a single one of their threads was running. So problem solved, I guess?

I haven't updated my Indy install in quite a while. I have version 10.6.2.5469.  Have there been any changes since then might be relevant to this situation?
Reply
#8
(04-19-2022, 05:14 AM)kbriggs Wrote: I changed that part of my OnExecute code to this:

Why not OnConnect?

(04-19-2022, 05:14 AM)kbriggs Wrote: I put that online and checked back 3 hours later. My log had 127 rogue entries recorded (most from Brazil this time) but not a single one of their threads was running. So problem solved, I guess?

No, not even close. Under normal conditions, this kind of workaround is never needed. Something in your project is preventing the threads from stopping themselves properly. But calling Terminate() directly is just a signal, the threads still have to actively check that signal and actually exit themselves, which implies the threads are not being blocked. So, I have no idea what is going on here.

Did you check if the OnExecute event is still firing after calling Disconnect()? It shouldn't be, and if it is not then the server should be signalling the thread to stop between iterations.

Something else to try - see if calling AContext.Binding.CloseSocket() instead of AContext.Connection.Disconnect() makes any difference. CloseSocket() just closes the underlying socket at the system level, but leaves everything else in Indy alone. Whereas Disconnect() closes down the entire IOHandler, which will also close the socket during its cleanup.

(04-19-2022, 05:14 AM)kbriggs Wrote: I haven't updated my Indy install in quite a while. I have version 10.6.2.5469.  Have there been any changes since then might be relevant to this situation?

Possibly. For instance, there was a checkin made 7 months ago to have the server use AContext.Binding.CloseSocket() instead of AContext.Connection.Disconnect() when cleaning up threads.

Reply
#9
(04-19-2022, 03:36 PM)rlebeau Wrote: Why not OnConnect?

I'm trying one change at a time. I'll move it back to there before I'm done.

Quote:Did you check if the OnExecute event is still firing after calling Disconnect()?

Not since moving the disconnect to OnExecute. When I move it back I'll add a test for that.

Quote:Something else to try - see if calling AContext.Binding.CloseSocket() instead of AContext.Connection.Disconnect() makes any difference.

I'll try that next.
Reply
#10
Nope, it didn't work. I moved the disconnect to OnConnect like this:

Code:
    if Key = '' then // invalid websocket header
    begin
      MadExcept.NameThread(GetCurrentThreadID, 'WebSocket No Key (' + IP + ')');
      LogData.Debug('WebSocket No Key (' + IP + ') ' + IntToStr(GetCurrentThreadID));
      AContext.Binding.CloseSocket;  // alternative to AContext.Connection.Disconnect
      Exit;
    end;

The thread stayed alive and just kept calling OnExecute every 15 msec because I commented out the terminate:

Code:
class procedure WSServerEvents.OnExecute(AContext: TIdContext);
var st: string;
begin
  Sleep(10);  // prevent CPU max
  try
    if Assigned(AContext.Data) = False then  // rogue connection
    begin
      st := 'WebSocket Rogue (' + AContext.Binding.PeerIP + ')';
      MadExcept.NameThread(GetCurrentThreadID, st);
      st := st + ' ' + IntToStr(GetCurrentThreadID);
      LogData.AddError(st);
      AContext.Binding.CloseSocket;
//      Sleep(1000);
//      TIdYarnofThread(AContext.Yarn).Thread.Terminate;  // uses IdSchedulerofThread
    end else

One rogue connection flooded my log with over 100K entries before I shut it down.
Reply


Forum Jump:


Users browsing this thread: 1 Guest(s)