Posts: 21
Threads: 5
Joined: Aug 2018
Reputation:
0
Location: Texas
04-17-2022, 02:37 AM
(This post was last modified: 04-17-2022, 02:49 AM by kbriggs.)
I'm having an issue trying to terminate Indy threads from rogue connections on my game server's TCP port which I use for a WebSocket connection. In the OnConnect event I read headers required to establish a valid websocket and I have ReadTimeout set to 10000 so it doesn't get stuck there from bots that are just probing the port. If I get valid headers I assign an object to AContext.Data, otherwise I just Exit and disconnect them in the OnExecute event. I use madExcept's NameThread command along to way to track the progress of the connection. Here is my OnExecute event:
Code: class procedure WSServerEvents.OnExecute(AContext: TIdContext);
var st: string;
begin
Sleep(10); // prevent CPU max
try
if Assigned(AContext.Data) = False then
begin
MadExcept.NameThread(GetCurrentThreadID, 'WebSocket Rogue (' + AContext.Binding.PeerIP + ')');
AContext.Connection.Disconnect; // rogue connection
end else
begin
if TConnectRec(AContext.Data).isAdmin then AdminSessionExecute(AContext)
else PlayerSessionExecute(AContext);
end;
except
on E: Exception do
begin
if E is EIdException then raise else
begin
LogData.AddError('WSServerExecute error: ' + E.Message);
st := StringReplace(madExcept.GetCrashStackTrace, #13#10, ' - ', [rfReplaceAll]);
LogData.AddError('WSServerExecute stack trace: ' + st);
raise;
end;
end;
end;
end;
Notice I update the thread name right before disconnecting. And they do disconnect because when I run TCPView on my server, no one is connected to me. But those rogue threads never terminate and they build up over time and never stop unless I take the game server offline. And I know they don't terminate because my app has a function that retrieves all the live threads assigned to my app. I then use madExcept's GetThreadName command to get the names I assigned to them. Here's what that looks like:
Most of those IPs are from Russia. Why are these threads not terminating after the disconnect?
By the way, this is my OnDisconnect event, which should just Exit immediately for these rogue connections:
Code: class procedure WSServerEvents.OnDisconnect(AContext: TIdContext);
begin
if Assigned(AContext.Data) = False then Exit;
try
if TConnectRec(AContext.Data).isAdmin then AdminSessionDisconnect(AContext)
else PlayerSessionDisconnect(AContext);
except
on E: Exception do
begin
if E is EIdException then raise else
begin
LogData.AddError('WSServerDisconnect error: ' + E.Message);
raise;
end;
end;
end;
end;
Posts: 652
Threads: 2
Joined: Mar 2018
Reputation:
35
Location: California, USA
(04-17-2022, 02:37 AM)kbriggs Wrote: In the OnConnect event I read headers required to establish a valid websocket and I have ReadTimeout set to 10000 so it doesn't get stuck there from bots that are just probing the port. If I get valid headers I assign an object to AContext.Data, otherwise I just Exit and disconnect them in the OnExecute event.
Why are you waiting for the OnExecute event? You should disconnect them in the OnConnect event if the WebSocket handshake is not valid.
(04-17-2022, 02:37 AM)kbriggs Wrote: I use madExcept's NameThread command along to way to track the progress of the connection.
What is the point of doing that? First off, Indy threads are already named by Indy. But more importantly, having MadExcept name threads only makes sense if an error report is generated, and you shouldn't be getting an error just for disconnecting clients normally.
(04-17-2022, 02:37 AM)kbriggs Wrote: Notice I update the thread name right before disconnecting. And they do disconnect because when I run TCPView on my server, no one is connected to me. But those rogue threads never terminate and they build up over time and never stop unless I take the game server offline.
Do you have thread pooling enabled on your server?
(04-17-2022, 02:37 AM)kbriggs Wrote: Most of those IPs are from Russia. Why are these threads not terminating after the disconnect?
There is no way to diagnose your problem with the limited information you have provided so far.
(04-17-2022, 02:37 AM)kbriggs Wrote: By the way, this is my OnDisconnect event, which should just Exit immediately for these rogue connections:
What do AdminSessionDisconnect() and PlayerSessionDisconnect() look like? If I had to guess, maybe they are causing deadlocks you are not diagnosing, which are then affecting the rogue threads from cleaning up correctly. For instance, if there is a thread that is holding a lock to the server's Contexts list, then other connection threads can't fully terminate when they try to remove themselves from that list after their OnDisconnect events exit.
Posts: 21
Threads: 5
Joined: Aug 2018
Reputation:
0
Location: Texas
04-18-2022, 05:48 PM
(This post was last modified: 04-18-2022, 06:46 PM by kbriggs.)
(04-18-2022, 04:13 PM)rlebeau Wrote: Why are you waiting for the OnExecute event? You should disconnect them in the OnConnect event if the WebSocket handshake is not valid.
I originally performed the disconnects in OnConnect but I saw a post from you in StackOverFlow telling someone else to do it in OnExecute. Perhaps I misunderstood the context. Any in case, I get the same results either way.
Quote:What is the point of doing that? First off, Indy threads are already named by Indy.
Just for debugging purposes to see how far the thread progresses. And to tie them to IP addresses to see if they are rogue or from players on the game server.
Quote:But more importantly, having MadExcept name threads only makes sense if an error report is generated, and you shouldn't be getting an error just for disconnecting clients normally.
It's too late to name them after an error occurs. Like I mentioned, I have a report that generates a list of all running threads to help debug the very type of situation I'm dealing with now: zombie threads using up system resources.
Quote:Do you have thread pooling enabled on your server?
I don't know what that is so probably not unless it's on by default.
Quote:What do AdminSessionDisconnect() and PlayerSessionDisconnect() look like?
I've not had any issues with normal connections terminating. None of those leave zombie threads behind. But here are those OnDisconnect events:
Code: procedure AdminSessionDisconnect(AContext: TIdContext);
var
ID, IP, User: string;
Admin: TAdminSession;
begin
ID := TConnectRec(AContext.Data).ID;
IP := '?';
User := '?';
Admin := TAdminSession(AdminSessionList.GetKeyObject(ID));
if Admin <> Nil then
begin
IP := Admin.IP;
User := Admin.UserName;
if Admin.ReleaseAndFree then FreeAndNil(Admin);
end;
AdminSessionList.RemoveKey(ID);
AdminSessionUpdate.QueueAdd('Del', ID);
LogData.AddEvent('Admin', User + ' logged out session ' + ID + ' from IP ' + IP);
end;
procedure PlayerSessionDisconnect(AContext: TIdContext);
var
PS: TPlayerSession;
ID, p: string;
begin
if gvar.Running = False then Exit;
if Assigned(AContext.Data) then ID := TConnectRec(AContext.Data).ID else ID := '';
if ID = '' then Exit;
PS := TPlayerSession(PlayerSessionList.GetKeyObject(ID));
if PlayerSessionList.OpenObject(PS) then // make sure session object not already destroyed
begin
PS.ReleaseAndFree; // decrement AccessCount from GetKeyObject
PS.PacketCS.Enter;
try
PS.Status := 'Discon';
PS.Disconnected := Now;
PS.Connection := Nil;
SessionUpdate.QueueAdd('Ref', PS.ID);
p := PS.Player;
if p = '' then p := 'IP ' + PS.IP;
LogData.AddEvent('Connect', p + ' disconnects session ' + PS.ID + ', PC ' + PS.PC);
finally
PS.PacketCS.Leave;
if PS.ReleaseAndFree then FreeAndNil(PS);
end;
end;
end;
FYI, this was the StackOverFlow thread I was referring to:
https://stackoverflow.com/questions/4500...the-server
And if helps, here is my OnConnect event:
Code: class procedure WSServerEvents.OnConnect(AContext: TIdContext);
var
IO: TIdIOHandler;
Key, ProxyHeader, Header, LCHeader, RawHost, Host, RawOrigin, Origin, IP, ProxyIP: string;
Admin: boolean;
c: integer;
begin
try
IP := AContext.Binding.PeerIP;
MadExcept.NameThread(GetCurrentThreadID, 'WebSocket Connect (' + IP + ')');
AContext.Connection.Intercept := TIdConnectionIntercept.Create(AContext.Connection); // track total bytes in and out
AContext.Connection.Intercept.OnReceive := InterceptEvents.OnReceive;
AContext.Connection.Intercept.OnSend := InterceptEvents.OnSend;
if (AContext.Connection.IOHandler is TIdSSLIOHandlerSocketBase) then
TIdSSLIOHandlerSocketBase(AContext.Connection.IOHandler).PassThrough:= false;
IO := AContext.Connection.IOHandler;
IO.ReadTimeout := 10000; // 10 second readln timeout
ProxyIP := IP;
Key := '';
ProxyHeader := LowerCase(Trim(sysProxyIPHeader.sValue));
RawHost := '';
RawOrigin := '';
Host := '';
Origin := '';
IP := '';
Admin := False;
c := 0;
MadExcept.NameThread(GetCurrentThreadID, 'WebSocket Header Start (' + ProxyIP + ')');
repeat // read headers until a blank line received
Sleep(10);
Header := IO.ReadLn();
LCHeader := LowerCase(Header);
if Pos('get /admin', LCHeader) = 1 then Admin := True else
if Pos('host:', LCHeader) = 1 then
begin
RawHost := Trim(Copy(Header, 6, 255));
Host := URLToHost('http://' + RawHost);
end else
if Pos('origin:', LCHeader) = 1 then
begin
RawOrigin := Trim(Copy(Header, 8, 255));
Origin := URLToHost(RawOrigin);
end else
if Pos('sec-websocket-key:', LCHeader) = 1 then Key := Trim(Copy(Header, 19, 99))
else if (ProxyHeader <> '') and (Pos(ProxyHeader + ':', LCHeader) = 1) then
IP := Trim(Copy(Header, Length(ProxyHeader) + 2, 99));
Inc(c);
until (Header = '') or (c > 50); // 50 header lines max
MadExcept.NameThread(GetCurrentThreadID, 'WebSocket Header End (' + ProxyIP + ')');
if Key = '' then Exit; // invalid websocket header, disconnect in OnExecute
if not(Admin) and not(gvar.Running) then Exit; // offline for players, disconnect in OnExecute
MadExcept.NameThread(GetCurrentThreadID, 'WebSocket Key Ok (' + ProxyIP + ')');
// send response headers and hashed key in accordance with RFC 6455
IO.WriteLn('HTTP/1.1 101 Switching Protocols');
IO.WriteLn('Upgrade: websocket');
IO.WriteLn('Connection: Upgrade');
IO.WriteLn('Sec-WebSocket-Accept: ' + HashKey(Key));
IO.WriteLn('');
if sysSameOriginPolicy.bValue and (Host <> Origin) then
begin
SendCloseFrame(AContext, 4000);
Sleep(1000); // enough time for writes to go out
Exit; // disconnect in OnExecute
end;
if IP = '' then
begin
IP := ProxyIP;
ProxyIP := '';
end;
if Admin then AdminSessionConnect(AContext, IP, ProxyIP)
else PlayerSessionConnect(AContext, IP, ProxyIP);
except
on E: Exception do
begin
if E is EIdException then raise else
begin
LogData.AddError('WSServerConnect error: ' + E.Message);
raise;
end;
end;
end;
end;
Posts: 652
Threads: 2
Joined: Mar 2018
Reputation:
35
Location: California, USA
04-18-2022, 08:15 PM
(This post was last modified: 04-18-2022, 08:15 PM by rlebeau.)
(04-18-2022, 05:48 PM)kbriggs Wrote: I originally performed the disconnects in OnConnect but I saw a post from you in StackOverFlow telling someone else to do it in OnExecute. Perhaps I misunderstood the context.
That person was wanting to disconnect clients from other threads outside of the server, not from within the server threads that own each client. OnConnect, OnDisconnect, and OnExecute are all fired in the context of the same thread.
(04-18-2022, 05:48 PM)kbriggs Wrote: Quote:Do you have thread pooling enabled on your server?
I don't know what that is so probably not unless it's on by default.
Pooling is not enabled by default. You would have to manually assign a TIdSchedulerOfThreadPool component to the server in order to use pooling. What it does is holds finished threads in memory so they can be re-used for new connections later.
(04-18-2022, 05:48 PM)kbriggs Wrote: Quote:What do AdminSessionDisconnect() and PlayerSessionDisconnect() look like?
I've not had any issues with normal connections terminating. None of those leave zombie threads behind. But here are those OnDisconnect events:
And, what do AdminSessionList, AdminSessionUpdate, PlayerSessionList, SessionUpdate, and LogData look like? You are accessing a lot of shared data across thread boundaries, so I hope you are adequately and safely synchronizing access to your shared data.
(04-18-2022, 05:48 PM)kbriggs Wrote: Code: if Key = '' then Exit; // invalid websocket header, disconnect in OnExecute
if not(Admin) and not(gvar.Running) then Exit; // offline for players, disconnect in OnExecute
That would be an ideal spot to disconnect, eg:
Code: if (Key = '') or // invalid websocket header
((not Admin) and (not gvar.Running)) then // offline for players
begin
AContext.Connection.Disconnect;
Exit;
end;
(04-18-2022, 05:48 PM)kbriggs Wrote: Code: if sysSameOriginPolicy.bValue and (Host <> Origin) then
begin
SendCloseFrame(AContext, 4000);
Sleep(1000); // enough time for writes to go out
Exit; // disconnect in OnExecute
end;
Likewise:
Code: if sysSameOriginPolicy.bValue and (Host <> Origin) then
begin
SendCloseFrame(AContext, 4000);
AContext.Connection.Disconnect;
Exit;
end;
Posts: 21
Threads: 5
Joined: Aug 2018
Reputation:
0
Location: Texas
(04-18-2022, 08:15 PM)rlebeau Wrote: And, what do AdminSessionList, AdminSessionUpdate, PlayerSessionList, SessionUpdate, and LogData look like? You are accessing a lot of shared data across thread boundaries, so I hope you are adequately and safely synchronizing access to your shared data.
Yes, those lists are TObjects with properties protected by getters, setters, and functions wrapped in TMultiReadExclusiveWriteSynchronizer blocks. I'm pretty confident those are solid as they've been working for years.
Quote:That would be an ideal spot to disconnect, eg:
Code: if (Key = '') or // invalid websocket header
((not Admin) and (not gvar.Running)) then // offline for players
begin
AContext.Connection.Disconnect;
Exit;
end;
That's exactly how it was before when I discovered the un-terminated threads. I only moved the disconnects to OnExecute to see if it would make any difference but it did not.
What would happen if I tried to terminate the thread myself like this:
Code: AContext.Connection.Disconnect; // rogue connection
TIdYarnofThread(AContext.Yarn).Thread.Terminate; // uses IdSchedulerofThread
I saw that in another StackOverFlow post. I would of course prefer to figure out the real problem.
Posts: 652
Threads: 2
Joined: Mar 2018
Reputation:
35
Location: California, USA
(04-18-2022, 08:58 PM)kbriggs Wrote: Yes, those lists are TObjects with properties protected by getters, setters, and functions wrapped in TMultiReadExclusiveWriteSynchronizer blocks. I'm pretty confident those are solid as they've been working for years.
That doesn't rule out the possibility of a deadlock/mismanaged lock, though.
(04-18-2022, 08:58 PM)kbriggs Wrote: What would happen if I tried to terminate the thread myself like this:
You can try, but it shouldn't make a difference in this case. When the socket is disconnected, the TIdContext.OnRun event (which is called in a loop for the lifetime of the thread, and is responsible for firing the server's OnExecute event on each iteration) should return False, which will cause the thread to call Stop() on itself, and the default behavior of Stop() in a non-pooled server is to call Terminate().
(04-18-2022, 08:58 PM)kbriggs Wrote: I would of course prefer to figure out the real problem.
Then you are just going to have to debug into Indy's code for yourself to see what is really happening.
Posts: 21
Threads: 5
Joined: Aug 2018
Reputation:
0
Location: Texas
(04-18-2022, 10:36 PM)rlebeau Wrote: (04-18-2022, 08:58 PM)kbriggs Wrote: What would happen if I tried to terminate the thread myself like this:
You can try, but it shouldn't make a difference in this case.
It did make a difference. I changed that part of my OnExecute code to this:
Code: if Assigned(AContext.Data) = False then
begin
st := 'WebSocket Rogue (' + AContext.Binding.PeerIP + ')';
MadExcept.NameThread(GetCurrentThreadID, st);
st := st + ' ' + IntToStr(GetCurrentThreadID);
LogData.Debug(st);
AContext.Connection.Disconnect; // rogue connection
Sleep(1000);
TIdYarnofThread(AContext.Yarn).Thread.Terminate; // uses IdSchedulerofThread
end else
I put that online and checked back 3 hours later. My log had 127 rogue entries recorded (most from Brazil this time) but not a single one of their threads was running. So problem solved, I guess?
I haven't updated my Indy install in quite a while. I have version 10.6.2.5469. Have there been any changes since then might be relevant to this situation?
Posts: 652
Threads: 2
Joined: Mar 2018
Reputation:
35
Location: California, USA
(04-19-2022, 05:14 AM)kbriggs Wrote: I changed that part of my OnExecute code to this:
Why not OnConnect?
(04-19-2022, 05:14 AM)kbriggs Wrote: I put that online and checked back 3 hours later. My log had 127 rogue entries recorded (most from Brazil this time) but not a single one of their threads was running. So problem solved, I guess?
No, not even close. Under normal conditions, this kind of workaround is never needed. Something in your project is preventing the threads from stopping themselves properly. But calling Terminate() directly is just a signal, the threads still have to actively check that signal and actually exit themselves, which implies the threads are not being blocked. So, I have no idea what is going on here.
Did you check if the OnExecute event is still firing after calling Disconnect()? It shouldn't be, and if it is not then the server should be signalling the thread to stop between iterations.
Something else to try - see if calling AContext.Binding.CloseSocket() instead of AContext.Connection.Disconnect() makes any difference. CloseSocket() just closes the underlying socket at the system level, but leaves everything else in Indy alone. Whereas Disconnect() closes down the entire IOHandler, which will also close the socket during its cleanup.
(04-19-2022, 05:14 AM)kbriggs Wrote: I haven't updated my Indy install in quite a while. I have version 10.6.2.5469. Have there been any changes since then might be relevant to this situation?
Possibly. For instance, there was a checkin made 7 months ago to have the server use AContext.Binding.CloseSocket() instead of AContext.Connection.Disconnect() when cleaning up threads.
Posts: 21
Threads: 5
Joined: Aug 2018
Reputation:
0
Location: Texas
(04-19-2022, 03:36 PM)rlebeau Wrote: Why not OnConnect?
I'm trying one change at a time. I'll move it back to there before I'm done.
Quote:Did you check if the OnExecute event is still firing after calling Disconnect()?
Not since moving the disconnect to OnExecute. When I move it back I'll add a test for that.
Quote:Something else to try - see if calling AContext.Binding.CloseSocket() instead of AContext.Connection.Disconnect() makes any difference.
I'll try that next.
Posts: 21
Threads: 5
Joined: Aug 2018
Reputation:
0
Location: Texas
Nope, it didn't work. I moved the disconnect to OnConnect like this:
Code: if Key = '' then // invalid websocket header
begin
MadExcept.NameThread(GetCurrentThreadID, 'WebSocket No Key (' + IP + ')');
LogData.Debug('WebSocket No Key (' + IP + ') ' + IntToStr(GetCurrentThreadID));
AContext.Binding.CloseSocket; // alternative to AContext.Connection.Disconnect
Exit;
end;
The thread stayed alive and just kept calling OnExecute every 15 msec because I commented out the terminate:
Code: class procedure WSServerEvents.OnExecute(AContext: TIdContext);
var st: string;
begin
Sleep(10); // prevent CPU max
try
if Assigned(AContext.Data) = False then // rogue connection
begin
st := 'WebSocket Rogue (' + AContext.Binding.PeerIP + ')';
MadExcept.NameThread(GetCurrentThreadID, st);
st := st + ' ' + IntToStr(GetCurrentThreadID);
LogData.AddError(st);
AContext.Binding.CloseSocket;
// Sleep(1000);
// TIdYarnofThread(AContext.Yarn).Thread.Terminate; // uses IdSchedulerofThread
end else
One rogue connection flooded my log with over 100K entries before I shut it down.
|