Atozed Forums

Full Version: TCPServer crash on shutdown
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Pages: 1 2
Since the Embarcadero forums seem to be unusable now, thought I try here first.  I've Googled around and seen this issued raised before but nothing I've found so far has helped.

After years of no issues, I'm suddenly having a problem with shutting down a TIdTCPServer component. Setting Active := False is occasionally locking up my app. So much so that the process cannot be killed in either the Windows Task Manager or Process Explorer. One single thread (identified as TMethodImplementationIntercept in Process Explorer) remains and a Windows reboot is the only way out. I upgraded to a more recent Indy version (10.6.2.5469) but the problem persists. It happens maybe 10% of the time even when I perform the same steps over and over.

By inserting markers in the code to record to an error log (which runs in a critical section) I've traced the problem this deep, where 2 listening threads are supposed to terminate:

Active := False
SetActive
Shutdown
StopListing
LListener.Binding.CloseSocket  (if crashes, always does so on the first of two loops)
Disconnect
GStack.Disconnect(Handle)

That last one never returns. It's a virtual abstract method so I'm not sure where to go from there.

My app is a game server with an HTTPServer (that servers an HTML5 web app client) and a TCPServer and acts as websocket connection for the client. Both are started and stopped together from a button click in the app's main thread. The HTTPServer never crashes on shutdown, only the TCPServer does. I've tried using a worker thread to shut down the TCPServer. That allows me to exit the app when the problem occurs but the EXE is still running on the Processes tab and cannot be killed. I'm using the OnConnect, OnDisconnect, and OnExecute events and all are contained in try/except blocks that reraise Indy exceptions like this:

try
 // stuff here
except
  on E: Exception do
  begin
    if not(E is EIdException) then LogData.AddError('WSServerConnect error: ' + E.Message);
    raise;
  end;
end;

So what else can I check to track this down?
(08-05-2018, 07:50 PM)kbriggs Wrote: [ -> ]After years of no issues, I'm suddenly having a problem with shutting down a TIdTCPServer component. Setting Active := False is occasionally locking up my app.

Then you are probably using it wrong, like synching worker threads when you should not be.  For instance, syncing synchronously with the main UI thread while it is blocked trying to shutdown the server.  To avoid that, use asynchronous syncs, or shutdown the server in a separate thread rather than in the UI thread.

(08-05-2018, 07:50 PM)kbriggs Wrote: [ -> ]So much so that the process cannot be killed in either the Windows Task Manager or Process Explorer.

Processes can ALWAYS be killed via the Task Manager, especially from the Processes tab rather than then Applications tab.

(08-05-2018, 07:50 PM)kbriggs Wrote: [ -> ]I upgraded to a more recent Indy version (10.6.2.5469) but the problem persists. It happens maybe 10% of the time even when I perform the same steps over and over.

Hard to diagnose without seeing your actual code (or a demo that reproduces the same issue).

(08-05-2018, 07:50 PM)kbriggs Wrote: [ -> ]By inserting markers in the code to record to an error log (which runs in a critical section) I've traced the problem this deep, where 2 listening threads are supposed to terminate:

Active := False
SetActive
Shutdown
StopListing
LListener.Binding.CloseSocket  (if crashes, always does so on the first of two loops)
Disconnect
GStack.Disconnect(Handle)

That last one never returns. It's a virtual abstract method so I'm not sure where to go from there.

On Windows, it is overridden by the TIdStackWindows.Disconnect() method, which simply calls the Winsock API shutdown() and closesocket() functions:

Code:
function TIdStackWindows.WSShutdown(ASocket: TIdStackSocketHandle; AHow: Integer): Integer;
begin
 Result := Shutdown(ASocket, AHow); // <-- Winsock API call
end;

function TIdStackWindows.WSCloseSocket(ASocket: TIdStackSocketHandle): Integer;
begin
 Result := CloseSocket(ASocket); // <-- Winsock API call
end;

procedure TIdStackWindows.Disconnect(ASocket: TIdStackSocketHandle);
begin
 // Windows uses Id_SD_Send, Linux should use Id_SD_Both
 WSShutdown(ASocket, Id_SD_Send);
 // SO_LINGER is false - socket may take a little while to actually close after this
 WSCloseSocket(ASocket);
end;

Neither of those API functions should be hanging by default, though closesocket() CAN hang if you have configured the socket beforehand with a non-zero linger timeout (which Indy doesn't do, but users can do manually) and pending data is not being sent to the peer correctly.

(08-05-2018, 07:50 PM)kbriggs Wrote: [ -> ]My app is a game server with an HTTPServer (that servers an HTML5 web app client) and a TCPServer and acts as websocket connection for the client. Both are started and stopped together from a button click in the app's main thread. The HTTPServer never crashes on shutdown, only the TCPServer does.

TIdHTTPServer is a TIdTCPServer descendant.  So that begs the question - what is your WebSocket server doing differently that your HTTP server is not doing?

Also, why not use your TIdHTTPServer for your WebSocket server (since WebSocket starts life as HTTP)?

(08-05-2018, 07:50 PM)kbriggs Wrote: [ -> ]I've tried using a worker thread to shut down the TCPServer. That allows me to exit the app when the problem occurs but the EXE is still running on the Processes tab and cannot be killed. I'm using the OnConnect, OnDisconnect, and OnExecute events and all are contained in try/except blocks that reraise Indy exceptions

That means nothing if threads are deadlocking themselves and not cleaning up correctly.

Also, you should remove the try/except blocks and use the TIdTCPServer.OnException event instead for your logging.
rlebeau >> Then you are probably using it wrong, like synching worker threads when you should not be.  For instance, syncing synchronously with the main UI thread while it is blocked trying to shutdown the server.  To avoid that, use asynchronous syncs, or shutdown the server in a separate thread rather than in the UI thread.

All my VCL updates are done from the main thread, triggered via PostMessage. My Indy threads never touch UI components.

rlebeau >> Processes can ALWAYS be killed via the Task Manager, especially from the Processes tab rather than then Applications tab.

Not in this case. Clicking the "End Process" button on the Processes tab will knock the thread count from 11 or so down to 1 but that last thread will not die. Clicking End Process over and over again does nothing. Finding the thread in Process Explorer and trying to kill it there also does nothing. Rebooting is the only solution.

rlebeau >> Hard to diagnose without seeing your actual code (or a demo that reproduces the same issue).

At 63K lines across 66 source files, that's probably not an option. It's also my flagship commercial product.

rlebeau >> Also, why not use your TIdHTTPServer for your WebSocket server (since WebSocket starts life as HTTP)?

How do you do two-way asynchronous communication with TIdHTTPServer?
(08-07-2018, 12:07 AM)kbriggs Wrote: [ -> ]All my VCL updates are done from the main thread, triggered via PostMessage. My Indy threads never touch UI components.

I wasn't necessarily suggesting that your server was touching the UI itself, just the UI thread. But OK, you use PostMessage() for that. Which HWND do you post to exactly? Hopefully to your own manually-created HWND and NOT to a VCL Form/Control's Handle property? Because the Handle property is not safe to access outside of the UI thread, even for something as "seemingly innocent" as (Send|Post)Message().

(08-07-2018, 12:07 AM)kbriggs Wrote: [ -> ]Not in this case. Clicking the "End Process" button on the Processes tab will knock the thread count from 11 or so down to 1 but that last thread will not die. Clicking End Process over and over again does nothing. Finding the thread in Process Explorer and trying to kill it there also does nothing. Rebooting is the only solution.

Can you run your server inside the debugger and still see the problem happen? TIdTCPServer names all of the threads that it creates, do you see the names appear in the debugger? Does the lingering thread have a name assigned?

Oh yeah, and attaching the debugger to a process and then killing the debugger is usually a good way to kill a process :-)

(08-07-2018, 12:07 AM)kbriggs Wrote: [ -> ]At 63K lines across 66 source files, that's probably not an option. It's also my flagship commercial product.

That is why I asked for a short demo that reproduces the same problem.

(08-07-2018, 12:07 AM)kbriggs Wrote: [ -> ]How do you do two-way asynchronous communication with TIdHTTPServer?

The same way you do it with TIdTCPServer. The trick is you have to keep your OnCommandGet event handler running for the lifetime of the websocket connection. Your OnCommandGet handler could negotiate the websocket handshake as needed, and then loop, processing websocket messages as needed until the client disconnects or the server is shutting down, and then exit. At least with using TIdTCPServer directly, it handles the loop internally for you, so you can focus on individual websocket messages per each invocation of the OnExecute event.
(08-07-2018, 12:53 AM)rlebeau Wrote: [ -> ]But OK, you use PostMessage() for that.  Which HWND do you post to exactly?  Hopefully to your own manually-created HWND and NOT to a VCL Form/Control's Handle property?  Because the Handle property is not safe to access outside of the UI thread, even for something as "innocent" as (Send|Post)Message().

Uh oh. I have this defined in the public declarations section of my main form's class:

procedure SyncMessage(var Msg: TMessage); message WM_User + 2;

and then call it from a worker thread:

PostMessage(MainForm.Handle, WM_User + 2, Mask, 0);

Am I doing that wrong?

(08-07-2018, 12:53 AM)rlebeau Wrote: [ -> ]Can you run your server inside the debugger and still see the problem happen?  TIdTCPServer names all of the threads that it creates, do you see the names appear in the debugger?  Does the lingering thread have a name assigned?

I'll try that tomorrow.  I couldn't get it to crash today. Process Explorer showed the thread as "TMethodImplementationIntercept" with an 32-bit number on the end that changes each time.

Could adjusting the default MAXSTACKSIZE parameter be a problem? I have it set at 256K instead of the default 1024K to reduce the memory load.
(08-07-2018, 03:10 AM)kbriggs Wrote: [ -> ]Uh oh. I have this defined in the public declarations section of my main form's class:

procedure SyncMessage(var Msg: TMessage); message WM_User + 2;

and then call it from a worker thread:

PostMessage(MainForm.Handle, WM_User + 2, Mask, 0);

Am I doing that wrong?

Not so much "wrong" as just "unsafe".

If the VCL happens to recreate the MainForm's HWND (and you have no control over when the VCL recreates its HWNDs - recreations happen more often than you might think) at the exact moment that you call PostMessage(), really bad things can happen. Not the least of which is that you can completely kill your UI.

An HWND has thread affinity, it functions correctly only in the thread that creates it. During a VCL control's window recreation, its HWND gets destroyed and set to 0. When the TWinControl.Handle property getter sees that, it allocates a new HWND before returning. So, if the main thread destroys the MainForm's HWND, and then the worker thread accesses the Handle property before the main thread has created a new HWND, the new HWND will be created in the context of a worker thread instead, making your MainForm window completely inoperable (worse, there is a race condition on the Handle property, so it is possible, even likely, that the main thread may have also created its own HWND, but the worker thread created an HWND too, leading to one of the two threads leaking resources, depending on which thread "won" the race).

The safer approach is to NEVER use the TWinControl.Handle properly across thread boundaries. Allocate your own HWND, such as with the RTL's AllocateHWnd() function, and then you can safely post messages to that HWND all you want.

(08-07-2018, 03:10 AM)kbriggs Wrote: [ -> ]Process Explorer showed the thread as "TMethodImplementationIntercept" with an 32-bit number on the end that changes each time.

Well, that is definitely NOT an Indy-created thread, so something else in your project is not shutting down correctly.

I've heard of the TMethodImplementationIntercept class, but am not aware of whether it creates a worker thread or not.
(08-07-2018, 06:45 PM)rlebeau Wrote: [ -> ]The safer approach is to NEVER use the TWinControl.Handle properly across thread boundaries.  Allocate your own HWND, such as with the RTL's AllocateHWnd() function, and then you can safely post messages to that HWND all you want.

I made that change this morning by creating my own HWND as such:

var
 hSyncWnd: HWND;

procedure TMainForm.FormCreate(Sender: TObject);
begin
 TThread.CurrentThread.NameThreadForDebugging('VCLThread');
 hSyncWnd := AllocateHWnd(SyncMessage);
end;

procedure TMainForm.FormDestroy(Sender: TObject);
begin
 DeallocateHWnd(hSyncWnd);
end;

And then calling it from my worker thread with:
PostMessage(hSyncWnd, WM_User + 2, Mask, 0);

While that works, I've since got another crash on Active := False so that wasn't the problem.

(08-07-2018, 06:45 PM)rlebeau Wrote: [ -> ]I've heard of the TMethodImplementationIntercept class, but am not aware of whether it creates a worker thread or not.

That turns out to be the main VCL thread. I started naming all my own threads including that one and the ID number matches what Process Explorer shows. Which makes sense because I'm calling Active := False in a button click event. But moving that to a worker thread (which I tried already) will just mean the worker thread will be the unkillable thread.

Any thoughts on the MAXSTACKSIZE? Yesterday I changed it to 512K and could not produce a crash. I changed back to 256K and got one crash today. May just be a coincidence.
I managed to get one crash on shutdown today while running in Debug mode. It left the main VCL thread running along with a few of my own (as expected), the two TCPServer listening threads, and a few Kernal threads identified as "ntdll" in Process Explorer. As before, an End Process killed everything except the main VCL thread and a reboot was required. I'm not all that familiar with the Delphi debugger. Is there something else I should be checking on the next crash?
(08-06-2018, 09:18 PM)rlebeau Wrote: [ -> ]On Windows, it is overridden by the TIdStackWindows.Disconnect() method, which simply calls the Winsock API shutdown() and closesocket() functions:

Code:
function TIdStackWindows.WSShutdown(ASocket: TIdStackSocketHandle; AHow: Integer): Integer;
begin
 Result := Shutdown(ASocket, AHow); // <-- Winsock API call
end;

function TIdStackWindows.WSCloseSocket(ASocket: TIdStackSocketHandle): Integer;
begin
 Result := CloseSocket(ASocket); // <-- Winsock API call
end;

procedure TIdStackWindows.Disconnect(ASocket: TIdStackSocketHandle);
begin
 // Windows uses Id_SD_Send, Linux should use Id_SD_Both
 WSShutdown(ASocket, Id_SD_Send);
 // SO_LINGER is false - socket may take a little while to actually close after this
 WSCloseSocket(ASocket);
end;

Neither of those API functions should be hanging by default, though closesocket() CAN hang if you have configured the socket beforehand with a non-zero linger timeout (which Indy doesn't do, but users can do manually) and pending data is not being sent to the peer correctly.

So I extended my error logging deeper and CloseSocket(ASocket) is where it hangs when TCPServer is trying to close down the first listening thread. I've modified WSCloseSocket() as such:

Code:
function TIdStackWindows.WSCloseSocket(ASocket: TIdStackSocketHandle): Integer;
begin
 LogData.AddError('TIdStackWindows.WSCloseSocket(' + IntToStr(ASocket) + ') Start', True);
 try
 Result := CloseSocket(ASocket);
 except
   on E: Exception do
   begin
     LogData.AddError('WSCloseSocket Exception: ' + E.Message);
     raise;
   end;
 end;
 LogData.AddError('TIdStackWindows.WSCloseSocket(' + IntToStr(ASocket) + ') Stop', True);
end;

And when it does crash, my log only records that start entry like this:

TIdStackWindows.WSCloseSocket(10384) Start

No exception fires and it never reaches my "Stop" log.  I don't know what to do now. You mentioned "non-zero linger timeout". I don't know what that is and so pretty sure I didn't do that. And it's definitely not a stack size issue as I've returned it to the default 1MB setting.
So I just came across this:
https://stackoverflow.com/questions/5135...ing-socket

This shows that a Windows update in July might be the problem. Which about matches when I first started having the lockups. A fix has since been released so I just manually applied the one for my Windows 7 machine. Too early to tell but so far so good.
Pages: 1 2