IW Service stops
#1
I have an IW service running on Windows Server and after a while the service stops responding.
It is a big project with many users connected and after some work, with heavy load processing, when a user starts a session, the main page is not shown.
Tried everything without success.
Once this starts happening, the service is still running and there is nothing in the Windows event log for the service, but the main page does not load. There are no entries in the service log either.
Any ideas?? This is urgent!!
Thanks
Reply
#2
I have had the same problem for years (and still do). To work around it, we have a script that monitors the site and restarts the service. This helps a little.
Digging deeper is very difficult. Sometimes there are memory leaks or, when working with pools, the pool is not guarded correctly. Try counting the Lock and Unlock calls on the data modules to check that they match.

I built a dynamic page to get a deeper look inside:

https://sepla.intercongress.de/0/monitordb

DM-Lockcounter should always be 1. If the counter increases, a SessionPool.Unlock is missing and you will run into problems.

Good luck
Sven
Greetings
Sven
Reply
#3
(04-18-2023, 10:50 AM)Comograma Wrote: I have an IW service running on Windows Server and after a while the service stops responding.
It is a big project with many users connected and after some work, with heavy load processing, when a user starts a session, the main page is not shown.
Tried everything without success.
After this starts to happen, the service is still running, no event logs on Windows service side, but main page is not loaded. No service logs entry either.
Any ideas?? This is urgent!!
Thanks


Have you considered using a profiler to help pinpoint the issue? I find that tools like the Sampling Profiler and Intel VTune Profiler (you'll need THIS for VTune) can be particularly helpful in these situations. Additionally, running FastMM4 in full debug mode can help you identify any memory leaks. At atozed, they have a helpful guide for this.

While you're working on identifying the root cause, you may also want to consider using content handlers to automatically detect when the service stops working and restart it.
In your ServerController.pas, add something like this:


Code:
uses
  IW.Content.Handlers,
  IW.Content.Health;

initialization

with THandlers.Add('', 'health-check', TContentHealth.Create) do
begin
  CanStartSession := true;
  RequiresSessionStart := false;
end;

Then create a new unit like this to define your content handler:


Code:
unit IW.Content.Health;

interface

uses
  Classes, IW.Content.Base, HTTPApp, IWApplication, IW.HTTP.Request,
  IW.HTTP.Reply;

type
  TContentHealth = class(TContentBase)
  protected
    function Execute(aRequest: THttpRequest; aReply: THttpReply; const aPathname: string; aSession: TIWApplication; aParams: TStrings): boolean; override;
  public
    constructor Create; override;
  end;

implementation

uses
  IW.Content.Handlers,
  IWMimeTypes;

constructor TContentHealth.Create;
begin
  inherited;
  mFileMustExist := False;
end;

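function TContentHealth.Execute(aRequest: THttpRequest; aReply: THttpReply; const aPathname: string; aSession: TIWApplication; aParams: TStrings): boolean;
begin
  // Report the request as handled.
  Result := True;
  if Assigned(aReply) then
  begin
    // Plain-text body that the watchdog compares against.
    aReply.ContentType := MIME_TXT;
    aReply.WriteString('ALLOK');
  end;
  // End the session right away so health checks do not leave sessions behind.
  // The nil check is a defensive addition in case no session was created.
  if Assigned(aSession) then
    aSession.Terminate;
end;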

end.

Then, create a watchdog service that periodically sends a request to http://your_url/health-check and checks if the response is 'ALLOK'. If the response indicates that the server is running properly, the watchdog service does nothing. However, if the response is anything other than 'ALLOK', the watchdog service can take action to automatically restart the server.
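
The watchdog described above could look roughly like the following Python sketch. It is only one possible shape, not a definitive implementation. Several things in it are assumptions that do not come from this thread: the service name `MyIWService`, the 30-second polling interval, the rule of restarting only after three consecutive failed checks (to avoid restarting on a single slow response), and restarting through the Windows `sc` command, which needs an elevated process. `http://your_url/health-check` is the placeholder URL from the paragraph above. Note also that a process that is truly hung may not react to `sc stop`; in that case the restart step would have to kill the process instead.

```python
import http.client
import subprocess
import time
import urllib.request

HEALTH_URL = "http://your_url/health-check"  # placeholder, replace with the real URL
SERVICE_NAME = "MyIWService"                 # hypothetical Windows service name
EXPECTED_BODY = "ALLOK"                      # body written by the content handler


def fetch_body(url, timeout=10.0):
    """Return the response body as text, or None if the request fails or times out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # Covers connection errors, timeouts, HTTP error statuses and broken replies.
        return None


def is_healthy(body):
    """True only when the server answered with the expected marker text."""
    return body is not None and body.strip() == EXPECTED_BODY


def restart_service(name):
    """Stop and start the service with 'sc' (assumed restart method)."""
    subprocess.run(["sc", "stop", name], check=False)
    time.sleep(5)  # assumed grace period before starting again
    subprocess.run(["sc", "start", name], check=False)


def watch(fetch=fetch_body, restart=restart_service, failures_allowed=3,
          interval=30.0, max_checks=None, sleep=time.sleep):
    """Poll the health URL; restart after `failures_allowed` consecutive failures.

    Runs forever when max_checks is None. Returns the number of restarts issued.
    """
    failures = 0
    restarts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if is_healthy(fetch(HEALTH_URL)):
            failures = 0
        else:
            failures += 1
            if failures >= failures_allowed:
                restart(SERVICE_NAME)
                restarts += 1
                failures = 0
        sleep(interval)
    return restarts
```

To use it, add a call to `watch()` at the end of the script and run it from a scheduled task or a small separate service. The `fetch`, `restart` and `sleep` parameters are there so the loop can be tried with stand-ins before it is pointed at a production server.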
Reply
#4
I have seen this situation many times while helping other users. I can say without any doubt that every time there was a logical reason, and it was hidden somewhere in the code. Sometimes in the application itself, sometimes in 3rd party code (some report generators are especially "good" at it).

First thing when trying to solve this kind of situation is getting everything that is wrong fixed.

Once I helped a customer and he granted me access to his production environment. I was shocked to know that his exception log folder contained literally hundreds of exception logs clearly showing programming errors like "DataSet not in edit or insert mode", and even SQL errors. There were also memory leaks. We first reduced the exceptions to only what is expected (e.g. session timeouts and unsupported browsers) and what is really an exceptional circumstance (e.g. a database connection failure). In the end, the problem was being caused by updating a global var string from multiple threads which would cause the whole application to crash.

Having said that:
1- is your service Indy-based or Http.sys-based?
2- How many active sessions when this thing happens?
3- Is your exception logger enabled? Are there hundreds of exceptions going on all the time?
4- When the service stops responding: Does the Windows service console show the service as running or stopped? Did you check the Windows event log for any event when the service stopped? How is the CPU usage at that point? Does it seem to be idle or hitting 100% for some or all cores? If the service is still running, how does Task Manager show it regarding memory and CPU?
5- Do you have any 3rd party components like report generators?
6- Are you using a connection pool or direct DB access for each session?
PS:
7- Did you extensively check your application for memory leaks? I don't think memory leaks cause this kind of issue (they cause a different issue, though), but memory leaks usually reveal many other problems with the code.
Reply
#5
(04-18-2023, 06:16 PM)ioan Wrote:
(04-18-2023, 10:50 AM)Comograma Wrote: I have an IW service running on Windows Server and after a while the service stops responding.
It is a big project with many users connected and after some work, with heavy load processing, when a user starts a session, the main page is not shown.
Tried everything without success.
After this starts to happen, the service is still running, no event logs on Windows service side, but main page is not loaded. No service logs entry either.
Any ideas?? This is urgent!!
Thanks


Have you considered using a profiler to help pinpoint the issue? I find that tools like the Sampling Profiler and Intel VTune Profiler (you'll need THIS for VTune) can be particularly helpful in these situations. Additionally, running FastMM4 in full debug mode can help you identify any memory leaks. At atozed, they have a helpful guide for this.

While you're working on identifying the root cause, you may also want to consider using content handlers to automatically detect when the service stops working and restart it.
In your ServerController.pas do add something like this:


Code:
uses
  IW.Content.Handlers,
  IW.Content.Health;

initialization

with THandlers.Add('', 'health-check', TContentHealth.Create) do
begin
  CanStartSession := true;
  RequiresSessionStart := false;
end;

and then make a new unit, like this to define your content handler:


Code:
unit IW.Content.Health;

interface

uses
  Classes, IW.Content.Base, HTTPApp, IWApplication, IW.HTTP.Request,
  IW.HTTP.Reply;

type
  TContentHealth = class(TContentBase)
  protected
    function Execute(aRequest: THttpRequest; aReply: THttpReply; const aPathname: string; aSession: TIWApplication; aParams: TStrings): boolean; override;
  public
    constructor Create; override;
  end;

implementation

uses
  IW.Content.Handlers,
  IWMimeTypes;

constructor TContentHealth.Create;
begin
  inherited;
  mFileMustExist := False;
end;

function TContentHealth.Execute(aRequest: THttpRequest; aReply: THttpReply; const aPathname: string; aSession: TIWApplication; aParams: TStrings): boolean;
begin
  Result := True;
  if Assigned(aReply) then
  begin
    aReply.ContentType := MIME_TXT;
    aReply.WriteString('ALLOK');
  end;
  aSession.Terminate;
end;

end.

Then, create a watchdog service that periodically sends a request to http://your_url/health-check and checks if the response is 'ALLOK'. If the response indicates that the server is running properly, the watchdog service does nothing. However, if the response is anything other than 'ALLOK', the watchdog service can take action to automatically restart the server.

Well, if I can't figure out what's causing the problem, this may be an idea! Thanks a lot!

(04-18-2023, 08:17 PM)Alexandre Machado Wrote: I found this situation many times helping other users. I can say without any doubt that every time there was a logical reason and it was hidden somewhere in the code. Sometimes in the application itself, sometimes in 3rd party code (some report generators are especially "good" at it)

First thing when trying to solve this kind of situation is getting everything that is wrong fixed.

Once I helped a customer and he granted me access to his production environment. I was shocked to know that his exception log folder contained literally hundreds of exception logs clearly showing programming errors like "DataSet not in edit or insert mode", and even SQL errors. There were also memory leaks. We first reduced the exceptions to only what is expected (e.g. session timeouts and unsupported browsers) and what is really an exceptional circumstance (e.g. a database connection failure). In the end, the problem was being caused by updating a global var string from multiple threads which would cause the whole application to crash.

Having said that:
1- is your service Indy-based or Http.sys-based?
2- How many active sessions when this thing happens?
3- Is your exception logger enabled? Are there hundreds of exceptions going on all the time?
4- When the service stops responding: Does the Windows service console shows the service as running or stopped? Did you check the Windows event log for any event when the service stopped? How is the CPU usage at that point? Does it seem to be idle or hitting 100% for some or all cores? If the service is still running, how task manager shows it regarding memory and CPU?
5- Do you have any 3rd party components like report generators?
6- Are you using a connection pool or direct DB access for each session?
PS:
7- Did you extensively check your application for memory leaks? I don't think memory leaks cause this kind of issue (it causes a different issue, though) but memory leaks usually reveal many other problems with the code

Alexandre, thanks for your response.
So:
1- Indy-based.
2- Several sessions, massive work, maybe 10 sessions, sometimes more.
3- Exception logger is enabled. I guess there aren't hundreds of exceptions going on all the time, but I'll confirm that.
4- When the service stops responding, the Windows service console shows the service as running, and there is nothing in the Windows event log about that service. I believe CPU is normal when this happens, but I must confirm. I must also check memory and CPU in Task Manager.
5- I do not use any 3rd party components.
6- I'm using direct DB access for each session. Must/should I use connection pool??!!!!!!!!!!!
7- I don't know if "extensively" but I always check for memory leaks with FastMM4 in full debug mode to see how it goes. 

Speaking of memory leaks, I was freeing all my forms when each session finished, and I thought that might be causing problems, because on the latest version of IW, running the IW app in IDE debug mode would raise an error. After removing that code, all is good.
Nothing is raised when running the IW app normally. On an earlier version, this didn't happen in debug mode. You must have changed something for this to start happening.
I don't have to free any of my forms, because IW takes care of this when each session ends, right??!!!!

PS:
I'm using ScaleMM2 in my projects. Should I use FastMM4? And if so, which one: the one from the IW install or the one from GitHub?
Reply
#6
From the start:

1- ok
2- 10 sessions is very little even if they are heavy users. 2000 sessions would be "a lot", although I had 3000 running on an Indy server.
3- Check the exceptions that the logger is saving
4- Please check the service process and see if it is still consuming CPU
5- ok
6- Direct connection is fine. A connection pool here would be overkill
7- You can leave the memory leak check for later

As a test, I would replace the memory manager and use FastMM4 just to see if it makes any difference. It may fix the problem or it can make it worse. I would use the one that comes with IntraWeb (v4.992) because it has been fine tuned for IntraWeb/multi-threaded apps. It is rock solid.
Reply
#7
(04-19-2023, 08:09 PM)Alexandre Machado Wrote: From the start:

1- ok
2- 10 sessions is very little even if they are heavy users. 2000 sessions would be "a lot", although I had 3000 running on an Indy server.
3- Check the exceptions that the logger is saving
4- Please check the service process and see if it is still consuming CPU
5- ok
6- Direct connection is fine. Connection pool here would be an overkill
7- You can leave the memory leak check to a second moment

As a test, I would replace the memory manager and use FastMM4 just to see if it makes any difference. It may fix the problem or it can make it worse. I would use the one that comes with IntraWeb (v4.992) because it has been fine tuned for IntraWeb/multi-threaded apps. It is rock solid.

OK, regarding FastMM4, this article https://www.atozed.com/2021/07/detecting...-fastmm-4/ recommends using the latest from GitHub!
Again, which one do you advise using?
Reply
#8
There was an error in my statement. Our version is 4.992, not 4.991. I fixed the original post.

There is no difference. We distribute exactly the latest version that you have on GitHub, but with specific settings for maximum performance of multi-threaded applications. You can grab the latest from GitHub and toggle these settings yourself, but I think it is just a waste of time.

The article says that you can grab the latest from GitHub if you wish; it neither recommends nor discourages it. FastMM 4 has been stable for several years now and hasn't changed, especially regarding bugs. This means you can use the one provided with IntraWeb with peace of mind that it is rock solid and bug free (as much as the version on GitHub).
Reply

