Posts: 2
Threads: 1
Joined: Oct 2020
Reputation: 0
Hello,
I intend to build a TCP server able to serve lots of clients. Because of ephemeral port exhaustion on Windows (and for some other reasons), I disconnect from the server side, not from the client side.
The steps are like this: client connects, client requests data, server answers with data, client requests disconnect, server disconnects.
The server's OnExecute does not process requests itself; it adds them to a queue, from which they are retrieved almost immediately by a queue-processing thread and then processed by 4-64 parallel task/job worker threads.
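Roughly, the pattern is this (a sketch only; TRequest, RequestQueue, and ReadOneRequest are placeholder names, not my actual code):

Code:
// OnExecute only reads and enqueues; the worker threads do the processing.
procedure TMyServer.ServerExecute(AContext: TIdContext);
var
  Req: TRequest;
begin
  Req := TRequest.Create;
  Req.Connection := AContext.Connection;   // the reference that later goes stale
  Req.Payload := ReadOneRequest(AContext); // read one complete request
  RequestQueue.Enqueue(Req);               // thread-safe queue drained by the workers
end;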
The problem I encountered is that many times (not always, but quite frequently) the "connection" object I saved from within OnExecute (AContext.Connection) and stored in the queue is no longer the same when the task or job is retrieved from the queue. The connection object now points to a different connection, which may already have been closed. This leads to sending data to the wrong client, or to using disconnected connections, or to further exceptions because the Sockets, IOHandlers, and so on have already been freed.
I tried to find the right connection in the Server.Contexts list, but the problem of course persists, because other threads are concurrently disconnecting from the server side, and by the time the newly found connection variable is used, it may no longer be the same one that was retrieved three lines of code earlier, or 20 milliseconds before.
How can I save, from within OnExecute, a persistent "connection" that can still be used even seconds later, outside the OnExecute event, when the disconnections occur on the server side and not on the client side?
Thank you
Posts: 652
Threads: 2
Joined: Mar 2018
Reputation: 35
Location: California, USA
10-28-2020, 07:09 PM
(This post was last modified: 10-30-2020, 04:36 PM by rlebeau.)
(10-28-2020, 10:14 AM)noname007 Wrote: I intend to build a TCP server able to serve lots of clients. Because of ephemeral port exhaustion on Windows (and for some other reasons), I disconnect from the server side, not from the client side.
Port exhaustion would not affect the server, only clients. And if clients are running out of ports, they are making too many connections in a short amount of time, so you should be reusing connections instead of dropping them.
(10-28-2020, 10:14 AM)noname007 Wrote: The steps are like this: client connects, client requests data, server answers with data, client requests disconnect, server disconnects.
That is a very unusual design if you are not sending multiple requests per connection. In a 1-request-per-connection scenario, can you include the disconnect request in the initial data request (like HTTP does), or just disconnect blindly without waiting for the client to request it?
(10-28-2020, 10:14 AM)noname007 Wrote: The server's OnExecute does not process requests itself; it adds them to a queue, from which they are retrieved almost immediately by a queue-processing thread and then processed by 4-64 parallel task/job worker threads.
What if the client disconnects before the queued request is processed? Do you remove the request from the queue? Or let it fail when it tries to write back to a dead connection?
(10-28-2020, 10:14 AM)noname007 Wrote: The problem I encountered is that many times (not always, but quite frequently) the "connection" object I saved from within OnExecute (AContext.Connection) and stored in the queue is no longer the same when the task or job is retrieved from the queue. The connection object now points to a different connection, which may already have been closed.
Indy does not reuse Connection objects. A new Connection object is created before a client is accepted, and is then destroyed after that client disconnects. Most likely, you are storing a pointer to a Connection object that has been destroyed before you are able to access it, so that memory address is invalid, or worse, has been reused for a new object elsewhere (a different Connection, or something else completely unrelated), either way causing undefined behavior in your code.
(10-28-2020, 10:14 AM)noname007 Wrote: How can I save, from within OnExecute, a persistent "connection" that can still be used even seconds later, outside the OnExecute event, when the disconnections occur on the server side and not on the client side?
Don't store a pointer to the Connection object. Store a pointer to the Context object instead, and then validate that the object is still present in the server's Contexts list before using its Connection. Alternatively, store per-Context identification info (a client ID, etc.) and then search the Contexts list for that ID when needed.
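A minimal sketch of the validate-before-use approach (StoredContext and ResponseBytes are hypothetical names; note that the reference is only trustworthy while the list stays locked, which ties directly into the caveat below):

Code:
var
  List: TList; // LockList returns the live list (TIdContextList in recent Indy versions)
begin
  List := Server.Contexts.LockList;
  try
    // Only touch the Connection if the Context is still in the list
    if List.IndexOf(StoredContext) <> -1 then
      StoredContext.Connection.IOHandler.Write(ResponseBytes);
  finally
    Server.Contexts.UnlockList;
  end;
end;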
Either way, you have a race condition if the client disconnects while you are using the Connection, unless you keep the server's Contexts list locked while sending responses, which I don't suggest. Worst case, you may have to just store the underlying SOCKET itself and write to it directly, and let the OS fail if the SOCKET is closed. That will work on Windows, at least, where SOCKETs are unique kernel objects. Not so much on POSIX systems, where sockets are file descriptors that can be reused.
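A rough illustration of that Windows-only, last-resort approach (RawSocket, CaptureHandle, and TrySendRaw are hypothetical names; depending on the Indy/Delphi version, the handle types may need casts):

Code:
uses Winapi.WinSock, IdContext, IdGlobal;

var
  RawSocket: TIdStackSocketHandle;

procedure CaptureHandle(AContext: TIdContext);
begin
  RawSocket := AContext.Connection.Socket.Binding.Handle; // the OS-level SOCKET
end;

procedure TrySendRaw(const Data: TIdBytes);
begin
  if Length(Data) = 0 then Exit;
  // Write directly; if the socket was already closed, the OS simply
  // fails the call and the response is dropped.
  if send(RawSocket, PByte(Data)^, Length(Data), 0) = SOCKET_ERROR then
    Exit; // connection already gone
end;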
Posts: 2
Threads: 1
Joined: Oct 2020
Reputation: 0
Thank you very much for your swift reply! To me, you are much more famous than the character! Apart from Indy itself, I have read many useful answers you gave to other users, and they (hopefully) taught me a lot. Thank you for those, too! I am glad that I can finally get to thank you, even in this way!
I don't know how to quote you directly in the text (I don't know how to use these forum features), for which I apologize; I simply inserted quotes and pasted your answers. Also, English is not my native tongue and I don't use it regularly for communication, so please excuse my inevitable mistakes and strange expressions.
Quote:Port exhaustion would not affect the server, only clients. And if clients are running out of ports, they are making too many connections in a short amount of time, so you should be reusing connections instead of dropping them.
I'm trying to build a scalable system comprising quite a lot of servers with different roles that communicate with each other, and reusing connections would mean too many open TCP connections across that network. I also considered UDP communication within the network(s), and if necessary I will move to that model, as packets are rarely lost locally. Until then, I am (most likely clumsily) trying to get as much freedom and reliability out of TCP as I can. I found that server-side disconnection avoids port clogging on the other servers, which also act as clients for many other servers in this system, and this way a thread does not have to wait on a particular connection for an answer before it can be used. I am also avoiding tunnelling the client requests through the same pipe with a mechanism that would receive them asynchronously; if it came to that, I would rather switch to UDP, as it is almost the same programming effort.
Quote:That is a very unusual design if you are not sending multiple requests per connection. In a 1-request-per-connection scenario, can you include the disconnect request in the initial data request (like HTTP does), or just disconnect blindly without waiting for the client to request it?
I use only one request per connection, even if, internally, from the point of view of the application, that request is complex and the answer is complex, too - they are received and sent as a single request and a single answer.
I have tried many scenarios, using many threads simultaneously, and also many configurations. This is what I have found out:
1. Apart from the port exhaustion problem, if the client disconnects, the operation is at least 10 times faster than if the server disconnects, in any of the scenarios below. But because of the port exhaustion, the client never disconnects in my application; it never performs a disconnection, it only asks for it.
2. If the server blindly disconnects, in most scenarios this is 4-10 times slower than if the server waits for the client to ask for the disconnection from the server (scenario no. 3), which means this scenario no. 2 is about 100 times slower than scenario no. 1. Plus this is prone to more exceptions, as the disconnect request may interrupt the data transmission, and when I read it (ReadBytes) it may tell in many ways that the connection is no longer available. If the server and the client are on the same machine (or even part of the same application), a single client thread can run at most 3 requests per second (using a 4.4 GHz Intel processor and 16 GB of RAM). If they are on different machines it depends: if the connection is wireless the number is about 12, and if it is wired it is around 8 (less). If the server and client are on different virtual machines, the latter drops by 30%.
3. If the server waits for a request from the client in order to disconnect, this step eliminates the transmission errors but also adds some extra traffic. Even so, the speed is 4-10 times better than in case no. 2. The only scenario where this is slower is a wireless connection on either the client or the server side, but that is not important in this case. So I use this model.
So if I include the disconnect request in the initial request, as you suggested, we have case no. 2. To speed things up by an order of magnitude, I have to send a separate request to the server (in which the client asks for disconnection) after the first data request; upon receiving it, the server disconnects without replying in any way.
So the fastest and most reliable sequence I have found so far is: client connects, client sends data request, client receives data response, client sends disconnect request, server performs the actual disconnection.
Quote:What if the client disconnects before the queued request is processed? Do you remove the request from the queue? Or let it fail when it tries to write back to a dead connection?
In my model, the client does not disconnect; it never performs the disconnection. Only the servers perform the disconnections. The client only requests from the server "disconnect this connection", and the server does it without sending any reply to the client.
Anyway, the requests are processed very quickly, so the server has no time to notice a broken connection. But in that eventuality, the send will fail and the request will have to be transmitted again. Internally, the servers may keep that request's result and spare the time needed to process it again, but this is not important from the connection's point of view.
Quote:Indy does not reuse Connection objects. A new Connection object is created before a client is accepted, and is then destroyed after that client disconnects. Most likely, you are storing a pointer to a Connection object that has been destroyed before you are able to access it, so that memory address is invalid, or worse, has been reused for a new object elsewhere (a different Connection, or something else completely unrelated), either way causing undefined behavior in your code.
This is interesting, because what I see is this: a connection variable I capture in the OnExecute event and store in the queue, along with a tag and the peer coordinates, no longer has the same attributes when retrieved milliseconds later (it may point to a different IP and port, or be invalid). I could not find any error in my code, and the connection variable is untouched. But I will dig deeper and report the results. I will strip this part down and make further tests. Thank you again.
Quote:Don't store a pointer to the Connection object. Store a pointer to the Context object instead, and then validate that the object is still present in the server's Contexts list before using its Connection. Alternatively, store per-Context identification info (a client ID, etc.) and then search the Contexts list for that ID when needed.
This may be slow, as I will be locking the list many times while searching for a particular connection.
I also had the same problem of inconsistency when using/storing the Context object itself: the Connection it pointed to had the same problems, and I could not find the cause in my code, but I will try harder. The AContext.Connection was different. I am talking about roughly 5% of cases under fast processing, with about 100-2900 client threads and at least 32-64 queue-processing threads.
Quote:Either way, you have a race condition if the client disconnects while you are using the Connection, unless you keep the server's Contexts list locked while sending responses, which I don't suggest. Worst case, you may have to just store the underlying SOCKET itself and write to it directly, and let the OS fail if the SOCKET is closed. That will work on Windows, at least, where SOCKETs are unique kernel objects. Not so much on POSIX systems, where sockets are file descriptors that can be reused.
The clients never disconnect, though a connection may drop due to network conditions. I do not intend to keep that list locked; as I have read in many of your other posts, it slows down the server, which is logical. I have to build something that works both on Windows and on POSIX, therefore I want to avoid working at the socket level as much as possible.
I have tested only on Windows so far, on quite fast Intel processors and networks.
I will be back with details. I will try to strip down and isolate the queue, the server, and the clients, to see what I did wrong, and hopefully help others with this.
Thank you, again!
Posts: 652
Threads: 2
Joined: Mar 2018
Reputation: 35
Location: California, USA
10-30-2020, 05:36 PM
(This post was last modified: 10-30-2020, 05:37 PM by rlebeau.)
(10-30-2020, 09:28 AM)noname007 Wrote: 1. Apart from the port exhaustion problem, if the client disconnects, the operation is at least 10 times faster than if the server disconnects, in any of the scenarios below. But because of the port exhaustion, the client never disconnects in my application; it never performs a disconnection, it only asks for it.
There should be no effect on server-side performance whether the server or the client performs the disconnect first. A disconnect is a disconnect, and the server checks for that condition in between each firing of the OnExecute event.
If anything, the only performance hit on the client disconnecting first is the port exhaustion issue due to the client entering the TIME_WAIT state and potentially not being able to reconnect to the server right away.
(10-30-2020, 09:28 AM)noname007 Wrote: 2. If the server blindly disconnects, in most scenarios this is 4-10 times slower than if the server waits for the client to ask for the disconnection from the server (scenario no. 3), which means this scenario no. 2 is about 100 times slower than scenario no. 1.
The only way that makes sense to me is if the client is not paying attention to the disconnect and doesn't notice for a while that the connection was closed. That should not be slowing down the server side.
(10-30-2020, 09:28 AM)noname007 Wrote: Plus this is prone to more exceptions, as the disconnect request may interrupt the data transmission
Not if the server is performing the disconnect in between transmissions, ie after sending its response and before reading a new request.
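In Indy terms, that is simply (ProcessOneRequest being a placeholder for the read-request/send-response logic):

Code:
procedure TMyServer.ServerExecute(AContext: TIdContext);
begin
  ProcessOneRequest(AContext);    // read the request, send the full response
  AContext.Connection.Disconnect; // disconnect between transmissions:
                                  // nothing is in flight at this point
end;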
[quote="noname007" pid="7126" dateline="1604050109"]
and when I read it (ReadBytes) it may tell in many ways that the connection is no longer available.
Not really. It only has one way: it raises an exception. There may be different error codes behind that exception, depending on the particular condition that was detected due to thread timing. But you shouldn't care about that. If you get a socket error, the connection is no longer in a stable state, so the only sane thing to do is close it.
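For illustration, that boils down to something like this sketch (Buf and Len are placeholder names; the exact exception class raised varies with the underlying error):

Code:
try
  AContext.Connection.IOHandler.ReadBytes(Buf, Len);
except
  on E: EIdException do
  begin
    // Whatever the underlying error code, the connection is no longer stable.
    AContext.Connection.Disconnect;
    raise; // let the server thread unwind and clean up the context
  end;
end;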
(10-30-2020, 09:28 AM)noname007 Wrote: If the server and the client are on the same machine (or even part of the same application), a single client thread can run at most 3 requests per second (using a 4.4 GHz Intel processor and 16 GB of RAM).
You should be getting a LOT more bandwidth than that. Unless you are taxing the CPU too much and starving threads for time.
(10-30-2020, 09:28 AM)noname007 Wrote: To speed things up by an order of magnitude, I have to send a separate request to the server (in which the client asks for disconnection) after the first data request; upon receiving it, the server disconnects without replying in any way.
If it is faster for the server to wait for a 2nd request before closing the connection than it is to just close the connection after sending the 1st response, then there is something very funky in your setup that you are not accounting for correctly. What you are describing is not what should be happening.
(10-30-2020, 09:28 AM)noname007 Wrote: So the fastest and most reliable sequence I have found so far is: client connects, client sends data request, client receives data response, client sends disconnect request, server performs the actual disconnection.
That should be neither the fastest nor the safest/most reliable way to go.
(10-30-2020, 09:28 AM)noname007 Wrote: In my model, the client does not disconnect; it never performs the disconnection. Only the servers perform the disconnections. The client only requests from the server "disconnect this connection", and the server does it without sending any reply to the client.
That does not preclude the possibility of the network itself dropping a connection unexpectedly. Networks are not perfect, things happen outside of an application's control. You need to be resilient to that. So your server needs to be prepared to handle the OnDisconnect event before a client explicitly requests a disconnect. And yes, the client must still perform a disconnect on its side, even if it performs that after the server performs its disconnect. Both parties need to cleanup after themselves.
(10-30-2020, 09:28 AM)noname007 Wrote: Anyway, the requests are processed very quickly, so the server has no time to notice a broken connection.
Nonetheless, the race condition still exists. The connection CAN drop before (or even while) the server is sending a response.
(10-30-2020, 09:28 AM)noname007 Wrote: This is interesting, because what I see is this: a connection variable I capture in the OnExecute event and store in the queue, along with a tag and the peer coordinates, no longer has the same attributes when retrieved milliseconds later (it may point to a different IP and port, or be invalid).
Then you clearly have undefined behavior in your code. What you describe can happen if you are trying to access an object after it has been destroyed, for instance. You need to build better safeguards into your code to avoid that.
For example, if you insist on storing the Connection objects directly in your queue, you will have to put a thread-safe lock on the queue, or even on the individual requests themselves. When the OnDisconnect event fires, lock the queue, remove/invalidate any requests that refer to that Connection, and unlock the queue. When sending a response, lock the queue, send the response only if the Connection is still valid, and then unlock the queue.
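A minimal sketch of that lock-and-invalidate pattern (TRequest, RequestQueue, and QueueLock are hypothetical names):

Code:
procedure TMyServer.ServerDisconnect(AContext: TIdContext);
var
  I: Integer;
begin
  QueueLock.Enter; // a TCriticalSection guarding the queue
  try
    for I := 0 to RequestQueue.Count - 1 do
      if TRequest(RequestQueue[I]).Connection = AContext.Connection then
        TRequest(RequestQueue[I]).Connection := nil; // invalidate, don't free
  finally
    QueueLock.Leave;
  end;
end;
// When sending: lock the queue, check Connection <> nil, send, unlock.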
This is why I don't recommend storing the Connection objects themselves in your queue.
(10-30-2020, 09:28 AM)noname007 Wrote: I could not find any error in my code, and the connection variable is untouched.
Your variable, perhaps, but not the object the variable is pointing at.
(10-30-2020, 09:28 AM)noname007 Wrote: This may be slow, as I will be locking the list many times while searching for a particular connection.
Yes, but it is safer. Since TIdTCPServer is a multi-threaded component, you have to take thread safety into account, especially since you don't own the threads or the Connection objects. Since you are doing asynchronous processing outside of the OnExecute event, extra measures have to be taken.
One thing you could do to speed up the searches is assign each client a unique ID, and store the Connection object in a thread-safe dictionary/hashtable keyed by that ID, and then store only the ID in the queue. In the OnDisconnect event, you can lock the table, remove the ID, and unlock the table. When sending a response, lock the table, lookup the ID and if found then send the response, and unlock the table.
You are still doing a fair amount of locking/unlocking, especially since you have such short-lived connections, but at least the storage/retrieval operations will be performed using fast hash lookups rather than linear searches. And there are ways to mitigate the overhead of locks, such as using a spin-lock rather than a critical section/mutex, for instance (if you are careful with them).
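A sketch of that ID-keyed lookup, storing the Context per the earlier suggestion (ClientTable, TableLock, and the use of AContext.Data to carry the ID are hypothetical; TDictionary comes from System.Generics.Collections):

Code:
var
  ClientTable: TDictionary<Integer, TIdContext>; // client ID -> live context
  TableLock: TCriticalSection;

procedure TMyServer.SendResponse(ClientID: Integer; const Data: TIdBytes);
var
  Ctx: TIdContext;
begin
  TableLock.Enter;
  try
    if ClientTable.TryGetValue(ClientID, Ctx) then
      Ctx.Connection.IOHandler.Write(Data); // still connected
    // else: client already gone, drop (or re-queue) the response
  finally
    TableLock.Leave;
  end;
end;

procedure TMyServer.ServerDisconnect(AContext: TIdContext);
begin
  TableLock.Enter;
  try
    // the ID stashed in AContext.Data at connect time (cast shown for illustration)
    ClientTable.Remove(Integer(AContext.Data));
  finally
    TableLock.Leave;
  end;
end;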
(10-30-2020, 09:28 AM)noname007 Wrote: I also had the same problem of inconsistency when using/storing the Context object itself
Then you likely have an underlying logic flaw in your code, where it is not adequately protecting its objects from concurrent access across multiple threads.