A couple of weeks ago, I found myself experimenting with different memory managers for Delphi, trying to determine which one would deliver the best performance in a specific scenario.
After spending some time tinkering with them, I couldn’t find a straightforward answer to my question. Consequently, I decided to document my findings, focusing on how the chosen memory manager impacts the capacity (or throughput) of an IntraWeb application.
First and foremost, if you’re not explicitly using any specific memory manager in your Delphi application, Delphi defaults to the built-in one, which is based on FastMM 4 (not the latest version, though).
Customers often ask me, “How do I know which MM I’m using?” It’s simple:
Q: Open your .DPR project file. What’s the first unit in your USES clause?
A: If you have something like FastMM4 or ScaleMM2, you are using one of these MMs; otherwise, you are using the Delphi built-in MM.
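The same check can be done mechanically. The following is a minimal Python sketch, not part of Delphi or IntraWeb, that reads the text of a .DPR file and reports the first unit in its USES clause. The set of known unit names and the sample project are assumptions made for illustration; add the unit name your memory manager actually ships with. Conditional compilation ({$IFDEF} blocks) is ignored, so treat the result as a hint only.

```python
import re

# Unit names treated as replacement memory managers. This set is an
# assumption for illustration; extend it to match the MM you use.
KNOWN_MM_UNITS = {"fastmm4", "fastmm5", "scalemm2"}


def first_unit_in_uses(dpr_source):
    """Return the first unit named in the uses clause, or None if absent."""
    # Drop { ... } and (* ... *) block comments, then // line comments,
    # so a commented-out "uses" cannot be matched by mistake.
    text = re.sub(r"\{.*?\}|\(\*.*?\*\)", " ", dpr_source, flags=re.S)
    text = re.sub(r"//[^\n]*", " ", text)
    match = re.search(r"\buses\b\s*([A-Za-z_][\w.]*)", text, flags=re.I)
    return match.group(1) if match else None


def memory_manager_in_use(dpr_source):
    """Apply the rule from the text: an MM unit must come first in uses."""
    unit = first_unit_in_uses(dpr_source)
    if unit is not None and unit.lower() in KNOWN_MM_UNITS:
        return unit
    return "Delphi built-in"


# Hypothetical project file used only to exercise the functions.
SAMPLE_DPR = """
program TestApp;

uses
  FastMM4,  // the memory manager must be the first unit
  Forms,
  Unit1 in 'Unit1.pas';

{$R *.res}

begin
end.
"""

print(memory_manager_in_use(SAMPLE_DPR))  # prints "FastMM4"
```

A project whose USES clause starts with any other unit is reported as using the Delphi built-in memory manager.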
FastMM4, especially the version from which the Delphi built-in MM was derived, is not particularly fast in multi-threaded scenarios. When two or more threads attempt to allocate or deallocate memory at the same time, contention arises, and one thread must wait for the other to complete its operation. This has a significant performance impact on multi-threaded applications.
Recognizing this, Primož Gabrijelčič, the creator of OmniThreadLibrary, suggested and implemented changes in FastMM4 that make it more multi-thread friendly and significantly improve its performance. This led to the current FastMM4 version, 4.992, available on GitHub: https://github.com/pleriche/FastMM4
By the way, this is the same FastMM version that you can install with IntraWeb if you choose to include the memory manager during installation.
Other memory managers emerged around the same time, such as ScaleMM2, which we also include in our installer. More recently, I discovered another memory manager named MSHeap, created by Roberto Della Pasqua (https://github.com/RDP1974/DelphiMSHeap/), which claims to perform well in heavily multi-threaded applications.
After releasing the first version of this post, I sent an email to Eivind Bakkestuen from NexusDB (https://www.nexusdb.com/), and he kindly provided me with a license for their excellent NexusDB product, which comes with the NexusDB Memory Manager. I then revised the tests and included the NexusDB Memory Manager in the mix.
So, the inevitable question is: which one is better (whatever definition of better you choose) and which one should you use?
The only way to answer this question is by testing each memory manager using various configurations and observing their performance.
The test was conducted as follows: I created a simple test application with IntraWeb—just two forms, each with a button. Clicking the button on one form displays the other, and vice versa. This allowed me to create a straightforward test plan that continuously alternates between the two forms, executing a wide range of IntraWeb code—from session creation, retrieval, locking and unlocking, to form creation and rendering.
These were the six memory managers tested:
- Delphi default MM (Built-in)
- FastMM 4.992 (Open source)
- FastMM 5 (Commercial license required)
- ScaleMM 2 (Open source)
- MSHeap MM (Delphi wrapper is open source. Uses Windows API)
- Nexus MM (Commercial license required)
To execute the test plan, I used Apache JMeter, a well-established testing application for web applications, known for its simplicity in configuration and execution.
The test plan was conceived like this:
- 100 threads (simulating 100 simultaneous users)
- 201,000 requests in total in the shortest possible time
- No pauses
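To make the plan's numbers concrete, here is a small Python sketch of the arithmetic behind it. It assumes the requests are spread evenly across the threads and that throughput is simply total requests divided by elapsed time. The 67-second duration is a hypothetical value chosen for illustration, not a measured result.

```python
THREADS = 100             # simulated simultaneous users, from the test plan
TOTAL_REQUESTS = 201_000  # total requests in the test plan


def requests_per_thread(total_requests, threads):
    """Requests each thread issues, assuming an even split across threads."""
    return total_requests / threads


def throughput(total_requests, elapsed_seconds):
    """Average requests per second over the whole run."""
    return total_requests / elapsed_seconds


print(requests_per_thread(TOTAL_REQUESTS, THREADS))  # prints 2010.0

# Hypothetical run time: a run that finishes in 67 seconds averages
# 3,000 requests per second.
print(throughput(TOTAL_REQUESTS, 67))  # prints 3000.0
```

Because the plan has no pauses, a shorter run time translates directly into a higher throughput figure.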
The application was built with Delphi 12 and the latest IntraWeb version available (15.5.5), using the Http.sys, Indy, and ISAPI server types, each in both 32-bit and 64-bit builds. The builds were generated in Release configuration (Optimization ON, Use Debug DCUs OFF). While I experimented with other configurations, I won’t delve into them here, as the primary focus is on determining how the application performs based on the memory manager used.
The application server and JMeter were run on different machines in a local Ethernet network, as recommended by JMeter documentation. For comparison purposes, I also ran the same test suite using a localhost address (both server and JMeter on the same machine). Running the test suite from the same machine is significantly faster than using the network, but I’m omitting this data in this article.
Server machine spec: Intel Core i7-11700, 8 cores/16 threads, 4.9 GHz max frequency, 32 GB RAM
Here is the tabulated result of the execution in a local network:
And a chart comparing the throughput:
Some conclusions are readily apparent from the results:
- Http.sys is the fastest application type. It is even faster than ISAPI on IIS (which is also based on the http.sys kernel-mode subsystem)
- ScaleMM2, MSHeap and FastMM5 perform very similarly in this scenario
- The default memory manager (the built-in Delphi MM based on an older version of FastMM4) does not scale well for multi-threaded applications, as expected
Another conclusion, not directly visible from the published data:
- The network is the greatest bottleneck when testing HTTP(S) servers. The throughput of each application when testing from localhost more than doubles.
A closer look at the results also shows that:
- ScaleMM2 is likely the fastest memory manager in this test, but the memory usage is much higher than the others, in some cases by a factor of 3.
- MSHeap performs very similarly to ScaleMM2. Its memory usage is slightly higher than that of FastMM5 and FastMM4, but much lower than that of ScaleMM2.
- FastMM5 indeed performs better than FastMM4 under heavy multi-threading.
- NexusDB Memory Manager also performs very well, above FastMM 4.992. Memory consumption is also low, on par with all the others except ScaleMM2.
- The changes introduced in FastMM 4.992 (the release stack) are very effective and considerably improve the performance under heavy multi-threading
Other considerations about the test:
- Disregard performance differences smaller than about 5%. Variations of up to 5% between two consecutive runs of the same test were common.
- Also do not take the numbers obtained here as the throughput limit of an IntraWeb application. This was the limit reached for that specific scenario. There are several factors that can influence the result, most importantly the network latency and speed.
- Indy + FastMM4 were capable of handling 3K+ requests per second, which is a whole lot, even if the numbers don’t look that good when compared to, for instance, Http.sys. If your application receives 100 requests/second, Indy + FastMM4 are more than capable of delivering excellent performance. Or, to use the old car analogy, driving a Ferrari to work every day won’t necessarily get you there faster.
- On average, a typical user generates a few requests per minute. A widely accepted estimate is one request every 10 seconds per user. In a simplified scenario, theoretically handling 5,000 requests per second would imply that the application can accommodate approximately 50,000 users simultaneously, which is huge! However, it’s crucial to note that various factors, particularly database access, significantly impact the real-world scenario and may alter these calculations.
- Given the excellent performance shown by MSHeap, we are also including it in the next IntraWeb release, and it will be another option when creating a new IntraWeb application via the IntraWeb application wizard.
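Two of the rules of thumb above, the 5% noise margin and the users-per-throughput estimate, are simple enough to express as code. This is a minimal Python sketch under the same simplifying assumptions as the text: one request every 10 seconds per user, and no other bottleneck such as database access. The function names are made up for illustration.

```python
NOISE_THRESHOLD = 0.05  # run-to-run variation of up to 5%, as noted above


def is_meaningful_difference(throughput_a, throughput_b,
                             threshold=NOISE_THRESHOLD):
    """True when two throughputs differ by more than the noise margin."""
    baseline = min(throughput_a, throughput_b)
    return abs(throughput_a - throughput_b) / baseline > threshold


def estimated_concurrent_users(requests_per_second,
                               seconds_between_requests=10):
    """Users an application could serve if each sends one request per interval."""
    return requests_per_second * seconds_between_requests


print(estimated_concurrent_users(5000))        # prints 50000
print(estimated_concurrent_users(100))         # prints 1000
print(is_meaningful_difference(5000, 5100))    # prints False (a 2% gap)
print(is_meaningful_difference(3000, 5000))    # prints True
```

By the same estimate, an application that receives 100 requests per second is already serving roughly 1,000 simultaneous users.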
Final considerations when choosing a memory manager:
- FastMM4 and FastMM5 are the best memory managers for development, hands down. Their ability to detect memory leaks, together with other features that help developers find memory corruption and similar problems, is unrivaled in the Delphi world.
- If memory is virtually infinite and performance is vital, use ScaleMM2 in production
- If you have a NexusDB license, using NexusDB Memory Manager becomes a very interesting option.
- If memory is limited and top performance is needed, use FastMM5 or MSHeap in production
- Moving away from the default MM to FastMM4, which is absolutely safe, will multiply the capacity of your application by a factor of 2 or 3.
- Memory fragmentation hasn’t been considered here. It is possible that in some scenarios a memory manager performs better or worse once fragmentation is taken into account. FastMM4 and FastMM5 are very good at keeping memory fragmentation low.
- Always deploy the 64-bit version of the application to production. Even if the 32-bit version is slightly faster in some cases, the 64-bit process can use all the available memory, which can be used to improve performance in other ways (e.g. caching data).
Updated on 26-Jan-2024