IW15 64bit deployment
#11
Bug 
Hi All

I think I've identified this problem: it lies with the ZLib implementation and the instructions supported by the CPU you are running the program on. x64 IntraWeb applications built with XE4 run without issue on an Intel Core i7 8700 (server and desktop OS), but on an Intel Core i7 950 they crash with an illegal instruction error (again, server or desktop OS). The same happens on an Intel Xeon X5550.

My suspicion is that the CloudFlare ZLib is using instructions available in newer processors without checking whether those instructions are available!

I checked this by recompiling the x64 IntraWeb app with ZLIB_1211: I commented out {$DEFINE USE_ZLIB_CLOUDFLARE} in IWZLibExApi.pas and included that unit in my project to ensure the DCUs were recompiled.

After making this change, the x64 IntraWeb app ran on all CPU and OS combinations I had been testing.

Hopefully this helps...

I suggest that an effort is made to identify the CPU instruction in the CloudFlare implementation that causes the issue, and to ensure that this implementation is only used if that instruction is available. In the meantime it's probably a good idea to revert wholesale to the ZLIB_1211 code branch.

I think the CloudFlare code must be making use of instructions in one or more of the following instruction sets: AES, AVX, AVX2, FMA3, TSX. Without further investigation I can't narrow it down.

EDIT: I believe it may be an AVX instruction, but the code also requires SSE4.2 instructions, which may not be available either.

As an interesting point here is what the TurboVNC devs said about the CloudFlare ZLib Implementation:

Quote:I ultimately backed away from CloudFlare, despite its superior performance, because it doesn't perform run-time CPU feature detection. It assumes the existence of SSE 4.2, which is not available on all x86-64 CPUs, and of SSE2, which is not available on all x86 CPUs.

Investigate SIMD-accelerated Zlib implementations
Reply
#12
Wow. Thanks for the detailed info. I have asked someone to look into and follow up with this.
Reply
#13
That's good info indeed; however, AFAIK only zlib decompression uses the optimized SSE 4.2 CRC32. AVX and AVX2 are not used at all. It might still be some other instruction not supported by the processor.
BTW, gzip decompression is not used by IntraWeb, only compression.

It is somewhat questionable that people are using very old processors in servers meant to run web applications, though. Core i3, i5 and i7 from 2008 already supported SSE 4.2 instructions. Trying to make a brand new IW application run on a 486 DX2/66 MHz doesn't make any sense to me.
Reply
#14
(10-11-2018, 12:42 AM)Alexandre Machado Wrote: That's good info indeed; however, AFAIK only zlib decompression uses the optimized SSE 4.2 CRC32. AVX and AVX2 are not used at all. It might still be some other instruction not supported by the processor.
BTW, gzip decompression is not used by IntraWeb, only compression.

It is somewhat questionable that people are using very old processors in servers meant to run web applications, though. Core i3, i5 and i7 from 2008 already supported SSE 4.2 instructions. Trying to make a brand new IW application run on a 486 DX2/66 MHz doesn't make any sense to me.

All I can tell you is that an IntraWeb x64 application built with XE4 using the CloudFlare ZLib will raise an illegal instruction error when starting up on the processors I mentioned.

I've included the CPU-Z screenshots here to show the difference in supported instruction sets. As you can see, the difference is AES, AVX, AVX2, FMA3 and TSX.


Below is a list of some of the improvements in the Cloud Flare fork of zlib:
Quote:* uint64_t as the standard type - the default fork used 16-bit types.
* Using an improved hash function - we use the iSCSI CRC32 function as the hash function in our zlib. This specific function is implemented as a hardware instruction on Intel processors. It has very fast performance and better collision properties.
* Search for matches of at least 4 bytes, instead of the 3 bytes the format suggests. This leads to fewer hash collisions, and less effort wasted on insignificant matches. It also improves the compression rate a little bit for the majority of cases (but not all).
* Using SIMD instructions for window rolling.
* Using the hardware carry-less multiplication instruction PCLMULQDQ for the CRC32 checksum.
* Optimized longest-match function. This is the most performance demanding function in the library. It is responsible for finding the (length, distance) matches in the current window.

As AVX and AVX2 are SIMD instructions, I think they may be used somewhere, since the i7 950 and Xeon X5550 both support SSE4.2 but won't run the Cloud Flare code.

I agree with you that servers should be running recent processors and hardware; however, we can't always dictate when our customers will invest in new servers. It is therefore important to us that we can deploy our IntraWeb apps on slightly older platforms.

I don't think it's unreasonable to expect it to run on an Intel Xeon X5550, do you?

Maybe integrating zlib-ng would be a better solution once it has a stable release, as it may be faster than the Cloud Flare implementation and it contains fallback code for when the advanced instruction sets are not available.

EDIT: Having taken a closer look at the Cloud Flare source, I suspect that the Cloud Flare OBJ files included with IntraWeb were compiled with the HAS_AVX flag, so the OBJ files emitted by GCC contain AVX instructions...

It’s not a bug – it’s an undocumented feature.
Reply
#15
It might be that GCC is generating AVX instructions when building with full optimization. I'll check it and let you know.

Kind regards,
Reply
#16
About zlib-ng, it was also considered. However, 6 months ago it didn't look production ready (I'm still not convinced, check out this issue: https://github.com/Dead2/zlib-ng/issues/90). There are reported bugs causing application crashes which are still open. This branch contains massive modifications to the main zlib branch and, at that time, we couldn't find anyone using it in production. It would basically be a gamble. The Intel and Cloudflare branches have their own issues, but they have been reported to be stable.

In case someone desperately needs to deploy to an old processor, you can still use the x86 binary, which doesn't require AVX at all.

We are still investigating this issue, so we should have more information soon.

Cheers
Reply
#17
(10-12-2018, 03:24 AM)Alexandre Machado Wrote: About zlib-ng, it was also considered. However, 6 months ago it didn't look production ready (I'm still not convinced, check out this issue: https://github.com/Dead2/zlib-ng/issues/90). There are reported bugs causing application crashes which are still open. This branch contains massive modifications to the main zlib branch and, at that time, we couldn't find anyone using it in production. It would basically be a gamble. The Intel and Cloudflare branches have their own issues, but they have been reported to be stable.

In case someone desperately needs to deploy to an old processor, you can still use the x86 binary, which doesn't require AVX at all.

We are still investigating this issue, so we should have more information soon.

Cheers

Thanks for looking into it. For now we will simply modify our copy of the source to define USE_ZLIB_1211 for our 64-bit apps. Hopefully a more permanent fix can be found.

Initially I was trying to see if there was a way to detect the CPU features and bind the appropriate zlib version at run time rather than at compile time.
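
To illustrate the run-time idea, here is a minimal sketch in Python (not IntraWeb or Cloudflare code). It assumes the caller has already obtained the ECX value returned by CPUID leaf 1; the bit positions are the documented ones, but the function names, the backend labels and the default set of required features are all hypothetical.

```python
# CPUID leaf 1, ECX feature bits (positions as documented by Intel/AMD).
CPUID1_ECX_BITS = {
    "PCLMULQDQ": 1,
    "FMA": 12,
    "SSE4.2": 20,
    "AES": 25,
    "AVX": 28,
}


def decode_features(ecx):
    """Return the names of the feature bits that are set in ECX."""
    return {name for name, bit in CPUID1_ECX_BITS.items() if (ecx >> bit) & 1}


def choose_zlib_backend(ecx, required=frozenset({"SSE4.2"})):
    """Hypothetical selector: use the optimized build only when every
    feature in `required` is present, otherwise fall back to plain zlib.

    The default `required` set is an assumption taken from this thread,
    not from the Cloudflare source.
    """
    if set(required) <= decode_features(ecx):
        return "cloudflare"
    return "zlib_1211"
```

For example, choose_zlib_backend(1 << 20) selects the optimized build, while choose_zlib_backend(0) falls back to ZLIB_1211. Note that a real AVX check would also need to confirm operating-system support (OSXSAVE plus XGETBV), which this sketch leaves out.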

As for zlib-ng, I would stay well clear until a stable version is released. Looking at the Cloud Flare source code (although I'm no expert in this), it looks as if there is some detection of CPU instruction sets going on now, but I'm not sure whether it happens at compile time or at run time.

Reply
#18
Yes, a fix is in the pipeline. Only SSE 4.2 (not AVX) is required to use the Cloudflare branch. I honestly don't believe there is anyone using a production server which doesn't support SSE 4.2.
Reply
#19
(10-13-2018, 10:46 PM)Alexandre Machado Wrote: Yes, a fix is in the pipeline. Only SSE 4.2 (not AVX) is required to use the Cloudflare branch. I honestly don't believe there is anyone using a production server which doesn't support SSE 4.2.

That's great news.

If you want me to test the fix, I will try it on several different processors when it's ready.

Reply
#20
Please update to 15.0.14 so you can use the new zlib object files, which shouldn't contain any AVX/AVX2 instructions. We ran extensive tests on Intel and AMD processors which don't support the AVX/AVX2 instruction sets, and everything worked as expected.
When using the Cloudflare zlib branch, SSE 4.2 is required. As I mentioned before, I honestly don't believe that a production server (or a dev machine) nowadays will lack SSE 4.2 support, bearing in mind that most processors released after 2008 support it.
When Cloudflare zlib is used, IntraWeb will check for SSE 4.2 instruction set support and disable zlib compression automatically if support is not present. A warning will be written to the application log file. This prevents an application crash. Using a different branch/zlib version for such cases is ruled out for now, since we think this will be extremely rare in practice, if it ever occurs.
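
The behaviour described above could look roughly like the following Python sketch. This is not IntraWeb's actual code: the function name, logger name and message text are invented for illustration, and the ECX value from CPUID leaf 1 is assumed to be supplied by the caller.

```python
import logging

# Hypothetical logger name; IntraWeb's real log facility is not shown here.
log = logging.getLogger("compression_guard")

SSE42_BIT = 20  # CPUID leaf 1, ECX bit 20 signals SSE 4.2 support


def compression_enabled(ecx, using_cloudflare_zlib):
    """Decide whether zlib compression may stay enabled.

    With the standard zlib build nothing special is required. With the
    optimized build, compression is switched off (and a warning logged)
    when the SSE 4.2 bit is missing, instead of letting the process hit
    an illegal instruction.
    """
    if not using_cloudflare_zlib:
        return True
    if (ecx >> SSE42_BIT) & 1:
        return True
    log.warning("SSE 4.2 not detected; zlib compression disabled")
    return False
```

The point of this design is that the application degrades to uncompressed responses rather than crashing at startup.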
Reply