Encoding problem?
#1
My Client is failing to logon to my server now and I think this is an upgrade woe (I'm using 10.6.3.3)

The client sends a string command like this: logon¦<username>¦password

After being stumped as to it not working anymore, I ended up doing a debug and seeing that the client's TCP thread is sending it, but the server isn't happy - and sends back a 400 quoting it with ? instead of ¦

So I went and debugged the server and the TIdCmdTCPServer.DoExecute function is reading it in as: logon?<username>?password

Clearly this is not what was sent - somehow ¦ is turning into ?

In IdGlobal there is this function that the client calls when calling WriteLn():


Code:
function ToBytes(const AValue: string; const ALength: Integer; const AIndex: Integer = 1;
  ADestEncoding: IIdTextEncoding = nil
  {$IFDEF STRING_IS_ANSI}; ASrcEncoding: IIdTextEncoding = nil{$ENDIF}
  ): TIdBytes; overload;
var
  LLength: Integer;
  {$IFDEF STRING_IS_ANSI}
  LBytes: TIdBytes;
  {$ENDIF}
begin
  {$IFDEF STRING_IS_ANSI}
  LBytes := nil; // keep the compiler happy
  {$ENDIF}
  LLength := IndyLength(AValue, ALength, AIndex);
  if LLength > 0 then
  begin
    EnsureEncoding(ADestEncoding);
    {$IFDEF STRING_IS_UNICODE}
    SetLength(Result, ADestEncoding.GetByteCount(AValue, AIndex, LLength));
    if Length(Result) > 0 then begin
      ADestEncoding.GetBytes(AValue, AIndex, LLength, Result, 0);
    end;
    {$ELSE}
    EnsureEncoding(ASrcEncoding, encOSDefault);
    LBytes := RawToBytes(AValue[AIndex], LLength);

//LBytes is: (76, 111, 103, 111, 110, 166, 83, 97, 102, 102, 108, 101, 115, 166, 104, 111, 114, 97, 99, 101, 13, 10)

    CheckByteEncoding(LBytes, ASrcEncoding, ADestEncoding);

//LBytes is: (76, 111, 103, 111, 110, 63, 83, 97, 102, 102, 108, 101, 115, 63, 104, 111, 114, 97, 99, 101, 13, 10)

    Result := LBytes;
    {$ENDIF}
  end else begin
    SetLength(Result, 0);
  end;
end;

As you can see, the bytes with value 166 are transformed into 63 ('?'). This never used to happen. Why does it happen now?

This seems to be triggered here:

Code:
function TIdASCIIEncoding.GetBytes(const AChars: PIdWideChar; ACharCount: Integer;
  ABytes: PByte; AByteCount: Integer): Integer;
var
  P: PIdWideChar;
  i : Integer;
begin
  // TODO: decode UTF-16 surrogates...
  P := AChars;
  Result := IndyMin(ACharCount, AByteCount);
  for i := 1 to Result do begin
    // replace illegal characters > $7F
    if UInt16(P^) > $007F then begin

//This next line seems to do it
      ABytes^ := Byte(Ord('?'));

    end else begin
      ABytes^ := Byte(P^);
    end;
    //advance to next char
    Inc(P);
    Inc(ABytes);
  end;
end;


Any ideas please?

Thanks
#2
(10-27-2024, 08:27 PM)Justin Case Wrote: My Client is failing to logon to my server now and I think this is an upgrade woe (I'm using 10.6.3.3)


What version are you upgrading from?

(10-27-2024, 08:27 PM)Justin Case Wrote: The client sends a string command like this: logon¦<username>¦password


Why is the client using a non-ASCII character as a token separator? What version of Delphi (or other programming language) is the client using?

The ASCII pipe character is | (124, $7C). ¦ (166, $A6) is outside of ASCII and thus subject to charset interpretation.
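The distinction is a plain Unicode fact and easy to verify outside of Indy. A quick illustration in Python (nothing Indy-specific here; the final line just mimics the '?' substitution that TIdASCIIEncoding performs):

```python
# '|' (VERTICAL LINE) is ASCII; '¦' (BROKEN BAR) is not.
pipe = '|'
broken_bar = '\u00a6'  # ¦, code point 166 ($A6)

print(ord(pipe))        # 124 ($7C), inside the 0-127 ASCII range
print(ord(broken_bar))  # 166 ($A6), outside it

# Encoding the broken bar to ASCII with replacement yields '?',
# the same substitution Indy makes for characters > $7F:
print(broken_bar.encode('ascii', errors='replace'))  # b'?'

# An 8-bit charset such as Latin-1 keeps the byte intact:
print(broken_bar.encode('latin-1'))  # b'\xa6'
```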

(10-27-2024, 08:27 PM)Justin Case Wrote: After being stumped as to it not working anymore, I ended up doing a debug and seeing that the client's TCP thread is sending it, but the server isn't happy - and sends back a 400 quoting it with ? instead of ¦


That means you have a charset mismatch between the client and server. Indy's default charset is ASCII. Clearly the client is not sending ASCII. So, what charset is it actually using? Looks like Windows-1252 or similar. You will need to match that charset on the server, i.e. by setting the AContext.Connection.IOHandler.DefStringEncoding property in the server's OnConnect event.

(10-27-2024, 08:27 PM)Justin Case Wrote: This never used to happen. Why does it happen now?

Indy 10 has always behaved this way since Delphi migrated to Unicode in 2009.

#3
(10-27-2024, 09:39 PM)rlebeau Wrote:
(10-27-2024, 08:27 PM)Justin Case Wrote: My Client is failing to logon to my server now and I think this is an upgrade woe (I'm using 10.6.3.3)


What version are you upgrading from?

It was 10.2.3 - very old! I'm using Delphi 6.

(10-27-2024, 09:39 PM)rlebeau Wrote:
(10-27-2024, 08:27 PM)Justin Case Wrote: The client sends a string command like this: logon¦<username>¦password


Why is the client using a non-ASCII character as a token separator? What version of Delphi (or other programming language) is the client using?

Delphi 6 for both Server and Client.

It's the character I've always used - even in Indy 8 and 9 - the ¦ (166) character always seemed a good non-obvious one that a user was unlikely to type in. As for whether it's ASCII or not, I just assumed that if it was on a computer keyboard it was OK to use, and as v8 and 9 had no problems with it, I've always used it - even in the previous version 10 that I was using.

The CmdDelimiter and ParamDelimiter for all my command handlers are set to #166 and have been for years - and it worked well.

(10-27-2024, 09:39 PM)rlebeau Wrote: The ASCII pipe character is | (124, $7C), not ¦ (166, $A6).

I don't think I mentioned the pipe character? (I may well be wrong though, I have a habit of making myself look daft)

(10-27-2024, 09:39 PM)rlebeau Wrote:
(10-27-2024, 08:27 PM)Justin Case Wrote: After being stumped as to it not working anymore, I ended up doing a debug and seeing that the client's TCP thread is sending it, but the server isn't happy - and sends back a 400 quoting it with ? instead of ¦


That means you have a charset mismatch between the client and server. Indy's default charset is ASCII. Clearly the client is not sending ASCII. So, what charset is it actually using?

Well that's just it - I've not set a charset on the client or server so as far as I'm concerned they are supposed to be using the same - and they are running on the same WinXP virtual machine with the Delphi 6 IDE.

(10-27-2024, 09:39 PM)rlebeau Wrote:
(10-27-2024, 08:27 PM)Justin Case Wrote: This never used to happen. Why does it happen now?

Indy 10 has always behaved this way, at least since the migration to Unicode around 2009.

But it didn't do this in the previous 10.2.3 that I was using? - even though I can see the same code in the source for it!

(10-27-2024, 09:39 PM)rlebeau Wrote: You will need to match that charset on the server, ie by setting the AContext.Connection.IOHandler.DefStringEncoding property

I have no idea what values to use there - it's defined as a property of type IIdTextEncoding, but I can't find a definition for IIdTextEncoding anywhere to know what I can use as a value.

Ok, eventually I managed to find: IdTextEncodingType = (encIndyDefault, encOSDefault, enc8Bit, encASCII, encUTF16BE, encUTF16LE, encUTF7, encUTF8);

It was below the type definition for IIdTextEncoding, which is an interface with a load of functions... so I'm not sure how I'm supposed to change the AContext.Connection.IOHandler.DefStringEncoding property..

Tried this: AContext.Connection.IOHandler.DefStringEncoding := encIndyDefault;
"[Error] Server_Main_Source.pas(603): Incompatible types: 'IdTextEncodingType' and 'IIdTextEncoding'"

Tried this (invalid typecast it said): AContext.Connection.IOHandler.DefStringEncoding := IIdTextEncoding(encIndyDefault);
#4
(10-27-2024, 09:57 PM)Justin Case Wrote: It was 10.2.3 - very old!

IIRC, that version predates Indy's inclusion of Unicode support. Everything in that version treated AnsiString as raw bytes, so it didn't care about any character encodings.

(10-27-2024, 09:57 PM)Justin Case Wrote: I'm using Delphi 6.

That version's support for Unicode is *extremely* limited. Everything was based on AnsiString. That hasn't been the case in newer Delphi versions for the past 15 years.

In any case, modern Indy *does* still support Delphi 6. However, Indy uses Unicode internally, and only handles ANSI data at the boundaries where data enters and leaves Indy. So, you need to take extra steps to handle charset encodings properly in such old environments. Such as setting the IOHandler.DefAnsiEncoding property to match the encoding that AnsiString uses.

(10-27-2024, 09:57 PM)Justin Case Wrote: It's the character I've always used - even in Indy 8 and 9 - the ¦ (166) character always seemed a good non-obvious one that a user was unlikely to type in.

That character is not a good choice for a communication protocol, as it is dependent on charset interpretation. Even worse if you are dependent on the user's system charset, because it doesn't exist in many charsets.

(10-27-2024, 09:57 PM)Justin Case Wrote: As for whether it's ASCII or not, I just assumed if it was on a computer keyboard, it was ok to use

Most modern keyboards don't have a key for ¦, and those that do will usually produce | instead.

(10-27-2024, 09:57 PM)Justin Case Wrote: as v8 and 9 had no problems with it, I've always used it - even in the previous version 10 that I was using.

Only because none of those handled Unicode at all. Now, with Unicode, it is more complicated since ¦ is outside of ASCII.

(10-27-2024, 09:57 PM)Justin Case Wrote: The CmdDelimiter and ParamDelimiter for all my command handlers are set to #166 and have been for years - and it worked well.

In order for that to continue working for you, the DefStringEncoding and DefAnsiEncoding properties need to be compatible with the encoding of the bytes on the wire.

When Indy is receiving a command, before it can parse the command and compare the command's delimiters, it has to receive the command as a native String, which in your case is AnsiString. It will receive the raw bytes, decode them to Unicode first using DefStringEncoding, and then convert the Unicode to AnsiString using DefAnsiEncoding.

Conversely, to send an AnsiString, it has to first be converted to Unicode using DefAnsiEncoding, and then encoded to bytes using DefStringEncoding, and then the bytes are sent on the wire.

Any of those steps can cause data loss if the charset encodings are mismatched.
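The two-step conversion described above can be modelled outside of Delphi. In this Python sketch, the decode step stands in for DefStringEncoding and the encode step for DefAnsiEncoding; the byte values come from this thread, everything else is illustrative (Python substitutes U+FFFD at the decode step where Indy uses '?', but the net result is the same):

```python
# Raw bytes on the wire; the delimiter is byte 166 (¦):
wire = b'logon\xa6user\xa6pass'

# Step 1: bytes -> Unicode (DefStringEncoding). With the ASCII
# default, byte 166 is illegal and gets replaced:
as_unicode = wire.decode('ascii', errors='replace')

# Step 2: Unicode -> AnsiString (DefAnsiEncoding). The replacement
# character has no 8-bit mapping, so it ends up as '?':
as_ansi = as_unicode.encode('latin-1', errors='replace')
print(as_ansi)  # b'logon?user?pass' - the delimiters are gone

# With a matching 8-bit charset at both steps, byte 166 survives:
print(wire.decode('latin-1').encode('latin-1') == wire)  # True
```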

(10-27-2024, 09:57 PM)Justin Case Wrote: Well that's just it - I've not set a charset on the client or server

Then you are dependent on the user's default system charset. Which can differ from one machine to another. That is not a good position to be in if you need to communicate with other machines.

(10-27-2024, 09:57 PM)Justin Case Wrote: But it didn't do this in the previous 10.2.3 that I was using?

Because that old version predates both Delphi's and Indy's modern Unicode handling.

(10-27-2024, 09:57 PM)Justin Case Wrote: I have no idea what values to use there - it's defined as a property of type IIdTextEncoding, but I can't find a definition for IIdTextEncoding anywhere to know what I can use as a value.

IIdTextEncoding is an interface type that is defined in Indy's IdGlobal unit. There are several IndyTextEncoding...() functions that return IIdTextEncoding wrappers for various system-defined and user-defined charsets.

(10-27-2024, 09:57 PM)Justin Case Wrote: Ok, eventually I managed to find: IdTextEncodingType = (encIndyDefault, encOSDefault, enc8Bit, encASCII, encUTF16BE, encUTF16LE, encUTF7, encUTF8);

You can't assign an IdTextEncodingType enum directly to DefStringEncoding or DefAnsiEncoding (however, there is an overload of IndyTextEncoding() that returns an IIdTextEncoding for an IdTextEncodingType).

(10-27-2024, 09:57 PM)Justin Case Wrote: I'm not sure how I'm supposed to change the AContext.Connection.IOHandler.DefStringEncoding property..

In this situation, I would suggest using IndyTextEncoding_OSDefault (or IndyTextEncoding_8bit), eg:

Code:
with AContext.Connection.IOHandler do
begin
  DefStringEncoding := IndyTextEncoding_OSDefault; // or 8bit
  DefAnsiEncoding := IndyTextEncoding_OSDefault; // or 8bit
end;

That should preserve the old behavior you are looking for.

#5
(10-28-2024, 02:35 AM)rlebeau Wrote: In this situation, I would suggest using IndyTextEncoding_OSDefault (or IndyTextEncoding_8bit), eg:

Code:
with AContext.Connection.IOHandler do
begin
  DefStringEncoding := IndyTextEncoding_OSDefault; // or 8bit
  DefAnsiEncoding := IndyTextEncoding_OSDefault; // or 8bit
end;

That should preserve the old behavior you are looking for.

Remy you are an absolute STAR! That's fixed it - thanks a lot!
#6
Now I'm suffering the same problem with TIdEncoderMime and TIdDecoderMime, which both convert ¦ to ? as well.

I create my client objects with a TIdEncoderMime component so that they can Base64-encode strings from a TStringList.Text property (it deals with all the carriage returns etc., so I can send it all in one line), but now the ¦ characters are also turning into question marks, and your workaround doesn't seem to work with EncodeString(), which has two parameters after the input string for the encoding.

Any ideas please?

I'm not opposed to a completely different delimiter character if you can suggest a good one that simply works ..
#7
(11-03-2024, 09:54 PM)Justin Case Wrote: I create my client objects with a TIdEncoderMime component so that they can Base64-encode strings from a TStringList.Text property (it deals with all the carriage returns etc., so I can send it all in one line), but now the ¦ characters are also turning into question marks, and your workaround doesn't seem to work with EncodeString(), which has two parameters after the input string for the encoding.

Those parameters work the same way as the IOHandler properties. The ASrcEncoding parameter is used to convert the AnsiString to Unicode, and the AByteEncoding parameter is used to convert the Unicode to bytes, which are then encoded to base64. So you have to make sure the encodings match your situation. In this case, the default ASrcEncoding is OSDefault, and the default AByteEncoding is ASCII, which is why you are losing the ¦ character. So you just need to pick a more suitable byte encoding which supports that character.
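That ordering can be illustrated outside of Indy. In this Python stand-in (it does not call Indy's EncodeString; the two encodes just model the source and byte encodings), the charset conversion happens before Base64, so an ASCII byte encoding destroys the delimiter no matter what Base64 does afterwards:

```python
import base64

text = 'one\u00a6two'  # a string using the ¦ delimiter

# Byte encoding = ASCII: the delimiter is lost before Base64 runs,
# so decoding gets '?' back:
lossy = base64.b64encode(text.encode('ascii', errors='replace'))
print(base64.b64decode(lossy))  # b'one?two'

# Byte encoding = an 8-bit charset: byte 166 survives the round trip:
intact = base64.b64encode(text.encode('latin-1'))
print(base64.b64decode(intact).decode('latin-1'))  # 'one¦two'
```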

(11-03-2024, 09:54 PM)Justin Case Wrote: I'm not opposed to a completely different delimiter character if you can suggest a good one that simply works ..

I already addressed that in my very first reply of this discussion.

#8
This is what I tried to work around the problem:
Base64Encoder.EncodeString(UserStrList.Text, IndyTextEncoding_OSDefault, IndyTextEncoding_OSDefault);

It didn't work - Question marks were the gift that kept on giving..

Guess I'll just go with the pipe character then! It's weird because my keyboard shows the pipe on one key (which actually types the broken pipe) and my broken pipe key types the solid pipe.