Dealing with binaries and text in TIdHTTP
#6
(12-18-2018, 12:50 AM)thijsvandien Wrote: In particular, I did (or still do) not understand if/when Indy infers encodings, how it does so (e.g. looking at the headers of a request/response or at the data), if/when it does conversions or merely "advertises" the encoding (e.g. in the headers).

When sending a request, TIdHTTP doesn't really infer very much. It requires you to provide info up front via the TIdHTTP.Request sub-properties, like ContentType and CharSet. If you don't supply that, it may use some defaults, but not very many. For instance, a default Content-Type header is sent only when posting a TStrings or TIdMultipartFormDataStream object (where the Content-Type needs to be a specific value). Otherwise, no default Content-Type is sent at all, not even something like "application/octet-stream" (which the server should infer on its own when no Content-Type is present).

When reading a response, TIdHTTP looks at the Content-Type header (or, in the specific case of HTML or non-textual XML, at the body data itself) to determine the charset to assign to the TIdHTTP.Response.CharSet property. And then, if (and only if) you use an overload that returns the body data as a String, that charset is used to decode the body data into Unicode (in Delphi 2007 and earlier, that Unicode is then converted to ANSI for output). If you use an overload that returns the response body in a TStream instead, the body data is returned as-is in its raw form, and you would have to then process it yourself as needed.
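To make the String-versus-TStream distinction concrete, here is a minimal sketch in Python (not Indy code) that models it. The function names `charset_from_content_type` and `read_body` are hypothetical, and the sketch only covers the case where the Content-Type header carries an explicit charset; the HTML/XML body inspection and the fallbacks are left out.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset attribute of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def read_body(raw, content_type, as_text):
    """Model of the two overload families (hypothetical helper).

    as_text=False mimics a TStream overload: the bytes come back untouched.
    as_text=True mimics a String overload: the bytes are decoded to Unicode
    using the charset taken from the header.
    """
    if not as_text:
        return raw
    charset = charset_from_content_type(content_type)
    if charset is None:
        # Fallback behavior is deliberately out of scope for this sketch.
        raise ValueError("no charset in Content-Type")
    return raw.decode(charset)


body = "é".encode("utf-8")
as_string = read_body(body, "text/plain; charset=UTF-8", as_text=True)
as_stream = read_body(body, "text/plain; charset=UTF-8", as_text=False)
```

Here `as_string` is the decoded text, while `as_stream` is still the two raw UTF-8 bytes, which the caller would have to interpret on its own.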

(12-18-2018, 12:50 AM)thijsvandien Wrote: From what you are saying, I get that for methods that use a TStream for the request body, Indy takes those bytes as-is.

Yes.

(12-18-2018, 12:50 AM)thijsvandien Wrote: Then is any charset ever put in the headers automatically? If so, from where?

There is no default TIdHTTP.Request.CharSet assigned for TStream data. You are responsible for specifying the TIdHTTP.Request.ContentType and TIdHTTP.Request.CharSet properties as needed.

If you do assign the TIdHTTP.Request.ContentType property manually, a default CharSet may be assigned, depending on the particular media type being specified. If the input string specifies a "charset" attribute explicitly, it is used as-is (unless it is blank). If no charset is specified, and only if the media type is a "text/..." type, then the default CharSet is "us-ascii" for XML, and "ISO-8859-1" for other types. But, if the media type is not a "text/..." type, no default CharSet is assigned at all.
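The rule above can be modeled as a small decision function. This is a sketch in Python of the logic as described, not Indy's actual implementation; the name `default_charset` is hypothetical, and treating "XML" as a subtype of `xml` or one ending in `+xml` is an assumption made for illustration.

```python
def default_charset(content_type):
    """Return the charset that would end up assigned, or None for no default."""
    parts = [p.strip() for p in content_type.split(";")]
    media_type = parts[0].lower()

    # An explicit, non-blank charset attribute is used as-is.
    for param in parts[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            value = value.strip().strip('"')
            if value:
                return value

    # Defaults apply only to "text/..." media types.
    if not media_type.startswith("text/"):
        return None

    # Assumption: which subtypes count as XML is a guess for this sketch.
    subtype = media_type[len("text/"):]
    if subtype == "xml" or subtype.endswith("+xml"):
        return "us-ascii"
    return "ISO-8859-1"
```

For example, `default_charset("text/plain")` yields `"ISO-8859-1"`, `default_charset("text/xml")` yields `"us-ascii"`, and `default_charset("application/json")` yields `None`.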

(12-18-2018, 12:50 AM)thijsvandien Wrote: For responses, Indy will either write them as-is to the given TStream, or (using another overload) use ReadStringAsCharset to decode them to String based on the charset provided in the headers or HTTP's default of ISO-8859-1 in case none is specified.

Or, if the response is HTML or XML, the charset is taken from the HTML/XML header instead. And ISO-8859-1 is not always the default charset used (see above).

Also, if no charset is specified by the response, and none can be inferred implicitly, then ReadStringAsCharset() will end up using Indy's built-in 8bit encoding instead, which will decode the raw bytes as-is to Unicode codepoints U+0000..U+00FF.
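To illustrate what "decode the raw bytes as-is" means, here is a minimal Python model of an 8bit decoding (the function name `decode_8bit` is hypothetical, and this is not Indy's code). Each byte value becomes the Unicode codepoint with the same number, so nothing is lost, but multi-byte encodings such as UTF-8 are not interpreted.

```python
def decode_8bit(raw):
    """Map each byte 0x00..0xFF to the codepoint U+0000..U+00FF."""
    return "".join(chr(b) for b in raw)


plain = decode_8bit(bytes([0x41, 0xE9, 0xFF]))  # 'A', U+00E9, U+00FF
mangled = decode_8bit("é".encode("utf-8"))      # two characters: U+00C3, U+00A9
```

The second call shows the practical consequence: UTF-8 data that arrives without a charset comes out as one character per byte, so the caller would need to re-encode and decode it properly.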

(12-18-2018, 12:50 AM)thijsvandien Wrote: Only if the content type is XML does it ignore all that and look at the actual data.

And HTML, too.

(12-18-2018, 12:50 AM)thijsvandien Wrote: Apart from that it does not look at content types to determine whether the response should be decoded at all (i.e. considered text) or not (i.e. considered binary). Correct?

It looks at the content type only when deciding where to get the value for the TIdHTTP.Response.CharSet property from, and only if the response body then needs to be decoded to a String for output.

Messages In This Thread
RE: Dealing with binaries and text in TIdHTTP - by rlebeau - 12-18-2018, 09:47 PM