REBOL3 tracker
  0.9.12 beta
Ticket #0001234 User: anonymous


Type: Bug    Status: reviewed    Date: 15-Sep-2009 07:19
Version: alpha 82    Category: Native    Submitted by: sqlab
Platform: Windows    Severity: minor    Priority: normal

Summary: read/string does not convert when reading from the web
Description: read/string does not automatically convert to string when reading via HTTP
Example code
>> read/string
== #{

Assigned to: n/a    Fixed in: -    Last Update: 15-Mar-2013 00:54

17-Sep-2009 03:47

As I understand from the documentation, read/string will only convert UTF-8 encoded data to a string. The page read in the example is ISO-8859-1 encoded. However, I tested with a UTF-8 encoded site and got the same problem.

>> read/string
== 64#{

As the built-in decoding expects the input to be a UTF-8 encoded binary string, there will always be issues in automatically converting web pages to strings. Although the page is ISO-8859-1 encoded, to string! happily converts its binary to a string:

>> to string! read
== {html
META NAME="Description" CONTENT="REBOL: a Web 3.0 language and system based on new lightweight computing methods. Site includes products, downloads, documentation, and support.">
META NAME="Keywords" CONTENT="REBOL, Web 3.0, Web 2.0, programming, Internet, software, domain specific language, distributed computing, collaboration, operating systems, development, rebel">

meta http-equiv="content-type" content="text/html;charset=iso-8859-1">
title>REBOL Technologies

This is because the homepage contains only characters in the range 0 - 127.

However, if there are characters above 127, they will be incorrectly converted by to string! (perhaps to string! only decodes UTF-8 binary strings):

>> to string! read http://localhost/iso88591.html
== {hmtl>
meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
title>᧠a grave followed by e acute

;; The ᧠ is an a with a grave accent followed by an e with an acute accent in the source.

[Note: I removed the leading < from the HTML tags to get the html to display in Curecode.]
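
The byte-level reason for the behaviour above can be illustrated with a small sketch. It is written in Python rather than Rebol, purely as a language-neutral illustration of the two encodings; it does not show how Rebol's decoder is implemented, and the helper name `decode_as_utf8` is hypothetical.

```python
# "a grave followed by e acute", encoded two different ways.
latin1_bytes = "àé".encode("iso-8859-1")  # b'\xe0\xe9' (one byte per character)
utf8_bytes = "àé".encode("utf-8")         # b'\xc3\xa0\xc3\xa9' (two bytes per character)

# A page using only bytes 0 - 127 decodes identically under both encodings,
# which is why an ASCII-only ISO-8859-1 page appears to convert correctly.
ascii_bytes = b"<title>REBOL Technologies</title>"
same_result = ascii_bytes.decode("utf-8") == ascii_bytes.decode("iso-8859-1")


def decode_as_utf8(data: bytes) -> str:
    """Hypothetical helper: decode as UTF-8, replacing invalid sequences."""
    return data.decode("utf-8", errors="replace")


# Bytes above 127 from an ISO-8859-1 page are not valid UTF-8 sequences,
# so a UTF-8 decoder cannot recover the original characters.
mangled = decode_as_utf8(latin1_bytes)

# Decoding with the encoding the page actually uses restores the text.
correct = latin1_bytes.decode("iso-8859-1")
```

Here `mangled` differs from the original text while `correct` matches it, which mirrors the difference between the ASCII-only homepage and the local test page.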

26-Jan-2013 23:42

READ (without /string) of a URL already automatically converts to a string! if the HTTP headers claim the content to be UTF-8 encoded. (This could be extended to handle other encodings, but that's a separate issue.)

READ/string is intended as a shortcut for `to string! read`, i.e. forced conversion from UTF-8. (`to string!` of binary data _always_ decodes using UTF-8; so would READ/string). READ/string is currently not implemented for URLs.
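
The two behaviours described in this comment can be sketched as follows. This is a minimal Python illustration, not Rebol's HTTP scheme code; the function names `header_charset`, `read_url_body` and `read_string` are assumptions made for the example.

```python
from email.message import Message
from typing import Optional, Union


def header_charset(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, if any."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def read_url_body(body: bytes, content_type: str) -> Union[bytes, str]:
    """Sketch of header-driven conversion: decode only if the headers claim UTF-8."""
    if header_charset(content_type) in ("utf-8", "utf8"):
        return body.decode("utf-8")
    return body  # any other (or missing) charset: leave the data as binary


def read_string(body: bytes) -> str:
    """Sketch of forced conversion: always decode as UTF-8, whatever the headers say."""
    return body.decode("utf-8")  # raises UnicodeDecodeError on invalid input
```

With these definitions, a body served as `text/html; charset=iso-8859-1` comes back from `read_url_body` unchanged as bytes, while `read_string` would attempt a UTF-8 decode regardless of the header.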

15-Mar-2013 00:54

Note, READ without /string should never convert to string, it should return a binary. If you want READ to autoconvert, please rewrite the Rebol language parser so it can parse strings. Until then, it depends on READ without /string returning a binary, and every time READ returns a string it has to be converted back into a binary before it can be parsed.

READ/string is a shortcut for TO-STRING READ, but when /string is not specified that means that the developer doesn't want it converted to a string. If they did, they would have called READ/string instead. So that means that the HTTP conversion code needs to happen a little less automatically before this ticket can be considered fixed.
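
The contract argued for here can be sketched in a few lines. This is a Python illustration under stated assumptions, not Rebol code: `read` stands in for READ with an optional /string refinement, and `load_source` is a hypothetical parser that, like the one described above, accepts only binary input.

```python
from typing import List, Union


def read(body: bytes, string: bool = False) -> Union[bytes, str]:
    """Stand-in for READ: return binary unless string conversion is requested."""
    return body.decode("utf-8") if string else body


def load_source(data: bytes) -> List[bytes]:
    """Hypothetical parser that only accepts binary input."""
    if not isinstance(data, bytes):
        raise TypeError("parser expects binary input")
    return data.split()


raw = b"print 1"

# With READ returning binary, the result can be parsed directly.
tokens = load_source(read(raw))

# If the caller receives a string instead, it has to be converted back
# to binary before parsing, which is the round trip the comment objects to.
text = read(raw, string=True)
tokens_again = load_source(text.encode("utf-8"))
```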

Date User Field Action Change
15-Mar-2013 00:58 BrianH Comment : 0003667 Modified -
15-Mar-2013 00:54 BrianH Comment : 0003667 Added -
26-Jan-2013 23:42 abolka Comment : 0003404 Added -
5-Oct-2009 20:45 carl Code Modified -
5-Oct-2009 20:45 carl Status Modified submitted => reviewed
17-Sep-2009 03:51 PeterWood Comment : 0001601 Modified -
17-Sep-2009 03:50 PeterWood Comment : 0001601 Modified -
17-Sep-2009 03:49 PeterWood Comment : 0001601 Modified -
17-Sep-2009 03:48 PeterWood Comment : 0001601 Modified -
17-Sep-2009 03:47 PeterWood Comment : 0001601 Modified -
17-Sep-2009 03:47 PeterWood Comment : 0001601 Added -
15-Sep-2009 07:19 sqlab Ticket Added -