Ticket #0001234

Type	Bug	Status	reviewed	Date	15-Sep-2009 07:19
Version	alpha 82	Category	Native	Submitted by	sqlab
Platform	Windows	Severity	minor	Priority	normal

Summary

read/string does not convert to string when reading from the web

Description

read/string does not automatically convert to string when reading via HTTP

Example code

>> read/string http://www.rebol.com
== #{
3C68746D6C3E0A3C686561643E0A3C4D455441204E414D453D22446573637269
7074696F6E2220434F4E54454E543D225245424F4C3A20612057656220332E30
206C616E677561676520616E642073797374656D206261736564206F6E206E65
77206C6967687477656967687420636F6D707574696E67206D6574686F64732E
205369746520696E636C756465732070726F64756374732C20646F776E6C6F61
64732C20646F63756D656E746174696F6E2C20616E6420737570706F72742E22
3E0A3C4D455441204E414D453D224B6579776F7264732220434F4E54454E543D
225245424F4C2C2057656220332E302C2057656220...

Assigned to	n/a	Fixed in	-	Last Update	15-Mar-2013 00:54

Comments
(0001601) PeterWood 17-Sep-2009 03:47	As I understand from the documentation, read/string will only convert UTF-8 encoded data to a string, and www.rebol.com is ISO-8859-1 encoded. However, I tested with a UTF-8 encoded site and got the same problem:
>> read/string http://www.bbc.co.uk
== 64#{
PCFET0NUWVBFIGh0bWwgUFVCTElDICItLy9XM0MvL0RURCBYSFRNTCAxLjAgU3Ry
aWN0Ly9FTiIgImh0dHA6Ly93d3cudzMub3JnL1RSL3hodG1sMS9EVEQveGh0bWwx
As the built-in decoding expects the input to be a UTF-8 encoded binary, there will always be issues in automatically converting web pages to strings.
Although www.rebol.com is ISO-8859-1 encoded, to string! happily converts its binary content to a string:
>> to string! read http://www.rebol.com
== {html head META NAME="Description" CONTENT="REBOL: a Web 3.0 language and system based on new lightweight computing methods. Site includes products, downloads, documentation, and support."> META NAME="Keywords" CONTENT="REBOL, Web 3.0, Web 2.0, programming, Internet, software, domain specific language, distributed computing, collaboration, operating systems, development, rebel"> meta http-equiv="content-type" content="text/html;charset=iso-8859-1"> title>REBOL Technologies styl...
This is because the rebol.com homepage contains only characters in the range 0 - 127. However, if there are characters above 127, they will be incorrectly converted by to string! (perhaps to string! only decodes UTF-8 binaries):
>> to string! read http://localhost/iso88591.html
== {hmtl> head> meta http-equiv="content-type" content="text/html; charset=iso-8859-1"> title>᧠a grave followed by e acute /head>
;; The ᧠ is an a with a grave accent followed by an e with an acute accent in the source.
[Note: I removed the leading < from the HTML tags to get the HTML to display in Curecode.]
(0003404) abolka 26-Jan-2013 23:42	READ (without /string) of a URL already converts to a string! automatically if the HTTP headers claim the content is UTF-8 encoded. (This could be extended to handle other encodings, but that's a separate issue.) READ/string is intended as a shortcut for `to string! read`, i.e. forced conversion from UTF-8. (`to string!` of binary data _always_ decodes using UTF-8; so would READ/string.) READ/string is currently not implemented for URLs.
(0003667) BrianH 15-Mar-2013 00:54	Note, READ without /string should never convert to string, it should return a binary. If you want READ to autoconvert, please rewrite the Rebol language parser so it can parse strings. Until then, it depends on READ without /string returning a binary, and every time READ returns a string it has to be converted back into a binary before it can be parsed. READ/string is a shortcut for TO-STRING READ, but when /string is not specified that means that the developer doesn't want it converted to a string. If they did, they would have called READ/string instead. So that means that the HTTP conversion code needs to happen a little less automatically before this ticket can be considered fixed.
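
Editorial note: the comments above turn on the fact that `to string!` decodes binaries as UTF-8, so ISO-8859-1 pages with bytes above 127 come out wrong. The following is a minimal Rebol 3 sketch of an explicit ISO-8859-1 conversion, written on the assumption that READ returns a binary! for such pages. `latin1-to-string` is a hypothetical helper name, not a Rebol built-in, and the sketch has not been run against alpha 82.

```rebol
REBOL [Title: "ISO-8859-1 decoding sketch (hypothetical helper)"]

; Hypothetical helper, not part of Rebol.
; Every ISO-8859-1 byte value equals its Unicode code point, so the
; binary can be decoded one byte at a time.
latin1-to-string: func [
    "Decode an ISO-8859-1 encoded binary! into a string! (sketch)"
    bin [binary!]
    /local out
][
    out: make string! length? bin
    ; In Rebol 3, iterating over a binary! yields integers (byte values).
    foreach byte bin [
        append out to char! byte
    ]
    out
]

; #{E0E9} is "a grave, e acute" encoded as ISO-8859-1.
print latin1-to-string #{E0E9}

; The same two characters encoded as UTF-8, which is what
; to string! expects as input.
print to string! #{C3A0C3A9}
```

Both calls are intended to print the same two accented characters. For a page like the local test file mentioned above, the helper would be applied to the binary returned by READ, for example `latin1-to-string read http://localhost/iso88591.html`, provided READ has not already converted the result to a string!.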

Date	User	Field	Action	Change
15-Mar-2013 00:58	BrianH	Comment : 0003667	Modified	-
15-Mar-2013 00:54	BrianH	Comment : 0003667	Added	-
26-Jan-2013 23:42	abolka	Comment : 0003404	Added	-
5-Oct-2009 20:45	carl	Code	Modified	-
5-Oct-2009 20:45	carl	Status	Modified	submitted => reviewed
17-Sep-2009 03:51	PeterWood	Comment : 0001601	Modified	-
17-Sep-2009 03:50	PeterWood	Comment : 0001601	Modified	-
17-Sep-2009 03:49	PeterWood	Comment : 0001601	Modified	-
17-Sep-2009 03:48	PeterWood	Comment : 0001601	Modified	-
17-Sep-2009 03:47	PeterWood	Comment : 0001601	Modified	-
17-Sep-2009 03:47	PeterWood	Comment : 0001601	Added	-
15-Sep-2009 07:19	sqlab	Ticket	Added	-