REBOL3 tracker
Ticket #0001042

Type: Bug    Status: dismissed    Date: 2-Jul-2009 08:20
Version: alpha 66    Category: Parse    Submitted by: Sunanda
Platform: All    Severity: not a bug    Priority: normal

Summary: PARSE string with quote, comma, and tab delimiting
Description:
This works as expected:
x: rejoin [{"a,"} tab "b"]
== {"a,"^-b}
parse/all x tab
== ["a," "b"] ;; block length 2 as expected

But here, PARSE effectively promotes the comma to [tab ","], as if a tab preceded it:
x: rejoin [{"a",} tab "b"]
== {"a",^-b}
parse/all x tab
== ["a" "," "b"] ;; block length 3 !?

R2 does the same.
Issue found in R2 while debugging a live application that attempts to read a tab-delimited file. At the very least, the issue is a gotcha that needs documenting so we can develop robust import routines.
Example code
    x: rejoin [{"a",} tab "b"]
    parse/all x tab

Assigned to: n/a    Fixed in: -    Last Update: 10-Jul-2009 18:40


Comments
(0001225)
BrianH
3-Jul-2009 22:06

This is a bug, not a gotcha that needs documenting. Is this one more consideration for the PARSE rewrite, or a quick fix?
(0001252)
Carl
7-Jul-2009 17:10

I don't understand what result you want. The delimiters are in conflict. The quotes on the string make it a single "atom". Then you have both comma and tab for delimiters in the data, but you only specify tab as the delimiter? If so, then the comma is just data, not a delimiter, so the result above is correct.

If you want the comma removed, specify it as a delimiter.

parse/all str "^-," ; tab and comma

If you want a specific result that you're not seeing, please post it in the ticket.
(0001254)
BrianH
7-Jul-2009 17:27

I think that "The quotes on the string make it a single atom." was the source of confusion. I guess it is a gotcha that needs documenting after all, particularly since fixing this would break the ability for the data to contain the delimiter in the quoted portion. We can start by marking this as not a bug.
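
The quoting rule discussed above can be approximated with a small model. The sketch below is written in Python, not REBOL, and `simple_split` is a hypothetical helper made up for illustration; it is not how PARSE is implemented. It assumes only what the examples in this ticket show: a field that starts with a quote runs to the next quote and becomes one atom, and whatever follows the closing quote starts a new field unless it is one of the given delimiters.

```python
def simple_split(text, delims):
    """Toy model of the splitting seen in this ticket (an assumption, not REBOL source).

    A field that starts with a double quote runs to the next double quote.
    If the character right after the closing quote is a delimiter it is
    consumed; otherwise a new field starts there.
    """
    fields = []
    i, n = 0, len(text)
    while i < n:
        if text[i] == '"':
            close = text.find('"', i + 1)
            if close == -1:          # unterminated quote: take the rest
                close = n
            fields.append(text[i + 1:close])
            i = close + 1
            if i < n and text[i] in delims:
                i += 1               # delimiter directly after the closing quote
        else:
            j = i
            while j < n and text[j] not in delims:
                j += 1
            fields.append(text[i:j])
            i = j + 1                # skip the delimiter that ended the field
    return fields


print(simple_split('"a,"\tb', "\t"))   # ['a,', 'b']
print(simple_split('"a",\tb', "\t"))   # ['a', ',', 'b']
```

In this model the comma in the second input is plain data, but it sits after the closing quote, so it ends up as a field of its own. That matches the length-3 block in the report.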
(0001257)
Sunanda
7-Jul-2009 20:09

What I wanted (and the application needed) was for parse to break the input string at the tab characters, regardless of any other special characters -- like quotes or commas. I know my application's input cannot consist of strings with embedded tabs. And I did not want parse to use its initiative.
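
For comparison, a split that gives quotes and commas no special meaning is a one-liner in most languages. This is a minimal Python sketch, not REBOL; `split_on_tabs` is a hypothetical name, and it assumes, as stated above, that fields never contain embedded tabs.

```python
def split_on_tabs(line):
    """Split on tab characters only; quotes and commas are ordinary data."""
    # Drop a trailing line ending, then break at every tab.
    return line.rstrip("\r\n").split("\t")


print(split_on_tabs('"a",\tb'))    # ['"a",', 'b']
print(split_on_tabs('"a,"\tb\n'))  # ['"a,"', 'b']
```

The quote characters stay in the fields because nothing interprets them. Whether to strip them afterwards is a separate decision for the import routine.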

But (as Brian suggests) that would conflict with my wish in other applications where I'd expect parse to intelligently handle embedded tabs and/or commas in CSV files.

So (also as Brian has suggested offline) the real issues are:
1. unclear mental model of parse's built-in logic when it encounters embedded delimited strings
2. expectation that parse can handle all CSV files, when we really need a snazzy mezz like decode-csv to handle all the messiness and the RFC 4180 specification.

The gotchas are:
1. assuming parse does not have special handling for quotes
2. assuming parse unaided can handle all possible CSV files.
(0001260)
BrianH
8-Jul-2009 00:18

Well,
1. Simple PARSE's handling of quotes is mostly* consistent with RFC 4180, and useful.
2. Handling all possible CSV files is unlikely for simple PARSE, since CSV dialects differ in contradictory ways.

The rest sounds like a job for a DECODE-CSV mezzanine. Let's declare this a feature.

* Mostly:
According to R3:
>> parse {"hello""world^/",a} ","
== ["hello" "world^/" "a"]
>> length? parse {"hello""world^/",a} ","
== 3

According to http://tools.ietf.org/html/rfc4180 :
>> parse {"hello""world^/",a} ","
== [{hello"world^/} "a"]
>> length? parse {"hello""world^/",a} ","
== 2

Added a ticket for the above: #1079
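
The RFC 4180 reading above, where a doubled quote inside a quoted field is an escaped quote, can be checked against an existing CSV reader. The sketch below uses Python's standard `csv` module as a stand-in; `decode_csv_line` is a hypothetical wrapper, and this is not a proposal for how a REBOL DECODE-CSV would be written.

```python
import csv
import io


def decode_csv_line(text, delimiter=","):
    """Return the fields of the first CSV record in text.

    With doublequote=True (the csv module default), "" inside a quoted
    field is read as one literal quote, as RFC 4180 describes.
    """
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=delimiter, doublequote=True)
    return next(reader)


fields = decode_csv_line('"hello""world\n",a')
print(fields)       # ['hello"world\n', 'a']
print(len(fields))  # 2
```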
(0001296)
Carl
10-Jul-2009 17:40

I agree. It is handy if PARSE can deal with simplistic CSV formats.

For the heavy-duty cases, create DECODE 'CSV data -- an R3 codec rather than a separate function. That way you can build the encoder at the same time and have a cool combo. (And yes, it should be possible to write it in R code.)
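
The point of building the encoder and decoder together is that each can be tested against the other. A minimal round-trip sketch follows, in Python with the standard `csv` module. The names `encode_csv` and `decode_csv` are hypothetical and do not describe any actual R3 codec interface.

```python
import csv
import io


def encode_csv(rows):
    """Serialize rows (lists of strings) to CSV text with CRLF line endings."""
    buf = io.StringIO(newline="")
    csv.writer(buf, lineterminator="\r\n").writerows(rows)
    return buf.getvalue()


def decode_csv(text):
    """Parse CSV text back into a list of rows."""
    return list(csv.reader(io.StringIO(text, newline="")))


rows = [["a,", "b"], ['say "hi"', "tab\there"]]
encoded = encode_csv(rows)
# Fields containing the delimiter or a quote are quoted on output,
# and decoding the result gives back the original rows.
assert decode_csv(encoded) == rows
```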

Date               User     Field              Action    Change
10-Jul-2009 18:49  BrianH   Comment : 0001260  Modified  -
10-Jul-2009 18:40  BrianH   Category           Modified  => Parse
10-Jul-2009 17:40  carl     Comment : 0001296  Added     -
8-Jul-2009 00:25   BrianH   Comment : 0001260  Modified  -
8-Jul-2009 00:18   BrianH   Status             Modified  problem => dismissed
8-Jul-2009 00:18   BrianH   Severity           Modified  minor => not a bug
8-Jul-2009 00:18   BrianH   Comment : 0001260  Added     -
7-Jul-2009 20:09   sunanda  Comment : 0001257  Added     -
7-Jul-2009 17:33   BrianH   Comment : 0001254  Modified  -
7-Jul-2009 17:27   BrianH   Comment : 0001254  Added     -
7-Jul-2009 17:13   carl     Status             Modified  reviewed => problem
7-Jul-2009 17:13   carl     Comment : 0001252  Modified  -
7-Jul-2009 17:10   carl     Comment : 0001252  Added     -
7-Jul-2009 17:03   carl     Summary            Modified  parse finds a phantom tab => PARSE string with quote, comma, and tab delimiting
3-Jul-2009 22:06   BrianH   Comment : 0001225  Added     -
3-Jul-2009 22:03   BrianH   Status             Modified  submitted => reviewed
3-Jul-2009 22:03   BrianH   Code               Modified  -
3-Jul-2009 22:03   BrianH   Description        Modified  -
2-Jul-2009 08:20   sunanda  Ticket             Added     -