Type | Bug | Status | dismissed | Date | 2-Jul-2009 08:20 |
---|---|---|---|---|---|
Version | alpha 66 | Category | Parse | Submitted by | Sunanda |
Platform | All | Severity | not a bug | Priority | normal |
Summary | PARSE string with quote, comma, and tab delimiting |
---|---|
Description |
This works as expected:
x: rejoin [{"a,"} tab "b"]
== {"a,"^-b}
parse/all x tab
== ["a," "b"] ;; block length 2 as expected
But here, parse effectively promotes the comma to [tab ","]:
x: rejoin [{"a",} tab "b"]
== {"a",^-b}
parse/all x tab
== ["a" "," "b"] ;; block length 3 !?
R2 does the same. Issue found in R2 while debugging a live application that attempts to read a tab-delimited file. At the very least the issue is a gotcha that needs documenting so we can develop robust import routines. |
Example code |
x: rejoin [{"a",} tab "b"] parse/all x tab |
Assigned to | n/a | Fixed in | - | Last Update | 10-Jul-2009 18:40 |
---|
Comments | |
---|---|
(0001225)
BrianH 3-Jul-2009 22:06 |
This is a bug, not a gotcha that needs documenting. Is this one more consideration for the PARSE rewrite, or a quick fix? |
(0001252)
Carl 7-Jul-2009 17:10 |
I don't understand what result you want. The delimiters are in conflict. The quotes on the string make it a single "atom". Then you have both comma and tab for delimiters in the data, but you only specify tab as the delimiter? If so, then the comma is just data, not a delimiter, so the result above is correct.
If you want the comma removed, specify it as a delimiter:
parse/all str "^-," ; tab and comma
If you want a specific result that you're not seeing, please post it in the ticket. |
(0001254)
BrianH 7-Jul-2009 17:27 |
I think that "The quotes on the string make it a single atom." was the source of confusion. I guess it is a gotcha that needs documenting after all, particularly since fixing this would break the ability for the data to contain the delimiter in the quoted portion. We can start by marking this as not a bug. |
(0001257)
Sunanda 7-Jul-2009 20:09 |
What I wanted (and the application needed) was for parse to break the input string at the tab characters, regardless of any other special characters -- like quotes or commas. I know my application's input cannot consist of strings with embedded tabs. And I did not want parse to use its initiative.
But (as Brian suggests) that would conflict with my wish in other applications where I'd expect parse to intelligently handle embedded tabs and/or commas in CSV files. So (also as Brian has suggested offline) the real issues are:
1. unclear mental model of parse's built-in logic when it encounters embedded delimited strings
2. expectation that parse can handle all CSV files, when we really need a snazzy mezz like decode-csv to handle all the messiness and RFC#4180 specifications.
The gotchas are:
1. assuming parse does not have special handling for quotes
2. assuming parse unaided can handle all possible CSV files. |
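Editor's sketch of the behaviour described above: one way to split at tabs only is to drive PARSE with an explicit rule block rather than the simple delimiter form, so quotes and commas stay ordinary data. `split-on-tab` is a hypothetical helper name, not an existing mezzanine. The code assumes an R2-style `parse/all`; builds of R3 that dropped the `/all` refinement would need plain `parse`.

```rebol
; Hypothetical helper (not part of REBOL): split STR at every tab and
; treat quotes and commas as ordinary characters.
split-on-tab: func [
    str [string!]
    /local out field non-tab field-rule
] [
    out: copy []
    non-tab: complement charset "^-"
    field-rule: [
        (field: copy "")               ; default value for an empty field
        opt [copy field some non-tab]  ; one run of non-tab characters
        (append out field)
    ]
    ; one field, then any number of tab-plus-field pairs, up to the end
    parse/all str [field-rule any [tab field-rule] end]
    out
]

probe split-on-tab rejoin [{"a",} tab "b"]
```

With the input from the ticket this should give a block of length 2, with the quotes and the comma left inside the first item.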
(0001260)
BrianH 8-Jul-2009 00:18 |
Well,
1. Simple PARSE's handling of quotes is mostly* consistent with RFC4180, and useful.
2. Handling all possible CSV files is unlikely for simple PARSE, since the differences are contradictory.
The rest sounds like a job for a DECODE-CSV mezzanine. Let's declare this a feature.
* Mostly:
According to R3:
>> parse {"hello""world^/",a} ","
== ["hello" "world^/" "a"]
>> length? parse {"hello""world^/",a} ","
== 3
According to http://tools.ietf.org/html/rfc4180 :
>> parse {"hello""world^/",a} ","
== [{hello"world^/} "a"]
>> length? parse {"hello""world^/",a} ","
== 2
Added a ticket for the above: #1079 |
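Editor's sketch of the RFC 4180 reading shown above, in which a doubled quote inside a quoted field stands for one literal quote. `decode-csv-line` is a hypothetical name, not the DECODE-CSV mezzanine or codec discussed in this ticket. It handles a single record only, and it assumes a comma delimiter and an R2-style `parse/all`.

```rebol
; Hypothetical helper (not part of REBOL): split one CSV record,
; treating "" inside a quoted field as a single literal quote.
decode-csv-line: func [
    line [string!]
    /local out field part dq non-dq non-comma quoted field-rule
] [
    out: copy []
    dq: #"^""
    non-dq: complement charset {"}
    non-comma: complement charset ","
    quoted: [
        dq
        any [
            dq dq (append field dq)                  ; doubled quote
            | copy part some non-dq (append field part)
        ]
        dq
    ]
    field-rule: [
        (field: copy "")
        [
            quoted
            ; not a well-formed quoted field: discard partial text and
            ; take everything up to the next comma as-is
            | (clear field) opt [copy part some non-comma (append field part)]
        ]
        (append out field)
    ]
    parse/all line [field-rule any ["," field-rule] end]
    out
]

probe decode-csv-line {"hello""world^/",a}
```

A full decoder would also have to deal with multi-line records, other delimiters, and malformed quoting, which this sketch does not attempt.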
(0001296)
Carl 10-Jul-2009 17:40 |
I agree. It is handy if PARSE can deal with simplistic CSV formats.
For the heavy-duty cases, create DECODE 'CSV data -- an R3 codec, rather than a separate function. That way you can build the encoder at the same time, and have a cool combo. (And yes, it should be possible to write it in R code.) |
Date | User | Field | Action | Change |
---|---|---|---|---|
10-Jul-2009 18:49 | BrianH | Comment : 0001260 | Modified | - |
10-Jul-2009 18:40 | BrianH | Category | Modified | => Parse |
10-Jul-2009 17:40 | carl | Comment : 0001296 | Added | - |
8-Jul-2009 00:25 | BrianH | Comment : 0001260 | Modified | - |
8-Jul-2009 00:18 | BrianH | Status | Modified | problem => dismissed |
8-Jul-2009 00:18 | BrianH | Severity | Modified | minor => not a bug |
8-Jul-2009 00:18 | BrianH | Comment : 0001260 | Added | - |
7-Jul-2009 20:09 | sunanda | Comment : 0001257 | Added | - |
7-Jul-2009 17:33 | BrianH | Comment : 0001254 | Modified | - |
7-Jul-2009 17:27 | BrianH | Comment : 0001254 | Added | - |
7-Jul-2009 17:13 | carl | Status | Modified | reviewed => problem |
7-Jul-2009 17:13 | carl | Comment : 0001252 | Modified | - |
7-Jul-2009 17:10 | carl | Comment : 0001252 | Added | - |
7-Jul-2009 17:03 | carl | Summary | Modified | parse finds a phantom tab => PARSE string with quote, comma, and tab delimiting |
3-Jul-2009 22:06 | BrianH | Comment : 0001225 | Added | - |
3-Jul-2009 22:03 | BrianH | Status | Modified | submitted => reviewed |
3-Jul-2009 22:03 | BrianH | Code | Modified | - |
3-Jul-2009 22:03 | BrianH | Description | Modified | - |
2-Jul-2009 08:20 | sunanda | Ticket | Added | - |