Type | Bug | Status | problem | Date | 1-Mar-2013 12:38 |
---|---|---|---|---|---|
Version | r3 master | Category | Native | Submitted by | Sunanda |
Platform | All | Severity | not a bug | Priority | normal |
Summary | UNIQUE broken with Binary data |
---|---|
Description |
Let's start with three Unicode chars as binary:
    bin: to-binary "^(b1)^(b2)^(b3)"
    == #{C2B1C2B2C2B3}
A straight UNIQUE on this treats it as six octets:
    unique bin
    == #{C2B1B2B3}
So the obvious thing to try is adding /skip to treat it as 3 pairs of two...
    unique/skip bin 2
    == #{C2B1}
...But it looks like the code is partially treating the binary as a string, and then getting confused. |
Example code |
bin: to-binary "^(b1)^(b2)^(b3)"
(unique bin) = bin ;; true or false depending on final model
(unique/skip bin 2) = bin ;; definitely should be true
(sort copy bin) = bin ;; true or false depending on final model
(sort/skip copy bin 2) = bin ;; ought to be true? (may be broken - see cc#11) |
Assigned to | n/a | Fixed in | - | Last Update | 1-Mar-2013 22:28 |
---|
Comments | |
---|---|
(0003549)
abolka 1-Mar-2013 20:44 |
That seems to be another facet of #1963: unique/skip #{c2b1c2b2c2b3} 2 sees only one distinct "key", that is #{c2} and keeps only the first occurrence of elements with that key (#{c2b1}). |
(0003550)
Sunanda 1-Mar-2013 21:35 |
That is quite likely if the /skip means the data is being treated as a string.
It should (I think) be treated as pairs of octets, so the data is seen as #{ c2b1 c2b2 c2b3 }. But there are some design decisions needed before we have a final definition of what /skip means on binary. |
(0003552)
BrianH 1-Mar-2013 22:27 |
A binary is a series, so the functions that use /skip to operate on series records should treat a binary like any other series of fixed-length records. Just because it's binary doesn't mean it's not still a series. Fixed-length-record binary formats are pretty common - I use them all the time when writing EDI software. More so than fixed-length-record string formats, since in most cases where people think they are working with fixed-length-record string formats, the formats are actually binary because they assume ASCII encoding.
So, UNIQUE/skip is definitely working as designed here and this ticket is not a bug. It's not related to #1963 because that ticket was about breaking with mixed-type comparison in a block, not the behavior of UNION in general (that was a side-track discussion). However, this is another example showing that the behavior-as-designed of these functions is not desirable. Please voice your concern about this design shortcoming in #428, where it is already being discussed. I'll post a link to this ticket there. Marked as not-a-bug with problem status because the behavior is as designed, but probably still wrong for other reasons. When #428 is resolved one way or the other, we can dismiss or adjust this ticket. |
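
The comments above contrast two readings of /skip on a binary: the observed one, where only the first octet of each record acts as the key, and the one Sunanda expected, where each whole record is compared. A minimal Rebol 3 sketch of the second reading follows. UNIQUE-RECORDS is a hypothetical helper written for illustration only; it is not a Rebol native and not the design under discussion in #428, and it assumes a record width of at least one octet.

```rebol
; UNIQUE-RECORDS is a hypothetical helper, not a Rebol native.
; Assumption: the key is the entire WIDTH-octet record, not just its first octet.
unique-records: func [
    "Keep the first occurrence of each fixed-width record (sketch)."
    data [binary!] "Input series; the caller's value is not modified"
    width [integer!] "Record size in octets, assumed to be >= 1"
    /local out seen rec
][
    out: make binary! length? data
    seen: make block! 8
    while [not tail? data] [
        rec: copy/part data width   ; one whole record
        unless find seen rec [
            append seen rec         ; remember the record as a single value
            append out rec          ; emit it only the first time it appears
        ]
        data: skip data width
    ]
    out
]

bin: to-binary "^(b1)^(b2)^(b3)"
(unique-records bin 2) = bin ;; intended to be true under the whole-record model
```

Under this model the three records #{C2B1}, #{C2B2} and #{C2B3} are all distinct, so nothing is dropped. That is the result the ticket's example code asks for from unique/skip bin 2.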
Date | User | Field | Action | Change |
---|---|---|---|---|
2-Mar-2013 08:41 | BrianH | Comment : 0003552 | Modified | - |
1-Mar-2013 22:32 | BrianH | Comment : 0003552 | Modified | - |
1-Mar-2013 22:28 | BrianH | Description | Modified | - |
1-Mar-2013 22:28 | BrianH | Code | Modified | - |
1-Mar-2013 22:28 | BrianH | Severity | Modified | minor => not a bug |
1-Mar-2013 22:28 | BrianH | Status | Modified | submitted => problem |
1-Mar-2013 22:27 | BrianH | Comment : 0003552 | Added | - |
1-Mar-2013 21:35 | sunanda | Comment : 0003550 | Added | - |
1-Mar-2013 21:10 | abolka | Comment : 0003549 | Modified | - |
1-Mar-2013 20:44 | abolka | Comment : 0003549 | Added | - |
1-Mar-2013 12:38 | sunanda | Ticket | Added | - |