REBOL3 tracker
  0.9.12 beta
Ticket #0001978

Type: Bug    Status: problem    Date: 1-Mar-2013 12:38
Version: r3 master    Category: Native    Submitted by: Sunanda
Platform: All    Severity: not a bug    Priority: normal

Summary: UNIQUE broken with binary data
Description:
Let's start with three Unicode characters converted to binary (two octets each in the result):

bin: to-binary "^(b1)^(b2)^(b3)"
== #{C2B1C2B2C2B3}


A straight UNIQUE on this treats it as six octets:
unique bin
== #{C2B1B2B3}


So the obvious thing to try is adding /skip to treat it as three records of two octets each:
unique/skip bin 2
== #{C2B1}
But it looks like the code is partially treating the binary as a string and then getting confused.
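
For readers without a Rebol console, the octet-level results in the description can be reproduced with a small Python model. This is only a sketch: `unique_octets` is a hypothetical helper written for illustration, not Rebol's implementation, and it assumes that TO-BINARY on a string yields its UTF-8 encoding, which matches the output shown above.

```python
# Python model of the octet-level view; not Rebol's implementation.
def unique_octets(data: bytes) -> bytes:
    """Keep the first occurrence of each octet, preserving order."""
    seen = set()
    out = bytearray()
    for octet in data:
        if octet not in seen:
            seen.add(octet)
            out.append(octet)
    return bytes(out)


# U+00B1, U+00B2, U+00B3 encoded as UTF-8 (two octets each).
bin_data = "\u00b1\u00b2\u00b3".encode("utf-8")

print(bin_data.hex().upper())                 # C2B1C2B2C2B3
print(unique_octets(bin_data).hex().upper())  # C2B1B2B3
```

The repeated lead octet C2 is dropped after its first occurrence, which is why plain UNIQUE returns four octets instead of six.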
Example code
bin: to-binary "^(b1)^(b2)^(b3)"
(unique bin) = bin             ;; true or false depending on final model
(unique/skip bin 2) = bin      ;; definitely should be true

(sort copy bin) = bin   ;; true or false depending on final model
(sort/skip copy bin 2) = bin ;; ought to be true? (may be broken - see cc#11)

Assigned to: n/a    Fixed in: -    Last Update: 1-Mar-2013 22:28


Comments
(0003549)
abolka
1-Mar-2013 20:44

That seems to be another facet of #1963: unique/skip #{c2b1c2b2c2b3} 2 sees only one distinct "key", namely #{c2}, and keeps only the first occurrence of the elements with that key (#{c2b1}).
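
The key-based reading described in this comment can be sketched in Python. This is a model under stated assumptions, not Rebol source: `unique_skip_by_key` is a hypothetical name, and it assumes the first octet of each record acts as the only comparison key.

```python
# Python model of "first element of each record is the key"; an assumption
# drawn from the comment above, not taken from the Rebol sources.
def unique_skip_by_key(data: bytes, size: int) -> bytes:
    """Keep the first record seen for each distinct leading octet."""
    seen = set()
    out = bytearray()
    for start in range(0, len(data), size):
        record = data[start:start + size]
        key = record[0]  # only the first octet is compared
        if key not in seen:
            seen.add(key)
            out += record
    return bytes(out)


bin_data = bytes.fromhex("C2B1C2B2C2B3")
print(unique_skip_by_key(bin_data, 2).hex().upper())  # C2B1
```

All three records share the key C2, so only the first record survives, matching the #{C2B1} result reported in the ticket.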
(0003550)
Sunanda
1-Mar-2013 21:35

That is quite likely if the /skip means the data is being treated as a string.

It should (I think) be treated as pairs of octets, so the data is seen as
#{ c2b1 c2b2 c2b3 }
But there are some design decisions needed before we have a final definition of what /skip means on binary.
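
The alternative reading proposed here, where each pair of octets is compared as a whole, can be sketched in Python for comparison. Again this is only a model: `unique_skip_by_record` is a hypothetical helper and does not reflect any decided Rebol behaviour.

```python
# Python model of whole-record comparison; one possible meaning of /skip on
# binary data, shown for illustration only.
def unique_skip_by_record(data: bytes, size: int) -> bytes:
    """Keep the first occurrence of each distinct fixed-size record."""
    seen = set()
    out = bytearray()
    for start in range(0, len(data), size):
        record = data[start:start + size]
        if record not in seen:  # the whole record is the comparison unit
            seen.add(record)
            out += record
    return bytes(out)


bin_data = bytes.fromhex("C2B1C2B2C2B3")
print(unique_skip_by_record(bin_data, 2) == bin_data)  # True
```

Under this model the three records C2B1, C2B2 and C2B3 are all distinct, so the input comes back unchanged, which is the outcome the ticket's example code expects from unique/skip.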
(0003552)
BrianH
1-Mar-2013 22:27

A binary is a series, so functions that use /skip to operate on fixed-length records should treat a binary like any other series of fixed-length records. Just because it's binary doesn't mean it's not still a series. Fixed-length-record binary formats are pretty common; I use them all the time when writing EDI software. They are more common than fixed-length-record string formats, since in most cases where people think they are working with fixed-length-record string formats they are actually working with binary, because they assume ASCII encoding.

So, UNIQUE/skip is definitely working as designed here and this ticket is not a bug. It's not related to #1963 because that ticket was about breaking with mixed-type comparison in a block, not the behavior of UNION in general (that was a side-track discussion).

However, this is another example that the behavior-as-designed of these functions is not desirable. Please voice your concern about this design shortcoming in #428, where it is already being discussed. I'll post a link to this ticket there.

Marked as not-a-bug with problem status because the behavior is as designed, but probably still wrong for other reasons. When #428 is resolved one way or the other, we can dismiss or adjust this ticket.

Date               User     Field              Action    Change
2-Mar-2013 08:41   BrianH   Comment: 0003552   Modified  -
1-Mar-2013 22:32   BrianH   Comment: 0003552   Modified  -
1-Mar-2013 22:28   BrianH   Description        Modified  -
1-Mar-2013 22:28   BrianH   Code               Modified  -
1-Mar-2013 22:28   BrianH   Severity           Modified  minor => not a bug
1-Mar-2013 22:28   BrianH   Status             Modified  submitted => problem
1-Mar-2013 22:27   BrianH   Comment: 0003552   Added     -
1-Mar-2013 21:35   sunanda  Comment: 0003550   Added     -
1-Mar-2013 21:10   abolka   Comment: 0003549   Modified  -
1-Mar-2013 20:44   abolka   Comment: 0003549   Added     -
1-Mar-2013 12:38   sunanda  Ticket             Added     -