Type | Wish | Status | dismissed | Date | 25-Jun-2010 02:23 |
---|---|---|---|---|---|
Version | alpha 99 | Category | Mezzanine | Submitted by | PeterWood |
Platform | All | Severity | major | Priority | none |
Summary | Change default context for words from "global" to "local" |
---|---|
Description |
To paraphrase Douglas Crockford from the appendix "Awful Parts" of his "JavaScript: the good parts": The worst of all of REBOL's bad features (and it has far fewer of those than most languages) is its preference for global variables. Global variables can be a convenience in very small programs, but they quickly become unwieldy as programs get larger. Because a global variable can be changed by any part of a program at any time, they can significantly complicate the behaviour of the program. Use of global variables degrades the reliability of programs that use them. I'm sure this and further arguments against setting words in the global context are very familiar and I don't need to repeat them here. I believe that in practice the actual use of global "variables" is quite sporadic and mainly by accident rather than design. How many global "variables" are used in REBOL mezzanine functions? (I haven't seen one yet). Doesn't REBOL 3 provide the opportunity to change this behaviour for the better? |
Example code |
Assigned to | n/a | Fixed in | - | Last Update | 27-Sep-2010 17:11 |
---|
Comments | |
---|---|
(0002397)
Ladislav 25-Jun-2010 08:44 |
Have you seen the FUNCT function?
|
(0002401)
BrianH 26-Jun-2010 05:35 |
There are so many problems with this request, and they all come from trying to apply a criticism of JavaScript to REBOL, a language with a completely different semantic model. And then it gets worse, because of the change in the binding and scoping models of R3 relative to R2. So here we go:
* There is no global context in R3. There is the system context, the task-local (in theory) user context(s), module contexts, object contexts, function contexts, and loop contexts. They are all local contexts.
* Words are not by default bound to any global context. By default, words are not bound at all. We have to bind them on purpose.
* In order for words to be bound to a context, the context has to contain those words. For function and loop contexts, and to a lesser extent module contexts, that means that we need to know what words we want to bind before the code is run, because we can't rebind them later. Because of this, we have added functions that preprocess source code in order to collect words to put in the context: FUNCT (for function contexts), MAKE-MODULE (for module contexts), MAKE object! (native, for object contexts), etc. Even BIND has added options for this.
* LOAD, DO, IMPORT, INTERN, DO-NEEDS and MAKE-MODULE are the functions that normally do the binding of script and module contexts. None of these functions are native, so the category is wrong.
* "I believe that in practice the actual use of global variables is quite sporadic and mainly by accident rather than design." - Absolutely not true. REBOL is built around the concept of nested contexts (even though it fakes them in practice). So "global" and "local" are in this case relative terms. And most variables that are accessed are defined up the chain because of three things: functions ('print is a relatively global variable), OOP (all object variables are relatively global within their method functions) and loops (all function or object variables are relatively global in a loop). So you see "global" variable access in most non-trivial functions in REBOL, and it is that way on purpose.
* We fixed this already. FUNCT was added to handle the cases where you didn't want to keep track of the words you were overwriting - it collects set-words and adds them to the function context as locals before the function is built. In most cases in the mezzanine source where this would be appropriate, functions are now defined using FUNCT. In general, we recommend using FUNCT to define your function unless it won't work for you.
* The basic premise of this request is wrong for REBOL, and we've proved it. We wrote every mezzanine function we could with FUNCT, and there are still functions that use locals that we can't write with it. The reason for this is that sometimes you need fine-grained control over which words are defined in which context, and those times are frequent. This truth can't be avoided.

This request dates back to the C era, back when languages generally didn't have nested functions, module or object scopes, and so on, and so just had "global" and "local" variables. For modern (since the 1980s) languages that have these things, it's really not that simple. Nonetheless, we added the FUNCT function to get the Ruby-like simplicity without the Ruby-like major drawbacks. Yes, FUNCT was actually inspired by Ruby, so use it carefully. |
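The set-word collection described in this comment can be illustrated outside REBOL. The JavaScript below is only a sketch of the idea, not FUNCT's actual implementation: it assumes the body has already been split into string tokens, treats any token ending in a single colon as a set-word, and scans nested blocks as well. The names `collectSetWords` and `makeSpec` are hypothetical.

```javascript
// Hypothetical model: a function body is an array of string tokens
// (nested blocks are nested arrays), and a "set-word" is any token
// that ends with a colon, such as "z:".
function collectSetWords(body) {
  const seen = new Set();
  for (const token of body) {
    if (Array.isArray(token)) {
      // Assumption: nested blocks are scanned too.
      for (const name of collectSetWords(token)) seen.add(name);
    } else if (typeof token === "string" && /^[^:]+:$/.test(token)) {
      seen.add(token.slice(0, -1)); // drop the trailing colon
    }
  }
  return [...seen];
}

// Build a spec that lists the explicit arguments first and then the
// collected set-words as locals, skipping names that are already arguments.
function makeSpec(args, body) {
  const locals = collectSetWords(body).filter((name) => !args.includes(name));
  return locals.length > 0 ? [...args, "/local", ...locals] : [...args];
}

const spec = makeSpec(["a"], ["z:", "a", "+", "1", "if", "z", ["y:", "z"]]);
console.log(spec.join(" ")); // a /local z y
```

The point of the sketch is that the whole decision is made by looking at syntax before the function exists, which is why only words written as set-words can be picked up.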
(0002402)
BrianH 26-Jun-2010 06:34 |
By the way, I can't reference the original request for FUNCT because it is older than CureCode. We originally made the function to simplify development of the R3 GUI, over two years ago. I say "we" because I can't remember whether Carl or I wrote the original function, though I remember the inspiration. The current version dates back to #766. The first public mention of it is here: http://www.rebol.net/r3blogs/0141.html where an old version of it is called FUNC-LOCAL.
You can also use FUNCT with R2, as of 2.7.7 or with R2/Forward. The code is different, but the effect the same. |
(0002403)
PeterWood 26-Jun-2010 14:41 |
Ladislav, as I understand it, the FUNCT function automates the declaration of all set-words in a function as local. I see that it can be used to mask a symptom of the default to global approach. |
(0002404)
BrianH 26-Jun-2010 21:35 |
"mask a symptom"... OK, I can see that more explanation is needed. For the moment, let's ignore whether this is a good idea and just focus on whether the concept can even apply to the language at all.
Function contexts are fixed in length once the function is created. This is necessary for functions to be called correctly, because otherwise the number of arguments wouldn't be consistent. All function words are arguments, even the locals - /local is just another refinement.

Words are direct-bound, and blocks are not bound at all. This means that for a given block, there is *no such thing as a local context*. Each individual word in the block has its own local context. A block also has no reference to which function it is being executed in, and can be executed by more than one function if passed to DO, IF, WHILE, etc. In REBOL, there are no nested scopes at runtime that affect which context a word is considered to be bound to. Nested scopes are faked, and can only be faked at definition time by applying BIND to blocks of words over and over again, overriding the old bindings - this is known as definitional binding.

We don't have a difference between functions and variables; all functions are assigned to variables, except for anonymous functions, and we call them through these variables. And IF, WHILE, FOR and such are all functions, accessed through variables. So if we make all words that we find in a function local then the function won't work. So (like Ruby) we would need to limit it to words whose values change in the function, since those are the only ones that matter whether or not they are local. So we look for words that are set, though that also has constraints.

When looking at code, you can't assume that the word 'set will be bound to the function SET unless you actually execute the code. And once the code starts executing, any changes to the contexts that the code is bound to won't affect the code, because it has already been bound. For that matter, 'set might not even be set to SET at the time of the definition of your function, or might be then, but not at execution time. 
So if you want to know whether a word will be set in the DO dialect by just looking at the syntax alone, you check for set-words. There really isn't any other statically determinable approach that will work.

So what does this add up to? The Ruby approach, which is the result of applying the Crockford criticism to a comparable language to JS (Ruby), cannot possibly be done directly in REBOL, because at runtime there are no nested scopes in REBOL at all. No "local" contexts, no "global" contexts, no "outer" contexts, all contexts are discrete and unrelated to each other. Lexical scoping does not exist at all at runtime - all words are direct-bound. The only time that lexical scoping exists in any way in REBOL is at definition time, and even then it is faked. The reason why it has to be faked is that REBOL is not compiled, not like JS and Ruby, so any behavior that resembles that of compiled languages can only be approximated or emulated.

Since definition-time is the only time that lexical scopes exist at all, any solution to a scoping issue has to happen at definition-time. The FUNCT approach is such a definition-time solution, and the best solution that can be done without compiling the language. If we built a solution into MAKE function!, it would still have to act like FUNCT because of the reasons given above. So while FUNCT is not a fully native solution, it *is* a solution, not just masking a symptom.

In my next message I will explain why FUNCT's behavior is not the default behavior, and why a more directly Ruby-like solution is impossible for REBOL. |
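The direct binding and definitional binding described in this comment can be modelled in a few lines. The JavaScript below is a toy model under stated assumptions, not how REBOL is implemented: each word is an object that carries its own reference to a context, a block is a plain array with no context of its own, and the hypothetical `bind` helper rebinds only the words that the given context contains.

```javascript
// Hypothetical model of direct binding: each word carries its own
// reference to a context; the block that holds the words has none.
function word(name) {
  return { name: name, context: null };
}

// Rebind, in place, every word in the block whose name exists in `context`.
// Words that the context does not contain keep whatever binding they had.
function bind(block, context) {
  for (const item of block) {
    if (Array.isArray(item)) {
      bind(item, context);
    } else if (Object.prototype.hasOwnProperty.call(context, item.name)) {
      item.context = context;
    }
  }
  return block;
}

// Look a word up through its own binding; there is no scope chain to walk.
function lookup(w) {
  if (w.context === null) throw new Error(w.name + " word is not bound to a context");
  return w.context[w.name];
}

const outer = { print: "<native print>", x: 1 };
const local = { x: 2 };
const body = [word("print"), word("x"), word("y")];

bind(body, outer); // first pass: the enclosing context
bind(body, local); // second pass: the function context overrides x only

const bindings = body.map(function (w) {
  if (w.context === local) return "local";
  if (w.context === outer) return "outer";
  return "unbound";
});
console.log(bindings.join(" ")); // outer local unbound
```

After the two passes the single block holds words bound to two different contexts plus one unbound word, and nothing in the block records which pass came from an "inner" or "outer" scope. That is the sense in which nesting exists only while the bindings are being applied.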
(0002405)
Ladislav 27-Jun-2010 23:14 |
"I see that it can be used to mask a symptom of the default to global approach. " - I disagree. No "masking". All set-words are "local by default" in FUNCT generated functions. Why you call it "masking" is beyond my understanding. Certainly, the MAKE function, or the FUNC function behave differently, so what? They just use a different dialect to specify functions. The unique property of REBOL is that you can pick the dialect that suits your needs best. If you like to have all set-words local, use FUNCT. I admit that this approach still does not declare words that don't appear as set-words to be local, but it is trivial to define a dialect that suits your needs best, if you are not content with any of the dialects available. Don't ask Carl to do that, it is in your power. Only when you define a dialect you find best for your needs, propose it as a viable alternative, and, maybe, you will find out that it will be accepted. |
(0002406)
Ladislav 27-Jun-2010 23:31 |
"How many global "variables" are used in REBOL mezzanine functions? (I haven't seen one yet)." - then you haven't looked. These variables are usually not local in functions:
'func 'parse 'print 'for 'foreach 'probe 'make 'do 'if 'either 'while 'unless 'until 'repeat 'break 'throw 'catch 'funct 'set 'get 'quit 'bind 'bind? 'system 'load 'mold 'return 'pick 'poke 'insert 'change 'append 'first ... (shall I continue?) |
(0002407)
PeterWood 28-Jun-2010 09:43 |
I do, and will continue to, define words in what I believe is the correct context. For me, that is usually an object! or a function!. Sometimes I do want to add words to the "global" context but very rarely one that I define within a function!.
When I was referring to global "variables" in mezzanine functions, I meant those that were defined inside functions. You are no doubt correct, I haven't looked enough. Clearly, I have a different perspective from Brian and you on this issue. It is best to leave it at that. |
(0002408)
BrianH 30-Jun-2010 05:16 |
You *can't* define a word on assignment in REBOL, not like in Ruby. Assignments are just that: assignments, not definitions. Words have to be added-and-bound to a context before they can be used as variables, before any assignments are seen. So you can't define a word in a "global" context from within a function, unless you add the word to the context manually using something like APPEND self TO-WORD "something" and then rebinding the function code so you can use it.
So you are not doing what you think you are doing. Any word in a function is already defined somewhere else, or not defined at all, or overridden by function-local words, or defined by inner code (USE, loops, objects, other functions). You don't define words in outer contexts by assigning them in a function - they are either already defined in that context, or they aren't. And all contexts that aren't function contexts are object contexts, even the "global" ones; R3 only has the two context types.

So back to what I was saying... REBOL (by which I mean the DO dialect) is an imperative language. This means that most of the activities that the language does involve changing state. This either means calling external code, modifying series or other structures, or making assignments. Functions in R3 don't have persistent state ("static" variables), so in order for them to do anything other than just being pure functions, they have to reference and change state from outside the function. This means that assignments to outside variables are inevitable in normal REBOL code.

Ruby, like most object-oriented languages, is also imperative, so it has to reference outside variables quite a bit. Since Ruby has the define-on-assign feature that is impossible for REBOL, it has to have some way to distinguish the outer variables from the local ones; for this it uses sigils, with 4 levels of sigil to refer to the 4 levels of scope in Ruby: global, class, object and local. This (barely) works for Ruby. In REBOL, there is no fixed number of levels of scope; scopes can be nested to the limits of memory. But even if we ignore that and try to adopt some sigil system in REBOL, it *still* wouldn't work, because REBOL has no lexical scope at runtime, none at all. There is no "current" object or class, so those sigils wouldn't know what to refer to, and there is no "global" scope at all, so that sigil wouldn't work either. 
Sigils only work if you can compile the code to resolve those scopes ahead of time (impossible in REBOL) or if you keep some reference to lexical scopes at runtime and do lookup then, what R1 did and why it was 30x slower. So a Ruby-style sigil system wouldn't work in REBOL, and without some way of referencing outer scopes neither REBOL nor Ruby would work at all without switching to pure functions. Ignoring a Ruby-style system (as we must), why isn't FUNCT behavior the default? It turns out that it works better this way. While it is easy to fake Ruby-style set-word declaration in REBOL (without actually supporting it) with a little preprocessing code that fills in explicitly defined locals, it is much more difficult to do the reverse in order to *not* define locals when you don't want to override globals and MAKE function! overrides them by default. And since once the function is created there are no nested scopes anymore, any syntactic workarounds to use globals wouldn't MOLD, so the function source wouldn't make sense or be recoverable. We've compared the code, and the alternative to using a mezzanine for FUNCT is less efficient than what we do now, in real-world code. Regardless of whether we keep FUNCT mezzanine, we could in theory call it FUNC and call the old function of that name FUNCT. The problem is that this preprocessing is unnecessary for most functions, and a lot of common functions really would be broken by it. Even if FUNC and FUNCT were native, the non-preprocessing function would be preferred at runtime. This is why FUNCT is not generally used for functions with no undeclared locals, and why even when the source of R3 uses FUNCT internally, the generated source uses FUNC. On the plus side, the advantage of keeping FUNCT mezzanine has been to make the preprocessor do even more semantic tricks. 
In addition to faking Ruby-style assignment declarations, FUNCT/with also allows you to bind words to an object or object spec, for emulating static local variables (C or functional-imperative OOP), friend functions (C++), and other tricks from other languages. Mezzanine code has made it much more flexible and powerful. I hope this helps you understand why the current solution was chosen, why what you request is not even definable in REBOL, and what the possible alternatives are. |
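As a loose analogy for the FUNCT/with idea of binding a function body to an object so that state survives between calls, here is a JavaScript sketch. It is an assumption-laden illustration, not a description of how FUNCT/with works internally: the helper name `functWith` is hypothetical, and JavaScript's `this` stands in for REBOL's binding of words to an object.

```javascript
// Hypothetical analogue of binding a function body to an object:
// the function itself keeps nothing between calls; the object does.
function functWith(state, body) {
  return function () {
    // Run the body with `state` as its receiver, so the body reads and
    // writes the object's fields instead of per-call locals.
    return body.apply(state, arguments);
  };
}

const counter = functWith({ count: 0 }, function (step) {
  this.count += step;
  return this.count;
});

console.log(counter(1)); // 1
console.log(counter(2)); // 3
```

The design point the sketch tries to capture is the one made in the comment: the persistent state lives in a separate object chosen when the function is defined, not in the function's own per-call context.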
(0002412)
PeterWood 30-Jun-2010 16:32 |
In the following console session the word Z is set in a function which is set in an object. It is not defined anywhere else. The word Z is accessible globally. I believe this behaviour is less desirable than such a word being local to the function. If it can't be done in REBOL then it can't be done.
    >> a: make object! [b: func [] [z: 12345]]
    == make object! [ b: make function! [[][z: 12345]] ]
    >> c: make object! [d: func [] [print z]]
    == make object! [ d: make function! [[][print z]] ]
    >> a/b
    == 12345
    >> z
    == 12345
    >> c/d
    12345 |
(0002413)
BrianH 30-Jun-2010 20:59 |
It can be done in REBOL, or rather it can be faked. In a fresh console:
    >> a: make object! [b: funct [] [z: 12345]]
    == make object! [ b: make function! [[ /local z ][z: 12345]] ]
    >> c: make object! [d: func [] [print z]]
    == make object! [ d: make function! [[][print z]] ]
    >> a/b
    == 12345
    >> z
    ** Script error: z has no value
    >> c/d
    ** Script error: z has no value
    ** Where: d
    ** Near: c/d

Console commands are treated as individual user scripts. The user script mode is there to act like R2, or at least fake it, so they do a couple tricks that make it seem like we still have system/words. All words in a script are defined in the user context with INTERN at LOAD time. This means that the first command defines 'z in the user context simply by the 'z being there, as would the second and fourth commands if 'z wasn't already defined. This is why the script error is "z has no value"; if 'z wasn't defined, the error would be like this:

    >> get to-word "x"
    ** Script error: x word is not bound to a context
    ** Where: get
    ** Near: get to-word "x"

If you write your code as a module then you won't get the define-all-words-found behavior. Like this, with the erroneous statements wrapped in TRY so you see all the errors (notice the FUNC in a/b):

    >> do "rebol [type: module] a: make object! [b: func [] [z: 12345]] c: make object! [d: func [] [z]] print try [a/b] print try [z] print try [c/d]"
    Module: "Untitled" Version: none Date: none
    ** Script error: z: word is not bound to a context
    ** Where: b try do applier make catch either either applier do
    ** Near: a/b
    ** Script error: z word is not bound to a context
    ** Where: try do applier make catch either either applier do
    ** Near: try [z] print try [c/d]
    ** Script error: z word is not bound to a context
    ** Where: d try do applier make catch either either applier do
    ** Near: c/d

Note that 'z isn't defined anywhere at all; this is helpful for finding references to undefined words. If you want the defined-locally behavior, use FUNCT:

    >> do "rebol [type: module] a: make object! [b: funct [] [z: 12345]] c: make object! [d: func [] [z]] print try [a/b] print try [z] print try [c/d]"
    Module: "Untitled" Version: none Date: none
    12345
    ** Script error: z word is not bound to a context
    ** Where: try do applier make catch either either applier do
    ** Near: try [z] print try [c/d]
    ** Script error: z word is not bound to a context
    ** Where: d try do applier make catch either either applier do
    ** Near: c/d

The behavior of modules is strict like that because undefined words are considered to be errors, and errors are your friends: You want them to be triggered so you can figure out where to fix your code. Modules are designed to be reliable, for programming in the large (PITL).

The behavior of scripts is loose like that because sometimes you just want to get something done quickly, and because most R3 programmers are familiar with R2 behavior. If we didn't have script mode, porting R2 scripts (and programmers) to R3 would be too much of a hassle, and we would end up with R3 not being adopted by either R2 programmers or programmers of other scripting languages. R3 is both for PITL and its own scripting language.

So what you ask can be done in REBOL, sort of. All you have to do is add a single t character to your script and you get the behavior you desire. Or you can write modules and the behavior that you consider to be bad will be nicely flagged for you by the errors triggered. Best of both worlds :) |
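The difference between the two error messages in this comment ("has no value" versus "word is not bound to a context") can be shown with a small model. The JavaScript below is a sketch under assumptions, not R3's loader: script-like loading is modelled as adding every word seen in the source to a shared context with no value yet, and module-like loading as defining only the top-level set-words. Library words such as `print` are left out of the model, and all function names are hypothetical.

```javascript
const UNSET = Symbol("unset");

// Script-like loading (assumed model): every word that appears in the
// source is added to the shared user context up front, with no value yet.
function loadAsScript(wordsInSource, userContext) {
  for (const name of wordsInSource) {
    if (!userContext.has(name)) userContext.set(name, UNSET);
  }
  return userContext;
}

// Module-like loading (assumed model): only top-level set-words are
// defined; every other word stays out of the context entirely.
function loadAsModule(topLevelSetWords) {
  const moduleContext = new Map();
  for (const name of topLevelSetWords) moduleContext.set(name, UNSET);
  return moduleContext;
}

// Reading a word distinguishes "defined but unset" from "not defined at all".
function read(name, context) {
  if (!context.has(name)) throw new Error(name + " word is not bound to a context");
  const value = context.get(name);
  if (value === UNSET) throw new Error(name + " has no value");
  return value;
}

// Return either the value as a string or the error message.
function describe(name, context) {
  try {
    return String(read(name, context));
  } catch (error) {
    return error.message;
  }
}

const user = loadAsScript(["a", "b", "z"], new Map());
const mod = loadAsModule(["a", "c"]);

console.log(describe("z", user)); // z has no value
console.log(describe("z", mod));  // z word is not bound to a context

user.set("z", 12345);             // a later assignment in script mode succeeds
console.log(describe("z", user)); // 12345
```

In this model the script-mode assignment succeeds only because loading already created a slot for `z`; the module-mode context never had one, which is why the stricter error appears there.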
(0002419)
PeterWood 2-Jul-2010 10:35 |
No, what I am asking for is that the behaviour of scripts be changed so that words "defined" in a function are created in the local context. I know that I can define words to be local with the /local refinement, and I know I could be lazy and use FUNCT. I am asking for this to be considered because I believe that "default to local" will promote better programming habits and is less prone to error.
Though personally I would even be happy to see the same errors thrown when defining words in functions in scripts as in modules, I suspect that would be too strict for many. (Though I do see the advantage in terms of speed.) As for whether changing this behaviour would make porting scripts from R2 to R3 any more of a hassle, I don't think that it would make much difference. I believe that there is not much use of words defined in functions outside of the function. I looked at a handful of scripts in the library; two of them are good examples of why "default to local" would be better behaviour. They both define "global" words in a function that are not used, and do not appear to be designed to be used, outside of the function. They are http://www.rebol.org/view-script.r?script=balls.r and http://www.rebol.org/view-script.r?script=substr.r |
(0002428)
Sunanda 2-Jul-2010 20:10 |
This is not a scope issue so much as a security one.
If I have this code....

    counter: 0
    do http://www.rebol.org/view-script.r?script=balls.r

....Should script-balls.r be able to access, alter, redefine or unset counter? For safe, secure, and flexible programming, I'd like the choice -- via a Security policy setting. |
(0002429)
BrianH 2-Jul-2010 21:34 |
R3's security is based around modules, and hidden or read-only fields if necessary. If you don't want a variable accessed or modified by unauthorized code, hide it. If you define an unhidden, unprotected variable in the script context then yes, absolutely, scripts should be able to access it, because that is what you asked for. If you don't use the built-in protections then we have to assume that you are not doing so on purpose. R3 is definitely a do-what-I-say language, as a rule.
This is all qualified by the knowledge that the built-in protections aren't done yet, and that the missing parts already have tickets for them. |
(0002430)
PeterWood 3-Jul-2010 08:11 |
So the approach to one aspect of security in R3 appears to be unsafe by default. Surely, REBOL would be far better served by a safe by default approach? |
(0002511)
Carl 21-Sep-2010 20:33 |
I've dismissed this ticket. The REBOL model was designed in the best way possible (not somehow left to chance as the above might imply.)
The reality is that REBOL has no "globals" at all, just some default binding rules... that the user can modify if so desired! Non-local variables are bound to a context. That context is selected during the loading process. You can modify that process if you want. I've talked to Douglas Crockford for many years about this issue, and I understand his concern. He is concerned about "escaped variables" -- that is, variables that are unintentionally made non-local. This can actually happen in more than one way. However, unlike most other programming languages, in REBOL we have complete control over how binding happens, so I don't think there's anything to stop a user from designing a function-definition methodology of their own. Along these lines, in R3 we've introduced a mezzanine, FUNCT, that helps by making local definitions the default, rather than the opposite. (Let's see you do that in your other languages where function defining functions aren't even a known concept.) |
(0002544)
PeterWood 27-Sep-2010 17:11 |
I can see that "escaped variables" are not a problem for experienced Rebollers but I believe that they lead beginners blindly down a rocky road.
By the way, I don't think that it's so difficult to write functions that define functions in other dynamic languages these days. Here is a quick and nasty JavaScript function which defines a function. I'm pretty sure that it could be expanded to identify all implicit variable declarations within the code and declare them as locals.

    var i;
    var defineFunction = function defineFunction(args, locals, code) {
      var funcStr = 'var tempFunc = function (';
      for (i = 0; i < args.length; i++) {funcStr += args[i] + ', ';}
      if (args.length > 0) {funcStr = funcStr.slice(0,-2);}
      funcStr += ') { ';
      for (i = 0; i < locals.length; i++) {funcStr += 'var ' + locals[i] + '; ';}
      funcStr += code;
      funcStr += '}';
      eval(funcStr);
      return tempFunc;
    };
    var myFunc = defineFunction([], [], "alert('myFunc says hello');");
    myFunc();
    var myAddFunc = defineFunction(["a", "b"], ["c", "d"], "alert(a + b);");
    myAddFunc(1, 2); |
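The expansion suggested in this comment might look roughly like the sketch below. It is an illustration only, not part of the original post: the helper `findAssignedNames` is hypothetical and uses a naive regular expression, not a parser, so it can be fooled by assignments inside strings or comments. It assumes the code uses plain `name = value` assignments with no `let` or `const` declarations of the same names, and it relies on the standard `Function` constructor in place of `eval`.

```javascript
// Naive scan (an assumption, not a real parser): treat `name =` as an
// assignment unless it is a property access (obj.name = ...) or the `=`
// is part of `==`, `===` or `=>`.
function findAssignedNames(code) {
  const pattern = /(?<![.\w$])([A-Za-z_$][\w$]*)\s*=(?![=>])/g;
  const names = new Set();
  for (const match of code.matchAll(pattern)) {
    names.add(match[1]);
  }
  return [...names];
}

// Build a function whose assigned names are declared with `var` first,
// so plain assignments in `code` stay local to the generated function.
function defineFunction(args, code) {
  const locals = findAssignedNames(code).filter((name) => !args.includes(name));
  const prologue = locals.length > 0 ? "var " + locals.join(", ") + "; " : "";
  return new Function(...args, prologue + code);
}

const add = defineFunction(["a", "b"], "total = a + b; return total;");
console.log(add(1, 2));               // 3
console.log(typeof globalThis.total); // undefined
```

Without the generated `var total;` prologue, the assignment inside the generated function would create a property on the global object, which is the "escaped variable" case discussed in this ticket.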
Date | User | Field | Action | Change |
---|---|---|---|---|
27-Feb-2013 12:29 | Ladislav | Comment : 0002405 | Modified | - |
27-Sep-2010 17:11 | PeterWood | Comment : 0002544 | Added | - |
21-Sep-2010 20:33 | carl | Comment : 0002511 | Added | - |
21-Sep-2010 20:17 | carl | Status | Modified | problem => dismissed |
3-Jul-2010 14:52 | PeterWood | Comment : 0002430 | Modified | - |
3-Jul-2010 08:11 | PeterWood | Comment : 0002430 | Added | - |
2-Jul-2010 21:37 | BrianH | Comment : 0002429 | Modified | - |
2-Jul-2010 21:34 | BrianH | Comment : 0002429 | Added | - |
2-Jul-2010 20:10 | sunanda | Comment : 0002428 | Added | - |
2-Jul-2010 10:35 | PeterWood | Comment : 0002419 | Added | - |
30-Jun-2010 21:26 | BrianH | Comment : 0002413 | Modified | - |
30-Jun-2010 20:59 | BrianH | Comment : 0002413 | Added | - |
30-Jun-2010 16:32 | PeterWood | Comment : 0002412 | Added | - |
30-Jun-2010 05:16 | BrianH | Comment : 0002408 | Added | - |
28-Jun-2010 09:44 | PeterWood | Comment : 0002407 | Modified | - |
28-Jun-2010 09:44 | PeterWood | Comment : 0002407 | Modified | - |
28-Jun-2010 09:43 | PeterWood | Comment : 0002407 | Added | - |
27-Jun-2010 23:42 | Ladislav | Comment : 0002406 | Modified | - |
27-Jun-2010 23:41 | Ladislav | Comment : 0002406 | Modified | - |
27-Jun-2010 23:35 | Ladislav | Comment : 0002406 | Modified | - |
27-Jun-2010 23:34 | Ladislav | Comment : 0002406 | Modified | - |
27-Jun-2010 23:33 | Ladislav | Comment : 0002406 | Modified | - |
27-Jun-2010 23:31 | Ladislav | Comment : 0002406 | Added | - |
27-Jun-2010 23:22 | Ladislav | Comment : 0002405 | Modified | - |
27-Jun-2010 23:22 | Ladislav | Comment : 0002405 | Modified | - |
27-Jun-2010 23:17 | Ladislav | Comment : 0002405 | Modified | - |
27-Jun-2010 23:14 | Ladislav | Comment : 0002405 | Added | - |
26-Jun-2010 21:38 | BrianH | Comment : 0002404 | Modified | - |
26-Jun-2010 21:35 | BrianH | Comment : 0002404 | Added | - |
26-Jun-2010 14:41 | PeterWood | Comment : 0002403 | Added | - |
26-Jun-2010 06:34 | BrianH | Comment : 0002402 | Added | - |
26-Jun-2010 05:55 | BrianH | Priority | Modified | normal => none |
26-Jun-2010 05:55 | BrianH | Status | Modified | submitted => problem |
26-Jun-2010 05:55 | BrianH | Category | Modified | Native => Mezzanine |
26-Jun-2010 05:55 | BrianH | Description | Modified | - |
26-Jun-2010 05:50 | BrianH | Comment : 0002401 | Modified | - |
26-Jun-2010 05:35 | BrianH | Comment : 0002401 | Added | - |
25-Jun-2010 08:44 | Ladislav | Comment : 0002397 | Added | - |
25-Jun-2010 02:23 | PeterWood | Ticket | Added | - |