Friday, 14 November 2008

COMPLETELY off-topic...

...but if this doesn't gob-smack you, you're dead inside:
http://news.bbc.co.uk/1/hi/sci/tech/7725584.stm
I mean, we've known they were there, in theory, at least since Newton; we've known they were there, by observation (star wobbles, etc.), for 20 years. But this is direct observation. We can see them. To me, this is a Neil Armstrong moment.

Sunday, 19 October 2008

Unobtrusiveness

Another link-post today (although again a link to a snippet I've written, so not completely OT) since most of my writing time is going to the Unofficial Prototype & script.aculo.us wiki at the moment: How To - Using Unobtrusive JavaScript

Thursday, 16 October 2008

Minimizing Download Times

Hello all,

That's right, first post since...wow, since April. And it's not even a post, it's sort of a link-post.

I've been doing some work helping build the Prototype user community (moderating the user discussion group, creating an unofficial wiki, that kind of thing) and as part of that I've been doing little mini-articles, much like the ones I expected to do here.

So if you're writing web applications or web pages and you're interested in minimizing the download times for your scripts, check out this article I posted over there: Tip - Minimizing Download Times

Tuesday, 1 April 2008

You must remember 'this'

Of all the tech blogs in all the sites in all the worldwide web, you walk into mine...

If you hang out in JavaScript-oriented newsgroups like these for any length of time, you will eventually see some variation of this question:

Hey, why doesn't this work?
function MyWidget(name)
{
    this.name = name;
    this.element = null;
}
MyWidget.prototype.showName = function()
{
    alert('The name is ' + this.name);
};
MyWidget.prototype.hookElement = function(element)
{
    this.element = element;
    Event.observe(this.element, 'click', this.showName);
};
function test()
{
    var widget;

    widget = new MyWidget('Test Name');
    widget.hookElement(document.getElementById('testDiv'));
}
"testDiv" is a div in the document, and I know that I'm not calling the test() function before the DOM is loaded, so why is it when I click the div I get the message "The name is undefined"?!
(As always, I'm using some convenience syntax in the above for hooking up the event handler.)

The OP (original poster) might even follow on with:
I even tried changing the observe() line to this:
Event.observe(this.element, 'click', function() {
    this.showName();
});
because I heard somewhere that you have to do that, but that's even worse, it causes an error saying this.showName() isn't a function?!
The issue here is that the OP hasn't quite grokked "this" and its special role in the JavaScript world.

I talked a bit about 'this' over here, but I wanted to do a post focussing on the specific pitfall the OP above, like so many of us, has fallen into (forgetting 'this') and how you deal with it.

Let's look at what's wrong with this line first:
Event.observe(this.element, 'click', this.showName); // Wrong
JavaScript doesn't have methods (see link above), and so this.showName just returns a function reference with absolutely no connection to the instance the OP wanted to bind to the element. It's just a function. (Used properly, this is a powerful feature, but in this situation it's causing the OP some trouble.) Recall that showName is defined like this:
MyWidget.prototype.showName = function()
{
alert('The name is ' + this.name);
}
Within the code, 'this' is determined not by where the event handler is set up, but by how the function gets called. Most likely, 'this' will be a reference to the element that was clicked ('testDiv'), because modern browsers use the element related to the event as 'this' within event handlers. Consequently, this.name is undefined unless the element in question happens to have a name attribute.

So to get the intended effect, you have to wrap the call to this.showName() so that 'this' is the MyWidget instance when the code gets executed -- you must remember 'this'. Which is probably what the OP heard about when he tried this:
Event.observe(this.element, 'click', function() {
    this.showName();
}); // Still wrong
This is getting closer, and in fact it would work if we were using a variable to reference the widget rather than 'this', but because it's 'this', we actually still have exactly the same problem we had before: When the event handler gets called, 'this' is the element, not the widget, and so there's no showName() function to call.

So how do we deal with this? Well, here's one approach I've seen to rewriting the hookElement function:
MyWidget.prototype.hookElement = function(element)
{
    var self;

    this.element = element;
    self = this;
    Event.observe(this.element, 'click', function() {
        self.showName();
    });
};
This works because we're no longer using 'this' within the event handler, we're using 'self' (the event handler has access to 'self' because it's a closure; more here). So that solution works. I can't say I like it, though. It just feels...hacky, I guess. But still, it works, and although it looks a bit funny the first time, if you're familiar with the idiom you read right past it thereafter. You just need to be sure the closure isn't unnecessarily preserving some other big amount of data from elsewhere in the function.
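To see the three versions side by side without a browser, here's a minimal sketch. It assumes a hypothetical makeFakeElement() helper whose fireClick() calls each handler with the element as 'this', roughly what a browser (or Prototype's Event.observe()) does; those names are made up for the illustration, and showName() returns its message instead of calling alert() so the results can be checked.

```javascript
// Hypothetical stand-in for a DOM element: fireClick() calls each handler
// with the element as 'this', roughly as a browser does for click events.
function makeFakeElement(id) {
    var handlers = [];
    return {
        id: id,
        observe: function (handler) {
            handlers.push(handler);
        },
        fireClick: function () {
            var results = [];
            for (var i = 0; i < handlers.length; ++i) {
                // call() makes 'this' the element, regardless of where
                // the handler function originally came from.
                results.push(handlers[i].call(this));
            }
            return results;
        }
    };
}

function MyWidget(name) {
    this.name = name;
    this.element = null;
}
MyWidget.prototype.showName = function () {
    // Returns the message instead of alerting it, so it can be checked.
    return 'The name is ' + this.name;
};

var widget = new MyWidget('Test Name');

// Wrong #1: a bare function reference; 'this' ends up being the element.
var el1 = makeFakeElement('div1');
el1.observe(widget.showName);
console.log(el1.fireClick()[0]);   // "The name is undefined"

// Wrong #2: a wrapper that still says 'this'; it's the element again,
// and the element has no showName property.
var el2 = makeFakeElement('div2');
el2.observe(function () {
    return this.showName();
});
var secondResult;
try {
    secondResult = el2.fireClick()[0];
} catch (e) {
    secondResult = (e instanceof TypeError) ? 'TypeError' : 'other error';
}
console.log(secondResult);         // "TypeError"

// Right: capture the widget in 'self' so the closure doesn't depend on 'this'.
MyWidget.prototype.hookElement = function (element) {
    var self = this;
    this.element = element;
    element.observe(function () {
        return self.showName();
    });
};
var el3 = makeFakeElement('div3');
widget.hookElement(el3);
console.log(el3.fireClick()[0]);   // "The name is Test Name"
```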

Personally, though, I prefer using a reusable "binding" function. Many JavaScript toolkits have these (such as Prototype's bind() and bindAsEventListener()), but it's not complicated:
function bind(f, obj)
{
    return function() {
        return f.apply(obj, arguments);
    };
}
This is a function factory: It creates functions that, when called, will call the given function with the given object set as 'this' (using JavaScript's convenient apply() function; insert your own "The fundamental things apply" joke here). Now we can rewrite the OP's hookElement function like so (the only change from the original is the handler passed to Event.observe()):
MyWidget.prototype.hookElement = function(element)
{
    this.element = element;
    Event.observe(this.element, 'click',
        bind(this.showName, this)
    );
};
You might be wondering why we have to specify 'this' twice. Remember that this.showName just returns a function reference, with nothing about the instance (we could replace this.showName in the above with MyWidget.prototype.showName if we liked). If we want bind() to know what instance we want to bind the function to, we have to specify it -- the this at the end.

And that's it! Now the event handler works as the OP expected it to.
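Here's a small runnable sketch of bind() at work outside the browser. The describeClick() function and the plain { type: 'click' } object standing in for an event are hypothetical; the point is that bind() forwards arguments and the return value, and that the 'this' a caller tries to supply is ignored in favor of the bound object.

```javascript
// The bind() helper from the text: returns a new function that calls f
// with obj as 'this', forwarding whatever arguments it receives.
function bind(f, obj) {
    return function () {
        return f.apply(obj, arguments);
    };
}

function MyWidget(name) {
    this.name = name;
}
// Hypothetical handler that takes an argument, to show forwarding.
MyWidget.prototype.describeClick = function (event) {
    return this.name + ' got a ' + event.type;
};

var widget = new MyWidget('Test Name');
var handler = bind(widget.describeClick, widget);

// Simulate a caller (like a browser) that tries to make 'this' the element.
// The wrapper ignores its own 'this'; apply() inside it uses widget instead.
var fakeElement = { id: 'testDiv' };
var result = handler.call(fakeElement, { type: 'click' });
console.log(result);                         // "Test Name got a click"

// widget.describeClick is just a function, so binding the prototype's
// function directly is equivalent.
var sameHandler = bind(MyWidget.prototype.describeClick, widget);
console.log(sameHandler({ type: 'click' })); // "Test Name got a click"
```

Later versions of the language (ECMAScript 5 onward) build this in as Function.prototype.bind(), so widget.describeClick.bind(widget) does the same job as the hand-rolled helper.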

Saturday, 29 March 2008

What's in a name?

Micro-post today, folks:

You see a fair bit of confusion from newbies in mailing lists around the "name" attribute vs. the "id" attribute. (Edit 21 March 2010: Confusion not helped by a bug in Internet Explorer; more here.) For instance, I recently saw a "how do I do this"-style post asking how to deal with a form, with the sample being:

<form name='form1'>
<label><input type='checkbox' id='cb1'> Checkbox 1</label>
</form>
The poster wanted to know how to retrieve the form and submit it (along with some further information) via Prototype's Ajax.Updater. So he wanted to use the serialize() method Prototype adds to form elements when it extends them (e.g., when you retrieve the element using $()).

Unfortunately, the use of "id" and "name" attributes in his example was exactly backward: He wanted to be able to retrieve the form using $(), which is a general-purpose routine that retrieves elements by their unique ID, and then have the form fields submitted -- but the form had no ID and the form field had no name.

Here's the mantra:
Elements have IDs; form fields have names.
IDs are unique in the document; form field names are not (necessarily).

Two further notes:

1. Form fields can also have IDs if you need to refer to their elements in your code (to enable/disable them, etc.), but in terms of sending in a form, fields have names.

2. You don't have to use an ID to get at the element for a form; you can use a name and then get at the form element via document.YourFormName in your JavaScript code. But nowadays we mostly look elements up by their unique IDs; and in the specific case of the question from the poster above, since he was going to want to use $(), he would want an ID.
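To make the mantra concrete, here's a rough sketch of what serialization does with the poster's checkbox. It models fields as plain objects rather than DOM elements, and serializeFields() is a hypothetical helper, not Prototype's serialize(); it follows the HTML rule that a field is only submitted if it has a name (and, for a checkbox, only if it's checked), no matter what its ID is.

```javascript
// Hypothetical model of form fields as plain objects (not DOM elements).
// Roughly mirrors the HTML rule: only fields with a name are submitted,
// and a checkbox is submitted only when it is checked.
function serializeFields(fields) {
    var parts = [];
    for (var i = 0; i < fields.length; ++i) {
        var f = fields[i];
        if (!f.name) {
            continue;   // no name: never submitted, whatever its id
        }
        if (f.type === 'checkbox' && !f.checked) {
            continue;   // unchecked checkboxes aren't submitted
        }
        // A checkbox with no value attribute submits "on".
        var value = (f.value === undefined) ? 'on' : f.value;
        parts.push(encodeURIComponent(f.name) + '=' + encodeURIComponent(value));
    }
    return parts.join('&');
}

// The poster's checkbox: it has an id but no name.
var asPosted = [{ type: 'checkbox', id: 'cb1', checked: true }];

// The fixed version: a name for submission (the id is optional).
var fixed = [{ type: 'checkbox', id: 'cb1', name: 'cb1', checked: true }];

console.log(serializeFields(asPosted)); // "" (nothing to send)
console.log(serializeFields(fixed));    // "cb1=on"
```

In the markup, the fix amounts to <form id='form1'> plus name='cb1' on the checkbox, keeping the checkbox's ID only if code needs to reach that field directly.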

Monday, 24 March 2008

Mythical methods

We frequently talk about JavaScript objects having methods. This is just a convenient myth. JavaScript has functions, but it doesn't have methods. It doesn't need them. Its functions, combined with some syntactic sugar, are more than up to the job.

What is a "method"? I'd have to say that the definition given on Wikipedia is pretty good. As of this writing, it says a method is

...a subroutine that is exclusively associated either with a class...or with an object...
Yeah, JavaScript doesn't have any of those. Granted, it seems to have them. For example:
function Guess(killer, location, weapon)
{
    this.killer = killer;
    this.location = location;
    this.weapon = weapon;
}
Guess.prototype.accuse = function()
{
    alert('It was ' + this.killer +
          ' in the ' + this.location +
          ' with the ' + this.weapon +
          '!');
};

function testGuess()
{
    var mustardStudyLeadPipe;
    var plumHallRope;

    mustardStudyLeadPipe = new Guess(
        'Colonel Mustard',
        'study',
        'lead pipe');
    mustardStudyLeadPipe.accuse();

    plumHallRope = new Guess(
        'Professor Plum',
        'hall',
        'rope');
    plumHallRope.accuse();
}
Running testGuess does indeed show us
"It was Colonel Mustard in the study with the lead pipe!"
and then
"It was Professor Plum in the hall with the rope!"
so that looks an awful lot like accuse is a method of Guess objects.

Except it isn't. Let's add a bit to our testGuess function (the new bits are the accuse variable and the last two lines):
function testGuess()
{
    var mustardStudyLeadPipe;
    var plumHallRope;
    var accuse;

    mustardStudyLeadPipe = new Guess(
        'Colonel Mustard',
        'study',
        'lead pipe');
    mustardStudyLeadPipe.accuse();

    plumHallRope = new Guess(
        'Professor Plum',
        'hall',
        'rope');
    plumHallRope.accuse();

    accuse = mustardStudyLeadPipe.accuse;
    accuse();
}
What does the final call to accuse show? Does it accuse Colonel Mustard? No. All we've done is get a reference to the accuse function into our local variable; there's nothing there (or in the function definition) that refers to the mustardStudyLeadPipe instance, or indeed to anything related to Guess. What this shows will depend on the HTML document in which it's found, but it'll be something along these lines:
"It was undefined in the http://blog.niftysnippets.org with the undefined!"
Why? Because we didn't do anything to define "this" within the function. (I'll come back in a bit to why we'd get a seemingly odd alert that includes a URL.)

There are three main things about JavaScript that make it seem to have methods (leaving aside prototypes and "classes" for the moment): the "this" keyword; the fact that object properties can refer to functions (since functions are, after all, just objects); and the fact that when you call a function using an expression that gets the function reference from an object property (e.g., object.functionName() or object['functionName']()), the object is automatically set as "this" within the function call.

Let's look at each of those.

The "this" keyword: This keyword looks familiar if you're coming to JavaScript from a background in C++, Java, C#, and the like, where "this" within a method is guaranteed to refer to an instance of the class defining the method (or a subclass). And in JavaScript, "this" does refer to an object instance, but that's where the similarity ends. Other than the name being the same and it referencing an object, it bears no relation to the "this" keyword in class-based languages like C++, Java, or C#. In fact, in many ways, "this" is actually just a function argument that is supplied in a non-obvious (but convenient!) way.

Functions as properties: Let's look again at one line from the code above:
mustardStudyLeadPipe.accuse();
If we were talking about that line of code, we'd probably say "...the accuse method of the mustardStudyLeadPipe object...", which is a useful and convenient way to put it. A more painstakingly-geeky-accurate way of putting it, though, would be "...the function referenced by the accuse property of the mustardStudyLeadPipe object..." Not that anyone's going to say that, but that's really what's going on. Objects don't have methods, they have properties; it's just that a property can refer to a function, since functions are objects like everything else.

Functions called via property references get "this" set for them: This is the part that really makes it seem like JavaScript has methods: The "this" reference gets set automagically to the object instance if you call a function via a property reference. Let's look at that call again:
mustardStudyLeadPipe.accuse();
This does three completely distinct things: Firstly, it identifies a function to call by getting the function reference from an object property; secondly, it says to call the function and return its result rather than just get a reference to it (the parentheses do that); and thirdly, it says to use the object in question as "this" within that call. These completely distinct things are combined in that notation for our convenience. Getting the function reference from an object property doesn't link it in any way to the object the property came from (as our accuse test above confirmed); it just gets a reference to the function. The JavaScript engine treats calls of functions just retrieved from object properties as special and sets up "this" accordingly, but that's a feature of the call expression, not of the function being called.

Let's prove that another way:
function testGuess2()
{
    var plumHallRope;
    var fakeGuess;

    plumHallRope = new Guess(
        'Professor Plum',
        'hall',
        'rope');
    plumHallRope.accuse();

    fakeGuess = {};
    fakeGuess.location = 'library';
    fakeGuess.killer = 'Mrs. Peacock';
    fakeGuess.weapon = 'revolver';
    fakeGuess.demo = plumHallRope.accuse;
    fakeGuess.demo();
}
This accuses Professor Plum as before, and then:
"It was Mrs. Peacock in the library with the revolver!"
The exact same function produces the alerts in both cases, it's purely the way it was called that defined "this". The fakeGuess object isn't even really a Guess -- "fakeGuess instanceof Guess" returns false -- but it has all of the properties the function expects (killer, location, weapon), and so it works just fine (this is something we'll come back to in a later post). Essentially, "this" is just a function argument that's passed into the function as an implicit feature of calling a function via a property reference.

We can also do it explicitly. JavaScript gives us the call and apply methods on function instances, with which we can say explicitly what we want "this" to be. (They do the same thing, they just provide different ways to specify the function's arguments.) If you say myfunction.call(myobject), you're saying "call myfunction and use myobject as 'this'", which is what the property-retrieval stuff does implicitly for you. In fact, this:
plumHallRope.accuse();
equates to
plumHallRope.accuse.call(plumHallRope);
Now all three of the parts we identified above are shown distinctly: Getting the function reference from the property (plumHallRope.accuse) is distinct from the fact we're calling it (call) is distinct from what "this" should be ((plumHallRope)). The fact that these are distinct can be made clearer:
var f;
f = plumHallRope.accuse;
f.call(plumHallRope);
Alternatively, here's a more dramatic example:
function testGuess3()
{
    var plumHallRope;
    var fakeGuess;

    plumHallRope = new Guess(
        'Professor Plum',
        'hall',
        'rope');
    plumHallRope.accuse();

    fakeGuess = {};
    fakeGuess.location = 'library';
    fakeGuess.killer = 'Mrs. Peacock';
    fakeGuess.weapon = 'revolver';

    plumHallRope.accuse.call(fakeGuess);
}
Not to bang on about it, but the call at the end accuses Mrs. Peacock, not Professor Plum. The plumHallRope instance was only used to get the function reference, it wasn't used within the function call. In our original testGuess function, we could even do this if we wanted to:
mustardStudyLeadPipe.accuse.call(plumHallRope);
Which accuses Professor Plum. But it's more convenient, isn't it, to say plumHallRope.accuse?
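Since call and apply differ only in how they take the function's arguments, here's a small sketch of both. It uses a hypothetical variant of accuse that takes made-up prefix and suffix arguments and returns its text rather than alerting it.

```javascript
function Guess(killer, location, weapon) {
    this.killer = killer;
    this.location = location;
    this.weapon = weapon;
}
// Hypothetical variant of accuse(): extra arguments, returns a string.
Guess.prototype.accuse = function (prefix, suffix) {
    return prefix + 'It was ' + this.killer +
           ' in the ' + this.location +
           ' with the ' + this.weapon + suffix;
};

var plumHallRope = new Guess('Professor Plum', 'hall', 'rope');
var fakeGuess = { killer: 'Mrs. Peacock', location: 'library', weapon: 'revolver' };

// call(): 'this' first, then the function's arguments one by one.
var viaCall = plumHallRope.accuse.call(fakeGuess, 'Aha! ', '!');

// apply(): 'this' first, then the function's arguments as an array.
var viaApply = plumHallRope.accuse.apply(fakeGuess, ['Aha! ', '!']);

console.log(viaCall);             // "Aha! It was Mrs. Peacock in the library with the revolver!"
console.log(viaApply === viaCall); // true
```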

So if "this" is really just a sort of obscure function argument, why have it? Why not just write everything as global functions and pass in the object to act on as the first argument? We could do that (and in fact I did it for years, as a C programmer), but it's more cumbersome. You have to know what functions are meant to be used with what kinds of objects, there's all sorts of stuff littering up the global namespace, etc., etc. By passing around object references, which have properties on them (usually inherited from their prototype; again the topic of an upcoming post) that reference functions intended for use with them, it's just so much more convenient.

Okay, but what was that about that URL showing up earlier when we called accuse? You'll remember earlier when we ran this code:
    accuse = mustardStudyLeadPipe.accuse;
    accuse();
we got the alert with the URL in it. So why did that happen, and where did the URL come from? It was because we didn't give an object to use for "this", and so the function call got the default, which is the JavaScript global object. In browser implementations, the global object is the window object -- so we were showing the values of window.killer, window.location, and window.weapon. In a typical situation, window.killer and window.weapon will be undefined, and of course window.location is the URL of the document in the window.

I chose that example intentionally, because it's a pitfall people fall into a lot: Losing "this". Usually, people lose "this" in the context of an event handler -- e.g., they have an instance (plumHallRope, perhaps), and want to hook up a "method" on it (say, accuse) to an event (a click handler for a button, maybe?), and so understandably they do something like this (I'm again using convenience syntax, as I describe here):
Event.observe('theButton', 'click', plumHallRope.accuse); // WRONG
The problem is that this just references the function, nothing about its context. Its context will be determined by how it's called. Earlier when we called accuse directly, because we didn't do that via property retrieval or call or apply, "this" was the global object; event handlers like this one will usually get "this" set to the element (although not if the function is called from an old DOM0 onclick attribute or hooked up with IE's attachEvent function). To maintain its context, you have to do something like this:
Event.observe('theButton', 'click', function() { plumHallRope.accuse(); });
The function that gets called by the event handler then turns around and calls the accuse function such that plumHallRope is "this", so you've preserved "this" using a wrapper (which is also a closure; details). (Note that thanks to a bug in Internet Explorer, that may well cause a memory leak for your IE users. Prototype's Event.observe() does some things to minimize the issue; other frameworks will have other helpers along those lines.)

In Conclusion:

I've seen people deride JavaScript for having "fake" methods, but I think that's missing the point. It doesn't have fake methods, it just doesn't have methods at all. What it does have is very powerful, flexible functions and some convenient syntactic sugar that lets us express the 90% case -- calling a function related to an object instance, passing in that object instance as an argument -- in a compact and expressive way, but without limiting our use of the functions involved. By not limiting our use, we can use just about anything as a mixin, we can use duck typing, easily create wrapper objects to enforce or verify contracts, etc.
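As a small illustration of the mixin point, here's a sketch. Greeter, mixin(), and Suspect are hypothetical names; the functions are simply copied across as property references, and they work on their new hosts because "this" is decided at call time.

```javascript
// A hypothetical mixin: a plain object holding functions that only
// assume 'this' has a 'name' property.
var Greeter = {
    greet: function () {
        return 'Hello from ' + this.name;
    }
};

// Copy each function reference onto a target. Nothing about the
// functions ties them to Greeter, so they work wherever they land.
function mixin(target, source) {
    for (var key in source) {
        if (Object.prototype.hasOwnProperty.call(source, key)) {
            target[key] = source[key];
        }
    }
    return target;
}

function Suspect(name) {
    this.name = name;
}
mixin(Suspect.prototype, Greeter);   // every Suspect can now greet()

var plum = new Suspect('Professor Plum');
var room = mixin({ name: 'the library' }, Greeter);   // a plain object works too

console.log(plum.greet());   // "Hello from Professor Plum"
console.log(room.greet());   // "Hello from the library"
```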

Like so many things about the language, it's confusing until you grok just how simple it is, and then you start appreciating that it's simple but powerful.

Convenience Syntax in Examples

This is a "meta" post, i.e., it's about the blog itself.

In many of my posts, I give code examples to illustrate a point. To avoid cluttering the examples up with browser-specific stuff, I'm using a couple of calls from the Prototype library.

Actually, right now, just the one: Event.observe(). This function hooks up an event handler in a way that is consistent across browsers, even Internet Explorer, which is good because IE doesn't support the standard addEventListener() function, and its own attachEvent() function behaves slightly differently: within the handler, "this" is the global object (as it is in a function called from an old DOM0 onclick attribute), whereas with addEventListener it's the element you're hooking the event to. Prototype's Event.observe() ensures that "this" is always the element, even on IE, and also does some housekeeping to try to mitigate an IE bug where it tends to leak memory if you use closures as event handlers (which is incredibly common).

If I use more Prototype stuff, I'll list it here as well, but this is only about trying to keep the examples simple and clear, not about using Prototype or any other specific library...

Tuesday, 18 March 2008

The Horror of Implicit Globals

This code should cause an error, right?

function doSomething()
{
    var x;

    x = 10;
    y = 20;

    alert('x = ' + x);
    alert('y = ' + y);
}
Well, you'd think so, wouldn't you? (There's no declaration for y.) It doesn't, though. It also probably doesn't do what the author intended. Welcome to The Horror of Implicit Globals. The good news is that there's something we can do about it.

Update 2010/03/31: Please see the update at the end of the article, there's good news on this front.
So what did the code really do? Well, in a simple test, it would seem to do what the author intended, in that it shows two alerts:
  • x = 10
  • y = 20
But x and y are completely different; x is a local variable within the function, whereas y is created as a "global variable". Said "global variable" is not declared anywhere at all, it exists purely because it was assigned to by this function. Functions can create global variables willy-nilly with no advance declaration or indeed, with no declaration at all.

This is clearly a Bad Thing, so why would the language designers let it happen? Well, in some sense I think it's a side-effect. To see why we have the possibility of implicit globals, we have to delve into objects and properties.

If you've done any JavaScript stuff with objects, you know that you can assign a property to an object simply by, well, assigning it:
myobject.myprop = 5;
This sets the property myprop on the object myobject, creating a new property if the property didn't exist before. This is simple, it's convenient, and it's part of the power of JavaScript that objects can just acquire properties without any preamble.

But what does this have to do with implicit global variables? Well, you see, JavaScript doesn't have global variables. No, seriously. It has a global object (just the one), and that object has properties. Those properties are what we think of as global (or page-scope) variables. You might think that this is a distinction without a difference, but if it were, the code shown at the beginning of this post would fail because y isn't declared. It doesn't fail, because by assigning y a value, we created a property on the global object.

"Now hang on," you're saying, "there's nothing in that code referencing some kind of 'global object'." Ah, but there is. And that brings us to the scope chain.

I've mentioned the scope chain before. The scope chain is how JavaScript resolves unqualified references: In any given execution context (e.g., global code or a function), there is a chain of objects that the JavaScript engine looks to when resolving an unqualified reference. It first checks to see if the top object has a property with the given name, and uses that property if it does; if not, it checks the next object in the chain, etc. The global object is always the last object in the chain. If the engine gets all the way down to the global object without finding the name, and we're assigning a value to the unqualified reference, the reference is assigned as a property of the global object. And since properties don't have to be declared in advance, voilà, implicit globals.
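The lookup rule can be sketched with plain objects. This is a deliberately simplified, hypothetical model (resolve(), readName(), and assignName() are made-up names, and real engines also check prototype chains and handle many more cases), but it shows why an assignment that falls off the end of the chain ends up on the global object.

```javascript
// Simplified model: a scope chain is an array of plain objects, innermost
// scope first, with the last entry standing in for the global object.
function resolve(chain, name) {
    for (var i = 0; i < chain.length; ++i) {
        if (Object.prototype.hasOwnProperty.call(chain[i], name)) {
            return chain[i];   // first object that has the property wins
        }
    }
    return null;               // not found anywhere in the chain
}

function readName(chain, name) {
    var holder = resolve(chain, name);
    if (holder === null) {
        throw new ReferenceError(name + ' is not defined');
    }
    return holder[name];
}

function assignName(chain, name, value) {
    var holder = resolve(chain, name);
    if (holder === null) {
        // Fell off the end of the chain: non-strict code quietly creates
        // a property on the global object. That's the implicit global.
        holder = chain[chain.length - 1];
    }
    holder[name] = value;
}

// Model of doSomething(): its local scope holds x; the global object is last.
var fakeGlobal = {};
var localScope = { x: undefined };   // 'var x;' creates the property
var chain = [localScope, fakeGlobal];

assignName(chain, 'x', 10);   // found in the local scope
assignName(chain, 'y', 20);   // not found: lands on the global object

console.log(localScope.x, fakeGlobal.y);   // 10 20
console.log('y' in localScope);            // false
```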

We can see this in practice. The JavaScript specification says simply that there is a global object and that it's always the root of the scope chain, but it doesn't say what it is -- which makes sense as JavaScript is a general-purpose language. In browser-based implementations, though, the global object has a name: window. Technically, the symbol window in browser-based JavaScript is a property of the global object that refers to the object itself. I mention this not only because it's useful to know, but because it lets us prove to ourselves that "global variables" are really properties of the global object. Let's add a line to our code from above:
function doSomething()
{
    var x;

    x = 10;
    y = 20;

    alert('x = ' + x);
    alert('y = ' + y);
    alert('window.y = ' + window.y);
}
This starts out by showing the same two alerts we had before:
  • x = 10
  • y = 20
...and then it shows a third:
  • window.y = 20
...because y and window.y both refer to the same property of the same object, the global object.

All of this holds true even if we do declare y at global scope:
var y;
function doSomething()
{
    var x;

    x = 10;
    y = 20;

    alert('x = ' + x);
    alert('y = ' + y);
    alert('window.y = ' + window.y);
}
This shows the same three alerts shown earlier. (There is a very slight technical difference between declared and undeclared globals that relates to the delete operator, but the distinction isn't important here; it's just a flag on the property.)

Now, some readers will be thinking "Cool! This means I don't have to have 'var' statements for my globals! I can save a bit of space on the downloaded script files!" Well, true, you could. It's an awfully bad idea, though. Not only would it be a bit unfriendly to anyone else trying to read your script (I suppose you could address that somewhat with comments that get stripped out before download), but it would also mean you couldn't use any lint tools, and you really, really want to. Because the ramification of all of this is that a simple typo, perhaps:
thigny = 10;
instead of
thingy = 10;
can create a new global variable in your page, introducing a bug that's awfully hard to find. Fortunately, though, people have created various tools to do lint checking on JavaScript, tools that can find that error for you before you even get to the testing stage, much less production.

And so there we are, the horror of implicit globals -- and the relief of knowing that we have a defense against them!

Update 2010/03/31: Great news for anyone who's been hit by the Horror of Implicit Globals. The new ECMAScript 5th edition specification is out, and it introduces "strict mode." One of the (several) things strict mode does is prevent the very thing this article warns about: Implicit globals. If we were using strict mode, the function at the beginning of this page would throw a ReferenceError when it tried to assign to y, rather than quietly introducing a very subtle bug. Result!
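Here's a quick sketch of strict mode catching the undeclared y, using a function-level 'use strict' directive (an ES5 feature; the name doSomethingStrict is just for illustration). It throws a ReferenceError at the point where the assignment executes.

```javascript
function doSomethingStrict() {
    'use strict';   // opt this function into ES5 strict mode
    var x;

    x = 10;
    y = 20;         // no declaration for y: throws ReferenceError here

    return 'x = ' + x + ', y = ' + y;   // never reached
}

var outcome;
try {
    outcome = doSomethingStrict();
} catch (e) {
    outcome = (e instanceof ReferenceError) ? 'ReferenceError' : 'unexpected: ' + e;
}
console.log(outcome);   // "ReferenceError" (assuming nothing else created a global y)
```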

Monday, 17 March 2008

Now at blog.niftysnippets.org

Just briefly, this blog is now at http://blog.niftysnippets.org, rather than http://niftysnippets.blogspot.com. It's still hosted by Blogger (for now), and Blogger is kind enough to forward you from the old address, but if for some reason you're a regular reader of this blog, it's worth updating your Atom/RSS links just in case someday I end up hosting it myself or moving to Wordpress or something...

When you absolutely, positively need new scope

Just a quick one today, one that in many ways is a solution in search of a problem... ;-)

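As you probably know, in most of the languages with vaguely C-like syntax (C, C++, Java, C#, ...), a new block introduces new scope. E.g., this C++ code:

#include <string>

std::string myCppFunction()
{
    std::string s;

    s = "outer string";

    {
        std::string s;

        s = "inner string";

        // (Presumably do something with 's' in here)
    }

    return s;
}
...while strange, is perfectly valid. It returns a std::string with the text "outer string". The s variable declared within the inner block is scoped only to the inner block. (Java and C# also give blocks their own scope, but they're stricter: they won't let an inner block redeclare a local variable that's already in scope, so this particular example wouldn't compile there.)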

A similar-looking JavaScript function has a completely different result:
function myJavaScriptMethod()
{
    var s;

    s = "outer string";

    {
        var s;

        s = "inner string";

        // (Presumably do something with 's' in here)
    }

    return s;
}
It's still valid, but it returns the string "inner string" rather than "outer string". Blocks do not introduce new scope in JavaScript, only functions do that. (If you're thinking that the second var declaration should be an error, well, I agree -- but it isn't, details here.)

But what if you absolutely, positively need new scope within a function? It's actually pretty easy to achieve; just use a closure. If I really needed my JavaScript function above to behave like the block-scoped version, I could rewrite it like this:
function myJavaScriptMethodUpdated()
{
    var s;

    s = "outer string";

    (function()
    {
        var s;

        s = "inner string";

        // (Presumably do something with 's' in here)
    })();

    return s;
}
Looks weird, eh? What I've done is create an anonymous inline function and then immediately execute it. Since it's a closure (all JavaScript functions are closures), it has access to all of the arguments and variables defined in its containing function except for the ones it's masked by declaring its own (which it's done with s).

Naturally, as this is defining and calling a function, there's a performance cost, although setting up a function call should be fairly quick in any decent JavaScript implementation; and again, you're only doing this when you absolutely, positively need new scope, right?

So, okay, how is this useful? Well, as I say, this may well be a solution in search of a problem. It mostly gave me an excuse to highlight the fact that blocks don't introduce new scope in JavaScript. And frankly, in most cases if you need to set up new scope within a function like that, it probably means you haven't broken things up into small enough pieces; if you do that, you'll probably find that you don't absolutely, positively need new scope because your pieces are small enough that things are scoped properly.

That said, though, think in terms of sandboxing and namespacing -- if you wrap an entire external library in a closure and then execute that closure, any globally-declared vars in that library become local variables within the closure and can't conflict with others defined at global scope. So there may be something useful there, although it's far from a real sandboxing or namespacing solution...
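Here's a minimal sketch of that wrapping idea, with a made-up NiftyLib standing in for the library. Its top-level vars and functions become locals of the immediately-executed function, and only what it returns is visible outside.

```javascript
// Hypothetical "library" code wrapped in a function that runs immediately.
var NiftyLib = (function () {
    var counter = 0;              // would otherwise have been a global

    function helper(text) {       // would otherwise have been a global function
        return '[' + text + ']';
    }

    // The one thing the wrapper chooses to expose.
    return {
        tag: function (text) {
            ++counter;
            return helper(text) + ' #' + counter;
        }
    };
})();

console.log(NiftyLib.tag('hello'));   // "[hello] #1"
console.log(typeof helper);           // "undefined": helper didn't leak
console.log(typeof counter);          // "undefined": neither did counter
```

Note that an assignment to an undeclared variable inside the wrapper would still create a global, which is one reason this is far from a real sandbox.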

Sunday, 16 March 2008

A simple point about design

A good friend of mine pointed me at this comic and I just had to share it. I've rarely seen this point put more...simply.

Saturday, 15 March 2008

Closures by example

In an earlier post, I promised a follow-up with a few examples of closures. Before we get to the examples, a quick note: To avoid cluttering things up with browser-specific stuff, I'm using the Event.observe() method from Prototype. Event.observe() hooks up an event handler, allowing for differences between browser implementations (see the link for more detail).

I have to say that I found this post to be a real challenge, and it took me a while to figure out why: I kept trying to boil things down to their essence, and the fact is that if you do that, you end up with a single example, one that demonstrates that a closure function has access to intrinsic data. And let's face it, that's not very interesting. :-)

So rather than boil things down, I'll give a few examples of where closures frequently get used, even though they all pretty much demonstrate the same fundamental point (about the functions having intrinsic data).

Okay, to the examples:

  1. A Simple Bound Event Handler
  2. Enduring References
  3. Enduring References to What?
  4. Private Properties
  5. Callbacks
  6. The Inadvertent Closure (an anti-pattern)
  7. Your Examples

#1: A Simple Bound Event Handler

One of the most common uses of closures is to bind a function and some data to an event handler:
function wireUpMessage(element, msg)
{
    Event.observe(
        element,
        'click',
        function()
        {
            alert(msg);
        }
    );
}
This wireUpMessage function creates a closure (the anonymous function passed to Event.observe) and hooks it up to the "click" event of the given element, having it show the given message. The closure keeps a reference to the arguments and variables in scope where it was defined, so it has access to the msg argument we passed to wireUpMessage, even though that function has returned.

Assuming we have a button with the ID "btnSayHey", we can use this as follows:
wireUpMessage('btnSayHey', 'Hey there');
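
To run this outside a browser, here's a sketch that swaps Event.observe and alert for tiny stand-ins of my own (`observe`, `fire`, `handlers`, and `messages` are hypothetical, not Prototype's API). The point is the same: the handler still sees msg long after wireUpMessage has returned.

```javascript
// Hypothetical stand-ins for Event.observe and for clicking a button
var handlers = {};
var messages = [];   // collects what alert() would have shown

function observe(id, eventName, handler)
{
    handlers[id + ':' + eventName] = handler;
}
function fire(id, eventName)
{
    handlers[id + ':' + eventName]();
}

function wireUpMessage(id, msg)
{
    // The anonymous function is a closure over wireUpMessage's 'msg' argument
    observe(id, 'click', function()
    {
        messages.push(msg);
    });
}

wireUpMessage('btnSayHey', 'Hey there');
fire('btnSayHey', 'click');   // wireUpMessage returned long ago...
console.log(messages[0]);     // ...but this still logs "Hey there"
```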

#2: Enduring References

You could easily get the idea from the previous example that the closure we created somehow had the text "Hey there" bound into it literally. That's not true. It's the reference to the context containing the msg argument that's bound to the closure, not the value of that argument. The msg argument's value is evaluated when the closure is executed, not when it's defined. This is important, powerful, and frequently misunderstood. ;-)
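
A small sketch of that (`makeGreeter` and `greet` are made-up names): the closure is created while msg is "Hello", but because it reads the variable when it runs, it sees the later assignment.

```javascript
function makeGreeter()
{
    var msg = "Hello";
    var greet = function()
    {
        return msg;     // looked up when greet runs, not when it's created
    };
    msg = "Goodbye";    // changed after the closure was created
    return greet;
}

console.log(makeGreeter()());   // "Goodbye"
```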

Consider this:
function setupCounterButtons(startBtnName, showBtnName, stopBtnName)
{
    var counter;
    var intid;

    counter = 0;
    intid = undefined;

    function startCounter()
    {
        stopCounter();
        counter = 0;
        intid = window.setInterval(function() { ++counter; }, 200);
    }
    function stopCounter()
    {
        if (intid !== undefined)
        {
            window.clearInterval(intid);
            intid = undefined;
        }
    }
    function showCounter()
    {
        var msg;

        msg = "Counter = " + counter + " (";
        if (intid === undefined)
        {
            msg += "not ";
        }
        msg += "running)";
        alert(msg);
    }

    Event.observe(
        document.getElementById(startBtnName),
        'click',
        startCounter
    );
    Event.observe(
        document.getElementById(showBtnName),
        'click',
        showCounter
    );
    Event.observe(
        document.getElementById(stopBtnName),
        'click',
        stopCounter
    );
}
This function has the local variables counter and intid, three named functions (which are closures), and a fourth, anonymous closure that we pass into window.setInterval. The named functions are: startCounter, which starts a 200ms repeating update of the counter local variable; showCounter, which displays the current counter value; and stopCounter, which stops the 200ms update. The setup function hooks these up to the buttons whose IDs we pass in. So, for instance, we can hook up the buttons 'btnStartCounter1', 'btnShowCounter1', and 'btnStopCounter1' with this code:
setupCounterButtons('btnStartCounter1', 'btnShowCounter1', 'btnStopCounter1');

This demonstrates two important things: Firstly, the closures have an ongoing reference to the counter variable, not a copy of its value at the time they were defined. Secondly, all of the closures are referencing the same context: When startCounter sets the counter and intid values, showCounter sees those changes; when the closure we've scheduled via window.setInterval updates counter, we see that update. All closures defined in the same context share access to that same context.
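
Here's a DOM-free sketch of the same structure, assuming we return the closures instead of hooking them to buttons, and call tick by hand instead of via window.setInterval (`makeCounter`, `tick`, `show`, and `reset` are hypothetical names). All three closures share the one counter variable.

```javascript
function makeCounter()
{
    var counter = 0;   // shared by all three closures below

    function tick()  { ++counter; }                      // plays the setInterval callback's role
    function show()  { return "Counter = " + counter; }
    function reset() { counter = 0; }

    return { tick: tick, show: show, reset: reset };
}

var c = makeCounter();
c.tick();
c.tick();
console.log(c.show());   // "Counter = 2": show sees tick's updates
c.reset();
console.log(c.show());   // "Counter = 0": and reset's
```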

#3: Enduring References to What?

Look again at the previous example and think for a moment about what it is that the closures have access to, which endures even after the setup function has completed. Is it something unique attached to the setup function? Well, if that were true, what would happen if I called the function again and hooked it up to different buttons; would those buttons control and access the same counter that the first set of buttons do? If your impulse is to say "yes," consider that counter is a local variable within the function. Local variables aren't specific to the function for all eternity, right? They belong to one specific call to that function. So if we call it again...yup, that's it, we get a new context, and the closures created during that call access that new context, not the previous one.

And so if we hook up a new set of buttons:
setupCounterButtons('btnStartCounter2', 'btnShowCounter2', 'btnStopCounter2');
We get a second counter. Is there any interaction between the counter controlled by these buttons and the one controlled by the buttons in the previous example?

No, there isn't any, because they're referencing different counters.
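
The same thing in miniature, using a hypothetical makeTally factory: each call gets its own count variable, so the two tallies never see each other's updates.

```javascript
function makeTally()
{
    var count = 0;          // a fresh variable for every call to makeTally
    return function()
    {
        return ++count;
    };
}

var first = makeTally();
var second = makeTally();
first();
first();
console.log(first());    // 3
console.log(second());   // 1: a separate context, untouched by 'first'
```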

#4: Private Properties

Lots of JavaScript apps these days make extensive use of object orientation. One key OOP principle is information hiding, and one good reason for it is to prevent an object instance from getting into an invalid state. Frequently this is done by having private data members and only allowing access to them via accessor methods that can reject invalid values. Since all properties of JavaScript objects are public, how can we have private data members? The answer, of course, is to use closures, much as we did with the counters above. (Crockford is probably the first to document how you can do this, here.)

Here's an example, a Circle class that ensures that its radius is never allowed to become negative:
function Circle(radius)
{
    this.setRadius = function(r)
    {
        if (r < 0)
        {
            throw "Radius cannot be less than zero.";
        }
        radius = r;
    };
    this.getRadius = function()
    {
        return radius;
    };

    this.setRadius(radius);
}
Note that we don't have a "radius" property on this object at all. This code:
var c;
c = new Circle(10);
alert("Radius: " + c.radius); // undefined!!
Shows "Radius: undefined" because the property simply doesn't exist.

Instead, the setRadius and getRadius methods of the object are closures, both of which have a reference to the context in which they were defined, and they share that context as they were defined in the same scope. So rather than having a "radius" property, they simply use the radius parameter given to the constructor, which they keep a reference to even after the constructor has returned. Consequently, this code works just fine:
var c;
c = new Circle(10);
alert("Radius before update: " + c.getRadius());
c.setRadius(20);
alert("Radius after update: " + c.getRadius());
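
And a sketch of the guard doing its job, using the Circle constructor from above (repeated so the snippet stands alone, with console.log in place of alert): a negative radius is rejected, and assigning to a c.radius property doesn't touch the private value at all.

```javascript
function Circle(radius)
{
    this.setRadius = function(r)
    {
        if (r < 0)
        {
            throw "Radius cannot be less than zero.";
        }
        radius = r;   // updates the constructor's 'radius' parameter
    };
    this.getRadius = function()
    {
        return radius;
    };

    this.setRadius(radius);
}

var c = new Circle(10);
try
{
    c.setRadius(-5);
}
catch (e)
{
    console.log("Rejected: " + e);
}
console.log(c.getRadius());   // still 10
c.radius = -5;                // creates an unrelated public property...
console.log(c.getRadius());   // ...so getRadius() still returns 10
```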

(I should mention that there is a downside to this. When you do this, all Circle objects [for example] have their own copies of the setRadius and getRadius functions, rather than sharing copies on an underlying Circle.prototype object. That's a whole different topic, though.)
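
A quick way to see that downside is to compare the methods of two instances. ClosureCircle and ProtoCircle are hypothetical names for this sketch; note that the prototype version shares one function but has to expose radius as a public property to do it.

```javascript
function ClosureCircle(radius)
{
    this.getRadius = function() { return radius; };   // a new function per instance
}

function ProtoCircle(radius)
{
    this.radius = radius;                             // public, so no privacy
}
ProtoCircle.prototype.getRadius = function() { return this.radius; };

var a = new ClosureCircle(1), b = new ClosureCircle(2);
var p = new ProtoCircle(1), q = new ProtoCircle(2);

console.log(a.getRadius === b.getRadius);   // false: two separate copies
console.log(p.getRadius === q.getRadius);   // true: one shared function
```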

#5: Callbacks

Closures get used as callbacks a lot. Of course, event handlers are callbacks and so we've already talked about this, but it's useful to remember non-event-handler callbacks.

One very common example is container objects that offer a method (usually called "each" or "forEach") that will call a callback once for each contained object. Prototype provides this (Enumerable.each) on arrays and several other objects; Dojo has something similar (dojo.forEach); JavaScript 1.6 defines this for arrays (Array.prototype.forEach). So if we have an array (say) of Person objects, we can act on each element like this (using Prototype syntax):
personArray.each(
    function(person)
    {
        // Do something with the 'person' object
    }
);
Suppose we wanted to build a sublist of only the people who are 65 and over:
var sixtyFivePlus;
sixtyFivePlus = [];
personArray.each(
    function(person)
    {
        if (person.age >= 65)
        {
            sixtyFivePlus.push(person);
        }
    }
);
That uses a closure to add each 65-and-over person to the sixtyFivePlus array.
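
Without Prototype, the same filter can be written with the standard Array.prototype.forEach; the sample people below are made up for illustration. The callback is a closure over sixtyFivePlus, just as in the Prototype version.

```javascript
var personArray = [
    { name: "Alice", age: 70 },
    { name: "Bob",   age: 42 },
    { name: "Carol", age: 65 }
];

var sixtyFivePlus = [];
personArray.forEach(function(person)
{
    // 'sixtyFivePlus' isn't a parameter; the callback reaches it via its scope
    if (person.age >= 65)
    {
        sixtyFivePlus.push(person);
    }
});

console.log(sixtyFivePlus.map(function(p) { return p.name; }).join(", "));
// "Alice, Carol"
```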

In practice, closures are used like this in modern JavaScript all over the place.

#6: The Inadvertent Closure (an anti-pattern)

In Example #1, I said that a very common use of closures was to bind a function and some data to an event handler. Unfortunately, while it may literally be true that that's a very common use, I suspect many times it's not the intent of the programmer creating the closure. I suspect much of the time, the programmer just wanted to bind the function, not any data, to the event handler, but didn't realize they were creating a closure and binding data as well.

Unless a page author is using DOM Level 0 event handling (e.g., onclick attributes in the HTML markup), usually there's a setup function called on page load that hooks up event handlers to the buttons and such using some analog of the Event.observe function I've been using in this post. Sometimes those setup functions are dealing with large data sets, perhaps a big JSON or XML document retrieved via an XMLHttpRequest query, referencing that data with local variables. Then the event handlers are defined within the setup function, which means they're closures, which means...right, they keep a reference to the setup function's context, and so the large data set is retained in memory.

Now, if that's on purpose -- the page is referring back to it periodically, etc. -- that's fine. But if the data set was just to be used for setup, it's a waste of memory. I call this the "Inadvertent Closure" anti-pattern.

Here's an example:
function mySetup()
{
    var setupData;
    var n;

    // Set 'setupData' to some large amount of data used only for setup;
    // the below is just a contrived array.
    setupData = [];
    for (n = 0; n < 1000; ++n)
    {
        setupData[n] = "Item " + n;
    }

    // (Presumably use the setup data for something here)

    // Hook up some event handlers
    Event.observe(
        'someButton',
        'click',
        function()
        {
            // Do something interesting
        }
    );
    Event.observe(
        'someOtherButton',
        'click',
        function()
        {
            // Do something else interesting
        }
    );
    // (Etc.)
}
Here, a bunch of setup data is allocated and referenced via the setupData array, and then we hook up a couple of event handlers that don't need a reference to the setupData array.

Well, they may not need it, but they have it, which means that all of that data is sitting around eating up memory unnecessarily.

The good news is that it's easy to fix. There are at least two ways to fix it; the way I don't like, and the way I like. ;-)

The way I don't like: Just "null out" the big data set when you're done with it; in our example, we'd just add this line to the end of the mySetup function:
setupData = undefined;
Personally, though, I prefer modularity: Break out the bit that deals with the big data set into its own function, and the event handler setup into another:
function myBetterSetup()
{
    doTheBigDataThing();
    setupEventHandlers();
}
function doTheBigDataThing()
{
    var setupData;
    var n;

    // Set 'setupData' to some large amount of data used only for setup;
    // the below is just a contrived array.
    setupData = [];
    for (n = 0; n < 1000; ++n)
    {
        setupData[n] = "Item " + n;
    }

    // (Presumably use the setup data for something here)
}
function setupEventHandlers()
{
    Event.observe(
        'someButton',
        'click',
        function()
        {
            // Do something interesting
        }
    );
    Event.observe(
        'someOtherButton',
        'click',
        function()
        {
            // Do something else interesting
        }
    );
    // (Etc.)
}
This gives us a clear separation of what we're doing and, as a nice side-effect, prevents the closures from keeping our data set around unnecessarily.

At this point, a couple of you might be thinking "OMG! But that means any time I'm dealing with a big temporary data set, I need to be sure no closures are defined within that context! What if I want to act on the data set with an iteration function like in Example #5?!" Don't worry. Remember that the context is kept around because it has a reference from the closure; when the closure is released and cleaned up by the garbage collector, the context is also released and can be cleaned up by the GC. This only matters if you're keeping an enduring reference to the closure, as in the case of event handlers.

#7 and on: Your Examples!

I've tried to give some overview of closures based on examples of usage, and I hope it's been useful, but I've really only scratched the surface here; what are your examples? Post away!

Monday, 3 March 2008

Poor misunderstood 'var'

It seems most programmers coming to JavaScript from C, C++, Java, and the like equate the var statement with variable declaration statements in the languages they come from and use it the same way. And at some casual level that's reasonable; but it can lead you down a misleading path...

Consider this code:

function foo()
{
    var ar;

    // ...do some stuff, create an array in 'ar'...

    for (var index = 0; index < ar.length; ++index)
    {
        doSomethingWith(ar[index]);
    }
}
This is a common idiom, but a misleading one. You might think that index is only defined within the for loop (that's certainly the impression we're giving in the code). But it's not true: In fact, index is defined throughout the function -- within the loop, outside the loop, above the loop, and below the loop. The var statement defines a variable within the current scope (all of it, not just "from here on"), and unlike some other languages, in JavaScript blocks don't have any effect on scope; only functions introduce a new scope.
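
A small sketch that makes this visible (countItems is a hypothetical function): index can be read before the loop without an error, and it still holds its final value after the loop ends.

```javascript
function countItems(ar)
{
    var before = index;   // no ReferenceError: 'index' already exists (undefined)

    for (var index = 0; index < ar.length; ++index)
    {
        // ...work with ar[index]...
    }

    return { before: before, after: index };   // 'index' is still in scope here
}

var result = countItems(["a", "b", "c"]);
console.log(result.before);   // undefined
console.log(result.after);    // 3
```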

Consequently, the above function can be written as it is above, but also with the index declaration...

...at the top:
function foo()
{
    var ar;
    var index;

    // ...do some stuff, create an array in 'ar'...

    for (index = 0; index < ar.length; ++index)
    {
        doSomethingWith(ar[index]);
    }
}
...at the bottom:
function foo()
{
    var ar;

    // ...do some stuff, create an array in 'ar'...

    for (index = 0; index < ar.length; ++index)
    {
        doSomethingWith(ar[index]);
    }

    var index;
}
...anywhere in the middle:
function foo()
{
    var ar;

    // ...do some stuff, create an array in 'ar'...

    for (index = 0; index < ar.length; ++index)
    {
        var index;
        doSomethingWith(ar[index]);
    }
}
...or even all of them!
function foo()
{
    var ar;
    var index;

    // ...do some stuff, create an array in 'ar'...

    for (var index = 0; index < ar.length; ++index)
    {
        doSomethingWith(ar[index]);
    }

    var index;
}
We can get away with that last one because a var statement defining a variable that already exists in the current scope does not replace the variable (this is what keeps you from accidentally masking your function's arguments, or even the arguments array that's provided for you).
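
A minimal check of that claim (redeclare is a hypothetical function): a var naming an existing parameter keeps the argument's value, and a second var for an existing variable, without an initializer, leaves its value alone.

```javascript
function redeclare(x)
{
    var x;           // names the existing parameter; the argument is kept
    var y = 1;
    var y;           // redeclaring without an initializer: y is still 1
    return [x, y];
}

console.log(redeclare(42).join(", "));   // "42, 1"
```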

This seems like an odd way to define the var statement until you get into the plumbing of JavaScript and how it sets up calls to functions. You can get into some of that by reading my earlier post, Closures are not complicated, but the net effect of the plumbing is that all var statements are treated as though they were at the top of the function (if they have initializers, those become assignments and stay where they are).
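
For example (hoisted is a hypothetical function), the declaration below behaves as though it were at the top of the function, but its "= 5" assignment still happens on the line where it's written:

```javascript
function hoisted()
{
    var seen = [];
    seen.push(value);   // undefined: declared (hoisted) but not yet assigned
    var value = 5;      // acts like "value = 5;" at this point
    seen.push(value);   // 5
    return seen;
}

var r = hoisted();
console.log(r[0], r[1]);   // undefined 5
```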

So does that mean that the common idiom of declaring an indexer within the loop statement is "wrong"? Well, that's a matter of perspective, and the older I get and the more experience I accumulate, the less I think in terms of absolutes like right and wrong. The language spec allows it, so in that sense it's not "wrong". In some ways, it's sort of a shorthand way of telling the next person reading the code that you're going to use it for the loop (and only for the loop, right?), so in that sense perhaps it's not "wrong".

But the further your code gets from expressing what's really happening, the easier it is for someone reading the code later (perhaps you!) to get the wrong end of the stick and introduce a problem. For example, suppose you have a 30-some-odd-line function and the loop appears within the body of a conditional about two-thirds of the way down:
function foo(someArray)
{
    var thingy;
    var otherThingy;

    // ...20 lines of code...

    if (thingy > otherThingy)
    {
        for (var index = 0; index < someArray.length; ++index)
        {
            doSomethingWith(someArray[index]);
        }
    }

    // ...10 more lines of code...
}
Six months after you write this, Mike edits the function and needs to remember the index of something at the top so he can do something with it at the bottom; he declares an "index" variable, sets index at the top, and then uses it at the bottom, having missed the loop:
function foo(someArray)
{
    var thingy;
    var otherThingy;
    var index;

    index = findSomething(someArray);

    // ...20 lines of code...

    if (thingy > otherThingy)
    {
        for (var index = 0; index < someArray.length; ++index)
        {
            doSomethingWith(someArray[index]);
        }
    }

    // ...10 more lines of code...

    restoreSomething(someArray, index);
}
Mike's introduced a bug, an irritating, intermittent bug. Sometimes the restoreSomething call at the end fails for some reason; not always, mind, but sometimes. (Because index gets set by the loop, but only when thingy > otherThingy.)
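
Here's a runnable reduction of Mike's bug. The findSomething and restoreSomething calls are replaced by made-up stand-ins (an indexOf lookup and simply returning index), and a hypothetical flag parameter stands in for the thingy > otherThingy test:

```javascript
function foo(someArray, flag)
{
    var index;

    index = someArray.indexOf("target");   // stand-in for findSomething(someArray)

    if (flag)                               // stand-in for thingy > otherThingy
    {
        for (var index = 0; index < someArray.length; ++index)
        {
            // doSomethingWith(someArray[index]);
        }
    }

    return index;                           // stand-in for restoreSomething(someArray, index)
}

var data = ["a", "target", "c"];
console.log(foo(data, false));   // 1: the index Mike expected
console.log(foo(data, true));    // 3: clobbered by the loop
```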

Obviously, this bug could have been avoided if Mike had read through the entire function carefully before making his mods. Or if you'd chosen a different name for your index variable (in hopes of reducing the odds of Mike using it). Or it could have been caught by thorough unit tests that explore all conditions (and then Mike would have to go back and fix it).

But let's throw Mike a bone, eh? If we declare the variable in the text in the same place it's defined by the interpreter at runtime, we help him avoid making the mistake in the first place. And we like Mike, we don't want to trip him up...right?

Regardless of your decision about how to write your code, though, understanding what var is really doing can help you get that code doing what you want it to do.

Friday, 29 February 2008

Closures are not complicated

Note December 2010: The terminology I attribute to the ECMAScript spec below was from the then-current 3rd edition. The current 5th edition spec (there was no 4th edition) uses different terminology. At some point I'll get around to updating it.

Closures seem to frighten people a bit. Partially I suspect this is down to the academic nature of the term "closure". It sounds like something Californians look to achieve, not something to help you write software. (Disclosure: I mostly grew up in San Francisco, so I'm allowed to poke fun at my fellow Californians. Don't Try This At Home.) I suspect it's also because if you don't know the rules for them, they seem mysterious, and the mysterious is often frightening.

First off, what is a closure? I'll leave a thorough definition to my academic betters, but let's put it this way for my fellow plodders and me: A closure is a function with data intrinsically bound to it.

Consider this code:

function updateDisplay(panel, contentId)
{
    var url;

    url = 'getcontent?id=' + contentId;
    jsonRequest(
        url,
        function(resp)
        {
            if (resp.err)
            {
                panel.addClassName('error');
                panel.update(
                    'Error retrieving content ID '
                    + contentId
                    + ' from "'
                    + url
                    + '", error: '
                    + resp.err);
            }
            else
            {
                panel.removeClassName('error');
                panel.update(resp.json.content);
            }
        }
    );
}

This updates a panel element based on the results of a call to retrieve some JSON data. The call to the jsonRequest function accepts two parameters: The URL that will return the JSON data, and a callback function to trigger when the request completes (one way or the other). If the data was returned successfully, the callback sets the content of the panel from the JSON data and makes sure that the "error" CSS class is not set on the element; if there's an error, it shows details about the error and sets the "error" CSS class. (In this example, we happen to be using Prototype to extend our panel element with the nifty class name and update methods, but that's as much to keep the example simple as anything else.)

The callback above is an example of a closure: A function with data bound to it, in this case the panel and contentId arguments we passed into the updateDisplay function, and also updateDisplay's url local variable. It's useful to have this information bound to the callback, because the jsonRequest function doesn't know anything about panel or contentId, it just knows about triggering the callback — but the callback has the information it needs to do its job.
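
To see that without a server, here's a sketch in which fakeJsonRequest is a hypothetical synchronous stand-in for the jsonRequest function (which the post doesn't define), and the panel is a plain object with a text field rather than a Prototype-extended element:

```javascript
// Hypothetical stand-in: knows nothing about panels, it just calls back
function fakeJsonRequest(url, callback)
{
    callback({ err: null, json: { content: "Content for " + url } });
}

function updateDisplay(panel, contentId)
{
    var url = 'getcontent?id=' + contentId;

    fakeJsonRequest(url, function(resp)
    {
        // 'panel', 'contentId', and 'url' come from updateDisplay's scope
        if (resp.err)
        {
            panel.text = 'Error retrieving content ID ' + contentId + ': ' + resp.err;
        }
        else
        {
            panel.text = resp.json.content;
        }
    });
}

var leftPanel = { text: "" };
updateDisplay(leftPanel, 123);
console.log(leftPanel.text);   // "Content for getcontent?id=123"
```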

Magic? Nah. I can't speak for closures in other languages, but there's nothing complicated about closures in JavaScript. Seriously. They're dead easy. You need to know, say, three things and you're good to go. Well, four things, but that's only because someone has been misinforming you. Here's a quick list of those four things, after which we'll delve in a bit deeper, and then I'll tell you something at the end that will surprise you for three seconds before you go "Oh, of course":

  1. In JavaScript, everything is a data structure, even functions, and (critically) even the context in which functions run. One aspect of that context is a "call object" (that's Flanagan's term; the ECMA specification uses "activation object". I'll use "call object" because I find it clearer and, okay, because it's less typing).
  2. JavaScript variable names are resolved on the basis of a "scope chain", which includes (among other things) the global object and all of the current call objects in scope.
  3. Functions in JavaScript are lexically scoped — for the plodders like me out there, that means that the things a function can access (the things "in scope" for it) are determined by the context in which the function is defined, not the context in which it's executed.
  4. Closures do not create memory leaks. Someone probably told you that they did at some point (perhaps they thought that's what Microsoft was trying to say here).
Okay, let's see how those things come together to make closures easy.

#1: Everything is an object

When we call a JavaScript function, the context of that call (the parameters we've used, the variables within the function itself) is held in an object called the call object (or, again, "activation object" in ECMA parlance). The interpreter creates a call object for this particular execution of the function and sets some properties on that call object before passing it into the function's code. The properties are:

  1. A property called arguments, which is an array-like object (it has a length and numbered entries, but it isn't actually an Array) holding the actual arguments we called the function with, plus a callee property which is a reference to the function being called.
  2. A property for each of the arguments defined by the function declaration. The values of these properties are set to the values passed into the function. If any arguments weren't specified (because JavaScript lets you call functions with however many arguments you want, regardless of the definition), properties for any missing arguments are still created on the call object — with the value undefined.
  3. A property for every declared variable within the function (e.g., every "var" statement). (These start out with the value undefined.)
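
We can't inspect the call object directly, but a hypothetical function that reports its own arguments and locals shows those properties in action when we leave off an argument:

```javascript
function inspectCall(panel, contentId)
{
    var url;   // declared variable: starts out undefined

    return {
        argCount: arguments.length,   // how many arguments were actually passed
        panel: panel,
        contentId: contentId,         // missing argument: still exists, but undefined
        url: url
    };
}

var info = inspectCall("leftPanel");
console.log(info.argCount);    // 1
console.log(info.contentId);   // undefined
console.log(info.url);         // undefined
```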

Let's look again at selected bits of the updateDisplay function from earlier:

function updateDisplay(panel, contentId)
{
    var url;

    // ...
}

And let's assume we call it with a reference to a 'leftPanel' div and a contentId of 123:

updateDisplay(leftPanel, 123);

That creates a call object for this execution of updateDisplay that essentially looks like this:

call.arguments = [leftPanel, 123]
call.arguments.callee = updateDisplay
call.panel = leftPanel
call.contentId = 123
call.url = undefined

This object is then used within the body of the function when the code uses the arguments panel or contentId, or the local variable url, etc.

"But wait," you say. "I don't refer to an object when I use the arguments or variables in my function, I just use their names." Indeed — the use of the call object is implicit. How do we end up using the call object when we just write "contentId" (for instance)? Because once the call object is created, it's put at the top of the scope chain for the duration of the function call. Which takes us nicely to...

#2: Variable names are resolved using a "scope chain"

If you've written JavaScript in a browser environment, you've probably used the global document object, and perhaps the global navigator object as well. Here's the thing: Those aren't global objects. Those are properties of a global object — in fact, of the global object. The document and navigator properties are set up for you by the browser, but they're just properties of an object.

So how do you get away with just giving the property name, rather than giving the object name as well? Because of the scope chain, an ordered series (indeed, a chain) of objects. When the JavaScript interpreter sees an unqualified variable reference, it checks the top object on the scope chain to see if it has a property by that name: If so, it gets used; if not, the interpreter checks the next object down the chain. The global object is the bottom of the chain, so if you just type document.writeln("Blah blah blah"), eventually the document property is found on the global object and used.

(The global object is an abstract entity in the generic JavaScript definition, but in the specific case of a web browser you know it by another name: window. In browsers, the window object is the global object; it just also has a property, "window", that refers to itself.)

So quick: Within a function, how do variable names get resolved? Right! When the function is being executed, the call object with all those nifty properties for the function's arguments and variables is put at the top of the scope chain. So during our call to updateDisplay from earlier, the call object for the call is at the top of the scope chain, followed by the global object.

When the interpreter sees a variable reference, say contentId, it looks on the call object: If there's a contentId property on the call object, it gets used; otherwise, the interpreter looks at the next object in the scope chain, which in this case is the global object.
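
A small sketch of that lookup order (withParam and withoutParam are made-up names): a name found on the call object wins; otherwise the lookup falls through to the next scope out. (If you run this in Node.js, a top-level var in a module lives in module scope rather than on the global object, but the lookup order works the same way.)

```javascript
var contentId = "outer value";

function withParam(contentId)
{
    return contentId;   // found on this call's call object
}

function withoutParam()
{
    return contentId;   // not on the call object, so the next scope is searched
}

console.log(withParam("call object value"));   // "call object value"
console.log(withoutParam());                   // "outer value"
```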

(There's more to know about the scope chain than I've described here; the astute reader will be wondering, for instance, how objects and their instance members come into play, or what the with statement does to things. Alas, we can't tackle everything all at once...)

But how can I know when I'm writing updateDisplay that the scope chain will look like that? I mean, doesn't it depend on who's calling the function? Nope. And that brings us to our next point:

#3: Functions in JavaScript are lexically scoped

Okay, so we get the concept of the call object, which is created when we call a function; and we get the global object, which sits at the bottom of the scope chain to handle globals. But what's this "lexically scoped" thing? It's just a fancy way of saying that a function's scope is determined by where it's defined (i.e., the text defining it; léxis is Greek for "word" or "speech"), not where it's called.

Let's put that another way: When a function is defined, the code defining it is running in some kind of context: either a function is running, or the page itself is running if the definition is at the top level. That context has an active scope chain when the function is defined. So when creating the function object, the interpreter creates a property on it (called [[Scope]] in the ECMA spec, but you can't access it directly) with a reference to the active execution context's scope chain. Even when the context goes away (e.g., the function returns), the scope chain isn't garbage collected, because the function object created by the definition still references it. (That assumes we've kept a reference to the function object, as we did in our example by passing it into the jsonRequest function; otherwise, the function and the scope chain are both garbage-collected since nothing references them.)
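
A minimal sketch of lexical scoping (makeReader and callIt are hypothetical names): the function returned by makeReader is called from inside callIt, which has its own secret variable, yet it reads the secret from where it was defined.

```javascript
function makeReader()
{
    var secret = "from makeReader";
    return function()
    {
        return secret;   // resolved via the scope chain captured at definition
    };
}

function callIt(fn)
{
    var secret = "from callIt";   // not visible to fn: scope is lexical
    return fn();
}

console.log(callIt(makeReader()));   // "from makeReader"
```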

Remember our closure at the beginning of this post? It gets a [[Scope]] property pointing to the scope chain in effect when updateDisplay was called with panel = leftPanel, etc.: the call object for that call to updateDisplay, followed by the global object.

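So when we call it, first its scope chain is put in place, then the call object for this particular call to the function is put on top of the scope chain, and the function is executed with this chain: the closure's own call object (holding its resp argument), then updateDisplay's call object, then the global object.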

And there we are, the closure can access panel and all of the other properties of the call object for the call to updateDisplay because they're on the scope chain. They're on the scope chain because that was stuck onto the closure's function object when it was created. No magic. Just objects, the scope chain, and lexical scoping all working together.

#4: Closures don't cause memory leaks

So why do people think closures cause memory leaks? A couple of reasons, I suspect, but chief among them is an issue with Internet Explorer. IE's DOM objects are not garbage-collected in the same way that JavaScript objects are; instead, IE shows its COM roots by using reference counting, a mechanism that is, unfortunately, prey to issues with circular references. Event handlers can easily end up being circular references, and so in IE it's easy to "leak" the memory associated with a DOM element and its event handler because they're referring to each other. (JavaScript's garbage collector doesn't have an issue with circular references; it works on the "reachability" principle rather than reference counting.) This is an issue with event handlers, rather than closures, but since many event handlers are written as closures, people associate it with closures. Fortunately, it's not that hard to kick IE into shape (in this particular way); Crockford talks about the problem and its solution here and many toolkits [like Prototype] will do this for you if you ask them nicely.

Lest I be accused of Microsoft-bashing, I should point out that IE is not the only browser that sometimes loses track of things; it's just by far the worst. Firefox 2 has a bad habit of leaking a bit of memory on XMLHttpRequest calls, which also frequently involve closures. The good news there is that Firefox 3 beta 3 is looking awfully good indeed on this front.

Separate from browser issues, though, if you're not really clear on how closures work, particularly with regard to the scope chain, you'll miss the fact that a closure keeps a reference to all of the variables and arguments in scope where it's created, not just the ones it uses; and so if you have (say) a massive temporary array allocated in that scope that the closure doesn't use, you might be tempted to say that the closure is causing the array's memory to leak. (The answer there is simple: Clear the array's variable when you're done with it.)

In Conclusion

There are lots of nifty things you can do with closures. I'll post a follow-up in a couple of days, "Closures By Example", demonstrating a few of them and just generally walking through some seemingly-complicated examples. But until then, I'll just reiterate my theme: Closures in JavaScript are not complicated. They're powerful, but if you understand that 1. Everything is a data structure, 2. Variables — I should probably call them unqualified references — are resolved according to a scope chain, and 3. The scope chain for a function is defined by where the function is defined lexically, you'll be golden.

Oh! I said I'd tell you something at the end that would surprise you for about three seconds before you said "Oh, of course." Here it is:

All JavaScript functions are closures.
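
For instance, an ordinary top-level function (greeting and greet are made-up names) reads a top-level variable through the scope chain it captured when it was defined, so it sees whatever that variable holds each time it runs:

```javascript
var greeting = "Hello";

function greet(name)
{
    return greeting + ", " + name;   // 'greeting' comes from the enclosing scope
}

console.log(greet("Mike"));   // "Hello, Mike"
greeting = "Goodbye";
console.log(greet("Mike"));   // "Goodbye, Mike"
```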

Happy coding!

Monday, 18 February 2008

JavaScript's Curiously-Powerful OR Operator (||)

The "or" operator (||). It seems innocuous enough: A binary logical operator that returns true if either of its params is true, usually used in the context of a conditional statement:

if (a || b)
Perhaps if you've been programming in any of several other high-level languages for a while, you expect that it won't evaluate the second param if the first one is true, something called short-circuiting. And you'd be right. But there's a lot more to the JavaScript || operator than that!
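
A quick sketch of the short-circuiting part (calls and second are hypothetical names): the right-hand operand is only evaluated when the left-hand one is falsy.

```javascript
var calls = 0;

function second()
{
    ++calls;       // records that the right-hand operand was evaluated
    return true;
}

var r1 = true || second();
console.log(calls);   // 0: second() was never called

var r2 = false || second();
console.log(calls);   // 1
```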

Consider this statement:
x = a || b;
If the || operator were purely a logical operator, then x would receive one of two values: true or false. But that's not what happens. Instead, x gets the value of a if a is true or can be converted to being "true", otherwise it gets the value of b. And that's another kettle of fish entirely.

First off, there's this question of "...or can be converted to being "true"..." What is truth? (And whither beauty? Never mind, that's a different blog...) I won't go into the ins and outs of JavaScript's type coercion here (refer to Flanagan's book), but for our purposes today, a converts to "true" unless it's false, zero, NaN, an empty string, null, or undefined. So 1 is considered true, as is the string "Fred", as is an object reference.
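Here's a small sketch that checks the complete list of false-ish values, plus a few values that people sometimes expect to be false but aren't (the string "0", an empty array, an empty object). toBool is a hypothetical helper; !!v applies the same to-boolean conversion that || applies to its left-hand operand.

```javascript
var falsy = [false, 0, -0, NaN, '', null, undefined];
var truthy = [true, 1, -1, 'Fred', '0', 'false', ' ', {}, [], function () {}];

// Double negation: converts any value to a real boolean.
function toBool(v) {
    return !!v;
}

var i, allFalse = true, allTrue = true;
for (i = 0; i < falsy.length; ++i) {
    if (toBool(falsy[i]) !== false) { allFalse = false; }
}
for (i = 0; i < truthy.length; ++i) {
    if (toBool(truthy[i]) !== true) { allTrue = false; }
}
// allFalse and allTrue both end up true
```

One consequence worth keeping in mind: since 0 and "" are false-ish, an expression like count || 10 turns an explicit 0 into 10. That doesn't matter for things like function references, but it can for numbers and strings.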

But lots of languages do at least some level of type coercion; what marks the JavaScript operator out as special is the result of the expression: What you get back isn't a true or false, but rather the value of one of the operator's parameters. Now that's interesting, and it's something you may not be expecting if you're relatively new to the language. It can make it difficult for a newcomer to read some of the more dense JavaScript found in sample code, toolkits, etc.

Here's an example I ran across a few weeks back (the names have been changed for simplicity):
function sort(comparator)
{
comparator = comparator || this.defaultComparator;
// ...
}
Here, the author provides a sort method which takes an optional comparator function reference; if you don't supply one, the class's default comparator is used instead. Let's break that down a bit in each scenario.

If you call the method without supplying a comparator function reference:
  1. comparator is undefined.
  2. The expression evaluates comparator, sees the undefined value, and converts it to a boolean: false.
  3. Because the first parameter is false, the operator returns the second parameter's value.
And now let's look at it if you do supply a comparator function reference:
  1. comparator is a reference to a function.
  2. The expression evaluates comparator, sees the function reference, and converts it to a boolean: true.
  3. Because the first parameter is true, the operator returns the first parameter's value (it never evaluates the second parameter at all).
So the author's line
comparator = comparator || this.defaultComparator;
sets up the comparator variable with the default if it didn't already have a value.
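To see both scenarios run, here's a self-contained sketch. NumberList is a hypothetical class invented just to give sort a this with a defaultComparator on it; the || line is the same idiom as the author's.

```javascript
// Hypothetical class, just to give sort() a `this` to work with.
function NumberList(items) {
    this.items = items;
}

// Default: ascending numeric order.
NumberList.prototype.defaultComparator = function (a, b) {
    return a - b;
};

NumberList.prototype.sort = function (comparator) {
    // No argument: comparator is undefined (false-ish), so || hands back
    // this.defaultComparator. With an argument, comparator is kept as-is.
    comparator = comparator || this.defaultComparator;
    return this.items.slice().sort(comparator);   // sort a copy
};

var list = new NumberList([10, 2, 33, 4]);
var asc = list.sort();                                     // [2, 4, 10, 33]
var desc = list.sort(function (a, b) { return b - a; });   // [33, 10, 4, 2]
```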

You might be asking, why not just do this:
if (!comparator)
{
comparator = this.defaultComparator;
}
Well, for one thing, it doesn't demonstrate l33t skilz, which is just as important to the current set of twentysomething coders as it was to the previous set, and the set before that (as it is now, has it ever been). But they would probably also point out that even after being minified, the if version is in the region of six characters longer, or nearly 12% of the total. (And both of them beat out the version using the ternary operator.) Size counts with downloaded JavaScript. Some would also probably argue that the || version is more "expressive". Plodders like myself might raise counter-arguments about clarity, but that's the thing: to a day-in, day-out JavaScript coder, the statement isn't unclear.
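For comparison, here are the three forms side by side as hypothetical standalone functions (not from the original code). Whichever you pick, they select the same value; the difference is purely length and style.

```javascript
// Three ways to say "use the supplied comparator, else the fallback".
function withOr(comparator, fallback) {
    comparator = comparator || fallback;
    return comparator;
}

function withIf(comparator, fallback) {
    if (!comparator) {
        comparator = fallback;
    }
    return comparator;
}

function withTernary(comparator, fallback) {
    comparator = comparator ? comparator : fallback;
    return comparator;
}

// Two hypothetical comparators to feed them.
function byLength(a, b) { return a.length - b.length; }
function byValue(a, b) { return a < b ? -1 : (a > b ? 1 : 0); }
```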

Let's take our example one step further: Suppose our hypothetical class with the sort method only has a defaultComparator on the instance if one has been explicitly set, and relies on a class-wide default otherwise. The || operator lets us write the choice of the comparator function with a single line (split into three lines here for word-wrap):
comparator = comparator
|| this.defaultComparator
|| NiftyClass.classComparator;
The first || tests comparator and, finding it false-ish, takes the value of this.defaultComparator; then the second || evaluates this.defaultComparator and, finding it false-ish, takes the value of NiftyClass.classComparator. Putting it more simply: It selects the first one that's true-ish, which in practice usually means the first one that's defined. As a good friend of mine said, "...it is kind of a selection operator more than a logical one..." Indeed, it's almost a specialized version of the ternary operator where the test is for "truth" or "existence" (and I figure that's got to account for 80% or more of the use of the ternary operator).
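Here's a runnable sketch of that three-way selection. NiftyClass and classComparator come from the text; pickComparator and descending are made-up names used to isolate the selection line so you can see which function wins in each case.

```javascript
// NiftyClass from the text; defaultComparator is NOT set here, only
// on instances where someone sets it explicitly.
function NiftyClass() {
}

// Class-wide fallback: ascending order.
NiftyClass.classComparator = function (a, b) {
    return a - b;
};

// Hypothetical method: parses as
// (comparator || this.defaultComparator) || NiftyClass.classComparator
NiftyClass.prototype.pickComparator = function (comparator) {
    return comparator
        || this.defaultComparator
        || NiftyClass.classComparator;
};

function descending(a, b) { return b - a; }

var plain = new NiftyClass();
var custom = new NiftyClass();
custom.defaultComparator = descending;

// plain.pickComparator()           -> NiftyClass.classComparator
// custom.pickComparator()          -> descending
// plain.pickComparator(descending) -> descending
```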

So, do we all want to see the || idiom for selection rather than if statements? There we enter the realm of style. I tend to doubt that either way is markedly better than the other in terms of execution speed (which probably varies from engine to engine anyway), the size difference is minimal, and it's a personal choice which is more expressive. But if you're going to be reading any significant amount of modern JavaScript, you'll see this
x = a || b || c;
idiom a lot, so be prepared for it. And if you're an old plodder like me, well...it does kinda grow on you. ;-)

[I should point out that in early versions of JavaScript, the || operator did evaluate to true or false, rather than providing this nifty return-the-value-of-the-param behavior. But the JavaScript engines in all major browsers (including IE6) use the behavior described above.]

Thursday, 14 February 2008

Nifty Snippets

Welcome to Nifty Snippets, my new blog for capturing little snippets of code and techniques (as well as engineering, business, and teamwork "lessons learned"). Initially the focus will be on Ajax stuff, and that means lots of JavaScript.

Now, JavaScript is about 18 times more interesting than most people think. This is not a toy language; it's an incredibly rich, powerful, expressive language that until recently has been dramatically under-used. Partially that's because of how it was introduced to the world (scripting events in Netscape Navigator), and partially I suspect it's down to a bias many of us had toward hierarchical, class-based languages like Java and C++. JavaScript is neither hierarchical nor class-based; it's a prototype-based language.
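To give a tiny taste of what "prototype-based" means, here's a sketch where one plain object inherits directly from another, with no class in sight. inheritFrom is a hypothetical helper, similar in spirit to the beget function Crockford describes (newer engines provide Object.create for the same job).

```javascript
// A plain object serving as a prototype. No classes involved.
var animal = {
    describe: function () {
        return this.name + ' says ' + this.sound;
    }
};

// Hypothetical helper: returns a new, empty object whose prototype is proto.
function inheritFrom(proto) {
    function F() {}
    F.prototype = proto;
    return new F();
}

var dog = inheritFrom(animal);
dog.name = 'Rex';
dog.sound = 'woof';

var said = dog.describe();                  // 'Rex says woof'
var own = dog.hasOwnProperty('describe');   // false: found via the prototype
```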

This first post is just meant to say "hi". (Hi!) But I'll also throw out a few handy links for those with an interest in JavaScript, Ajax, and rich web applications:

  • If you're doing any serious work in JavaScript, read Crockford. If there's something powerful and interesting about the JavaScript language, the odds are Douglas Crockford has written about it in detail. Don't let the depth of the language daunt you; you can ease your way into it.
  • Similarly, if you're doing Ajax apps, you need JavaScript: The Definitive Guide by David Flanagan [O'Reilly, Amazon].
  • Lots of people are really getting a lot out of Prototype, which is basically a bunch of nifty stuff for JavaScript wrapped up into a toolkit.
  • Many of those same people are enjoying the various effects and other goodness available via script.aculo.us.
  • Alternatively, perhaps you'd like to peruse one of the dozens of other JavaScript toolkits out there.
Those'll get you started, anyway. Back soon with our first snippet!