Friday, 19 March 2010

Small is...useful

Micro-post today:

I'm an active member of a couple of places where people post questions and (hopefully) get answers to them. Time and time again I see questions like "I'm doing X, but it isn't working. Why not?" followed by 35-200 lines of code, of which the "X" in question is 5-10 lines.

People. Seriously. It's like you've never heard the old Latin maxim divide et impera (in English, loosely: divide and conquer). Or to put it in even geekier terms: Create a minimum failing test case. There are two very good reasons for doing this:

  1. I'd put the odds at 90% or better that if you build a minimum failing test case, you'll figure out why "X" isn't working.
  2. If (1) above doesn't work out for you, well, then you have this lovely minimum failing test case you can post so people can help you out.
See? Vincere-vincere!

Thursday, 18 March 2010

Anonymouses Anonymous

2015 Update: Since this post was written in 2010, the JavaScript specification has been updated so that many more functions get names: where possible, a function expression is named after the variable or property it's being assigned to. I mostly haven't updated this post, but I have added the odd note where the new spec changes things.
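As of ES2015 (aka ES6), the first two functions below get names (both are named foo). The third doesn't, because the spec only infers names in certain positions, such as assignment to a plain variable or a property definition in an object literal, and assignment to a property of an existing object isn't one of them:
var foo = function() { };
var obj = {
    foo: function() { }
};
obj.bar = function() { };
There isn't always a useful name to use, but when there is and the spec covers that position, as of ES2015, it's used.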
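A quick way to check this in any ES2015+ engine (Node or a current browser console) is to read each function's `name` property. Pay attention to what it reports for obj.bar: assigning to a property of an existing object is not one of the cases the spec names functions for, although some debuggers show an inferred display name for it anyway.

```javascript
var foo = function() { };
var obj = {
    foo: function() { }
};
obj.bar = function() { };

console.log(foo.name);     // "foo": named after the variable
console.log(obj.foo.name); // "foo": named after the object-literal property
console.log(obj.bar.name); // "": property assignment on an existing object isn't covered
```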
In my JavaScript work, I see a huge number of anonymous functions. I'm not a fan. In today's post I'd like to show the problem and how I like to get around it.

The Short Version


The short version of this post would be: Anonymous functions don't have names. Functions having names is a Good Thing(tm). So give them names.

But what fun would that be? Read on...

Anonymous Functions


First off, what do I mean by an anonymous function? Here's an example:
var a = ["Dave", "Maria", "Joe"];
a.sort(function(a, b) {
    return a.length - b.length;
});
You'll recall that the Array#sort function optionally accepts a function to call to compare two entries. The function I gave it above is anonymous -- it has no name. In contrast:
var a = ["Dave", "Maria", "Joe"];
a.sort(compareByLength);
function compareByLength(a, b) {
    return a.length - b.length;
}
Here, the function has a name.
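You can see the difference directly by reading the function's `name` property (standardized in ES2015, so this sketch assumes a modern engine rather than a 2010-era browser). nameOf is a hypothetical helper just for the demonstration. Note that even under the ES2015 naming rules, a function expression passed inline as an argument stays anonymous:

```javascript
// Named comparator, as in the example above.
function compareByLength(a, b) {
    return a.length - b.length;
}

var names = ["Dave", "Maria", "Joe"];
names.sort(compareByLength);
console.log(names.join(", ")); // Joe, Dave, Maria

// Reports the name the function object itself carries.
function nameOf(fn) {
    return fn.name;
}

console.log(nameOf(compareByLength)); // "compareByLength"
// An inline function expression passed as an argument gets no name at all.
console.log(nameOf(function(a, b) { return a.length - b.length; })); // ""
```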

But why would I want to use a named function in such a trivial example? I mean, we all know what it does, and I'm only going to use it once, so why not just define it inline as in the first example and leave it at that?

My main reason is that by giving the function a name, you help your tools help you. (I'd also dispute the "I'll only use it once" argument, but that's neither here nor there — as we'll see, you can reuse anonymous functions, and people frequently do.)

Granted, the above is a trivial example, but you see anonymous functions in seriously non-trivial situations. For instance, you frequently see code like this (even this is a trivial example, but it's indicative of the larger structures people create, like the one used as an illustration in the main Wikipedia JavaScript article at the time of writing):
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
Person.prototype = {
    getFullName: function() {
        return this.firstName + " " + this.lastName;
    },
    toString: function() {
        return this.getFullName();
    }
};
var p = new Person("John", "Doe");
var s = p.toString();
We're creating a constructor named Person and then assigning two anonymous functions to its prototype.

"Huh?" I hear you say, "Those aren't anonymous, they have names — getFullName and toString."

Actually, they don't. (2015 update: They do now! The engine assigns the name based on the property the "anonymous" function expression is being assigned to. Much of the following is therefore outdated.) They're bound to properties with names, but the functions themselves don't have names. Don't believe me? Paste that code into your favorite JavaScript environment with a debugger, put a breakpoint on the return statement inside the function bound to the getFullName property (and isn't that a mouthful), and run until you hit the breakpoint.

What function does your debugger say you're in? What does the call stack look like?

That's right, if it's like most debuggers, it says you're in a function called (?) (an anonymous function) called from a function (?) (another anonymous function). How helpful. Stupid debugger.

Except it isn't the debugger's fault. It's not that the debugger is too stupid to see what the function's name is; it's that the function doesn't have one. Suppose I change that code to this:
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
Person.prototype = {
    getFullName: function() {
        return this.firstName + " " + this.lastName;
    },
    toString: function() {
        return this.getFullName();
    }
};
Person.prototype.foo = Person.prototype.toString;
var p = new Person("John", "Doe");
var s = p.foo();
Using the same breakpoint, run it again. Again the debugger tells you you're in function (?) called from function (?). So should it be saying that the outer function's name is foo? Or toString? If you said foo, let me ask the question a different way. Let's add this to the end:
function bar(context, func) {
    var a;
    a = [];
    a[42] = func;
    return a[7 * 6].call(context);
}
var s2 = bar(p, p.foo);
So now, what's the function's name? a[7 * 6]? a[42]? func? foo? toString? All of them? None?

None of them, of course. It doesn't matter how many different ways we refer to it, the function itself has no name, and the debugger wouldn't be able to figure out a name for it in any but the most trivial cases even if we wanted it to (which we don't; how confusing would that be?!).
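The underlying point is that all of those expressions are just references to one function object, and a name (if any) belongs to the object, not to the references. A small check, assuming an ES2015+ engine for the standardized `name` property; it reuses the Person, foo, and bar code from above so it runs on its own:

```javascript
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
Person.prototype = {
    getFullName: function() {
        return this.firstName + " " + this.lastName;
    },
    toString: function() {
        return this.getFullName();
    }
};
Person.prototype.foo = Person.prototype.toString;

function bar(context, func) {
    var a = [];
    a[42] = func;
    return a[7 * 6].call(context);
}

var p = new Person("John", "Doe");

// Every one of these references points at the same single function object...
console.log(p.foo === Person.prototype.toString); // true
console.log(bar(p, p.foo));                       // John Doe

// ...so it can carry at most one name. In an ES2015+ engine that's the name
// inferred from the object-literal property it was created in, not "foo".
console.log(p.foo.name); // "toString" in ES2015+ engines; older ones had no name here
```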

Do we care?


So okay, they don't have names. Do we care? Well, I do. When an exception gets thrown and reported to me by the debugger, I want to know where it was thrown. I want to know what the call stack looks like. I want to look at my list of breakpoints and see something meaningful.

What do we do about it?


Your first thought, like mine, might be to do this:
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
Person.prototype = {
    getFullName: function getFullName() { // <= PROBLEMATIC CHANGE HERE
        return this.firstName + " " + this.lastName;
    },
    toString: function toString() { // <= AND HERE
        return this.getFullName();
    }
};
Note that I've now put function names after the function keyword; I haven't changed anything else. Seems reasonable, doesn't it?

It is, but it doesn't work reliably, at least not across the JavaScript implementations in the wild. Arguably it should work, but then again, arguably I should be slim, rich, and famous. Things aren't always as they should be. Those things above are called Named Function Expressions: they're function expressions (as opposed to function declarations) that have names. Internet Explorer (well, JScript in general), Safari, and several others have bugs related to them. For details, check out Juriy Zaytsev's article on the subject.
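For reference, here's what a spec-compliant engine does with a named function expression: the name is bound only inside the function itself and doesn't leak into the surrounding scope. JScript's best-known bug is that it does leak it (creating a second, separate function object in the process). A small check, assuming a modern engine; getName and Person_getName are illustrative names:

```javascript
var getName = function Person_getName() {
    // Inside the function body, the function's own name is in scope.
    return typeof Person_getName;
};

console.log(getName.name);          // "Person_getName"
console.log(getName());             // "function"
// Outside the function, a spec-compliant engine doesn't define the name.
console.log(typeof Person_getName); // "undefined"
```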

Okay, so we can't do that. How 'bout this:
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
function getFullName() {
    return this.firstName + " " + this.lastName;
}
function toString() {
    return this.getFullName();
}
Person.prototype = {
    getFullName: getFullName,
    toString: toString
};
Well, okay, that does work, but it still has some issues:
  1. I seem to be throwing all sorts of things into the global namespace.
  2. The function name toString doesn't tell me much, assuming I have several different objects that all support the toString function.
  3. I have to repeat the function name three times, just to make things work.
So not ideal, but perhaps a step closer.

Mini Modules


So let's take the global name problem first: How do I avoid polluting the global namespace? The usual way is with the module pattern: Define a scoping function, define your bits and pieces within the scoping function, and only make public (export) the things you really need to be global (and in many applications of the module pattern, that can be nothing at all).

So the Person object isn't a module, but using a scoping function is absolutely how we can avoid global names, so let's see how that could work:
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
(function() {
    function getFullName() {
        return this.firstName + " " + this.lastName;
    }
    function toString() {
        return this.getFullName();
    }
    Person.prototype = {
        getFullName: getFullName,
        toString: toString
    };
})();
There, we define an anonymous function, define our stuff inside it, and call it right away. Now our getFullName and toString functions aren't globals anymore; the only way you can access them is via the properties on the Person prototype. And yet, they have proper names. Yay! (And yes, I'm aware of the irony of using an anonymous function — my scoping function — to avoid having anonymous functions. But wait! I could name the scoping function. But then I'm polluting the global namespace. Well, I can scope the scoping function. Oh, but...)

We'll revisit that structure in a bit, but let's move on to problem #2: toString is pretty generic. My call stack won't tell me which toString I'm dealing with — the one on Person? Place? Thing? So how do we deal with that?

Well, in a funny way, by exploiting the fact that the function name has nothing to do with the name of the property that refers to the function:
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
(function() {
    function Person_getFullName() {
        return this.firstName + " " + this.lastName;
    }
    function Person_toString() {
        return this.getFullName();
    }
    Person.prototype = {
        getFullName: Person_getFullName,
        toString: Person_toString
    };
})();
There, now my call stack will show Person_toString and I'll know where I am. The property is still toString.
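Reading `name` back confirms that the function's own name and the property's name are independent, and that the implementation names stay inside the scoping function. This sketch assumes an ES2015+ engine for the standardized `name` property and repeats the code above so it runs on its own:

```javascript
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
(function() {
    function Person_getFullName() {
        return this.firstName + " " + this.lastName;
    }
    function Person_toString() {
        return this.getFullName();
    }
    Person.prototype = {
        getFullName: Person_getFullName,
        toString: Person_toString
    };
})();

var p = new Person("John", "Doe");
console.log(String(p));                      // John Doe
// The property is toString, but the function itself is named Person_toString.
console.log(Person.prototype.toString.name); // "Person_toString"
// The implementation names didn't leak out of the scoping function.
console.log(typeof Person_toString);         // "undefined"
```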

And a straw man in the back calls out: "But that means I'm typing 'Person' all over the place!" (An issue closely related to problem #3 from the earlier list.) To which my response is: Yes. Get over it, and configure your tools to help. Seriously. In today's world, our editors are very configurable with templates and triggers and macros and such, and to my mind the value (in a project of any size) of knowing what you're dealing with outweighs a bit of extra typing.

"But what about download size?!" someone who works on web apps calls out. And it's true, for the toString function we're saying "toString" three times instead of once, and "Person" twice (in relation to toString). Similarly, we've said "getFullName" two "extra" times (and there are its two "Person"s as well). In all, there's a fair bit of repetition in that solution.

Well, there is, but I very much doubt your final script's download size will be significantly impacted. Here's why: the names you use in this context are (I assert) a very small part of your overall script. Remember that when you call these functions, you do so through the (shorter) property name. Most of your script is probably logic. It's true that we want to keep functions as small as feasible (more, smaller functions are better than fewer, larger ones, for maintenance and readability reasons), which means more names; but my take is that the fully-qualified naming is really only needed for public functions on instances, not for the private functions used to implement an instance.

"Private functions?" Yeah! Because having the scoping function makes it really easy to have private object functions. I've written about that aspect in some detail before, but just briefly:
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
(function() {
    function buildFullName(fname, lname) {
        return fname + " " + lname;
    }
    function Person_getFullName() {
        return buildFullName(this.firstName, this.lastName);
    }
    function Person_toString() {
        return this.getFullName();
    }
    Person.prototype = {
        getFullName: Person_getFullName,
        toString: Person_toString
    };
})();
The buildFullName function is completely private: only the other functions inside that scoping function have access to it. The way I called it there, it doesn't have access to the instance's properties, but you can either use it as an object-wide (rather than instance-wide) function (what we'd call a class or static function in class-based programming), or you can call it differently and it's a truly private instance function:
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
(function() {
    function buildFullName() {
        return this.firstName + " " + this.lastName;
    }
    function Person_getFullName() {
        return buildFullName.call(this);
    }
    function Person_toString() {
        return this.getFullName();
    }
    Person.prototype = {
        getFullName: Person_getFullName,
        toString: Person_toString
    };
})();
(Note how Person_getFullName now uses Function#call to call it, passing the instance along as this.)
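Why call matters: a plain call to buildFullName doesn't supply the instance as this. A small sketch, with a hypothetical broken variant (Person_getFullNameBroken) added purely for contrast; the scoping function is in strict mode, so the plain call leaves this undefined and reading this.firstName throws a TypeError (in sloppy mode it would silently read from the global object instead):

```javascript
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
(function() {
    "use strict";
    // Private: only code inside this scoping function can see it.
    function buildFullName() {
        return this.firstName + " " + this.lastName;
    }
    // Passes the instance along as `this` via Function#call.
    function Person_getFullName() {
        return buildFullName.call(this);
    }
    // Hypothetical broken variant: a plain call leaves `this` undefined.
    function Person_getFullNameBroken() {
        return buildFullName();
    }
    Person.prototype.getFullName = Person_getFullName;
    Person.prototype.getFullNameBroken = Person_getFullNameBroken;
})();

var p = new Person("John", "Doe");
console.log(p.getFullName());      // John Doe
console.log(typeof buildFullName); // "undefined": it really is private
try {
    p.getFullNameBroken();
} catch (e) {
    console.log(e instanceof TypeError); // true
}
```

As a bonus, the call stack for that TypeError names both buildFullName and Person_getFullNameBroken, which is the whole point of naming them.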

My compromise on the repetition thing is to be explicit in my public functions, but not to worry about it in private implementation functions — because I can easily tell what object is involved from the other calls on the stack (since they're private, they can only be called from the public ones). Since in most implementations the private functions will substantially outnumber the public ones, we end up with not too much impact (and a lot of debugging/maintenance gain).

Help yourself to help your tools help you


To me, the Person structure we've been kicking around here is a bit clunky, but there are lots of things you can do to make it less so through helper functions. My usual declaration for Person would look like this (but with comments):
var Person = Class.define(function() {
    var pubs = {};

    pubs.initialize = Person_initialize;
    function Person_initialize(fname, lname) {
        this.firstName = fname;
        this.lastName = lname;
    }

    pubs.getFullName = Person_getFullName;
    function Person_getFullName() {
        return buildFullName.call(this);
    }

    pubs.toString = Person_toString;
    function Person_toString() {
        return this.getFullName();
    }

    function buildFullName() {
        return this.firstName + " " + this.lastName;
    }

    return pubs;
});
Yes, technically that's one line longer (not counting the vertical breaks I added for clarity), but to me it's clearer and there's little chance of my forgetting to export a public function if I do so right where the function is declared. My Class.define function creates a constructor for me (for various reasons) and then has that constructor call my initialize function. It automatically knows to call the anonymous function I hand it and use the resulting object's properties as the public members for the thing I'm defining, so I don't have to do that explicitly. For details, check out my article on efficient supercalls.
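Class.define itself isn't shown here, and the real one does more (it supports supercalls, for one). Purely as a hypothetical stand-in, here's a minimal sketch assuming the helper only needs to call the function you hand it to get the public members, build a constructor that calls initialize, and put the members on that constructor's prototype. It's written in the same mini-module style as above:

```javascript
// Hypothetical minimal stand-in for Class.define; not the real implementation.
var Class = (function() {
    function Class_define(membersFactory) {
        // Call the function we were handed once to get the public members.
        var pubs = membersFactory();

        // The generated constructor just forwards its arguments to initialize.
        function ctor() {
            if (typeof this.initialize === "function") {
                this.initialize.apply(this, arguments);
            }
        }

        // Copy each public member onto the constructor's prototype.
        // Caveat: JScript in IE8 and earlier skips own properties such as
        // toString in for-in loops (its "DontEnum" bug), so a helper aimed
        // at those browsers would have to copy them explicitly.
        for (var key in pubs) {
            if (Object.prototype.hasOwnProperty.call(pubs, key)) {
                ctor.prototype[key] = pubs[key];
            }
        }
        return ctor;
    }

    return { define: Class_define };
})();

// Usage, trimmed down from the Person declaration above.
var Person = Class.define(function() {
    var pubs = {};

    pubs.initialize = Person_initialize;
    function Person_initialize(fname, lname) {
        this.firstName = fname;
        this.lastName = lname;
    }

    pubs.toString = Person_toString;
    function Person_toString() {
        return this.firstName + " " + this.lastName;
    }

    return pubs;
});

var p = new Person("John", "Doe");
console.log(String(p));          // John Doe
console.log(p.toString.name);    // "Person_toString"
console.log(p instanceof Person); // true
```

Putting each pubs assignment right next to its function declaration works because function declarations are hoisted to the top of the scoping function.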

Conclusion


So that's my take on it, anyway: Help yourself, help your tools help you. Almost any time I find myself starting to write an anonymous function, I stop, write a function with a name, and use that instead, even if I'm writing a closure that will be used once and then thrown away. Too often I've run into exceptions or call stacks where I just couldn't figure out where I was because I'd used anonymous functions. Maybe the Array#sort example is a bit too trivial to bother with, or maybe not (hint: try it on Internet Explorer with this array: [10, "Maria", "Joe"] -- and don't you want to know which function threw that exception?).

Happy coding!

Tuesday, 16 March 2010

...by any other name, would smell as sweet

Although it's not a new problem, lately I've been seeing so many people running into a specific issue with Internet Explorer that I thought it worth just jotting down what the problem is and how to get around it.

The problem, in brief, is conflation. Internet Explorer (v6 and v7) mixes together the namespaces of the id and name attributes, which should be completely distinct from one another. id, you'll recall, must be unique within the document; it uniquely identifies an element. name, on the other hand, is not for uniquely identifying things (well, not mostly). It's used for a couple of different things, such as giving form fields the names they'll have when submitted, or giving anchors a name. There's no requirement that name values be unique — in fact, when doing forms, there are lots of reasons for using the same name for multiple fields (radio buttons, for example).

Unfortunately, Internet Explorer will sometimes find elements by name when it really shouldn't, specifically when you're using document.getElementById. Say, for instance, that you have a form in your document with a 'stuff' field:

<input type='text' name='stuff'>
and later in that same document you have a div with the id "stuff":
<div id='stuff'>...</div>
Consider this code:
var elm;
elm = document.getElementById('stuff');
if (elm) {
    alert("The element's tagName is " + elm.tagName);
}
else {
    alert("Couldn't find the element.");
}
In a browser that implements document.getElementById correctly, that should alert "The element's tagName is DIV". But on IE, it will alert "The element's tagName is INPUT" because it incorrectly finds the input field. The only way around this is to change the id or name of one element or the other.

Another related problem is that IE6 and IE7 treat id values as case-insensitive, although the standard says they're case-sensitive. So IE will also confuse an element with the id (or name) 'stuff' with one whose id (or name) is 'Stuff'. (To be fair, surely you and I would as well?)
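Renaming is still the real fix, but you can at least detect the collision rather than silently working with the wrong element. A minimal sketch of a hypothetical wrapper, getElementByIdStrict, that rejects any result whose id isn't exactly the one you asked for; that covers both the name confusion and the case-insensitivity. It takes the document as a parameter so it can be tried outside a browser, with a fake document standing in for IE6/7's behaviour:

```javascript
// Hypothetical wrapper: only trust getElementById if the id really matches.
function getElementByIdStrict(doc, id) {
    var elm = doc.getElementById(id);
    // IE6/7 may return an element whose name matched, or whose id matched
    // only case-insensitively; a correct match has exactly the requested id.
    if (elm && elm.id !== id) {
        return null;
    }
    return elm;
}

// Stand-in for IE6/7: hands back the <input name='stuff'> (which has no id)
// even though we asked for the element whose id is 'stuff'.
var fakeIEDocument = {
    getElementById: function fakeGetElementById(id) {
        return { tagName: "INPUT", id: "", name: "stuff" };
    }
};

console.log(getElementByIdStrict(fakeIEDocument, "stuff")); // null
// In a real page you'd call getElementByIdStrict(document, 'stuff').
```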

There's good news, though: Microsoft does document the behavior, and they've fixed it in IE8, so there's hope for the future.

Perhaps slightly OT, but lest people accuse me of Microsoft-bashing (er, I have been known...), let me just remind everyone who brought us XMLHttpRequest (the basis of Ajax) and innerHTML, both of which I use (directly or indirectly) in nearly every JavaScript-enabled browser project I do — and I bet you do too. So hey, they got many things wrong, but got some things very right as well. Also useful to remember — as we continue to bash IE6 with repeated shouts of "Die! Die!" — just how much amazingly better (faster, less crash-prone, more feature-rich) it was than its chief rival, back in the day.