Thursday, 18 March 2010

Anonymouses Anonymous

In my JavaScript work, I see a huge number of anonymous functions. I'm not a fan. In today's post I'd like to show the problem and how I like to get around it.

The Short Version


The short version of this post would be: Anonymous functions don't have names. Functions having names is a Good Thing(tm). So give them names.

But what fun would that be? Read on...

Anonymous Functions


First off, what do I mean by an anonymous function? Here's an example:
var a = ["Dave", "Maria", "Joe"];
a.sort(function(a, b) {
    return a.length - b.length;
});
You'll recall that the Array#sort function optionally accepts a function to call to compare two entries. The function I gave it above is anonymous -- it has no name. In contrast:
var a = ["Dave", "Maria", "Joe"];
a.sort(compareByLength);
function compareByLength(a, b) {
    return a.length - b.length;
}
Here, the function has a name.
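One way to see the difference from code rather than from a debugger is the function's `name` property. The sketch below is illustrative only: `labelOf` is a hypothetical helper, and it assumes an engine that exposes `name` (standardized in ES2015, widely available earlier as an extension). The anonymous function is passed as an argument on purpose, because newer engines infer a name for a function expression that is assigned directly to a variable or property.

```javascript
// Hypothetical helper: report the name a function object carries.
function labelOf(fn) {
    return fn.name === "" ? "(anonymous)" : fn.name;
}

function compareByLength(a, b) {
    return a.length - b.length;
}

// Passed inline as an argument, the function expression gets no name.
var inlineLabel = labelOf(function (a, b) {
    return a.length - b.length;
});

// The declared function carries the name it was declared with.
var namedLabel = labelOf(compareByLength);

console.log(inlineLabel);
console.log(namedLabel);
```

The same label is what a debugger or stack trace has to work with when it describes a frame.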

But why would I want to use a named function in such a trivial example? I mean, we all know what it does, and I'm only going to use it once, so why not just define it inline as in the first example and leave it at that?

My main reason is that by giving the function a name, you help your tools help you. (I'd also dispute the "I'll only use it once" argument, but that's neither here nor there — as we'll see, you can reuse anonymous functions, and people frequently do.)

Granted the above is a trivial example, but you see anonymous functions in seriously non-trivial situations. For instance, you frequently see code like this (even this is a trivial example, but it's indicative of the larger structures people create, like the one currently used as an illustration on the main Wikipedia JavaScript article):
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
Person.prototype = {
    getFullName: function() {
        return this.firstName + " " + this.lastName;
    },
    toString: function() {
        return this.getFullName();
    }
};
var p = new Person("John", "Doe");
var s = p.toString();
We're creating a constructor named Person and then assigning two anonymous functions to its prototype.

"Huh?" I hear you say, "Those aren't anonymous, they have names — getFullName and toString."

Actually, they don't. They're bound to properties with names, but the functions themselves don't have names. Don't believe me? Paste that code into your favorite JavaScript environment with a debugger, put a breakpoint on the return statement inside the function bound to the getFullName property (and isn't that a mouthful), and run until you hit the breakpoint.

What function does your debugger say you're in? What does the call stack look like?

That's right, if it's like most debuggers, it says you're in a function called (?) (an anonymous function) called from a function (?) (another anonymous function). How helpful. Stupid debugger.

Except it isn't the debugger's fault. It's not that the debugger is too stupid to see what the function's name is, it's that it doesn't have one. Suppose I change that code to this:
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
Person.prototype = {
    getFullName: function() {
        return this.firstName + " " + this.lastName;
    },
    toString: function() {
        return this.getFullName();
    }
};
Person.prototype.foo = Person.prototype.toString;
var p = new Person("John", "Doe");
var s = p.foo();
Using your same breakpoint, run it again. Again the debugger tells you you're in function (?) called from function (?). So should it be saying that the outer function's name is foo? Or toString? If you said foo, let me ask the question a different way. Let's add this to the end:
function bar(context, func) {
    var a;
    a = [];
    a[42] = func;
    return a[7 * 6].call(context);
}
var s2 = bar(p, p.foo);
So now, what's the function's name? a[7 * 6]? a[42]? func? foo? toString? All of them? None?

None of them, of course. It doesn't matter how many different ways we refer to it, the function itself has no name, and the debugger wouldn't be able to figure out a name for it in any but the most trivial cases even if we wanted it to (which we don't; how confusing would that be?!).
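To make the "many references, one function" point concrete, here is a small sketch that reuses the Person code from above and simply compares the references. The variable names `sameViaFoo`, `sameViaArray`, and `viaArray` are mine, added for illustration.

```javascript
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
Person.prototype = {
    getFullName: function () {
        return this.firstName + " " + this.lastName;
    },
    toString: function () {
        return this.getFullName();
    }
};
Person.prototype.foo = Person.prototype.toString;

var p = new Person("John", "Doe");
var a = [];
a[42] = p.foo;

// Every one of these expressions refers to the same function object.
var sameViaFoo = p.foo === p.toString;
var sameViaArray = a[7 * 6] === Person.prototype.toString;

// Calling through the array slot works just as well as calling p.toString().
var viaArray = a[42].call(p);

console.log(sameViaFoo, sameViaArray, viaArray);
```

Since all of those references are equally valid, none of them is a natural candidate for "the" name of the function.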

Do we care?


So okay, they don't have names. Do we care? Well, I do. When an exception gets thrown and reported to me by the debugger, I want to know where it was thrown. I want to know what the call stack looks like. I want to look at my list of breakpoints and see something meaningful.

What do we do about it?


Your first thought, like mine, might be to do this:
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
Person.prototype = {
    getFullName: function getFullName() { // <= PROBLEMATIC CHANGE HERE
        return this.firstName + " " + this.lastName;
    },
    toString: function toString() { // <= AND HERE
        return this.getFullName();
    }
};
Note that I've now put function names after the function keyword; I haven't changed anything else. Seems reasonable, doesn't it?

It is, but it doesn't work, at least not in most JavaScript implementations in the wild. Arguably it should work, but then again, arguably I should be slim, rich, and famous. Things aren't always as they should be. Those things above are called Named Function Expressions: they're function expressions (as opposed to function declarations) and they have names. Internet Explorer (well, JScript in general), Safari, and several others have bugs related to them. For details, check out Juriy Zaytsev's article on the subject.
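For reference, this is roughly what a named function expression is supposed to do in an engine that follows the specification. It's a minimal sketch with made-up names (`fact`, `factorial`); the buggy engines mentioned above reportedly deviate from this, for instance by leaking the name into the enclosing scope, so don't expect the same results there.

```javascript
// Named function expression: "factorial" is the function's own name.
var fact = function factorial(n) {
    // Inside the body, the name refers to the function itself.
    return n <= 1 ? 1 : n * factorial(n - 1);
};

var result = fact(5);

// Per the specification, the name is NOT added to the enclosing scope.
var leaked = typeof factorial !== "undefined";

console.log(result, leaked);
```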

Okay, so we can't do that. How 'bout this:
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
function getFullName() {
    return this.firstName + " " + this.lastName;
}
function toString() {
    return this.getFullName();
}
Person.prototype = {
    getFullName: getFullName,
    toString: toString
};
Well, okay, that does work, but it still has some issues:
  1. I seem to be throwing all sorts of things into the global namespace.
  2. The function name toString doesn't tell me much, assuming I have several different objects that all support the toString function.
  3. I have to repeat the function name three times, just to make things work.
So not ideal, but perhaps a step closer.

Mini Modules


So let's take the global name problem first: How do I avoid polluting the global namespace? The usual way is with the module pattern: Define a scoping function, define your bits and pieces within the scoping function, and only make public (export) the things you really need to be global (and in many applications of the module pattern, that can be nothing at all).

So the Person object isn't a module, but using a scoping function is absolutely how we can avoid global names, so let's see how that could work:
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
(function() {
    function getFullName() {
        return this.firstName + " " + this.lastName;
    }
    function toString() {
        return this.getFullName();
    }
    Person.prototype = {
        getFullName: getFullName,
        toString: toString
    };
})();
There, we define an anonymous function, define our stuff inside it, and call it right away. Now our getFullName and toString functions aren't globals anymore; the only way you can access them is via the properties on the Person prototype. And yet, they have proper names. Yay! (And yes, I'm aware of the irony of using an anonymous function — my scoping function — to avoid having anonymous functions. But wait! I could name the scoping function. But then I'm polluting the global namespace. Well, I can scope the scoping function. Oh, but...)
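A quick check of that claim, as a sketch: the code is the same as above, followed by a few probes whose variable names (`reachableViaProperty`, `visibleOutside`, `carriedName`) are mine. It assumes nothing else in the environment defines a global called getFullName, and that the engine exposes the `name` property.

```javascript
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
(function () {
    function getFullName() {
        return this.firstName + " " + this.lastName;
    }
    function toString() {
        return this.getFullName();
    }
    Person.prototype = {
        getFullName: getFullName,
        toString: toString
    };
})();

var p = new Person("John", "Doe");

// The functions are reachable through the prototype's properties...
var reachableViaProperty = p.toString();

// ...but the bare identifier is not declared outside the scoping function.
var visibleOutside = typeof getFullName !== "undefined";

// And the function still carries its own name.
var carriedName = p.getFullName.name;

console.log(reachableViaProperty, visibleOutside, carriedName);
```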

We'll revisit that structure in a bit, but let's move on to problem #2: toString is pretty generic. My call stack won't tell me which toString I'm dealing with — the one on Person? Place? Thing? So how do we deal with that?

Well, in a funny way, by exploiting the fact that the function name has nothing to do with the name of the property that refers to the function:
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
(function() {
    function Person_getFullName() {
        return this.firstName + " " + this.lastName;
    }
    function Person_toString() {
        return this.getFullName();
    }
    Person.prototype = {
        getFullName: Person_getFullName,
        toString: Person_toString
    };
})();
There, now my call stack will show Person_toString and I'll know where I am. The property is still toString.
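If you'd rather see that without opening a debugger, the sketch below captures a stack trace from inside Person_getFullName. It relies on `Error#stack`, which is non-standard (though widely supported) and whose format varies by engine, so treat the printed output as engine-dependent. The `lastStack` property is something I've added purely for the demonstration.

```javascript
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
(function () {
    function Person_getFullName() {
        // Demonstration only: record the stack at this point.
        this.lastStack = String(new Error("trace").stack);
        return this.firstName + " " + this.lastName;
    }
    function Person_toString() {
        return this.getFullName();
    }
    Person.prototype = {
        getFullName: Person_getFullName,
        toString: Person_toString
    };
})();

var p = new Person("John", "Doe");
var s = p.toString();

// The property is still called toString; the function has its own name.
var functionName = p.toString.name;

console.log(functionName);
console.log(p.lastStack);
```

In engines that include function names in their stack traces, both Person_getFullName and Person_toString should show up in the trace.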

And a straw man in the back calls out: "But that means I'm typing 'Person' all over the place!" (An issue closely related to problem #3 from the earlier list.) To which my response is: Yes. Get over it, and configure your tools to help. Seriously. In today's world, our editors are very configurable with templates and triggers and macros and such, and to my mind the value (in a project of any size) of knowing what you're dealing with outweighs a bit of extra typing.

"But what about download size?!" someone who works on web apps calls out. And it's true, for the toString function we're saying "toString" three times instead of once, and "Person" twice (in relation to toString). Similarly, we've said "getFullName" two "extra" times (and there are its two "Person"s as well). In all, there's a fair bit of repetition in that solution.

Well, there is, but I very much doubt your final script's download size will be significantly impacted. Here's why: The names you use in this context are (I assert) a very small part of your overall script. Remember that when you call these functions, you do so through the (shorter) property name. Most of your script is probably logic. It's true we want to keep functions as small as feasible (more smaller functions is better than fewer larger functions, for maintenance and readability reasons), but my take is that this is really only needed for public functions on instances, not the private functions used for implementation of an instance.

"Private functions?" Yeah! Because having the scoping function makes it really easy to have private object functions. I've written about that aspect in some detail before, but just briefly:
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
(function() {
    function buildFullName(fname, lname) {
        return fname + " " + lname;
    }
    function Person_getFullName() {
        return buildFullName(this.firstName, this.lastName);
    }
    function Person_toString() {
        return this.getFullName();
    }
    Person.prototype = {
        getFullName: Person_getFullName,
        toString: Person_toString
    };
})();
The buildFullName function is completely private; only the other functions in that scoping function have access to it. The way I called it there, it doesn't have access to the instance's properties, but you can either use it as an object-wide (rather than instance-wide) function (what we'd call a class function in class-based programming), or you can call it differently and it's a truly private instance function:
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
(function() {
    function buildFullName() {
        return this.firstName + " " + this.lastName;
    }
    function Person_getFullName() {
        return buildFullName.call(this);
    }
    function Person_toString() {
        return this.getFullName();
    }
    Person.prototype = {
        getFullName: Person_getFullName,
        toString: Person_toString
    };
})();
(Note how Person_getFullName now uses call to call it.)
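To show why `call` matters there, here's a sketch that puts the two call styles side by side. The `getFullNameWithoutCall` method is hypothetical and exists only for the comparison. What a plain call does depends on the mode: in non-strict code `this` is the global object, in strict code it is undefined and the property access throws, so the sketch catches the error and only relies on the result not being the instance's name.

```javascript
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
(function () {
    // Private: only code inside this scoping function can reach it.
    function buildFullName() {
        return this.firstName + " " + this.lastName;
    }
    function Person_getFullName() {
        // call() passes the instance along as `this`.
        return buildFullName.call(this);
    }
    // Hypothetical method, for comparison only.
    function Person_getFullNameWithoutCall() {
        try {
            // Plain call: `this` inside buildFullName is NOT the instance.
            return buildFullName();
        } catch (e) {
            return null;
        }
    }
    Person.prototype = {
        getFullName: Person_getFullName,
        getFullNameWithoutCall: Person_getFullNameWithoutCall
    };
})();

var p = new Person("John", "Doe");
var withCall = p.getFullName();
var withoutCall = p.getFullNameWithoutCall();

// The private helper is not a member of the instance.
var onInstance = "buildFullName" in p;

console.log(withCall, withoutCall, onInstance);
```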

My compromise on the repetition thing is to be explicit in my public functions, but not to worry about it in private implementation functions — because I can easily tell what object is involved from the other calls on the stack (since they're private, they can only be called from the public ones). Since in most implementations the private functions will substantially outnumber the public ones, we end up with not too much impact (and a lot of debugging/maintenance gain).

Help yourself to help your tools help you


To me, the Person structure we've been kicking around here is a bit clunky, but there are lots of things you can do to make it less so through helper functions. My usual declaration for Person would look like this (but with comments):
var Person = Class.define(function() {
    var pubs = {};

    pubs.initialize = Person_initialize;
    function Person_initialize(fname, lname) {
        this.firstName = fname;
        this.lastName = lname;
    }

    pubs.getFullName = Person_getFullName;
    function Person_getFullName() {
        return buildFullName.call(this);
    }

    pubs.toString = Person_toString;
    function Person_toString() {
        return this.getFullName();
    }

    function buildFullName() {
        return this.firstName + " " + this.lastName;
    }

    return pubs;
});
Yes, technically that's one line longer (not counting the vertical breaks I added for clarity), but to me it's clearer and there's little chance of my forgetting to export a public function if I do so right where the function is declared. My Class.define function creates a constructor for me (for various reasons) and then has that constructor call my initialize function. It automatically knows to call the anonymous function I hand it and use the resulting object's properties as the public members for the thing I'm defining, so I don't have to do that explicitly. For details, check out my article on efficient supercalls.
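The post doesn't show Class.define itself, so the following is only a guess at the general shape of such a helper, not the actual implementation. `defineClass` is a hypothetical stand-in that does the three things described above: it calls the function it's given, creates a constructor that forwards to `initialize`, and copies the returned object's properties onto the prototype. Supercalls and anything else the real helper does are left out.

```javascript
// Hypothetical stand-in for a Class.define-style helper (an assumption,
// not the real thing).
function defineClass(buildPublics) {
    // Run the scoping function once to collect the public members.
    var pubs = buildPublics();

    // The generated constructor just forwards to initialize, if present.
    function Ctor() {
        if (typeof pubs.initialize === "function") {
            pubs.initialize.apply(this, arguments);
        }
    }

    // Copy the public members onto the prototype.
    var key;
    for (key in pubs) {
        if (Object.prototype.hasOwnProperty.call(pubs, key)) {
            Ctor.prototype[key] = pubs[key];
        }
    }
    return Ctor;
}

var Person = defineClass(function () {
    var pubs = {};

    pubs.initialize = Person_initialize;
    function Person_initialize(fname, lname) {
        this.firstName = fname;
        this.lastName = lname;
    }

    pubs.getFullName = Person_getFullName;
    function Person_getFullName() {
        return buildFullName.call(this);
    }

    pubs.toString = Person_toString;
    function Person_toString() {
        return this.getFullName();
    }

    // Private: never added to pubs, so never exported.
    function buildFullName() {
        return this.firstName + " " + this.lastName;
    }

    return pubs;
});

var p = new Person("John", "Doe");
console.log(p.toString(), p.getFullName.name);
```

Because function declarations are hoisted, each `pubs.x = ...` line can sit directly above the function it exports.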

Conclusion


So that's my take on it, anyway: Help yourself, help your tools help you. Almost any time I find myself starting to write an anonymous function, I stop, write a function with a name, and use that instead — even if I'm writing a closure that will be used and then thrown away, too often I've run into exceptions or call stacks where I just couldn't figure out where I was when I used anonymous functions. Maybe the Array#sort is a bit too trivial to bother, or maybe not (hint: try it on Internet Explorer with this array: [10, "Maria", "Joe"] -- and don't you want to know which function threw that exception?).

Happy coding!

14 comments:

Petr 'PePa' Pavel said...

Interesting reading. Thanks.

Sasha Chedygov said...

Interesting read, but Firebug (and I'd assume most debuggers) lets you click on the "(?)" and see which function it is and in what file. Yes, it takes an extra click, but it's not like that would save you that much time anyway.

T.J. Crowder said...

@Sasha: Good tip, thanks for that. Doesn't help with the "where am I" glance at the call stack/breakpoints list (in fact, Firebug is misleading in that regard -- it'll show you the name of the function that created the anonymous function, even though that's not where the breakpoint is), but still very useful to know.

Maciek said...

Good post, thank you. I'd like to share some extra points about the result:

Using the module pattern inside a constructor causes you to lose polymorphic capability. If you don't need it, just ignore this point, but it's a good idea to be conscious of what we're doing. Consider the following piece of code:

var klass1 = function () {};
klass1.prototype = {
  foo : "Hello"
};

var klass2 = function () {
  var pub = {
    bar : "World"
  };

  return pub;
};

var obj1 = new klass1();
var obj2 = new klass2();

console.log(obj1 instanceof klass1);
console.log(obj2 instanceof klass2);

The output will be:
true
false

This means that in the case of klass2, we're losing the prototype inheritance to gain an easy-to-write private scope.

Mea culpa, I also do it this way most of the time, but we need to pay attention if we tend to create an architecture based on some kind of interfaces.

Thanks again!

Regards,

Maciek

T.J. Crowder said...

@Maciek:

Thanks!

The main issue with klass2 is that it overrides the creation of the new object (by returning a different object out of the constructor function, rather than augmenting this). In fact, the way klass2 is written, it wouldn't matter whether you used new with it or not, because it's going to create its own return value and completely ignore this. The resulting object will have whatever constructor (and prototype) it gets from how it's constructed. Since the constructor isn't klass2, instanceof says false.

In your example, the object's constructor would be Object, but it doesn't have to be:

var klass1 = function () {};
klass1.prototype = {
    foo : "Hello"
};

var klass2 = function () {
    var pub = new klass3();

    return pub;
};

var klass3 = function () {};
klass3.prototype = {
    bar : "World"
};

var obj1 = new klass1();
var obj2 = new klass2();

console.log(obj1 instanceof klass1);
console.log(obj2 instanceof klass2);
console.log(obj2 instanceof klass3);

That shows:

true
false
true

...because the object returned from klass2 is actually an instance of klass3. I wouldn't recommend doing this. Generally, you want to avoid returning objects out of constructor functions and allow the default behavior of the constructor (returning the new object created via the new keyword, i.e., this) to happen instead. If you really have to, you might return a different instance that had previously been constructed by that same constructor (the singleton pattern or some related patterns), but only in edge cases and only when it really was necessary because of an API you were using.

It would be possible to do all the module pattern stuff inside the constructor, but I prefer to do it a level further out, not least because I don't want to do all of it every time the constructor is called (and replicate all that stuff). See Private Methods in JavaScript and Simple, Efficient Supercalls in JavaScript for more.

-- T.J. :-)

Juan Mendes said...

Inline functions can still be named:

Class.prototype.doSomething = function Class_doSomething(a) {
    return a;
}

I think that's better than the two step approach you've taken.

T.J. Crowder said...

@Juan: If only we could. I addressed that in the post (read again just after the heading "What do we do about it?"). That's a named function expression, and it should work, but there have been bugs in various JavaScript interpreters (particularly in Microsoft's, but not just theirs) that prevent them from working correctly. See also my post Double-take and the article by Juriy Zaytsev I linked above.

stivlo said...

Before reading your article I was defining my classes as:

var app = {}; //namespace definition
app.Person = function () {
...
};

I understand yours is a better approach, but can we still have namespaces with this way of defining classes, or is the (small) price to pay having all class names in the global namespace?

T.J. Crowder said...

@stivlo: Good news: There's nothing in this that requires that the symbols be globals. I usually avoid adding any symbols to the global namespace with my code, it's just too crowded already. I usually use a scoping function around all of my code, so the various function names end up being scoped to that function, not global. E.g.:

(function() {
    function Person() {
    }
    // ....
})();


Note that Person is not global.

If you need to export a symbol to the global namespace, using a scoping function feeds into that quite naturally. There are three popular ways:

1. Using a var:

var Person = (function() {
    function Person() {
    }
    // ....
    return Person;
})();


2. Explicitly assigning to window:

(function() {
    function Person() {
    }
    // ....
    window.Person = Person;
})();


3. Implicitly assigning to window (via this):

(function() {
    function Person() {
    }
    // ....
    this.Person = Person;
})();


That last works because of the way we call the scoping function. We call it directly, so we know that within the call (in non-strict code), this will be the global object (which is window on browsers, but the nice thing about doing it that way is that the code isn't browser-specific).

HTH,

-- T.J. :-)

stivlo said...

Great, thank you! I've already experimented with your suggestions:

JavaScript, defining a class, first way

JavaScript, defining a class using namespaces

T.J. Crowder said...

I've had to remove a comment posted yesterday. If you're the person who posted the comment, the comment was fine but the name associated with it had multiple swearwords in it. Feel free to repost with a non-offensive name.

Hari Prasad said...

Thanks for a really nice post.

Hari Prasad said...

Thanks for really a nice post about JavaScript.