Thursday, 23 December 2010

Say what?

(Updated 2014/12/30)

Something that sometimes comes up in JavaScript work is the need to figure out what kind of thing you're dealing with — is it a string? a number? a Date? And JavaScript has various features to help you do that, but there are some "gotchas" to watch out for. In this post I'll talk about various strategies for figuring out what things are.

Basically, you have four tools you can use, each of which has its place:

  • typeof
  • instanceof
  • Object.prototype.toString
  • Learn to stop worrying and love the duck

Let's look at each of them:

typeof

typeof is fast but fairly limited. It basically tells you whether something is a primitive or an object, and if it's a primitive it tells you what kind of primitive it is, but for all objects other than functions, it just says "object". So:

console.log(typeof "testing"); // "string"
console.log(typeof {}); // "object"

But it's useful for, say, determining if something is a function:

if (typeof x === "function") {
    // Yes it is
}

(Although in some environments, host-provided functions like alert give us "object" rather than "function".)

typeof's chief advantage is speed, although with modern JavaScript engines, speed isn't anywhere near the concern it was a few years ago.
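
As a quick illustration, here is what typeof reports for a handful of values. The variable name is just for this example; note the result for null in particular, which is a long-standing quirk of the language.

```javascript
// What typeof reports for a handful of values
var typeofResults = {
    string:  typeof "testing",      // "string"
    number:  typeof 42,             // "number"
    bool:    typeof true,           // "boolean"
    undef:   typeof undefined,      // "undefined"
    func:    typeof function() {},  // "function"
    object:  typeof {},             // "object"
    array:   typeof [],             // "object" (arrays are objects)
    date:    typeof new Date(),     // "object"
    nul:     typeof null            // "object" (not a typo)
};
console.log(typeofResults);
```

So typeof alone can't tell an array from a Date from null, which is where the other tools come in.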

instanceof

instanceof sort of picks up where typeof leaves off. It's good for checking whether something is a specific kind of object, e.g.:

var dt = new Date();
console.log(dt instanceof Date); // true

Specifically, obj instanceof Func looks at each object in obj's prototype chain and sees if any of them is the one that Func assigns when used to construct objects (that is, Func.prototype). In the example above, for instance, since dt's prototype is Date.prototype, dt instanceof Date is true. dt instanceof Object is also true, because dt's prototype's prototype is Object.prototype. If you'd set up a multi-level hierarchy of constructors and prototypes so that Cat derives from Mammal which derives from Animal, and used c = new Cat(), instanceof would be true for c instanceof Cat, c instanceof Mammal, and c instanceof Animal (and c instanceof Object, normally).
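
Here's a minimal sketch of that kind of hierarchy. The constructors are empty placeholders, and I'm assuming an ES5 environment for Object.create (older code typically wired up the prototypes with something like new Animal() instead):

```javascript
// Hypothetical three-level hierarchy: Cat -> Mammal -> Animal
function Animal() {}
function Mammal() {}
function Cat() {}

// Chain the prototypes (Object.create is ES5)
Mammal.prototype = Object.create(Animal.prototype);
Mammal.prototype.constructor = Mammal;
Cat.prototype = Object.create(Mammal.prototype);
Cat.prototype.constructor = Cat;

var c = new Cat();
console.log(c instanceof Cat);    // true
console.log(c instanceof Mammal); // true
console.log(c instanceof Animal); // true
console.log(c instanceof Object); // true
console.log(c instanceof Date);   // false: Date.prototype isn't in c's chain
```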

You do have to be careful with instanceof in some edge cases, though, particularly if you're writing a library and so have less control / knowledge of the environment in which it will be running. The issue is that if you're working in a multiple-window environment (frames, iframes), you might receive a Date d (for instance) from another window, in which case d instanceof Date will be false — because d's prototype is the Date.prototype in the other window, not the Date.prototype in the window where your code is running. And in most cases you don't care, you just want to know whether it has all the Date stuff on it so you can use it.

Object.prototype.toString

Calling Object.prototype.toString is slower than typeof or instanceof, but more useful for differentiating what kind of object you have — but sadly, only if the object is one created by one of the built-in JavaScript constructor functions like Date or String, not our own constructor functions. The return value of the Object prototype's toString function is defined in the standard for all the built-in constructor functions (Array, Date, etc.): It returns "[object ___]" where ___ is the constructor function name. So:

var what = Object.prototype.toString;
console.log(what.call(new Date())); // "[object Date]"
console.log(what.call(function(){})); // "[object Function]"
console.log(what.call([])); // "[object Array]"

...etc. Note that this is very different from just calling toString on the object itself; the object probably has an overridden toString. That's why we explicitly use Object.prototype.toString above. (Remember, these things are just functions, not methods.) Object.prototype.toString works on primitives too, because it coerces the value it's called on into an object before checking. So if you call it with a string primitive, you get "[object String]" (and "[object Number]" for numbers, etc.).

The chief advantage of this is that it doesn't care what window the object came from; referring back to our Date object d from another window, we'll get "[object Date]" regardless.
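
To make the difference between an object's own toString and Object.prototype.toString concrete, here's a small sketch. classOf is a made-up helper name, not something built in; it just strips the "[object " prefix and the closing bracket.

```javascript
// Hypothetical helper: turns "[object Date]" into "Date", etc.
function classOf(value) {
    return Object.prototype.toString.call(value).slice(8, -1);
}

var list = [1, 2, 3];
console.log(list.toString());      // "1,2,3" (Array's own, overridden toString)
console.log(classOf(list));        // "Array"
console.log(classOf(new Date()));  // "Date"
console.log(classOf("testing"));   // "String" (the primitive is coerced)
console.log(classOf(42));          // "Number"
console.log(classOf(/x/));         // "RegExp"
```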

But sadly for our own constructor functions, all we get is "[object Object]":

var what = Object.prototype.toString;
console.log(what.call(new MyNiftyThing("foo"))); // "[object Object]"

Ah, well...

At one point I was tempted to wrap all of the above (plus support for my own object hierarchy) up into an uber-function that definitively figured out what the thing was. Then I thought: Enh, maybe I should just use the right tool in each given situation.

Speaking of which, our last tool:

Learn to stop worrying and love the duck

When reaching for instanceof or typeof or whatever, ask yourself: Do you really care? How 'bout looking to see if the object seems to have the things on it you need (feature detection) rather than worrying about what it is? This is usually called "duck typing," from the phrase "If it walks like a duck and talks like a duck..." Sometimes, obviously, you do care, but if you get in the habit of asking yourself the question, it's interesting how frequently the answer is "Actually, no, I don't care whether that's a Foo; I just care that it has X on it" or "...I just care that it works if I pass it into getElementById" (that latter case being, basically, it's a string or its toString does what you need). I find myself doing a lot less worrying about types and just getting on with the job these days. I've learned to love the duck. (And always remember, be kind to your web-footed friends; a duck may be somebody's mother.)
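
A minimal sketch of what that looks like in practice. toTimestamp is a hypothetical function for illustration; the point is that it never asks whether it was given a Date, only whether the thing has a getTime function it can call:

```javascript
// Hypothetical helper: we only care that the thing can give us a time value
function toTimestamp(thing) {
    if (thing && typeof thing.getTime === "function") {
        return thing.getTime();
    }
    throw new TypeError("Expected something with a getTime function");
}

console.log(toTimestamp(new Date(0)));  // 0
console.log(toTimestamp({               // 42: not a Date, but it quacks
    getTime: function() { return 42; }
}));
```

A check like this doesn't care which window the object came from, so it sidesteps the instanceof problem described earlier.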

Happy coding!


Postscript: You may be wondering why I haven't mentioned the constructor property in any of the above. The answer is: Because it's not very useful, and there are common anti-patterns that mess it up. At some point I'll do an article on what's wrong with constructor...

Tuesday, 14 December 2010

Another off-topic gob-smacker

Another near-Neil-Armstrong moment (like this one) is upon us: Voyager 1 has begun transiting the heliopause, the boundary between our solar system and interstellar space. All being well, Voyager will actually go interstellar within the next five years. Yes, five years — it's a big boundary, and not an artificial one: it's where the solar wind's strength is no longer great enough to push back the stellar winds of the surrounding stars (see the linked article for a nice graphic).

For anyone keeping track, Voyager 1 was launched 33 years ago, finished its main job 21 years ago, and is still sending useful observations from its on-board instruments. Truly awe-inspiring.

Ride, Voyager, ride...

Wednesday, 8 December 2010

V8 Raises the Bar Again

Google's V8 JavaScript engine is well-known for being freaky fast, but far from resting on their laurels, the V8 team are taking things to the next level. Their latest enhancement, which they call Crankshaft, is basically HotSpot for JavaScript. They scale back the optimizations done by the main compilation step, thereby compiling scripts faster and reducing page load time, but then identify and aggressively optimize hot spots in the code (code that runs frequently — like, say, a mousemove handler). It's an approach that worked well for Sun's Java runtime, and I bet it'll work well for V8.

Thursday, 18 November 2010

Dear Computer Product Manufacturers

The place for "Compatible with" stickers is on the box, not on the product. I couldn't care less that my new monitor is "Compatible with Windows 7", and if I did care, I'd've been looking for that information before I bought it, making the sticker on the base, marring what is an otherwise lovely bit of modern piano black design, completely and utterly pointless. </rant> Now, where's that bottle of "Goo Gone"...

Sunday, 7 November 2010

Myths and realities of for..in

Probably the second most enduring myth about JavaScript is that the for..in statement loops through the indexes of an array. It doesn't, and if you write code that assumes it does, then even if the code doesn't break in its nice cozy nest on your computer, it's likely to as soon as you expose it to the complexities of the outside world.

In this article, I'll explain the myth, the reality of what for..in actually does, and discuss various ways to loop through the contents of an array correctly (and efficiently).

The Myth

According to the myth, you can loop through the contents of an array like this (let's assume we have a display function that writes a line of text somewhere):

var stuff, index;
stuff = ['a', 'b', 'c'];
for (index in stuff) { // <== WRONG
    display("stuff[" + index + "] = " + stuff[index]);
}

And in a limited environment, that is indeed likely to show this output:

stuff[0] = a
stuff[1] = b
stuff[2] = c

But you can't rely on that; doing so will come back and bite you.

The Reality

for..in loops through the enumerable property names of an object, not the indexes of an array. This is much more powerful than something that just loops through array indexes. Here's an example:

var person, name;
person = {
    fname: "Joe",
    lname: "Bloggs",
    age:   47
};
for (name in person) {
    display("person[" + name + "] = " + person[name]);
}

There we have an object, person, which has three properties: fname, lname, and age. The loop displays each of these properties and its value, for instance:

person[lname] = Bloggs
person[fname] = Joe
person[age] = 47

Note that the properties are not necessarily listed in any particular order (they're not alphabetical, they're not in order of when or how they were added to the object, or in any other order you can rely on; it completely varies depending on the JavaScript implementation).

Now you may be thinking "Well, sure, so it does something different with objects than it does with arrays, so what?" But it doesn't. This is an important point and so I'm going to emphasize it:

Arrays in JavaScript are just objects (with one special feature), and array indexes are just property names. In fact, the name "array" is a bit misleading, because JavaScript arrays are not like arrays in most other languages. For one thing, JavaScript arrays are not contiguous blocks of memory. For another, the indexes into them are not offsets. In fact, array "indexes" aren't even numbers, they're strings. The only reason we get away with things like stuff[0] is that the 0 gets automatically converted into "0" for us. (Don't believe me? Read Section 11.2.1 ["Property Accessors"] of the spec.) So we don't really have array "indexes" at all, we just have property names that consist entirely of the digits 0-9 (without extraneous leading zeroes). But it's convenient to call those property names "indexes."

(Update: Here I'm talking about JavaScript's standard Array type. Many environments now also support new typed arrays, which are different.)
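
If you'd like to see the "indexes are really strings" point for yourself, here's a short sketch (the variable names are only for this example):

```javascript
var letters = ['a', 'b', 'c'];
var key, keyTypes = [];

// Record the type of each property name for..in hands us
for (key in letters) {
    keyTypes.push(typeof key);
}
console.log(keyTypes);                     // ["string", "string", "string"]

console.log(letters[0] === letters["0"]);  // true: 0 is converted to "0"
console.log(letters["00"]);                // undefined: "00" is a different name
console.log("1" in letters);               // true: "1" is a property of the array
```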

The one special feature of arrays is, of course, their length property, which behaves in a way to make arrays seem more like arrays in other languages (I won't go into the details here, not that they're complex; check out Section 15.4 ["Array Objects"] of the spec).
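
Without going into the spec details, a few lines are enough to show the behavior. This is only a sketch of the commonly relied-upon parts:

```javascript
var arr = ['a', 'b', 'c'];

arr[5] = 'f';             // writing past the end...
console.log(arr.length);  // 6: ...bumps length to highest index + 1

arr.length = 2;           // shrinking length...
console.log(arr[2]);      // undefined
console.log(2 in arr);    // false: ...actually removes the entries

arr.name = "Foo";         // a non-index property...
console.log(arr.length);  // 2: ...has no effect on length
```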

"Hey wait!" you say, "Speaking of length, if for..in really loops through the property names of the object, why don't we see length in the list?!" Good question. It's because for..in loops through the enumerable property names of an object, and length is not an enumerable property. Arrays (and other objects) are defined with lots of non-enumerable properties. (We'll come back to that in a moment.)

For now, let's consider what happens if we add another (enumerable) property to the array:

var stuff, index;
stuff = ['a', 'b', 'c'];
stuff.name = "Foo";
for (index in stuff) {
    display("stuff[" + index + "] = " + stuff[index]);
}

That's a perfectly legitimate and reasonable thing to do; we've added a name property to the array. Now what gets shown? Right! Four things:

stuff[name] = Foo
stuff[0] = a
stuff[1] = b
stuff[2] = c

Because for..in loops through all of the property names of the object, not just the "indexes."

And this is one of the places we get into differences in the order of property names. Some implementations will display the above, with the name property followed by the numeric properties; other implementations will do it the other way around, with name at the end. In theory, name could even be in the middle, and even the numeric indexes might not be done in order (there's nothing in the spec saying that they have to be), although in practice I've never seen an implementation that did the numeric indexes out of order. Regardless, it's not guaranteed.

Okay, but as long as we don't put other properties on our array, we can use for..in as though it did just the indexes, right? Wrong. That's where we get into the complexities of the real world, because for..in doesn't list only the object's own properties, it lists all of the properties of its prototype, and its prototype's prototype, etc. So that means not only not adding anything to your specific array instance, but not adding anything to Array.prototype either — and it turns out that it's really handy to add to Array.prototype sometimes, handy enough that lots of libraries, plug-ins, and other bits of code you'll find around do it.

Consider the Array#indexOf method. This handy method finds an element in an array and returns its index. But although some implementations have had it for years, others haven't, and it was only added to the specification as of ECMAScript 5th edition (i.e., at the end of 2009). So there are still implementations that don't have it.

But what if you want to write code that assumes it's there? You could write your own function in a procedural programming way and use that instead, passing the array and the target element in as parameters. Your function could then call the indexOf method if it's there, or do a manual search otherwise. But that not only introduces overhead (an extra function call), it also isn't very object-oriented — and JavaScript is object-oriented down to its bones. So the usual solution (and in fact the one suggested by Mozilla) is to just add it if it's missing:

if (!Array.prototype.indexOf) {
    Array.prototype.indexOf = function(searchElement, index) {
        // ...
    };
}

Now you can use indexOf on any array instance, since the instances inherit the function from the prototype. Augmenting types like this is very common in JavaScript circles.
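
For comparison, the procedural alternative mentioned above might look something like this. indexOfElement is a made-up name, and the fallback is a deliberately simple linear search using strict equality rather than a full reproduction of the spec's algorithm:

```javascript
// Hypothetical procedural helper: uses the native indexOf when present,
// otherwise falls back to a simple linear search with strict equality.
function indexOfElement(array, target) {
    var index;
    if (typeof array.indexOf === "function") {
        return array.indexOf(target);
    }
    for (index = 0; index < array.length; ++index) {
        // Skip gaps in sparse arrays, compare the rest with ===
        if (index in array && array[index] === target) {
            return index;
        }
    }
    return -1;
}

console.log(indexOfElement(['a', 'b', 'c'], 'b')); // 1
console.log(indexOfElement(['a', 'b', 'c'], 'z')); // -1
```

That leaves Array.prototype alone at the cost of the extra call, but augmenting the prototype remains the approach most code takes.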

But the thing is, it creates an enumerable property, and so if we do that, then even our first code snippet at the top starts failing:

var stuff, index;
stuff = ['a', 'b', 'c'];
for (index in stuff) { // <== WRONG
    display("stuff[" + index + "] = " + stuff[index]);
}

Outputs:

stuff[0] = a
stuff[1] = b
stuff[2] = c
stuff[indexOf] = function...

Note that the function (which is, after all, just another property like any other property) now shows up in our loop.

You may be wondering why we don't see all the other functions from the Array prototype in our loops (splice, join, etc.). The answer (you guessed it!) is that they're all defined by the spec as non-enumerable properties.
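
You can check enumerability directly with propertyIsEnumerable, which has been part of the language for a long time. A quick sketch:

```javascript
var items = ['a', 'b', 'c'];

console.log(items.hasOwnProperty("length"));                  // true: it's the array's own property
console.log(items.propertyIsEnumerable("length"));            // false: so for..in skips it
console.log(items.propertyIsEnumerable("0"));                 // true: ordinary entries are enumerable
console.log(Array.prototype.hasOwnProperty("splice"));        // true: splice lives on the prototype
console.log(Array.prototype.propertyIsEnumerable("splice"));  // false: so for..in skips it too
```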

So can we add our function as a non-enumerable property? As of this writing, no, but it probably won't be long before we can. The new ECMAScript 5th edition specification defines a way for us to add properties to objects with the "enumerable" flag set to false, so they don't show up in for..in loops. But as of this writing, the mechanism has only been implemented and deployed in Google's V8 JavaScript engine (which is used in their Chromium and Chrome browsers). While the other implementers aren't far behind (both Mozilla and Microsoft have it in betas of their next releases), it's not there yet. And of course, once the mechanism is widely implemented, then everyone doing it the old way has to change their code to do it the new way.
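
For reference, the ES5 mechanism in question is Object.defineProperty. Here's a sketch of how it would be used in an environment that supports it; exampleLast is a made-up method name purely for illustration, not a real or proposed Array function:

```javascript
// ES5-only sketch. `exampleLast` is a made-up name for illustration.
if (typeof Object.defineProperty === "function" && !Array.prototype.exampleLast) {
    Object.defineProperty(Array.prototype, "exampleLast", {
        value: function() {
            return this[this.length - 1];
        },
        enumerable: false,  // the default, spelled out here for emphasis
        writable: true,
        configurable: true
    });
}

var things = ['a', 'b', 'c'];
var seen = [], prop;
for (prop in things) {
    seen.push(prop);
}
console.log(seen);                  // ["0", "1", "2"]: no "exampleLast"
console.log(things.exampleLast());  // "c"
```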

So in short: Non-enumerable properties are not going to address this issue for you any time soon. And besides, it's not really an issue. Looping over property names (not indexes!) is what for..in is for!

Now someone out there is probably thinking "But hey, this doesn't only apply to arrays! What if someone adds something to Object.prototype, won't that show up in my for..in loops, even on just normal objects?" Yes it will, but fortunately, a de-facto standard has arisen in the JavaScript community: It's okay to augment the prototypes of specific types, but not okay to augment the Object prototype. Those few projects that have done so have been swiftly beaten into submission.

Options for Looping through Array Indexes

So okay, for..in is really cool and helpful for other things, but if we can't use it to loop over the indexes of an array, what should we do instead?

There are several options:

1. Use a boring old-fashioned loop:

Sometimes, the old ways are the best:

// Boring old fashioned loop (a classic!)
var stuff, index;
stuff = ['a', 'b', 'c'];
for (index = 0; index < stuff.length; ++index) {
    display("stuff[" + index + "] = " + stuff[index]);
}

Perfectly clear to anyone who's ever programmed in any language syntactically derived from B (so C, C++, D, C#, Java, JavaScript, PHP, and about 30 others). Straightforward.

Now of course, there are some variations on that. For when the length of the array won't change, there's the cache-the-length variant:

// Boring old fashioned loop, caching length of unchanging array
var stuff, index, length;
stuff = ['a', 'b', 'c'];
for (index = 0, length = stuff.length; index < length; ++index) {
    display("stuff[" + index + "] = " + stuff[index]);
}

Or for when order doesn't matter, there's the count-backward version to take advantage of any "greater-than-or-equal-to-zero" optimization that the language implementation may have:

// Boring old fashioned loop, counting backward
var stuff, index;
stuff = ['a', 'b', 'c'];
for (index = stuff.length - 1; index >= 0; --index) {
    display("stuff[" + index + "] = " + stuff[index]);
}

Before anyone gets into any performance wars over the above, though, some observations:

  1. Unless you actually see a performance problem, it doesn't matter what you use; use what makes sense in relation to the code you're writing. (Don't optimize prematurely.)
  2. Which one you use won't matter from a performance perspective unless the array is huge, in which case you probably have other things to worry about first.
  3. JavaScript is not C/C++/C#/Java/whatever. The source-level optimizations that make sense in C/C++/C#/Java/whatever do not necessarily make sense in JavaScript. Don't assume they do.
  4. (This is a biggie.) The optimizations that improve performance on one implementation (say, in IE) may make performance worse in other implementations (say, in Firefox).

2. Using for..in — Correctly

But the old ways aren't always best. By their very nature, JavaScript's arrays are sparse — that is, an array can have a length of 10,000 but only have two entries in it:

stuff = [];
stuff[0] = "zero";
stuff[9999] = "nine thousand nine hundred and ninety-nine";
display(stuff.length); // shows 10000

That means that if you use a boring old-fashioned loop, you may work a lot harder than you have to, because you'll execute the body of your loop whether the array has an entry there or not:

var stuff, index;
stuff = [];
stuff[0] = "zero";
stuff[9999] = "nine thousand nine hundred and ninety-nine";
stuff.name = "foo";
for (index = 0; index < stuff.length; ++index) {
    display("stuff[" + index + "] = " + stuff[index]);
}

Although that loop correctly avoids outputting the name property, it will display 10,000 lines of output, with 9,998 of them ending with "= undefined" because the array doesn't have an entry there. Now, maybe that's what you want, maybe not (depends on what the array is for).

When you don't want to do that, you can use for..in — but use it correctly:

var stuff, index;
stuff = [];
stuff[0] = "zero";
stuff[9999] = "nine thousand nine hundred and ninety-nine";
stuff.name = "foo";
for (index in stuff) {
    if (stuff.hasOwnProperty(index)  &&    // These are explained
        /^0$|^[1-9]\d*$/.test(index) &&    // and then hidden
        index <= 4294967294                // away below
       ) {
        display("stuff[" + index + "] = " + stuff[index]);
    }
}

Now we only loop three times (once for each property, the indexes and name), and we only output two lines, because the if statement's condition only evaluates true for the kinds of property names we call "array indexes."

So what's that condition doing? Two things:

  1. It uses the hasOwnProperty function that's built into all objects to tell us whether the property is on the object itself (returns true), rather than being inherited from the object's prototype (returns false). This deals with any enumerable properties that might have been added to Array.prototype.
  2. Second, it determines if the property name looks like an "array index" as defined in the specification: The name is (by definition) a string, and we use a regular expression to prove that it's in the correct standardized base-10 form (either "0" on its own, or a non-zero digit followed by zero or more digits), then we check that it's in range (0 <= index <= 2^32-2 [which is 4,294,967,294]). If the property name passes those tests, it's an array index. You may be wondering about that magic number, 2^32 - 2. Why that number? It's because the spec says that the length property will be in the range 0 to 2^32 - 1 (inclusive), and as length is one higher than the highest array index, the highest array index is 2^32 - 2. (17 July 2013: Many thanks to RobG for pointing out that my test in earlier versions of this article wasn't quite right.)

Now, we're not going to want to type that every time this comes up, so here's a function we can use that applies the correct test:

function arrayHasOwnIndex(array, prop) {
    return array.hasOwnProperty(prop) && /^0$|^[1-9]\d*$/.test(prop) && prop <= 4294967294; // 2^32 - 2
}

Which we'd use in the loop like so:

for (index in stuff) {
    if (arrayHasOwnIndex(stuff, index)) {
        display("stuff[" + index + "] = " + stuff[index]);
    }
}
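
To see which property names the test accepts and which it rejects, here's a short sketch. It repeats the function so it can be run on its own; the sample array and its odd property names exist only for this example.

```javascript
function arrayHasOwnIndex(array, prop) {
    return array.hasOwnProperty(prop) && /^0$|^[1-9]\d*$/.test(prop) && prop <= 4294967294; // 2^32 - 2
}

var sample = [];
sample[0] = "zero";
sample["01"] = "leading zero";     // not an index: not in canonical form
sample.name = "foo";               // not an index: not numeric
sample["4294967295"] = "too big";  // not an index: above 2^32 - 2

console.log(arrayHasOwnIndex(sample, "0"));           // true
console.log(arrayHasOwnIndex(sample, "01"));          // false
console.log(arrayHasOwnIndex(sample, "name"));        // false
console.log(arrayHasOwnIndex(sample, "4294967295"));  // false
console.log(arrayHasOwnIndex(sample, "1"));           // false: no such entry
console.log(sample.length);                           // 1: only "0" counted as an index
```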

If you're into extending built-in prototypes, you could even adapt that to put on Array.prototype and then call it like stuff.hasOwnIndex(index), but beware of conflicts, and use Object.defineProperty to make it non-enumerable if you can (e.g., in ES5-enabled environments).

3. Use the New forEach Function

The new ECMAScript 5th edition specification adds a function to arrays called forEach which calls a callback for each element in the array, where an element is an entry in the array whose name is an array "index" according to our earlier definition. It also guarantees that it will call them in order (according to the numeric value of their property name), lowest to highest. The function is called only for entries that actually exist. So:

var stuff;
stuff = [];
stuff[0] = "zero";
stuff[9999] = "nine thousand nine hundred and ninety-nine";
stuff.name = "foo";
stuff.forEach(function(value, index) {
    display("stuff[" + index + "] = " + value);
});

Like our for..in example earlier, this displays two lines. (Unlike our earlier for..in example, it guarantees the order — although again, I've never seen an implementation of JavaScript where the array "indexes" weren't handled in ascending order by for..in, it's just that the spec doesn't guarantee it.)

(The new spec defines several other handy Array functions aside from just forEach; it's worth taking a look.)
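
A few of those other functions, sketched against a small sparse array (this assumes an ES5 environment; like forEach, these only call the callback for entries that actually exist):

```javascript
var sparse = [];
sparse[0] = 1;
sparse[5] = 6;
sparse.name = "foo";

var visited = 0;
var doubled = sparse.map(function(value) {
    ++visited;
    return value * 2;
});
console.log(visited);         // 2: the callback ran only for existing entries
console.log(doubled.length);  // 6: map keeps the length (and the gaps)
console.log(doubled[5]);      // 12

var big = sparse.filter(function(value) { return value > 3; });
console.log(big);             // [6]

console.log(sparse.some(function(value) { return value === 6; }));  // true
console.log(sparse.every(function(value) { return value > 0; }));   // true
```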

So should you use an old-fashioned loop, for..in correctly, or the new forEach (or one of the other functions like it)? It depends entirely on what you're doing. For me, most of the time, a boring old-fashioned loop does the trick unless I'm using a sparse array for something, in which case I use for..in. But it's totally up to you. The main take-away point is that a for..in used incorrectly will bite you.

Happy Coding!

Friday, 8 October 2010

CSS and showing my age

Okay. I can't be the only one. Surely some of you also have trouble remembering the order of the values in CSS shorthand properties, e.g.:

margin: 10px 20px 30px 40px;

I mean, I'm an engineering kind of guy. I'm used to coordinate systems like (top,left)-(bottom,right) or (x1,y1)-(x2,y2). CSS, however, uses neither. Instead, it's top right bottom left. I expect there's a good reason.

So showing my age, but I just could not keep that in my head. I kept thinking, "if only I understood why they'd done it, maybe I could remember it." But I've gone one better, and am showing my age ^ 2: I'll just remember "tribble".

That's right. Cute furry (and ultimately dangerous) things from Star Trek: The Original Series. "tribble" as in T R B L as in Top Right Bottom Left:

         +----------------- Top
         |    +------------ Right
         |    |    +------- Bottom
         |    |    |    +-- Left
         v    v    v    v
margin: 10px 20px 30px 40px;

Sorted.

I suppose I'd be remiss if I opened this topic but didn't mention that you can also specify one, two, or three values, but the mnemonic holds well enough: basically, just remember to start with the T and you'll be okay. Specifically:

         +----------------- (All four)
         v
margin: 10px;

         +----------------- Top & Bottom
         |    +------------ Right & Left
         v    v
margin: 10px 20px;

         +----------------- Top
         |    +------------ Right & Left
         |    |    +------- Bottom
         v    v    v
margin: 10px 20px 30px;
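
If it helps to see the same rules as code, here's a rough sketch in JavaScript. expandBoxShorthand is a hypothetical helper, and it only splits on whitespace, so it's nowhere near a real CSS parser (no calc(), no validation of units):

```javascript
// Hypothetical helper: expands a 1-4 value shorthand into its four sides,
// following the T-R-B-L order. Missing sides fall back to their opposite
// (left to right, bottom to top), and right falls back to top.
function expandBoxShorthand(shorthand) {
    var parts = shorthand.replace(/^\s+|\s+$/g, "").split(/\s+/);
    if (parts.length > 4 || parts[0] === "") {
        throw new Error("Expected one to four values");
    }
    var top = parts[0];
    var right = parts.length > 1 ? parts[1] : top;
    var bottom = parts.length > 2 ? parts[2] : top;
    var left = parts.length > 3 ? parts[3] : right;
    return { top: top, right: right, bottom: bottom, left: left };
}

console.log(expandBoxShorthand("10px 20px 30px"));
// { top: "10px", right: "20px", bottom: "30px", left: "20px" }
```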

Right, well, as there are no computers around here to reason into a conflicted meltdown, I guess I'll head off for a Saurian brandy and see if I can chat up an Orion slavegirl. I've always been partial to scantily-clad brunettes with green skin...

Tuesday, 5 October 2010

IE6, the Undead Browser

2013/05/16: Ah, what a difference a couple of years makes. :-) Today, this article can be a lot shorter, and could be titled "IE6, the Mostly Dead Browser". Huzzah!

Is IE6 finally dead?

  • In China: No, it's still more than 24% of the browser share there.
  • Everywhere Else: Yes!

I can't give the region-by-region breakdown I could below because Net Applications have started charging (a lot) for that report, but even in places like Mexico and India, which had high IE6 use a couple of years ago, use has plummeted. According to ie6countdown, after China, Taiwan is next at 3.5%, followed by India at 2.8%, and then Japan and Russia tied at 1.7%. The corporate U.S. finally "got" the security risk and has moved on to IE8 or IE9.

The take-away? If you're developing web pages and web apps for China, you must still support and test on IE6. By all means include an educational banner or something advocating change, but you must still support it. Just about anywhere else, fergedabouddit.

And what about IE7? Great news there: Worldwide usage of IE7 is currently just 1.81%. So people leaving IE6 jumped to IE8 (23.08%) or IE9 (18.17%) (and those nearly 20% at one point who were using IE7 have moved on too). We can figure a lot of those IE9 users will soon be on IE10. The IE8 users will be with us for a while, since that's as high as Windows XP goes.

Here's the old article from October 2010 (with updates in November 2010 and March 2011), just for posterity:


2011/03/14: See also Microsoft's new IE6 countdown site

2010/11/18: Updated to also reference StatCounter's figures (for May 2010).

There's a common refrain on sites where people ask for help getting things to work well cross-browser, when someone mentions needing to support IE6:

IE6 is dead. Microsoft officially stopped supporting it. I don't see any reason you should.

— comment from a StackOverflow user

IE6 is dead. don't speak of it.

— another StackOverflow comment

Some people have even held a funeral for it (Microsoft were classy enough to send flowers); others declare it Well and Truly dead.

So is IE6 finally dead?

No.

Far from it. IE6 is still (as of this writing) the third most used browser out there: at 15.55%, it's just barely behind Firefox 3.6 at 17.05%, both trailing IE8 at 29.06%. (That link is for September 2010; current stats here.) StatCounter gives a slightly lower figure, making it the fourth most popular browser in the world at 9.75% in May 2010, which makes sense; StatCounter and Net Applications have different customer bases.

Like most developers who do browser-based applications, I wish IE6 were dead. It certainly should be, and Microsoft are doing everything they can to kill it, but unfortunately it's not quite that simple. IE6 was the de-facto standard browser in large corporate and government environments for the majority of the boom of browser-based applications, and therein lies the problem. Large organizations are very slow to upgrade key bits of software. For example, many of us recently petitioned the UK Government to upgrade all government departments away from IE6. Their response was to say that it's "...not straightforward for HMG departments to upgrade IE versions on their systems...", that testing their web apps for compatibility "...can take months at significant potential cost to the taxpayer..." and that they deal with the security issues with firewalls and malware scanners. Big business is much the same as big government in this regard. You can bet that the majority of users of IE6 are sitting in a cubicle somewhere.

It's also interesting to try to find a statement from Microsoft backing up the assertion from our first commenter above that IE6 is no longer supported by Microsoft. Some point to the Lifecycle Supported Service Packs page saying that it says IE6 support ended on July 13th, 2010, but that page is about service packs, not products, and the July 13th date is only listed next to some of them; others, like the one for IE6 on Windows XP SP3 say that support ends 24 months after the next service pack is released. There hasn't been an SP4 for XP and SP3 is still supported (support for SP2 ended on the aforementioned July 13th), so... Further, that page also (now) says it's "no longer updated and scheduled for retirement," referring people to the Microsoft Product Lifecycle Search page instead. Amusingly (or perhaps I'm just laughing to hide the tears), if you use that page to find lifecycle information for Internet Explorer 6, it doesn't give an end date for extended support; instead, it says "For support dates for specific Internet Explorer 6 and operating system versions and their service packs, visit the Lifecycle Supported Service Packs site at..." Yup, that's right, they then give our first link above, the one that says it's no longer maintained. Rinse, repeat. So has Microsoft ended support for IE6? If so, those pages aren't saying so.

So what does this mean for those of us who develop web sites and applications? Well, your first thought might be that if you're writing a consumer-facing website, you can probably drop support for IE6. And that's probably mostly true, although you have to consider what it could mean to lose people browsing on their lunch break (or any time the boss isn't around) when most sites are scratching for every hit they can get, antiquated insecure non-compliant browser or no. You also have to consider where your visitors are coming from; StatCounter says that IE6 use is half as likely in the U.S., Europe, the UK, Canada, Australia, and New Zealand, where in all of those except Australia they're seeing numbers under 5% and in Australia only just barely over (as compared with 9.75% worldwide). But even StatCounter is seeing IE6 as the second-most popular browser in Asia at 20% and the top browser in Africa at 22%. So the locality (if any) and language of your site play a part.

But even in those ~5% countries, if you're building software that you want corporations to be able to adopt, I'd say that right now, today, you ignore IE6 support at your peril. Unfortunately. Maybe in another year, although with the downturn, IT budgets are pretty tight...it may take even longer than that.

There is good news, though. This time last year (September 2009), IE6 was the number one browser, at 24.42% dominating IE7's 19.39% and IE8's 16.84%. So clearly on the way out. I predict the decline will continue but flatten out as we hit the hard core corporate deployments with strapped IT budgets. (The other good news in comparing last year to this year is how IE7 is being displaced by IE8. Excellent. IE8 is a much, much better browser than either IE6 or IE7...and if people are willing to upgrade, when the time comes maybe they'll keep going to IE9.)

Happy coding.

Tuesday, 21 September 2010

A literal improvement

JavaScript literals are getting better. Until recently, the grammar for object literals didn't explicitly allow a trailing comma, like this:
var obj = {
foo: 42,
bar: 27, // <== This is the problem
};
SpiderMonkey (Firefox), V8 (Chrome), and whatever Safari and Opera use don't care, but JScript (IE) prior to JScript 6 (IE8) throws a parsing exception on the comma and your script dies. (JScript 6 / IE8 fix this.)

A trailing comma in an array literal has a different issue:
var a = [1, 2, 3, ];
All versions of JScript so far (including JScript 6 / IE8) create an array with four (yes, four) entries, the last of which is undefined. This isn't unreasonable, because the spec wasn't explicit about it and we were always allowed to have blank entries (e.g., var a = [1, , 3];) and those entries defaulted to undefined — but everyone else went the other way and created an array with three entries instead.

Fortunately, ECMAScript 5 clears this up. The trailing comma is explicitly allowed in object literals (Section 11.1.5), and in array literals (Section 11.1.4). In the case of array literals, the trailing comma doesn't add to the length of the array (a.length above is 3).
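A quick way to see which behavior a given engine has is to check the lengths directly. This is only a sketch: the variable names are made up for illustration, and the comments describe what ES5 specifies rather than what every older engine does.

```javascript
// Trailing comma: ES5 says this array has three entries, not four.
var trailing = [1, 2, 3, ];

// A comma in the middle is different: it leaves a hole that still counts
// toward the length, and reading the hole gives undefined.
var sparse = [1, , 3];

console.log(trailing.length); // 3 per ES5 (4 on JScript through IE8)
console.log(sparse.length);   // 3
console.log(sparse[1]);       // undefined
console.log(1 in sparse);     // false: the entry was never actually created
```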

The team behind IE9 are very engaged with standards bodies now, so hopefully that includes the JScript folks and they'll change the array behavior, though you know it must be a much harder sell for them than the object literal was — it involves changing the behavior of something that did work. Still, here's hoping.

Sunday, 19 September 2010

Don't Default Destroy

Here's a little tidbit for UI designers: Don't default to destroying things. You'd think that would have been obvious, no?

No. VirtualBox is an excellent virtualization environment, one of the best in the world and possibly a contender for the top spot. But the default UI has a big UI fault.

VirtualBox, like most good VM technologies, lets you take "snapshots" of VMs that you can then restore, going back in time. Very, very handy for when you're about to do a tricky update and want to be able to roll it back.

In my case, the tricky update hard-crashed the box, which means I had to use the handy Machine | Close to terminate the VM. And lo! The Machine | Close dialog box has a handy tickbox for "Restore most recent snapshot". So I ticked the box, and it worked a treat. I went ahead and kept using the VM for several days.

Then something unrelated made the VM crash, and I did a quick Machine | Close. The next time I fired up the VM, something seemed wrong — files were missing, configuration changes I made a while back were undone, something was amiss!

You guessed it: The Machine | Close dialog box had remembered the "restore last snapshot" setting and so VirtualBox happily destroyed my data.

Should I have noticed the tickbox was ticked? Yes, but it's one UI element of about eight on that window, and the penalty for failing to notice it is unacceptably high. If I had just ticked the box, I could see not asking for confirmation — but not when the box was ticked by default.

Amazingly, this very thing was pointed out in a bug report, and the report was closed as "fixed" when they added the ability to have VMs override defaults if you explicitly set that up.

Um. Yeah. Because that addresses the usability problem. If you know you might make this mistake in the future, you can go update each and every VM you have to make it impossible to use that feature of the dialog box. And then remember to do it for all VMs you create in the future. All because some UI designer wants the purity of either remembering all the options on the dialog, or none of them. I don't think so.

So I've opened a new bug report on it. Hopefully they'll see this as the bug it is, but the point of this post is: Don't destroy things by default, you're likely to piss people off.

Thursday, 16 September 2010

Double-take

There's an issue with Microsoft's JScript interpreter (the one used by IE, Windows Scripting Host, and others) that you see mentioned deep in discussions of other things. I thought it would be worth just briefly talking about on its own.
Update: The newer version of JScript used by IE9 and up doesn't have this bug anymore. Yay! Still in IE8 and earlier, though.
Basically, if you use a named function expression in your JavaScript code, JScript will process it twice, creating two separate function objects, at two separate times: First it treats it as though it were a function declaration (even though it isn't), and then it treats it as the expression it is. This is not a distinction without a difference, either, as we'll see below. (Amongst other things, it creates "symbol bleed," putting the function's name in the enclosing scope in clear violation of Section 13 of the specification, which says that it should only be defined within the function's own scope).

What do I mean by function declaration vs. function expression (named or otherwise)? Here's a function declaration:
function foo() {
}
Here's an anonymous function expression:
var foo = function() {
};
And here's a named function expression — this is the one JScript (IE) has an issue with:
var f1 = function foo() {
};
The easiest way to tell whether you have a declaration or an expression is to ask yourself: Are you using it as a right-hand value? E.g., are you assigning it to a variable/property or passing it into a function as an argument? If so, it's an expression. If it's standalone, it's a declaration. Here are some further examples of named function expressions:
bar(function foo(){});

var obj = {
nifty: function foo() {
}
};
So first off, how do we know JScript is creating two function objects? Here's the easiest way:
var f1 = function foo() {
alert(f1 === foo); // alerts "false" on IE, "true" on other browsers.
};
f1();
So, okay, but what do we care? Well, let's say you want to hook up an event handler and have it unhook itself later if some condition is met:

Prototype example:
$('foo').observe('click', function fooClickHandler() {
if (/* ...some condition... */) {
this.stopObserving('click', fooClickHandler);
}
});
jQuery example:
$('#foo').click(function fooClickHandler() {
if (/* ...some condition... */) {
$(this).unbind('click', fooClickHandler);
}
});
Perfectly reasonable, but it won't work on IE. The handler will remain attached, because when you unhook a specific event handler, the function reference you give has to be the same as the reference you want to remove. On IE, in the code above, it isn't: fooClickHandler isn't the same function that we hooked up. The expression returned a different function.
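To see why identity matters, here's a minimal sketch of the kind of bookkeeping a library does when you unhook a handler. HandlerList and makeHandler are hypothetical names invented for this illustration; they are not Prototype's or jQuery's actual internals.

```javascript
// Hypothetical stand-in for the handler list a library keeps per event.
function HandlerList() {
    this.handlers = [];
}
function HandlerList_add(fn) {
    this.handlers.push(fn);
}
function HandlerList_remove(fn) {
    var i;
    for (i = 0; i < this.handlers.length; i++) {
        if (this.handlers[i] === fn) {   // identity, not "looks the same"
            this.handlers.splice(i, 1);
            return true;
        }
    }
    return false;
}
HandlerList.prototype.add = HandlerList_add;
HandlerList.prototype.remove = HandlerList_remove;

// Each call creates a new function object from the same source text.
function makeHandler() {
    function handler() {}
    return handler;
}

var list = new HandlerList();
var hooked = makeHandler();
var lookalike = makeHandler();
list.add(hooked);

console.log(list.remove(lookalike)); // false: different object, nothing removed
console.log(list.remove(hooked));    // true: same object, handler removed
```

Two function objects built from identical source are still two objects, which is the situation JScript puts you in when it processes a named function expression twice.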

So how do you work around it? Just make sure you're using declarations, like this:

Prototype example:
function fooClickHandler() {
if (/* ...some condition... */) {
this.stopObserving('click', fooClickHandler);
}
}
$('foo').observe('click', fooClickHandler);
jQuery example:
function fooClickHandler() {
if (/* ...some condition... */) {
$(this).unbind('click', fooClickHandler);
}
}
$('#foo').click(fooClickHandler);
If you don't want fooClickHandler to be a symbol in that scope, wrap it up in a scoping function, like so:

Prototype example:
(function() {
function fooClickHandler() {
if (/* ...some condition... */) {
this.stopObserving('click', fooClickHandler);
}
}
$('foo').observe('click', fooClickHandler);
})();
jQuery example:
(function() {
function fooClickHandler() {
if (/* ...some condition... */) {
$(this).unbind('click', fooClickHandler);
}
}
$('#foo').click(fooClickHandler);
})();
Alternately, you could just not use names (and use arguments.callee to unhook the handler), but there are lots of good reasons not to do that (arguments.callee is slow on most browsers, not allowed in ECMAScript's new "strict" mode, and besides, names are good).

So what's this "symbol bleed" issue I mentioned? Well, according to the specification, the scope of the function name in a function expression is confined to the function itself, not the encompassing scope. So:
var f1 = function foo() {
// `foo` is defined here
};
// but not here
Whereas, of course, if that were a function declaration, the foo symbol would (of course!) be defined in the scope in which the function is declared.

(You can see this coming, can't you?) Since one of the times IE processes the named function expression it treats it as a declaration, it defines the symbol in the enclosing scope (much as we would if we didn't use the scoping functions above), which is incorrect.
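If you want to check whether a given engine has the bleed, typeof makes it easy, because it doesn't throw on an undeclared name. A small sketch (the function name hasNamedFunctionExpressionBleed and the probe name are made up for this example):

```javascript
var f1 = function foo() {
    return typeof foo; // the name is in scope inside the function
};
var insideType = f1();
var outsideType = typeof foo; // safe even if `foo` isn't defined here

console.log(insideType);  // "function"
console.log(outsideType); // "undefined" per the spec; "function" where the name has bled

// The same idea wrapped up as a feature test
function hasNamedFunctionExpressionBleed() {
    var f = function nfeBleedProbe() {};
    return typeof nfeBleedProbe === "function";
}
console.log(hasNamedFunctionExpressionBleed()); // false on a compliant engine
```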

Happy coding!

Sunday, 12 September 2010

Boot, Ubuntu, boot! Good dog!

This isn't about writing software, but it's a snippet, so...

I have a fresh new desktop based on the Intel DH57JG board on which I've happily installed Ubuntu 10.04 LTS desktop. Aside from a video mode detection issue, things are great, but I had an odd symptom: Ubuntu could successfully shut down and power off, but it would crash if I asked it to reboot. Just an annoyance, but surprisingly...annoying.

So naturally I consulted a Major Search Engine(tm)(r), and while I found several threads related to this problem, I never found one where the original questioner actually said theirs had been solved. I did find one where someone chimed in with "When I had that problem, adding reboot=bios to the kernel options fixed it." So I tried that, but no go. I also found a thread where someone said "I've tried the reboot b, c, and h options but..." So I figured there had to be more options for this reboot thing than just "bios" or none.

Well, it took some digging and installing the linux source (which is nicely packaged, so not like that's hard), and my machine now happily reboots when I ask it to. Because the options for reboot are a bit hard to find, here they are (for the x86 architecture) as of the Linux kernel 2.6.32-24:
warm    Don't set the cold reboot flag
cold    Set the cold reboot flag
bios    Reboot by jumping through the BIOS (only for X86_32)
smp     Reboot by executing reset on BSP or other CPU (only for X86_32)
triple  Force a triple fault (init)
kbd     Use the keyboard controller. cold reset (default)
acpi    Use the RESET_REG in the FADT
efi     Use efi reset_system runtime service
pci     Use the so-called "PCI reset register", CF9
force   Avoid anything that could hang.
(If you're curious, the answer in my case was reboot=acpi. At least, that was the first one I tried that worked, so I stuck with that.)

If you need a more up-to-date list, or a list for a different architecture, here's how I got that:
  1. I installed the linux-source package, which dumps a bzip2'd source tarball in /usr/src

  2. Uncompressed and untarred the tarball

  3. Looked at the arch/[architecture]/kernel/reboot.c file. Since I'm on an x86 processor on the 2.6.32 kernel, the full path in my case was /usr/src/linux-source-2.6.32/arch/x86/kernel/reboot.c but of course YMMV.
(Lest you think I was terribly clever finding the options in the source, the first place I looked was in the kernel parameters documentation, /usr/share/doc/linux-doc/kernel-parameters.txt.gz, but all that said was to look in reboot.c. :-) )

I applied the change by editing /etc/default/grub, adding that to the GRUB_CMDLINE_LINUX_DEFAULT variable, running update-grub, and rebooting (er, that is, shutting down and then starting up — rebooting would have crashed, of course).

Hope this saves someone else some time.

(If the title of this post seems oddly familiar but you can't place it, here.)

Thursday, 9 September 2010

Modern Youth

My son has now unequivocally demanded his own proper computer.

My son is not yet three years old.

At this rate, by the time he's six he'll have a better StackOverflow rep than I have.

Sunday, 5 September 2010

Beyond Either/Or

So I was doing a massive file comparison operation (hundreds of thousands of files, >100 GB of data) today on one of my Windows boxes and so naturally fired up the excellent WinMerge, my favorite — by a wide margin — Windows-based visual diff/merge tool. And it did something that is so smart, I just had to mention it.

Usually when you ask software to do an operation on a set of files you identify by wildcard or what-have-you, the software does one of two things: It either goes off and finds all of the files first, and then starts processing them; or it just starts processing and discovers the files as it goes. The former is useful for progress bars, for an early indication that maybe you've messed up your filters, etc.; and the latter is useful for those "I don't care how many there are, just get on with it!" situations.

So what was the smart thing the developers of WinMerge did? They did both. One thread went off to discover files, and another thread got on with the business of doing the comparison for me. It was exactly what I didn't know I wanted.

Are there trade-offs to this approach? Probably. You'll be asking the devices you're reading from to do two things at the same time for a while (list files and read files), which could impact overall performance. Or not, it depends a lot on what the devices are — do they have thrashing issues, are you maxing out your channel to them or does your post-receipt processing take most of the time, etc., etc. In my case, I was dealing with HDDs which presumably did have to thrash a bit (or a lot) early on, but for me it was still a really useful user experience.
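I have no idea how WinMerge implements this internally, so what follows is just a single-threaded sketch of the "both" idea: alternate between a bit of discovery and a bit of processing, so work starts right away while the known total keeps growing. The in-memory tree, discover, runBoth, and the callbacks are all hypothetical; a real tool would read the disk and use real threads.

```javascript
// Hypothetical in-memory "directory tree"; a real tool would read the disk.
var tree = {
    name: "root",
    files: ["a.txt", "b.txt"],
    dirs: [{ name: "sub", files: ["c.txt"], dirs: [] }]
};

// Lazily walks the tree, handing back one file path at a time.
function* discover(dir, prefix) {
    var path = prefix + dir.name + "/";
    var i;
    for (i = 0; i < dir.files.length; i++) {
        yield path + dir.files[i];
    }
    for (i = 0; i < dir.dirs.length; i++) {
        yield* discover(dir.dirs[i], path);
    }
}

// Alternates discovery and processing until both are finished.
function runBoth(root, processFile, onProgress) {
    var finder = discover(root, "");
    var queue = [];
    var discovered = 0;
    var processed = 0;
    var stillFinding = true;
    var step, i;

    while (stillFinding || queue.length > 0) {
        // Discovery step: find up to two more files.
        for (i = 0; i < 2 && stillFinding; i++) {
            step = finder.next();
            if (step.done) {
                stillFinding = false;
            } else {
                queue.push(step.value);
                discovered++;
            }
        }
        // Processing step: handle one file we already know about.
        if (queue.length > 0) {
            processFile(queue.shift());
            processed++;
            onProgress(processed, discovered, stillFinding);
        }
    }
}

var seen = [];
var progress = [];
function compareFile(path) {
    seen.push(path);
}
function report(done, total, more) {
    // A trailing "+" means the total is still growing
    progress.push(done + " of " + total + (more ? "+" : ""));
}
runBoth(tree, compareFile, report);
console.log(seen.join(", "));
console.log(progress.join(", "));
```

The progress callback gets a "total so far" that firms up once discovery finishes, which is roughly the user experience described above.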

So the point is, as it frequently is, to remember ask whether "both and" is an option when looking at an either/or choice. (One of probably three questions you should automatically ask when faced with such a thing, the other two being "is 'neither' an option?" and "are there more options?")

Or maybe the point is just to say "nice one" to the WinMerge devs. Either way.

Happy coding. -- T.J. ;-)

Friday, 3 September 2010

Brilliant - A periodic table of elements

Josh Duck has done a Periodic Table of the Elements for HTML5. Brilliant (and pretty). In the 20 or so years of HTML, surely someone has thought of this concept before — but if so, I'm certainly not aware of it. Props to Josh!

Wednesday, 9 June 2010

What every programmer should know about sysadmin

Link-post today: Do you write software that someone is going to have to deploy to servers and administer? Then read this question on ServerFault and its answers, most especially this answer. Then read them again.

'Nuff said.

Friday, 19 March 2010

Small is...useful

Micro-post today:

I'm an active member of a couple of places where people post questions and (hopefully) get answers to them. Time and time again I see questions like "I'm doing X, but it isn't working. Why not?" followed by 35-200 lines of code, of which the eponymous "X" is 5-10 lines of code.

People. Seriously. It's like you've never heard the maxim of the Old Roman Republic: divide et impera (in English, loosely: divide and conquer). Or to put it in even geekier terms: Create a minimum failing test case. There are two very good reasons for doing this:
  1. I give it 90% or better odds that if you build a minimum failing test case, you'll figure out why "X" isn't working.
  2. If (1) above doesn't work out for you, well, then you have this lovely minimum failing test case you can post so people can help you out.
See? Vincere-vincere!

Thursday, 18 March 2010

Anonymouses Anonymous

2015 Update: Since this post was written in 2010, the JavaScript specification has been updated so that functions get assigned names more than they used to, by using the name of the property or variable they're being assigned to, if possible. I mostly haven't updated this post, but I have added the odd note where the new spec changes things.
But as of ES2015 (aka ES6), the first two of the following functions have names (the third, assigned to a property of an existing object, still doesn't get one):
var foo = function() { };
var obj = {
    foo: function() { }
};
obj.bar = function() { };
There isn't always a useful name to use, but when there is, as of ES2015, it's used.
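You can check what an engine actually assigned by reading each function's name property. One caveat, as far as I read the ES2015 rules: the name is inferred for variable assignments and for property initializers in object literals, but not for a plain assignment to a property of an existing object, so the obj.bar form ends up with an empty name. A quick sketch:

```javascript
var foo = function() { };
var obj = {
    foo: function() { }
};
obj.bar = function() { };

console.log(foo.name);     // "foo"
console.log(obj.foo.name); // "foo"
console.log(obj.bar.name); // "" (no name is inferred for this form)
```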
In my JavaScript work, I see a huge number of anonymous functions. I'm not a fan. In today's post I'd like to show the problem and how I like to get around it.

The Short Version


The short version of this post would be: Anonymous functions don't have names. Functions having names is a Good Thing(tm). So give them names.

But what fun would that be? Read on...

Anonymous Functions


First off, what do I mean by an anonymous function? Here's an example:
var a = ["Dave", "Maria", "Joe"];
a.sort(function(a, b) {
return a.length - b.length;
});
You'll recall that the Array#sort function optionally accepts a function to call to compare two entries. The function I gave it above is anonymous -- it has no name. In contrast:
var a = ["Dave", "Maria", "Joe"];
a.sort(compareByLength);
function compareByLength(a, b) {
return a.length - b.length;
}
Here, the function has a name.

But why would I want to use a named function in such a trivial example? I mean, we all know what it does, and I'm only going to use it once, so why not just define it inline as in the first example and leave it at that?

My main reason is that by giving the function a name, you help your tools help you. (I'd also dispute the "I'll only use it once" argument, but that's neither here nor there — as we'll see, you can reuse anonymous functions, and people frequently do.)

Granted the above is a trivial example, but you see anonymous functions in seriously non-trivial situations. For instance, you frequently see code like this (even this is a trivial example, but it's indicative of the larger structures people create, like the one currently used as an illustration on the main Wikipedia JavaScript article):
function Person(fname, lname) {
this.firstName = fname;
this.lastName = lname;
}
Person.prototype = {
getFullName: function() {
return this.firstName + " " + this.lastName;
},
toString: function() {
return this.getFullName();
}
};
var p = new Person("John", "Doe");
var s = p.toString();
We're creating a constructor named Person and then assigning two anonymous functions to its prototype.

"Huh?" I hear you say, "Those aren't anonymous, they have names — getFullName and toString."

Actually, they don't. (2015 update: They do now! The engine assigns the name based on the property the "anonymous" function expression is being assigned to. Much of the following is therefore outdated.) They're bound to properties with names, but the functions themselves don't have names. Don't believe me? Paste that code into your favorite JavaScript environment with a debugger, put a breakpoint on the return statement inside the function bound to the getFullName property (and isn't that a mouthful), and run until you hit the breakpoint.

What function does your debugger say you're in? What does the call stack look like?

That's right, if it's like most debuggers, it says you're in a function called (?) (an anonymous function) called from a function (?) (another anonymous function). How helpful. Stupid debugger.

Except it isn't the debugger's fault. It's not that the debugger is too stupid to see what the function's name is, it's that it doesn't have one. Suppose I change that code to this:
function Person(fname, lname) {
this.firstName = fname;
this.lastName = lname;
}
Person.prototype = {
getFullName: function() {
return this.firstName + " " + this.lastName;
},
toString: function() {
return this.getFullName();
}
};
Person.prototype.foo = Person.prototype.toString;
var p = new Person("John", "Doe");
var s = p.foo();
Using your same breakpoint, run it again. Again the debugger tells you you're in function (?) called from function (?). So should it be saying that the outer function's name is foo? Or toString? If you said foo, let me ask the question a different way. Let's add this to the end:
function bar(context, func) {
var a;
a = [];
a[42] = func;
return a[7 * 6].call(context);
}
var s2 = bar(p, p.foo);
So now, what's the function's name? a[7 * 6]? a[42]? func? foo? toString? All of them? None?

None of them, of course. It doesn't matter how many different ways we refer to it, the function itself has no name, and the debugger wouldn't be able to figure out a name for it in any but the most trivial cases even if we wanted it to (which we don't; how confusing would that be?!).

Do we care?


So okay, they don't have names. Do we care? Well, I do. When an exception gets thrown and reported to me by the debugger, I want to know where it was thrown. I want to know what the call stack looks like. I want to look at my list of breakpoints and see something meaningful.

What do we do about it?


Your first thought, like mine, might be to do this:
function Person(fname, lname) {
this.firstName = fname;
this.lastName = lname;
}
Person.prototype = {
getFullName: function getFullName() { // <= PROBLEMATIC CHANGE HERE
return this.firstName + " " + this.lastName;
},
toString: function toString() { // <= AND HERE
return this.getFullName();
}
};
Note that I've now put function names after the function keyword; I haven't changed anything else. Seems reasonable, doesn't it?

It is, but it doesn't work — at least, not in most JavaScript implementations in the wild. Arguably it should work, but then again, arguably I should be slim, rich, and famous. Things aren't always as they should be. Those things above are called Named Function Expressions — they're function expressions (as opposed to function statements) and they have names. Internet Explorer (well, JScript in general), Safari, and several others have bugs related to them. For details, check out Juriy Zaytsev's article on the subject.

Okay, so we can't do that. How 'bout this:
function Person(fname, lname) {
this.firstName = fname;
this.lastName = lname;
}
function getFullName() {
return this.firstName + " " + this.lastName;
}
function toString() {
return this.getFullName();
}
Person.prototype = {
getFullName: getFullName,
toString: toString
};
Well, okay, that does work, but it still has some issues:
  1. I seem to be throwing all sorts of things into the global namespace.
  2. The function name toString doesn't tell me much, assuming I have several different objects that all support the toString function.
  3. I have to repeat the function name three times, just to make things work.
So not ideal, but perhaps a step closer.

Mini Modules


So let's take the global name problem first: How do I avoid polluting the global namespace? The usual way is with the module pattern: Define a scoping function, define your bits and pieces within the scoping function, and only make public (export) the things you really need to be global (and in many applications of the module pattern, that can be nothing at all).

So the Person object isn't a module, but using a scoping function is absolutely how we can avoid global names, so let's see how that could work:
function Person(fname, lname) {
this.firstName = fname;
this.lastName = lname;
}
(function() {
function getFullName() {
return this.firstName + " " + this.lastName;
}
function toString() {
return this.getFullName();
}
Person.prototype = {
getFullName: getFullName,
toString: toString
};
})();
There, we define an anonymous function, define our stuff inside it, and call it right away. Now our getFullName and toString functions aren't globals anymore; the only way you can access them is via the properties on the Person prototype. And yet, they have proper names. Yay! (And yes, I'm aware of the irony of using an anonymous function — my scoping function — to avoid having anonymous functions. But wait! I could name the scoping function. But then I'm polluting the global namespace. Well, I can scope the scoping function. Oh, but...)

We'll revisit that structure in a bit, but let's move on to problem #2: toString is pretty generic. My call stack won't tell me which toString I'm dealing with — the one on Person? Place? Thing? So how do we deal with that?

Well, in a funny way, by exploiting the fact that the function name has nothing to do with the name of the property that refers to the function:
function Person(fname, lname) {
this.firstName = fname;
this.lastName = lname;
}
(function() {
function Person_getFullName() {
return this.firstName + " " + this.lastName;
}
function Person_toString() {
return this.getFullName();
}
Person.prototype = {
getFullName: Person_getFullName,
toString: Person_toString
};
})();
There, now my call stack will show Person_toString and I'll know where I am. The property is still toString.
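If you'd rather see that without firing up a debugger, you can throw from inside one of the functions and look at the error's stack. Note the assumptions: the stack property is non-standard and its format varies from engine to engine, and the thrown error is simulated purely for the demonstration.

```javascript
function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
(function() {
    function Person_getFullName() {
        // Simulated failure so we have a stack to look at
        throw new Error("Simulated failure in getFullName");
    }
    function Person_toString() {
        return this.getFullName();
    }
    Person.prototype = {
        getFullName: Person_getFullName,
        toString: Person_toString
    };
})();

var stackText = "";
try {
    new Person("John", "Doe").toString();
} catch (e) {
    // `stack` is non-standard; fall back to the message where it's missing
    stackText = String(e.stack || e.message);
}
console.log(stackText);
```

On engines that provide stack, the trace names both Person_getFullName and Person_toString, even though the properties are still getFullName and toString.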

And a straw man in the back calls out: "But that means I'm typing 'Person' all over the place!" (An issue closely related to problem #3 from the earlier list.) To which my response is: Yes. Get over it, and configure your tools to help. Seriously. In today's world, our editors are very configurable with templates and triggers and macros and such, and to my mind the value (in a project of any size) of knowing what you're dealing with outweighs a bit of extra typing.

"But what about download size?!" someone who works on web apps calls out. And it's true, for the toString function we're saying "toString" three times instead of once, and "Person" twice (in relation to toString). Similarly, we've said "getFullName" two "extra" times (and there are its two "Person"s as well). In all, there's a fair bit of repetition in that solution.

Well, there is, but I very much doubt your final script's download size will be significantly impacted. Here's why: The names you use in this context are (I assert) a very small part of your overall script. Remember that when you call these functions, you do so through the (shorter) property name. Most of your script is probably logic. It's true we want to keep functions as small as feasible (more smaller functions is better than fewer larger functions, for maintenance and readability reasons), but my take is that this is really only needed for public functions on instances, not the private functions used for implementation of an instance.

"Private functions?" Yeah! Because having the scoping function makes it really easy to have private object functions. I've written about that aspect in some detail before, but just briefly:
function Person(fname, lname) {
this.firstName = fname;
this.lastName = lname;
}
(function() {
function buildFullName(fname, lname) {
return fname + " " + lname;
}
function Person_getFullName() {
return buildFullName(this.firstName, this.lastName);
}
function Person_toString() {
return this.getFullName();
}
Person.prototype = {
getFullName: Person_getFullName,
toString: Person_toString
};
})();
The buildFullName function is completely private; only the other functions in that scoping function have access to it. The way I called it there, it doesn't have access to the instance's properties, but you can either use it as an object-wide (rather than instance-wide) function (what we'd call a class function in class-based programming), or you can call it differently and it's a truly private instance function:
function Person(fname, lname) {
this.firstName = fname;
this.lastName = lname;
}
(function() {
function buildFullName() {
return this.firstName + " " + this.lastName;
}
function Person_getFullName() {
return buildFullName.call(this);
}
function Person_toString() {
return this.getFullName();
}
Person.prototype = {
getFullName: Person_getFullName,
toString: Person_toString
};
})();
(Note how Person_getFullName now uses call to call it.)

My compromise on the repetition thing is to be explicit in my public functions, but not to worry about it in private implementation functions — because I can easily tell what object is involved from the other calls on the stack (since they're private, they can only be called from the public ones). Since in most implementations the private functions will substantially outnumber the public ones, we end up with not too much impact (and a lot of debugging/maintenance gain).

Help yourself to help your tools help you


To me, the Person structure we've been kicking around here is a bit clunky, but there are lots of things you can do to make it less so through helper functions. My usual declaration for Person would look like this (but with comments):
var Person = Class.define(function() {
var pubs = {};

pubs.initialize = Person_initialize;
function Person_initialize(fname, lname) {
this.firstName = fname;
this.lastName = lname;
}

pubs.getFullName = Person_getFullName;
function Person_getFullName() {
return buildFullName.call(this);
}

pubs.toString = Person_toString;
function Person_toString() {
return this.getFullName();
}

function buildFullName() {
return this.firstName + " " + this.lastName;
}

return pubs;
});
Yes, technically that's one line longer (not counting the vertical breaks I added for clarity), but to me it's clearer and there's little chance of my forgetting to export a public function if I do so right where the function is declared. My Class.define function creates a constructor for me (for various reasons) and then has that constructor call my initialize function. It automatically knows to call the anonymous function I hand it and use the resulting object's properties as the public members for the thing I'm defining, so I don't have to do that explicitly. For details, check out my article on efficient supercalls.
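To make that description concrete, here's a guess at the bare minimum such a helper would need to do. This is not the real Class.define (which does a lot more, including supercalls); it's a hypothetical stand-in that only covers the behavior described above, with a trimmed-down Person to exercise it.

```javascript
// Hypothetical minimal helper; not the actual implementation.
var Class = (function() {
    function Class_define(builder) {
        var pubs = builder();   // call the function we were handed
        var name;

        function ctor() {
            // Hand construction off to initialize, if one was exported
            if (typeof this.initialize === "function") {
                this.initialize.apply(this, arguments);
            }
        }

        // Copy the exported functions onto the prototype. (Old JScript
        // skips properties like toString in for-in, so a real helper
        // would need extra handling for that.)
        for (name in pubs) {
            if (pubs.hasOwnProperty(name)) {
                ctor.prototype[name] = pubs[name];
            }
        }
        return ctor;
    }

    return { define: Class_define };
})();

var Person = Class.define(function() {
    var pubs = {};

    pubs.initialize = Person_initialize;
    function Person_initialize(fname, lname) {
        this.firstName = fname;
        this.lastName = lname;
    }

    pubs.toString = Person_toString;
    function Person_toString() {
        return buildFullName.call(this);
    }

    function buildFullName() {
        return this.firstName + " " + this.lastName;
    }

    return pubs;
});

var p = new Person("John", "Doe");
console.log(p.toString());         // "John Doe"
console.log(typeof buildFullName); // "undefined": private to the scoping function
```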

Conclusion


So that's my take on it, anyway: Help yourself, help your tools help you. Almost any time I find myself starting to write an anonymous function, I stop, write a function with a name, and use that instead — even if I'm writing a closure that will be used and then thrown away, too often I've run into exceptions or call stacks where I just couldn't figure out where I was when I used anonymous functions. Maybe the Array#sort is a bit too trivial to bother, or maybe not (hint: try it on Internet Explorer with this array: [10, "Maria", "Joe"] -- and don't you want to know which function threw that exception?).

Happy coding!

Tuesday, 16 March 2010

...by any other name, would smell as sweet

Although it's not a new problem, lately I've been seeing so many people running into a specific issue with Internet Explorer that I thought it worth just jotting down what the problem is and how to get around it.

The problem, in brief, is conflation. Internet Explorer (v6 and v7) mixes together the namespaces of the id and name attributes, which should be completely distinct from one another. id, you'll recall, must be unique within the document; it uniquely identifies an element. name, on the other hand, is not for uniquely identifying things (well, not mostly). It's used for a couple of different things, such as giving form fields the names they'll have when submitted, or giving anchors a name. There's no requirement that name values be unique — in fact, when doing forms, there are lots of reasons for using the same name for multiple fields (radio buttons, for example).

Unfortunately, the Internet Explorer engine will find elements by name sometimes when it really shouldn't, specifically when you're using document.getElementById. So for instance, say you have a form in your document with a 'stuff' field:
<input type='text' name='stuff'>
and later in that same document you have a div with the id "stuff":
<div id='stuff'>...</div>
Consider this code:
var elm;
elm = document.getElementById('stuff');
if (elm) {
alert("The element's tagName is " + elm.tagName);
}
else {
alert("Couldn't find the element.");
}
In a browser that implements document.getElementById correctly, that should alert "The element's tagName is DIV". But on IE, it will alert "The element's tagName is INPUT" because it incorrectly finds the input field. The only way around this is to change the id or name of one element or the other.

Another related problem is that id values are not case-sensitive in IE6 or IE7, although of course the standard says they should be. So it'll also confuse an element with the id (or name) 'stuff' with one with the id (or name) 'Stuff'. (To be fair, surely you and I would as well?)

There's good news, though: Microsoft does document the behavior, and they've fixed it in IE8, so there's hope for the future.

Perhaps slightly OT, but lest people accuse me of Microsoft-bashing (er, I have been known...), let me just remind everyone who brought us XMLHttpRequest (the basis of Ajax) and innerHTML, both of which I use (directly or indirectly) in nearly every JavaScript-enabled browser project I do, and I bet you do too. So hey, they got many things wrong, but got some things very right as well. It's also useful to remember, as we continue to bash IE6 with repeated shouts of "Die! Die!", just how amazingly much better (faster, less crash-prone, more feature-rich) it was than its chief rival, back in the day.

Tuesday, 9 February 2010

catch...from

A brief post off my usual JavaScript track for a digression into Java and related languages (although come to think of it, the concept below applies just as much to JavaScript as Java).

Exception handling features in languages are fantastic. They let us (if we're careful and thorough) express the mainline logic of our code uninterrupted by the niggly little detail that just about anything can fail at just about any time, and then let us handle those failures in clearly-defined "...and if that goes wrong, do this" blocks. In pseudocode:
try
{
    doThis();
    doThat();
    doTheOther();
    return result;
}
catch (ExceptionalCondition ec)
{
    // Handle it
}
Lovely and wonderful. We were able to express what we're really trying to do (this, that, and the other) clearly, but then handle the fact that this, that, or the other can fail. The equivalent in, say, C would be to use the convention that returning 0 means everything was fine and returning !0 means something went wrong:
if (doThis() == 0)
{
    if (doThat() == 0)
    {
        if (doTheOther() == 0)
        {
            return result;
        }
    }
}
// Handle the fact that something failed
(Although you probably wouldn't write it quite like that.)

So for obvious reasons I'm a big fan of exception constructs, but there's a refinement I've been wanting for a good 10+ years now (maybe it's about time I got around to submitting a JSR or something; they don't seem to be getting my telepathic messages):

catch...from

This is probably best expressed with an example:
FileOutputStream    output;
Socket              socket;
InputStream         input;
byte[]              buffer;
int                 count;

// Not shown: Opening the input and output, allocating the buffer,
// getting the socket's input stream

try
{
    while (/* ... */)
    {
        count = input.read(buffer, 0, buffer.length);
        output.write(buffer, 0, count);

        // ...
    }
}
catch (IOException ioe from input)
{
    // Handle the error reading the socket
}
catch (IOException ioe from output)
{
    // Handle the error writing to the file
}
As you can see, the goal here is to separate the logic that handles errors reading from the socket from the completely unrelated logic that handles errors writing to the file.

A very basic idea, but I've never heard of anyone implementing it. This may be because I'm so shockingly brilliant that no one else has ever thought of this. (A kind reader will not immediately consider the converse argument.) Or, more likely, there are a lot of subtleties involved that make it tricky.

For instance, exceptions thrown by other objects used under the covers by the socket or file output stream instance need to be handled as though thrown by the socket or output stream instance itself once they get to the level where we've caught them. That shouldn't be too hard, although it would require that the stack trace record instance information.

Another wrinkle comes from checked exceptions: Does my code above throw IOException, or not? I've caught it if it's thrown by input or output, and those are the only two instances I've used methods on that may throw the exception, so no, my code doesn't throw IOException. Compilers would need enhancing so they can figure that out, but that, too, doesn't seem that complicated.

It's very hard to do the above (in a general way) without language support, hence the suggestion that it may be worth adding a construct to the language.
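To give a feel for what a non-general, by-hand approximation looks like, here is a rough sketch in JavaScript (since the idea applies there too). Everything in it is an assumption for illustration: callOn and the thrownFrom property are hypothetical names, and the input and output objects are stand-ins rather than real streams. Each call is routed through a wrapper that records which object the exception came from, and the catch block then branches on that.

```javascript
// Hypothetical helper: calls source[method] with args and, if it throws,
// records which object the exception came from before rethrowing it.
function callOn(source, method, args) {
    try {
        return source[method].apply(source, args);
    }
    catch (e) {
        // Primitives can't carry a tag, so only tag objects. The outermost
        // wrapper overwrites any earlier tag, so errors from objects used
        // internally are attributed to the object we called.
        if (e !== null && typeof e === "object") {
            e.thrownFrom = source;
        }
        throw e;
    }
}

// Stand-ins for the socket's input stream and the file output stream
var input = {
    read: function readChunk() { return "data"; }
};
var output = {
    write: function writeChunk(chunk) { throw new Error("disk full"); }
};

function copyOnce() {
    try {
        var chunk = callOn(input, "read", []);
        callOn(output, "write", [chunk]);
        return "copied";
    }
    catch (e) {
        if (e.thrownFrom === input) {
            return "read failed: " + e.message;
        }
        if (e.thrownFrom === output) {
            return "write failed: " + e.message;
        }
        throw e; // Not from either source; let it propagate
    }
}

console.log(copyOnce()); // "write failed: disk full"
```

The cost is plain to see: every call has to go through the wrapper, and the mainline logic is no longer uninterrupted, which is rather the point of wanting the construct in the language.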

Happy coding.