Monday, 3 March 2008

Poor misunderstood 'var'

It seems most programmers coming to JavaScript from C, C++, Java, and the like equate the var statement with variable declaration statements in the languages they come from and use it the same way. And at some casual level that's reasonable; but it can lead you down a misleading path...

Consider this code:

function foo()
{
var ar;

// ...do some stuff, create an array in 'ar'...

for (var index = 0; index < ar.length; ++index)
{
doSomethingWith(ar[index]);
}
}
This is a common idiom, but a misleading one. You might think that index is only defined within the for loop (that's certainly the impression we're giving in the code). But it's not true: In fact, index is defined throughout the function -- within the loop, outside the loop, above the loop, and below the loop. The var statement defines a variable within the current scope (all of it, not just "from here on"), and unlike some other languages, in JavaScript blocks don't have any effect on scope; only functions introduce a new scope.

Consequently, the above function can be written as it is above, but also with the index declaration...

...at the top:
function foo()
{
var ar;
var index;

// ...do some stuff, create an array in 'ar'...

for (index = 0; index < ar.length; ++index)
{
doSomethingWith(ar[index]);
}
}
...at the bottom:
function foo()
{
var ar;

// ...do some stuff, create an array in 'ar'...

for (index = 0; index < ar.length; ++index)
{
doSomethingWith(ar[index]);
}

var index;
}
...anywhere in the middle:
function foo()
{
var ar;

// ...do some stuff, create an array in 'ar'...

for (index = 0; index < ar.length; ++index)
{
var index;
doSomethingWith(ar[index]);
}
}
...or even all of them!
function foo()
{
var ar;
var index;

// ...do some stuff, create an array in 'ar'...

for (var index = 0; index < ar.length; ++index)
{
doSomethingWith(ar[index]);
}

var index;
}
We can get away with that last one because a var statement defining a variable that already exists in the current scope does not replace the variable (this is what keeps you from accidentally masking your function's arguments, or even the arguments array that's provided for you).

This seems like an odd way to define the var statement until you get into the plumbing of JavaScript and how it sets up calls to functions. You can get into some of that by reading my earlier post, Closures are not complicated, but the net effect of the plumbing is that all var statements are treated as though they were at the top of the function (if they have initializers, those become assignments and stay where they are).

So does that mean that the common idiom of declaring an indexer within the loop statement is "wrong"? Well, that's a matter of perspective, and the older I get the more experience I accumulate, the less I think in terms of absolutes like right and wrong. The language spec allows it, so in that sense it's not "wrong". In some ways, it's sort of a shorthand way of telling the next person reading the code that you're going to use it for the loop (and only for the loop, right?), so in that sense perhaps it's not "wrong".

But the further your code gets from expressing what's really happening, the easier it is for someone reading the code later (perhaps you!) to get the wrong end of the stick and introduce a problem. For example, suppose you have a 30-some-odd-line function and the loop appears in within the body of a conditional about two-thirds of the way down:
function foo(someArray)
{
var thingy;
var otherThingy;

// ...20 lines of code...

if (thingy > otherThingy)
{
for (var index = 0; index < someArray.length; ++index)
{
doSomethingWith(someArray[index]);
}
}

// ...10 more lines of code...
}
Six months after you write this, Mike edits the function and needs to remember the index of something at the top so he can do something with it at the bottom; he declares an "index" variable, sets index at the top, and then uses it at the bottom, having missed the loop:
function foo(someArray)
{
var thingy;
var otherThingy;
var index;

index = findSomething(someArray);

// ...20 lines of code...

if (thingy > otherThingy)
{
for (var index = 0; index < someArray.length; ++index)
{
doSomethingWith(someArray[index]);
}
}

// ...10 more lines of code...

restoreSomething(someArray, index);
}
Mike's introduced a bug, an irritating, intermittent bug. Sometimes the restoreSomething call at the end fails for some reason; not always, mind, but sometimes. (Because index gets set by the loop, but only when thingy > otherThingy.)

Obviously, this bug could have been avoided if Mike had read through the entire function carefully before making his mods. Or if you'd chosen a different name for your index variable (in hopes of reducing the odds of Mike using it). Or it could have been caught by thorough unit tests that explore all conditions (and then Mike would have to go back and fix it).

But let's throw Mike a bone, eh? If we declare the variable in the text in the same place it's defined by the interpreter at runtime, we help him avoid making the mistake in the first place. And we like Mike, we don't want to trip him up...right?

Regardless of your decision about how to write your code, though, understanding what var is really doing can help you get that code doing what you want it to do.

2 comments:

kangax said...

Yet another reason to use each/forEach for enumeration as a way to encapsulate such annoying little things like conflicting index variables. We have closures - why not let them deal with it implicitly : )

Best,
kangax

T.J. Crowder said...

@kangax - True enough for this particular example, yes. But the issue isn't an issue with loop counters, that was just my example. It could be anything -- using 'element' to refer to an HTML element we're going to use, etc. It's yet another reason to keep functions short and tight, but in my view it's also a good reason for declaring all vars at the point they take effect -- the top of the function.