Recently, I expressed to Colin the sense of “disorientation” every Java (or other “traditional”) programmer must feel at diving into the client side and trying to write idiomatic, performant code. Every reasonable coder quickly develops a “gut” feeling for the costs (both in design, and resources) for the various primitives they have at their disposal. For timings, which is mainly what I am going to talk about in this posting, this might not take the form of actual hard numbers, but certainly a kind of “order of magnitude” relative estimate.

What I mean here is to know, generally within a factor of 2 or 3, how much one operation is likely to cost relative to another one, and to have a similar kind of “fuzz factor” with respect to actual performance on a particular kind of machine you are familiar with. Back in the days when I was starting on RSF, pinned to my desk was the original “How Long, Oh Lord?” which I reproduce here:

How Long, Oh Lord?
Method Call 8ns
ThreadLocal Get 60ns
Constructor 100ns
Reflective Call 700ns

For reference, these timings were on a Windows JDK 1.4 running on a relatively weak machine in 2005, perhaps something like a c.2Ghz P4.

Anyway, to skip to the present, starting work on the RSF/Fluid client-side renderer, I felt similarly at sea on how to plan for a design that runs efficiently. Javascript is starting to emerge from its “ghetto folkloric” period and to become the target of determined engineering efforts, and there are a number of resources round the web focused on Javascript performance issues, such as John Resig’s excellent blog entry collection, and a splendid presentation by Julien Lecomte of YUI

Some of the “learnings” I report here in “How Long, Oh Lord 2008″ are part of Julien’s presentation, but others I have not seen before. The workhorse of these estimates is a microbenchmark collection I have put together at (this is just accidentally for now embedded in the middle of a wider Java/RSF/JSON/Maven development structure).

The benchmarks are run in (currently 5) “batches” of say 10,000 iterations, the most extreme measurements are discarded, and the average of the remaining three batches are reported as a headline figure. This is an attempt to defeat random variation caused by system business, GC collections and the like.

Here is a sample row and table output from the system:

Count Name 1 2 3 4 5 Median ave
100000 Bare Function Call 125.0ms: (1.250us/call) 93.00ms: (0.930us/call) 94.00ms: (0.940us/call) 109.0ms: (1.090us/call) 110.0ms: (1.100us/call) 104.3ms: (1.043us/call)

Function Calls Considered Harmful

I think the most central result I want to report here is the truly fantastic expense of function calls in Javascript, which to my mind generally cuts at the heart of the possibility of orderly development.

The bad news is that a simple function call costs around 3 microseconds on a typical machine in Firefox 2.x. Opera builds are generally much better, around the 1s mark. The “better” news going forwards is that current builds of Firefox 3 betas are vastly improved, down to 350ns on the same hardware. All the same, the ground reality is that in the vast majority of installed system for a number of years, a Javascript function call is going to cost something in the region of 1 microsecond. Let’s put that figure into perspective with a tour of our glorious history – the heroic IBM System/360 Model 30 (Announced April 7, 1964) had a machine cycle time of 1 microsecond. On the personal computer front, the Commodore PET (January 1977) was the same. In this respect (and some others) therefore Javascript sets back computer science 30 years or more. This is a little tongue-in-cheek – but not entirely. Clearly even on the most up-to-date environment, a function call in Java, say, will not execute in a single machine cycle. But the discrepancy is more like 1 order of magnitude, rather than 3 as it is in Javascript.

Why is this, and what does it mean?

Javascript function calls, as the most skilled practitioners of Javascript (Crockford, etc.) will explain, are vastly powerful, vastly underappreciated pieces of machinery. Every invocation of a Javascript function brings into existence a highly complex structure, a dynamic scope chain which will allow variables to be referenced long after a function has returned or else, require them to be garbage collected, piece by piece. Therefore a JS “stack frame” is really very much more like an “object” in a classic “OO” language in its capabilities. More on this later. This means that a truly efficient implementation of JS function calls is essentially impossible – although the Firefox 3 team are having a thoroughly good try. What is particularly happy about this situation is that, as Crockford frequently laments, the true power of Javascript functional expressiveness is lost on 99% of its practitioners, who are typically trained if at all) in far less mind-bending idioms. And for those 1% who do truly understand what a JS function call could do, how many can make more than a handful of the function calls they write count for their full power? :P

What does this mean? It means that truly performant Javascript code is really obliged to drop back more than decades in its structural idiom. The function call is the primary, the first and most useful, unit of modular abstraction in programming, and these costs imply that efficient Javascript code must degrade to consist of few, extremely large function bodies, whereas a design based on fine-grained, flexibly composed units would be much more comprehensible and easy to maintain.

The most skilled Javascript programmers, framework authors, have been aware of this (implicitly or explicitly) for a long while, and it is clear that their code is following suit. As one example, TrimPath Templates a very performant macro-style templating language, is implemented with a large work-horse method (actually TrimPath uses a number of other performance tricks which will be touched upon later). As another, JQuery‘s central workhorse, JQuery.find() is likewise a single monster method. In this and many other areas, “stupider is better” is often the correct answer – cut and paste, traditionally the anathema of programming, has a new lease of life in Javascript as it is frequently superior to simply repeat frequently occurring, especially small, snippets of code, rather than factor them into function calls.

Scoping and storing bugbears

The pattern of which operations in Javascript are performant and which are not is interestingly skew. To go into other results from “perftest”, we can discover a rule of thumb that one object property access is roughly equivalent to four string comparisons. For example, when testing a string value to be one of four possible values, looking it up as an Object member is roughly breaking even with writing out the sequential comparisons.

Similarly, access to objects in outer scopes carries a price. For example, again on FF2.x, simply accessing a variable not declared at the current function scope carries a penalty of 300ns. Vast, in “traditional” terms.

Facts like this militate for numerous “hoisting” tricks – essentially if a given expression or variable is accessed more than a single time in a particular function scope, it will almost always make sense to assign it to a local variable before use. This is obviously much more important for Object members than for “outer scope” variables. Julien Lecomte covers this in his presentation – a further comparison worth noticing is that function scopes are actually more more efficient containers for data than Objects. This further pushes forwards the Crockfordian view of “function scopes” as being the primary units of abstraction and storage in an “idiomatic Javascript” program, rather than traditional “Objects”. If one is going to pay 1 microsecond for a function call, better make it count – but better to pay 3 microsecond for a call and 300 ns for an access, than to pay 5 microseconds to construct an Object and then 500 ns per access (Firefox 2.x figures – FF3 and Opera are better, but the relative picture is unchanged).

An unfortunate discovery

On Firefox 2.x the already dim picture is muddied further by the discovery that simply invoking a function declared at another function scope nearly doubles the cost – even if no variables are declared or accessed at the outer scope. Since this “trick” is the now standard modern recommdation for implementing namespacing (Crockford on Privacy) this automatically penalises those who seek to organise their code better. Again blissfully this penalty is gone from Firefox 3.

Finding performant primitives

Some Javascript operations essentially retain near-native performance. Boolean tests, numeric operations, local control structures such as loops and branches, all proceed pretty quickly. This would seem to at least allow the possibility of some high-performing algorithms (assuming they were structured as giant function blocks) were it not for another really unfortunate Javascript feature – the lack of any bulk “primitive” storage. Java’s primitive array support was often widely derided as a really irritating and unorthogonal language feature, but actually is the only thing that saves it as a data processing language. In Javascript one has only Strings – and even a simple charAt() call causes a complete object instantiation (as a side tip – use of the less-well-known String.charCodeAt() followed by a (much less readable) numeric test is around 20% faster than a charAt() plus String test). There is simply nowhere to put extensive data since every Array is an Array of Object.

This leaves one to fall back on Strings and String tricks. An interesting discovery is that String.indexOf() is a startlingly rapid operation – after what appears to be a standard “native call overhead” of something like 1.5 microseconds (abouthalf a “standard function call”) (again FF 2.x), the actual search occurs at fully native speeds. This implies you can find a given String amongst around 10K of data practically as fast as you can find it amongst 100 bytes. TrimPath makes very fruitful use of indexOf to gain a very good speed boost, helped by its simple and “easily recognisable” templating format.

Regexes can take up a lot of slack in some cases, but their overhead is very much higher than an indexOf. As Lecomte notes, a Regexp.test() is *somewhat* faster than a Regexp.exec() but actually not startlingly so, since the test() appears to require no “user-visible” memory allocation. Any Regexp invocation carries a standard overhead of about 2 function calls, so use with care on simple tasks.

Garbage in, garbage out

Investigating the costs of garbage in Javascript rounds out most of this discussion. It is very interesting to repeat the performance tests on a “huge browser image” runtime, since this immediately exposes the kinds of operations which create garbage and those which do not. I am somewhat untypical in that I am frequently running a system with around 70 top-level Opera frames (it is the only browser that will cope with this sensibly) and perhaps 20 Firefox frames. In a browser image like this, the fact that all Javascript objects are typically thrown into a global shared heap, rather than the per-frame heaps one might perhaps wish for is painfully exposed. Running perftest.js in a “mega-Opera” shows the allocating tests running perhaps 100 times slower – the data becomes very spiky, but a single micro-test can end up taking a full millisecond rather than a microsecond.
This exposes that although function calls are “like” Object (heap) allocations, they are still underlyingly dealt with by different, and more efficient data structures – even in a “mega-Opera”, a bare function call is still taking 1 microsecond, whilst a true heap allocation is running at a millisecond or more. “mega-Opera” is a highly untypical environment, but it demonstrates at extremes the superior scaling costs of function frames over true garbage.

If there is a simple take-home message from all of this, it is -

i) Javascript function calls are hugely expensive compared to traditional environments, take great care not to factor your code too deeply,

ii) Javascript function calls are sufficiently powerful to take over many of the traditional roles of “Objects” in traditional environments, so consider designs that operate closure-based storage and reserve your true Objects for data transfer,

iii) make judicious use of the primitive operations which are provided (indexOf and regexps) where possible to make your code run reasonably.

Leave a Reply