Recently, I expressed to Colin the sense of “disorientation” every Java (or other “traditional”) programmer must feel at diving into the client side and trying to write idiomatic, performant code. Every reasonable coder quickly develops a “gut” feeling for the costs (both in design, and resources) for the various primitives they have at their disposal. For timings, which is mainly what I am going to talk about in this posting, this might not take the form of actual hard numbers, but certainly a kind of “order of magnitude” relative estimate.
What I mean here is to know, generally within a factor of 2 or 3, how much one operation is likely to cost relative to another one, and to have a similar kind of “fuzz factor” with respect to actual performance on a particular kind of machine you are familiar with. Back in the days when I was starting on RSF, pinned to my desk was the original “How Long, Oh Lord?” which I reproduce here:
|How Long, Oh Lord?|
For reference, these timings were on a Windows JDK 1.4 running on a relatively weak machine in 2005, perhaps something like a c.2Ghz P4.
Some of the “learnings” I report here in “How Long, Oh Lord 2008″ are part of Julien’s presentation, but others I have not seen before. The workhorse of these estimates is a microbenchmark collection I have put together at https://source.caret.cam.ac.uk/rsf/projects/RSFSamples/trunk/AjaxJSON/src/webapp/content/templates/perftest.html (this is just accidentally for now embedded in the middle of a wider Java/RSF/JSON/Maven development structure).
The benchmarks are run in (currently 5) “batches” of say 10,000 iterations, the most extreme measurements are discarded, and the average of the remaining three batches are reported as a headline figure. This is an attempt to defeat random variation caused by system business, GC collections and the like.
Here is a sample row and table output from the system:
|100000||Bare Function Call||125.0ms: (1.250us/call)||93.00ms: (0.930us/call)||94.00ms: (0.940us/call)||109.0ms: (1.090us/call)||110.0ms: (1.100us/call)||104.3ms: (1.043us/call)|
Function Calls Considered Harmful
Why is this, and what does it mean?
Scoping and storing bugbears
Similarly, access to objects in outer scopes carries a price. For example, again on FF2.x, simply accessing a variable not declared at the current function scope carries a penalty of 300ns. Vast, in “traditional” terms.
An unfortunate discovery
On Firefox 2.x the already dim picture is muddied further by the discovery that simply invoking a function declared at another function scope nearly doubles the cost – even if no variables are declared or accessed at the outer scope. Since this “trick” is the now standard modern recommdation for implementing namespacing (Crockford on Privacy) this automatically penalises those who seek to organise their code better. Again blissfully this penalty is gone from Firefox 3.
Finding performant primitives
This leaves one to fall back on Strings and String tricks. An interesting discovery is that String.indexOf() is a startlingly rapid operation – after what appears to be a standard “native call overhead” of something like 1.5 microseconds (abouthalf a “standard function call”) (again FF 2.x), the actual search occurs at fully native speeds. This implies you can find a given String amongst around 10K of data practically as fast as you can find it amongst 100 bytes. TrimPath makes very fruitful use of indexOf to gain a very good speed boost, helped by its simple and “easily recognisable” templating format.
Regexes can take up a lot of slack in some cases, but their overhead is very much higher than an indexOf. As Lecomte notes, a Regexp.test() is *somewhat* faster than a Regexp.exec() but actually not startlingly so, since the test() appears to require no “user-visible” memory allocation. Any Regexp invocation carries a standard overhead of about 2 function calls, so use with care on simple tasks.
Garbage in, garbage out
This exposes that although function calls are “like” Object (heap) allocations, they are still underlyingly dealt with by different, and more efficient data structures – even in a “mega-Opera”, a bare function call is still taking 1 microsecond, whilst a true heap allocation is running at a millisecond or more. “mega-Opera” is a highly untypical environment, but it demonstrates at extremes the superior scaling costs of function frames over true garbage.
If there is a simple take-home message from all of this, it is -
iii) make judicious use of the primitive operations which are provided (indexOf and regexps) where possible to make your code run reasonably.