Crazy benchmarks

There are a lot of JVM vs. CLR "benchmarks" popping up all over the place, which are used as arguments for or against the respective environments. I am not talking about the "Petshop wars" here; rather, I have looked at a few of the "little ones" that are used as ammunition in advocacy forums. The problem: almost all of these "benchmarks" really only show how hard it is to write good and fair performance tests that give others any actual guidance.

I picked out two: On Javalobby.com, Dan Benanav shows how Swing outperforms Windows Forms by comparing the performance of the two toolkits' grid controls. Swing wins easily in his version -- further down in the discussion thread someone rewrote the scenario and Windows Forms blows Swing away. The issue is that this test is so singularly focused on one specific feature that it doesn't tell anybody anything meaningful about either environment. The same is true for Daniel Mettler's Mono benchmark. The "benchmark" tries to illustrate performance with 100,000 iterations through a loop printing "Hello World" to the console. Again, this test may say a lot about console output performance, but nothing about anything else.
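For context, the kind of test in question is roughly the following (a minimal Java sketch of my own, not Mettler's actual code): it times 100,000 console writes, so whatever number comes out is dominated by console I/O and terminal buffering, not by the loop, the JIT, or anything else the runtime does.

    public class HelloBenchmark {
        public static void main(String[] args) {
            long start = System.currentTimeMillis();
            for (int i = 0; i < 100000; i++) {
                // Each iteration's cost is almost entirely the console write/flush.
                System.out.println("Hello World");
            }
            long elapsed = System.currentTimeMillis() - start;
            // Report on stderr so the measurement doesn't mix with the test output.
            System.err.println("Elapsed: " + elapsed + " ms");
        }
    }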

In a time when any desktop machine you can buy at a supermarket has more firepower than 98% of all desktop applications will ever need in terms of GUI performance, client-side benchmarks are really "out". Even a Java application without any JIT will perform more than satisfactorily for most users who have >1GHz under their fingertips. Good server-side benchmarks are a different issue and are still needed. But those require a lot of time, architectural thought, knowledge of proper algorithms and, more importantly, the money to buy/lease/borrow the required hardware.

The biggest "performance problem" is typically not the underlying platform, it's thoughtless programming where (proper choices of) algorithms don't play any role, where the choice between synchronous and asynchronous operations are not considered, where time is burned up in parsing and compiling query-plans for ad-hoc SQL queries instead of using stored procedures, where complex service constructs like EJB or COM+ are mindlessly used for "code consistency" reasons and not for the services they provide and so on and so on.

Only once all these things are considered does comparing platforms for raw speed begin to make sense. Even then, the result is typically "your mileage may vary" and "it still depends on what you do". The great side effect of benchmark wars like the one around the .NET/J2EE Petshop battle is that a lot of architectural best practices evolve in the process: those are the interesting bits; the raw numbers are of much lesser value.

All numbers are worthless if you don't look at how they were achieved and learn from the differences in choice of platform, architecture and implementation technique.

It would be much more important, however, to look at a different kind of speed: How long does it take the average developer to complete a certain task while achieving set quality goals? How long does it take an average developer to understand code he or she needs to maintain? I don't see such benchmarks all that often.
