Complexity and Test-first 2

The story came from here.

Of course, I'm not the first to notice that software (or OO software, anyway) has the scale-free property. These folks have done a lot of interesting work on the scale-free nature of the object graph in running systems, and on various static properties of code. I might, though, be the first person to observe that the distributions differ quantitatively between design methodologies.

What the Wellington group are trying to do is disprove what they call the Lego Hypothesis: that all software can be built out of lots of small interchangeable components. Instead, they claim, because software is scale-free, the biggest components in bigger systems will be bigger than the biggest ones in smaller systems. I'd read some of this research before, but had (consciously, anyway) forgotten it until Kent Beck reminded me. Thanks, Kent.
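To see why a scale-free distribution undermines the Lego Hypothesis, here is a small sketch. It assumes, purely for illustration, that component sizes are independent draws from a Pareto distribution with a hypothetical exponent `alpha = 2.0` (not a figure from the Wellington work), and computes the median size of the largest component in a system of `n` components. The helper name `median_max_size` is made up for this example.

```python
def median_max_size(n, alpha, x_min=1.0):
    """Median of the largest of n independent draws from Pareto(x_min, alpha).

    The Pareto CDF is F(x) = 1 - (x_min / x) ** alpha, and the largest of n
    independent draws has CDF F(x) ** n. Solving F(m) ** n = 0.5 for m gives
    the median of the maximum.
    """
    return x_min * (1.0 - 0.5 ** (1.0 / n)) ** (-1.0 / alpha)


# Hypothetical exponent; larger systems have more components (bigger n).
for n in (100, 1_000, 10_000, 100_000):
    print(n, round(median_max_size(n, alpha=2.0), 1))
```

For large `n` the median maximum grows roughly like `(n / ln 2) ** (1 / alpha)`, so with `alpha = 2.0` each tenfold increase in system size makes the typical biggest component about √10 ≈ 3.2 times bigger. There is no fixed ceiling on component size, which is the heart of the argument against the Lego Hypothesis.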

The story continues here.

Complexity and Test-first 1

The story began here.

It's interesting to note that a similar discussion, albeit a brief one, sprang up on the XP egroup. I posted a link to my previous blog post there, but it seemed to get lost in the flood. Oh well. It will be interesting to compare my results with those in the Muller paper.

Anyway, I've managed to find the time to look at a very small sample of Java codebases: find the distribution of method complexity in each, fit it to a Pareto distribution, and look at the slope of the best-fit straight line. Here's the outcome, with codebases ordered by the number of published unit tests per unit of total cyclomatic complexity (where applicable):

codebase               tests / total CC   slope
jasml 0.10             0                  1.18
logica smpp library    0                  1.39
itext 1.4.1            0                  1.96
jfreechart 1.0.1       0.02               2.43
junit 3.8.2            0.14               2.47
ust (proprietary)      0.35               2.79
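The post doesn't say exactly how the best-fit line was computed, so here is one common approach, offered as a sketch rather than the method actually used: take the per-method cyclomatic complexities, build the empirical complementary cumulative distribution P(X ≥ x), and fit a least-squares straight line to log P(X ≥ x) against log x. The negated slope of that line estimates the Pareto exponent. The function name `pareto_slope` is hypothetical, and the input is assumed to be a plain list of per-method CC values (all at least 1) exported from whatever metrics tool you use.

```python
import math


def pareto_slope(cc_values):
    """Estimate the Pareto slope of a list of per-method cyclomatic complexities.

    Assumption: the slope is the negated least-squares slope of
    log P(X >= x) against log x, taken over the distinct observed values x.
    """
    n = len(cc_values)
    distinct = sorted(set(cc_values))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct complexity values")

    xs, ys = [], []
    for x in distinct:
        at_least = sum(1 for v in cc_values if v >= x)  # count with CC >= x
        xs.append(math.log(x))
        ys.append(math.log(at_least / n))

    # Ordinary least-squares slope in log-log space.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    return -sxy / sxx


# Toy data whose tail falls off exactly as x ** -2: 12 methods of CC 1,
# 3 of CC 2 and 1 of CC 4.
print(pareto_slope([1] * 12 + [2] * 3 + [4]))
```

A higher value means the distribution falls away faster, i.e. fewer very complex methods. Least squares on a log-log plot is quick but fairly rough; a maximum-likelihood estimate of the exponent is a different technique that is often preferred for serious work.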

A few points present themselves. First, each of the codebases with no published tests has a substantially lower slope (and so a substantially greater proportion of more complex methods) than any of those with published tests. Second, among those with published tests, the number of tests per "unit" of complexity is positively correlated with slope, and so with a preference for simpler methods. The correlation is about 0.96: very good for "social science", reasonable for "hard science", but this is "computer science", so who knows?
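The figure of about 0.96 can be checked against the table. This sketch assumes the coefficient is Pearson's product-moment correlation (the post doesn't say which) and uses only the three codebases that publish tests. With just three points, any correlation coefficient should be taken with a large pinch of salt.

```python
import math

# (tests per unit of total CC, slope) for the three codebases in the
# table that publish unit tests: jfreechart, junit and ust.
tests_per_cc = [0.02, 0.14, 0.35]
slopes = [2.43, 2.47, 2.79]


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(round(pearson(tests_per_cc, slopes), 3))  # prints 0.965
```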

The story continues here.