Pols' Perfect Purchaser
Andy Pols tells a great story about just how valuable it can be to have your customer 1) believe in testing and 2) engage completely in your development activities. Nice.
Subtext continues to amaze
I just watched the video describing the latest incarnation of Subtext. The video shows how the liveness, direct manipulation and example-based nature of this new table-based Subtext make writing and testing complex conditionals easy.
A Catalogue of Weaknesses in Python
You may have read somewhere that "Patterns are signs of weakness in programming languages", and perhaps even that "16 of 23 [GoF] patterns are either invisible or simpler [in Lisp]", from which some conclude that dynamic languages don't have/need patterns, or some such.
Language
Steve Freeman wonders whether the reason I find statistical properties resembling those of natural languages in the code of jMock is that jMock was written to be like a language...
Some Spa 2008 Stuff
Chris Clarke has made a post here exploring the ideas that Ivan Moore and I presented at our Spa 2008 workshop Programming as if the Domain Mattered. Ivan has written up some of his learnings from the session here. I'll be doing the same myself soon.
Chris makes this most excellent point:
I wish people would be a bit braver and use the code to express what they are trying to do and not worry about whether the way they are doing it is against Common Practice. Remember, the majority of software projects are still failures, so why follow Common Practice - it isn’t working!
Quite.
In other news, my experience report on the effects of introducing checked examples (aka automated acceptance/functional/user/whatever "tests") gets this thorough write up from "Me" (who are you, Me?) and also a mention in this one from Pascal Van Cauwenberghe.
Thanks, folks.
Design Patterns of 1994
So, Mark Dominus' Design Patterns of 1972 has made it back to the front page of proggit.
He offers a pattern-styled write up of the idea of a "subroutine" and opines that:
Had the "Design Patterns" movement been popular in 1960, its goal would have been to train programmers to recognize situations in which the "subroutine" pattern was applicable, and to implement it habitually when necessarywhich would have been disastrous. Also:
If the Design Patterns movement had been popular in the 1980's, we wouldn't even have C++ or Java; we would still be implementing Object-Oriented Classes in C with structs
Actually, I think not. Because we did have patterns in the 1990s and, guess what, programming language development did not cease. Not even in C++.
Debunking Debunking Cyclomatic Complexity
Over at SDTimes, this article by Andrew Binstock contains a claim that this result by Enerjy somehow "debunks" cyclomatic complexity as an indicator of problems in code. He suggests that what's shown is that, for methods of low complexity (which are overwhelmingly the most common kind), increasing complexity is not (positively) correlated with the likelihood of defects. Binstock writes:
What Enerjy found was that routines with CCNs of 1 through 25 did not follow the expected result that greater CCN correlates to greater probability of defects.
Not so. What Enerjy say their result concerns is:
the correlation of Cyclomatic Complexity (CC) values at the file level [...] against the probability of faults being found in those files [...]). [my emphasis]
It's interesting all by itself that there's a sweet spot for the total complexity of the code in a file, which for Java pretty much means all the methods in a class. However, Binstock suggests that
[...] for most code you write, CCN does not tell you anything useful about the likelihood of your code’s quality.
Which it might not, if you only think about it as a number attached to a single method and assume that there are no methods of high complexity. But there are methods of high complexity—and they are likely to put your class/file into the regime where complexity is shown to correlate with the likelihood of defects. Watch out for them.
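For anyone who hasn't counted the metric by hand lately, here is a minimal sketch of how cyclomatic complexity is tallied for a single Java method; the method and its business rule are invented purely for illustration.

```java
import java.util.List;

// Hypothetical example: cyclomatic complexity = 1 for the method itself,
// plus one for each decision point (the for loop, the if, and the ||),
// giving a CC of 4. A file-level figure of the kind discussed above is,
// roughly, this count summed over all the methods in the file.
public class ComplexityExample {

    interface Order {
        int quantity();
        double price();
    }

    static boolean allOrdersValid(List<Order> orders, double priceLimit) {
        for (Order order : orders) {                                   // +1
            if (order.quantity() <= 0 || order.price() > priceLimit) { // +1 for if, +1 for ||
                return false;
            }
        }
        return true;
    }
}
```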
Red and Green
Brian Di Croce sets out here to explain TDD in three index cards. It's a nice job, except that I'm not convinced by the faces on his last card.
Brian shows a sad face for the red bar, a happy face for the green, and a frankly delirious face for the refactoring step. There's something subtly wrong with this. We are told that when the bar is green the code is clean, which is great. But the rule is that we only add code to make a failing test pass, which implies a red bar. So, the red bar is our friend!
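To make that concrete, here is a minimal sketch of a red-bar-first step in JUnit; the class names and the discount rule are invented for illustration.

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;

// The test is written first, so the bar goes red until the production
// code gains the behaviour the test describes.
public class PriceCalculatorTest {

    @Test
    public void appliesTenPercentDiscountAtOneHundred() {
        PriceCalculator calculator = new PriceCalculator();
        // Red bar: this fails until the discount rule is implemented,
        // which is exactly the prompt to go and write that code.
        assertEquals(90.0, calculator.discountedPrice(100.0), 0.001);
    }
}

// Production stub, present only so the example compiles; running the
// test against it gives the red bar that drives the next bit of work.
class PriceCalculator {
    double discountedPrice(double price) {
        throw new UnsupportedOperationException("not implemented yet");
    }
}
```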
When I teach TDD I teach that green bar time is something to get away from as soon as possible. Almost all the time a green bar occurs when a development episode is incomplete: not enough tests have been written for the functionality in hand, more functionality is expected to go into the next release, or some other completeness condition is not met.
It's hard to learn from a green bar, but a red bar almost always teaches you something. Experienced TDDers are very (and rightly) suspicious of a green-to-green transition. The green bar gives a false sense of security.
Generally speaking, in order to get paid we need to move an implementation forward, and that can only be done on a red bar. Wanting to get to the next red bar is a driver for exploring the functionality and the examples that will capture it.
I tell people to welcome, to embrace, to seek out the red bar. And that when the bar is red, we're forging ahead.
TDD at QCon
Just finished a presentation of the current state of my TDD metrics thinking at QCon. The slides [pdf] are up on the conference site, video should be there soon, too.
Tests and Gauges
At the recent Unicom conference on Agility and business value I presented a couple of cases where I'd seen teams get real, tangible, valuable...value from adopting user/acceptance/checked-example type testing. David Peterson was also presenting. He's the creator of Concordion, an alternative to Fit/Library/nesse. Now, David had a look at some of my examples and didn't like them very much. In fact, they seemed to be exactly the sort of thing that he found Fit/etc encouraged, of which he disapproved, and to avoid which he created Concordion in the first place. Fair enough. We tossed this back and forth for a while, and I came to an interesting realization. I would in fact absolutely agree with David's critique of the Fit tests I was exhibiting if I thought that they were for the purpose that David thinks his Concordion tests are for. But I don't think that, so I don't. Which probably means that any given project should probably have both.
Turning the Tables
David contends that Fit's table-oriented approach affords large tests, with lots of information in each test. He's right. I like the tables because most of the automated testing gigs I've done have involved financial trading systems, and the users of those eat, drink, and breathe spreadsheets. I love that I can write a fixture that will directly parse rows off a spreadsheet built by a real trader to show the sort of thing they mean when they say that the proposed system should blah blah blah (a sketch of such a fixture follows below). The issue that David sees is that these rows probably contain information that is, variously: redundant, duplicated, irrelevant, obfuscatory and various other epithets. He's right, often they do.
What David seems to want is a larger number of smaller, simpler tests. I don't immediately agree that more, simpler things to deal with all together are easier than fewer, more complex things, but that's another story. And these smaller, simpler tests would have the principal virtue that they more nearly capture a single functional dependency. That's a good thing to have. These tests would capture all and only the information required to exercise the function being tested for. This would indeed be an excellent starting point for implementation.
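For concreteness, here is a minimal sketch of the kind of row-parsing fixture I have in mind, using Fit's ColumnFixture; the fee rule, the column names and any numbers in the table a trader writes are invented for illustration.

```java
import fit.ColumnFixture;

// Each row of the trader's table binds its input cells to the public
// fields below, and Fit compares the return value of fee() against the
// expected value in the "fee()" column.
public class TradeFeeFixture extends ColumnFixture {
    public double notional;      // input column: size of the trade
    public String counterparty;  // input column: who we traded with

    // output column: the fee the proposed system should charge
    public double fee() {
        // A made-up rule: internal counterparties get a cheaper rate.
        double rate = "INTERNAL".equals(counterparty) ? 0.0001 : 0.0005;
        return notional * rate;
    }
}
```

The table itself has a header row naming the fixture, then columns headed notional, counterparty and fee(); Fit runs each row and colours the fee() cell green or red.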
There's only one problem: those smaller, more focussed tests are further away from the users' world and closer to the programmers'. All that stuff about duplication and redundancy is programmers' talk. And that's fair enough. And it's not enough. I see David's style of test as somewhat intermediate between unit tests and what I want, which is executable examples in the users' language. When constructing these small, focussed tests we're already doing abstraction, and I don't want to make my users do that. Not just yet, anyway.
So then I realised where the real disagreement was. The big, cumbersome, Fit style tests are very likely too complicated and involved to be a good starting point for development. And I don't want them to be that. If they are, as I've suggested, gauges, then they serve only to tell the developers whether or not their users' goals have been met. The understanding of the domain required to write the code will, can (should?) come from elsewhere.
Suck it and See
And this is how gauges are used in fabrication. You don't work anything out from a gauge. What you do is apply it to see if the workpiece is within tolerance or not. And then you trim a bit off, or build a bit up, or bend it a bit more, or whatever, and re-apply the gauge. And repeat. And it doesn't really matter how complicated an object the gauge itself is (or how hard it was to make—and it's really hard to make good gauges), because it is used as if it were both atomic and a given. It's also made once, and used again and again and again and again...
Until this very illuminating conversation with David I hadn't fully realised quite the full implications of the gauge metaphor. It actually implies something potentially quite deep about how these artifacts are built, used and managed. Something I need to think about some more.
Oh, and when (as we should) we start to produce exactly those finer-grained, simpler, more focussed tests that David rightly promotes, and we find out that the users' understanding of their world is all denormalised and stuff, what interesting conversations we can then have with them about how their world really works, it turns out.
Might even uncover the odd onion in the varnish. But let's not forget that having the varnish (with onion) is more valuable to them than getting rid of the onion.
Curiously Apt
A friend is embarking upon a conversion MSc into the wonderful world of software development. He's become interested in the currently en vogue paradigms of programming, their relationships and future. It seems to him (he says) that OO is very much about standing around a whiteboard with your friends, sipping a tall skinny latte while your pizza goes cold. And by contrast functional programming is like sitting alone, crying into your gin with your head held in your hands over some very, very, hard maths.
Perceptive. He'll go far. And not just because he's already had this other successful career dealing with actual customers (which should be a stronger prerequisite for becoming a commercial developer than any of that comp sci stuff).
Compiler Warnings
This Daily WTF reminded me of a less than glorious episode from my programming past. At a company that shall remain nameless (and doesn't exist anymore) I was working on a C++ library to form part of a much larger application.
One fine day a colleague came stomping (that's the only verb that suits) over to my desk and boldly announced that "your code has crashed my compiler". Somewhat alarmed at this I scurried (yes, scurried) back to his desk. "Look" he said, and did a little cvs/make dance. Lo and behold, when the compiler got to my code indeed it fell to silently chugging away. "See," he resumed, "it's crashed."
"No," I said, "it's just compiling."
"So where," he asked, "are the messages?"
"What messages?" I replied. He scrolled the xterm up a bit.
"These. All the warnings and stuff"
"Doesn't generate any," I said.
He boggled. "What, have you found some way of turning them off?"
"No," I said, "I just wrote the code so that it doesn't generate any warnings."
He boggled some more. "Why," he eventually managed to gasp, "would you bother doing that?"
I didn't last long there. It was best for all concerned, really.
Golden Hammers vs Silver Bullets?
The old wiki page on Golden Hammers has popped up on reddit. One commentator suggests that a golden hammer seems to be the same as a silver bullet.
Probably. But I find the two phrases suggestive when placed side-by-side. To me, it seems as if a golden hammer is a tool that's very familiar, simple to apply, and works best on things within striking distance: everyday problems. All these screws sticking out all over, bam, bam, bam. A golden hammer is something we already have, and it worked great lots of times before, so let's carry on using it for whatever comes next.
A silver bullet, on the other hand, seems like something that is directed in from the outside. It is a projectile, the sender has no control over it once launched. The silver bullet is new and alien. It requires complex tooling to make it work. It is deployed against hairy monsters that jump out at you. Nothing ever worked on them before, maybe the silver bullet will?
A golden hammer, then, would be the over-applied old tool of a worker who doesn't learn new ones. The silver bullet is the disengaged outsider's agent of violent change.
I love having new distinctions to play with.
Agile 2008
Just a reminder that the call for submissions to the Agile 2008 conference closes on the 25th of February. I strongly encourage those who are interested in agile methods to go, and I strongly encourage those who are going to submit sessions.
I'm assistant producer (vice Steve Freeman) of the "stage" called Committing to Quality, which is a home for sessions concerned with the strong focus on internal and external quality in agile development.
I've been watching the submission system since the call opened and there are some really interesting sessions being proposed. Looks as if it's going to be a good conference.
See you there!
UK Conference Season
It's coming round to conference season again. If you happen to be in the UK in the next few months, maybe I'll see you at UNICOM, where I'll be talking about some adventures with automated testing.
Or perhaps at QCon, where I'll be presenting the latest news on my metrics work and joining in a panel with Beck and others, both part of the XPDay Taster track, a cross-over from the XPDay events.
Or even at Spa (that oh so magical automated testing again).
Tolstoy's Advice on Methodology
So, Tolstoy tells us that all happy families are the same, whereas each unhappy family is unhappy in its own way. I think that the same applies to development projects: all successful projects are the same, unsuccessful ones fail in their own ways.
This occurred to me while chatting with one of my Swiss colleagues recently. The Swiss end of the business does a lot of successful RUP engagements: that's right, successful RUP. The reason that they are successful is twofold. Firstly, they always do a development case, carefully picking and choosing which roles, disciplines and the rest they will or will not use on a project. Secondly, they understand that RUP projects really are supposed to be really iterative and really incremental, that almost all the disciplines go on to a greater or lesser extent in all phases. A (different) Swiss colleague once asked me what the difference was between Agile and RUP. My only semi-flippant answer was that if you do RUP the way Philippe Kruchten wants you to do it, then not much.
Research, huh! What is it good for?
...absolutely nothing, according to this screed. Yes, I know it's half a year old or so, but I only just came across it. Which is a nice segue, because it (and that I felt the need to make that qualification) partly illustrates something that annoys me not a little about the industry: a short memory. Also, not looking stuff up. While the overall message (that contemporary Comp Sci research is a titanic fraud that should be abolished) is both shrill and apparently right-libertarian propaganda, I have a degree of sympathy with it.
We are asked to consider those mighty figures of the past, working at institutions like the MIT AI Lab, producing "fundamental architectural components of code which everyone on earth points their CPU at a zillion times a day". OK. Let's consider them. And it's admitted that "some of [them] - perhaps even most - were, at some time in their long and productive careers, funded by various grants or other sweetheart deals with the State." No kidding.
Go take a look at, for example, the Lambda the Ultimate papers, a fine product of MIT. Now, MIT is a wealthy, independent, private university. So who paid for Steele and the rest to gift the world with the profound work of art that is Scheme? AI Memo 349 reports that the work was funded in part by an Office of Naval Research contract. The US Navy wanted "An Interpreter for Extended Lambda Calculus"? Not exactly. In 1975 "AI" meant something rather different and grander than it does today. And it was largely a government-funded exercise. This Google talk gives a compelling sketch of the way that The Valley is directly a product of the military-industrial complex, that is, interventionist government funding. Still today, the military are a huge funder of research, and buyer of software development effort and hardware to run the results upon, which (rather indirectly) pushes vast wodges of public cash into the hands of technology firms. Or even directly: Bell Labs, for instance, received direct public funding in the form of US government contracts.
In the UK, at least, the government agencies that pay for academic research (of all kinds) are beginning to wonder, in quite a serious, budget-slashing kind of way, if they're getting value for money. So, naturally, the research community is doing some research to find out. One reason that this is of interest to me is that my boss (or, my boss's boss anyway) did some of this research. Sorry that those papers are behind a pay gate. I happen to be a member of the ACM, so I've read this one and one of the things it says is that as of 2000 the leading SCM tools generated revenues of nearly a billion dollars for their vendors. And where did the ideas in those valuable products come from? They came, largely, from research projects. What the Impact Project is seeking to do is to identify how ideas from research feed forward into industrial practice, and they are doing this by tracing features of current systems back to their sources. Let's take SCM.
The observation is made that the diff [postscript] algorithm (upon which, well, diffing and merging rely) is a product of the research community. From 1976. With subsequent significant advances made (and published in research papers) in 1984, '91, '95 and '96. Other research ideas (such as using SCMs to enforce development processes) didn't make a significant impact in industry.
Part of the goal of Impact is to:
educate the software engineering research community, the software practitioner community, other scientific communities, and both private and public funding sources about the return on investment from past software engineering research [and] project key future opportunities and directions for software engineering research and practice [and so] help to identify the research modalities that were relatively more successful than others.
In other words, find out how to do more of the stuff that's more likely to turn out to be valuable. The bad news is that it seems to be hard to tell what those are going to be.
I focus a little on the SCM research because that original blog post that got me going claims that
most creative programming these days comes from free-software programmers working in their spare time. For example, the most interesting new software projects I know of are revision control systems, such as darcs, Monotone, Mercurial, etc. As far as Washington knows, this problem doesn't even exist. And yet the field has developed wonderfully.
I would be very astonished to find that a contemporary (I write in early 2008–yes, 2008 and I still don't have a flying car!) update of that SCM study would conclude that the distributed version control systems were invented out of thin air in entirely independent acts of creation. (They do mention that the SCM vendors, when asked, tended to claim this of their products.)
The creation of Mercurial was a response to the commercialization of BitKeeper, and BitKeeper would seem to have been inspired by/based on TeamWare. Those seem to have all been development efforts hosted inside corporations, which is cool. I'd be interested to learn that McVoy at no time read any papers or met any researchers who talked about anything that resembled some sort of version control that was kind-of spread around the place. The Mercurial and Bazaar sites both cite this fascinating paper [pdf] which cites this posting. Which tells us that McVoy's approach to DVCS grew out of work at Sun (TeamWare) done to keep multiple SCCS repositories in sync. Something that surely more people than McVoy wanted to do. SCCS was developed at Bell Labs (and written up in this paper [pdf], in IEEE Transactions in 1975).
One of the learnings from Impact is that what look like novel ideas from a distance in general turn out, upon closer inspection, to have emerged from a general cloud of research ideas that were knocking around at the time. The techniques used in the Impact studies have developed, and this phenomenon is much more clearly captured in the later papers. So what does that tell us?
Well, it tells us that it's terribly hard to know where ideas came from, once you have them. And that makes it terribly hard to guess well what ideas are going to grow out of whatever's going on now. So perhaps there isn't a better way than to generate lots of solutions, throw them around the place and see what few of them stick to a problem. Which is going to be expensive and inefficient (upsetting for free-marketeers), but then perhaps research should be about what we need, not merely what we want (which the market can provide just fine, however disastrous that often turns out to be). Anyway, I once heard somewhere that you can't always get what you want.
Back to that original, provocative, blog posting. It's claimed that, as far as the problems the recent crop of DVCS systems address go, "As far as Washington knows, this problem doesn't even exist." Applying a couple of levels of synecdoche and treating "Washington" as "the global apparatus of directly and indirectly publicly-funded research", it would perhaps be better to say that "Washington thinks that it already paid for this problem to be solved decades ago". Washington might be mistaken to think that, but it's a rather different message.
Rubbing our noses in it this time
How hard can it be? What I want (and I know I'm not alone in this) is a 12" (ok, these days 13" if you must) MacBook Pro. With an optical drive built in. And user upgradeable RAM and hard-drive. And battery. With a pukka graphics card.
Such a machine would already be as thin and as light as I care about, and as capable as I want. But I can't have one, apparently. I could have this new "air" frippery–and doesn't it photograph well? Don't you love the way that in that 3/4 overhead shot you can't see the bulge underneath, so it looks even thinner than it actually is? But really, a laptop with an off-board optical drive? Well, maybe that's the future, what with renting movies on iTunes and what have you, but...
Apple (ie, Jonathan Ive) have gotten really good at recycling a forty-year-old design language for products that pundits often hate and (partly because of that) the public loves, but this gadget seems a little too far ahead of its time. Ah well.
The Learning Curve
I'm going to let the dust settle on my recent posting on patterns, and then do a follow up—some interesting stuff has come out of it. For that, I think, I need a bit of supporting material some of which is here.
Recently this little gem resurfaced on reddit, which prompted a certain line of investigation. Normally I spread a lot of links around my posts[*], partly as aides-memoire for myself, partly because my preferred influencing style is association, and partly because I believe that links are content. But I really want you to go and look at this cartoon, right now. Off you go.
Back again? Good. This "learning curve" metaphor is an interesting one. Before getting into this IT lark I was in training to be a physicist, so curves on charts are strongly suggestive to me. I want them to represent a relationship between quantities that reveals something about an underlying process. I want the derivatives at points and areas under segments to mean something. What might these meanings be in the case of the learning curve?
Most every-day uses of the phrase "learning curve" appeal to a notion of how hard it is to acquire some knowledge or skill. We speak of something difficult having a "steep" learning curve, or a "high" learning curve, or a "long" one. We find the cartoon funny (to the extent that we do—and you might find that it's in the only-funny-once category) because our experience of learning vi was indeed a bit like running into a brick wall. Learning emacs did indeed feel a little bit like going round in circles, it did indeed seem as if learning Visual Studio was relatively easy but ultimately fruitless.
But where did this idea of a learning curve come from? A little bit of digging reveals that there's only one (family of) learning curve(s), and what it represents is the relationship between how well one can perform a task vs how much practice one has had. It is a concept derived from the worlds of the military and manufacturing, so "how well" has a quite specific meaning: it means how consistently. And how consistently we can perform an action is only of interest if we have to perform the action many, many times. Which is what people who work in manufacturing (and, at the time that the original studies were done, the military) do.
And it turns out, in pretty much all the cases that anyone has looked at, that the range of improvement that is possible is huge (thus all learning curves are high), and that the vast majority of the improvement comes from the early minority of repetitions (thus all learning curves are steep). Even at very high repetition counts, tens or hundreds of thousands, further repetitions can produce a marginal improvement in consistency (thus all learning curves are long). This is of great interest to people who plan manufacturing, or other, similar, operations because they can then do a little bit of experimentation to see how many repetitions a worker needs to do to obtain a couple of given levels of consistency. They can then fit a power-law curve through that data and predict how many repetitions will be needed to obtain another, higher, required level of consistency.
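As a minimal sketch of that fitting-and-predicting step (the repetition counts and error rates below are invented, and a real study would use a proper regression library), one could do the least-squares fit on log-transformed data:

```java
// Fit the classic power-law learning curve  error = a * repetitions^b
// by ordinary least squares on log-log data, then invert it to predict
// the repetitions needed for a target error rate.
public class LearningCurveFit {
    public static void main(String[] args) {
        double[] repetitions = {10, 50, 100, 500, 1000};   // invented data
        double[] errorRate   = {0.30, 0.14, 0.10, 0.05, 0.035};

        int n = repetitions.length;
        double sumX = 0, sumY = 0, sumXX = 0, sumXY = 0;
        for (int i = 0; i < n; i++) {
            double x = Math.log(repetitions[i]);
            double y = Math.log(errorRate[i]);
            sumX += x; sumY += y; sumXX += x * x; sumXY += x * y;
        }
        double b = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
        double a = Math.exp((sumY - b * sumX) / n);

        // Invert error = a * reps^b to find reps for a target error rate.
        double target = 0.02;
        double needed = Math.pow(target / a, 1.0 / b);
        System.out.printf("error ~= %.3f * reps^%.3f; ~%.0f reps for %.0f%% error%n",
                a, b, needed, target * 100);
    }
}
```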
Actual learning curves seem usually to be represented as showing some measure of error, or variation, starting at some high value and then dropping, very quickly at first, as the number of repetitions increases.
Which is great if large numbers of uniform repetitions are how you add value.
But, if we as programmers believe in automating all repetition, what then for the learning curve?
[*] Note: for those of you who don't like this trait, recall that you don't have to follow the links.