peripatetic axiom<br />
Software, systems and management thereof<br />
<br />
<h1>ICSE 2012 Roundup</h1>
<i>21 July 2012</i><br />
<br />
Several folks have asked me what I thought of <a href="http://www.ifi.uzh.ch/icse2012/" target="_blank">ICSE 2012</a>. Here's what.<br />
<br />
Full disclosure: ICSE 2012 was sponsored in part by Zuhlke Engineering AG, a sister company of my employer Zuhlke Engineering Ltd, and on that basis I was an “invited industrial speaker” at the conference. However, what I write here are purely my own opinions and do not necessarily represent the opinions or policies of any of the <a href="http://www.zuehlke.com/de/zuehlke-group.html" target="_blank">Zuhlke Group</a> companies.<br />
<br />
Phew. Well, am I glad I went? Yes. Would I go again? Not in a hurry. <a href="http://2013.icse-conferences.org/" target="_blank">Next year is San Francisco</a>—not interested. <a href="http://icse2014.acm.org/" target="_blank">2014 is Hyderabad</a>—very interested!<br />
<br />
To begin at the end, at the closing plenary <a href="http://www.computer.org/portal/web/awards/lionel-briand" target="_blank">Lionel Briand was awarded the IEEE Computer Society's Harlan D. Mills award</a>. In his address he appealed for software engineering research to come out of the shadow of computer science and function as a stand-alone discipline. This crystallised a thought I had half-formed during the conference: that many of the papers presented did seem a bit like second–rate computer science. And that's not right. Software engineering should no more be cheap'n'cheerful CS than mechanical engineering should be bargain basement physics.<br />
<h3>Economics?</h3>
It seems to me that the difference between pure science and applied science is that applied science has to be <i>useful</i> as well as interesting, and the difference between engineering and applied science is that engineering has to <i>make money</i>. A solution is good engineering if it solves the problem at hand and <i>makes a profit</i>. Various people are credited with saying something like<br />
<blockquote class="tr_bq">
an engineer is someone who can do for $STEREOTYPICAL_SMALL_SUM_OF_MONEY what any fool can do for $STEREOTYPICAL_LARGE_SUM_OF_MONEY. </blockquote>
So, let's look at <a href="http://www.ifi.uzh.ch/icse2012/fileadmin/downloads/ICSE2012_conference_program.pdf" target="_blank">the programme of ICSE 2012</a> [pdf] and search for “economic”. There are 4 hits, all in two of the keynotes: <a href="http://www.saskiasassen.com/" target="_blank">Saskia Sassen</a>'s talk on <i>Digital Formations for the Powerful and Powerless</i> (which was met with derision, so far as the conference twitter feed suggests) and Frank–Dieter Clesle's talk on <i>Supporting Sustainability with Software</i>. No papers. Not one.<br />
<br />
Here is a <a href="http://twitpic.com/9rz0f2" target="_blank">word cloud</a> prepared by <a href="http://scg.unibe.ch/staff/adriankuhn" target="_blank">Adrian Kuhn</a> of the words that are in the papers at ICSE. To me, the big words look very closed, very inward–facing, rather self–centred.<br />
<br />
Contrast this with <a href="http://www.asmeconferences.org/Congress2011/TechnicalProgramOverview.cfm" target="_blank">the programme at the ASME 2011 International Mechanical Engineering Congress</a> (at the time of writing the 2012 programme was still being worked out). Here are some session titles:<br />
<ul>
<li><blockquote class="tr_bq">
An Integrated Energy, Carbon, and Economic Analysis of Reclaimed Water Use in Austin, Texas</blockquote>
</li>
<li><blockquote>
Economic Feasibility of Nuclear Desalination for Abu Dhabi</blockquote>
</li>
<li><blockquote>
Technoeconomic Analysis of Biomass Gasification System Utilizing Geothermal Steam for Processing</blockquote>
</li>
<li><blockquote>
Energetic and Economic Performance of a Compressed Air Energy Storage Facility in Texas as a Function of Technical and Cost Parameters</blockquote>
</li>
<li><blockquote>
A System Performance and Economic Analysis of IGCC with Supercritical
Steam Bottom Cycle Supplied with Varying Amounts of Coal and Biomass
Feedstock </blockquote>
</li>
</ul>
And so on. This year's American Institute of Chemical Engineers annual meeting has “economic” in the very strapline, and <i>pages</i> of hits for <a href="https://aiche.confex.com/aiche/2012/webprogram/start.html?words=economic&method=and&filenamerestrict=all&pge=&action=search&source=webprogram&webprogrammode=default&submit=Search#srch=words%7Ceconomic%7Cmethod%7Cand%7Cfilenamerestrict%7Call%7Cpge%7C2%7Caction%7Csearch%7Csource%7Cwebprogram%7Cwebprogrammode%7Cdefault" target="_blank">session abstracts that mention economic issues</a>. Compared to these the ICSE programme, for all the good intentions and hard work that have gone into the material, looks a bit...immature, a bit amateurish.<br />
<h3>Papers I Liked</h3>
Laura Plonka, Helen Sharp and Janet van der Linden examined <a href="http://oro.open.ac.uk/33135/" target="_blank"><i>Disengagement in Pair Programming. Does it matter?</i></a> Answer: often yes. They show why your pair can tune out, why that's a problem, and offer some techniques for dealing with it.<br />
<br />
Felienne Hermans, Martin Pinzger, and Arie van Deursen built tools for <a href="http://www.win.tue.nl/ipa/activities/springdays2012/contributions/hermans.pdf" target="_blank"><i>Detecting and Visualising Inter-worksheet Smells in Spreadsheets</i></a> (slides <a href="http://www.slideshare.net/Felienne" target="_blank">here</a>). <a href="https://twitter.com/Felienne" target="_blank">Hermans</a> was one of the stars of the conference, as this <a href="http://www.felienne.com/?p=345" target="_blank">word cloud of tweets about the conference</a> shows. And rightly so. This paper alone is a lovely piece of work. It takes ideas from the world of programmers' programming and applies them to the much wider, and very important, world of end–user, reactive, data–flow oriented programming known as “spreadsheets”.<br />
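To make the idea concrete, here is a crude sketch of the kind of dependency such a tool might hunt for (a sketch of mine, using openpyxl against a hypothetical workbook; it is not the authors' actual detector): formulas in one worksheet that reach into another.<br />
<pre>
import re
from collections import Counter
from openpyxl import load_workbook

# Crude sketch, not the paper's tool: count formulas that refer to cells
# in a different worksheet, e.g. =Sheet2!B3 or ='Q4 Totals'!C7.
SHEET_REF = re.compile(r"(?:'([^']+)'|(\w+))!")

wb = load_workbook("budget.xlsx")  # hypothetical workbook
cross_refs = Counter()
for ws in wb.worksheets:
    for row in ws.iter_rows():
        for cell in row:
            if isinstance(cell.value, str) and cell.value.startswith("="):
                for quoted, bare in SHEET_REF.findall(cell.value):
                    target = quoted or bare
                    if target != ws.title:
                        cross_refs[(ws.title, target)] += 1

for (src, dst), n in cross_refs.most_common():
    print(f"{src} -> {dst}: {n} formula references")
</pre>
A worksheet most of whose formulas point at another worksheet smells much as a class suffering feature envy does, which is exactly the kind of translation from the world of programmers' programming that the paper makes.<br />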
<br />
David Budgen and Sarah Drummond ask <a href="http://dro.dur.ac.uk/9666/" target="_blank"><i>What Scope is There for Adopting Evidence–Informed Teaching in SE?</i></a> One would hope, quite a lot. This is part of the effort to promote <a href="http://www.dur.ac.uk/ebse/" target="_blank">Evidence Based Software Engineering</a>, the rather radical notion that perhaps software engineers should do what works, <i>because</i> they know it works. The paper provides a survey of SE research literature that might be used to teach students what's known to work.<br />
<h3>And the opposite of that</h3>
Conversely, <a href="http://semat.org/?p=398" target="_blank">the SEMAT panel proudly announced</a> that while the SEMAT kernel has not yet been either (1) completed or (2) tested in industry, it is already being taught to students. This would seem to be the opposite of evidence–informed teaching. Speculation–informed, perhaps. This is enormously disappointing, as is SEMAT.<br />
<br />
While SEMAT is advertised as something like an attempt to reboot (in the “Spiderman” sense) the whole discipline of software engineering (such as it is), it seems to have ended up as an exercise in flogging the same old dead horse. Now with checklists.<br />
<br />
Take a look at the SEMAT logo. See the spring–bow compasses standing in for the letter ‘A’? The compass is a powerful symbol. Compasses are found in <a href="https://www.google.co.uk/search?q=freemason&hl=en&safe=off&prmd=imvnsab&tbm=isch&tbo=u&source=univ&sa=X&ei=gxcLUIjuDMq50QX4m7mxCg&ved=0CHUQsAQ&biw=1096&bih=792&sei=kxcLUOvvA4bF0QX19eC2Cg" target="_blank">the symbolism of the Freemasons</a>. And of the former <a href="http://en.wikipedia.org/wiki/Coat_of_arms_of_East_Germany" target="_blank">Deutsche Demokratische Republik</a>. According to wikipedia they stand for “precision and discernment”, which I guess is what SEMAT want to invoke. They also stand for an idea of what engineering is that centres on tools, on intermediate work products, on the accidental instrumentality of what engineers do. Here's Alistair Cockburn's <a href="http://alistair.cockburn.us/A+Detailed+Critique+of+the+SEMAT+Initiative" target="_blank">Detailed Critique of the SEMAT Initiative</a> from 2010, which covers a lot of what concerns me about SEMAT very well. Nothing much seems to have improved since then. The ‘T’ in SEMAT is meant to stand for “theory”, and a speaker from the floor observed that there didn't seem to be any, yet. I also don't believe that there is any ‘E’–for–“engineering” either. Certainly not in the sense I identified above.<br />
<h3>The economics of research</h3>
The researchers may not be paying much attention to economics, but it has an influence on them. While I was talking about these papers at a lunchtime session back at the office, my boss, who used to be a full–strength professor of computer science, observed that the papers I'd picked out as interesting were all from Europe, and that my sense of "2nd rate comp. sci." was likely fuelled by American papers.<br />
<br />
American researchers are driven by their funding structures to pump out a continual stream of not–very–good papers, the so–called <a href="http://en.wikipedia.org/wiki/Least_publishable_unit" target="_blank">Least Publishable Unit</a>. Europeans have a bit more breathing space, and in particular have the time to go and spend time with practitioners, as Plonka did; or with people who don't even realise that they are practitioners, as Hermans did; or to take a long view of the industry, as Budgen is doing.<br />
<br />
<h3>A Strange Feeling</h3>
During several of the sessions at ICSE I was struck by a strange, uncomfortable feeling which I subsequently identified as a consequence of having a feminist moment (if you'll allow the conceit).<br />
<br />
Many, but certainly not all, speakers—mainly the 2nd rate CS types—spoke earnestly about these people called “practitioners” but spoke of them in a way which suggested that (1) there couldn't possibly be any of these people in the room and (2) they are <i>other</i>, a strange and mysterious group with peculiar motivations and interests who need to be studied, and handled, carefully and (3) they need someone strong and capable to come and help them out—all in their own best interests, of course. Ah! practitioners! Dear, silly, practitioners! You have to love them but (as <i>we</i> all know, don't we?) they can't really be trusted with important jobs. Too flighty and erratic. Energetic enough in a blundering sort of way, but so muddled and confused.<br />
<br />
It took a while to realise that they meant <i>me</i>.<br />
<br />
<br />
<h1>Hiring in London: Lead and Principal Consultants</h1>
<i>3 October 2011</i><br />
<br />
Principal consultants take responsibility for particularly challenging solutions in demanding organisational environments. They closely interact with senior project managers, customer representatives at all levels including senior management, and guide project teams. Together with the responsible project managers, they lead technical and strategic initiatives to success, ranging from critical consulting mandates to complex delivery projects. Together with business development and business unit managers, they actively expand Zuhlke’s business and develop new opportunities. This can involve taking the leading technical role in large bids.<br /><br />
Lead consultants take decisions and provide advice regarding complex technical systems. They closely liaise with the software development team, the project manager, and customer representatives, often with a technical background. They ensure that sound technical decisions are made and subsequently realised in state-of-the-art solutions by the project team. They can take the leading role in technical consulting assignments within their specialisation area.<br /><br />
The role is based in London and the majority of the work takes place in the UK, but on occasion training and consulting engagements may be delivered anywhere in the world.<br /><br />
The competitive package includes 20 days of professional development time per year.<br /><br />
The successful applicant will be skilled in iterative, incremental development using at least one of Java/J2SE, C#/.NET and Adobe Flex. They will be skilled at applying test-driven development using both automated unit tests and automated acceptance/functional tests in a continuous integration environment (continuous build and dev-ops experience a plus). <div><br /></div><div>They will be skilled in recognizing code smells and resolving them through refactoring. The successful applicant will be able to elicit requirements as User Stories (experience of “story mapping” is a bonus) and to estimate effort using Planning Poker and Wideband Delphi. They will be skilled in coaching and supporting a development team to effectively self-organize while maintaining transparency and control of rapid development using techniques from Scrum, Kanban, Extreme Programming and other Lean or Agile approaches. </div><div><br /></div><div>Experience in a leading role like Scrum Master is required.
They will be experienced in coaching individuals and teams in effective use of current good practices in all the areas mentioned above.</div><div><br /></div><div>Experience delivering classroom training in technical practices such as TDD or build management techniques, and a track record of technical publications and conference presentations, are a plus.<br /><br /><a href="http://www.linkedin.com/jobs?jobId=2029148&viewJob=">Apply via LinkedIn</a>, or drop me a line.</div>

<h1>New blog for complexity study</h1>
<i>26 July 2011</i><br />
<br />
My continuing investigation into the effect of TDD on code complexity (and maybe other things, too) will continue on <a href="http://cumulative-hypotheses.org/">this new blog</a>.

<h1>Agile has little to say about Projects</h1>
<i>25 July 2011</i><br />
<br />
At the <a href="http://www.agilecoachesgathering.org/wiki/index.php/Home">2011 UK Agile Coaches Gathering</a> at <a href="http://www.bletchleypark.org.uk/">Bletchley</a> last week I convened a session about the desire for (larger) organizations to execute "projects" but the desire of people working in them to use "agile" techniques. I perceive a contradiction there.<br /><br /><div style="margin: 0 0 10px 0; padding: 0; font-size: 0.8em; line-height: 1.6em;"><a href="http://www.flickr.com/photos/keithbraithwaite/5973248423/" title="Agile has little to say about Projects"><img src="http://farm7.static.flickr.com/6142/5973248423_73ee8bbd28.jpg" alt="Agile has little to say about Projects by Keith Braithwaite" /></a><br /><span style="margin: 0;"><a href="http://www.flickr.com/photos/keithbraithwaite/5973248423/">Agile has little to say about Projects</a>, a photo by <a href="http://www.flickr.com/photos/keithbraithwaite/">Keith Braithwaite</a> on Flickr.</span></div><p>For the purposes of the discussion I define a "project" as a management structure with a start date, an end date, a budget and a goal. (It says "goal", but subsequent scribbling makes it look as if it says "goat", prompting one of my colleagues to quip: pick which one you want to sacrifice!)</p><p>My contention is that when people talk about managing Agile work they generally mean Scrum or, these days, Kanban (or something very much like them), and that these techniques talk mainly about the steady state of product development and how to control that. They have relatively little to say about what happens at the beginning of a "project". How to get one signed off, for example. Kanban seems particularly weak on this. Nor what happens at the end of a "project", when a set of features goes into production. Scrum seems particularly weak on that. As a side note, when I first started working with banks it took me a long time to properly grasp that when they talk about the "delivery" phase of a project they don't mean the relatively trivial bit at the front where the programmers write code.</p><p>So, the conclusions of the group were that projects exist as a management structure to mitigate a certain kind of risk in a certain way.
As such, I would not agree with <a href="http://twitter.com/#%21/OlafLewitz/status/94781616232218625">this tweet</a> (in reply to one I sent about that very session):</p><blockquote>The concept [of a project] is superfluous as it does not add business value. </blockquote><p>The concept seems to persist as (some people) believe that having projects allows them to protect money. They might be mistaken: thinking about risk in software development is often wrong, but the idea of a project cannot be dismissed out of hand.</p><p>The group observed that different stakeholders have different concerns at different size and time scales, and these are nested. Scrum and Kanban talk mostly about the innermost few layers of the onion.<br /></p><p>There was a concrete recommendation: when communicating from the inner part of the onion (smaller scale, shorter time frame) to the outer, do not talk about "iterations" or "cadence" or much of anything time based. Talk about money: pay something in this range, get something in that range.<br /></p>

<h1>Agile Charlatans</h1>
<i>25 July 2011</i><br />
<br />
At the <a href="http://www.agilecoachesgathering.org/wiki/index.php/Home">2011 UK Agile Coaches Gathering</a> at <a href="http://www.bletchleypark.org.uk/">Bletchley</a> last week I convened a session which was prompted by comments like <a href="http://www.reddit.com/r/programming/comments/ge2am/save_your_cleverness_take_the_shortcut_build_the/c1mwim2">this one</a> from reddit user <a href="http://www.reddit.com/user/grauenwolf">grauenwolf</a>:<br /><blockquote>I used to think people like you were quacks, but now I see that there are teams that really need your services.</blockquote>Grauenwolf and I have both been on reddit for a good long while and we've had a few...full and frank exchanges of opinion about the merits of this thing called "Agile". The issues I wanted to address in the session were: what causes competent professionals like grauenwolf to think that the sort of people who would go to something like an "Agile Coaches Gathering" are quacks, and what changes their mind? How might we do less of the former and more of the latter?<br /><br />The output is captured here:<br /><br /><div style="margin: 0 0 10px 0; padding: 0; font-size: 0.8em; line-height: 1.6em;"><a href="http://www.flickr.com/photos/keithbraithwaite/5973249039/" title="The &quot;Agile Charlatan&quot; Problem"><img src="http://farm7.static.flickr.com/6005/5973249039_be90a010a9.jpg" alt="The &quot;Agile Charlatan&quot; Problem by Keith Braithwaite" /></a><br /><span style="margin: 0;"><a href="http://www.flickr.com/photos/keithbraithwaite/5973249039/">The "Agile Charlatan" Problem</a>, a photo by <a href="http://www.flickr.com/photos/keithbraithwaite/">Keith Braithwaite</a> on Flickr.</span></div><p></p>It needs some explanation, though.<br /><br />"Quack" is an evocative word. "Snake-oil salesman". "Charlatan". We felt that these words appeal to an analogy something along these lines: Agile consultants are like bogus medical practitioners.<br /><br />The poster shows a 2-by-2 matrix (naturally!): well or ill against conventional medicine or alternative therapy.
The model is that conventional medicine is mainly for people who really are ill and need to get better, while alternative therapy is mainly for people who are basically well but feel entitled to be "well-er".<br /><br />Yes, that is quite cynical. We did talk about the fact that some alternative therapies do work, for the things they work for, but when that happens it's not for the reason that the practitioner thinks. And we talked about how <a href="http://www.cochrane.org/about-us/evidence-based-health-care">some conventional medicine is not nearly as well founded</a> as many people would like (you) to think. But, bear with me.<br /><br />So, the top right and bottom left cells of the chart are the exemplars. Bottom right is the really bad place: trying to treat a mangled road traffic accident victim with a very, very, very dilute solution of London Bus, for example. Top left is pretty bad too: pushing pills for restless leg syndrome and things even less needing of allopathic intervention.<br /><br />It seems that there is a problem with some kinds of Agile coaching or consulting or training or certification practices. They could be trying to address really serious organizational problems with techniques that cannot possibly help (and maybe they even know it), or they can be barging in to teams that are functioning perfectly well and crashing about the place changing things that don't need to change.<br /><br />How could we recognise these cases? Maybe this way:<br /><ol><li>conventional medicine often leads to treatment regimes that are quite unpleasant</li><li>people often need to be talked into consulting a conventional medic (partly because of [1])</li></ol>vs<br /><ol><li>alternative therapies are often really rather enjoyable</li><li>people often self-refer to an alternative practitioner (partly because of [1])</li></ol>How to avoid breaking a working team by spuriously making them "Agile"? Watch out for one-size-fits-all solutions. Watch out for inflexibility, a rejection of new ideas (on the part of the consultant or coach).<br /><br />How to increase the chances that you're really helping? Look for objective evidence of improvement, and change your behaviour based upon that evidence.<br /><br />The final question we came up with extends the analogy. If Agile coaches are a bit like certain kinds of medical practitioners, then what is the equivalent for us of <span style="font-style: italic;">public health</span>?

<h1>No More Than Two</h1>
<i>25 October 2010</i><br />
<br />
London’s “convenience” stores are replacing staffed checkouts with self-service robots. Some of these offer lessons in user interface design painful to behold.<div><br /></div><div>This evening I watched someone attempt to use one to buy paracetamol. The particular self-service robot checkout that I saw presented the user with a message very much like “paracetamol sale: you may only buy two packs” above a yes/no button pair. What is the casual paracetamol buyer to make of this? I watched with interest as my co-emptor pondered this. And then she did the only reasonable thing: she went and got a second pack. After all, it does say you may <i>only</i> buy two packs.</div><div><br /></div><div>Quite what anyone is supposed to make of the yes/no buttons following a statement, not a question, I don't know.</div><div><br /></div><div>The really sad thing is that, apart from the shame of confusing your customer, the behaviour I saw here—which I believe is the only reasonable response to the message from the robot—is <i>the exact opposite</i> of what is intended.</div>
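For contrast, a minimal sketch (mine; the dialog helper is made up, and I have no idea what the till actually runs) of the prompt I saw next to one that asks a question the buttons can answer:<br />
<pre>
def dialog(text, buttons):
    # Hypothetical stand-in for whatever dialogue API the till uses.
    return {"text": text, "buttons": buttons}

# The message I saw: a statement with yes/no buttons. Yes or no to what?
confusing = dialog(
    text="Paracetamol sale: you may only buy two packs",
    buttons=["Yes", "No"],
)

# A prompt that asks the question the buttons actually answer:
clear = dialog(
    text="Paracetamol is limited to two packs per customer. "
         "Continue with this sale?",
    buttons=["Continue", "Cancel"],
)
</pre>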
<h1>fixed-length iterations: a transitional practice</h1>
<i>22 May 2010</i><br />
<br />
I find it hard to think of a development practice that isn't almost certainly a transitional practice. Configuration management, maybe.<div><br /></div><div>Anyway, Scrum, XP, and all the rest I've come to understand as each a record of a reaction against some bad condition, with transitional practices to get a team away from that condition to a better one. For example, as Michael Feathers says in <a href="http://michaelfeathers.typepad.com/michael_feathers_blog/2010/04/zenolength-iterations.html">this post</a>, regular, fixed-length iterations require (and enable) a certain kind of discipline, force a certain set of tradeoffs. And maybe doing that can help a team a lot. And maybe not, as the case may be.</div><div><br /></div><div>That doesn't mean that, once the use of that practice has taken the team away from the bad condition, no further change to practices can be beneficial. In the specific case of regular fixed-length iterations I mainly see the application being to teams moving from a condition where no-one ever has any idea at all when anything is going to be delivered to a condition where everyone always knows a date when something is going to be delivered. In many settings that would be considered a major improvement by those paying the team. And consistently working that way is a great way for a team to gain the trust of the business.</div><div><br /></div><div>Once that trust is established, and once the conditions are in place for frequent delivery, what need for the fixed-length iterations? I suspect that what worries a lot of people who've seen the transition from chaos to iterations is that they can't imagine a world without iterations which is not also a (return to) chaos. Michael suggests an experiment:<blockquote>Suppose that you had an iteration of one week, followed by an iteration of 2 days, followed by an iteration of 1 day, followed by an iteration of one-half a day, and so on. If you still had your sanity at the end of this process, would you have learned anything? I haven’t tried it with a team yet, but here’s the thing that I hope would come across: if you apply enough ingenuity and you’ve acquired enough skill, you can deliver business value in shorter times than you can currently imagine. </blockquote>That would be cool. Of course, the Kanbanistas seem to suggest going straight to that world in one step. And maybe that can work in a certain setting, and maybe not.
The idea scares me, when I look at most of the teams I help.</div><div><br /></div><div>The aspect of this sort of thing that really interests me, though, is this: if what the Certified Scrum Masters say about software development living on the <a href="http://www.controlchaos.com/seminal-articles/">“edge of chaos”</a> is right, and if what the <a href="http://en.wikipedia.org/wiki/Cynefin">Cynefin</a> people say about that being exactly the place where “emergent practice” lives is right too, then by their own argument we would expect Scrum to consist of mostly transitional practices.</div><div><br /></div><div>Fixed-length iterations (excuse me, “sprints”) seem like a good candidate to be one.</div>

<h1>Government IT projects: who can politicians listen to?</h1>
<i>20 May 2010</i><br />
<br />
As you may have heard, the UK has, at the time of writing, a rather confusing new government. We, and I suspect they, are still trying to understand what this means.<div><br /></div><div>One thing that it might mean is an opportunity for government departments to change the way they deal with their IT suppliers. Recently (i.e. during the Labour administration we had from 1997 until 2010), it hasn't gone well.</div><div><ul><li>Accenture built a system for the RPA: not fit for purpose, <a href="http://www.zdnet.com/blog/projectfailures/uk-rural-payments-agency-rpa-it-failure-and-gross-incompetence-screws-farmers/536">£46.5 million overspend</a></li><li>BT, Fujitsu and others built NPfIT for the NHS: not fit for purpose, <a href="http://www.zdnet.co.uk/blogs/security-bullet-in-10000166/campaigners-criticise-10bn-nhs-it-overspend-10014501/">£10.4 billion (with a “b”) overspend</a></li><li>Fujitsu built an information system for magistrates courts: <a href="http://www.computerweekly.com/Articles/2009/11/20/239392/More-than-half-of-16319bn-overspend-on-government-projects-due-to.htm">£342 million overspend</a></li><li>Cap Gemini built PRISM for the FCO: not fit for purpose, £34.5 million overspend</li></ul><div>And so on. It has been estimated that the ten worst IT project failures under Labour cost the country around <a href="http://www.eweekeurope.co.uk/news/news-government-it/cost-of-labours-botched-it-projects-exposed-3076">£26 billion</a>. That's half the annual budget for schools.</div><div><br /></div><div>So, a new government setting out to tame a spectacular deficit might want to bring these projects under control. Unfortunately, they get their advice from places such as Fujitsu, the very firms who do so well from these failed projects. <a href="http://www.computing.co.uk/computing/analysis/2263194/labour-legacy-live-haunt">Says Fujitsu's marketing director Simon Carter</a> of discussions held with the Conservatives when they were still the shadow cabinet:<blockquote>[the Conservatives] began to take on some of our suggestions, as they came to better understand government IT. For example, their proposal to cut IT contracts into smaller and shorter chunks was dropped as they realised they would have to act as system integrator to each of these smaller projects.</blockquote>What is the taxpayer to do in the face of this sort of thing? Particularly the well–informed taxpayer who knows full well that in no way whatsoever is this argument from Mr Carter valid.
</div><div><br /></div><div>At this year's Spa conference there was a Birds of a Feather session about this issue. Some of the signatories of <a href="http://petitions.number10.gov.uk/ITProcessReview/">this petition</a> were there—a petition only 62 signatures away, as I write, from the 500 needed to have anyone inside 10 Downing Street pay any attention to it. If you are a UK resident and would rather that the new government were not wasting money on entirely avoidable IT project failures, I urge you to sign the petition and to urge others to.</div>

<h1>T-shaped designers</h1>
<i>16 May 2010</i><br />
<br />
BBC 2 is running a series of documentaries called <i>The Genius of Design</i>. Programme 2 is "Design for Living" and discusses, amongst other things, the <a href="http://www.bauhaus-dessau.de/index.php?en">Bauhaus</a> and its influence. It's getting on for a century since the heyday of the Bauhaus and it's always worth being reminded of the influence it had—got any tubular steel furniture in your house or office? Bauhaus. Any lighting fixtures with push–on/push–off switches? Bauhaus. Got a fitted kitchen? Bauhaus.<div><br /></div><div>The segment on the fitted kitchen was interesting. A fitted kitchen seems like a natural and obvious thing now, but the idea had to be <a href="http://en.wikipedia.org/wiki/Frankfurt_kitchen">invented</a>. The discussion of the Frankfurt Kitchen in the programme was the start of an interesting thread. Users of the kitchen tended to be a bit ill-disciplined. Certainly they tended to disregard the labels permanently attached to the custom–made drawers and put any old thing in them. Users found that the kitchen was built to support well only certain workflows, workflows that they didn't like, didn't understand and couldn't change. Workflows devised by an architect <i>who couldn't cook</i>.</div><div><br /></div><div>Meanwhile, another modernist architect, Le Corbusier, is being given free rein to redesign entire cities, up to the point of making models, anyway. Filling them with great towers full of his “machines for living in”. And <a href="http://en.wikipedia.org/wiki/Pruitt-Igoe">we know how that worked out</a> once people started taking it seriously.</div><div><br /></div><div>If you are a regular reader of this blog you probably know where I'm going next.</div><div><br /></div><div>Commentators on software development often seem to exhibit a lot of discipline envy. Two common themes are that 1) our projects should exhibit the <a href="http://news.bbc.co.uk/1/hi/uk/4735812.stm">reliability</a> of those in the “established” branches of engineering, and 2) our projects should exhibit the conceptual integrity attained by building architects.</div><div><br /></div><div>That conceptual integrity can be a dangerous thing. Lihotzky's kitchens had a lot of conceptual integrity (and a lot of research to back that up); Corb's vision of mass housing (and its implementation by later architects) had a <a href="http://en.wikipedia.org/wiki/Modulor">really astonishing amount of conceptual integrity</a>. Neither leads to much in the way of joy for users (<a href="http://peripateticaxiom.blogspot.com/2010/05/t-shaped-designers.html#corb">*</a>).
The Bauhaus architects designed a lot of chairs; none are comfortable to sit in.</div><div><br /></div><div>One of the designers interviewed in the programme explained the problem along these lines: architects tend to be ‘I’ shaped, by which he means they have a very deep knowledge and skill in their craft, but not a lot else going on. Designers tend to be ‘T’ shaped, deep in craft but also with a breadth that touches many other disciplines. And from that breadth comes the ability to design objects that people can comfortably incorporate into their lives.</div><div><br /></div><div>I think that the application of this thought to the software world is clear.</div><div><br /></div><div>(<a id="corb">*</a>) The very few dense housing projects that Le Corbusier himself built have proven to be resilient and popular. It's the shoddy works inspired by his ideas and executed without his art that are the problem.</div>

<h1>Tests vs checks</h1>
<i>27 April 2010</i><br />
<br />
Trying to spread the good word on "testing" vs "checking" in <a href="http://www.testmagazine.co.uk/2010/04/automation-your-check-mate/">this article</a> for T. E. S. T. Magazine.

<h1>Software Engineering?</h1>
<i>23 March 2010</i><br />
<br />
I wish that the people in the software industry who bang on about a need for "software engineering" showed more evidence of ever having met an engineer. The latest run at it, <a href="http://www.semat.org/bin/view">SEMAT</a>, seems to be making the same old mistakes all over again.<div><br /></div><div>It's not all bad. I'm pleased to see Ivar Jacobson repeat his call to move beyond "process" to "practices". On the other hand, I'm dismayed to see the SEMAT programme described at its highest level by these streams: <i>definitions</i>, <i>theory</i>, <i>universals</i>, <i>kernel language</i>, <i>assessments</i>.</div><div><br /></div><div>Really? Definitions, theory, universals? Are these really the things that the software industry is lacking? The problem here, I think, is that just as at the original "Software Engineering" conferences in the 60's, the SEMAT folks have confused the retrospective coherence with which engineering (that is, the mechanical, chemical, electrical, electronic and other flavours) is described with how engineering is actually done.</div><div><br /></div><div>Similarly, the presence of a spring bow compass in the SEMAT logo worries me. The 60's effort at SE also confused the contingent artefacts produced by engineers (which at the time were actual drawings produced with things like spring bow compasses) with the essence of what engineers <i>do</i>. With this logo SEMAT are associating themselves not just with tools and outputs rather than principles and practices, but with mightily outdated tools. They might as well have put a slide rule in their logo.</div><div><br /></div><div>It really seems as if an historic mistake is about to be repeated.
I need to study SEMAT more, but for now <a href="http://alistair.cockburn.us/Alistairs+January+2010+proposed+agenda+for+the+SEMAT+software+engineering+initiative">Alistair Cockburn's commentary</a> resonates strongly with me.</div><div><br /></div><div>He urges that SEMAT do these two things:</div><blockquote><ol><li>Look at what engineers <strong>‘do’</strong>, not what they <strong>build.</strong></li><li>Catch up with the state of the art in what is conventionally called <strong>engineering.</strong></li></ol></blockquote>I can hope.<div><br /></div>

<h1>Innovation</h1>
<i>1 February 2010</i><br />
<br />
Last year <a href="http://www.lukehohmann.com/blog/index.php">Luke Hohmann</a> demonstrated some of his <a href="http://en.wikipedia.org/wiki/Innovation_game">innovation games</a> at <a href="http://xpday-london.editme.com/XTC20091117">an XtC event</a> hosted at the <a href="http://www.zuehlke.com/en/jobs/vacancies/detailview/job/principal-consultantsolutions-architect/jobbpid/142/">Zuhlke Ltd office in London</a>.<div><br /></div><div>One theme was that innovation is different from invention. That's a topic close to my heart. Zuhlke has a division named "Product Innovation" and indeed they innovate like crazy (for example, <a href="http://www.zuehlke.com/uploads/tx_zepublications/pn_344_e_bsr_web.pdf">repurposing the optical sensor for a mouse to better control sewing machines</a> [pdf]) but they rarely <i>invent</i> anything (although they do from time to time, for example <a href="http://www.zuehlke.com/uploads/tx_zepublications/pn_328_e_waterless_urinals_web.pdf">a newly patented waterless urinal trap</a> [pdf]). Knowing a bit about those folks and <a href="http://www.zuehlke.com/fileadmin/pdf/flyers/fl_055_e_pep_zrh.pdf">what they do</a> has helped make me very sensitive to abuses of the term "engineering" as (mis-)applied to software development. But that's a story for another time.</div><div><br /></div><h2>The Shock of the Old</h2><div>I've just finished reading <a href="http://www.amazon.co.uk/Shock-Old-Technology-Global-History/dp/1861973063/">The Shock of the Old</a>. It goes on a bit, it's a bit repetitive, it's highly polemical and a bit repetitive. Pretty good, though. The key observation is that the history of technology as usually presented (for example, in institutions like the <a href="http://www.sciencemuseum.org.uk/">Science Museum</a>) is largely bunk. This history for the most part ignores use and so also ignores folk technology and what Edgerton calls <i>creole</i> technology. He presents numerous case studies to show that what has been made to look like invention is really innovation and diffusion.</div><div><br /></div><div>Quick quiz: when was the first document transmitted by fax? Answer: depending on quite what you think a fax is, <a href="http://www.hffax.de/html/hauptteil_faxhistory.htm">sometime between 1843 and 1865</a>.</div><div><br /></div><div>But the average householder couldn't go and buy a fax machine until about a century later.
When faxes became available to the general public that wasn't invention, it wasn't even innovation, it was diffusion.</div><div><br /></div>
<h2>The Revolutionary Period of Big Innovation</h2><div><a href="http://www.mindviewinc.com/Index.php">Bruce Eckel</a> has come to realize that <a href="http://www.artima.com/weblogs/viewpost.jsp?thread=281005">"software development has stalled"</a>. He says, <blockquote>in recent years it has started to look like we're moving out of the revolutionary period of big innovation, and into a phase of relative stability.</blockquote> I don't believe this for a minute. </div><div><br /></div><div>I think the revolutionary period of big innovation in the tools of programming ended about the time that Sun dropped <a href="http://selflanguage.org/">Self</a>. I'd say that the gold standard development environment for mainstream languages right now is <a href="http://www.eclipse.org/">Eclipse</a>. As <a href="http://en.wikipedia.org/wiki/David_Ungar">Dave Ungar</a> explains towards the end of <a href="http://www.youtube.com/watch?v=3ka4KY7TMTU">this video about the history and influence of Self</a>, Eclipse represents the continuation of the tool-based approach to building a programming environment developed in <a href="http://www.smalltalk.org/main/">Smalltalk</a>. I'd say that Eclipse, even the best-of-breed Java environment built with it, still isn't as good as the best Smalltalk environments for ease of use, productivity <i>and fun</i>.</div><div><br /></div><div>Sun pulled the plug on Self about fifteen years ago. Ironically, they had to buy back the <a href="http://java.sun.com/products/hotspot/whitepaper.html#method">technology for making dynamic dispatch in a dynamically typed language on a VM go fast</a> from <a href="http://www.cs.ucsb.edu/~urs/">Self project staff who had left Sun</a>. </div><div><br /></div><div>What has happened since then is a steady diffusion of features from good object-oriented development environments of the 80's and 90's into the mainstream. Sadly, few features of the very best object-oriented environment of that time (Self, of course) have made it through.</div><div><br /></div>
<h2>Sources of Diffusion</h2><div>Where else are ideas diffusing from? From <a href="http://sloan.stanford.edu/MouseSite/1968Demo.html">the mother of all demos</a>. Unfortunately, that seam is about worked out. <a href="http://www.vpri.org/html/people/founders.htm">Alan Kay</a> <a href="http://unrev.stanford.edu/introduction/introduction.html">is supposed to have said</a> "I don't know what Silicon Valley will do when it runs out of Doug's ideas." We may be about to find out. Further diffusion (disguised as innovation) in the field of living with computers may well require actual invention.</div><div><br /></div><div>Fortunately, that seam is about worked out. We, the public (at least in high-income countries with stable governments) do now finally live in the world that Engelbart invented in 1968.</div><div><br /></div><div>And what of software development tools?</div><div><br /></div><div>Bruce says: <blockquote>no matter how good and powerful our software tools get, we are only getting a fraction of the leverage out of them that we <em>could</em> get.<p><em>Programming tools are no longer where the greatest potential lies.</em></p><p>We will get the biggest leverage, not just in programming but in all our endeavors, by discovering better ways to work together. [emphasis in original]</p></blockquote><div>and I think he's right.
I wonder what it is about the world that Bruce works in that has hidden this from him for so long. After all, in 2001 <a href="http://agilemanifesto.org/">one group</a> said:</div><blockquote> We are uncovering better ways of developing<br />software by doing it and helping others do it.<br />Through this work we have come to value:<br /><br />Individuals and interactions over processes and tools [etc]</blockquote></div>Behind the manifesto is a use-based story about the history of technology. Notice that it's <i>uncovering by doing</i>, and not inventing. I'd like to think that Edgerton would approve. It is also a story of diffusion. In <i><a href="http://www.amazon.co.uk/Agile-iterative-development-Craig-Larman/dp/0131111558/">Agile and Iterative Development</a></i> <a href="http://www.craiglarman.com/wiki/index.php?title=Main_Page">Larman</a> quotes <a href="http://www.geraldmweinberg.com/Site/Home.html">Weinberg</a>:<blockquote>We were doing incremental development as early as 1957, in Los Angeles, under the direction of Bernie Dimsdale. He was a colleague of John von Neumann, so perhaps he learned it there, or assumed it was totally natural. [...] the technique used was, as far as I can tell, indistinguishable from XP. [...] much of the same team was reassembled [...] in 1958 to develop Project Mercury, we had our own machine [...] whose symbolic modification and assembly allowed us to build the system incrementally, which we did, with great success. </blockquote><div>What has happened is that techniques that were once restricted to research projects at the cutting edge of <i>a technological nation's story of existential survival</i> are now available to the mainstream.</div><div><br /></div><div>What, I would like to know is, next?</div>

<h1>UK Government IT Failures</h1>
<i>26 January 2010</i><br />
<br />
<a href="http://www.independent.co.uk/"><i>The Independent</i></a> recently <a href="http://www.independent.co.uk/news/uk/politics/labours-computer-blunders-cost-16326bn-1871967.html">reported</a> on the very, very sorry state of Government IT projects in the UK. It's a mess:<blockquote>[...]the total cost of Labour's 10 most notorious IT failures is equivalent to more than half of the budget for Britain's schools last year. Parliament's spending watchdog has described the projects as "fundamentally flawed" and blamed ministers for "stupendous incompetence" in managing them.</blockquote><div>I thought that the article missed a trick and wrote to tell them so. My letter didn't make it to the paper paper, but has appeared on <a href="http://opinion.independentminds.livejournal.com/1628420.html">their blog</a>. It's somewhat hidden, though, so I reproduce it here.</div><div><br /></div><div>It would be easy to conclude that large government IT projects fail primarily for technical reasons. Technical mistakes certainly are made, but the root cause of many failed projects seems to be the procurement process with which they begin. The question which is asked of the suppliers is not one that allows for the kind of answer that leads to success.
It is well known in the private sector that very large monolithic projects have little chance of success, but this is the only kind for which the Civil Service seems to know how to ask.</div><div><p>I have no doubt that ministers are, as <a href="http://muckrack.com/michaelsavage">Michael Savage</a> puts it, "too easily wooed by suppliers". The suppliers which we see winning government contracts again and again might also find it too easy to woo a minister. Particularly a minister and a department which would rather launch a high-profile, high-budget, high-risk project than adopt the smaller scale, incremental approach that creates few headlines when launched, but also has a much better chance of creating few headlines when it does not fail.</p></div>

<h1>It's the Screaming</h1>
<i>22 January 2010</i><br />
<br />
Tim Bray <a href="http://www.tbray.org/ongoing/When/201x/2010/01/02/Doing-It-Wrong">observes</a>: <blockquote>The community of developers whose work you see on the Web, who probably don’t know what ADO or UML or JPA even stand for, deploy <em>better systems at less cost in less time at lower risk</em> than we see in the Enterprise. [emphasis in original]</blockquote>That's a pretty bold claim. <div><br /></div><div>Oh, certainly, the Enterprise makes a botch of IT projects on a titanic scale, and Tim gives a partial list of very high profile failures. I think this data is tainted. Firstly, these project failures (such as the UK NHS <a href="http://en.wikipedia.org/wiki/NHS_National_Programme_for_IT#Criticisms_of_the_programme">NPfIT</a> programme) are, well, very high profile—but they aren't representative. Plenty of corporate IT projects do very OK. Secondly, the specific examples Tim gives are actually government projects. Government IT is the <i>reductio ad absurdum</i> of Enterprise IT. These might not be the most informative examples to look at.</div><div><br /></div><div>On the other side of the matter, Tim cites the successes of Facebook, Google and Twitter. These are also not representative. Quickly, name three failed web 2.0 startups...that's a hard ask because those startups that don't succeed sink without trace. Which, if you believe in the <a href="http://ycombinator.com/about.html">Y Combinator</a> argument, is the point. A web 2.0 startup needs so little initial investment that western civilisation can afford to throw them away in bulk without noticing. The result is a sort of <a href="http://en.wikipedia.org/wiki/Monte_Carlo_method">Monte Carlo method</a> for product innovation. I wonder how that's working out for them now that money is no longer so cheap.</div><div><br /></div><div>But still, something <i>is</i> very wrong with enterprise IT. Would the "web 2.0" way of working help?</div><div><br /></div><div>Maybe, maybe not. I think Tim has made the mistake of assuming that a technique that he's seen work in one context will necessarily work in another. And should be deployed there post-haste. The IT industry collectively makes this mistake about every seven years.</div><div><br /></div><div>Some commentators have noted some of the differences between the Web and the Enterprise. In summary: web 2.0 startups don't have to integrate with a heterogeneous ecosystem of legacy systems older than the combined ages of their founders. Web 2.0 startups don't have to operate within multiple tight and complex regulatory regimes.
Some were even so unkind as to observe that <a href="http://en.wikipedia.org/wiki/Tim_Bray#XML">Tim is one of the devisers of XML</a> and so some of the chest–deep mud that Enterprise developers have to slog through is his fault.</div><div><br /></div><div>I think there are two more issues that Enterprise groups have to deal with that web 2.0 startups don't, and don't have a good story for.</div>
<h2>Scale</h2><div>I'm going to say that enterprise IT faces a much tougher scale problem than do web developers. Part of the web 2.0 model is that the initial user base is the founders plus their <a href="http://en.wikipedia.org/wiki/Dunbar's_number">immediate circle</a>. The problem that startups face is that if they are one of the (very) few that make it, their user base may grow very quickly. Their problem is <i>scalability</i>. This is a tough problem.</div><div><br /></div><div>The problem for the Enterprise is <i>scale</i>.</div><div><br /></div><div>Citigroup, Bank of America and JP Morgan Chase all have <a href="http://www.bankspider.com/topbanks.php?sel=employees">slightly less than a quarter of a million employees</a>. If you are deploying a "corporate IT app" at one of those organisations then you have to plan for all quarter of a million of them to hit your app at 9am their local time tomorrow. And all day and every day thereafter. That's a different problem. And by the standards of corporate IT this is nowhere near as bad as it can get.</div>
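Some hypothetical arithmetic (my assumed numbers, not anybody's real traffic figures) on what that 9am rush means:<br />
<pre>
# Back-of-envelope, with assumed numbers: a quarter of a million staff
# all reaching the app around 9am local time.
users = 250_000
login_window_s = 30 * 60     # assume the rush is spread over 30 minutes
requests_per_login = 20      # assume ~20 HTTP requests to paint a first screen

logins_per_s = users / login_window_s               # about 139 logins/s
requests_per_s = logins_per_s * requests_per_login  # about 2,800 requests/s
print(f"{logins_per_s:.0f} logins/s, {requests_per_s:.0f} requests/s at peak")
</pre>
And that load arrives on day one of the deployment, not after years of growth.<br />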
<div><br /></div><div>You know that NPfIT project that Tim holds up as an exemplar of corporate IT failure? In 2008 the NHS had <a href="http://www.ic.nhs.uk/statistics-and-data-collections/workforce/nhs-staff-numbers/nhs-staff-1998--2008-non-medical">1.12 million non–medical staff</a>, <a href="http://www.ic.nhs.uk/statistics-and-data-collections/workforce/nhs-staff-numbers/nhs-staff-1998--2008-medical-and-dental">99,000 medics and dentists</a> and <a href="http://www.ic.nhs.uk/webfiles/publications/nhsstaff2008/gp/Bulletin%20Sept%202008.pdf">34,000 General Practitioners</a>. Imagine that they are all going to start using your app tomorrow. To help them treat patients. Of which there are several million a week handled by NHS services. Putting up the <a href="http://fredericiana.com/2009/08/01/why-wikipedia-might-need-a-fail-pet-and-why-mozilla-does-not/">fail whale</a> isn't going to cut it.</div><div><br /></div><div>And that's the real problem that corporate IT have to deal with: <i>the screaming</i>.</div>
<h2>Screaming</h2><div>Let's say that your web 2.0 startup thingy that you put together in record time in your buddy's mum's spare room falls over. What's the worst thing that can happen? A few snarky tweets? The odd complaining blog post? The whole point of your business model is that your service (and company) are disposable, at least in the early days. It's not as if you have a revenue stream to protect. As if!</div><div><br /></div><div>Now let's imagine what happens when the FX app you just deployed into Mammon Inc's trading floors falls over. What happens then is that a head of desk in Farawayvia phones you up at 3 am your time and <i>will not stop screaming at you</i> until you get it fixed. I exaggerate for comic effect, but not by much. If you are a Web 2.0 type then you have the luxury of fixing stuff up in something like your own time. If you are a Corporate IT type then you might well have someone who believes that you, personally, are trying to steal their <a href="http://www.telegraph.co.uk/finance/newsbysector/banksandfinance/6958528/Record-bonus-pot-at-JP-Morgan.html">bonus</a> from them raging at you until you get the damn thing working again.</div><div><br /></div><div>Or that you, personally, are trying to kill their patient.</div><div><br /></div><div>If you were a founder of a web startup and the users of your service had your personal mobile number and believed that five minutes of downtime could cost them personally millions of bottletops (the currency of Farawayvia, ticker symbol BTP), or lead them to a malpractice suit over a dead patient, do you think you would behave the same way that other founders do?</div>
<h2>Solutions?</h2><div>So, what's the answer? I don't think there is any "the answer". Not a technical one, anyway. There are a bunch of tools and techniques that I've come to think will help a lot, and I encourage my clients who happen to be in corporate IT to use them. In fact, Tim mentions some of them in his post. He notes that startups get a lot of help from:<blockquote> dynamic languages and Web frameworks and TDD and REST and Open Source and NoSQL at varying levels of relative importance</blockquote> Too right. But, as <a href="http://www.geraldmweinberg.com/Site/Home.html">Gerry Weinberg</a> says: it's always a people problem. It always is. It's the screaming.</div><div><br /></div><div>Get people into a place where they believe they can do the right thing and they mostly will. The technology can then, to a surprising extent, look after itself. This is the really big advantage that startups have, I think: no–one to tell them not to do the right thing. It's not that corporate IT is full of people saying "don't do the right thing" (although in the worst case that does happen). Rather it's that the social context inside any organisation big enough to call itself an "enterprise", without any actual malice on the part of any individual, works against the right thing.</div><div><br /></div><div>I think it was <a href="http://xprogramming.com/blog/">Ron Jeffries</a> that I first heard say that most of the advice given to software developers to help them do better is about as useful as telling them to be taller. Telling folks who work in corporate IT to behave more like the folks in web startups seems no more useful than that to me.</div><div><br /></div><div>On the other hand, if one does end up working in a really toxic environment that does work against the right thing it can be worth looking around. Look around to see who, exactly, is holding the gun to your head to make you put up with it.</div>

<h1>Complex Domains: playing with Alloy</h1>
<i>17 January 2010</i><br />
<br />
Before digging into more Bayesian ideas I want to take a step back. The <a href="http://peripateticaxiom.blogspot.com/2009/12/bayesian-testing.html#example">example in this previous post</a>, which I worked through with Laurent, has an interesting property: each subsequent test halves the remaining uncertainty about the correctness of the system under test. This is a property of the problem domain, not the solution.
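A minimal sketch of what that halving looks like (mine, with an assumed count of candidate behaviours; it is not taken from the earlier post): if every test rules out exactly half of the remaining, equally likely candidates, the uncertainty, measured in bits, drops by one per test.<br />
<pre>
import math

# Illustration only: assume 256 equally likely candidate behaviours
# of the system under test.
candidates = 256
print(f"initial uncertainty: {math.log2(candidates):.0f} bits")
for test in range(1, 4):
    candidates //= 2  # each test eliminates half the remaining candidates
    print(f"after test {test}: {math.log2(candidates):.0f} bits remain")
</pre>
Eight such tests would pin down which of the 256 candidate behaviours the system actually has.<br />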
But what makes a domain complex (in the sense of <a href="http://www.bigvisible.com/asroka/essential-complexity-part-one/">essential complexity</a>)? That's not a rhetorical question.<div><br /></div><div>And what tools are appropriate for handling a complex domain? Let me tell you a story... (if you want to skip the story and go direct to the techie stuff, it's <a href="#techie">here</a>)</div><div><br /></div><h2>Microcell Predictor</h2><div>Back in the day I worked on a tool used by radio engineers to design mobile phone networks. </div><div><br /></div><div>In particular I worked on a so–called "microcell predictor". This would take a description of a dense urban environment and a proposed low–power base station location and calculate the expected signal strength at various points in the area. The input was a file containing a bunch of polygons describing building footprints and some materials data (steel and glass, masonry, etc) and the base station location and properties (antenna design and so forth). The output was a raster of predicted signal strengths. This could overlay the building polygons and generate a map that the engineers could first eyeball and then if necessary analyse more closely to help them optimise the base station placement. This was a lot faster and cheaper than putting up a temporary antenna and then driving around in a vehicle with a meter measuring the actual signal strength, which was the way the very first networks were planned.</div><div><br /></div><div>The requirement for this came from the radio specialists in the form of pages of maths describing various "semi-empirical" models of microwave propagation and how these interacted with buildings. Let's say we are looking at GSM 900. If a 900 MHz microwave photon were moving in free space it would have a wavelength of approximately 3×10<sup>8</sup> m s<sup>-1</sup>/9×10<sup>8</sup> s<sup>-1</sup>, or around 33cm. This makes such photons quite good at seeming to go around the corners of "human scale" structures by diffraction. To calculate that exactly would be very messy and on <a href="http://obsolyte.com/sun_ss5/">the boxes we used</a>, impracticable. So we had these other methods which hid a lot of the details and gave results that the radio experts deemed good enough and that we could compute with. The input didn't have to be especially large or complicated for the prediction to take long enough that the user would give up, but the point of microcells is that they only cover a small area in a city centre anyway so that was OK. This was fifteen years ago; the techniques used now are a lot more sophisticated.</div><div><br /></div><h2>Semi-formal</h2><div>We used a development process called <a href="http://www.syntropy.co.uk/syntropy/">Syntropy</a>. It's an unusual day in which, if I spend any time at all thinking about software, I don't use ideas from Syntropy to good effect. Amongst other things Syntropy combines a graphical notation for object structures much like <a href="http://en.wikipedia.org/wiki/Object-modeling_technique">OMT</a> with a textual notation for facts about them much like <a href="http://en.wikipedia.org/wiki/Z_notation">Z</a>. Some (but not nearly enough) of these ideas made it into UML, particularly the <a href="http://en.wikipedia.org/wiki/Object_Constraint_Language">OCL</a>. 
</div><div><br /></div><div>So, we had these mathematical requirements and we produced from them mathematically supported specifications and designs, full of ∀'s and ∃'s, and we had to turn these into working software. I learned a great deal about the art of doing that there, other parts of which are a story for another time. </div><div><br /></div><div>The main thing for my current purpose, though, is that when I think back to those times I'm stunned by the amount of effort we put into determining if those specifications and designs were <i>correct</i>. The only way we knew how was to round up a bunch of seriously smart people (which luckily we had) and check these models manually. Management were smart about it and paid for us to be trained in <a href="http://en.wikipedia.org/wiki/Fagan_inspection">Fagan inspection</a> techniques, which helped a lot. But the expense! Six or eight top-flight programmers in a room for a couple of hours is not a trivial investment. And to do that many times per document. Sometimes many times per page of a document, over the years that we worked on this thing. </div><div><br /></div><div>But that was then. As it happens, more–or–less exactly <i>then</i> another group in the UK were using much more advanced formal methods to address a much trickier problem.</div><div><br /></div><h2>This is Now</h2><div>In <a href="http://www.spaconference.org/spa2007/programme.html">2007</a> Sir <a href="http://en.wikipedia.org/wiki/C._A._R._Hoare">Tony Hoare</a> delivered a keynote at the <a href="http://www.spaconference.org/">Spa conference</a>. He talked about the effort required to prove (really, <i>prove</i>) that the <a href="http://www.mondex.com/">Mondex</a> electronic money system was secure. The thing about Mondex is that the money is actually on the card, rather than being in the network with the card acting as a credential to allow the money to be moved. This made the Bank of England <a href="http://www.bankofengland.co.uk/education/museum/exhibitions/past2.htm">very nervous</a> (Mondex was developed in the UK). Developing that 200-page proof was very expensive. <a href="http://www.springerlink.com/content/w5x622t55187u2l5/">This effort</a> has become something of a <a href="http://vsr.sourceforge.net/Activities.htm#The%20Mondex%20case%20study">celebrity</a> amongst the Verified Software community. </div><div><br /></div><div>The folks who worked on the Mondex proof were, almost certainly, much smarter than my colleagues and me who worked on the microcell predictor (sorry guys), but they seem not to have known a better way to proceed than manual checking, either. In fact, they said at the time<blockquote>mechanising such a large proof cost–effectively is beyond the state of the art</blockquote></div><div>Hoare's keynote explained that between then and now, in fact in that year 2007, the problem had been re–addressed. The goal was to discover to what extent the state of the art had moved on in ten years and whether mechanisation had become cost–effective. 
Hoare suggested strongly that, through improvements in theory and hardware, cost–effectiveness is within reach.</div><div><br /></div><div>One of the things that came out of that effort was a <a href="http://vsr.svn.sourceforge.net/viewvc/vsr/projects/mondex/alloy/">model</a> written in <a href="http://alloy.mit.edu/community/">Alloy</a> (note: the link was dead at the time of writing but the Alloy site <i>is</i> actively maintained).</div><div><br /></div><div>Alloy is actually what I wanted to write about. Alloy seems to live at an interesting place: the intersection of <i>proof</i> and <i>examples</i>. What Alloy does is help you develop a proof of various properties of a specification by generating examples (if the specification is consistent) or counter–examples (if it isn't).</div><div><br /></div><h2>Testing</h2><div>Over time, and quite naturally, our focus changed while working on the microcell predictor. We became less interested in demonstrating that our code conformed to a design that conformed to a specification that conformed to a requirement. We became more interested in whether the code conformed to the users' needs. We showed this through intensive automated testing. </div><div><br /></div><div>My boss at the time insisted that we write fully automated tests for every function we wrote. He had an automated testing framework that he carried around in his head and regenerated at each new place of work. I think he had learned this from a previous boss of his and the framework had, IIRC, originally been written in Pascal. So we created a C++ version and off we went writing tests and I can't begin to mention the number of times that writing the tests, and running them, again and again and again, was crucial to overcoming what would otherwise have been show–stopping problems.</div><div><br /></div><div>It was especially interesting that one of the guys on the team, a real code–basher and much better programmer than I am, built a graphical test runner (polygons, remember) that let you see what the code was doing as a test ran. See the building footprint polygons, see the triangulation of the line–of–sight region, see the first–order rays from the antenna to the corners of buildings, see the second–order virtual sources, see the triangulation of their line–of–sight, and so on. See it in all these various scenarios, each devised specifically to check that some particularly interesting feature of the <i>problem</i> was dealt with correctly. At one time I had several sheets of big flipchart paper covered in the tiniest writing I could manage describing all the ways I could think of that a line segment could meet a set of polygons. I missed a few.</div><div><br /></div><div>Something like Alloy would have helped <i>so much</i>.</div><div><br /></div><div>These animated tests became the premier way of explaining what the microcell predictor did. Even to customers.</div><div><br /></div><div>Notice that my work on microcells, with its intensive automated testing, and the original Mondex proof took place more–or–less contemporaneously with the discovery of Extreme Programming. 
I think I recall a lunchtime conversation during the microcell work to the effect that there was this <a href="http://c2.com/cgi/wiki?ChryslerComprehensiveCompensation">mad project</a> going on where they had automated tests for everything (as we did), but they <i>wrote the tests first!</i> I think I recall some comment along the lines that this was fine only so long as you knew the requirement in great and final detail, but in practice you never do. I'm now pretty confident that the opposite is true: while we hardly ever do have great and finally detailed requirements, this is exactly when writing the tests first does help. </div><div><br /></div><div>I'm glad that I dropped off the "models" path to correctness and onto the "test first" one. And I'm glad that I had the experience of doing the "models" approach. I find it interesting to look back over the fence sometimes, and see how those folks are getting along.</div><div><br /></div><h2>Alloy</h2><div>And so to Alloy<A id="techie"></A>. I have Alloy 4.1.10 here on my MacBook Pro (2.4 GHz Core 2 Duo, 4GB RAM). I'm going to try to develop a formal model of the points on a Goban. If you'd like to play along, there's <a href="http://bitbucket.org/keithb/goban-alloy/">an hg repo</a>.</div><div><br /></div><h3>Points</h3><div>Alloy models are essentially relational, although the syntax is deliberately chosen to be as familiar as possible to users of "curly bracket" OO languages. I begin by writing a kind of test. This takes the form of a predicate called <code>correct</code> which says<pre>pred correct {<br /> there_are_such_things_as_points<br />}</pre> and I can ask Alloy to run this test<pre>run correct</pre>and Alloy tells me that <code>The name "there_are_such_things_as_points" cannot be found.</code> which is excellent news. I'm well on the way to using the familiar TDD cycle. Not compiling is failure and here is a failing test. I can make the test fail in a slightly more informative way by defining <code>there_are_such_things_as_points</code> like so<pre>pred there_are_such_things_as_points{<br /> #Point > 0<br />}</pre>which says that the size of the set named <code>Point</code> (which is the set of all tuples conforming to the signature <code>Point</code>—it's a relational model, remember) is strictly greater than zero. Of course I haven't defined that signature yet so Alloy tells me that <code>The name "Point" cannot be found.</code> I define <code>Point</code> like so<pre>sig Point {}</pre>and now Alloy reports that</div><div><br /></div><div><code>Executing "Run correct"</code></div><div><code><span class="Apple-style-span" style="white-space: pre; "> </span>Sig this/Point scope <= 3</code></div><div><code><span class="Apple-style-span" style="white-space: pre; "> </span>Sig this/Point in [[Point$0], [Point$1], [Point$2]]</code></div><div><code><span class="Apple-style-span" style="white-space: pre; "> </span>Solver=minisatprover(jni) Bitwidth=4 MaxSeq=4 SkolemDepth=2 Symmetry=20 </code></div><div><code><span class="Apple-style-span" style="white-space: pre; "> </span>18 vars. 3 primary vars. 23 clauses. 183ms.</code></div><div><code><span class="Apple-style-span" style="white-space: pre; "> </span>Instance found. Predicate is consistent. 41ms.</code></div><div><br /></div><div>There's a lot of information there. The important part for now is that Alloy could find an <i>instance</i> (that is, a bunch of tuples) that conforms to the model and of which the predicate is true. 
Therefore the model and the predicates I have defined are <i>consistent</i> (that is, contain no contradictions). I can also ask Alloy not to run my predicate but instead to <i>check</i> it. The news here is not so good.</div><div><br /></div><div><code>Executing "Check check$1"</code></div><div><code><span class="Apple-style-span" style="white-space: pre; "> </span>Sig this/Point scope <= 3</code></div><div><code><span class="Apple-style-span" style="white-space: pre; "> </span>Sig this/Point in [[Point$0], [Point$1], [Point$2]]</code></div><div><code><span class="Apple-style-span" style="white-space: pre; "> </span>Solver=minisatprover(jni) Bitwidth=4 MaxSeq=4 SkolemDepth=2 Symmetry=20 </code></div><div><code><span class="Apple-style-span" style="white-space: pre; "> </span>18 vars. 3 primary vars. 24 clauses. 11ms.</code></div><div><code><span class="Apple-style-span" style="white-space: pre; "> </span>Counterexample found. Assertion is invalid. 18ms.</code></div><div><br /></div>It turns out that although my model is consistent it is not <i>valid</i>. It is possible to construct instances of the model for which the predicate is not true.<div></div><div><br /></div><div>Notice the lines in the reports about <code>sig this/Point</code>. In working with my model Alloy has made some instances of <code>Point</code> from which it has then constructed instances of the model. By default it chooses to make up to 3 instances of a signature. Here is a graphical (in both senses) representation of the instance of the model which Alloy built</div><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4aNE7gc24bGZq9fVsBSxGNm50W3xFS5hfJV7MgqrDHpCjjdkSYEj3gxHIi4V3NpKRDvARMF3TTp6CfiE0i_nfL3DD6Sl086p3ZIC1BkMQ_bA8XeIaIu_XXbKbJwVcrH1H71Dwsg/s1600/instance.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 92px; height: 66px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4aNE7gc24bGZq9fVsBSxGNm50W3xFS5hfJV7MgqrDHpCjjdkSYEj3gxHIi4V3NpKRDvARMF3TTp6CfiE0i_nfL3DD6Sl086p3ZIC1BkMQ_bA8XeIaIu_XXbKbJwVcrH1H71Dwsg/s200/instance.png" border="0" alt="Point$0" id="BLOGGER_PHOTO_ID_5456636386277868834" /></a>Do you see this instance named in the array of three instances which Alloy reported it had created? Clearly the predicate is satisfied. Alloy will also produce a graph of the counterexample which it found—which is empty. (Well, strictly it's a message telling me that "every atom is hidden" in a "this page intentionally left blank" sort of way).<div><br /></div><div>There is nothing in the model which says that there are any points, only that there possibly is such a thing as a <code>Point</code>. The problem domain can help us here, as it turns out that some of the points on the board have <a href="http://senseis.xmp.net/?Tengen">names</a>. </div><div><pre><br />pred tengen[p : Point]{}<br /><br />fact tengen_exists {<br /> one p : Point |<br /> tengen[p]<br />}</pre>Here I state a fact, which is very much like a predicate, except that it is information for Alloy to use, not a question for it to ask of the model. </div><div><br /></div><div>Read the fact <code>tengen_exists</code> like this: "it's true of exactly one instance, named <code>p</code>, of the signature <code>Point</code> that the predicate <code>tengen</code> is true of <code>p</code>". The predicate itself is parameterised on an instance of <code>Point</code> but does not depend upon that instance. 
Which seems as if it should smell.</div><div><br /></div><div>Running the predicate as before finds that same instance of the model with one point in it. I can pop up an evaluator on that instance and ask for the value of <code>tengen[Point$0]</code> which is (of course) <code>true</code>. If I ask Alloy to check the model it now reports <code>No counterexample found. Assertion may be valid. 69ms. </code>Note the "may be" there. Alloy can't be absolutely sure because it only instantiates a small number of tuples for each signature. This is a manifestation of the Small Instance Hypothesis (sometimes "small model" or "small scope") which claims that if your model is bogus then this will show up very quickly after looking at a small number of small examples—exhaustive enumeration of cases is not required.</div><div><br /></div><div>So now I have a model, however feeble, which is consistent and cannot be shown (using up to three <code>Point</code>s) to be invalid. I'll check in.</div><div><br /></div><h3>Refinement</h3><div>I'm not very happy with this model. I said that some points are named, such as tengen, but that's not really very well expressed. There's that smelly predicate which doesn't depend upon its parameter. If some instances of Point have names, then we can say that. After going around the fail-pass loop (trust me, I am doing that but I'm not going to write it out every time) the model looks like this</div><pre>enum Name { Tengen }<br /><br />sig Point {<br /> name : lone Name<br />}<br /><br />pred tengen[p : Point]{<br /> p.name = Tengen<br />}<br /><br />fact tengen_exists {<br /> one p : Point |<br /> tengen[p]<br />}</pre>Several new Alloy features are used here. Since Alloy 4 doesn't support string literals I use an atom (an instance of a signature with no further structure). The <code>enum</code> clause creates quite a complex structure behind the scenes but gets me the atom <code>Tengen</code>. The signature of <code>Point</code> is extended to have a field named <code>name</code> which will, in a navigation expression such as <code>p.name</code>, resolve to an instance of signature <code>Name</code>, or to <code>none</code>, as shown by the cardinality marker <code>lone</code>. These navigation expressions look like dereferencing as found in OO languages, but are actually joins.<div><br /></div><div>This looks a lot healthier to me, and the model is still both consistent and not demonstrably invalid. Here's the new instance of the model. I'll check in.</div><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitW9RC4EpvGacf1Rj3U8vTqhAIB6GH6F7Tjks2MNVnaOnuNvFHvRjGC6e5JwOEdRJMZE1SHbabECQ9VHHUy-9gU6K-lUOj0Wd2UT4pbl0qEuhVg3BOsOe2Dy9NI3OXzCKtLWBlZQ/s1600/instance.png" style="text-decoration: none;"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 97px; height: 163px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitW9RC4EpvGacf1Rj3U8vTqhAIB6GH6F7Tjks2MNVnaOnuNvFHvRjGC6e5JwOEdRJMZE1SHbabECQ9VHHUy-9gU6K-lUOj0Wd2UT4pbl0qEuhVg3BOsOe2Dy9NI3OXzCKtLWBlZQ/s200/instance.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5456649947323609090" /></a><h3>Directions</h3><div>There are many points on a Goban, however. And these points stand in a certain relationship. Specifically, every point on the board has some neighbours. Tengen has four neighbouring points, one in each of the four directions I will call N, E, S and W. 
I start with N and obtain this <i>invalid</i> model <pre><br />enum Name { Tengen }<br /><br />sig Point {<br /> name : lone Name,<br /> neighbour : Direction -> lone Point<br />}<br /><br />pred tengen[p : Point]{<br /> p.name = Tengen<br />}<br /><br />fact tengen_exists {<br /> one p : Point | tengen[p]<br />}<br /><br />enum Direction {N}</pre> which admits a counterexample which does not satisfy this predicate<pre>pred tengen_has_a_neighbour_in_each_direction{<br /> let tengen = {p : Point | p.name = Tengen} {<br /> not tengen.neighbour[N] = none<br /> }<br />}</pre> The counterexample looks like this<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjnlTmJvBCa5wcPQcY39yc_m6Z-nezzzv3XF5vi6HNCYqlX3dA_c5EIXV1SgYmUcKvNnq0KHwAVLECPZBVnAt23qO40MlLg2hDx5c0hl8RucvFGh8HMgyyu-G8VBpemABpW8ByM7A/s1600/counter_example.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 159px; height: 166px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjnlTmJvBCa5wcPQcY39yc_m6Z-nezzzv3XF5vi6HNCYqlX3dA_c5EIXV1SgYmUcKvNnq0KHwAVLECPZBVnAt23qO40MlLg2hDx5c0hl8RucvFGh8HMgyyu-G8VBpemABpW8ByM7A/s200/counter_example.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5456668690600929314" /></a><div>New Alloy features are the mapping <code>Direction -> lone Point</code>, which is pretty much the same as a typical "dictionary", and the <code>let</code> form and its binding of the name <code>tengen</code> to the value of a comprehension. The comprehension should be read as "the set of things <code>p</code>, which are instances of <code>Point</code>, of which it is true that the value of <code>p.name</code> is equal to <code>Tengen</code>". Some sugar in Alloy means that we don't need to distinguish between a value and the set of size 1 whose sole member is that value.</div><div><br /></div><div>Running the model produces the somewhat surprising result that it is consistent. Looking at the example shows that this is a red herring. <a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhh-bt57qOFLwMaUj_GaQocBhc9CbqLzHwidjHW1zdNt93Mb61_GMMjdwB3Ryu5X_hzksn_I5iMMQopzlrO5FFJTq_yVKg7h75-VAyjGMiOjG22BBjH55hXoFePEnae09WW_NUXyA/s1600/instance.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 200px; height: 155px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhh-bt57qOFLwMaUj_GaQocBhc9CbqLzHwidjHW1zdNt93Mb61_GMMjdwB3Ryu5X_hzksn_I5iMMQopzlrO5FFJTq_yVKg7h75-VAyjGMiOjG22BBjH55hXoFePEnae09WW_NUXyA/s200/instance.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5456673069017567810" /></a>This is an interesting state of the world, so I check in with a suitable caveat in the message.</div><div><br /><h3>A YAGNI Moment</h3></div><div>The invalid aspect of the model comes from the cardinality on <code>Point.neighbour</code>. There are points on a Goban which do not have four neighbours, one in each direction. But I haven't mentioned any of them yet. There's a good chance that eventually points will need to have optional neighbours, but right now YAGNI.<br /><br />As the (so far, incomplete) predicate's name suggests, tengen really does have four neighbours. The cardinality should be <code>one</code>. Making that change produces a model which now cannot be shown to be invalid. However, the example instance is still bogus. 
I know from TDD practice what to do: write a test that will fail until the problem is fixed. Here it is<pre>pred points_are_not_their_own_neighbour {<br /> all p : Point |<br /> not p in univ.(p.neighbour)<br />}</pre>The construction <code>univ.r</code> for any relation <code>r</code> evaluates to the range of the relation.<br /><br />As I hoped, this test fails. Although running the predicates can produce an instance in which the northerly neighbour of tengen is not tengen, checking can also still produce an invalidating counterexample in which it is. I must add a predicate to apply to all <code>Point</code>s forcing them not to be their own neighbour<pre><br />sig Point {<br /> name : lone Name,<br /> neighbour : Direction -> one Point<br />}{<br /> not this in ran[neighbour]<br />}</pre>Here I use the function <code>ran</code> imported from the module <code>util/relation</code> to state the constraint on the range of <code>neighbour</code>. The predicates listed in curlies immediately after a sig are conjoined and taken as facts true of all instances of that signature. The model is now consistent and not demonstrably invalid, but a glance at the example reveals that all is not well<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnYHbIGirGC3Gh_kSbPH1QC1jN6SW8BZ86-34wfmBSuJaTu29ajLluRkCA5IYes1g3bHxB5RwbCziuOl7y-HBSzFEYobBQGephLZTlwzZeGcd6YVHuHyKeb1FJ1xJrhbl5fisq8Q/s1600/instance.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 200px; height: 140px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnYHbIGirGC3Gh_kSbPH1QC1jN6SW8BZ86-34wfmBSuJaTu29ajLluRkCA5IYes1g3bHxB5RwbCziuOl7y-HBSzFEYobBQGephLZTlwzZeGcd6YVHuHyKeb1FJ1xJrhbl5fisq8Q/s200/instance.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5456681048624326082" /></a></div><br />This is interesting, so I check it in.</div><div><br /><h3>Complementary Directions</h3>Once again, I need to strengthen the tests. If a point is the northern neighbour of tengen, then tengen is the southern neighbour of that point. Directions on the board come in complementary pairs.<pre>pred directions_are_complementary{<br /> N.complement = S<br /> S.complement = N<br />}</pre> and now I have to de-sugar <code>Direction</code> in order to insert the <code>complement</code> relation. And now we see how enums work<pre>abstract sig Direction{<br /> complement : one Direction<br />}{<br /> symmetric[@complement]<br />}<br /><br />one sig N extends Direction{}{complement = S}<br />one sig S extends Direction{}<br /></pre> In the fact appended to <code>Direction</code> I say that the relation <code>complement</code> (the <code>@</code> means that I'm referring to the relation itself and not its value) is symmetric using the predicate <code>util/relation/symmetric</code>. Thus I do not have to specify that <code>S</code>'s complement is <code>N</code> having once said the converse. 
The same pattern applies to <code>E</code> and <code>W.</code><br /><br />The instance is now a spectacular mess.<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgBODJBeavJKZ8G1rFC3jpPAOFS09pKdjhFR6Wa86Ulkl40qEvNI7WKxEhLxBlRqKmEihaOuNqak6le-GN6-QJOWjpDhJ2FAsDAJndLawgnI6Ym2QUl9q85iiEiFH8xwrbhVCWKOg/s1600/instance.png" style="text-decoration: none;"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px; height: 152px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgBODJBeavJKZ8G1rFC3jpPAOFS09pKdjhFR6Wa86Ulkl40qEvNI7WKxEhLxBlRqKmEihaOuNqak6le-GN6-QJOWjpDhJ2FAsDAJndLawgnI6Ym2QUl9q85iiEiFH8xwrbhVCWKOg/s320/instance.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5456692073024424290" /></a>I will check in anyway.</div><div><br /><div><h3>Distinct Neighbours</h3></div><div>What I might like to say is<pre>pred neighbours_are_distinct{<br /> all p : Point |<br /> all disj d, d' : Direction |<br /> p.neighbour[d] != p.neighbour[d']<br />}</pre>The nested quantification uses the <code>disj</code> modifier and should be read "for all distinct pairs of <code>Direction</code>, named <code>d</code> and <code>d'</code>..." </div><div><br /></div><div>This immediately renders the model (seemingly) inconsistent. More specifically, the problem is that Alloy can no longer find an instance. I don't think that this is because the model contains contradictions so much as that it now requires more than three points in order to satisfy it. I can increase the number of instances available when the predicates are run like this: <code>run correct for 5 Point</code>. The resulting example is a rat's nest of dodgy-looking relations (it's in the repo as instance.dot if you want a look).</div><div><br /></div><div>A less extravagant predicate is<pre>pred neighbours_of_tengen_are_distinct{<br /> let tengen = {p : Point | p.name = Tengen} |<br /> all disj d, d' : Direction |<br /> tengen.neighbour[d] != tengen.neighbour[d']<br />}<br /></pre>and with this I see a much less tangled, but still wrong, instance (instance1.dot). And the model is also demonstrably invalid. I'm going to make a significant change to the model. One I've been itching to do for some time. I check in before this.</div><div><br /></div><div><h3>A Missing Abstraction?</h3></div><div>I feel as if I'm missing a degree of freedom, which is making it hard to say what I want. I'm going to promote tengen to be a signature, or rather to create a signature of which tengen will be the only instance at the moment: <code>InteriorPoint</code>. I remove the predicates about distinct neighbours and introduce <code>InteriorPoint</code><pre>sig InteriorPoint extends Point{}</pre>and can then quite happily say<pre>pred interior_points_have_a_neighbour_in_each_direction{<br />all p : InteriorPoint {<br /> not p.neighbour[N] = none<br /> not p.neighbour[E] = none<br /> not p.neighbour[S] = none<br /> not p.neighbour[W] = none<br /> }<br />}<br /></pre> and this gets me back to a consistent, not demonstrably invalid (although still wrong) model. I check in. </div><div><br /></div><div>Now I can say<pre>sig InteriorPoint extends Point{}{<br /> #ran[neighbour] = #Direction<br />}<br /></pre>and leave other kinds of point to look after themselves. If the range of the neighbour relation (which is a set) is the same size as the set of Directions, then there must be one neighbour per direction. 
</div><div><br />This leaves me with the model in this state<pre>sig Point {<br /> neighbour : Direction -> lone Point<br />}{<br /> not this in ran[neighbour]<br /> all d : dom[neighbour] |<br /> this = neighbour[d].@neighbour[d.complement]<br />}<br /><br />sig InteriorPoint extends Point{}{<br /> #ran[neighbour] = #Direction<br />}<br /><br />fact tengen_exists {<br /> #InteriorPoint = 1<br />}<br /><br />abstract sig Direction{<br /> complement : one Direction<br />}{<br /> symmetric[@complement]<br />}<br /><br />one sig N extends Direction{}{complement = S}<br />one sig S extends Direction{}<br /><br />one sig E extends Direction{}{complement = W}<br />one sig W extends Direction{}<br /></pre>A little bit of tidying up and I check in. The model is consistent and cannot be shown to be invalid. <a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEha4ndraw3KitUbGYwJyAwCparOmT9RI_Ixa3423hSidZI7gO6DLWt23ro-KHw60S8XhIPtRzZ0-Ip3hAD1sEykjQcduQnwo2Rub9ui0woMUm2tT-IB2HhXMqEoV-o6ss5ll2eJQQ/s1600/instance.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px; height: 213px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEha4ndraw3KitUbGYwJyAwCparOmT9RI_Ixa3423hSidZI7gO6DLWt23ro-KHw60S8XhIPtRzZ0-Ip3hAD1sEykjQcduQnwo2Rub9ui0woMUm2tT-IB2HhXMqEoV-o6ss5ll2eJQQ/s320/instance.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5456728602333684130" /></a>The example instance looks respectable too—so long as we focus on the interior point and don't worry about how its neighbours relate to one another, which is clearly wrong. But there are no tests for that. This diagram has been cleaned up in OmniGraffle to focus on the interior point but the .dot of the original is checked in. Maybe another time I'll sort out the regular points.</div><div><br /></div><h2>Thoughts</h2>Wow, that was hard work. Took a long time, too (although not so long as the timestamps make it look; I was also doing laundry and so forth during the elapsed time). Does that make what is after all merely a rectangular array of points a "complex domain"? No. I'm out of practice with this kind of thing, and not fluent with the tool. Even so, I'm impressed by how good a fit the TDD cycle seems to be for this formal modelling tool. I even got into a bit of trouble towards the end but was rescued by recalling the TDD technique of making the tests dumb and repetitive—<i>but concrete and clear</i>. And the same subtle trap of thinking too far ahead applies here too.<div><br /></div><div>Would this have helped with the microcell predictor? Maybe not. Alloy doesn't do numbers at all well, and that was an intrinsically numerical problem. Could this approach help with other things? I think so. This tiny model(ling) problem has turned out to be harder and more time-consuming than I expected it to be, but it is my first go with the tool. I'm going to play around with it some more, as time allows, and see what comes up. </div><div><br /></div><div>I'm certainly impressed with the tool. Alloy comes very close to making powerful models an everyday tool for the working programmer, but I don't think it's quite there yet. The gulf, as it always was, is between that very succinct model and working code. How to bridge that gulf in a useful way I don't yet know. 
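</div><div><br /></div><div>One way to make that gulf concrete: here is a hypothetical sketch, entirely my own invention and nothing that Alloy generates, of how the final model's constraints might land in working code as plain runtime assertions (Python, for brevity):<pre>COMPLEMENT = {'N': 'S', 'S': 'N', 'E': 'W', 'W': 'E'}<br /><br />class Point(object):<br />    def __init__(self):<br />        self.neighbour = {}  # direction -> Point; 'lone', so keys may be absent<br /><br />def link(p, d, q):<br />    # wire up a symmetric pair of neighbour entries<br />    p.neighbour[d] = q<br />    q.neighbour[COMPLEMENT[d]] = p<br /><br />def check(points):<br />    for p in points:<br />        # not this in ran[neighbour]: no point is its own neighbour<br />        assert p not in p.neighbour.values()<br />        # this = neighbour[d].@neighbour[d.complement]: links are symmetric<br />        for d, q in p.neighbour.items():<br />            assert q.neighbour[COMPLEMENT[d]] is p<br /><br />tengen = Point()  # an InteriorPoint, by construction<br />for d in COMPLEMENT:<br />    link(tengen, d, Point())<br /># #ran[neighbour] = #Direction: four distinct neighbours<br />assert len(set(tengen.neighbour.values())) == len(COMPLEMENT)<br />check([tengen] + list(tengen.neighbour.values()))</pre>Even a toy translation like this forces decisions the model never had to make (object identity versus value equality, dictionaries versus relations), which is, I suppose, exactly where the gulf lies.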
</div></div>keithbhttp://www.blogger.com/profile/14314542307822401015noreply@blogger.com1tag:blogger.com,1999:blog-23912478.post-2008983528647574242010-01-03T19:03:00.002+00:002010-01-03T19:24:48.846+00:00Observations on SpotifyI use <a href="http://en.wikipedia.org/wiki/Spotify">Spotify</a>, but not quite enough to want to pay for the ad–free premium service just yet. I'm getting pretty good at not paying any attention to the ads (which seems as if it could be an increasingly useful life skill in the apparently inevitable entirely ad–funded on–line world to come), but a couple just now caught my...ear.<div><br /></div><div>Spotify make a big noise about their targeted ads: better value for advertisers, less annoying for listeners. And yet here I am halfway through the third act of <i>Tristan</i> and it's choosing to tell me about a campaign called <a href="http://www.dance4life.com/">dance4life</a> which encourages clubbers under 25 to "start dancing and stop aids". A worthy goal, but unless Spotify knows something about Wagnerians that I don't this seems <i>a little odd</i>. </div><div><br /></div><div>I also notice that while playing music in its peer–to–peer mode Spotify is gratifyingly parsimonious of bandwidth, but when the ads are streaming in, my network traffic suddenly jumps up to 125KB/s and stays there for the duration of the ads. I wonder why that should be.</div>keithbhttp://www.blogger.com/profile/14314542307822401015noreply@blogger.com0tag:blogger.com,1999:blog-23912478.post-63491045143742665122009-12-28T14:09:00.040+00:002009-12-30T19:42:38.418+00:00Bayesian Testing?<h2>Introduction</h2><div>I'm tossing this idea out into the world. It's half-formed and I'm learning as I go along. It may be invalid, it may be old news, it may not. What I'm hoping for is that someone who knows more about at least one of testing and Bayesian inference than I do will come and set me straight. </div><div><br /></div><div><b><span class="Apple-style-span" style="color:#FF0000;">UPDATE:</span></b> Laurent Bossavit turned out to be that person. The results below have been adjusted significantly as a result of a very illuminating conversation with him. Whatever virtue these results now have is due to him (and the defects remain my responsibility). Laurent, many thanks.</div><div><br /></div><div>In addition, a bunch of folks kindly came along to an <a href="http://xpday-london.editme.com/TestsAsEvidence">open space session</a> at <a href="http://www.xpday.org/">Xp Day London</a> this year. <a href="http://dolsonagile.wordpress.com/2009/12/08/xpday-day-2/">Here is the commentary of one</a>. From that already the idea became better formed, and this article reflects that improvement; thanks, all. If you want to skip the motivation and cut to the chase, go <a href="http://peripateticaxiom.blogspot.com/2009/12/bayesian-testing.html#example">here</a>.</div><div><h2><br /></h2><h2>Evidence</h2>You may have <a href="http://faculty-staff.ou.edu/W/Jonathan.D.Wren-1/The%20Fine%20Art%20of%20Baloney%20Detection.htm">read</a> that <i>absence of evidence is not evidence of absence</i>. Of course, this is exactly wrong. I've just looked, and there is no evidence to be found that the room in which I am sitting (nor the room in which you are, I'll bet: look around you right now) contains an elephant. I consider this strong evidence that there is no elephant in the room. Not proof, and in some ways not the best reason for inferring that there is no elephant, but certainly evidence that there is none. 
This seems to be different from the form of bad logic that Sagan is actually criticising, in which the absence of evidence that there isn't an elephant in the room would be considered crackpot-style evidence that there <i>was</i> an elephant in the room.</div><div><br /></div><div>You may also have <a href="http://www.cs.utexas.edu/users/EWD/ewd02xx/EWD249.PDF">read</a> (on page 7 of that pdf) that <i>program testing can be used to show the presence of bugs, but never to show their absence!</i> I wonder. In the general case this certainly seems to be so, but I'm going to claim that working programmers don't often address the general case.</div><div><br /></div><div>Dijkstra's argument is that, even in the simple example of a multiplication instruction, we do not have the resources available to exhaustively test the implementation but we still demand that it should correctly multiply any two numbers within the range of the representation. Dijkstra says that we can't afford to take even a representative sample (whatever that might look like) of all the possible multiplications that our multiplier might be asked to do. And that seems plausible, too. Consider how many distinct values a numerical variable in your favourite language can take, and then square it (for a 32-bit integer that's 2<sup>64</sup>, around 1.8×10<sup>19</sup> cases). That's how many cases you expect the multiplication operation in your language to deal with, and deal with correctly. As an aside: do you expect it to work correctly? If so, why do you?</div><div><br /></div><div><a id="example">In this post I want to explore an approach that seems as if it might help us to decide how much confidence in the correctness of some code we should have, given the test results we can obtain about it.</a></div><div><h2><br /></h2><h2>A Small Example of Confidence</h2> Let's say that we wish to write some code to recognise if a stone played in a game of <a href="http://en.wikipedia.org/wiki/Go_(game)">Go</a> is in atari or not (this is my favourite example, for the moment). The problem is simple to state: a stone with two or more "liberties" is not in atari, a stone with one liberty is in atari. A stone can have 1 or more liberties. In a real game situation it can be some work to calculate how many liberties a stone has, but the condition for atari is that simple. </div><div><br /></div><div>A single stone can have only 1, 2, 3 or 4 liberties and those are the cases I will address here. I write some code to implement this function and I'll say that I'm fairly confident I've got it right (after all, it's only an <code>if</code>), but not greatly so. Laurent proposed a different question to ask from the one I was asking before—a better question, and he helped me find and understand a better answer.</div><div><br /></div><div>The prior probability of correctness that question leads to is 1 ⁄ 16. This is because there are 16 possible mappings from {1, 2, 3, 4} to {T, F} and only one of them is the correct function. Thus, the prior is the prior probability that my function behaves identically to some other function that is correct by definition.</div><div><br /></div><div>How might a test result influence that probability of correctness? There is a <a href="http://spreadsheets.google.com/pub?key=tnjLDV9k47GnGraUnKmTD8Q&single=true&gid=0&output=html">spreadsheet</a> which shows a scheme for doing that using what very little I understand of <a href="http://en.wikipedia.org/wiki/Bayesian_inference">Bayesian inference</a>, slightly less naïvely applied than before. 
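</div><div><br /></div><div>If you would prefer code to a spreadsheet, here is a minimal Python sketch of the counting scheme (a hypothetical reconstruction of mine; it reproduces the numbers that follow, not the spreadsheet itself):<pre>from itertools import product<br /><br />DOMAIN = (1, 2, 3, 4)  # the possible liberty counts of a single stone<br />CORRECT = {1: True, 2: False, 3: False, 4: False}  # atari iff one liberty<br /><br /># all 16 possible mappings from liberty counts to atari-or-not<br />candidates = [dict(zip(DOMAIN, outputs))<br />              for outputs in product((True, False), repeat=4)]<br /><br />def probability_correct(evidence):<br />    # keep only the candidates consistent with every test result so far<br />    consistent = [f for f in candidates<br />                  if all(f[libs] == atari for libs, atari in evidence)]<br />    # uniform prior over candidates, so the posterior is 1/#consistent<br />    return 1.0 / len(consistent) if CORRECT in consistent else 0.0<br /><br />print(probability_correct([]))                       # prior: 0.0625, i.e. 1/16<br />print(probability_correct([(1, True)]))              # 0.125<br />print(probability_correct([(1, True), (2, False)]))  # 0.25<br />print(probability_correct([(1, True), (2, True)]))   # 0.0, and it stays there</pre>Each test result consistent with the specification halves the number of surviving candidate functions and so doubles the posterior: 1 ⁄ 16, 1 ⁄ 8, 1 ⁄ 4 and so on, up to 1 when all four cases have been checked.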
</div><div><br /></div><div>Cells in the spreadsheet are colour–coded to give a guide as to how the various values are used in the Bayesian formula. The key, as discussed in the XpDay session, is how to count cases to find the conditional probabilities of seeing the evidence. </div><div><br /></div><div>The test would look something like this:<br /><style type="text/css">table.test td{ border-style:solid; border-width:1px; border-color:gray} .sig{background:LightGray} .pass{background:LightGreen} .fail{background:LightPink}</style><table class="test"><thead><tr><th>One Liberty Means Atari</th></tr></thead><tbody><tr><td class="sig">liberties</td><td class="sig">atari?</td></tr><tr><td>1</td><td class="pass">true</td></tr></tbody></table></div><div><br /></div><div> The posterior probability of correctness is <b>0.125 </b><span class="Apple-style-span" style="color:#FF0000;"><br /></span><h2><br /></h2><h2>Adding More Test Cases</h2>Suppose that I add another case that shows that when there are 2 liberties the code correctly determines that the stone is not in atari.<br /><table class="test"><thead><tr><th>One Liberty Means Atari</th></tr></thead><tbody></tbody><tbody><tr><td class="sig">liberties</td><td class="sig">atari?</td></tr><tr><td>1</td><td class="pass">true</td></tr><tr><td>2</td><td class="pass">false</td></tr></tbody></table></div><div>Using the same counting scheme as in the first case, and using the updated probability from the first case as the prior in the second, the updated probability of correctness with the new evidence increases to <b>0.25</b> as <a href="http://spreadsheets.google.com/pub?key=tnjLDV9k47GnGraUnKmTD8Q&single=true&gid=1&output=html">this</a> sheet shows.</div><div><span class="Apple-style-span" style="color:#FF0000;"><br /></span></div><div>But suppose that the second test actually showed an incorrect result: 2 liberties and atari true. <table class="test"><thead><tr><th>One Liberty Means Atari</th></tr></thead><tbody></tbody><tbody><tr><td class="sig">liberties</td><td class="sig">atari?</td></tr><tr><td>1</td><td class="pass">true</td></tr><tr><td>2</td><td class="fail">true</td></tr></tbody></table>Then, as we might expect, the updated probability of correctness falls to <b>0.0</b> as shown <a href="http://spreadsheets.google.com/pub?key=tnjLDV9k47GnGraUnKmTD8Q&single=true&gid=2&output=html">here</a>. And as the formula works by multiplication of the prior probability by a factor based on the evidence, the updated probability will stay at zero no matter what further evidence is presented—which seems like the right behaviour to me.</div><div><span class="Apple-style-span" style="color:#FF0000;"><br /></span></div><div>This problem is very small, so in fact we can exhaustively test the solution. What happens to the probability of correctness then? 
Extending test coverage to these cases<br /><table class="test"><thead><tr><th>One Liberty Means Atari</th></tr></thead><tbody></tbody><tbody><tr><td class="sig">liberties</td><td class="sig">atari?</td></tr><tr><td>1</td><td class="pass">true</td></tr><tr><td>2</td><td class="pass">false</td></tr><tr><td>3</td><td class="pass">false</td></tr></tbody></table> gives an updated probability of <b>0.5</b> as shown <a href="http://spreadsheets.google.com/pub?key=tnjLDV9k47GnGraUnKmTD8Q&single=true&gid=3&output=html">here</a>.<br /></div><div><span class="Apple-style-span" style="color:#FF0000;"><br /></span></div><div>One more case remains to be added:<br /><table class="test"><thead><tr><th>One Liberty Means Atari</th></tr></thead><tbody></tbody><tbody><tr><td class="sig">liberties</td><td class="sig">atari?</td></tr><tr><td>1</td><td class="pass">true</td></tr><tr><td>2</td><td class="pass">false</td></tr><tr><td>3</td><td class="pass">false</td></tr><tr><td>4</td><td class="pass">false</td></tr></tbody></table> and the posterior probability of correctness is updated to <b>1.0</b> as shown <a href="http://spreadsheets.google.com/pub?key=tnjLDV9k47GnGraUnKmTD8Q&single=true&gid=4&output=html">here</a>.<br /><br />That result seems to contradict Dijkstra: exhaustive testing, in a case where we can do that, does show the absence of bugs. He probably knew that.<br /><img src="http://spreadsheets.google.com/oimg?key=0AgiIblVYHDgsdG5qTERWOWs0N0duR3JhVW5LbVREOFE&oid=1&v=1262156070431" /></div><h2><br /></h2><h2>Next?</h2>My brain is fizzing with all sorts of questions to ask about this approach: I talked here about retrofitted tests; can it help with TDD? Can this approach guide us in choosing good tests to write next? How can the structure of the domain and co-domain of the functions we test guide us to high confidence quickly? Or can't they? Can the current level of confidence be a guide to how much further investment we should make in testing?<br /><div><br /></div><div>Some interesting suggestions are coming in in the comments, many thanks for those.</div><div><br /></div><div>My next plan, I think, will be to repeat this exercise for a slightly more complex function.</div>keithbhttp://www.blogger.com/profile/14314542307822401015noreply@blogger.com13tag:blogger.com,1999:blog-23912478.post-74456272644357651882009-11-25T11:26:00.002+00:002009-11-25T11:27:58.184+00:00New article for BCWIf you read this blog then there's likely little new for you in <a href="http://www.businesscomputingworld.co.uk/?p=1618">this article</a> for Business Computing World, but it might amuse.keithbhttp://www.blogger.com/profile/14314542307822401015noreply@blogger.com0tag:blogger.com,1999:blog-23912478.post-91490630432246562622009-11-12T15:16:00.003+00:002009-11-12T15:24:53.816+00:00Innovation GamesNext Tuesday there will be a special <a href="http://xpday-london.editme.com/eXtremeTuesdayClub">XtC</a> event at Zuhlke's office in London. <a href="http://www.lukehohmann.com/blog/index.php">Luke Hohmann</a> will be demonstrating his <a href="http://www.innovationgames.com/">innovation games</a> for Agile teams. Should be good. 
<div><br /></div><div>Details <a href="http://xpday-london.editme.com/XTC20091117">here</a>.</div>keithbhttp://www.blogger.com/profile/14314542307822401015noreply@blogger.com0tag:blogger.com,1999:blog-23912478.post-72265289796656934052009-11-09T10:17:00.003+00:002009-11-09T10:21:57.785+00:00Places remain at XP Day London 2009<a href="http://www.xpday.org/">XP Day London</a> is filling up, but places remain. The <a href="http://www.xpday.org/programme">programme</a> is looking very good. Register <a href="http://booking.xpday.org/registration.php">here</a>.keithbhttp://www.blogger.com/profile/14314542307822401015noreply@blogger.com0tag:blogger.com,1999:blog-23912478.post-91983547322979738652009-11-03T17:51:00.004+00:002009-11-04T09:54:37.159+00:00SketchesOne of the things I like to do in my free time is to dabble, in the most unschooled fashion imaginable, in music composition. Composing is hard. About as hard as (and remarkably similar to) programming. <a href="http://www.schoenberg.at/default_e.htm">Arnold Schoenberg</a> offers this "advice for self-criticism" to students of composition:<blockquote>6. MAKE MANY SKETCHES<br />Join the best sketches to produce others and improve them until the result is satisfactory.<p>To make sketches is a humble and unpretentious approach toward perfection. </p>—<em>Fundamentals of Musical Composition, Ch XII</em></blockquote>I think that this applies equally well to programming.keithbhttp://www.blogger.com/profile/14314542307822401015noreply@blogger.com0tag:blogger.com,1999:blog-23912478.post-70550258671149650922009-10-19T22:14:00.002+01:002009-10-19T22:19:46.873+01:00XP Day London 09: ProgrammeAfter a lot of wrangling the almost-but-not-quite final <a href="http://xpday.org/programme">programme for XP Day London</a> is now available. Because of illness and other asynchronous distractions some of the presenters had to change at the last minute; we still have to nail down one session, but this will be pretty much it.<div><br /></div><div>This year we have a lot of excellent experience reports from a range of practitioners who've been doing exciting new things and some really <a href="http://xpday.org/2009/keynotes">outstanding keynotes</a>.<br /><div><br /></div><div><a href="http://xpday.org/register">Register now!</a></div></div>keithbhttp://www.blogger.com/profile/14314542307822401015noreply@blogger.com0tag:blogger.com,1999:blog-23912478.post-9567638385704677652009-10-18T13:51:00.010+01:002009-10-18T18:13:37.561+01:00Scheduling by value?David Peterson has started <a href="http://www.kanbanblog.com/">a new blog</a> on Kanban (and snaffled a very tasty URL for it). He presents <a href="http://www.kanbanblog.com/article/time-at-bottleneck.html">this discussion</a> of scheduling features into a development team. The case that David presents is related to a behaviour I sometimes see with inexperienced teams who've just had someone go learn Scrum. Comes the next planning meeting and this idea pops up that the backlog needs to be ordered by "business value" so that the "most valuable" features can be delivered earliest.<div><br /></div><div>This can easily lead to some very nasty scenes where the Scrum Master demands that the Product Owner produce a "value" for each story—actually write a number on the card. The problem comes to a head when it turns out that the Product Owner not only doesn't know the value of the stories they are putting on the backlog, but they also have no way of finding out what the value of a story is. 
And this isn't because they are stupid, nor incompetent, nor malicious. It's because finding that value is far, far too difficult and time-consuming an activity. And there's a good chance that any answer that came out of it would be so well hedged as to be meaningless.</div><div><br /></div><div>Sometimes the Product Owner does know, or can find out at reasonable cost, a value for a story or feature. Being able to trade a new asset class probably can be valued. Changing a flow to give 10% higher conversion probably can be valued. Improving a model to get 1% higher efficiency in the machines it's used to design can probably be valued. These valuations will be functions of time and various other parameters. If you really have to, you could get a number out of them that's valid today (and perhaps only today). David makes the point that even if you do know that number for a feature, scheduling the next one simply on the basis of highest value might not be the smartest move. There are other variables to consider. </div><div><br /></div><div>There is a case to be made that within the context of a project value isn't the best figure of merit to use anyway, since someone should have made a go/no-go decision at some point that the planned budget and planned value seemed reasonable. That decision should be re-assessed frequently (far too little of this goes on) based on progress to date, and action taken if the actuals have come too far adrift, but in-between those times trying to optimise on value is perhaps not worth it. </div><div><br /></div><div>Another option is to indeed demand (and obtain) those value numbers and then <a href="http://legalizeadulthood.wordpress.com/2009/09/04/agile-roundtable-notes-september-3rd-2009/">schedule work </a><i><a href="http://legalizeadulthood.wordpress.com/2009/09/04/agile-roundtable-notes-september-3rd-2009/">primarily</a></i><a href="http://legalizeadulthood.wordpress.com/2009/09/04/agile-roundtable-notes-september-3rd-2009/"> on the basis of business value</a> and dispense with effort estimates, so-called "<a href="http://agiletoolkit.libsyn.com/index.php?post_id=400364">naked planning</a>". This has <a href="http://alistair.cockburn.us/Internal+versus+commercial+value">caused eyebrows to be raised</a>. The underlying claim is that</div><blockquote>value varies along an exponential scale while development costs vary along a linear scale. Therefore delivering the most valuable features trumps any consideration of whether or not the most valuable feature is cheap or easy to develop</blockquote>which, if true of your environment, might give pause for thought. How this interacts with the desire to schedule so as to maximise throughput at the bottleneck is an open question, for me at least.keithbhttp://www.blogger.com/profile/14314542307822401015noreply@blogger.com3tag:blogger.com,1999:blog-23912478.post-29423355476061770562009-09-09T19:51:00.005+01:002009-09-09T20:25:37.060+01:00Service-Oriented ArchitectureI'm currently embroiled in the long and fraught process of having telephony and data services installed in a certain location. One supplier steadfastly and consistently refused to respond to my offers to become a paying customer, so I selected another who were very responsive at first, but have become less and less so over time. In fact, it's about two months since I signed and still no service has been provided (although bills have been sent). <div><br /></div><div>Part of my frustration with this is that it's very hard to find out what's going on. 
The company, a British telecoms provider and let's leave it at that, was once a monolithic monopoly but now has been dissected into multiple different business units, components, we might almost call them, each—I suppose—focussing on its so–called Core Competence (and more on that in a later post). Each of these components has its various workflows that it does and one or more contracts with other components for services it supplies or consumes and the components communicate by passing electronic messages to one another. Sometimes they pass electronic messages to me, complete with the URL of some other component where I have to go and do some action. It's all very slick and automated and orchestrated and, indeed, seems to have a mind of its own. </div><div><br /></div><div>For instance, the putting-in-wires component received a message telling it to come to my location and do just that. Unfortunately, the agent of the no-I-mean-really-putting-in-wires component to which they delegated implementation of that action was not able to complete it. He sent a message saying so and various exception flows kicked off, requiring a <i>lot</i> of manual intervention, oh yes.</div><div><br /></div><div>Meanwhile, the arrange-for-telephony component turned out to have a clock running and when a certain (unpublicised) duration had elapsed without it receiving a notification of success from the putting-in-wires component (which was busy with some recovery actions on the no-I-mean-really-putting-in-wires component) it triggered a flow that cancelled my original request to have some services. A notification was received by one (but not all) of the taking-money-off-you components and one of them sent me a message telling me that my order for some services had been cancelled. A good thing, because otherwise I would have been blissfully unaware of the situation. On the other hand I am now angrily aware of the situation.</div><div><br /></div><div>Now, here's the fun bit: irrespective of which component sends me a message, no agent working for that component can explain to me what the message means, because whatever it means that meaning belongs to whatever other component sent the earlier message that led to the message I received being sent. And no, they can't put me through to an agent in that component. There is no interoperability layer.</div><div><br /></div><div>Today I spoke with five agents in three different components. One of them gave me quite the run–around because although I had contacted him through the callback given in the message I'd received from his component I had mis–configured part of my message header leading to my message being dispatched to the wrong agent because I had misunderstood the published specification for that header which he freely admitted was itself a shoddy piece of work with unreasonable and misleading contents but it was still my problem that I'd botched the message send.</div><div><br /></div><div>Also, I've learned that to get to speak to an agent at all I have to go twice around the loop of failing a handshake because I can't provide a piece of data that the protocol requires but that I won't get until the request succeeds. 
<div><br /></div><div>Now, here's the fun bit: irrespective of which component sends me a message, no agent working for that component can explain to me what the message means, because whatever it means, that meaning belongs to whichever other component sent the earlier message that led to the message I received being sent. And no, they can't put me through to an agent in that component. There is no interoperability layer.</div><div><br /></div><div>Today I spoke with five agents in three different components. One of them gave me quite the run–around: although I had contacted him through the callback given in the message I'd received from his component, I had mis–configured part of my message header, leading to my message being dispatched to the wrong agent, because I had misunderstood the published specification for that header (which he freely admitted was itself a shoddy piece of work, with unreasonable and misleading contents), but it was still my problem that I'd botched the message send.</div><div><br /></div><div>Also, I've learned that to get to speak to an agent at all I have to go twice around the loop of failing a handshake, because I can't provide a piece of data that the protocol requires but that I won't get until the request succeeds. After two failures in a row, a supervisory process notices and I'm failed over to a more generic service through which I can contact an agent, but that service is not exposed on a public URL.</div><div><br /></div><div>To all of which I say: bring back the mainframe.</div>keithbhttp://www.blogger.com/profile/14314542307822401015noreply@blogger.com1tag:blogger.com,1999:blog-23912478.post-82510691717363872382009-09-08T10:09:00.008+01:002009-11-12T15:40:50.889+00:00Observations on EstimationTeams following a process like Scrum tend to estimate the "size" of stories as an aid to figuring out a commitment for a sprint. My view is that this is a transitional practice, and that the aim should be to learn how to make stories all roughly the same size, so that commitments (also a transitional practice) can be determined by counting.<br /><br />While all of that is going on, teams that want to use a numerical scale to estimate (rather than, say, "t-shirt" sizing) tend to choose a scale, a sequence of licit values from which estimates must be drawn. The various planning tools that demand a numerical field be filled in tend to force this issue.<br /><br />I've noticed a tendency for "expert" level practitioners to want to use some clever non-linear scale, maybe Fibonacci numbers (1,2,3,5,8,13), maybe a geometric series (1,2,4,8,16), and they will have some sophisticated reason why this or that series is preferred. And I've noticed that a lot of teams aren't comfortable with this. They want to use a linear scale.<br /><br />It seems to be traumatic enough that the estimates don't have units, or even dimensions. The idea that estimates are dimensionless but also structured can be a double cause of confusion.<br /><br /><span style="font-weight: bold;">Anecdote:</span> a team had been estimating and planning and delivering consistently for a good long time. Their velocity was fairly constant, but drifted over time (fair enough). One day it turned out that their velocity happened to be numerically equal to the number of team members times the number of days to the next planning horizon. Someone noticed this and with a huge sigh of relief the team concluded that these mysterious "units" in which they estimated were actually man-days in disguise. Now they finally understood what they were estimating! And they promptly lost the ability to estimate: their next planning session was all over the place and it took some time for their planning activities to converge again. My inference was that it's actually quite important that estimates are dimensionless.<br /><br /><span style="font-weight: bold;">Anecdote:</span> a User Experience expert at a client had been involved in some research in which (as a side effect) members of the general public had to create a scale that made sense to them within which to rank the usability of features. These folks were presented with various generic objects and asked to give them a "size", and then to give corresponding "sizes" to some other generic objects, building up a scale that made sense to them, which would then be applied to the merit of the system features that were the actual target of the research. They created linear scales. <div><br /></div><div>[After seeing this he added the observation that this process was in aid of avoiding what often happens with the strongly disagree, disagree, no preference... type of scale, which tends to produce either polarised or bland results, neither of which is much use]<br /><br />That surprised me at first, since I know that the physics of our sensory apparatus are generally non-linear, and memory is non-linear, and so forth. But thinking about it some more I realised that our <span style="font-style: italic;">experience</span> tends to seem linear, even if the underlying phenomena aren't.<br /><br />Meanwhile, if one did want to use a particular scale for estimating the size of stories, why not use one of the series of <a href="http://en.wikipedia.org/wiki/Preferred_number#Renard_numbers">preferred values</a>? They are very well established in engineering and product design and offer interesting error-minimising properties. On the other hand, it might be a real struggle to get a team to decide if a story was a 1.6 or a 3.15.</div>
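<div><br /></div><div>(By way of illustration, a sketch of mine, not anything from that research: the R10 values below are the standard Renard numbers and the <i>snap</i> helper is hypothetical, just one way a team might pick the nearest licit value to a raw gut-feel estimate:)</div><pre>
# Snap a raw gut-feel estimate onto the R10 series of preferred values.
R10 = [1.0, 1.25, 1.6, 2.0, 2.5, 3.15, 4.0, 5.0, 6.3, 8.0, 10.0]

def snap(raw, scale=R10):
    """Return the licit value nearest to a raw estimate."""
    return min(scale, key=lambda v: abs(v - raw))

print(snap(3))    # 3.15
print(snap(7))    # 6.3
print(snap(1.4))  # 1.25
</pre><div>Each R10 step is a constant ratio of roughly 1.26 (the tenth root of ten), which is where the error-minimising property comes from: snapping to the nearest preferred value never costs more than about 12% relative error.</div>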
<div><br /></div><div>I don't have a grand narrative into which to fit these observations, but <a href="http://blogs.msdn.com/cellfish/archive/2009/10/23/never-convert-your-story-points-to-time.aspx">here</a> is another related anecdote about estimation.</div>keithbhttp://www.blogger.com/profile/14314542307822401015noreply@blogger.com0