No More Than Two

London’s “convenience” stores are replacing staffed checkouts with self-service robots. Some of these offer lessons in user interface design that are painful to behold.

This evening I watched someone attempt to use one to buy paracetamol. The particular self-service robot checkout that I saw presented the user with a message very much like “paracetamol sale: you may only buy two packs” above a yes/no button pair. What is the casual paracetamol buyer to make of this? I watched with interest as my co-emptor pondered this. And then she did the only reasonable thing: she went and got a second pack. After all, it does say you may only buy two packs.

Quite what anyone is supposed to make of yes/no buttons following a statement, not a question, I don't know.

The really sad thing is that, apart from the shame of confusing your customer, the behaviour I saw here (which I believe is the only reasonable response to the message from the robot) is the exact opposite of what is intended.

fixed-length iterations: a transitional practice

I find it hard to think of a development practice that isn't almost certainly a transitional practice. Configuration management, maybe.

Anyway, I've come to understand Scrum, XP, and all the rest each as a record of a reaction against some bad condition, with transitional practices to get a team away from that condition to a better one. For example, as Michael Feathers says in this post, regular, fixed-length Iterations require (and enable) a certain kind of discipline, and force a certain set of tradeoffs. And maybe doing that can help a team a lot. And maybe not, as the case may be.

That doesn't mean that, once the practice has taken the team away from the bad condition, no further change to practices can be beneficial. In the specific case of regular fixed-length iterations I mainly see them applied to teams moving from a condition where no-one ever has any idea at all when anything is going to be delivered to a condition where everyone always knows a date when something is going to be delivered. In many settings that would be considered a major improvement by those paying the team. And consistently working that way is a great way for a team to gain the trust of the business.

Once that trust is established, and once the conditions are in place for frequent delivery, what need is there for fixed-length iterations? I suspect that what worries a lot of people who've seen the transition from chaos to iterations is that they can't imagine a world without iterations which is not also a (return to) chaos. Michael suggests an experiment:
Suppose that you had an iteration of one week, followed by an iteration of 2 days, followed by an iteration of 1 day, followed by an iteration of one-half a day, and so on. If you still had your sanity at the end of this process, would you have learned anything? I haven’t tried it with a team yet, but here’s the thing that I hope would come across: if you apply enough ingenuity and you’ve acquired enough skill, you can deliver business value in shorter times than you can currently imagine.
That would be cool. Of course, the Kanbanistas seem to suggest going straight to that world in one step. And maybe that can work in a certain setting, and maybe not. The idea scares me when I look at most of the teams I help.

The aspect of this sort of thing that really interests me, though, is this: if what the Certified Scrum Masters say about software development living on the “edge of chaos” is right, and if what the Cynefin people say about that being exactly the place where “emergent practice” lives is also right, then by their own argument we would expect Scrum to consist mostly of transitional practices.

Fixed-length iterations (excuse me, “sprints”) seem like a good candidate to be one.


Government IT projects: who can politicians listen to?

As you may have heard, the UK has at the time of writing a rather confusing new government. We, and I suspect they, are still trying to understand what this means.

One thing that it might mean is an opportunity for government departments to change the way they deal with their IT suppliers. Recently (that is, during the Labour administration we had from 1997 until 2010), it hasn't gone well.
And so on. It has been estimated that the ten worst IT project failures under Labour cost the country around £26 billion. That's half the annual budget for schools.

So, a new government setting out to tame a spectacular deficit might want to bring these projects under control. Unfortunately, they get their advice from places such as Fujitsu, the very firms who do so well from these failed projects. Says Fujitsu's marketing director Simon Carter of discussions held with the Conservatives when they were still the shadow cabinet:
[the Conservatives] began to take on some of our suggestions, as they came to better understand government IT. For example, their proposal to cut IT contracts into smaller and shorter chunks was dropped as they realised they would have to act as system integrator to each of these smaller projects.
What is the taxpayer to do in the face of this sort of thing? Particularly the well–informed taxpayer who knows full well that in no way whatsoever is this argument from Mr Carter valid.

At this year's Spa conference there was a Birds of a Feather session about this issue. Some of the signatories of this petition took part. As I write, the petition is only 62 signatures away from the 500 needed for anyone inside 10 Downing Street to pay any attention to it. If you are a UK resident and would rather that the new government were not wasting money on entirely avoidable IT project failures, I urge you to sign the petition and to urge others to.

T-shaped designers

BBC 2 is running a series of documentaries called The Genius of Design. Programme 2 is "Design for Living" and discusses, amongst other things, the Bauhaus and its influence. It's getting on for a century since the heyday of the Bauhaus and it's always worth being reminded of the influence it had: got any tubular steel furniture in your house or office? Bauhaus. Any lighting fixtures with push–on/push–off switches? Bauhaus. Got a fitted kitchen? Bauhaus.

The segment on the fitted kitchen was interesting. A fitted kitchen seems like a natural and obvious thing now, but the idea had to be invented. The discussion of the Frankfurt Kitchen in the programme was the start of an interesting thread. Users of the kitchen tended to be a bit ill-disciplined. Certainly they tended to disregard the labels permanently attached to the custom–made drawers and put any old thing in them. Users found that the kitchen was built to support well only certain workflows, workflows that they didn't like, didn't understand and couldn't change. Workflows devised by an architect who couldn't cook.

Meanwhile, another modernist architect, Le Corbusier, was being given free rein to redesign entire cities (up to the point of making models, anyway), filling them with great towers full of his “machines for living in”. And we know how that worked out once people started taking it seriously.

If you are a regular reader of this blog you probably know where I'm going next.

Commentators on software development often seem to exhibit a lot of discipline envy. Two common themes are that 1) our projects should exhibit the reliability of those in the “established” branches of engineering, and 2) our projects should exhibit the conceptual integrity attained by building architects.

That conceptual integrity can be a dangerous thing. Schütte-Lihotzky's kitchens had a lot of conceptual integrity (and a lot of research to back that up); Corb's vision of mass housing (and its implementation by later architects) had a really astonishing amount of conceptual integrity. Neither led to much in the way of joy for users (*). The Bauhaus architects designed a lot of chairs, none of which are comfortable to sit in.

One of the designers interviewed in the programme explained the problem along these lines: architects tend to be ‘I’ shaped, by which he means they have a very deep knowledge and skill in their craft, but not a lot else going on. Designers tend to be ‘T’ shaped, deep in craft but also with a breadth that touches many other disciplines. And from that breadth comes the ability to design objects that people can comfortably incorporate into their lives.

I think that the application of this thought to the software world is clear.

(*) The very few dense housing projects that Le Corbusier himself built have proven to be resilient and popular. It's the shoddy works inspired by his ideas and executed without his art that are the problem.

Tests vs checks

Trying to spread the good word on "testing" vs "checking" in this article for T. E. S. T. Magazine.

Software Engineering?

I wish that the people in the software industry who bang on about a need for "software engineering" showed more evidence of ever having met an engineer. The latest run at it, SEMAT, seems to be making the same old mistakes all over again.

It's not all bad. I'm pleased to see Ivar Jacobson repeat his call to move beyond "process" to "practices". On the other hand, I'm dismayed to see the SEMAT programme described at its highest level by these streams: definitions, theory, universals, kernel language, assessments.

Really? Definitions, theory, universals? Are these really the things that the software industry is lacking? The problem here, I think, is that, just as at the original "Software Engineering" conferences in the 60's, the SEMAT folks have confused the retrospective coherence with which engineering (that is, the mechanical, chemical, electrical, electronic and other flavours) is described with how engineering is actually done.

Similarly, the presence of a spring bow compass in the SEMAT logo worries me. The 60's effort at SE also confused the contingent artefacts produced by engineers (which at the time were actual drawings produced with things like spring bow compasses) with the essence of what engineers do. With this logo SEMAT are associating themselves not just with tools and outputs rather than principles and practices, but with mightily outdated tools. They might as well have put a slide rule in their logo.

It really seems as if an historic mistake is about to be repeated. I need to study SEMAT more, but for now Alistair Cockburn's commentary resonates strongly with me.

He urges that SEMAT do these two things:
  1. Look at what engineers ‘do’, not what they build.
  2. Catch up with the state of the art in what is conventionally called engineering.
I can hope.


Innovation

Last year Luke Hohmann demonstrated some of his innovation games at an XtC event hosted at the Zuhlke Ltd office in London.

One theme was that innovation is different from invention. That's a topic close to my heart. Zuhlke has a division named "Product Innovation" and indeed they innovate like crazy (for example, repurposing the optical sensor for a mouse to better control sewing machines [pdf]) but they rarely invent anything (although they do from time to time, for example a newly patented waterless urinal trap [pdf]). Knowing a bit about those folks and what they do has helped make me very sensitive to abuses of the term "engineering" as (mis-) applied to software development. But that's a story for another time.

The Shock of the Old

I've just finished reading David Edgerton's The Shock of the Old. It goes on a bit, it's a bit repetitive, it's highly polemical and a bit repetitive. Pretty good, though. The key observation is that the history of technology as usually presented (for example, in institutions like the Science Museum) is largely bunk. This history for the most part ignores use and so also ignores folk technology and what Edgerton calls creole technology. He presents numerous case studies to show that what has been made to look like invention is really innovation and diffusion.

Quick quiz: when was the first document transmitted by fax? Answer: depending on quite what you think a fax is, sometime between 1843 and 1865.

But the average householder couldn't go and buy a fax machine until about a century later. When faxes became available to the general public that wasn't invention, it wasn't even innovation, it was diffusion.

The Revolutionary Period of Big Innovation

Bruce Eckel has come to realize that "software development has stalled". He says,
in recent years it has started to look like we're moving out of the revolutionary period of big innovation, and into a phase of relative stability.
I don't believe this for a minute.

I think the revolutionary period of big innovation in the tools of programming ended about the time that Sun dropped Self. I'd say that the gold standard development environment for mainstream languages right now is Eclipse. As Dave Ungar explains towards the end of this video about the history and influence of Self, Eclipse represents the continuation of the tool-based approach to building a programming environment developed in Smalltalk. I'd say that even the best-of-breed Java environment built on Eclipse still isn't as good as the best Smalltalk environments for ease of use, productivity and fun.

Sun pulled the plug on Self about fifteen years ago. Ironically, they later had to buy back, from Self project staff who had left Sun, the technology for making dynamic dispatch go fast in a VM for a dynamically typed language.

What has happened since then is a steady diffusion of features from good object-oriented development environments of the 80's and 90's into the mainstream. Sadly, few features of the very best object-oriented environment of that time (Self, of course) have made it through.

Sources of Diffusion

Where else are ideas diffusing from? From the mother of all demos. Unfortunately, that seam is about worked out. Alan Kay is supposed to have said "I don't know what Silicon Valley will do when it runs out of Doug's ideas." We may be about to find out. Further diffusion (disguised as innovation) in the field of living with computers may well require actual invention.

Fortunately, that seam is about worked out. We, the public (at least in high-income countries with stable governments) do now finally live in the world that Engelbart invented in 1968.

And what of software development tools?

Bruce says:
no matter how good and powerful our software tools get, we are only getting a fraction of the leverage out of them that we could get.

Programming tools are no longer where the greatest potential lies.

We will get the biggest leverage, not just in programming but in all our endeavors, by discovering better ways to work together. [emphasis in original]

and I think he's right. I wonder what it is about the world that Bruce works in that has hidden this from him for so long. After all, in 2001 one group said:
We are uncovering better ways of developing
software by doing it and helping others do it.
Through this work we have come to value:

Individuals and interactions over processes and tools [etc]
Behind the manifesto is a use-based story about the history of technology. Notice that it's uncovering by doing, and not inventing. I'd like to think that Edgerton would approve. It is also a story of diffusion. In Agile and Iterative Development Larman quotes Weinberg:
We were doing incremental development as early as 1957, in Los Angeles, under the direction of Bernie Dimsdale. He was a colleague of John von Neumann, so perhaps he learned it there, or assumed it was totally natural. [...] the technique used was, as far as I can tell, indistinguishable from XP. [...] much of the same team was reassembled [...] in 1958 to develop Project Mercury, we had our own machine [...] whose symbolic modification and assembly allowed us to build the system incrementally, which we did, with great success.
What has happened is that techniques that were once restricted to research projects at the cutting edge of a technological nation's story of existential survival are now available to the mainstream.

What, I would like to know, is next?

UK Government IT Failures

The Independent recently reported on the very, very sorry state of Government IT projects in the UK. It's a mess:
[...]the total cost of Labour's 10 most notorious IT failures is equivalent to more than half of the budget for Britain's schools last year. Parliament's spending watchdog has described the projects as "fundamentally flawed" and blamed ministers for "stupendous incompetence" in managing them.
I thought that the article missed a trick and wrote to tell them so. My letter didn't make it to the paper paper, but has appeared on their blog. It's somewhat hidden, though, so I reproduce it here.

It would be easy to conclude that large government IT projects fail primarily for technical reasons. Technical mistakes certainly are made, but the root cause of many failed projects seems to be the procurement process with which they begin. The question which is asked of the suppliers is not one that allows for the kind of answer that leads to success. It is well known in the private sector that very large monolithic projects have little chance of success, but this is the only kind for which the Civil Service seems to know how to ask.

I have no doubt that ministers are, as Michael Savage puts it, "too easily wooed by suppliers". The suppliers which we see winning government contracts again and again might also find it too easy to woo a minister, particularly a minister and a department which would rather launch a high-profile, high-budget, high-risk project than adopt the smaller-scale, incremental approach that creates few headlines when launched, but also has a much better chance of creating few headlines later, because it does not fail.

It's the Screaming

Tim Bray observes:
The community of developers whose work you see on the Web, who probably don’t know what ADO or UML or JPA even stand for, deploy better systems at less cost in less time at lower risk than we see in the Enterprise. [emphasis in original]
That's a pretty bold claim.

Oh, certainly, the Enterprise makes a botch of IT projects on a titanic scale, and Tim gives a partial list of very high profile failures. I think this data is tainted. Firstly, these project failures (such as the UK NHS NPfIT programme) are, well, very high profile—but they aren't representative. Plenty of corporate IT projects do very OK. Secondly, the specific examples Tim gives are actually government projects. Government IT is the reductio ad absurdum of Enterprise IT. These might not be the most informative examples to look at.

On the other side of the matter, Tim cites the successes of Facebook, Google and Twitter. These are also not representative. Quickly, name three failed web 2.0 startups...that's a hard ask because those startups that don't succeed sink without trace. Which, if you believe in the Y Combinator argument, is the point. A web 2.0 startup needs so little initial investment that western civilisation can afford to throw them away in bulk without noticing. The result is a sort of Monte Carlo method for product innovation. I wonder how that's working out for them now that money is no longer so cheap.

But still, something is very wrong with enterprise IT. Would the "web 2.0" way of working help?

Maybe, maybe not. I think Tim has made the mistake of assuming that a technique that he's seen work in one context will necessarily work in another. And should be deployed there post-haste. The IT industry collectively makes this mistake about every seven years.

Some commentators have noted some of the differences between the Web and the Enterprise. In summary: web 2.0 startups don't have to integrate with a heterogeneous ecosystem of legacy systems older than the combined ages of their founders. Web 2.0 startups don't have to operate within multiple tight and complex regulatory regimes. Some were even so unkind as to observe that Tim is one of the devisers of XML and so some of the chest–deep mud that Enterprise developers have to slog through is his fault.

I think there are two more issues that Enterprise groups have to deal with that web 2.0 startups don't, and don't have a good story for.


Scale

I'm going to say that enterprise IT faces a much tougher scale problem than do web developers. Part of the web 2.0 model is that the initial user base is the founders plus their immediate circle. The problem that startups face is that if they are one of the (very) few that make it, their user base may grow very quickly. Their problem is scalability. This is a tough problem.

The problem for the Enterprise is scale.

Citigroup, Bank of America and JP Morgan Chase all have slightly less than a quarter of a million employees. If you are deploying a "corporate IT app" at one of those organisations then you have to plan for all quarter of a million of them to hit your app at 9am their local time tomorrow. And all day and every day thereafter. That's a different problem. And by the standards of corporate IT this is nowhere near as bad as it can get.

You know that NPfIT project that Tim holds up as an exemplar of corporate IT failure? In 2008 the NHS had 1.12 million non–medical staff, 99,000 medics and dentists and 34,000 General Practitioners. Imagine that they are all going to start using your app tomorrow. To help them treat patients. Of which there are several million a week handled by NHS services. Putting up the fail whale isn't going to cut it.

And that's the real problem that corporate IT have to deal with: the screaming.


Screaming

Let's say that your web 2.0 startup thingy that you put together in record time in your buddy's mum's spare room falls over. What's the worst thing that can happen? A few snarky tweets? The odd complaining blog post? The whole point of your business model is that your service (and company) are disposable, at least in the early days. It's not as if you have a revenue stream to protect. As if!

Now let's imagine what happens when your FX app you just deployed into Mammon Inc's trading floors falls over. What happens then is that a head of desk in Farawayvia phones you up at 3 am your time and will not stop screaming at you until you get it fixed. I exaggerate for comic effect, but not by much. If you are a Web 2.0 type then you have the luxury of fixing stuff up in something like your own time. If you are a Corporate IT type then you might well have someone who believes that you, personally, are trying to steal their bonus from them raging at you until you get the damn thing working again.

Or that you, personally, are trying to kill their patient.

If you were a founder of a web startup and the users of your service had your personal mobile number and believed that five minutes of downtime could cost them personally millions of bottletops (the currency of Farawayvia, ticker symbol BTP), or lead them to a malpractice suit over a dead patient, do you think you would behave the same way that other founders do?


Solutions?

So, what's the answer? I don't think there is any "the answer". Not a technical one, anyway. There are a bunch of tools and techniques that I've come to think will help a lot, and I encourage my clients who happen to be in corporate IT to use them. In fact, Tim mentions some of them in his post. He notes that startups get a lot of help from:
dynamic languages and Web frameworks and TDD and REST and Open Source and NoSQL at varying levels of relative importance
Too right. But, as Gerry Weinberg says: it's always a people problem. It always is. It's the screaming.

Get people into a place where they believe they can do the right thing and they mostly will. The technology can then, to a surprising extent, look after itself. This is the really big advantage that startups have, I think: no–one to tell them not to do the right thing. It's not that corporate IT is full of people saying "don't do the right thing" (although in the worst case that does happen). Rather it's that the social context inside any organisation big enough to call itself an "enterprise", without any actual malice on the part of any individual, works against the right thing.

I think it was Ron Jeffries that I first heard say that most of the advice given to software developers to help them do better is about as useful as telling them to be taller. Telling folks who work in corporate IT to behave more like the folks in web startups seems no more useful than that to me.

On the other hand, if one does end up working in a really toxic environment that does work against the right thing it can be worth looking around. Look around to see who, exactly, is holding the gun to your head to make you put up with it.


Complex Domains: playing with Alloy

Before digging in to more Bayesian ideas I want to take a step back. The example in this previous post, which I worked through with Laurent, has an interesting property: Each subsequent test halves the remaining uncertainty about the correctness of the system under test. This is a property of the problem domain, not the solution. For a next example I want to look at a more complex domain. But what makes a domain complex (in the sense of essential complexity)? That's not a rhetorical question.

And what tools are appropriate to handling a complex domain? Let me tell you a story... (if you want to skip the story and go direct to the techie stuff, it's here)

Microcell Predictor

Back in the day I worked on a tool used by radio engineers to design mobile phone networks.

In particular I worked on a so–called "microcell predictor". This would take a description of a dense urban environment and a proposed low–power base station location and calculate the expected signal strength at various points in the area. The input was a file containing a bunch of polygons describing building footprints and some materials data (steel and glass, masonry, etc) and the base station location and properties (antenna design and so forth). The output was a raster of predicted signal strengths. This could overlay the building polygons and generate a map that the engineers could first eyeball and then if necessary analyse more closely to help them optimise the base station placement. This was a lot faster and cheaper than putting up a temporary antenna and then driving around in a vehicle with a meter measuring the actual signal strength, which was the way the very first networks were planned.

The requirement for this came from the radio specialists in the form of pages of maths describing various "semi-empirical" models of microwave propagation and how these interacted with buildings. Let's say we are looking at GSM 900. If a 900 MHz microwave photon were moving in free space it would have a wavelength of approximately 3×10⁸ m s⁻¹ / 9×10⁸ s⁻¹, or around 33cm. This makes such photons quite good at seeming to go around the corners of "human scale" structures by diffraction. To calculate that exactly would be very messy and, on the boxes we used, impracticable. So we had these other methods which hid a lot of the details and gave results that the radio experts deemed good enough and that we could compute with. The input didn't have to be especially large or complicated for the prediction to take long enough that the user would give up, but the point of microcells is that they only cover a small area in a city centre anyway, so that was OK. This was fifteen years ago; the techniques used now are a lot more sophisticated.

Semi-formal

We used a development process called Syntropy. It's an unusual day on which I spend any time at all thinking about software without using ideas from Syntropy to good effect. Amongst other things, Syntropy combines a graphical notation for object structures much like OMT with a textual notation for facts about them much like Z. Some (but not nearly enough) of these ideas made it into UML, particularly the OCL.

So, we had these mathematical requirements and we produced from them mathematically supported specifications and designs, full of ∀'s and ∃'s, and we had to turn these into working software. I learned a great deal about the art of doing that there, other parts of which are a story for another time.

The main thing for my current purpose, though, is that when I think back to those times I'm stunned by the amount of effort we put into determining whether those specifications and designs were correct. The only way we knew how was to round up a bunch of seriously smart people (which luckily we had) and check these models manually. Management were smart about it and paid for us to be trained in Fagan inspection techniques, which helped a lot. But the expense! Six or eight top-flight programmers in a room for a couple of hours is not a trivial investment. And to do that many times per document. Sometimes many times per page of a document, over the years that we worked on this thing.

But that was then. As it happens, more–or–less exactly then another group in the UK were using much more advanced formal methods to address a much trickier problem.

This is Now

In 2007 Sir Tony Hoare delivered a keynote at the Spa conference. He talked about the effort required to prove (really, prove) that the Mondex electronic money system was secure. The thing about Mondex is that the money is actually on the card, rather than being in the network with the card acting as a credential to allow the money to be moved. This made the Bank of England very nervous (Mondex was developed in the UK). Developing that 200-page proof was very expensive. This effort has become something of a celebrated case in the Verified Software community.

The folks who worked on the Mondex proof were, almost certainly, much smarter than my colleagues and I who worked on the microcell predictor (sorry guys), but they seem not to have known a better way to proceed than manual checking, either. In fact, so they say, they said at the time
mechanising such a large proof cost–effectively is beyond the state of the art
Hoare's keynote explained that between then and now, in fact in that year 2007, the problem had been re–addressed. The goal was to discover to what extent the state of the art had moved on in ten years, and whether mechanisation had become cost–effective. Hoare suggested strongly that, through improvements in theory and hardware, cost–effectiveness is within reach.

One of the things that came out of that effort was a model written in Alloy.

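Alloy is actually what I wanted to write about. Alloy seems to live at an interesting place: the intersection of proof and examples. What Alloy does is help you explore various properties of a specification by, on the one hand, generating examples (which show that the specification is consistent) and, on the other, generating counter–examples (which show that a claimed property does not hold). All of this happens within a bounded scope, so finding no counter–example is strong evidence rather than a proof.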
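As a rough illustration of those two modes, here is a minimal sketch in Alloy 4 syntax. It is a hypothetical toy model of my own, not part of the Mondex work: a set of points with a symmetric adjacency relation in which no point is adjacent to itself. The names Point, adj, SomePointHasTwoNeighbours and EveryPointHasANeighbour are all invented for this example. A run command asks the Analyzer for an example satisfying a predicate; a check command asks it for a counter–example to an assertion, in both cases considering only instances up to the given scope (here, at most three points).

```alloy
// Hypothetical toy model (not from the original post): points and adjacency.
sig Point {
  adj: set Point
}

// Adjacency is symmetric, and no point is adjacent to itself.
fact AdjacencyIsSymmetricAndIrreflexive {
  adj = ~adj
  no adj & iden
}

// "run" searches for an EXAMPLE: some point with exactly two neighbours.
pred SomePointHasTwoNeighbours {
  some p: Point | #p.adj = 2
}
run SomePointHasTwoNeighbours for 3

// "check" searches for a COUNTER-EXAMPLE to this claim. Nothing above
// forces every point to have a neighbour, so one should be reported.
assert EveryPointHasANeighbour {
  all p: Point | some p.adj
}
check EveryPointHasANeighbour for 3
```

Because the facts allow a point with no neighbours at all, the check should come back with a counter–example such as a single isolated point. Adding a fact like all p: Point | some p.adj would make the check pass within that scope, while the run would still find an example.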

Testing

Over time, and quite naturally, our focus changed while working on the microcell predictor. We became less interested in demonstrating that our code conformed to a design that conformed to a specification that conformed to a requirement. We became more interested that the code conformed to the users' needs. We showed this through intensive automated testing.

My boss at the time insisted that we write fully automated tests for every function we wrote. He had an automated testing framework that he carried around in his head and regenerated at each new place of work. I think he had learned this from a previous boss of his, and the framework had, IIRC, originally been written in Pascal. So we created a C++ version and off we went writing tests, and I can't begin to count the number of times that writing the tests, and running them, again and again and again, was crucial to overcoming what would otherwise have been show–stopping problems.

It was especially interesting that one of the guys on the team, a real code–basher and a much better programmer than I am, built a graphical test runner (polygons, remember) that let you see what the code was doing as a test ran. See the building footprint polygons, see the triangulation of the line–of–sight region, see the first–order rays from the antenna to the corners of buildings, see the second–order virtual sources, see the triangulation of their line–of–sight, and so on. See it in all these various scenarios, each devised specifically to check that some particularly interesting feature of the problem was dealt with correctly. At one time I had several sheets of big flipchart paper covered in the tiniest writing I could manage describing all the ways I could think of that a line segment could meet a set of polygons. I missed a few.

Something like Alloy would have helped so much.

These animated tests became the premier way of explaining what the microcell predictor did. Even to customers.

Notice that my work on microcells, with the intensive automated testing, and the original Mondex proof took place more–or–less contemporaneously with the discovery of Extreme Programming. I think I recall a lunchtime conversation during the microcell work to the effect that there was this mad project going on where they had automated tests for everything (as we did), but they wrote the tests first! I think I recall some comment along the lines that this was fine only so long as you knew the requirement in great and final detail, but in practice you never do. I'm now pretty confident that the converse is true: we hardly ever do have great and finally detailed requirements, and that is exactly when writing the tests first helps.

I'm glad that I dropped off the "models" path to correctness and onto the "test first" one. And I'm glad that I had the experience of doing the "models" approach. I find it interesting to look back over the fence sometimes, and see how those folks are getting along.

Alloy

And so to Alloy. I have Alloy 4.1.10 here on my MacBook Pro (2.4 GHz Core 2 Duo, 4 GB RAM). I'm going to try to develop a formal model of the points on a Goban. If you'd like to play along, there's an hg repo.

Points

Alloy models are essentially relational, although the syntax is deliberately chosen to be as familiar as possible to users of "curly bracket" OO languages. I begin by writing a kind of test. This takes the form of a predicate called correct, which says
pred correct {
there_are_such_things_as_points
}
and I can ask Alloy to run this test
run correct
and Alloy tells me: The name "there_are_such_things_as_points" cannot be found. This is excellent news: I'm well on the way to using the familiar TDD cycle. Not compiling counts as failing, and here is a failing test. I can make the test fail in a slightly more informative way by defining there_are_such_things_as_points like so
pred there_are_such_things_as_points{
#Point > 0
}
which says that the size of the set named Point (which is the set of all tuples conforming to the signature Point; it's a relational model, remember) is strictly greater than zero. Of course I haven't defined that signature yet, so Alloy tells me: The name "Point" cannot be found. I define Point like so
sig Point {}
and now Alloy reports that

Executing "Run correct"
Sig this/Point scope <= 3
Sig this/Point in [[Point$0], [Point$1], [Point$2]]
Solver=minisatprover(jni) Bitwidth=4 MaxSeq=4 SkolemDepth=2 Symmetry=20
18 vars. 3 primary vars. 23 clauses. 183ms.
Instance found. Predicate is consistent. 41ms.

There's a lot of information there. The important part for now is that Alloy could find an instance (that is, a bunch of tuples) that conforms to the model and of which the predicate is true. Therefore the model and the predicates I have defined are consistent (that is, contain no contradictions). I can also ask Alloy not to run my predicate but instead to check it. The news here is not so good.

Executing "Check check$1"
Sig this/Point scope <= 3
Sig this/Point in [[Point$0], [Point$1], [Point$2]]
Solver=minisatprover(jni) Bitwidth=4 MaxSeq=4 SkolemDepth=2 Symmetry=20
18 vars. 3 primary vars. 24 clauses. 11ms.
Counterexample found. Assertion is invalid. 18ms.

It turns out that, although my model is consistent, it is not valid: it is possible to construct instances of the model for which the predicate is not true.
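To make the run/check distinction concrete, here is a minimal Python sketch of the semantics (Python, not Alloy; the helper names instances, run and check are my own and hypothetical), assuming a model that contains nothing but sig Point and a scope of 3. It enumerates every possible set of up to three Point atoms, then asks whether some instance satisfies the predicate (run) and whether every instance does (check). The real Analyzer hands the problem to a SAT solver (minisat, per the report above) rather than enumerating instances one by one.

```python
from itertools import combinations

# Hypothetical stand-ins for the atoms Point$0, Point$1, Point$2.
ATOMS = ["Point$0", "Point$1", "Point$2"]

def instances(scope=3):
    """Every possible extent of the signature Point with at most `scope` atoms."""
    for size in range(scope + 1):
        for chosen in combinations(ATOMS[:scope], size):
            yield frozenset(chosen)

def there_are_such_things_as_points(point):
    # Alloy: #Point > 0
    return len(point) > 0

def run(pred):
    """Like `run`: return some instance satisfying pred, or None."""
    return next((i for i in instances() if pred(i)), None)

def check(pred):
    """Like `check`: return some instance violating pred, or None."""
    return next((i for i in instances() if not pred(i)), None)

print(run(there_are_such_things_as_points))    # frozenset({'Point$0'})
print(check(there_are_such_things_as_points))  # frozenset()
```

The counterexample is the empty instance: a model with no points at all, which nothing in the model yet rules out.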

Notice the lines in the reports about sig this/Point. In working with my model Alloy has made some instances of Point, from which it has then constructed instances of the model. By default it makes up to 3 instances of each signature. Here is a graphical (in both senses) representation of the instance of the model which Alloy built: a single atom, Point$0.
Do you see this instance named in the list of three instances which Alloy reported it had created? Clearly the predicate is satisfied. Alloy will also produce a graph of the counterexample which it found, and that graph is empty. (Well, strictly it's a message telling me that "every atom is hidden", in a "this page intentionally left blank" sort of way.)

There is nothing in the model which says that there are any points, only that there possibly is such a thing as a Point. The problem domain can help us here, as it turns out that some of the points on the board have names.

pred tengen[p : Point]{}

fact tengen_exists {
one p : Point |
tengen[p]
}
Here I state a fact, which is very much like a predicate, except that it is information for Alloy to use, not a question for it to ask of the model.

Read the fact tengen_exists like this: "it's true of exactly one instance, named p, of the signature Point that the predicate tengen is true of p". The predicate itself is parameterised on an instance of Point but does not depend upon that instance, which seems as if it should be a smell.

Running the predicate as before finds that same instance of the model with one point in it. I can pop up an evaluator on that instance and ask for the value of tengen[Point$0] which is (of course) true. If I ask Alloy to check the model it now reports No counterexample found. Assertion may be valid. 69ms. Note the "may be" there. Alloy can't be absolutely sure because it only instantiates a small number of tuples for each signature. This is a manifestation of the Small Instance Hypothesis (sometimes "small model" or "small scope") which claims that if your model is bogus then this will show up very quickly after looking at a small number of small examples—exhaustive enumeration of cases is not required.
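Extending the same kind of hypothetical Python sketch shows why the fact makes the check pass. Because tengen has an empty body, tengen[p] holds for every p, so "exactly one p satisfies tengen" can only be true when there is exactly one Point at all: the fact quietly amounts to #Point = 1, which is another way of seeing the smell. In this sketch, facts prune the instances under consideration before the predicate is checked against what remains.

```python
from itertools import combinations

ATOMS = ["Point$0", "Point$1", "Point$2"]

def instances(scope=3):
    """Every possible extent of Point with at most `scope` atoms."""
    for size in range(scope + 1):
        for chosen in combinations(ATOMS[:scope], size):
            yield frozenset(chosen)

def tengen(p):
    # Alloy: pred tengen[p : Point]{}  (an empty body is simply true)
    return True

def tengen_exists(point):
    # Alloy: one p : Point | tengen[p]
    return sum(1 for p in point if tengen(p)) == 1

def there_are_such_things_as_points(point):
    return len(point) > 0

# Facts restrict the universe of instances; the predicate is then checked.
admitted = [i for i in instances() if tengen_exists(i)]
counterexamples = [i for i in admitted if not there_are_such_things_as_points(i)]

print(sorted(len(i) for i in admitted))  # [1, 1, 1]
print(counterexamples)                   # []
```

Nothing here says anything about scopes larger than 3, which is exactly why the Analyzer only says the assertion "may be" valid.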

So now I have a model, however feeble, which is consistent and cannot be shown (using up to three Points) to be invalid. I'll check in.

Refinement

I'm not very happy with this model. I said that some points are named, such as tengen, but that's not really very well expressed. There's that smelly predicate which doesn't depend upon its parameter. If some instances of Point have names, then we can say that. After going around the fail-pass loop (trust me, I am doing that but I'm not going to write it out every time) the model looks like this
enum Name { Tengen }

sig Point {
name : lone Name
}

pred tengen[p : Point]{
p.name = Tengen
}

fact tengen_exists {
one p : Point |
tengen[p]
}
Several new Alloy features are used here. Since Alloy 4 doesn't support string literals I use an atom (an instance of a signature with no further structure). The enum clause creates quite a complex structure behind the scenes but gets me the atom Tengen. The signature of Point is extended to have a field named name which will, in a navigation expression such as p.name resolve to an instance of signature Name, or to none, as shown by the cardinality marker lone. These navigation expressions look like dereferencing as found in OO languages, but are actually joins.
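Since navigation is really a relational join, a small Python sketch may help. It represents relations as sets of tuples (an assumption of the sketch, though it mirrors Alloy's semantics), and the hypothetical join function matches the last column of the left relation with the first column of the right, dropping both. A lone field then shows up as a join that yields at most one atom.

```python
def join(left, right):
    """Alloy's dot join: match the last column of `left` with the first of `right`."""
    return {l[:-1] + r[1:] for l in left for r in right if l[-1] == r[0]}

# A hypothetical instance: three points, of which only Point$0 is named.
point = {("Point$0",), ("Point$1",), ("Point$2",)}
name = {("Point$0", "Tengen")}          # name : lone Name

def is_lone(field, atoms):
    """Every atom is related by `field` to at most one value."""
    return all(len(join({a}, field)) <= 1 for a in atoms)

print(join({("Point$0",)}, name))   # {('Tengen',)}: p.name = Tengen
print(join({("Point$1",)}, name))   # set(): p.name = none
print(is_lone(name, point))         # True
```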

This looks a lot healthier to me, and the model is still both consistent and not demonstrably invalid. Here's the new instance of the model. I'll check in.

Directions

There are many points on a goban, however. And these points stand in a certain relationship. Specifically, every point on the board has some neighbours. Tengen has four neighbouring points, one in each of the four directions I will call N, E, S and W. I start with N and obtain this invalid model

enum Name { Tengen }

sig Point {
name : lone Name,
neighbour : Direction -> lone Point
}

pred tengen[p : Point]{
p.name = Tengen
}

fact tengen_exists {
one p : Point | tengen[p]
}

enum Direction {N}
which admits a counterexample which does not satisfy this predicate
pred tengen_has_a_neighbour_in_each_direction{
let tengen = {p : Point | p.name = Tengen} {
not tengen.neighbour[N] = none
}
}
The counterexample looks like this
New Alloy features are the mapping Direction -> lone Point, which is pretty much the same as a typical "dictionary", and the let form, with its binding of the name tengen to the value of a comprehension. The comprehension should be read as "the set of things p which are instances of Point and of which it is true that the value of p.name is equal to Tengen". Some sugar in Alloy means that we don't need to distinguish between a value and the set of size 1 whose sole member is that value.
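Continuing in hypothetical Python, a field declared as neighbour : Direction -> lone Point makes neighbour a three-column relation Point -> Direction -> Point. Joining a point with it leaves a two-column Direction -> Point relation (the "dictionary"), and the box join e[N] is sugar for N.e, which picks out the entry for N. The comprehension that binds tengen becomes an ordinary set comprehension. The instance below is invented for illustration.

```python
def join(left, right):
    """Alloy's dot join on relations represented as sets of tuples."""
    return {l[:-1] + r[1:] for l in left for r in right if l[-1] == r[0]}

def box(e, key):
    """Alloy's box join e[key], which is sugar for key.e."""
    return join(key, e)

point = {("Point$0",), ("Point$1",)}
name = {("Point$0", "Tengen")}
# A hypothetical instance in which Point$1 is north of Point$0.
neighbour = {("Point$0", "N", "Point$1")}

# let tengen = {p : Point | p.name = Tengen}
tengen = {p for p in point if join({p}, name) == {("Tengen",)}}

dictionary = join(tengen, neighbour)       # tengen.neighbour : Direction -> Point
print(dictionary)                          # {('N', 'Point$1')}
print(box(dictionary, {("N",)}))           # {('Point$1',)}
# not tengen.neighbour[N] = none
print(box(dictionary, {("N",)}) != set())  # True
```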

Running the model produces the somewhat surprising result that it is consistent. Looking at the example shows that this is a red herring. This is an interesting state of the world, so I check in with a suitable caveat in the message.

A YAGNI Moment

The invalid aspect of the model comes from the cardinality on Point.neighbour. There are points on a Goban which do not have four neighbours, one in each direction. But I haven't mentioned any of them yet. There's a good chance that eventually points will need to have optional neighbours, but right now YAGNI.

As the (so far, incomplete) predicate's name suggests, tengen really does have four neighbours. The cardinality should be one. Making that change produces a model which now cannot be shown to be invalid. However, the example instance is still bogus. I know from TDD practice what to do: write a test that will fail until the problem is fixed. Here it is
pred points_are_not_their_own_neighbour {
all p : Point |
not p in univ.(p.neighbour)
}
The construction univ.r for any relation r evaluates to the range of the relation.
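The same hypothetical Python sketch shows why univ.r is the range: joining the set of every atom with r keeps the last column of every tuple, because every first column matches something in univ. The instance and the names below are invented for illustration.

```python
def join(left, right):
    """Alloy's dot join on relations represented as sets of tuples."""
    return {l[:-1] + r[1:] for l in left for r in right if l[-1] == r[0]}

def ran(r):
    """Range of a binary relation: its second column."""
    return {(t[-1],) for t in r}

# p.neighbour for some point p = Point$0, as a Direction -> Point relation.
p_neighbour = {("N", "Point$1"), ("S", "Point$2")}
univ = {("N",), ("S",), ("Point$0",), ("Point$1",), ("Point$2",)}

print(join(univ, p_neighbour) == ran(p_neighbour))  # True
# points_are_not_their_own_neighbour, for p = Point$0
print(("Point$0",) not in join(univ, p_neighbour))  # True
```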

As I hoped, this test fails. Although running the predicates can produce an instance in which the northerly neighbour of tengen is not tengen, checking can still produce an invalidating counterexample in which it is. I must add a constraint, applying to all Points, that forbids them from being their own neighbour:

sig Point {
name : lone Name,
neighbour : Direction -> one Point
}{
not this in ran[neighbour]
}
Here I use the function ran, imported from the module util/relation, to state the constraint on the range of neighbour. The constraints listed in curly braces immediately after a sig are conjoined and taken as a fact true of every instance of that signature. The model is now consistent and not demonstrably invalid, but a glance at the example reveals that all is not well.

This is interesting, so I check it in.
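The sig fact rules out self-neighbours but nothing more. A brute-force Python sketch (hypothetical, with only the direction N and neighbour treated as a total map, as in the sig above) shows what that still allows: with two points the only instance makes each point the northern neighbour of the other, and with three points six of the eight instances contain such a pair. I can't be sure this is the particular oddity in the example Alloy showed me, but it is the kind of thing the constraints so far permit.

```python
from itertools import product

def instances(points, directions=("N",)):
    """Every total neighbour map (Direction -> one Point) with no self-neighbours."""
    slots = [(p, d) for p in points for d in directions]
    for targets in product(points, repeat=len(slots)):
        neighbour = dict(zip(slots, targets))
        # Sig fact: not this in ran[neighbour]
        if all(neighbour[(p, d)] != p for p, d in slots):
            yield neighbour

def has_two_cycle(n):
    """Some point is north of its own northern neighbour."""
    return any(n[(n[(p, "N")], "N")] == p for (p, _) in n)

two = list(instances(["Point$0", "Point$1"]))
print(two)  # [{('Point$0', 'N'): 'Point$1', ('Point$1', 'N'): 'Point$0'}]

three = list(instances(["Point$0", "Point$1", "Point$2"]))
print(len(three), sum(has_two_cycle(n) for n in three))  # 8 6
```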

Complementary Directions

Once again, I need to strengthen the tests. If a point is the northern neighbour of tengen, then tengen is the southern neighbour of that point. Directions on the board come in complementary pairs.
pred directions_are_complementary{
N.complement = S
S.complement = N
}
and now I have to de-sugar Direction in order to insert the complement relation. This also shows how enums work:
abstract sig Direction{
complement : one Direction
}{
symmetric[@complement]
}

one sig N extends Direction{}{complement = S}
one sig S extends Direction{}
In the fact appended to Direction I say that the relation complement (the @ means that I'm referring to the relation itself and not its value) is symmetric, using the predicate util/relation/symmetric. Thus I do not have to specify that S's complement is N, having once said the converse. The same pattern applies to E and W.

The instance is now a spectacular mess. I will check in anyway.
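To see why stating only N's and E's complements is enough, here is a hypothetical Python sketch that enumerates every possible complement function on the four directions, keeps those that agree with the facts on N and E, and then applies symmetry. It assumes util/relation's symmetric[r] means ~r in r (r contains its own transpose), which is my understanding of that module. Exactly one function survives, and it pairs S with N and W with E.

```python
from itertools import product

DIRECTIONS = ["N", "S", "E", "W"]

def symmetric(r):
    """Like util/relation's symmetric[r]: ~r in r."""
    return {(b, a) for (a, b) in r} <= r

solutions = []
for targets in product(DIRECTIONS, repeat=len(DIRECTIONS)):
    complement = dict(zip(DIRECTIONS, targets))    # complement : one Direction
    if complement["N"] != "S" or complement["E"] != "W":
        continue                                   # the facts on N and E
    if symmetric(set(complement.items())):
        solutions.append(complement)

print(solutions)  # [{'N': 'S', 'S': 'N', 'E': 'W', 'W': 'E'}]
```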

Distinct Neighbours

What I might like to say is
pred neighbours_are_distinct{
all p : Point |
all disj d, d' : Direction |
p.neighbour[d] != p.neighbour[d']
}
The nested quantification uses the disj modifier and should be read "for all distinct pairs of Direction, named d and d'..."

This immediately renders the model (seemingly) inconsistent. More specifically, the problem is that Alloy can no longer find an instance. I don't think that this is because the model contains contradictions so much as that it now requires more than three points in order to satisfy it. I can increase the number of instances available when the predicates are run, like this
run correct for 5 Point
The resulting example is a rat's nest of dodgy-looking relations (it's in the repo as instance.dot if you want a look).
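Some arithmetic supports the suspicion that three points are simply too few. A point needs one neighbour per direction, none of them itself and no two the same, so with four directions each point needs four other points: at least 1 + 4 = 5 points in all, which matches the scope used above. Brute force over four directions is too slow for a quick sketch, so this hypothetical Python one checks the same constraints for one and two directions, where the bound predicts 2 and 3. Note that itertools.combinations yields unordered pairs; since != is symmetric, that is equivalent to Alloy's all disj d, d' over ordered pairs.

```python
from itertools import combinations, product

def consistent_at(n, directions):
    """Is there an instance with n points in which every point has exactly one
    neighbour per direction, is not its own neighbour, and has distinct neighbours?"""
    points = range(n)
    slots = [(p, d) for p in points for d in directions]
    for targets in product(points, repeat=len(slots)):
        nb = dict(zip(slots, targets))            # neighbour : Direction -> one Point
        if any(nb[(p, d)] == p for p, d in slots):
            continue                              # not this in ran[neighbour]
        if all(nb[(p, d)] != nb[(p, e)]           # all disj d, d' : Direction | ...
               for p in points for d, e in combinations(directions, 2)):
            return True
    return False

def smallest_scope(directions, limit=4):
    """Smallest number of points (up to limit) admitting an instance, else None."""
    return next((n for n in range(1, limit + 1) if consistent_at(n, directions)), None)

print(smallest_scope(["N"]))       # 2
print(smallest_scope(["N", "S"]))  # 3
```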

A less extravagant predicate is
pred neighbours_of_tengen_are_distinct{
let tengen = {p : Point | p.name = Tengen} |
all disj d, d' : Direction |
tengen.neighbour[d] != tengen.neighbour[d']
}
and with this I see a much less tangled, but still wrong, instance (instance1.dot). And the model is also demonstrably invalid. I'm going to make a significant change to the model. One I've been itching to do for some time. I check in before this.

A Missing Abstraction?

I feel as if I'm missing a degree of freedom, which is making it hard to say what I want. I'm going to promote tengen to be a signature, or rather to create a signature of which tengen will be the only instance at the moment: InteriorPoint. I remove the predicates about distinct neighbours and introduce InteriorPoint
sig InteriorPoint extends Point{}
and can then quite happily say
pred interior_points_have_a_neighbour_in_each_direction{
all p : InteriorPoint {
not p.neighbour[N] = none
not p.neighbour[E] = none
not p.neighbour[S] = none
not p.neighbour[W] = none
}
}
and this gets me back to a consistent, not demonstrably invalid (although still wrong) model. I check in.

Now I can say
sig InteriorPoint extends Point{}{
#ran[neighbour] = #Direction
}
and leave other kinds of point to look after themselves. If the range of the neighbour relation (which is a set) is the same size as the set of Directions, then there must be one neighbour per direction.
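The step from equal sizes to one neighbour per direction relies on neighbour being declared lone: with at most one point per direction the range can have at most #Direction members, and reaching that bound needs every direction to be defined and no two directions to share a point. A small exhaustive Python check, with five hypothetical neighbour atoms and None standing for "no neighbour", confirms this for four directions:

```python
from itertools import product

DIRECTIONS = ["N", "E", "S", "W"]
POINTS = ["Point$1", "Point$2", "Point$3", "Point$4", "Point$5"]  # hypothetical atoms

def partial_maps():
    """Every neighbour : Direction -> lone Point (None means no neighbour)."""
    for targets in product([None] + POINTS, repeat=len(DIRECTIONS)):
        yield {d: t for d, t in zip(DIRECTIONS, targets) if t is not None}

checked = 0
for nb in partial_maps():
    if len(set(nb.values())) == len(DIRECTIONS):   # #ran[neighbour] = #Direction
        checked += 1
        assert set(nb) == set(DIRECTIONS)          # a neighbour in every direction
        assert len(set(nb.values())) == len(nb)    # and no two the same
print(checked)  # 120
```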

This leaves me with the model in this state
open util/relation

sig Point {
neighbour : Direction -> lone Point
}{
not this in ran[neighbour]
all d : dom[neighbour] |
this = neighbour[d].@neighbour[d.complement]
}

sig InteriorPoint extends Point{}{
#ran[neighbour] = #Direction
}

fact tengen_exists {
#InteriorPoint = 1
}

abstract sig Direction{
complement : one Direction
}{
symmetric[@complement]
}

one sig N extends Direction{}{complement = S}
one sig S extends Direction{}

one sig E extends Direction{}{complement = W}
one sig W extends Direction{}
A little bit of tidying up and I check in. The model is consistent and cannot be shown to be invalid. The example instance looks respectable too, so long as we focus on the interior point and don't worry about how its neighbours relate to one another, which is clearly wrong. But there are no tests for that. This diagram has been cleaned up in OmniGraffle to focus on the interior point, but the .dot of the original is checked in. Maybe another time I'll sort out the regular points.
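As a sanity check on where the model is heading, here is a hypothetical Python sketch (not Alloy, and not a claim about what the Analyzer will display) of a 3 by 3 patch of board, with points as (column, row) coordinates. It verifies the constraints of the model above for that patch: no point is its own neighbour, following a neighbour and then the complementary direction leads back, and exactly one point has a distinct neighbour in every direction, matching #InteriorPoint = 1.

```python
DIRECTIONS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
COMPLEMENT = {"N": "S", "S": "N", "E": "W", "W": "E"}
SIZE = 3

points = [(x, y) for x in range(SIZE) for y in range(SIZE)]

def neighbour(p):
    """Direction -> lone Point: only the directions that stay on the board."""
    result = {}
    for d, (dx, dy) in DIRECTIONS.items():
        q = (p[0] + dx, p[1] + dy)
        if q in points:
            result[d] = q
    return result

# InteriorPoint: #ran[neighbour] = #Direction
interior = [p for p in points if len(set(neighbour(p).values())) == len(DIRECTIONS)]

for p in points:
    nb = neighbour(p)
    assert p not in nb.values()                                    # not this in ran[neighbour]
    assert all(neighbour(nb[d])[COMPLEMENT[d]] == p for d in nb)   # the round trip

print(interior)  # [(1, 1)]
```

Corners come out with two neighbours and edge points with three, which is the "optional neighbours" case the YAGNI section deferred.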

Thoughts

Wow, that was hard work. It took a long time, too (although not so long as the timestamps make it look: I was also doing laundry and so forth in the meantime). Does that make what is, after all, merely a rectangular array of points a "complex domain"? No. I'm out of practice with this kind of thing, and not fluent with the tool. Even so, I'm impressed by how good a fit the TDD cycle seems to be for this formal modelling tool. I even got into a bit of trouble towards the end but was rescued by recalling the TDD technique of making the tests dumb and repetitive, but concrete and clear. I'm also struck by how the same subtle trap of thinking too far ahead applies here.

Would this have helped with the microcell predictor? Maybe not. Alloy doesn't do numbers at all well, and that was an intrinsically numerical problem. Could this approach help with other things? I think so. This tiny model(ling) problem has turned out to be harder and more time-consuming than I expected it to be, but it is my first go with the tool. I'm going to play around with it some more, as time allows, and see what comes up.

I'm certainly impressed with the tool. Alloy comes very close to making powerful models an everyday tool for the working programmer, but I don't think it's quite there yet. The gulf, as it always was, is between that very succinct model and working code. How to bridge that gulf in a useful way I don't yet know.

Observations on Spotify

I use Spotify, but not quite enough to want to pay for the ad–free premium service just yet. I'm getting pretty good at not paying any attention to the ads (which seems as if it could be an increasingly useful life skill in the apparently inevitable entirely ad–funded on–line world to come), but a couple just now caught my...ear.

Spotify make a big noise about their targeted ads: better value for advertisers, less annoying for listeners. And yet here I am halfway through the third act of Tristan and it's choosing to tell me about a campaign called dance4life which encourages clubbers under 25 to "start dancing and stop aids". A worthy goal, but unless Spotify knows something about Wagnerians that I don't, this seems a little odd.

I also notice that while playing music in its peer–to–peer mode Spotify is gratifyingly parsimonious with bandwidth, but when the ads are streaming my network traffic suddenly jumps up to 125 KB/s and stays there for the duration of the ads. I wonder why that should be.