It's the Screaming

Tim Bray observes:
The community of developers whose work you see on the Web, who probably don’t know what ADO or UML or JPA even stand for, deploy better systems at less cost in less time at lower risk than we see in the Enterprise. [emphasis in original]
That's a pretty bold claim.

Oh, certainly, the Enterprise makes a botch of IT projects on a titanic scale, and Tim gives a partial list of very high profile failures. I think this data is tainted. Firstly, these project failures (such as the UK NHS NPfIT programme) are, well, very high profile—but they aren't representative. Plenty of corporate IT projects do very OK. Secondly, the specific examples Tim gives are actually government projects. Government IT is the reductio ad absurdum of Enterprise IT. These might not be the most informative examples to look at.

On the other side of the matter, Tim cites the successes of Facebook, Google and Twitter. These are also not representative. Quickly, name three failed web 2.0 startups...that's a hard ask because those startups that don't succeed sink without trace. Which, if you believe in the Y Combinator argument, is the point. A web 2.0 startup needs so little initial investment that western civilisation can afford to throw them away in bulk without noticing. The result is a sort of Monte Carlo method for product innovation. I wonder how that's working out for them now that money is no longer so cheap.

But still, something is very wrong with enterprise IT. Would the "web 2.0" way of working help?

Maybe, maybe not. I think Tim has made the mistake of assuming that a technique that he's seen work in one context will necessarily work in another. And should be deployed there post-haste. The IT industry collectively makes this mistake about every seven years.

Some commentators have noted some of the differences between the Web and the Enterprise. In summary: web 2.0 startups don't have to integrate with a heterogeneous ecosystem of legacy systems older than the combined ages of their founders. Web 2.0 startups don't have to operate within multiple tight and complex regulatory regimes. Some were even so unkind as to observe that Tim is one of the devisors of XML and so some of the chest–deep mud that Enterprise developers have to slog through is his fault.

I think there are two more issues that Enterprise groups have to deal with that web 2.0 startups don't, and don't have a good story for.


Scale

I'm going to say that enterprise IT faces a much tougher scale problem than do web developers. Part of the web 2.0 model is that the initial user base is the founders plus their immediate circle. The problem that startups face is that if they are one of the (very) few that make it, their user base may grow very quickly. Their problem is scaleability. This is a tough problem.

The problem for the Enterprise is scale.

Citigroup, Bank of America and JP Morgan Chase all have slightly less than a quarter of a million employees. If you are deploying a "corporate IT app" at one of those organisations then you have to plan for all quarter of a million of them to hit your app at 9am their local time tomorrow. And all day and everyday thereafter. That's a different problem. And by the standards of corporate IT this is no–where near as bad as it can get.

You know that NPfIT project that Tim holds up as an exemplar of corporate IT failure? In 2008 the NHS had 1.12 million non–medical staff, 99,000 medics and dentists and 34,000 General Practitioners. Imagine that they are all going to start using your app tomorrow. To help them treat patients. Of which there are several million a week handled by NHS services. Putting up the fail whale isn't going to cut it.

And that's the real problem that corporate IT have the deal with: the screaming.


Screaming

Let's say that your web 2.0 startup thingy that you put together in record time in your buddy's mum's spare room falls over. What's the worst thing that can happen? A few snarky tweets? The odd complaining blog post? The whole point of your business model is that your service (and company) are disposable, at least in the early days. It's not as if you have a revenue stream to protect. As if!

Now lets imagine what happens when your FX app you just deployed into Mammon Inc's trading floors falls over. What happens then is that a head of desk in Farawayvia phones you up at 3 am your time and will not stop screaming at you until you get it fixed. I exaggerate for comic effect, but not by much. If you are a Web 2.0 type then you have the luxury of fixing stuff up in something like your own time. If you are a Corporate IT type then you might well have someone who believes that you, personally, are trying to steal their bonus from them raging at you until you get the damn thing working again.

Or that you, personally, are trying to kill their patient.

If you were a founder of a web startup and the users of your service had your personal mobile number and believed that five minutes of downtime could cost them personally millions of bottletops (the currency of Farawayvia, ticker symbol BTP), or lead them to a malpractice suite over a dead patient, do you think you would behave the same way that other founders do?


Solutions?

So, what's the answer? I don't there is any "the answer". Not a technical one, anyway. There are a bunch of tools and techniques that I've come to think will help a lot, and I encourage my clients who happen to be in corporate IT to use them. I fact, Tim mentions some of them in his post. He notes that startups get a lot of help from:
dynamic languages and Web frameworks and TDD and REST and Open Source and NoSQL at varying levels of relative importance
Too right. But, as Gerry Weinberg says: it's always a people problem. It always is. It's the screaming.

Get people into a place where they believe they can do the right thing and they mostly will. The technology can then, to a surprising extent, look after itself. This is the really big advantage that startups have, I think: no–one to tell them not to do the right thing. It's not that corporate IT is full of people saying "don't do the right thing" (although in the worst case that does happen). Rather it's that the social context inside any organisation big enough to call itself an "enterprise", without any actual malice on the part of any individual, works against the right thing.

I think it was Ron Jeffries that I first heard say that most of the advice given to software developers to help them do better is about as useful as telling them to be taller. Telling folks who work in corporate IT to behave more like the folks in web startups seems no more useful than that to me.

On the other hand, if one does end up working in a really toxic environment that does work against the right thing it can be worth looking around. Look around to see who, exactly, is holding the gun to your head to make you put up with it.


Complex Domains: playing with Alloy

Before digging in to more Bayesian ideas I want to take a step back. The example in this previous post, which I worked through with Laurent, has an interesting property: Each subsequent test halves the remaining uncertainty about the correctness of the system under test. This is a property of the problem domain, not the solution. For a next example I want to look at a more complex domain. But what makes a domain complex (in the sense of essential complexity)? That's not a rhetorical question.

And what tools are appropriate to handling a complex domain? Let me tell you a story... (if you want to skip the story and go direct to the techie stuff, it's here)

Microcell Predictor

Back in the day I worked on a tool used by radio engineers to design mobile phone networks.

In particular I worked on a so–called "microcell predictor". This would take a description of a dense urban environment and a proposed low–power base station location and calculate the expected signal strength at various points in the area. The input was a file containing a bunch of polygons describing building footprints and some materials data (steel and glass, masonry, etc) and the base station location and properties (antenna design and so forth). The output was a raster of predicted signal strengths. This could overlay the building polygons and generate a map that the engineers could first eyeball and then if necessary analyse more closely to help them optimise the base station placement. This was a lot faster and cheaper than putting up a temporary antenna and then driving around in a vehicle with a meter measuring the actual signal strength, which was the way the very first networks were planned.

The requirement for this came from the radio specialists in the form of pages of maths describing various "semi-empirical" models of microwave propagation and how these interacted with buildings. Let's say we are looking at GSM 900. If a 900 MHz microwave photon were moving in free space it would have a wavelength of approximately 3×109 ms-1/9×108 s-1 or around 33mm. This makes such photons quite good at seeming to go around the corners of "human scale" structures by diffraction. To calculate that exactly would be very messy and on the boxes we used, impracticable. So we had these other methods which hid a lot of the details and gave results that the radio experts deemed good enough and that we could compute with. The input didn't have to be especially large or complicated for the prediction to take long enough that the user would give up, but the point of microcells is that they only cover a small area in a city centre anyway so that was OK. This was fifteen years ago, the techniques used now are a lot more sophisticated.

Semi-formal

We used a development process called Syntropy. It's an unusual day in which if I spend any time at all thinking about software I don't use ideas from Syntropy to good effect. Amongst other things Syntropy combines a graphical notation for object structures much like OMT with a textual notation for facts about them much like Z. Some (but not nearly enough) of these ideas made it into UML, particularly the OCL.

So, we had these mathematical requirements and we produced from them mathematically supported specifications and designs, full of ∀'s and ∃'s, and we had to turn these into working software. I learned a great deal about the art of doing that there, other parts of which are a story for another time.

The main thing for my current purpose, though, is that when I think back to those times I'm stunned by the amount of effort we put into determining if those specifications and designs were correct. The only way we knew how was to round up a bunch of seriously smart people (which luckily we had) and check these models manually. Management were smart about it and paid for us to be trained in Fagan inspection techniques, which helped a lot. But the expense! six or eight top-flight programmers in a room for a couple of hours is not a trivial investment. And to do that many times per document. Sometimes many times per page of a document, over the years that we worked on this thing.

But that was then. As it happens, more–or–less exactly then another group in the UK were using much more advanced formal methods to address a much trickier problem.

This is Now

In 2007 Sir Tony Hoare delivered a keynote at the Spa conference. He talked about the effort required to prove (really, prove) that the Mondex electronic money system was secure. The thing about Mondex is that the money is actually on the card, rather than being in the network with the card acting as a credential to allow the money to be moved. This made the Bank of England very nervous (Mondex was developed in the UK). Developing that 200 page proof was very expensive. This effort has become something of a celebrity amongst the Verified Software community.

The folks who worked on the Mondex proof were, almost certainly, much smarter than my colleagues and I who worked on the microcell predictor (sorry guys), but they seem not to have known a better way to proceed than manual checking, either. In fact they, so they say, they said at the time
mechanising such a large proof cost–effectively is beyond the state of the art
Hoare's keynote explained that between then and now, in fact in that year 2007, the problem had been re–addressed. The goal was to discover to what extent the state of the art had moved on in ten years and whether mechanisation had become cost–effective. Hoare suggested strongly that through improvements in theory and hardware that cost–effectiveness is within reach.

One of the things that came out of that effort was a model written in Alloy (note: the link was dead at the time of writing but the Alloy site is actively maintained).

Alloy is actually what I wanted to write about. Alloy seems to live at an interesting place: the intersection of proof and examples. What Alloy does is help you develop a proof of various properties of a specification by on the one hand generating examples (if the specification is consistent) or counter–examples (if it isn't).

Testing

Over time, and quite naturally, our focus changed while working on the microcell predictor. We became less interested in demonstrating that our code conformed to a design that conformed to a specification that conformed to a requirement. We became more interested that the code conformed to the users' needs. We showed this through intensive automated testing.

My boss at the time insisted that we write fully automated tests for every function we wrote. He had an automated testing framework that he carried around in his head and regenerated at each new place of work. I think he had learned this from previous boss of his and the framework had, IIRC, originally been written in Pascal. So we crated a C++ version and off we went writing tests and I can't begin to mention the number of times that writing the tests, and running them, again and again and again, was crucial to overcoming what would otherwise have been show–stopping problems.

It was especially interesting that one of the guys on the team, a real code–basher and much better programmer than I am, built a graphical test runner (polygons, remember) that let you see what the code was doing as a test ran. See the building footprint polygons, see the triangulation of the line–of–sight region, see the first–order rays from the antenna to the corners of buildings, see the second–order virtual sources, see the triangulation of their line–of–sight, and so on. See it in all these various scenarios, each devised specifically to check that some particularly interesting feature of the problem was dealt with correctly. At one time I had several sheets of big flipchart paper covered in the tiniest writing I could manage describing all the ways I could think of that a line segment could meet a set of polygons. I missed a few.

Something like Alloy would have helped so much.

These animated tests became the premier way of explaining what the microcell predictor did. Even to customers.

Notice that my work on microcells, with the intensive automate testing, and the original Mondex proof took place more–or–less contemporaneously with the discovery of Extreme Programming. I think I recall a lunchtime conversation during the microcell work to the effect that there was this mad project going on where they had automated tests for everything (as we did), but they wrote the tests first! I think I recall some comment along the lines that this was fine only so long as you knew the requirement in great and final detail, but in practice you never do. I'm now pretty confident that the converse is true: while we hardly ever do have great and finally detailed requirements, this is exactly when writing the tests first does help.

I'm glad that I dropped off the "models" path to correctness and onto the "test first" one. And I'm glad that I had the experience of doing the "models" approach. I find it interesting to look back over the fence sometimes, and see how those folks are getting along.

Alloy

And so to Alloy. I have Alloy 4.1.10 here on my MacBook Pro (2.4 GHz Core 2 Duo, 4GB ram). I'm going to try to develop a formal model of the points on a Goban. If you'd like to play along, there's an hg repo.

Points

Alloy models are essentially relational although the syntax is deliberately chosen to be as familiar as possible to users of "curly bracket" OO languages. I begin by writing a kind of test. This takes the form of a predicate called correct which says
pred correct {
there_are_such_things_as_points
}
and I can ask Alloy to run this test
run correct
and Alloy tells me that The name "there_are_such_things_as_points" cannot be found. which is excellent news. I'm well on the way to using the familiar TDD cycle. Not compiling is failure and here is a failing test. I can make the test fail in a slightly more informative way by defining there_are_such_things_as_points like so
pred there_are_such_things_as_points{
#Point > 0
}
which says that the size of the set named Point (which is the set of all tuples conforming to the signature Point—it's a relational model, remember) is strictly greater than zero. Of course I haven't defined that signature yet so Alloy tells me that The name "Point" cannot be found. I define Point like so
sig Point {}
and now Alloy reports that

Executing "Run correct"
Sig this/Point scope <= 3
Sig this/Point in [[Point$0], [Point$1], [Point$2]]
Solver=minisatprover(jni) Bitwidth=4 MaxSeq=4 SkolemDepth=2 Symmetry=20
18 vars. 3 primary vars. 23 clauses. 183ms.
Instance found. Predicate is consistent. 41ms.

There's a lot of information there. The important part for now is that Alloy could find an instance (that is, a bunch of tuples) that conforms to the model and of which the predicate is true. Therefore the model and the predicates I have defined are consistent (that is, contain no contradictions). I can also ask Alloy not to run my predicate but instead to check it. The news here is not so good.

Executing "Check check$1"
Sig this/Point scope <= 3
Sig this/Point in [[Point$0], [Point$1], [Point$2]]
Solver=minisatprover(jni) Bitwidth=4 MaxSeq=4 SkolemDepth=2 Symmetry=20
18 vars. 3 primary vars. 24 clauses. 11ms.
Counterexample found. Assertion is invalid. 18ms.

It turns out although my model is consistent it is not valid. It is possible to construct instances of the model for which the predicate is not true.

Notice the lines in the reports about sig this/Point. In working with my model Alloy has made some instances of Point form which it has then constructed instances of the model. By default it chooses to make up to 3 instances of a signature. Here is a graphical (in both senses) representation of the instance of the model which Alloy built
Point$0Do you see this instance named in the array of three instances which Alloy reported it had created? Clearly the predicate is satisfied. Alloy will also produce a graph of the counterexample which it found—which is empty. (Well, strictly it's a message telling me that "every atom is hidden" in a "this page intentionally left blank" sort of way).

There is nothing in the model which says that there are any points, only that there possibly is such a thing as a Point. The problem domain can help us here, as it turns out that some of the points on the board have names.

pred tengen[p : Point]{}

fact tengen_exists {
one p : Point |
tengen[p]
}
Here I state a fact, which is very much like a predicate, except that it is information for Alloy to use not a question for it to ask of the model.

Read the fact tengen_exists like this: "it's true of exactly one instance, named p, of the signature Point that the predicate tengen is true of p". The predicate itself is parameterised on an instance of Point but does not depend upon that instance. Which seems as if it should smell.

Running the predicate as before finds that same instance of the model with one point in it. I can pop up an evaluator on that instance and ask for the value of tengen[Point$0] which is (of course) true. If I ask Alloy to check the model it now reports No counterexample found. Assertion may be valid. 69ms. Note the "may be" there. Alloy can't be absolutely sure because it only instantiates a small number of tuples for each signature. This is a manifestation of the Small Instance Hypothesis (sometimes "small model" or "small scope") which claims that if your model is bogus then this will show up very quickly after looking at a small number of small examples—exhaustive enumeration of cases is not required.

So now I have a model, however feeble, which is consistent and cannot be shown (using up to three Points) to be invalid. I'll check in.

Refinement

I'm not very happy with this model. I said that some points are named, such as tengen, but that's not really very well expressed. There's that smelly predicate which doesn't depend upon its parameter. If some instances of Point have names, then we can say that. After going around the fail-pass loop (trust me, I am doing that but I'm not going to write it out every time) the model looks like this
enum Name { Tengen }

sig Point {
name : lone Name
}

pred tengen[p : Point]{
p.name = Tengen
}

fact tengen_exists {
one p : Point |
tengen[p]
}
Several new Alloy features are used here. Since Alloy 4 doesn't support string literals I use an atom (an instance of a signature with no further structure). The enum clause creates quite a complex structure behind the scenes but gets me the atom Tengen. The signature of Point is extended to have a field named name which will, in a navigation expression such as p.name resolve to an instance of signature Name, or to none, as shown by the cardinality marker lone. These navigation expressions look like dereferencing as found in OO languages, but are actually joins.

This looks a lot healthier to me, and the model is still both consistent and not demonstrably invalid. Here's the new instance of the model. I'll check in.

Directions

There are many points on a goban, however. And these points stand in a certain relationship. Specifically, every point on the board has some neighbours. Tengen has four neighbouring points, one in each of the four directions I will call N, E, S and W. I start with N and obtain this invalid model

enum Name { Tengen }

sig Point {
name : lone Name,
neighbour : Direction -> lone Point
}

pred tengen[p : Point]{
p.name = Tengen
}

fact tengen_exists {
one p : Point | tengen[p]
}

enum Direction {N}
which admits a counterexample which does not satisfy this predicate
pred tengen_has_a_neighbour_in_each_direction{
let tengen = {p : Point | p.name = Tengen} {
not tengen.neighbour[N] = none
}
}
The counterexample looks like this
New Alloy features are the mapping Direction -> lone Point which is pretty much the same as a typical "dictionary" and the let form and its binding of the name tengen to the value of a comprehension. The comprehension should be read as "the set of things p, which are instances of Point of which it is true that the value of p.name is equal to Tengen" Some sugar in Alloy means that we don't need to distinguish between a value and the set of size 1 who's sole member is that value.

Running the model produces the somewhat surprising result that it is consistent. Looking at the example shows that this is a red herring. This is an interesting state of the world, so I check in with a suitable caveat in the message.

A YAGNI Moment

The invalid aspect of the model comes from the cardinality on Point.direction. There are points on a Goban which do not have four neighbours, one in each direction. But I haven't mentioned any of them yet. There's a good chance that eventually points will need to have optional neighbours, but right now YAGNI.

As the (so far, incomplete) predicate's name suggests, tengen really does have four neighbours. The cardinality should be one. Making that change produces a model which no cannot be shown to be invalid. However, the example instance is still bogus. I know from TDD practice what to do: write a test that will fail until the problem is fixed. Here it is
pred points_are_not_their_own_neighbour {
all p : Point |
not p in univ.(p.neighbour)
}
The construction univ.r for any relation r evaluates to the range of the relation.

As I hoped, this test fails. Although running the predicates can produce an instance in which the northerly neighbour of tengen is not tengen, checking can also still produce an invalidating counterexample in which it is. I must add a predicate to apply to all Points forcing them not to be their own neighbour

sig Point {
name : lone Name,
neighbour : Direction -> one Point
}{
not this in ran[neighbour]
}
Here I use the function ran imported from the module util/relation to state the constraint on the range of neighbour. The conjunction of the predicates listed in curleys immediately after a sig are taken as facts true of all instances of that signature. The model is now consistent and not demonstrably invalid, but a glance at the example reveals that all is not well

This is interesting, so I check it in.

Complementary Directions

Once again, I need to strengthen the tests. If a point is the northern neighbour of tengen, then tengen is the southern neighbour of the that point. Directions on the board come in complementary pairs.
pred directions_are_complementary{
N.complement = S
S.complement = N
}
and now I have to de-sugar Direction in order to insert the complement relation. And now we see how enums work
abstract sig Direction{
complement : one Direction
}{
symmetric[@complement]
}

one sig N extends Direction{}{complement = S}
one sig S extends Direction{}
In the fact appended to Direction I say that the relation complement (the @ means that I'm refering to the relation itself and not its value) is symmetric using the predicate util/relation/symmetric. Thus I do not have to specify that S's complement is N having once said the converse. The same pattern applies to E and W.

The instance is now a spectacular mess.I will check in anyway.

Distinct Neighbours

What I might like to say is
pred neighbours_are_distinct{
all p : Point |
all disj d, d' : Direction |
p.neighbour[d] != p.neighbour[d']
}
The nested quantification uses the disj modifier and should be read "for all distinct pairs of Direction, named d and d'..."

This immediately renders the model (seemingly) inconsistent. More specifically the problem is that Alloy can no longer find an instance. I don't think that this is because the model contains contradictions so much as that it now requires more than three points in order to satisfy it. I can increase the number of instances available when the predicates are run like this run correct for 5 Point The resulting example is a rats nest of dodgy looking relations (it's in the repo as instance.dot if you want a look)

A less extravagant predicate is
pred neighbours_of_tengen_are_distinct{
let tengen = {p : Point | p.name = Tengen} |
all disj d, d' : Direction |
tengen.neighbour[d] != tengen.neighbour[d']
}
and with this I see a much less tangled, but still wrong, instance (instance1.dot). And the model is also demonstrably invalid. I'm going to make a significant change to the model. One I've been itching to do for some time. I check in before this.

A Missing Abstraction?

I feel as if I'm missing a degree of freedom, which is making it hard to say what I want. I'm going to promote tengen to be a signature, or rather to create a signature of which tengen will be the only instance at the moment: InteriorPoint. I remove the predicates about distinct neighbours and introduce InteriorPoint
sig InteriorPoint extends Point{}
and can then quite happily say
pred interior_points_have_a_neighbour_in_each_direction{
all p : InteriorPoint {
not p.neighbour[N] = none
not p.neighbour[E] = none
not p.neighbour[S] = none
not p.neighbour[W] = none
}
}
and this gets me back to a consistent, not demonstrably invalid (although still wrong) model. I check in.

Now I can say
sig InteriorPoint extends Point{}{
#ran[neighbour] = #Direction
}
and leave other kinds of point to look after themselves. If the range of the neighbour relation (which is a set) is the same size as the set of Directions, then there must be one neighbour per direction.

This leaves me with the model in this state
sig Point {
neighbour : Direction -> lone Point
}{
not this in ran[neighbour]
all d : dom[neighbour] |
this = neighbour[d].@neighbour[d.complement]
}

sig InteriorPoint extends Point{}{
#ran[neighbour] = #Direction
}

fact tengen_exists {
#InteriorPoint = 1
}

abstract sig Direction{
complement : one Direction
}{
symmetric[@complement]
}

one sig N extends Direction{}{complement = S}
one sig S extends Direction{}

one sig E extends Direction{}{complement = W}
one sig W extends Direction{}
a little bit of tidying up and I check in. The model is consistent and cannot be shown to be invalid. The example instance looks respectable too—so long as we focus on the interior point and don't worry about how its neighbours relate to one-another, which is clearly wrong. But there are no tests for that. This diagram has been cleaned up in omnigraffle to focus on the interior point but the .dot of the original is checked in. Maybe another time I'll sort out the regular points.

Thoughts

Wow, that was hard work. Took a long time, too (although not so long as the timestamps make it look, I was also doing laundry and so forth during the elapsed). Does that make what is after all merely a rectangular array of points a "complex domain"? No. I'm out of practice with this kind of thing, and not fluent with the tool. Even so, I'm impressed by how good a fit the TDD cycle seems to be for this formal modelling tool. I even got into a bit of trouble towards the end but was rescued by recalling the TDD technique of making the tests dumb and repetitive—but concrete and clear. And how the same subtle trap of thinking too far ahead applies here too.

Would this have helped with the microcell predictor? Maybe not. Alloy doesn't do numbers at all well, and that was an intrinsically numerical problem. Could this approach help with other things? I think so. This tiny model(ling) problem has turned out to be harder and more time-consuming than I expected it to be, but it is my first go with the tool. I'm going to play around with it some more, as time allows, and see what comes up.

I'm certainly impressed with the tool. Alloy comes very close to making powerful models an everyday tool for the working programmer, but I don't think its quite there yet. The gulf, as it always was, is between that very succinct model and working code. How to bridge that gulf in a useful way I don't yet know.