
Complex Domains: playing with Alloy

Before digging into more Bayesian ideas I want to take a step back. The example in this previous post, which I worked through with Laurent, has an interesting property: each subsequent test halves the remaining uncertainty about the correctness of the system under test. This is a property of the problem domain, not the solution. For a next example I want to look at a more complex domain. But what makes a domain complex (in the sense of essential complexity)? That's not a rhetorical question.

And what tools are appropriate to handling a complex domain? Let me tell you a story... (if you want to skip the story and go direct to the techie stuff, it's here)

Microcell Predictor

Back in the day I worked on a tool used by radio engineers to design mobile phone networks.

In particular I worked on a so–called "microcell predictor". This would take a description of a dense urban environment and a proposed low–power base station location and calculate the expected signal strength at various points in the area. The input was a file containing a bunch of polygons describing building footprints and some materials data (steel and glass, masonry, etc) and the base station location and properties (antenna design and so forth). The output was a raster of predicted signal strengths. This could overlay the building polygons and generate a map that the engineers could first eyeball and then if necessary analyse more closely to help them optimise the base station placement. This was a lot faster and cheaper than putting up a temporary antenna and then driving around in a vehicle with a meter measuring the actual signal strength, which was the way the very first networks were planned.

The requirement for this came from the radio specialists in the form of pages of maths describing various "semi-empirical" models of microwave propagation and how these interacted with buildings. Let's say we are looking at GSM 900. If a 900 MHz microwave photon were moving in free space it would have a wavelength of approximately 3×10⁸ m s⁻¹ / 9×10⁸ s⁻¹, or around 33 cm. This makes such photons quite good at seeming to go around the corners of "human scale" structures by diffraction. To calculate that exactly would be very messy and, on the boxes we used, impracticable. So we had these other methods which hid a lot of the details and gave results that the radio experts deemed good enough and that we could compute with. The input didn't have to be especially large or complicated for the prediction to take long enough that the user would give up, but the point of microcells is that they only cover a small area in a city centre anyway, so that was OK. This was fifteen years ago; the techniques used now are a lot more sophisticated.

Semi-formal

We used a development process called Syntropy. It's an unusual day on which I spend any time at all thinking about software and don't use ideas from Syntropy to good effect. Amongst other things Syntropy combines a graphical notation for object structures much like OMT with a textual notation for facts about them much like Z. Some (but not nearly enough) of these ideas made it into UML, particularly the OCL.

So, we had these mathematical requirements and we produced from them mathematically supported specifications and designs, full of ∀'s and ∃'s, and we had to turn these into working software. I learned a great deal about the art of doing that there, other parts of which are a story for another time.

The main thing for my current purpose, though, is that when I think back to those times I'm stunned by the amount of effort we put into determining whether those specifications and designs were correct. The only way we knew how was to round up a bunch of seriously smart people (which luckily we had) and check these models manually. Management were smart about it and paid for us to be trained in Fagan inspection techniques, which helped a lot. But the expense! Six or eight top-flight programmers in a room for a couple of hours is not a trivial investment, and we did that many times per document. Sometimes many times per page of a document, over the years that we worked on this thing.

But that was then. As it happens, more–or–less exactly then another group in the UK were using much more advanced formal methods to address a much trickier problem.

This is Now

In 2007 Sir Tony Hoare delivered a keynote at the SPA conference. He talked about the effort required to prove (really, prove) that the Mondex electronic money system was secure. The thing about Mondex is that the money is actually on the card, rather than being in the network with the card acting as a credential to allow the money to be moved. This made the Bank of England very nervous (Mondex was developed in the UK). Developing that 200-page proof was very expensive. This effort has become something of a celebrity amongst the Verified Software community.

The folks who worked on the Mondex proof were, almost certainly, much smarter than my colleagues and I who worked on the microcell predictor (sorry guys), but they seem not to have known a better way to proceed than manual checking, either. In fact, as they said at the time,
mechanising such a large proof cost–effectively is beyond the state of the art
Hoare's keynote explained that between then and now, in fact in that year 2007, the problem had been re–addressed. The goal was to discover to what extent the state of the art had moved on in ten years and whether mechanisation had become cost–effective. Hoare suggested strongly that, through improvements in theory and hardware, cost–effectiveness is now within reach.

One of the things that came out of that effort was a model written in Alloy (note: the link was dead at the time of writing but the Alloy site is actively maintained).

Alloy is actually what I wanted to write about. Alloy seems to live at an interesting place: the intersection of proof and examples. What Alloy does is help you develop a proof of various properties of a specification by generating examples (if the specification is consistent) or counter–examples (if it isn't).

Testing

Over time, and quite naturally, our focus changed while working on the microcell predictor. We became less interested in demonstrating that our code conformed to a design that conformed to a specification that conformed to a requirement. We became more interested in showing that the code conformed to the users' needs. We showed this through intensive automated testing.

My boss at the time insisted that we write fully automated tests for every function we wrote. He had an automated testing framework that he carried around in his head and regenerated at each new place of work. I think he had learned this from a previous boss of his and the framework had, IIRC, originally been written in Pascal. So we created a C++ version and off we went writing tests, and I can't begin to mention the number of times that writing the tests, and running them, again and again and again, was crucial to overcoming what would otherwise have been show–stopping problems.

It was especially interesting that one of the guys on the team, a real code–basher and much better programmer than I am, built a graphical test runner (polygons, remember) that let you see what the code was doing as a test ran. See the building footprint polygons, see the triangulation of the line–of–sight region, see the first–order rays from the antenna to the corners of buildings, see the second–order virtual sources, see the triangulation of their line–of–sight, and so on. See it in all these various scenarios, each devised specifically to check that some particularly interesting feature of the problem was dealt with correctly. At one time I had several sheets of big flipchart paper covered in the tiniest writing I could manage describing all the ways I could think of that a line segment could meet a set of polygons. I missed a few.

Something like Alloy would have helped so much.

These animated tests became the premier way of explaining what the microcell predictor did. Even to customers.

Notice that my work on microcells, with the intensive automated testing, and the original Mondex proof took place more–or–less contemporaneously with the discovery of Extreme Programming. I think I recall a lunchtime conversation during the microcell work to the effect that there was this mad project going on where they had automated tests for everything (as we did), but they wrote the tests first! I think I recall some comment along the lines that this was fine only so long as you knew the requirement in great and final detail, but in practice you never do. I'm now pretty confident that the converse is true: while we hardly ever do have great and finally detailed requirements, this is exactly when writing the tests first does help.

I'm glad that I dropped off the "models" path to correctness and onto the "test first" one. And I'm glad that I had the experience of doing the "models" approach. I find it interesting to look back over the fence sometimes, and see how those folks are getting along.

Alloy

And so to Alloy. I have Alloy 4.1.10 here on my MacBook Pro (2.4 GHz Core 2 Duo, 4GB ram). I'm going to try to develop a formal model of the points on a Goban. If you'd like to play along, there's an hg repo.

Points

Alloy models are essentially relational although the syntax is deliberately chosen to be as familiar as possible to users of "curly bracket" OO languages. I begin by writing a kind of test. This takes the form of a predicate called correct which says
pred correct {
there_are_such_things_as_points
}
and I can ask Alloy to run this test
run correct
and Alloy tells me that The name "there_are_such_things_as_points" cannot be found. which is excellent news. I'm well on the way to using the familiar TDD cycle. Not compiling is failure and here is a failing test. I can make the test fail in a slightly more informative way by defining there_are_such_things_as_points like so
pred there_are_such_things_as_points{
#Point > 0
}
which says that the size of the set named Point (which is the set of all tuples conforming to the signature Point—it's a relational model, remember) is strictly greater than zero. Of course I haven't defined that signature yet so Alloy tells me that The name "Point" cannot be found. I define Point like so
sig Point {}
and now Alloy reports that

Executing "Run correct"
Sig this/Point scope <= 3
Sig this/Point in [[Point$0], [Point$1], [Point$2]]
Solver=minisatprover(jni) Bitwidth=4 MaxSeq=4 SkolemDepth=2 Symmetry=20
18 vars. 3 primary vars. 23 clauses. 183ms.
Instance found. Predicate is consistent. 41ms.

There's a lot of information there. The important part for now is that Alloy could find an instance (that is, a bunch of tuples) that conforms to the model and of which the predicate is true. Therefore the model and the predicates I have defined are consistent (that is, contain no contradictions). I can also ask Alloy not to run my predicate but instead to check it. The news here is not so good.

Executing "Check check$1"
Sig this/Point scope <= 3
Sig this/Point in [[Point$0], [Point$1], [Point$2]]
Solver=minisatprover(jni) Bitwidth=4 MaxSeq=4 SkolemDepth=2 Symmetry=20
18 vars. 3 primary vars. 24 clauses. 11ms.
Counterexample found. Assertion is invalid. 18ms.

It turns out that although my model is consistent it is not valid. It is possible to construct instances of the model for which the predicate is not true.

Notice the lines in the reports about sig this/Point. In working with my model Alloy has made some instances of Point from which it has then constructed instances of the model. By default it chooses to make up to 3 instances of a signature. Here is a graphical (in both senses) representation of the instance of the model which Alloy built
The instance contains the single atom Point$0. Do you see this instance named in the array of three instances which Alloy reported it had created? Clearly the predicate is satisfied. Alloy will also produce a graph of the counterexample which it found—which is empty. (Well, strictly it's a message telling me that "every atom is hidden" in a "this page intentionally left blank" sort of way).

There is nothing in the model which says that there are any points, only that there possibly is such a thing as a Point. The problem domain can help us here, as it turns out that some of the points on the board have names.

pred tengen[p : Point]{}

fact tengen_exists {
one p : Point |
tengen[p]
}
Here I state a fact, which is very much like a predicate, except that it is information for Alloy to use not a question for it to ask of the model.

Read the fact tengen_exists like this: "it's true of exactly one instance, named p, of the signature Point that the predicate tengen is true of p". The predicate itself is parameterised on an instance of Point but does not depend upon that instance. Which seems as if it should smell.

Running the predicate as before finds that same instance of the model with one point in it. I can pop up an evaluator on that instance and ask for the value of tengen[Point$0] which is (of course) true. If I ask Alloy to check the model it now reports No counterexample found. Assertion may be valid. 69ms. Note the "may be" there. Alloy can't be absolutely sure because it only instantiates a small number of tuples for each signature. This is a manifestation of the Small Instance Hypothesis (sometimes "small model" or "small scope") which claims that if your model is bogus then this will show up very quickly after looking at a small number of small examples—exhaustive enumeration of cases is not required.

So now I have a model, however feeble, which is consistent and cannot be shown (using up to three Points) to be invalid. I'll check in.

Refinement

I'm not very happy with this model. I said that some points are named, such as tengen, but that's not really very well expressed. There's that smelly predicate which doesn't depend upon its parameter. If some instances of Point have names, then we can say that. After going around the fail-pass loop (trust me, I am doing that but I'm not going to write it out every time) the model looks like this
enum Name { Tengen }

sig Point {
name : lone Name
}

pred tengen[p : Point]{
p.name = Tengen
}

fact tengen_exists {
one p : Point |
tengen[p]
}
Several new Alloy features are used here. Since Alloy 4 doesn't support string literals I use an atom (an instance of a signature with no further structure). The enum clause creates quite a complex structure behind the scenes but gets me the atom Tengen. The signature of Point is extended to have a field named name which will, in a navigation expression such as p.name, resolve to an instance of signature Name, or to none, as shown by the cardinality marker lone. These navigation expressions look like dereferencing as found in OO languages, but are actually joins.

This looks a lot healthier to me, and the model is still both consistent and not demonstrably invalid. Here's the new instance of the model. I'll check in.

Directions

There are many points on a goban, however. And these points stand in a certain relationship. Specifically, every point on the board has some neighbours. Tengen has four neighbouring points, one in each of the four directions I will call N, E, S and W. I start with N and obtain this invalid model

enum Name { Tengen }

sig Point {
name : lone Name,
neighbour : Direction -> lone Point
}

pred tengen[p : Point]{
p.name = Tengen
}

fact tengen_exists {
one p : Point | tengen[p]
}

enum Direction {N}
which admits a counterexample which does not satisfy this predicate
pred tengen_has_a_neighbour_in_each_direction{
let tengen = {p : Point | p.name = Tengen} {
not tengen.neighbour[N] = none
}
}
The counterexample looks like this
New Alloy features are the mapping Direction -> lone Point, which is pretty much the same as a typical "dictionary", and the let form and its binding of the name tengen to the value of a comprehension. The comprehension should be read as "the set of things p, which are instances of Point of which it is true that the value of p.name is equal to Tengen". Some sugar in Alloy means that we don't need to distinguish between a value and the set of size 1 whose sole member is that value.

Running the model produces the somewhat surprising result that it is consistent. Looking at the example shows that this is a red herring. This is an interesting state of the world, so I check in with a suitable caveat in the message.

A YAGNI Moment

The invalid aspect of the model comes from the cardinality on Point.neighbour. There are points on a Goban which do not have four neighbours, one in each direction. But I haven't mentioned any of them yet. There's a good chance that eventually points will need to have optional neighbours, but right now YAGNI.

As the (so far, incomplete) predicate's name suggests, tengen really does have four neighbours. The cardinality should be one. Making that change produces a model which now cannot be shown to be invalid. However, the example instance is still bogus. I know from TDD practice what to do: write a test that will fail until the problem is fixed. Here it is
pred points_are_not_their_own_neighbour {
all p : Point |
not p in univ.(p.neighbour)
}
The construction univ.r for any relation r evaluates to the range of the relation.

As I hoped, this test fails. Although running the predicates can produce an instance in which the northerly neighbour of tengen is not tengen, checking can also still produce an invalidating counterexample in which it is. I must add a constraint applying to all Points, forcing them not to be their own neighbour

sig Point {
name : lone Name,
neighbour : Direction -> one Point
}{
not this in ran[neighbour]
}
Here I use the function ran imported from the module util/relation to state the constraint on the range of neighbour. The conjunction of the predicates listed in curlies immediately after a sig is taken as a fact true of all instances of that signature. The model is now consistent and not demonstrably invalid, but a glance at the example reveals that all is not well

This is interesting, so I check it in.

Complementary Directions

Once again, I need to strengthen the tests. If a point is the northern neighbour of tengen, then tengen is the southern neighbour of that point. Directions on the board come in complementary pairs.
pred directions_are_complementary{
N.complement = S
S.complement = N
}
and now I have to de-sugar Direction in order to insert the complement relation. And now we see how enums work
abstract sig Direction{
complement : one Direction
}{
symmetric[@complement]
}

one sig N extends Direction{}{complement = S}
one sig S extends Direction{}
In the fact appended to Direction I say that the relation complement (the @ means that I'm referring to the relation itself and not its value) is symmetric, using the predicate util/relation/symmetric. Thus I do not have to specify that S's complement is N having once said the converse. The same pattern applies to E and W.

The instance is now a spectacular mess. I will check in anyway.

Distinct Neighbours

What I might like to say is
pred neighbours_are_distinct{
all p : Point |
all disj d, d' : Direction |
p.neighbour[d] != p.neighbour[d']
}
The nested quantification uses the disj modifier and should be read "for all distinct pairs of Direction, named d and d'..."

This immediately renders the model (seemingly) inconsistent. More specifically the problem is that Alloy can no longer find an instance. I don't think that this is because the model contains contradictions so much as that it now requires more than three points in order to satisfy it. I can increase the number of instances available when the predicates are run like this: run correct for 5 Point. The resulting example is a rat's nest of dodgy-looking relations (it's in the repo as instance.dot if you want a look)

A less extravagant predicate is
pred neighbours_of_tengen_are_distinct{
let tengen = {p : Point | p.name = Tengen} |
all disj d, d' : Direction |
tengen.neighbour[d] != tengen.neighbour[d']
}
and with this I see a much less tangled, but still wrong, instance (instance1.dot). And the model is also demonstrably invalid. I'm going to make a significant change to the model. One I've been itching to do for some time. I check in before this.

A Missing Abstraction?

I feel as if I'm missing a degree of freedom, which is making it hard to say what I want. I'm going to promote tengen to be a signature, or rather to create a signature of which tengen will be the only instance at the moment: InteriorPoint. I remove the predicates about distinct neighbours and introduce InteriorPoint
sig InteriorPoint extends Point{}
and can then quite happily say
pred interior_points_have_a_neighbour_in_each_direction{
all p : InteriorPoint {
not p.neighbour[N] = none
not p.neighbour[E] = none
not p.neighbour[S] = none
not p.neighbour[W] = none
}
}
and this gets me back to a consistent, not demonstrably invalid (although still wrong) model. I check in.

Now I can say
sig InteriorPoint extends Point{}{
#ran[neighbour] = #Direction
}
and leave other kinds of point to look after themselves. If the range of the neighbour relation (which is a set) is the same size as the set of Directions, then there must be one neighbour per direction.

This leaves me with the model in this state
sig Point {
neighbour : Direction -> lone Point
}{
not this in ran[neighbour]
all d : dom[neighbour] |
this = neighbour[d].@neighbour[d.complement]
}

sig InteriorPoint extends Point{}{
#ran[neighbour] = #Direction
}

fact tengen_exists {
#InteriorPoint = 1
}

abstract sig Direction{
complement : one Direction
}{
symmetric[@complement]
}

one sig N extends Direction{}{complement = S}
one sig S extends Direction{}

one sig E extends Direction{}{complement = W}
one sig W extends Direction{}
A little bit of tidying up and I check in. The model is consistent and cannot be shown to be invalid. The example instance looks respectable too—so long as we focus on the interior point and don't worry about how its neighbours relate to one another, which is clearly wrong. But there are no tests for that. This diagram has been cleaned up in OmniGraffle to focus on the interior point but the .dot of the original is checked in. Maybe another time I'll sort out the regular points.

Thoughts

Wow, that was hard work. Took a long time, too (although not so long as the timestamps make it look: I was also doing laundry and so forth during the elapsed time). Does that make what is after all merely a rectangular array of points a "complex domain"? No. I'm out of practice with this kind of thing, and not fluent with the tool. Even so, I'm impressed by how good a fit the TDD cycle seems to be for this formal modelling tool. I even got into a bit of trouble towards the end but was rescued by recalling the TDD technique of making the tests dumb and repetitive—but concrete and clear. And the same subtle trap of thinking too far ahead applies here, too.

Would this have helped with the microcell predictor? Maybe not. Alloy doesn't do numbers at all well, and that was an intrinsically numerical problem. Could this approach help with other things? I think so. This tiny model(ling) problem has turned out to be harder and more time-consuming than I expected it to be, but it is my first go with the tool. I'm going to play around with it some more, as time allows, and see what comes up.

I'm certainly impressed with the tool. Alloy comes very close to making powerful models an everyday tool for the working programmer, but I don't think it's quite there yet. The gulf, as it always was, is between that very succinct model and working code. How to bridge that gulf in a useful way I don't yet know.

TDD, Mocks and Design

Mike Feathers has posted an exploration of some ideas about and misconceptions of TDD. I wish that more people were familiar with this story that he mentions:
John Nolan, the CTO of a startup named Connextra [...] gave his developers a challenge: write OO code with no getters. Whenever possible, tell another object to do something rather than ask. In the process of doing this, they noticed that their code became supple and easy to change.
That's right: no getters. Well, Steve Freeman was amongst those developers and the rest is history. Tim Mackinnon tells another part of the story. I think that there's actually a little bit missing from Michael's description. I'll get to it at the end.


A World Without Getters

Suppose that we want to print a value that some object can provide. Rather than writing something like statement.append(account.getTransactions()) we would write something more like account.appendTransactionsTo(statement). We can test this easily by passing in a mocked statement that expects to have a call like append(transaction) made. Code written this way does turn out to be more flexible, easier to maintain and also, I submit, easier to read and understand, partly because this style lends itself well to the use of Intention Revealing Names.
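Here's a minimal sketch of that style, in the spirit of the post rather than anyone's actual Connextra code. The Account, Statement and Transaction types are invented for the example, and I've used Mockito for brevity rather than the jMock lineage those developers built; the point is only that the test sends a message and verifies a message, and never pulls state out with a getter.

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import org.junit.Test;

// Hypothetical types for illustration; note how narrow the Statement interface is.
interface Transaction {}
interface Statement { void append(Transaction t); }

class Account {
    private final Transaction latest;
    Account(Transaction latest) { this.latest = latest; }

    // "Tell, don't ask": no getTransactions(); the account pushes its own data.
    void appendTransactionsTo(Statement statement) {
        statement.append(latest);
    }
}

class AccountTest {
    @Test
    public void appendsItsTransactionsToTheStatement() {
        Transaction transaction = mock(Transaction.class);
        Statement statement = mock(Statement.class);

        new Account(transaction).appendTransactionsTo(statement);

        // The only observation made is that the right message was sent.
        verify(statement).append(transaction);
    }
}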

This is the real essence of TDD with Mocks. It happens to be true that we can use mocks to stub out databases or web services or what all else, but we shouldn't. Not doing that leads us to write code for each sub-domain within our application in terms of very narrow, very specific interfaces with other sub-domains and to write transducers that sit at the boundaries of those domains. This is a good thing. At the largest scale, with functional tests, it leads to hexagonal architecture. And that can apply equally well recursively down to the level of individual objects.

The next time someone tries to tell you that an application has a top and a bottom and a one-dimensional stack of layers in between like pancakes, try exploring with them the idea that what systems really have is an inside and an outside and a nest of layers like an onion. It works wonders.

If we've decided that we don't mock infrastructure, and we have these transducers at domain boundaries, then we write the tests in terms of the problem domain and get a good OO design. Nice.
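To make the narrow-interface-plus-transducer idea concrete, here is a small sketch under assumptions of my own (TransactionSource, JdbcTransactionSource and BalanceCalculator are invented names, not from the post): the domain code depends only on a tiny port, the adapter at the boundary does the translating, and tests mock the port rather than the database.

import java.util.Collections;
import java.util.List;

// A narrow, domain-specific interface: the only thing this sub-domain asks of the outside world.
interface TransactionSource {
    List<Integer> amountsFor(String accountId);
}

// The transducer at the boundary: it translates between infrastructure terms
// (SQL, drivers, rows) and domain terms. We don't mock this in domain tests;
// it earns its keep in integration tests instead.
class JdbcTransactionSource implements TransactionSource {
    public List<Integer> amountsFor(String accountId) {
        // JDBC plumbing elided for the sketch.
        return Collections.emptyList();
    }
}

// Domain code written purely against the narrow interface; a test can hand it
// a mocked or hand-rolled TransactionSource and never touch a database.
class BalanceCalculator {
    private final TransactionSource transactions;
    BalanceCalculator(TransactionSource transactions) { this.transactions = transactions; }

    int balanceOf(String accountId) {
        int total = 0;
        for (int amount : transactions.amountsFor(accountId)) {
            total += amount;
        }
        return total;
    }
}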


The World We Actually Live In

Let's suppose that we work in a mainstream IT shop, doing in-house development. Chances are that someone will have decided (without thinking too hard about it) that the world of facts that our system works with will live in a relational database. It also means that someone (else) will have decided that there will be an object-relational mapping layer, based on the inference that since we are working in Java(C#) which is deemed by Sun(Microsoft) to be an object-oriented language then we are doing object-oriented programming. As we shall see, this inference is a little shaky.

Well, a popular approach to this is to introduce a Data Access Object as a facade onto wherever the data actually lives. The full-blown DAO pattern is a hefty old thing, but note the "transfer object" which the data source (inside the DAO) uses to pass values to and receive values from the business object that's using the DAO. These things are basically structs; their job is to carry a set of named values. And if the data source is hooked up to an RDBMS then they more-or-less represent a row in a table. And note that the business object is different from the transfer object. The write-up that I've linked to is pretty generic, but it seems to invite the inference that the business object is a big old thing with lots of logic inside it.
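For concreteness, a transfer object tends to look something like this (the AccountTO name and its fields are mine, invented for the example): a bag of named values shadowing a row in a table, with no behaviour of its own.

// Illustrative transfer object: nothing but fields, getters and setters.
public class AccountTO {
    private long id;
    private String owner;
    private int balanceInPence;

    public long getId() { return id; }
    public void setId(long id) { this.id = id; }

    public String getOwner() { return owner; }
    public void setOwner(String owner) { this.owner = owner; }

    public int getBalanceInPence() { return balanceInPence; }
    public void setBalanceInPence(int balance) { this.balanceInPence = balance; }
}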

A lot of the mechanics of this are rolled up into nice frameworks and tools such as Hibernate. Now, don't get me wrong in what follows: Hibernate is great stuff. I do struggle a bit with how it tends to be used, though. Hibernate shunts data in and out of your system using transfer objects, which are (let's say) JavaBeans festooned with getters and setters. That's fine. The trouble begins with the business objects.


Irresponsible and Out of Control

In this world another popular approach is, whether it's named as such or not, whether it's explicitly recognized or not, robustness analysis. A design found by robustness analysis (as I've seen it in the wild, which may well not be what's intended, see comments on ICONIX) is built out of "controllers", big old lumps of logic, and "entities", bags of named values. (And a few other bits and bobs.) Can you see where this is going? There are rules for robustness analysis and one of them is that entities are not allowed to interact directly, but a controller may have many entities that it uses together.

Can you imagine what the code inside the update method on the GenerateStatementController (along with its Statement and Account entities) might look like?
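Something like this, perhaps. This is a guess of mine, not anyone's real code, and the nested Account, Statement and Transaction interfaces are invented for the sketch; the shape is the thing to notice: all the behaviour lands in the controller, which pulls the passive entities apart with getters.

import java.util.List;

class GenerateStatementController {
    // Entities as data bags: in practice these would be the DAO's transfer
    // objects, festooned with accessors.
    interface Transaction { String getDate(); int getAmount(); }
    interface Account { int getOpeningBalance(); List<Transaction> getTransactions(); }
    interface Statement { List<String> getLines(); void setClosingBalance(int balance); }

    // All the domain logic gathers here, outside the objects it concerns.
    void update(Account account, Statement statement) {
        int balance = account.getOpeningBalance();
        for (Transaction t : account.getTransactions()) {
            balance += t.getAmount();
            statement.getLines().add(t.getDate() + "\t" + t.getAmount() + "\t" + balance);
        }
        statement.setClosingBalance(balance);
    }
}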
Hmmm.


Classy Behaviour

Whenever I've taught robustness analysis I've always contrasted it with Class Responsibility Collaboration, a superficially similar technique that produces radically different results. The lesson has been that RA-style controllers always, but always, hide valuable domain concepts.

It's seductively easy to bash in a controller for a use case and then bolt on a few passive entities that it can use without really considering the essence of the domain. What you end up with is the moral equivalent of stored procedures and tables. That's not necessarily wrong, and it's not even necessarily bad depending on the circumstances. But it is completely missing the point of the last thirty-odd years' worth of advances in development technique. One might almost as well be building the system in PRO*C.

Anyway, with CRC all of the objects we find are assumed to have the capability of knowing things and doing stuff. In RA we assume that objects either know stuff or do stuff. And how does a know-nothing stuff-doer get the information to carry out its work? Why, it uses a passive knower, an entity which (ta-daaah!) pops ready-made out of a DAO in the form of a transfer object.

And actually that is bad.


Old Skool

Back in the day the masters of structured programming[pdf] worried a lot about various coupling modes that can occur between two components in a system. One of these is "Stamp Coupling". We are invited to think of the "stamp" or template from which instances of a struct are created. Stamp coupling is considered (in the structured design world) one of the least bad kinds of coupling. Some coupling is inevitable, or else your system won't work, so one would like to choose the least bad ones, and (as of 1997) stamp coupling was a recommended choice.

OK, so the thing about stamp coupling is that it implicitly couples together all the client modules of a struct. If one of them changes in a way that requires the shape of the struct to change then all the clients are impacted, even if they don't use the changed or new or deleted field. That actually doesn't sound so great, but if you're bashing out PL/1 it's probably about the best you can do. Stamp coupling is second best, with only "data" coupling as preferable: the direct passing of atomic values as arguments. Atomic data, eh? We'll come back to that.
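A quick sketch of the difference, with invented names (Employee and monthlySalary are mine, not from the structured design literature): the stamp-coupled version depends on the whole record even though it reads one field, so every client is implicated when the record's shape changes; the data-coupled version receives only the atomic value it needs.

class Coupling {
    // The shared "stamp": change its shape and every client module is implicated.
    static class Employee {
        String name;
        int salaryInPence;
        String department;
    }

    // Stamp coupled: takes the whole struct, uses only one field of it.
    static int monthlySalary(Employee employee) {
        return employee.salaryInPence / 12;
    }

    // Data coupled: takes just the atomic value it actually needs.
    static int monthlySalary(int salaryInPence) {
        return salaryInPence / 12;
    }
}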

However, the second worst kind of coupling that the gurus identified was "common coupling". What that originally meant was something like a COMMON block in Fortran, or global variables in C, or pretty much everything in a COBOL program: just a pile of values that all modules/processes/what have you can go and monkey with. Oops! Isn't that what a transfer object that comes straight out of a (single, system-wide) database ends up being? This is not looking so good now.

What about those atomic data values? What was meant back in the day was what we would now call native types: int, char, that sort of thing. The point being that these are safe because it's profoundly unlikely that some other application programmer is going to kybosh your programming effort by changing the layout of int.
And the trouble with structs is that they can change. And the trouble with transfer objects covered in getters and setters is that they can, too. But what if there were none...


Putting Your Head in a Bag Doesn't Make You Hidden

David Parnas helped us all out a lot when in 1972 he made some comments[pdf] on the criteria to be used in decomposing systems into modules
Every module [...] is characterized by its knowledge of a design decision which it hides from all others. Its interface or definition was chosen to reveal as little as possible about its inner workings.
Unfortunately, this design principle of information hiding has become fatally confused with the implementation technique of encapsulation.

If the design of a class involves a member private int count then encapsulating that behind a getter public int getCount() hides nothing. When (not if) count gets renamed, changed to a big integer class, or whatever, all the client classes need to know about it.
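A tiny sketch of the distinction, with names of my own choosing: the getter republishes the design decision (there is an int called count) to every client, while the behavioural methods keep that decision private.

class Counter {
    private int count;

    // "Encapsulated", but not hidden: rename count, or change it to a big
    // integer class, and every client calling getCount() has to know about it.
    public int getCount() { return count; }

    // Information hiding: clients state their intent and never learn how the
    // tally is represented.
    public void increment() { count++; }
    public boolean hasReached(int limit) { return count >= limit; }
}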

I hope you can see that if we didn't have any getters on our objects then this whole story unwinds and a nasty set of design problems evaporate before our eyes.


What was the point of all that, again?

John's simple-sounding request: write code with no getters (and thoroughly test it, quite important that bit) is a miraculously clever one. He is a clever guy, but even so that's good.

Eliminating getters leads developers down a route that we have known is beneficial for thirty years, without really paying much attention. And the idea has become embedded in the first really new development technique to come along for a long time: TDD. What we need to do now is make sure that mocking doesn't get blurred in the same way as a lot of these other ideas have been.

First step: stop talking about mocking out infrastructure, start talking about mocks as a design tool.

Debunking Debunking Cyclomatic Complexity

Over at SDTimes, this article by Andrew Binstock contains a claim that this result by Enerjy somehow "debunks" cyclomatic complexity as an indicator of problems in code. He suggests that what's shown is that, for methods of low complexity (which is overwhelmingly the most common kind), increasing complexity is not (positively) correlated with the likelihood of defects. Binstock suggests that:
What Enerjy found was that routines with CCNs of 1 through 25 did not follow the expected result that greater CCN correlates to greater probability of defects.
Not so. What Enerjy say their result concerns is:
the correlation of Cyclomatic Complexity (CC) values at the file level [...] against the probability of faults being found in those files [...]). [my emphasis]
It's interesting all by itself that there's a sweet spot for the total complexity of the code in a file, which for Java pretty much means all the methods in a class. However, Binstock suggests that
[...] for most code you write, CCN does not tell you anything useful about the likelihood of your code’s quality.
Which it might not if you only think about it as a number attached to a single method, and if there are no methods of high complexity. But there are methods of high complexity—and they are likely to put your class/file into the regime where complexity is shown to correlate with the likelihood of defects. Watch out for them.
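To make the method-level versus file-level distinction concrete, here is a small sketch of my own (the Pricing class is invented, and the counts use the usual decision-points-plus-one rule): each method looks tame on its own, but the file-level figure that Enerjy measured aggregates across all of them, and that is what climbs.

class Pricing {
    // Cyclomatic complexity 1: straight-line code, no decision points.
    int base(int quantity, int unitPrice) {
        return quantity * unitPrice;
    }

    // Cyclomatic complexity 3: the loop and the if each add a decision point.
    int discountable(int[] orders, int threshold) {
        int total = 0;
        for (int order : orders) {      // +1
            if (order > threshold) {    // +1
                total += order - threshold;
            }
        }
        return total;
    }

    // A file full of "tame" methods like these can still have a large total.
}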

Red and Green

Brian Di Croce sets out here to explain TDD in three index cards. It's a nice job, except that I'm not convinced by the faces on his last card.

Brian shows a sad face for the red bar, a happy face for the green, and a frankly delirious face for the refactoring step. There's something subtly wrong with this. We are told that when the bar is green the code is clean, which is great. But the rule is that we only add code to make a failing test pass, which implies a red bar. So, the red bar is our friend!

When I teach TDD I teach that green bar time is something to get away from as soon as possible. Almost all the time a green bar occurs when a development episode is incomplete: not enough tests have been written for the functionality in hand, more functionality is expected to go into the next release, or some other completeness condition is not met.

It's hard to learn from a green bar, but a red bar almost always teaches you something. Experienced TDDers are very (and rightly) suspicious of a green-to-green transition. The green bar gives a false sense of security.

Generally speaking, in order to get paid we need to move an implementation forward, and that can only be done on a red bar. Wanting to get to the next red bar is a driver for exploring the functionality and the examples that will capture it. 

I tell people to welcome, to embrace, to seek out the red bar. And that when the bar is red, we're forging ahead.

Gauges

The "test" word in TDD is problematical. People are (rightly) uncomfortable with using it to describe the executable design documents that get written in TDD. The idea of testing has become too tightly bound to the practice of building a system and then shaking it really hard to see what defects fall out. There is an older sense of test, meaning "to prove", which would help but isn't current enough. Fundamentally, though, these artefacts are called tests for historical reasons (ie, intellectual laziness). One attempt to fix this vocabulary problem has the twin defects of going too far in the direction of propaganda, and not far enough in the actual changes it proposes.

In any case, I'm more interested in finding explanatory metaphors to help people use the tools that are currently widely available and supported than I am in...doing whatever it is to people's heads that the BDD crowd think they are doing. Anyway, I've found that it's a bit helpful to talk about test-first tests being gauges (as I've mentioned in passing before). Trouble is that too few people these days have done any metalwork.

A Metaphor Too Far


So, the important thing about a plug gauge or such is that it isn't, in the usual sense, a measuring tool. It gives a binary result, the work piece is correctly sized to within a certain tolerance or it isn't. This makes, for example, turning a bushing to a certain outside diameter a much quicker operation than it would be if the machinist had to get out the vernier micrometer and actually measure the diameter after each episode of turning and compare that with the dimensioned drawing that specifies the part. Instead, they get (or assemble, or make) a gauge that will tell whether or not a test article conforms to the drawing, and use that.

And this is exactly what we do with tests: rather than compare the software we build against the requirement after each development episode, we build a test that will tell us if the requirement is being conformed to. But so few people these days have spent much time in front of a lathe that this doesn't really fly.

But, flying home from a client visit today my eye was caught by one of those cage-like affairs into which you dunk your cabin baggage (or not). It would be far too slow for the check-in staff to get out a tape measure, measure your bag, and compare the measurements with the permitted limits. So instead, they have a gauge. From now on (until I find a better one), that's my explanatory metaphor. Hope it works.
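In code terms the gauge idea amounts to just this (a sketch of mine, with illustrative dimensions rather than any airline's real limits): the check gives a yes-or-no verdict against the specification, rather than a measurement for someone to compare by hand.

class BaggageGauge {
    // Permitted cabin-bag dimensions in millimetres (illustrative numbers only).
    static final int MAX_LENGTH = 560, MAX_WIDTH = 450, MAX_DEPTH = 250;

    // Binary verdict: the bag fits the cage or it doesn't.
    static boolean fits(int length, int width, int depth) {
        return length <= MAX_LENGTH && width <= MAX_WIDTH && depth <= MAX_DEPTH;
    }

    public static void main(String[] args) {
        System.out.println(fits(550, 400, 200));  // true: within every limit
        System.out.println(fits(600, 400, 200));  // false: too long, no measuring needed
    }
}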