An unreasonably high bar?

Some time ago Adam Goucher posted this response to my 5 questions interview with Michael Hunter. There are a few points in there that I want to come back to, but right now the one that's at the front of my mind is this:
Count tests to get a useless number; I can write a million tests that provide useless information but still shows 7 figures in the count.
Well yes, you could. But why would you? We seem to have a hankering in the industry for techniques that would give good results even when badly applied by malicious idiots. That seems unreasonable. And also pointless: I don't believe that the industry is populated by malicious idiots. On the other hand, the kind of answer one gets depends a lot on how a question is asked. 

There is (I read somewhere recently) a principle in economics that one cannot use one number as both a measure of and a target for the same thing and expect anything sensible to happen. [Allan tells me that this is Goodhart's Law --kb] In our world this is the route to the gaming of metrics. I also don't believe that gaming works by folks consciously sitting down and conspiring to fabricate results. I do believe that if we measure, say, test coverage at every check-in and publish it on our whizzy CI server dashboard thingy and have a trend line of coverage over time and we talk a lot about higher coverage being better, or even that test coverage has something to do with "quality" [that would be the "surrogate measure" part of Goodhart's Law --kb] then it is in fact the response of a smart and well-intentioned team member to write more tests to get the number up. Even if those tests turn out not to be much use for anything else.

I think (certainly I hope) that my recommendation to measure scope by counting tests doesn't fall into that trap. Don't write the tests so that you can measure scope. But observe that you can if you write the tests the right way. About which I shall have a bit more to say later.


allan kelly said...

You're probably thinking of Goodhart's Law -

Usually stated as: Once a measure is used as a target it will change its behavior.
And thus will no longer reliably measure the original thing.

keithb said...

That's the puppy! Thanks.