There's a kind of psychological tool called a "projective test" in which the subject is exposed to a deliberately ambiguous stimulus, asked a question like "what could this be?", and then their response to it is very carefully studied. The belief (largely discredited) is that the subject will read into the stimulus all sorts of revealing stuff about their inner tensions, repressed fears and so forth.
Just recently, a sort of projective test for programmers was executed via reddit. This old thing was the ambiguous stimulus, and the response revealed a lot about the inner tensions of the programming community--or the reddit-editing part of it, anyway.
It seems to me that no-one in Rickert's story comes away looking like a hero: not Charles, not Alan and definitely not their managers, although all for different reasons. This is modulo the fact that both Alan and Charles delivered (that's how you can tell it's a work of fiction). It might make for an interesting interview to put the story in front of a candidate (especially one for a role involving any project or line management) and have them discuss it. More interesting than some interview techniques, anyway.
So, that was 1985. Twenty years later, where are we as regards discussion of process, its merits and demerits? Well, we nowadays get this sort of thing, and its aftermath.
Cruising?
So, a client is looking to set up a new continuous integration (CI) environment. For historical reasons the default choice is CruiseControl but there are a bunch more tools around, including the very interesting new Build-o-Matic.
Build-o-Matic exists because it turns out to be quicker and easier to build a CI server that does exactly what you want, from scratch, in Python, than to configure the XML-infested Cruise to do something only quite a lot like what you want.
Still, Cruise is fully enterprisey, which doesn't just mean overly complicated, bloated, opaque, and difficult to use--it also means unlikely to trigger the corporate immune system. Which mainly means that the folks who manage the build machine will tolerate it. There's a whole other discussion to be had about organisations that expect developers to be productive but won't give them control of their environment.
Anyway, it'd be nice to suggest B-o-M to my client, but I have a strong suspicion that it would take longer than the lifetime of the project to get B-o-M approved for use (Ivan knows that this is an issue and is working to make it better). What to do, what to do...
Well, how about this: this is a green-field project, and although there's a web front-end coming down the line, all the work being done now is real programming with actual POJOs, so we can keep the build short. In fact, the build script is going to be a pretty opinionated thing, quite capable of failing your pre-check-in local build if it takes too long (or exhibits any one of a quite long list of other characteristics). So, what will we lose if we have a cron job that checks out head and builds it every now and again (say, every build-time * 3), and a little web page that watches that job and reports what the cron job is up to?
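For concreteness, here's a minimal sketch of the sort of thing I mean, assuming a Subversion checkout and an Ant build (the repository URL, commands and paths are all illustrative):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;

// Kicked off by cron every build-time * 3 minutes: update the working copy,
// run the build, and leave a one-line status file for a trivial web page to
// serve. Repository URL, build command and paths here are all illustrative.
public class PollingBuild {
    public static void main(String[] args) throws IOException, InterruptedException {
        Path workDir = Paths.get("/var/tmp/ci-work");
        int checkout = run(workDir, "svn", "checkout", "http://example.com/repo/trunk", ".");
        int build = (checkout == 0) ? run(workDir, "ant", "clean", "test") : checkout;
        String status = (build == 0 ? "PASSED at " : "FAILED at ") + Instant.now();
        Files.write(Paths.get("/var/www/build-status.txt"), status.getBytes());
    }

    private static int run(Path dir, String... command) throws IOException, InterruptedException {
        Files.createDirectories(dir);
        return new ProcessBuilder(command).directory(dir.toFile()).inheritIO().start().waitFor();
    }
}

Point cron at that and the "little web page" need only serve up the status file.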
Is this a complete CI solution? No. But then why would we need one of those?
Lessons from life
On a recent business trip to Germany I visited a vineyard. Seems as if climate change in Europe means that the Rheinland vintners can now make reasonable reds, but the (Merlot, in this case) grapes still struggle in the cooler weather. So, in September time the winemaker goes out and removes, by hand, any unripe fruit before the harvest is made and the wine begun.
The polyglot colleague who kindly translated the winemaker's story put it that they "sacrifice quantity for quality". Of course, I shrugged, they reduce scope.
Winemakers have an immovable "delivery date", the time after which the fruit will be past its best. And since, in the case of the Fleischer family, they believe that their business can only flourish if their product is excellent, they prefer to deliver less if that maintains the value of what is delivered. Why is this obvious to a winemaker, but not to so many programmers?
Tidbits from Agile 2006
These are some items from Agile 2006 which interested me enough that I wrote them down.
Peter Coffee's keynote
- it's a wonder that spreadsheets don't have features built in to deal with uncertainty
- tooling to improve programmer efficiency makes software so much cheaper to write that much more software is worth writing: and so generates jobs for programmers, not the converse
- 3M apparently has a target for revenues generated by products less than three years old
- the value of a codebase is not dependent on the effort that went into building it
- nervousness is the enemy of innovation
An OpenSpace session
- Agile development manifests the Kolb learning cycle
- if you wanted to certify a team as Agile, maybe you'd only really need to certify the facilitators of the retrospectives
- maybe a retrospective (or similar explicit learning activity) is about deliberately moving from unconscious competence to conscious incompetence
- ...and maybe also has something to do with Heidegger's distinction (pdf) of ready-to-hand from present-at-hand. Something that I've had cause to ponder before in its relation to software.
Ole Jepsen's session "Agile Meets Offshore"
- perhaps counter-intuitively, distributed teams tend to have shorter iterations than colocated ones
Laurent Bossavit and Emmanuel Gaillot's session "Tool Words and Weapon Words"
- You are likely in a difficult spot when people start saying things which can be given an "objective" spin but are really claiming the property on the left for themselves so as to imply the property on the right of you:
- natural vs artificial
- important vs childish
- life vs death
Owen Rogers' Agile Estimation tutorial
- have the "customer" provide estimates, as well as the developers: when the two are wildly different there's something to learn
- the size of a story as estimated by the developers is independent of its business value
- the stories that happen to be in a given iteration are a random sample from the backlog
Johanna Rothman's hiring tutorial
- using puzzle questions at interview discriminates against candidates who aren't white upper-middle class suburban American males.
- some white etc. folks find this idea rather upsetting
Ward's Ten Favourite Wiki Pages
- arguing for refactoring is arguing for your own lack of ability
- if you don't know calculus you aren't equipped to "embrace change"
"Agile Architecture" at Agile 2006
Last week I was at the Agile 2006 conference in Minneapolis, and I see from the records that some of you were too: hello again to anyone I met there.
My session was in the first slot after the opening keynote on Monday morning, and I got the impression that some of the folks there hadn't fully grasped what the nature of a Discovery Session was; some other presenters found the same, I think. I believe that the organisers might do more next year to help attendees understand what's likely to be expected of them at such sessions: they aren't tutorials. Of course, my session was a re-presentation of an XP Day session, which, like Spa sessions (the other sort I'm used to presenting), tend to be exploratory, open-ended and interactive in a way that the typical Agile session seems not to be.
Outputs
As soon as I get unpacked from moving house I'll have the London results here to compare with, but a selection of the Minneapolis results are here. The session was bookended by very quick brainstorms, to try and capture any change in the attendees' thinking after the discussions and the work with the Lego.
Before
And a couple of longer items:
- Have an idea of what possibilities your customers may need to explore
- Not for conventional projects, developers know too much business and not only code by success criteria
- provide guide rails for teams to make decisions within
After
It looks to me as if the aims of the session were met.
The conference wiki also bears some comments on the session.
Complexity and Test-first 2
The story came from here.
Of course, I'm not the first to notice that software (or OO software, anyway) has the scale-free property. These folks have done a bunch of interesting work on the scale-free nature of the object graph in running systems, and also on various static properties of code. I might just, though, be the first person to observe that the distributions are quantitatively different for different design methodologies.

What the Wellington group are trying to do is disprove what they call the Lego Hypothesis: that all software can be built out of lots of small interchangeable components. Instead, they claim, because of the scale-free nature of software, the biggest components in bigger systems will be bigger than the biggest ones in smaller systems. I'd read some of this research before, but had forgotten that--consciously, anyway--until reminded of it by Kent Beck. Thanks, Kent.
The story continues here.
Complexity and Test-first 1
The story began here.
Interesting to note that a similar discussion, albeit brief, sparked up on the XP egroup. I posted a link to my previous blog post there, but it seemed to get lost in the flood. Oh well. It will be interesting to compare my results with those in the Muller paper.
Anyway, I've managed to find the time to look at a very small sample of Java codebases, find their distributions of complexity, fit each to a Pareto distribution and take a look at the slope of the best-fit straight line. Here's the outcome, codebases ordered by published unit tests per unit of total cyclomatic complexity (where applicable):
| codebase | #tests / total CC | slope |
|---|---|---|
| jasml 0.10 | 0 | 1.18 |
| logica smpp library | 0 | 1.39 |
| itext 1.4.1 | 0 | 1.96 |
| jfreechart 1.0.1 | 0.02 | 2.43 |
| junit 3.8.2 | 0.14 | 2.47 |
| ust (proprietary) | 0.35 | 2.79 |
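For the curious, here's a sketch of the fitting procedure (not the actual script I used): bucket the per-method complexities, form the cumulative count of methods at or above each complexity, and fit a straight line to the log-log points by ordinary least squares. The negated gradient is the slope quoted in the table.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class ParetoSlope {
    // Fit log(count of methods with complexity >= c) against log(c) by
    // ordinary least squares and return the slope as a positive number.
    public static double slope(int[] complexities) {
        SortedMap<Integer, Integer> histogram = new TreeMap<>();
        for (int c : complexities) histogram.merge(c, 1, Integer::sum);

        List<Integer> descending = new ArrayList<>(histogram.keySet());
        Collections.reverse(descending);
        double sx = 0, sy = 0, sxx = 0, sxy = 0, n = descending.size();
        int cumulative = 0;
        for (int c : descending) {
            cumulative += histogram.get(c); // methods with complexity >= c
            double x = Math.log(c), y = Math.log(cumulative);
            sx += x; sy += y; sxx += x * x; sxy += x * y;
        }
        double gradient = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        return -gradient; // e.g. something like 2.79 for the test-driven codebase above
    }
}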
A few points present themselves. Each of the codebases with no published tests has a substantially lower slope (and so a substantially greater representation of more complex methods) than any of those with. And among those with published tests, the number of tests per "unit" of complexity is positively correlated (at about 0.96--very good for "social science", reasonable for "hard science", but this is "computer science", so who knows?) with a higher slope, and so with a preference for simpler methods.
The story continues here.
No graphs? Check the date
Hello. If you've found that some pages here have not been rendering properly, it's likely that you were looking at them at the end of June or beginning of July 2006. The problem was that I moved house, thus changing which PSTN exchange my ADSL connection was carried by, and for some reason my ISP couldn't deal with that without taking my whole online presence through them out of service--I only noticed the problem when my email account stopped working. I apologise on their behalf.
Complexity and Test-first 0
It recently occurred to me to run a cyclomatic complexity tool over a codebase I know to have been largely written test-first (test/code/refactor, and often test-driven). The code is Java, so I used this Eclipse plugin to measure the per-method complexity.
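For orientation: cyclomatic complexity counts the linearly independent paths through a piece of code, one for the code itself plus one for each decision point. So this illustrative method (my example, not one from the codebase) scores 3:

// Complexity 3: the method itself (+1), the loop (+1) and the if (+1).
int countPositives(int[] values) {
    int count = 0;
    for (int value : values) {
        if (value > 0) {
            count++;
        }
    }
    return count;
}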
The distribution of complexity per method looked like this:
Note that the count of methods is shown on a logarithmic scale.
There were 19,917 methods in this codebase. The mean complexity per method was 1.46, and the median complexity was 4.5. Not shown on the chart are a few outliers: one method with complexity 91, one with 47, and one with 35.
The Extrema
The complexity 91 method turned out to be the doGet of a servlet providing a simple web service, crammed full of if (method.equals("getLocales")) printLocales(out, ids); style dispatching code. Similarly, the complexity 47 method is mainly concerned with calling out to a third-party API and finding which of a small number of equivalence sets a value from a large domain of return codes falls into. It does this by means of a huge switch with many, many fall-throughs. So in these cases we see a disconnect between the arithmetic of cyclomatic complexity and any notion of the code being complicated, in the sense of difficult to understand (and therefore maintain).

Down at the other end of the scale, the tens of thousands of complexity 1 and 2 methods are mostly of these sorts:
- constructors
- bean methods
- forwarders
Interpretation
In one discussion by McCabe there is the suggestion that 10 is an important threshold for cyclomatic complexity. The SEI treatment gives 11 as the lower bound of the "simple program, without much risk" class. They don't state quite what a "program" is in this sense; I guess they mean "subroutine" in the structured programming sense. As we can see, methods with complexity >= 10 are vanishingly uncommon in the codebase I examined--evidence that the team working on this code are keeping per-method complexity under good control.

Now, while we can be confident that these methods are overwhelmingly individually very simple, the codebase itself is nonetheless considered (by the folks that work on it) highly complex. It does a lot of different things, interacts with many external systems, and although it is all written in Java it contains several very different technology stacks. So the distribution of cyclomatic complexity isn't the whole story. The total complexity of the whole codebase is about 20,000. It has far fewer than 20,000 test methods (about 7,000, in fact), as a simple-minded application of cyclomatic complexity suggests would be required. Although, if you leave out the complexity 1 methods, the total complexity is only 9,000 or so, which paints a different picture.
I don't want to get embroiled in a discussion of whether or not cyclomatic complexity even makes sense for object-oriented code, not least because the code inside a Java method isn't particularly object-oriented. I do want to examine a mainstream notion of complexity and testing (the rule being that the complexity equals the smallest number of test cases required to cover all paths through the code) in the light of code written test-first. Once I've found a suitable candidate method and tests to explore, I'll let you know.
Wait a minute though...
Something about the shape of that chart caught my eye. It looks very much as if it has something close to a power-law distribution. This shows up better on a log-log rendering:

If we switch to a cumulative count of methods we can extract a pretty convincing Pareto distribution:
It so happens that inside the same repository is a bunch of Java that I'm confident was not written test-first (and certainly not test-driven). That doesn't mean that it's bad code, just that it was written differently. What does its complexity chart look like?

Looks like fairly good scale-free behaviour, but let's look at a fitted Pareto distribution:

Not as good a fit. But perhaps more significantly, the slope is about half the slope of the test-first code: 1.39 vs 2.79.
Hypotheses
We can imagine all sorts of reasons for these differences (not least that the non-test-first code is an order of magnitude smaller than the test-first, which might account for the lower R-squared), but I'm interested in further investigating the tentative hypothesis that the test-code-refactor cycle results in code with a distribution of complexity closer to Pareto, and with a steeper slope, than traditional approaches produce.

If that were true, how might we use it? Well, if the complexity distribution is Pareto/Zipf then there is no "preferred" complexity for the codebase. I'd imagine that if there were a preferred complexity then that might point to a (perhaps widely dispersed) collection of methods that could be candidates for refactoring. Question: do the well-known code smells [pdf] cause such a thing?

I'd further guess that the higher level of duplication expected in an inadequately refactored codebase (not DRY enough, not OAOO enough) would cause a bit of a hump in the complexity distribution. A sag in the distribution I have less of an intuition about, although it might point to something along these lines. You'll have to dig around in Prof Salingaros's work to see what I'm getting at, but it's worth it.

Also, if the slope of the Pareto distribution is low then there will be proportionately more of the more complex methods in the codebase relative to the less complex ones--that can't be good, can it?

Well, if it weren't for the day job I'd be looking into this in earnest. But as it is, I'm going to tinker about with this some more as and when I get the time. I'll report back when I do. If anyone else out there does any measurements like this I'd be fascinated to hear about it.
The story continues here.
Staff to Your Methodology
Bill Sempf posted an interesting hypothesis regarding programmer ability and choice of methodology. A sound enough notion, though I'm not sure I agree with the details; still, replying did give me a chance to tell my Royce story.
Doing yourself no favours
There was a time a couple of years ago when I would often travel between Penrith and London on the West Coast Main Line. The WCML is always having a lot of engineering work done on it, so these journeys were often held up or interrupted, and took place on the filthy, slow, unreliable old train sets that were all the decrepit permanent way could bear at the time.
At Euston there was a display, put on by Railtrack, showing some of the track relaying work going on. Not only did this video loop show off the smart yellow machine that lifts the old track, removes old ties, lays new ballast, new ties and new track (while a bunch of platelayers stand around leaning on their shovels watching it) but the video was speeded up. In the video, the machine whizzed along. To see Railtrack showing off this technology did nothing to improve the mood of those travellers who had been delayed by works.
Why are you talking about trains?
I was reminded of this when I came to Don Box's exposition of "code==data" in the new C#. He shows some C# along with allegedly equivalent Scheme, like this. Here's the C#:

Expression<Func<int, int>> e = a => a + 1;
and the Scheme
(define e (quote (lambda (a) (+ a 1))))
Well, people who like this sort of thing will find that this is the sort of thing that they like.
Don't you especially love the way that C#'s type system, being at the "sour spot" for such things, requires the programmer to tell the compiler that the code she's writing implements a mapping from int to int? When in fact the implementation given would work for a range of types. It's a bit of a puzzle why this should be required. The way to avoid programmers having to do this is well known, and some of the leaders in that style of working even already work for Microsoft.
The syntax is (to me at least) ugly, verbose and unclear. So, I somehow doubt that putting these pieces of code next to one another is going to make anyone fall in love with the expressive power of C#. Added to which, this just is not, in general, what anyone who knows Lisp means by "code==data". They mean macros.
Doing Design
In an extract from the new edition ("2.0", indeed) of Software Conflict Robert L. Glass revisits his prior discussion of "design" in software. He focuses on the undoubted fact that design is a cognitive process. It's perhaps a sad reflection on the state of the industry at the time he was writing that this should have been a noteworthy observation.
Where to begin? How to proceed?
Anyway, it seems that Glass undertook a commendable self-assessment of his stance on design, found it wanting, and turned to the research of Bill Curtis and Elliot Soloway. As reported by Glass, they came up, after a series of "protocal studies", with roughly this description of what programmers do when they design:

- Construct a mental model of a proposed solution to the problem
- Mentally execute the model to see if it solves the problem
- It doesn't, so see which parts of the problem the model fails on and enhance it
- Repeat until good enough
By the way, Google wants "protocal" to be a typo for "protocol" but apparently it isn't. "Protocal" seems to mean "talking about what you're doing while you do it."
The problem with the design process as described above is that it has by itself nothing to say about where these mental models come from, or how they are compared or improved. Glass quotes Charles Simonyi on this, thus:
The first step in programming is imagining. Just making it crystal clear in my mind what is going to happen.

Which all, as a description of the design "process", seems only a little more useful than the Feynman Problem Solving Algorithm (*). Especially since one of Glass's goals in studying this matter was to find a better way to teach design.
Tools for the mind?
In the 2006 update to this essay Glass reports on the outcome of the Curtis/Soloway research: nothing. Apparently, they gave up. Glass states:

Because the nature of design, they discovered, is so intensely cognitive, happening inside the mind at mind speed, the researchers could conceive of no useful tools to help in that process!

What!? No useful tools to help with a process "inside the mind"? Come now! I'm not sure I believe that this is what two smart guys like that concluded, unless by "tool" they meant some clanking monstrosity that would, you know, run on a computer.
Well, programmers aren't the only people who do design, nor the only ones who introspect upon their own design activity. In 1964 (about 20 years before the research Glass is talking about) Christopher Alexander produced his doctoral thesis, Notes on the Synthesis of Form, which describes how the "imagining" takes place within a culture and a tradition (the design patterns movement inspired by Alexander seeks to make these things explicit for the society of software builders, and others), in both self-conscious and un-self-conscious modes, and more importantly explains how to traverse the loop from less good to more so. Laurent Bossavit gives a good overview of these ideas. Alexander even went so far as to write some computer programs to help architects and planners use these ideas. That would be a "tool". Oh dear.
As Prof. Sir Tony Hoare has put it:
If only we could learn the right lessons from the successes of the past, we would not need to learn from our failures.
(*) The Feynman Problem Solving Algorithm
- Write down the problem
- Think very hard
- Write down the solution
Agile Architecture lessons from SEI
My colleague Tim and I addressed the matter of "agile architecture" at the XPDay conference last year, in a way that raised a few eyebrows. We'd be doing it again elsewhere this year, but the other conferences that we submitted the session to didn't have the imagination or guts to let us put the thing on, I guess.
Architecture, by definition
Anyway, recently this article from the SEI ("Leading the world to a software-enriched society") came to my attention. The SEI offers this definition of "software architecture"
The software architecture of a program or computing system is the structure or structures of the system, which comprise software elements, the externally visible properties of those elements, and the relationships among them.

This strikes me as a rather poor definition, since it means that, for example, I can extract some architecture from this program:
#include <stdio.h>
int main (int argc, char** argv){
    printf("Hello World\n");
    return 0;
}

This is source code to be compiled into a binary component intended for deployment into a container. There are many free and commercial implementations of this container, and there's a really great book that explains all about how the container instantiates components like this, wires them up to the other components they rely on (that #include <stdio.h> thing there is a bit of declarative configuration information that tells the container that this component is going to use some services from the stdio component). There's an entrypoint into the component that the container will recognise, and the component will indicate to the container, using a standard value, that it has completed its work successfully (whether it has or not, but when did you ever see anyone check the return value of printf?)

Thus, my program, er, I mean component here has a structure composed of software elements, these have externally visible properties, and there are relationships among them. Obviously, then, this system is just dripping with architecture. Almost as much as this paragraph is with sarcasm.
This is not what anyone means by "architecture", but based on the SEI's definition it's hard to see why not. (Once upon a time, though, the facilities used by my component were a matter of live architectural discourse--but that sort of change over time is for another posting)
Distinctively Architectural
Now, this sort of problem happens all the time with definitions, which is why definitions suck for serious thinking about the world. The authors of the SEI article kind-of dodge coming right out and saying that, but instead adopt a much smarter stance and start to talk about what distinctions can usefully be drawn between "architecture", "design" and the rest.
Speaking of drawing, they note in passing that it is a source of confusion to use a notation primarily intended to capture design ideas (UML) to capture architectural ideas. Why do we keep shooting ourselves in the foot like this? Especially since, as an industry, we have plenty of alternatives that we could use. But I digress.
Well, the authors present, in fine consultant style, a 2 x 2 taxonomy (apparently it is the "underlying dynamic structure of 2 x 2 modeling that brings richness, depth and a uniquely transformational power to this simple form". Whatever.) And of course, being architects, they call this taxonomy an "ontology". They might not have noticed that it's a 2 x 2 model, though, because they only give three terms. Using their distinctions, intensional vs extensional and local vs non-local, the diagram they don't provide would look like this:
| | Intensional | Extensional |
|---|---|---|
| non-local | architecture | ??? |
| local | design | implementation |
Hmm, how fascinating! What is it that addresses non-local extensional thoughts? (Hey, maybe that's the uniquely transformational power doing its thing ;)
So, architecture is to do with what global constraints there are on how the parts of the system are chosen and assembled. I might be wrong, but I'm guessing that anyone who writes for the SEI thinks that in general architectures come first and are somehow realised or filled in during design and implementation. That fits well with the idea that architectural ideas are intensional--they don't say how to build the system, but they do say what general properties any system built to that architecture will have, what declarative constraints they all satisfy.
An Agile view Emerges
But what about the Agile setting? Well, that's a broad church, so what specifically about the XP setting? In XP we first build a spike (very thin, goes from one end to the other) to learn about the essence of the problem/solution. Then we build out more thin vertical slices of functionality, refactoring as we go. Where's the architecture? If it is non-local then it can only become evident when there are enough of these slices in place for there to be anything that isn't local. And with test-driven development we very much proceed in an extensional fashion, filling in explicit cases one at a time, and discovering the generalities through refactoring.
So the architecture (non-local, intensional) can only arise late in the game. It can only emerge.
Hypothesis for further investigation: the non-local extensional thing might just be the system metaphor, which is stated in quite concrete, explicit, albeit general, terms and is intended to apply to the whole system. And is stated early. If so, maybe the model presented in the SEI article helps explain why so many folks want the system metaphor to serve instead of architecture, and why they get disappointed when that doesn't work.
Lessons from Life
Motorcycles, post-conditions and idempotent operations
Whenever during my programming work I have to think about pre- and post-conditions, or idempotent operations, I always think back to the Compulsory Basic Training course I had to pass to get my motorcycle licence.
The point came when the instructor was explaining the indicators (aka turn signals). On most bikes these are operated by a switch on the left-hand grip. It has two degrees of freedom: sliding to left or right, or being pressed inwards. All of these are intermittent contact operations and a spring returns the button to its home position (in the middle, popped out) when it is released. Sliding the button to the left or the right activates the indicators on that side, pressing the button in deactivates the indicators.
Now, when riding a bike, unlike when driving a car, there isn't the tick-tick-tick of the flasher to let you know that the indicators are on (well, there is, but you can't hear it over the wind noise and the beat of the engine through your helmet), and neither is there a switch on the steering to deactivate the indicators automatically after a manoeuvre. And while it's very dangerous to ride along with a misleading indication going, so too is it to take your eyes off the road and peer around the bike to see if the indicators are still going. So, said the instructor, if you aren't sure whether you cancelled the indicators after the last turn, don't bother to look to find out if they're still on--just press the button.
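In code terms it might look like this--a toy sketch of my own, not anything the instructor said: cancel() establishes its post-condition regardless of the current state, and is idempotent, so the rider doesn't need to know the state before invoking it.

class Indicators {
    private enum State { OFF, LEFT, RIGHT }
    private State state = State.OFF;

    void signalLeft()  { state = State.LEFT;  }
    void signalRight() { state = State.RIGHT; }

    // Idempotent: safe to call whether or not an indicator is on, and calling
    // it twice is the same as calling it once. Post-condition: state == OFF.
    void cancel() { state = State.OFF; }
}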
You have to be a programmer to get this confused
The Simpsons Building in Piccadilly (now home to Waterstone's flagship store) is, in its public areas, roughly this shape:

Where the area that in my diagram is the horizontal cross-bar meets the vertical section there are a pair of fire-shutters. Closer up, it looks like this:
On the wall between the apertures for the shutters is a small notice (indicated by the thicker line) stating that "the area between these shutters must be kept clear". But what is that area? I thought at first that it was the area between the shutter on the left of the notice and the shutter on the right of it, until I turned around and saw the large sales desk (shown in brown). On closer inspection this desk has two gaps in it, to allow the shutters to close. I've shown a matching pair of shutters behind the desk, which seems right but I couldn't absolutely swear to it. Anyway, the little island of desk in the middle occupies the space between where the left and right shutters would be when closed: my reading can't be correct.
So then it occurred to me that the "these shutters" to which the sign refers are each of the shutters on either side of the notice and its respective shutter behind the desk. That's confusing. Confuses me, anyway.
I once saw a very good documentary about fire safety, starring a New York fire marshal. At one point he pulled out his copy of the fire safety code for buildings in NY, NY and said "this is like a holy book, everything it says in here was put there because of a time when a lot of people died." Which is an interesting take on religion, but that's not what I wanted to talk about. This notice about the fire shutters, in a busy semi-public space, is important. Understanding it correctly could well be a matter of life or death. The fire shutters are a life-critical system, and yet an important part of their operation is woefully under-specified. The notice could carry a more precise explanation of just which shutters it is whose intervening area must be kept clear, and that would help, but really the problem is moving prematurely from specification to implementation.
What is required is that "these shutters must always be able to be closed completely". The person who drafted the notice has thought about a likely scenario that could prevent this from being true and prohibited that. There's also an attempt to capture a requirement about an action (closing the shutters must always complete) with a static constraint (the area between the shutters is clear).

I suspect that normal people wouldn't give the instructions on the sign a second thought. Actually, most normal people probably wouldn't notice it at all.
Test-first and flow
Frank Sommers has proposed a conflict between test-first and "flow". To answer the question he asks at the end of the article "Do you really write your tests first?" I find it easiest to quote Jason Yip from the ensuing discussion on artima: "Yes I do... except when I don't and I pay for it every time..." These days I even think differently about the two activities: test-first is programming. Anything else may involve the creation of code, but really is exploratory noodling about--which has its value but the results should be discarded.
I wonder where Frank learned his test-first? I suspect that he hasn't ever sat down and paired for a significant time with a master of the art. It is a skill and can't be picked up just from reading about it and having a bit of a go. Also, test-first is only a transitional practice to test-driven, which is where the big wins come from.
In any case, I find Frank's appeal to the theory of flow to explain why he doesn't enjoy test-first programming rather bizarre. He quotes Csikszentmihalyi on some of the characteristics of flow:
- The activity must present just the right amount of challenge: if the activity is too hard, we become frustrated; if too easy, boredom ensues.
- There must be clear and unambiguous goals in mind.
- There needs to be clear-cut and immediate feedback about the activity's success.
- Finally, our focus during the activity must center on the present - the activity itself.

And he suggests that these are somehow in conflict with the practice of test-first. I find the exact opposite, and I'm not alone in that. I'm going to talk more about TDD than merely test-first, because I don't practice the one without the other any more.
I find that test-driven development is the way that I turn programming into an activity with the right amount of challenge. I'm just not smart enough to program any other way. I did somehow manage to get paid to program in the days before I adopted TDD, but I no longer understand how that can possibly have happened. YMMV.
In writing a TDD test I am explicitly setting a clear and unambiguous goal for the next few minutes' programming.
The red bar -> green bar -> refactor cycle provides very clear-cut and very immediate feedback on my programming activity.
And, as Frank rightly points out, test-first works best on a very small scale. In fact, TDD encourages you to program in very small steps (and, perhaps counterintuitively, the smaller the steps the faster you make progress). So I find that writing the tests first encourages me to center on the present.
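For anyone who hasn't seen the cycle up close, here's a minimal sketch of one turn of it in the JUnit 3 style of the day (the Money example is illustrative, not from Frank's article): the test is written first and fails (red bar), then the least code that passes it is written (green bar), then both are refactored.

import junit.framework.TestCase;

public class MoneyTest extends TestCase {
    // Written first: states the goal for the next few minutes' programming.
    public void testAddition() {
        Money five = new Money(5, "CHF");
        assertEquals(new Money(10, "CHF"), five.plus(five));
    }
}

class Money {
    private final int amount;
    private final String currency;

    Money(int amount, String currency) { this.amount = amount; this.currency = currency; }

    Money plus(Money other) { return new Money(amount + other.amount, currency); }

    public boolean equals(Object other) {
        if (!(other instanceof Money)) return false;
        Money money = (Money) other;
        return money.amount == amount && money.currency.equals(currency);
    }

    public int hashCode() { return 31 * amount + currency.hashCode(); }
}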
Fascinating that two people can have such utterly opposed experience with the same technique.
Agile film-making?
The possibility of parallels between these disciplines came up on the XP egroup recently, which reminded me of this wiki posting I made a few years ago. Seems the idea is gaining more currency.
Job (and other) Trends Revealed
Over at Quoderat someone called iain made a comment on this posting:
Search for commoditized skills like Java, SQL, C++ and you find x,000 jobs advertised. Ideally of course you'd have a longitudinal view on this; you'd have data on the number of jobs requiring this and that skill across time, and understand the rise and ebb of particular skills.

Well, thanks to the kind folks at job search aggregator indeed.com we can do exactly that.
Their site can display the occurrence of ads with particular search terms in them over roughly the past year. You have to be careful with false positives, though. The origin of the name of the programming language Forth is a (forced) mis-spelling of "fourth", as in "generation computer", but "forth" is still a word. It appears frequently in ads for positions in which someone will have to go back and forth. I was amazed at the popularity of this niche language, until I figured that one out.
Dynamic Languages Duke it Out...
Anyway, take a look at this:

Don't worry about the extreme tininess of the numbers. When indeed.com say "all jobs", they mean all jobs. That Python accounts for 0.15% of them is a salutary reminder that there's a huge old world of work out there where what programming language you prefer isn't even a comprehensible question.
So, this looks like good news for Pythonistas, and very encouraging news for Rubyists. iain would suggest that this is a sign that the "elite" (whomever they are) should be thinking about moving on from Python. A dubious proposition.
Speaking of elites, that Lisp curve is worth drilling down to:
Unless I wildly miss my mark, that's the 2005 Y Combinator Summer Founders Program wildly distorting an entire job market. Let's not be rude about Lisp's 0.005% of the job scene baseline. (Actually, I suspect that most Lisp jobs never get as far as being publicly advertised, so tight-knit is that community).
...and then get Put in Their Place
But let's put our little posse of dynamic languages into context:

Ouch!
That "java" appears in as much as one fiftieth of "all jobs" is cause for sober reflection. Quickly followed by a stiff drink.
In Other news...
This is absolutely fascinating:

Oh yes, that's something I've waited to see. Mid October 2005, a date (range) to remember: "agile" overtook "RUP". Notice, though, how the "agile" and "RUP" curves seem to fall into step afterwards. This is a potential trend worth watching. As is the seemingly imminent overtaking of XP by Scrum.
Increasingly often now there are lead or management type jobs being advertised with some "Agile methods or RUP an advantage" style requirements. Or even, "Agile methods such as RUP". Hmmm. Well, there are those that will claim that RUP done right is Agile. But I digress.
Thinking about closely correlated search terms, how about these two head-to-head competitors:
That's close to the point of being spooky.
Mull this over
This one pretty much speaks for itself. Couldn't be more zeitgeisty:

Some thoughts about you...
And a big السلام عليك (peace be upon you) to my readers in Arabia.
One of the great things about teh interweb is how you can set one part of it to watch another.
From this I know that, over the last month, only 7.5% of visitors here used IE. Thus I'm fairly sanguine about the fact that the template doesn't render well for them. It's a shame, and I'm sorry. But I'm not going to fix it, either. You folks should do like the 83% who use Firefox.
To the 80% of readers who are first-timers, I say welcome. And to the 20% of repeat readers, I'm gratified that you found the site interesting enough to return to.
Just what do you need an RDBMS for, anyway?
If you are in the business of writing anything like an "enterprise application" (and these days, that's pretty much every programmer whose target platform doesn't come sealed in epoxy) then you store your data in an RDBMS.
And if you're at all up to speed with developments in the art of programming over the past thirty-odd years you write the application "logic", middle tier, call it what you will, in an object-oriented language (or, at least, something that looks quite a lot like one from a sufficient distance). And so you spend a good deal of your time dealing with a thing called the "Object-Relational Impedance Mismatch".
There's money to be made from this. Oracle recently purchased what was originally a Smalltalk O-R mapping tool, Toplink. Like most of its ilk it allows programmers to maintain (by mostly manual processes) a huge pile of XML relating a bunch of tables in a database to a bunch of classes in a codebase. I focus on Toplink because that's where my experience with the topic largely lies, not through any great love or hatred of the product. Well, anyway, the aim of these tools is "transparent persistence", whereby modifications to the mapped objects make their way to the database automagically. (*) So, there's this arbitrary graph of objects that gets mapped onto this strict table-based model, via hierarchical text-based metadata. Ooo-kay.
"Translucent" Persistence
Except that the object graph isn't arbitrary. And the "transparency" is of a rather murky kind; "translucent persistence" might be a better term. At the time that you write the classes from which the object graph will be instantiated, the mapping tool will impose all sorts of constraints on what you can do, and how you do it. This is very far from the promise of orthogonal persistence. What happened to JSR 20, that's what I want to know.

For instance, if you want to get hold of an object that (you have to know) happens to be backed by the underlying database then what you do is obtain an entry point to the data, and search over it for the object you want. Unlike the navigation that you might expect to do in an object graph, instead you build...queries. Like this:
ExpressionBuilder builder = new ExpressionBuilder();
Expression expression =
    builder.get("surname").equalsIgnoreCase("Smith");
Person p =
    (Person) aSession.acquireUnitOfWork().readObject(
        Person.class,
        expression);

Yup, that's pretty transparent alright. Can't tell that there's anything other than plain old in-memory objects there. For Java fans: isn't it especially wonderful how the programmer had to mention the name of the class three times? So that it will be true, presumably.
Building these query objects can be a pain, though, if they get all complicated--which they do, always. I know one team that ended up rolling its own library of query-building classes. Had a load of methods on them with names like "select" and "orderBy" and so forth...

Why do programmers put up with this? A lot of the time, habit. Which is to say, not thinking terribly hard about what it is they do that adds value.
To be fair, the modern RDBMS is a pretty amazing thing: high performance in time and space, robust, reliable, available...it's just that a lot of application programmers are a bit confused, I believe, about where the benefit of using an RDBMS lies.
If only an RDBMS were Relational
While it's true that what we have today in the widely used RDBMSs is a pale reflection of what the relational model is capable of, they are nevertheless closely related. So let's take a look at the ur-text of relational theory, Codd's paper of 1970 (back when "data base" was still two words). What's of interest about it is that it doesn't talk about persistence much at all. There is mention in passing that the data is "stored" in the "data bank" but the focus throughout is on the structure of the data, how that structure can be preserved in the face of updates to data and to what we would now call the schema, and how users of the data can be isolated from the details of how the data is held and changed.

Flowing from this are two principal advantages of using an RDBMS to store data:
- ease of framing ad-hoc queries for decision support
- robustness in the face of concurrent updates from multiple sources
What if the data "never" changed?
If you are indeed in the business of writing anything like an enterprise app (and if you aren't, how come you read this far?), ask yourself this: how much of the code in the data source layer (or whatever you call it) of your stack could go away if the data it supplies to the layers above never changed? By "never", of course, I mean "not during the lifetime of an application instance".

Now, the RDBMS crowd already distinguish between OLTP and OLAP (and a very interesting debate that is, too--no really, it is), and the endpoint of moving in the OLAP direction is the data warehouse. And a lot of regular RDBMS practice, predating the warehouse, allows for a highly optimised "read-only" route into the data involving a bunch of denormalized views and such. It's a failure mode to "denormalize for performance" even before the schema is built, never mind used for long enough to identify the frequent access patterns that will benefit from that denormalization, but when some user is going to need the same report updated time and again: go for it.
Data "corner shop"
Data warehouses are generally expected to be 1) gigantic, 2) concerned with transactional data that accumulates at a high rate (leading to 1) and 3) aimed at sophisticated reporting (the "analytical" bit of OLAP). The smaller volume, more task-specific version is the "data mart".

Let's say that we are building a system where the data needed to satisfy the wants of an end-user changes over periods much longer than the duration of a user's typical interaction with the system. Why, then, would we worry about all that aSession.acquireUnitOfWork().readObject( blah blah blah stuff? We would have no need to do that for each user session. If the data changed slowly enough, there'd be no need to do it for each instantiation of the (web) application itself. We can imagine that the "transactions" getting consolidated in the warehouse (and condensed in the mart) might be things like "add a new item to the catalogue", or "change the price of this item". Happens on the timescale of days--or weeks. And furthermore, these changes might be known about, and actioned, some time in advance of being turned on for a particular application. But we have this idea available that access to the data that involves rapidly changing entities doesn't have to go by the same route, or be maintained in the same way, as that which changes slowly.

Phew! So, rolling all that together you can start to think about doing what the team who re-implemented SQL in Java, whom I mentioned earlier, did. Or, rather, what one of the more capable programmers did in his spare time because he was fed up with all that junk. Every now and again (say, once every 24 hours) run a bunch of queries (that's your data mart) over the data and pull out all the currently useful bits and write them to a file in one simple encoding or another. Then, when the application gets bounced, it reads that data and builds its in-memory object graph and uses that to serve user requests. Call out to the DB for really volatile stuff, but how much of that do you really, really have? Really?
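The shape of that is simple enough to sketch. Everything here--the table, the file format, the CatalogueItem class--is invented for illustration; the post shows none of that team's actual code:

// A sketch of the nightly-extract-then-load approach described above.
import java.io.*;
import java.math.BigDecimal;
import java.sql.*;
import java.util.*;

public class CatalogueSnapshot {

    static class CatalogueItem {
        final long id; final String name; final BigDecimal price;
        CatalogueItem(long id, String name, BigDecimal price) {
            this.id = id; this.name = name; this.price = price;
        }
    }

    // Run from a nightly cron job: run the "data mart" queries and dump
    // the currently useful bits to a flat file, one tab-separated line each.
    static void extract(Connection db, File snapshot) throws Exception {
        try (PrintWriter out = new PrintWriter(new FileWriter(snapshot));
             Statement stmt = db.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT id, name, price FROM catalogue_item")) {
            while (rs.next()) {
                out.println(rs.getLong("id") + "\t" + rs.getString("name")
                        + "\t" + rs.getBigDecimal("price"));
            }
        }
    }

    // Run at application start-up: read the file back and build the
    // in-memory object graph that serves user requests until the next bounce.
    static Map<Long, CatalogueItem> load(File snapshot) throws IOException {
        Map<Long, CatalogueItem> items = new HashMap<>();
        try (BufferedReader in = new BufferedReader(new FileReader(snapshot))) {
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                String[] f = line.split("\t");
                items.put(Long.valueOf(f[0]),
                        new CatalogueItem(Long.parseLong(f[0]), f[1],
                                new BigDecimal(f[2])));
            }
        }
        return items;
    }
}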
Do this and I'll bet you find, as that team did when the databaseless version of their app was deployed, that it goes faster and scales better. Oh, and by isolating the application (in their case, completely) from a database they were able to very easily pull out a chunk of functionality to sell as a stand-alone application and/or plugin to other systems and enabled a whole new product line for their business. Nice.
This sort of decision should be the work of architects, but too often wouldn't even come up for discussion as an option. Why is that?
(*) No approbation of the rather sinister ESR or his dubious schemes should be construed from this reference to the Jargon File
Keywords, Magic and (E)DSLs
Imagine you could write programs with text like this:
aJanitor open: aDoor with: aKey.

Pretty sweet. How might the implementation of open:with: look? Like this:

self insert: aKey into: aDoor; turn: aKey; push: aDoor.

Imagine how productive you could be if you could write code this way. Imagine how easy it would be to maintain code written this way. It's so clear what this code achieves (though maybe not quite how it works) that I'm not even going to explain it. This is Smalltalk.
A big part of the reason why Smalltalk is so great is this keyword message syntax (blocks help a lot, too, as we shall see). It allows code like the above, which looks very much like natural language. If you are more familiar with curly bracket languages or similar, it can take a little effort to get used to reading code like this, but it's worth it.
No Magic
There's more to it than just ease of reading and writing, though. Smalltalk has no magic. Well actually, it has a lot of magic, but it enables all programmers to be magicians, whereas certain other languages hide their magic away. The language implementers grant themselves wizardly status, but deny it to the language user.

One kind of magic is a little bit of laziness around choosing alternatives. In Java we choose between two alternatives like this:

if(condition){
    this.doOneThing();
} else {
    this.doAnotherThing();
}

and we can be confident that one thing or the other will happen, but not both. That's actually quite clever, although the order in which Java is usually taught hides this. Imagine that if were a method on some object...ah yes, turns out that Java isn't really object oriented after all. Java has objects that instantiate classes that declare methods, it's true, but the code inside those methods, with if and switch and the rest, isn't object-oriented. Such a shame, as we shall see.

Anyway, imagine that Java were an OO language and that if, therefore, were a method. There would be a potential problem (assuming the near universal eager evaluation of function arguments). Our imagined ObjectJava code might look like this:

if(condition, {doOneThing();}, {doAnotherThing();});

which kind-of suggests that both the one thing and the other would get done. And this is the magic of the keyword if and its funny syntax. What are those things with the {}'s? They're little lumps of code (possibly with local variables declared in them), and one of them gets run and one doesn't, under programmatic control. That's a pretty powerful feature, too powerful for the Java wizards to allow us working programmers to wield.

The syntax of our invented ObjectJava is pretty bad there, );}); isn't a thing of beauty, although it's not much worse than the way some real Java looks. The Smalltalk equivalent is much neater. Continuing with our janitorial example:

aDoor isAlarmed ifTrue: [self disarm: aDoor] ifFalse: [self openNormally: aDoor].

The [], conspicuously unlike the {} of Java, creates a new object, a so-called block, which contains the code to run. And the interesting thing about this is that ifTrue:ifFalse: is just a method. It's a method of the class Boolean. And Boolean has two subclasses: True and False, each of which is a Singleton. The sole instance of True is called true, and similarly for False.

The implementation of ifTrue:ifFalse: in True is(+):

ifTrue: t ifFalse: f
    ^ t value

and in False it is

ifTrue: t ifFalse: f
    ^ f value

(Note: ^ is how Smalltalk spells "return" and value is a method on blocks that returns the value of the code inside the block.)

This is pretty much exactly the implementation of Booleans used in the Lambda Calculus, and that fact reveals one aspect of the close relationship between OO and functional programming.
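To make that concrete for the Java fans, here is the same trick transplanted into (modern) Java, with lambdas standing in for blocks--a sketch of the idea only, and emphatically not how java.lang.Boolean actually works:

import java.util.function.Supplier;

public class ChurchBool {

    abstract static class Bool {
        // Selection is just a method; only the chosen "block" ever gets evaluated.
        abstract <T> T ifTrueIfFalse(Supplier<T> t, Supplier<T> f);

        static final Bool TRUE = new Bool() {   // sole instance, like Smalltalk's true
            <T> T ifTrueIfFalse(Supplier<T> t, Supplier<T> f) { return t.get(); }
        };
        static final Bool FALSE = new Bool() {  // sole instance, like Smalltalk's false
            <T> T ifTrueIfFalse(Supplier<T> t, Supplier<T> f) { return f.get(); }
        };
    }

    public static void main(String[] args) {
        Bool isAlarmed = Bool.TRUE;
        System.out.println(isAlarmed.ifTrueIfFalse(
                () -> "disarm the door",   // the block that runs
                () -> "open normally"));   // the block that doesn't
    }
}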
(Embedded) Domain Specific Languages
Perhaps most astonishing about the Smalltalk approach is that Boolean values and selecting different actions based upon them is not part of the language. These are facilities provided by methods of classes in the Smalltalk standard library! This library (the "standard image") turns out to contain a large number of overlapping Embedded Domain Specific Languages--one of which, provided by the classes Boolean, True and False, is specific to the domain of two-valued logic. That's a very remarkable thing. Most remarkable is what it says about the business of writing programs in Smalltalk.

There is no mechanism available to the Smalltalk programmer to create programs other than the creation of EDSLs.

Even the Smalltalk programmers who write Smalltalk itself don't have any other (*) mechanisms available to them. And that's the benefit of No Magic: anything the language implementers can do, you can do too. And you end up doing what the language implementers do, that is, implement (a) language(s). Our example,

self insert: aKey into: aDoor; turn: aKey; push: aDoor.

is nothing more (and emphatically nothing less) than a statement about janitors, doors and keys written in an EDSL that knows about...janitors, doors and keys.
Dot Dispatch language
What about those of us who are not fortunate enough to be able to use Smalltalk in our work? What can be done in the languages where regular programmers are second-class citizens?

The best example that I've seen of an EDSL in Java is the language used to describe expectations in jMock. This language supports a certain style of programming that many folks are finding valuable. jMock allows us to write programs to talk about how other programs will behave. Like this:

mock.expects(once()).method("m").with( or(stringContains("hello"),
                                          stringContains("howdy")) );

again, I think this is clear enough without explanation. You would use code like this within a programmer test to ensure that the test failed if some other object collaborating with our mock doesn't call into the mock in the expected way.

It's worth digging into the implementation of jMock, not only because it is delightful code, but also to see the amount of heavy lifting that has to go on behind the scenes to make it possible for Java programmers to write code of the clarity seen above.
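The general shape of the trick is easy enough to sketch--and this is only the general shape, not jMock's real internals, which do far more:

// Fluent-interface sketch: each stage of the "sentence" is typed to expose
// only the legal next word, so a mis-ordered chain of calls fails to compile.
public class FluentSketch {

    interface MethodClause { WithClause method(String name); }
    interface WithClause   { void with(String argumentConstraint); }

    static class Expectation implements MethodClause, WithClause {
        private final String cardinality;
        private String methodName;

        Expectation(String cardinality) { this.cardinality = cardinality; }

        public WithClause method(String name) {
            this.methodName = name;
            return this;                 // next legal word: with(...)
        }

        public void with(String argumentConstraint) {
            System.out.println("expect " + cardinality + " call to "
                    + methodName + " with " + argumentConstraint);
        }
    }

    static MethodClause expects(String cardinality) {
        return new Expectation(cardinality);
    }

    public static void main(String[] args) {
        // Reads like the jMock example above; the compiler checks the word order.
        expects("once").method("m").with("a string containing 'hello'");
    }
}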
Productivity?
Building DSLs is the bread and butter of Smalltalk (and Lisp) programming, but is a bit of a struggle in the Java (and similar) worlds. The big vendors are attempting to fix this through the use of mighty tools in the interests of supporting a new-but-old-but-new model of development, a rather fishy proposition at best.

This is symptomatic of one way in which the industry has decayed. The message of Smalltalk (and Lisp) is that the route to productivity is to use simple tools with few features and allow everyone interested to build upon them. The favoured route at the moment is to encode every good idea into an all-singing all-dancing "solution", take it or leave it.
Once, the computer itself was locked away, ministered to by a priestly class who mediated your desire to perform computation. Then the (personal) computer revolution began and we could all gain direct access to our computing power, and grow our own way of using it. But then something went wrong and a new priestly class--the tool builders in the corporate software vendors--arose to try and put the genie back in the bottle (to mix an increasingly muddled metaphor). They must not be permitted to succeed.
(+) Well, kinda. If it were, then it would be, but for practical reasons it isn't. But for our purposes, it's exactly as if it is. Don't worry about it.
(*) Well, they only have the one: they can hack the virtual machine itself. But then so can you.
Agility, Architecture and Scale
Is "scale later" a fatal mistake? And further, a symptom of the indiscipline and sloppy software development that some folks hide behind claims of Agility, according to Aditya.
Well, Aditya is commenting upon this advice from 37signals:
In the beginning, make building a solid core product your priority instead of obsessing over scalability and server farms. Create a great app and then worry about what to do once it's wildly successful. Otherwise you may waste energy, time, and money fixating on something that never even happens. [emphasis in original]

That's sound Agile and/or Lean advice. Aditya counters with a reference to this piece by Werner Vogels, CTO of Amazon.com. Clearly, Werner knows a thing or two about large systems. And clearly, any new functionality that his teams implement must be capable of handling the sort of load that Amazon's customer base generates, within Amazon's large-scale, high-performance, high-availability system. But Amazon have been in business for over a decade as I write this, and that's not the scenario that 37signals are talking about. They are talking about what to do when you're in the first year of your new web-based business and you need to get a product in front of paying customers right now, or else there isn't going to be any scalability problem ten years down the line. Or even two.
Pragmatic?
37signals is a Ruby (on Rails) house, so they're joined up with the PragmaticProgramming movement. In The Pragmatic Programmer Hunt and Thomas present some techniques that will lead to a system that will have some degree of scalability:

- tip 13 Eliminate effects between unrelated things
- tip 41 Always design for concurrency
- tip 45 Estimate the order of your algorithms
- tip 46 Test your estimates

These essentially architectural practices have other drivers, and other benefits, but they will also introduce into your system the points of flexibility and cleavage planes into which mechanisms for improving scalability may be inserted. Inserted later.
Now, the pragmatists hardly look as if they are indisciplined and sloppy, if they are following those tips. But there are always those other guys. For instance, while many fine developers would agree that often optimization is your worst enemy, there'll always be someone to come along with a counter story of a time when the optimization really should have been done. Similarly, the Extreme Programming advice that You Aren't Gonna Need It, and Do the Simplest Thing that Could Possibly Work, is countered with stories about times when folks think that using them would have failed. Furthermore, folks claim that these practices are used by bad programmers to justify, well, being indisciplined and sloppy (usually because XP doesn't explicitly mention whatever technique their most recent book was about). In my experience it requires a great deal of discipline on the part of the typical programmer to apply YAGNI and DtSTtCPW, so action oriented, operationally biased and generally in love with doing the next cool, clever thing are they.
In isolation, out of context, these practices can be damaging. But one remedy is Worst Thing First. Notice the examples that Ron gives:
In the PlanningGame, users assign priority to stories, among other reasons, because of risks they perceive. "System must be able to pay 60,000 people in two hours. High priority." Developers assign technical risk similarly. "System must be able to marshal up to 60,000 networked PCs simultaneously for 1 minute each. High risk." We sit together and build consensus on what is risky, what needs a SpikeSolution, what order to do things. Is there a better way to identify worst things than to get together and work on it?

Those are performance and scale requirements. Being addressed up front. Because the customer wants them, so you really are going to need them.
Abstract Scalability
But that's scale, not scalability. Werner Vogels defines scalability this way:

A service is said to be scalable if when we increase the resources in a system, it results in increased performance in a manner proportional to resources added

This is an abstract quality referring to a hypothetical situation. A terribly difficult thing to do engineering about. When might you need to feel confident about this scalability? Well, if you were running a web 1.0 startup and your business model relied on capturing a dominant slice of the global market for your particular genius product in very short order, you might. You might very well need to be seen (by your VCs) spending money on the big iron needed to handle the load that would have to come to earn the revenue that would show them any kind of return.
On the other hand, if you know that the first day of the rollout of your new product it's going to get banged on by all n tens of thousands of employees of Yoyodyne worldwide, then you'd better be sure that it will operate at that scale. And if you're Werner Vogels and you expect your historical growth in user base to continue at whatever rate it is, you have a definite idea of what extra capacity you'll have to accommodate and when. But abstract scalability? Nah.
The key to the message from 37signals, and from XP is this: how far do you think Google would have got if the first thing Larry and Sergey had done in 1996 was to think "you know, this could get huge, so first up we should design a filesystem to handle huge data volumes across many machines and then devise an algorithm for distributing certain classes of problems across huge clusters" instead of building the search engine itself?
Sustainable Pace Supported
At last, pointers to hard number studies in support of the XP practice of "Sustainable Pace".
Once again, other industries are far ahead of software development in the depth and maturity of their thinking about work. This pull quote sums up the article nicely:
A hundred years of industrial research has proven beyond question that exhausted workers create errors that blow schedules, destroy equipment, create cost overruns, erode product quality, and threaten the bottom line. They are a danger to their projects, their managers, their employers, each other, and themselves. Any way you look at it, Crunch Mode used as a long-term strategy is economically indefensible. Longer hours do not increase output except in the short term. Crunch does not make the product ship sooner--it makes the product ready later. Crunch does not make the product better--it makes the product worse.

Now, there have been times when I, as a development manager, have pushed back against pressure from my management to suggest to my reports that they enter "crunch mode" for an extended period (oh, say, the last quarter of a disappointing year). Once, I put together a spreadsheet that, although I didn't know it at the time, embodied something similar to Chapman's model, as mentioned by Evan. The covering email that announced this spreadsheet explained that I intended it to be used for scenario modelling: think up some scheme for deploying overtime, plug it into the model, see what improvement (or, more likely, decline) in productivity would come about. Or better still, goal-seek to the improvement desired, see what the model had to say about the assumptions that would have to be true for that improvement to materialise, and then judge if those assumptions were reasonable.
Those of you reading this who have any substantial experience in the software industry will not be surprised to learn that I nearly lost my job as a result.
And Evan addresses this:
Managers decide to crunch because they want to be able to tell their bosses "I did everything I could." They crunch because they value the butts in the chairs more than the brains creating games. They crunch because they haven't really thought about the job being done or the people doing it. They crunch because they have learned only the importance of appearing to do their best instead of really doing their best. And they crunch because, back when they were programmers or artists or testers or assistant producers or associate producers, that was the way they were taught to get things done.

There is a circle of abuse here (real abuse; lives, relationships and families are damaged) that must be broken if the software industry is to flourish.
Functional Style and Multiple Returns
In his excellent "programming in the small" series, Ivan Moore suggests that demanding a single exit point for a function is an example of cargo cult programming. I wonder if favouring or disfavouring that idiom has any connection with how much exposure a programmer has had to functional programming. Ivan's example showing the multiple return style:
int foo(boolean condition) {
    if(condition){
        return 5;
    }else{
        return 6;
    }
}

bears a great resemblance to the (much terser) equivalent in Scheme:
(define (foo boolean)
  (if boolean 5 6))

Why is the Scheme version so short? Firstly, Scheme has no syntax to speak of so nothing much gets in the way of expressing the computational idea. And secondly, in the functional paradigm we program by writing expressions for the computer to evaluate, rather than a sequence of instructions for it to follow. A use of foo would look like this:

(display (foo #t))

which would print out 5, since the name #t stands for the boolean constant true. display writes out the value of its arguments. The value of (foo #t) is the value of (if #t 5 6) and if evaluates to the value of its second argument if its first evaluates to true, and the value of its third argument otherwise. That's harder to explain than to show. Anyway, (if #t 5 6) evaluates to 5. Notice that there is no return statement anywhere. All Scheme code is built of expressions that evaluate to some value, so a return is implied wherever there is no further level of evaluation to proceed down to, in this case when evaluating the literal constant 5.

This kind of thinking about programs, as expressions that evaluate to the desired result, leads to a lot of interesting places, can be very valuable, and crops up all over the place. Most importantly, it is relatively easy to think about, which makes it easy to write correct code in this style.
Structured vs. Functional?
Ivan suggests that the single exit point style is a cargo cult much like the goto-less style--and both belong to the so-called "structured programming" tradition. The father of that tradition is E. W. Dijkstra, who offered a method for thinking about the other kind of programming, the sequence-of-instructions form (as seen in C, C++, Java, C#, Pascal, Delphi, etc etc), which is excruciatingly hard. It's a sad irony that the imperative style of programming tends to be taught first when it is so much harder to get right.

It seems to me that to prefer the multiple exit point form is to tend towards the functional (that is, expression evaluating) style, and thus towards easy correctness. And this is largely because functional programs tend to go to the essence of the problem being solved.
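Java does have one little corner of the expression style built in: the conditional operator. Ivan's example recast as a single expression (a trivial sketch):

public class Foo {
    // The whole choice is one expression: no local storage, no sequence of steps.
    static int foo(boolean condition) {
        return condition ? 5 : 6;
    }

    public static void main(String[] args) {
        System.out.println(foo(true));  // prints 5
    }
}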
It is possible to do the single return style in Scheme, since it is an impure language:
(define (foo boolean)
  (let ((return 0))
    (if boolean
        (set! return 5)
        (set! return 6))
    return))

Without explaining how all that works it's still fairly clear that a lot of mechanics has been introduced: some local storage, some extra expressions for updating that storage, explicitly returning the value. All of this has got nothing to do with the essential thing of choosing 5 or 6.

Why would a programmer place upon themselves the burden of understanding and maintaining all that non-essential stuff? And yet so many do, by clinging to the imperative style (do this, and then do this, and then do this) of programming, which in fact requires all the further overhead of structured programming to be tractable.
Visibility and Control
Something I said recently that made the listener's ears prick up, re use of (distributed) standups to manage outsourced development:
I would expect a daily handover would do much to raise visibility to you of what the outsourcers are doing--and visibility is the first step to control
MDA => Agile!?
Came across this article regarding the MDA. It took me long enough to get my head around the idea that it's the Model Driven Architecture, rather than just any old architecture that's driven by models. Except that Haywood tells us in the piece that actually there are two MDAs depending on which tool chain you buy. The difference is that one set of vendors builds tools to support an elaborative workflow from the Computation Independent Model through to a Platform Specific Model and another a translational one. Eh? Translation vs elaboration was one of the big arguments back in the days when there was an active marketplace for OO methodologies. Seems as if the MDA is intended to be transformational. Or possibly elaborationalist, depending it seems on whom you ask.
Steve Cook has this[pdf] (amongst other things) to say about MDA. Now, Steve was partly responsible for the Syntropy method, which features three kinds of model: Essential, Specification and Implementation--reflecting the view that one single modelling vocabulary won't do for capturing knowledge about the world, specifying a software system to deal with it or figuring out how to build such a system. That's a little bit like the MDA's distinction between Computation Independent Models, Platform Independent Models and Platform Specific Models. Kinda. But it's not clear what these models are really supposed to be written in. UML? The MDA is an OMG offering, and UML is the OMG's modelling language.
Unfortunately, it's sometimes quite hard to know what a given UML model is supposed to mean (without, ironically enough, showing the corresponding code), and so it's hard to see how automated translation of UML models is going to go anywhere interesting. During the last run at this sort of thing, the CASE tool boom of around ten years ago, I had a job in a C++ shop where all code was round-tripped through a leading tool. It wasn't unusual to have to go through certain hoops to get this tool to generate code that compiled at all, never mind do what I wanted it to do. Maybe the tools are a lot better now, although "OTI" Dave Thomas's fairly recent assessment of the MDA space makes me think not.
Meanwhile, Haywood makes this rather provocative statement:
Adopting MDA also requires a move towards agile development practices. While like many I'm an advocate of agile processes, it's still foreign to many organizations. The need for agile development follows from the fact that MDA requires models to be treated as software, hence they are the "as-is" view. Those organizations that used UML only for blueprints or sketches (Fowler's analysis [4.]) will find that MDA does not permit the use of UML in that way.

I suspect that it would come as a surprise to many Agile practitioners that adopting the MDA would make an organisation more agile (and that this is because you can't use UMLAsSketch!) But let's examine the claim in a little more detail.
Certainly the idea of keeping your whole model and code stack in sync at all times seems like an Agile idea. I'd expect that to be more of a strain on those folks who in the past have taken to producing huge UML models that no-one ever looks at, and not so much the sketchers, so maybe that does drive you in the Agile direction.
Extremely Agile
I have to be careful here, because I'm not just an Agilist, but an Extreme Programmer, so even other Agilists sometimes think I have some very strange ideas about development. The principles of Agile development compiled by the authors of the Agile Manifesto don't seem to say anything that would stop you from using MDA, but they do suggest that not only do you maintain your UML CIM in sync with your code in the reverse direction (which is what I understand Haywood to be talking about--don't let the code run away from the model) but that you regenerate the whole system CIM->PIM->PSM->code->deploy frequently. The principles suggest:

Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.

Well, yeah. Actually, my preference is for a much shorter timescale: weekly releases do not seem too frequent to me. And I want to be able to release on any given day. And I want all my changes (to whatever artifact) reflected in a deployable system ASAP. By which I mean, within minutes. Beck recommends a Ten Minute Build, which in MDA terms would seem to mean running the whole workflow (modulo dependency analysis) all the way through and executing a suite of tests in ten minutes. Should make the hardware vendors happy.
Peer Review
Just recently I received an email from the organizers of a new conference concerned with the technique of peer review. This event seems to be largely concerned with the review process applied to scholarly papers but it got me thinking about "review by peers", something that I do a lot of.
As a practitioner of Extreme Programming of course I prefer to write production code as one member of a pair. And pair-programming is explicitly an activity needing an effort towards a peer relationship: the higher-powered programmer in the pair has to throttle back a little, and the lower-powered one stretch. Peer-review of papers usually involves a panel of reviewers for each submission, and if you rotate people through pairs often enough (which might turn out to be very often indeed if this experience report[pdf] is anything to go by) then it won't be long before any code change you make has been thoroughly reviewed by the time it gets checked in.
Then again, during the planning game I like to use Delphi techniques to explore possibilities, reach consensus, and obtain estimates. Delphi works by having a group of, as they are called in the technique, "experts", who are by definition peers, iteratively review a proposal or (answer to a) question. Interestingly, the anonymous nature of the input to a Delphi review would seem to fix some of the problems that occur when using less formal means to reach a consensus.
In development shops that don't do pairing, but do care about quality all the same, you'll often come across code review techniques of one sort or another. Fagan Inspection seems to be the best way to get this done with small groups, although in my experience it is ferociously expensive and, although it produces excellent results, perhaps doesn't offer great value for money. YMMV. If you have a (potentially) large group available, then the Open Source route is also a good one. Can be very slow, but also very effective, and (like the National Health Service), free of cost at the point of delivery...
And then there's reviewing conference sessions and papers and journal papers and drafts of books. This can be pretty excruciating work. As a reviewer, I wish more authors would heed this advice when preparing their submissions. As an author, I wish more conferences worked along the lines of the PLoPs, which are all about very intensive, hands-on, in the room, right before your eyes peer review.
Although it is (in)famously possible for a clever and resourceful author to get carefully crafted utter tosh published in a supposedly serious journal, there have also been cases of what seems to be genuine fraud (as it were) getting past the review process in very hard science fields indeed. One has to wonder if this isn't, as with the remarkable ease with which dubious patents may be obtained these days, mostly due to reviewers being snowed under.
On the other hand, the organisers of that conference on peer review I mentioned up at the top are themselves notorious for accepting nonsense papers without review. In a spirit of enquiry I replied politely to the email suggesting that I probably wouldn't be able to attend the conference, but that I would be interested in helping out with the proposed book on peer review--as a reviewer. We'll see what comes of that.