Agility, Architecture and Scale

Is "scale later" a fatal mistake? And further, is it a symptom of the indiscipline and sloppy software development that some folks hide behind claims of Agility? Aditya thinks so.

Well, Aditya is commenting upon this advice from 37signals:
In the beginning, make building a solid core product your priority instead of obsessing over scalability and server farms. Create a great app and then worry about what to do once it's wildly successful. Otherwise you may waste energy, time, and money fixating on something that never even happens. [emphasis in original]
That's sound Agile and/or Lean advice. Aditya counters with a reference to this piece by Werner Vogels, CTO of Amazon. Clearly, Werner knows a thing or two about large systems. And clearly, any new functionality that his teams implement must be capable of handling the sort of load that Amazon's customer base generates, within Amazon's large-scale, high-performance, high-availability system. But Amazon have been in business for over a decade as I write this, and that's not the scenario that 37signals are talking about. They are talking about what to do when you're in the first year of your new web-based business and you need to get a product in front of paying customers right now, or else there isn't going to be any scalability problem ten years down the line. Or even two.


37signals is a Ruby (on Rails) house, so they're joined up with the PragmaticProgramming movement. In The Pragmatic Programmer Hunt and Thomas present some techniques that will lead to a system that will have some degree of scalability:
  • tip 13 Eliminate effects between unrelated things
  • tip 41 Always design for concurrency
  • tip 45 Estimate the order of your algorithms
  • tip 46 Test your estimates
These essentially architectural practices have other drivers, and other benefits, but they will also introduce into your system the points of flexibility and cleavage planes into which mechanisms for improving scalability may be inserted. Inserted later.
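To make the idea of a cleavage plane concrete, here is a minimal sketch in Python. Every name in it (ProfileSource, load_profile_from_db, render_greeting, cached) is hypothetical, invented for illustration rather than taken from 37signals or from The Pragmatic Programmer. The application code depends only on a narrow seam, so a scalability mechanism (here, a simple in-memory cache) can be slotted in later without touching the callers.

```python
from typing import Callable, Dict

# Hypothetical seam: anything that maps a user id to a profile name.
ProfileSource = Callable[[int], str]


def load_profile_from_db(user_id: int) -> str:
    """Stand-in for a slow lookup; a real system would query a database here."""
    return f"user-{user_id}"


def render_greeting(user_id: int, source: ProfileSource) -> str:
    """Application code: knows only about the seam, not what sits behind it."""
    return f"Hello, {source(user_id)}!"


def cached(source: ProfileSource) -> ProfileSource:
    """Added later, once load demands it: wrap any source with an in-memory cache."""
    store: Dict[int, str] = {}

    def lookup(user_id: int) -> str:
        if user_id not in store:
            store[user_id] = source(user_id)  # only hit the slow path on a miss
        return store[user_id]

    return lookup


if __name__ == "__main__":
    # Day one: the simplest thing that could possibly work.
    print(render_greeting(7, load_profile_from_db))
    # Later: same caller, cache inserted at the seam.
    print(render_greeting(7, cached(load_profile_from_db)))
```

Nothing about render_greeting had to anticipate caching. It only had to avoid being coupled to the lookup, which is what tip 13 asks for anyway.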

Now, the pragmatists hardly look as if they are indisciplined and sloppy, if they are following those tips. But there are always those other guys. For instance, while many fine developers would agree that often optimization is your worst enemy, there'll always be someone to come along with a counter story of a time when the optimization really should have been done. Similarly, the Extreme Programming advice that You Aren't Gonna Need It, and Do the Simplest Thing that Could Possibly Work are countered with stories about times when folks think that using them would have failed. Furthermore, folks claim that these practices are used by bad programmers to justify, well, being indisciplined and sloppy (usually because XP doesn't explicitly mention whatever technique their most recent book was about). In my experience it requires a great deal of discipline on the part of the typical programmer to apply YAGNI and DtSTtCPW, so action-oriented, operationally biased and generally in love with doing the next cool, clever thing are they.

In isolation, out of context, these practices can be damaging. But one remedy is Worst Thing First. Notice the examples that Ron gives:
In the PlanningGame, users assign priority to stories, among other reasons, because of risks they perceive. "System must be able to pay 60,000 people in two hours. High priority." Developers assign technical risk similarly. "System must be able to marshal up to 60,000 networked PCs simultaneously for 1 minute each. High risk." We sit together and build consensus on what is risky, what needs a SpikeSolution, what order to do things. Is there a better way to identify worst things than to get together and work on it?
Those are performance and scale requirements. Being addressed up front. Because the customer wants them, so you really are going to need them.

Abstract Scalability

But that's scale, not scalability. Werner Vogels defines scalability this way:
A service is said to be scalable if when we increase the resources in a system, it results in increased performance in a manner proportional to resources added
This is an abstract quality referring to a hypothetical situation. A terribly difficult thing to do engineering about. When might you need to feel confident about this scalability? Well, if you were running a web 1.0 startup and your business model relied on capturing a dominant slice of the global market for your particular genius product in very short order, you might. You might very well need to be seen (by your VCs) spending money on the big iron needed to handle the load that would have to come to earn the revenue that would show them any kind of return.
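Read literally, that definition can be turned into a number once you have actual measurements. A minimal sketch in Python follows; the function name scaling_efficiency and the sample figures are my own invention for illustration, not anything from Vogels' piece. It compares the speedup you observed with the increase in resources you paid for, so 1.0 means performance grew in proportion and anything lower means it fell short.

```python
def scaling_efficiency(base_resources: float, base_throughput: float,
                       resources: float, throughput: float) -> float:
    """Observed speedup divided by the resource increase; 1.0 is proportional."""
    if base_resources <= 0 or base_throughput <= 0 or resources <= 0:
        raise ValueError("resources and baseline throughput must be positive")
    speedup = throughput / base_throughput
    resource_ratio = resources / base_resources
    return speedup / resource_ratio


if __name__ == "__main__":
    # Made-up measurements: 2 servers handled 1000 requests/s,
    # 8 servers handled 3000 requests/s.
    print(scaling_efficiency(2, 1000, 8, 3000))  # 3x speedup for 4x the servers
```

Note that the calculation needs two real measurements at two real sizes. Without them there is nothing to plug in, which is rather the point about abstract scalability.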

On the other hand, if you know that on the first day of its rollout your new product is going to get banged on by all n tens of thousands of employees of Yoyodyne worldwide, then you'd better be sure that it will operate at that scale. And if you're Werner Vogels and you expect your historical growth in user base to continue at whatever rate it is, you have a definite idea of what extra capacity you'll have to accommodate and when. But abstract scalability? Nah.

The key to the message from 37signals, and from XP is this: how far do you think Google would have got if the first thing Larry and Sergey had done in 1996 was to think "you know, this could get huge, so first up we should design a filesystem to handle huge data volumes across many machines and then devise an algorithm for distributing certain classes of problems across huge clusters" instead of building the search engine itself?

1 comment:

Aditya said...

Taken at face value, the "Scale Later" piece from 37signals implies that developers need not design for scalability. Advice like "Create a great app and then worry about what to do once it's wildly successful" sounds lean and Agile-like, but is in fact misleading and downright dangerous in the hands of inexperienced developers and architects. This is because of what it leaves unsaid on what qualifies as a "great app". I'm not advocating that companies spend hundreds of thousands on big iron in their first year of business so they can test scalability. On the other hand, if your data structures and algorithms are not scale-ready (the tips from Thomas and Hunt are worthy of consideration), you are likely to end up rewriting the entire system from scratch or band-aiding it to keep it from entirely collapsing. I have no doubt that the "Getting Real" writers have the best of intentions... It is just that they don't do a good job of conveying them.