Principles of Large Scale Systems: Reuse by Copy vs Reuse by Reference

Copying code is a much maligned strategy for reuse. Conventional wisdom seems to hold that any reusable bits should be abstracted with a clean interface and the consumer of that code should use those interfaces directly and not change the underlying structure. Complexity must be abstracted.

Everyone has some experience where they forked a code-base and ended up with painful merge conflicts and spent a huge amount of energy converging.

However, copying code has significant advantages as a reuse strategy. Convergence is rarely as technically painful as it is politically painful. Often too many interfaces are exposed by reference leading to unnecessary complexity. Too often those interfaces are exposed without proper infrastructure to guarantee backwards compatibility, promote adoption, or track usage.

The simplest argument in favor of code-copying is that it allows you to use whatever code, tools, systems that you want to build your application. You’re not constrained in any way. So long as the application you’re building works, it doesn’t matter how clean it is.

I don’t care if my Tivo is written in Cobol or Perl so long as it works. I don’t care if they copied and hacked their Linux kernel. I don’t care if the source-code is littered with fragments of a dozen previous projects. As the end user I just care that the Tivo works.

HTML has been widely successful in large part because of the ease of code copying. Everyone has their own code.

So why reuse any software by reference? Ultimately it can be boiled down to a cost consideration. Every line of code that you are personally responsible for working is a line of code that you need to support and maintain. If you copy code you’re responsible for any issues (security, Y2K, Daylight-Savings-Time). You’re responsible for having some expertise in the code you copied.

The reason to use another piece of code by reference is to reduce your support cost and the expertise you need to have on your own team. Before you do that though, you need to be confident that your vendor has support infrastructure in place, is willing to guarantee backwards compatibility, and will treat your requirements with the appropriate priority.

The worst case scenario is to reference a piece of software which the vendor isn’t willing to guarantee backwards compatibility. In that instance, all advantage to by-reference just went out the window as you’re still stuck paying a support cost. Worse, every consumer of that software now has to pay that cost.

Done well software-by-reference can create significant economies of scale. You benefit by updates that other customers requested far before you needed that capability. Bugs can be identified once and fixed and then applied 100x or 1000x times instead of needing to be fixed independently 1000x times.

Striking the right balance is difficult. The goal is to share any code where centralized support costs are less than distributed support costs. However, in order for centralized support costs to be less, the actual bits that the centralized team supports need to be generic. Any “customized” code needs to be maintained and supported by the consumer, not by the centralized support team.

Unfortunately, these cost analyses often miss key details.

The following costs must be factored in when considering vending or consuming a by-reference service:
* the cost of maintaining backwards compatibility
* the cost of customer engagement for requirements
* the cost of deprecation of functionality on both the centralized team and the decentralized team
* the cost of training and documentation
* the cost of engaging clients in the upgrade process

The following costs must be factored in when considering vending or consuming a by-copy service:
* The cost of acquiring and continuously maintaining knowledge of that code
* The cost of identifying and fixing bugs in the copied code
* The cost of fixing externally-driven problems (security, Y2K, daylight-savings-time)

Engineers and managers looking to solve a problem with a piece of code often make the false assumption that controlling your destiny through by-copy semantics is better than taking a by-reference dependency on someone else. This implies that the customer has no confidence in the vendor. My advice is that if by-reference is the best solution based on the cost-analysis, then find a vendor you can trust. By properly leveraging internal or external vendors a small team can add substantial value on top of by-reference solutions.

Vendors often believe they can provide a better solution centrally through by-reference. Sometimes that is the case, but sometimes the customer's requirements are sufficiently "on the fringe" relative to other customers that they truly won't get the support they need.

Both models work and are appropriate in different situations. The challenge is to find the balance and eliminate institutional biases while developing a proper cost model.

Principles of Large Scale Systems

Saturday, December 22, 2007

Reuse by Copy vs Reuse by Reference

No comments:

Blog Archive