Premature Generalization

I’m trying to understand premature generalization. First, why is it a problem? Dave Smith:

One result of premature commitment to a generalization is that you’re supporting code that isn’t used. I’ve joined projects and found that 1/3rd of the code base wasn’t reachable, all because one or more programmers had tried to write class libraries that would solve all possible future needs. It is really, really tempting to give in to the misplaced belief that “… as long as I’m here, I might as well add the other methods that people might need at some point.”

And worse, premature generalization locks in untested intellectual structures that will tend to shape or constrain people’s future thinking. If you’ve ever been involved in a project where a team member got ahead of the team, building an elaborate class library before the team has finished designing the code that’s going to need that library, you know how much this situation can suck. Naive managers don’t want to see the prematurely-generated class library reworked or thrown away (“we don’t have time!”) and you find yourself handcuffed, and possibly forced into writing even more code to solve impedance mismatches.

It’s far better to solve specific cases first, and then let the generalizations emerge through refactoring. Don’t waste time writing code you don’t need, and don’t base your belief about “need” on forward thinking alone. The future is uncertain, and there are limits to our horizons.

I don’t believe there is a physical/temporal way to measure the conditions under which premature generalization happens. In my experience it seems to be based on the technical capabilities of the developer and the degree of familiarity with the problem and solution domains. For example, if I have solved a particular problem before, I am aware not only of the problem domain but the solution domain as well (at least a part of it).

This domain knowledge enables me to see patterns ahead of time and develop my solutions accordingly, generalizing along lines that I know will be beneficial later.

If I am unfamiliar with the problem domain (and necessarily the solution domain), I may begin coding up a solution that I believe solves the problem. I may also begin to see patterns in my solution and want to factor them out or generalize them. At that point I am more worried about being DRY than about being clear.

This, I believe, is the inflection point. Because I am unfamiliar with the problem domain, I should assume that my solution is suboptimal in the best case and flat-out wrong in the common case.

If I fail to recognize my own unfamiliarity with the problem domain, or over-estimate my competency in the solution domain, I will likely generalize at this point and it will likely be a premature generalization: I will be factoring out things and bending my solution around a paradigm that is wrong.
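
To make this concrete, here is a small made-up Python sketch of what that bending can look like. Everything in it is hypothetical (the shipping scenario, the function name, the flags, the rates); it is only meant to show the shape of the problem, not code from any real project. Two cases looked alike early on, so they were folded into one "general" helper, and each later case that did not fit added another flag.

```python
# Hypothetical example: the scenario, names, and numbers are invented.

def shipping_cost(item, *, by_volume=False, flat_fee=None, free_over=None):
    """A 'general' helper that grew one flag per case that did not fit."""
    if flat_fee is not None:
        # Added later for a case that was never linear at all.
        cost = flat_fee
    elif by_volume:
        # Added later: furniture turned out to be priced by volume.
        cost = 2.0 + 0.5 * item["volume"]
    else:
        # The original case the abstraction was modeled on.
        cost = 2.0 + 0.5 * item["weight"]
    if free_over is not None and item["price"] >= free_over:
        # Added later: a promotion bolted onto the same function.
        cost = 0.0
    return cost


book = {"weight": 1.0, "volume": 0.002, "price": 15.0}
sofa = {"weight": 40.0, "volume": 1.5, "price": 900.0}

book_cost = shipping_cost(book)
sofa_cost = shipping_cost(sofa, by_volume=True)
```

Every caller now has to know which combination of flags applies to it, and every new requirement gets reasoned about in terms of this one function rather than in terms of the domain.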

It is like giving a hammer to someone who has never seen a hammer or a screw, then asking them to drive a screw. You cannot unsee the wrong solution, and all of your later solutions will be centered around the hammer, even though it is the wrong approach and should be forgotten.

Once generalized, an incorrect solution is harder to change than if it had been left specific (this is the premise of premature generalization). Once a general solution is modeled, anchoring and availability biases kick in, so that reasoning about the problem and its solutions is done in terms of the broken model.

It’s not impossible to recover from a broken generalized model, especially if you can see it for what it is and already know at least one better model. But for those who aren’t experts in the domain, the cost of overcoming the broken model is high.

Being DRY is good, but not before you understand the domain of the problem and have explored most of the good options for solutions. It’s better to stay explicit, even WET, for a little while until you can be sure that your DRY isn’t a premature generalization.
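
As a rough illustration of staying WET for a while, here is a minimal Python sketch. The scenario, function names, and rates are all invented for the example. The two functions are deliberately duplicated, so each can change on its own while I am still learning the domain. The shared helper at the end is the kind of extraction I would make only later, and only if the duplication turned out to be real rather than coincidental.

```python
# Hypothetical example: the scenario, names, and numbers are invented.

def book_shipping(weight_kg: float) -> float:
    # Books: a base fee plus a per-kilogram rate.
    return 2.0 + 0.5 * weight_kg


def furniture_shipping(volume_m3: float) -> float:
    # Furniture: looks the same today, but is kept separate on purpose.
    # If pricing changes here, book_shipping is not affected.
    return 2.0 + 0.5 * volume_m3


# A later refactoring step, taken only once both rules have proven stable
# and genuinely share the same structure.
def linear_fee(base: float, rate: float, quantity: float) -> float:
    return base + rate * quantity
```

The duplication costs a couple of lines. Undoing a wrong abstraction that callers already depend on usually costs much more.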

Last modified on 2015-10-12