Anybody with a heartbeat knows that the pressure of publish-or-perish has been driving scientists to massage their data until desirable conclusions have been drawn. Statisticians refer to this phenomenon as overfitting, and one of its causes is allowing too many degrees of freedom in one’s explanations. The classic example is fitting a high-degree polynomial to roughly linear data, which ends up learning the noise rather than any meaningful trend. (Getting a bit more technical, the phenomenon corresponds to erring on the side of too little bias, too much variance on the eponymous tradeoff.)
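The polynomial example is easy to reproduce. Here is a minimal sketch (my own, not from any source cited here) in numpy: ten noisy points on a line, fit once with degree 1 and once with degree 9. The degree-9 fit has enough freedom to interpolate every noisy training point, so its training error collapses while its error on fresh data blows up.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Roughly linear ground truth, y = 2x, with a little noise.
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2.0 * x_train + rng.normal(0.0, 0.1, size=x_train.shape)
x_test = np.linspace(0.0, 1.0, 200)
y_test = 2.0 * x_test + rng.normal(0.0, 0.1, size=x_test.shape)

def mse(p, x, y):
    """Mean squared error of fitted polynomial p on (x, y)."""
    return float(np.mean((p(x) - y) ** 2))

# Degree 1 matches the true trend; degree 9 can pass through
# all ten noisy training points exactly -- it learns the noise.
line = Polynomial.fit(x_train, y_train, deg=1)
wiggle = Polynomial.fit(x_train, y_train, deg=9)

train_line, test_line = mse(line, x_train, y_train), mse(line, x_test, y_test)
train_wiggle, test_wiggle = mse(wiggle, x_train, y_train), mse(wiggle, x_test, y_test)

print(f"degree 1: train {train_line:.4f}, test {test_line:.4f}")
print(f"degree 9: train {train_wiggle:.4f}, test {test_wiggle:.4f}")
```

The degree-9 model "wins" on the data it was fit to and loses badly everywhere else, which is the whole pathology in two numbers.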
One can also have too few degrees of freedom; the textbook picture here is one of insisting on fitting a straight line to an obviously non-linear data plot. This is called underfitting, or choosing too much bias and too little variance. A classic (and unavoidable) cause of underfitting is a lack of the requisite domain knowledge to produce a sufficiently descriptive hypothesis class (subject to the obvious constraint of not being so rich as to overfit, of course!).
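The textbook picture of underfitting is just as short to sketch (again my own illustration, with assumed data): force a straight line onto clearly quadratic data and the fit error stays large no matter how much data you have, because the hypothesis class simply cannot express the curvature.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

# Obviously non-linear data: y = x^2 plus a little noise.
x = np.linspace(-1.0, 1.0, 50)
y = x**2 + rng.normal(0.0, 0.05, size=x.shape)

# A straight line (too much bias) vs. a quadratic (the right class).
line = Polynomial.fit(x, y, deg=1)
quad = Polynomial.fit(x, y, deg=2)

mse_line = float(np.mean((line(x) - y) ** 2))
mse_quad = float(np.mean((quad(x) - y) ** 2))

print(f"line MSE:      {mse_line:.4f}")
print(f"quadratic MSE: {mse_quad:.4f}")
```

Unlike the overfitting case, the failure here shows up on the training data itself: the line's residuals carry obvious systematic structure that no amount of refitting can remove.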
A much less widely discussed — and more ominous — cause of underfitting is to deliberately exclude taboo explanations from one’s hypothesis set. This is done in advance of seeing any data, often in spite of any empirical evidence to the contrary, and purely for social and political reasons. As Larry Summers found out the hard way, even including such potential explanations in one’s arsenal — not proposing them, mind you, just listing them as a possibility — can get one fired, even post-tenure. If you go the extra step of actually reaching taboo conclusions, then not even a Nobel prize will save you.
Which brings me to Sailer’s recent column, in which he describes an academic culture in the social sciences whose published results are guilty of overfitting and underfitting simultaneously!