An experiment posted on LessWrong.com, which led to a further diatribe on rationality and rational thinking traps, got me thinking too. Here’s the problem:
Once upon a time, there was an instructor who taught physics students. One day she called them into her class, and showed them a wide, square plate of metal, next to a hot radiator. The students each put their hand on the plate, and found the side next to the radiator cool, and the distant side warm. And the instructor said, Why do you think this happens? Some students guessed convection of air currents, and others guessed strange metals in the plate. They devised many creative explanations, none stooping so low as to say “I don’t know” or “This seems impossible.”
And the answer was that before the students entered the room, the instructor turned the plate around.
This is Occam’s Razor at its finest. But these unwitting physics students are no different from the average programmer confronted with a bug report. Take this example of a guy who ran into a strange error in a core Linux package:
A few weeks ago, though, I encountered some bizarre behavior on my desktop, that honestly just didn’t make sense. I spent about half an hour digging to discover what had gone wrong, and eventually determined, conclusively, that my problem was a single undetected flipped bit in RAM.
This guy admirably spent a long, painful session tracking his error to a faulty location in RAM. But what did he settle on as the most likely cause?
For me, bitflips due to cosmic rays are one of those problems I always assumed happen to “other people”. I also assumed that even if I saw random cosmic-ray bitflips, my computer would probably just crash, and I’d never really be able to tell the difference from some random kernel bug.
Cosmic rays! Now, the possibility exists; that’s true. But is it the most likely explanation? No, not by a long shot. More likely: faulty RAM due to manufacturing defects. Far more RAM failures are documented as hardware defects than as cosmic-ray strikes.
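Whatever the root cause, confirming the symptom itself is mechanical. The post doesn’t say how the flip was pinned down, so this is only a minimal, hypothetical sketch. It assumes you have a known-good copy of the data and the corrupted copy. XORing them byte by byte shows exactly which bits disagree. The function names are illustrative, not taken from the original post.

```python
def bit_differences(expected: bytes, observed: bytes) -> list[tuple[int, int]]:
    """Return (byte_offset, bit_index) for every bit that differs.

    Assumes both buffers have the same length; bit_index 0 is the
    least significant bit of the byte.
    """
    if len(expected) != len(observed):
        raise ValueError("buffers must be the same length")
    diffs = []
    for offset, (a, b) in enumerate(zip(expected, observed)):
        x = a ^ b  # XOR leaves a 1 wherever the two bytes disagree
        for bit in range(8):
            if (x >> bit) & 1:
                diffs.append((offset, bit))
    return diffs


def is_single_bit_flip(expected: bytes, observed: bytes) -> bool:
    # True only when the buffers differ in exactly one bit position
    return len(bit_differences(expected, observed)) == 1


good = b"hello"
bad = bytearray(good)
bad[1] ^= 0b0000_0100  # simulate one flipped bit: 'e' (0x65) becomes 'a' (0x61)
print(bit_differences(good, bytes(bad)))  # [(1, 2)]
print(is_single_bit_flip(good, bytes(bad)))  # True
```

Note what this does and doesn’t tell you. It can prove that a single bit flipped. It says nothing about *why*. A marginal DIMM and a cosmic ray leave exactly the same fingerprint, so the boring explanation still gets the benefit of the doubt.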
Why do we have some perverse belief that all our problems are exotic, unusual and outside of normal?
When you discover a bug in your code, is your response:
- Hmmm, I wonder what simple thing I did wrong here?
- I wonder if there’s a kernel bug in Linux causing that?
Simplicity isn’t just a goal in optimization, but in finding the source of bugs too. Never let the truth interfere with a good story, I always say.
Great post. Don’t forget about all the complexity that developers build into systems in the first place, though. If a software developer tried to invent a 12-step process to avoid complexity, it would turn into a 32,768-step process just so it would align with a half-word boundary.
Build simplicity in from the start, I say.
Excellent post.
At risk of biasing someone’s reading of what follows, I may as well admit up front that I’m a programmer.
I’ve long subscribed to the theory, based on several decades of experience, that the simpler a piece of software is, the more “complicated” the user perceives it to be. I attribute this to the fact that a human being carries a vast number of unspoken, and perhaps even unconscious, assumptions about what is “reasonable” or “logical” behavior in any given situation, whereas in a piece of software all of these must be programmed in explicitly. The human’s ability to take into account a vast number of factors, some of them very subtle, conflicts with “simple” software’s blunt-force, rigid, small repertoire of responses. The more smoothly one wants software to integrate into a human’s flexible world(view), the more factors it must be designed to take into account, and the more complicated it must become.
I perceive an analogy here to the atomic nature of matter. A rubber ball can roll, bounce, deform when compressed, maybe float in water, reflect light in a certain delicately-parameterized way, have a certain scent, maybe an ability to be used as a pencil eraser, etc. How many of these factors are likely to be incorporated into a computer simulation of “a rubber ball?” How many other factors are likely to be omitted? Where does one draw the line?
Umm, “cosmic ray” is programmer shorthand for faulty RAM or some other random cause of a bit flip. Of course he didn’t really think it was a cosmic ray; he just means “random occurrence that resulted in an incorrect bit.”
http://catb.org/jargon/html/C/cosmic-rays.html
Re-reading the article, I’m pretty sure he meant “real cosmic rays”, as opposed to “stuff that just went weird”:
He seems aware of the differences there, but specifically tags cosmic rays as the culprit in his first paragraph…
Assuming the most complicated explanation is the correct one is hardly a failure peculiar to programmers; it is a failure of humans and their nature.
However, in this case it may just indicate a lack of understanding of fundamental scientific ideas, which is apparently rather widespread. Any grad student or PhD should take this to heart, and probably programmers too, even though _building_ a complex system has nothing to do with Occam’s Razor.
When you discover a bug in your code, your response should not be about who is to blame. It doesn’t matter whether you created the bug or Linus himself created it. Your only thought should be what is going wrong and how to fix it, not who is at fault.
Your blame-centric attitude would crush a team.