Are computers infallible?

We all know that computers can “crash”.

But can anyone provide a really good explanation of why memory corruption occurs? Are computers’ calculations infallible?

Allow me to share with you a little story. When I was studying thermodynamics at university, a very strange thing happened: a relatively old computer essentially made a mistake!

What happened was this. We were in the laboratory, measuring a quantity for an experiment and then tabulating the results in an Excel spreadsheet with the aid of an old computer.

But one of those cells refused to play nicely. It didn’t produce the appropriate result. It was way off. Not just a little off. It was way, way off. It was so far off we couldn’t help but notice it.

The Excel spreadsheet had failed. Either the software or the hardware had failed. Something had clearly failed. How can the same mathematical algorithm generate one inconsistent result among many rows of similar cells?

We investigated further. We looked at the individual cells but the equations in each one were all identical. We looked at the references to the other cells and they were all correct.

I remember looking at conditional formatting, number formats and a load of other things. Nothing. Everything should have worked. And none of us were Excel juniors. We were advanced users!

So being conscientious students with lab-work to do, eventually we just had to let the incident go and get on with the assigned experiment.

We had to acknowledge that this time, this one time, the computer had made the mistake, not the human. We are told that this is impossible. We are taught that this is impossible.

It was bizarre. Computers do not, cannot, make mistakes. One plus one always equals two. But this time, there were two other witnesses! One witness was my friend and lab partner. The other witness was the thermodynamics professor himself! We were all equally baffled and perplexed. What had happened?

What’s interesting is the professor’s response. Do you know how the professor dealt with this strange situation? He said that “God must have sneezed”. And he said it more than once, half-jokingly, as you might in front of two young science students upon having no decent explanation.

The truth is, we never found the real reason for this odd behaviour. But I personally prefer to deal with that incident by saying that the CPU was struck by a gamma ray emitted by an external source —perhaps a stray cosmic ray originating from outer space— leading to the unexpected result. One rogue particle leading to one rogue answer. Who knows?
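For the curious, here is a minimal Python sketch of how "one rogue particle" could plausibly lead to "one rogue answer". It is not a reconstruction of what happened in that lab. It only assumes that a number is stored as a standard 64-bit IEEE 754 double, and the helper name `flip_bit` and the sample value `reading` are made up for illustration.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its 64-bit IEEE 754 encoding inverted.

    Bit 0 is the least significant mantissa bit, bits 52-62 hold the
    exponent, and bit 63 is the sign.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in the range 0..63")
    # Reinterpret the double as an unsigned 64-bit integer.
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    # Invert the chosen bit and reinterpret the result as a double again.
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


if __name__ == "__main__":
    reading = 21.5  # hypothetical lab value, not from the real experiment
    for bit in (0, 30, 52, 62, 63):
        print(f"bit {bit:2d} flipped: {flip_bit(reading, bit)!r}")
```

A flip in the low mantissa bits changes the number so slightly that nobody would notice. A flip in the exponent bits is another matter: the value can become absurdly large or vanishingly small, which is the "way, way off" kind of wrong that stands out in a column of otherwise sensible results.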

I can’t recall whether we just deleted the equation in the cell, started a new worksheet template or had to restart the computer in order to reset its memory. I don’t actually remember the particulars about the numbers or formulae either. I don’t even recall anything to do with the experiment. But what is important, and what I do remember, is that computers can and do make mistakes.

Newer electronics never seem to be as reliable as older electronics, do they? That’s why it was so odd when this particular old 80286 computer made that one mistake: those Intel processors were built like tanks in comparison to the fine nano-scale architecture that we see today. Yet with each new generation, electronic devices seem more likely to make mistakes. Why is that?

It’s because the smaller the internal parts get, the less reliable they become. Smaller things are more susceptible to the effects of galvanic corrosion. And other things occur at microscopic scales: tin whiskers growing out of lead-free solder joints and bridging neighbouring tracks, for instance. Then there is the sudden jolt of your smartphone being dropped on the floor, and so on. I have dropped my phone several times. It’s almost a gamble, a veritable lottery these days, to see whether a modern smartphone will survive a fall or not.

A sturdily built abacus would be less likely to break, would it not? But what normally happens to computers is that they just stop working altogether. Rather than produce something unexpected, the normal outcome is that they simply don’t produce any result at all. Perhaps because more than one component is damaged simultaneously and the computer can no longer compute anything. The system as a whole is way too complicated. Get one of the inputs wrong and no result appears…
