Hidden Failure

As some of you know, our email was out of operation yesterday from roughly noon until the evening. We’re addressing the problem and it should not recur…but making a promise seems foolish. In the spirit of investigating IT failures as well as structural failures, it’s worth looking at why we did not see this coming.

In short, we had been allowing an enormous number of emails to sit on the server, and over the past six years that mass of correspondence and attachments had grown to over 130 GB. The limit was supposed to be 100 GB. The old server, which we were on until about two weeks ago, allowed us to go over without complaint, even though we should have received a warning. That sounds great until you think about it: we were being allowed to use more storage than we should have, but we did not know we were doing so. The new server is better in pretty much every way, but it’s pickier. (As much as a piece of electronics can be picky.) It stopped accepting emails yesterday because we were over the storage limit. The tech support people at A2 helped get it running again, we’re deleting emails off the server to bring our storage use down, and we’re looking at switching to a higher-tier server that would give us more storage.

Put another way: the better-designed piece of equipment failed because of a problem that the lesser equipment tolerated. In reality we had the failure condition, too much email in storage, for years but did not know it.
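For the curious, the check we were missing is a small one. Here is a minimal sketch in Python, not anything our host actually runs: the function name `quota_status` and the 80% warning threshold are assumptions made up for illustration, while the 100 GB limit and 130 GB usage are the figures from our own account.

```python
def quota_status(used_gb, limit_gb, warn_fraction=0.8):
    """Classify storage use against a limit.

    warn_fraction is a hypothetical threshold: the share of the limit
    at which a warning should go out, well before the hard stop.
    """
    if limit_gb <= 0:
        raise ValueError("limit_gb must be positive")
    if used_gb > limit_gb:
        # The new server's behavior: refuse service once over the limit.
        return "over"
    if used_gb >= warn_fraction * limit_gb:
        # The warning the old server never sent us.
        return "warning"
    return "ok"


# Our situation: about 130 GB stored against a 100 GB limit.
print(quota_status(130, 100))
```

The old server effectively skipped both checks, and the new one enforces only the hard stop. The middle branch is the one that would have told us about the problem years ago.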

This brought to mind a brief mention of a similar condition in David McCullough’s The Path Between the Seas, his history of the Panama Canal. When the US engineers and construction team got to Panama, they found the remains of the earlier French effort. The US effort relied heavily on trains, as seen in the picture above. (The thing that looks like a steam engine is a steam shovel. There are two long trains full of earth spoil, one immediately to the right of the shovel and one further right.) The tracks were badly irregular, in part because of the soft ground in the area of the earth cut and in part because the tracks were often moved to keep them close to where the shovels were working. The French locomotives were precision machinery and would derail on the irregular tracks, while the US locomotives worked better under poor conditions. Locally, under specific conditions, better isn’t always better, even though in the long run it is.