Plan to Fail. I’m begging you!
November 23, 2007. Posted by gordonwatts in computers.
When we write code we always do it in an ideal situation. Single machine. Small amounts of input data. We then expect it to work well when we throw multi-gigabyte data sets at it, or have it run on multiple computers. I see evidence of this all the time in both commercial and custom software. Especially GRID software. 🙂
The thing I would like to see more of in the way we code is graceful failure. For example, if you have to make an entry in a database on a machine that is halfway around the world, the chances are pretty good that at some point it will be down, or some bit of the Internet between you and it will be down. Plan for it! That will make your life and your users’ lives so much easier: if you do, then a) the data won’t be lost forever, and b) you won’t have to add the data to the database by hand every time.
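One way to plan for the remote database being unreachable is store-and-forward: if the write fails, park the entry in a local file and resend it later. The following is only a minimal sketch of that idea in Python. The names `record_entry`, `flush_spool`, `send`, and `spool_path` are hypothetical, and `send` stands in for whatever call actually talks to the remote database; it is assumed to raise `OSError` (which includes `ConnectionError` and `TimeoutError`) when the network or server is down.

```python
import json
import os


def record_entry(entry, send, spool_path):
    """Try to send `entry`; if the remote end is unreachable, spool it locally.

    `send` is a hypothetical callable that raises OSError on network failure.
    Returns True if the entry was sent, False if it was spooled instead.
    """
    try:
        send(entry)
        return True
    except OSError:
        # Append one JSON document per line so nothing is lost.
        with open(spool_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
        return False


def flush_spool(send, spool_path):
    """Resend spooled entries in order; keep the ones that still fail.

    Returns the number of entries successfully sent.
    """
    if not os.path.exists(spool_path):
        return 0
    with open(spool_path, encoding="utf-8") as f:
        pending = [json.loads(line) for line in f if line.strip()]

    sent = 0
    remaining = []
    for i, entry in enumerate(pending):
        try:
            send(entry)
            sent += 1
        except OSError:
            # Still down: keep this entry and everything after it.
            remaining = pending[i:]
            break

    # Rewrite the spool via a temporary file so a crash does not truncate it.
    tmp_path = spool_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        for entry in remaining:
            f.write(json.dumps(entry) + "\n")
    os.replace(tmp_path, spool_path)
    return sent
```

Calling `flush_spool` at startup, or on a timer, means nobody has to re-enter the data by hand once the link comes back. A real system would also have to think about duplicate entries if a send succeeds but the acknowledgement is lost; this sketch ignores that.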
I also don’t get these statements that we are connected to the internet all the time! Sure, about 90% of the time our portables or desktop computers are. But that other 10% of the time there are network outages, congestion, firewall issues. Code we write has to be ready for this sort of thing!
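For short outages and congestion, simply trying again after a pause often goes a long way. Here is a small Python sketch of retrying with exponential backoff. The function name `call_with_retry` and its parameters are made up for illustration, and it assumes the wrapped operation signals network trouble by raising `OSError`.

```python
import time


def call_with_retry(operation, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run `operation()`, retrying on OSError with exponential backoff.

    Waits base_delay, then 2*base_delay, then 4*base_delay, ... between tries.
    Re-raises the last error if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is there only so the waiting can be replaced in tests. When the retries run out, the caller still has to do something sensible with the failure, such as saving the work locally, rather than dropping it.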
This is not easy. I see large corporations and dedicated open source projects get this sort of thing wrong all the time. Heck, I helped design and code up a distributed system that needs >95% uptime (the D0 DAQ system), and I still find issues where I didn’t design this in from the start. But one can do a lot!
The real problem is that there aren’t enough people to implement the features we need. Sometimes I wonder if delivering a limited system that worked would be better than a big system that has bugs.
P.S. If you couldn’t tell, I’ve spent a pretty frustrating several days dealing with code that doesn’t look like it was designed with this in mind! 🙂