Buffer Overrun felled Global Surveyor?
April 14, 2007. Posted by gordonwatts in Uncategorized.
Buffer overruns are a particularly difficult source of bugs and, in the right places, of security holes in software. Wikipedia has a marble-mouthed description of them:
A buffer overflow is an anomalous condition where a process attempts to store data beyond the boundaries of a fixed-length buffer. The result is that the extra data overwrites adjacent memory locations. The overwritten data may include other buffers, variables and program flow data.
It has some great pictures of what it is talking about, however. Basically, a program stores something where it shouldn’t.
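To make that concrete, here is a toy sketch in Python. It is not the spacecraft's software: the memory layout, the names, and the numbers are all made up for illustration. It models a 4-byte buffer sitting right next to a 1-byte safety limit, and compares a store that never checks the length with one that does.

```python
# Toy model: one flat block of "memory" holding a 4-byte command buffer
# followed immediately by a 1-byte safety limit. The layout is hypothetical.
BUFFER_START, BUFFER_SIZE = 0, 4
LIMIT_ADDR = BUFFER_START + BUFFER_SIZE  # the byte right after the buffer


def fresh_memory():
    """Return memory with an empty buffer and the safety limit set to 90."""
    mem = bytearray(BUFFER_SIZE + 1)
    mem[LIMIT_ADDR] = 90
    return mem


def unchecked_store(mem, data):
    """Copy data into the buffer without checking its length (the bug)."""
    for i, byte in enumerate(data):
        mem[BUFFER_START + i] = byte


def checked_store(mem, data):
    """Copy data into the buffer, refusing anything that would not fit."""
    if len(data) > BUFFER_SIZE:
        raise ValueError("data does not fit in the buffer")
    unchecked_store(mem, data)


# Five bytes into a four-byte buffer: the last byte lands on the limit.
mem = fresh_memory()
unchecked_store(mem, bytes([1, 2, 3, 4, 255]))
overwritten_limit = mem[LIMIT_ADDR]

# The same write through the checked path is rejected before it touches memory.
mem = fresh_memory()
try:
    checked_store(mem, bytes([1, 2, 3, 4, 255]))
    rejected = False
except ValueError:
    rejected = True
intact_limit = mem[LIMIT_ADDR]
```

After the unchecked store, the safety limit holds 255 instead of 90, and nothing complained. The checked version costs one comparison. Note that Python itself would raise an error for a write past the end of the whole bytearray; a language like C gives no such protection, which is part of why these bugs are so hard to spot.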
…an errant computer command five months earlier had been placed in the wrong location of the computer memory for the spacecraft. That, in effect, implanted a fatal defect in the spacecraft, disabling a safety feature to prevent the solar panels from rotating too far and mangling its ability to communicate with Earth in case of a mishap.
In short, no error checking caught that errant computer command. What a pity. These sorts of programming errors are very difficult to catch (I see them with some regularity in our experiment’s code base).
The end of the New York Times article describes what happened in the last minutes of the Surveyor’s life. The controllers watching from Earth must have been going nuts, helpless and unable to communicate:
In its last 13-minute contact, the Global Surveyor reported numerous alarms to mission controllers but gave no indication that it was in immediate danger.
As the spacecraft tried to recover, it ended up in an orientation such that the Sun was shining directly on a battery, causing it to overheat. The Global Surveyor misinterpreted that signal, sensing that it had overcharged the battery and stopped charging its other battery, as well.
Meanwhile, because the June error caused the craft’s antenna to point in the wrong direction, mission controllers on Earth could not get in touch with the craft again.
I’m going to ask Toby, who is part of the GLAST team, if they have done these sorts of checks, or if this discovery means a delay while they check over the GLAST software.