Tuesday, October 14, 2008

Why is self-healing in computer systems important?

Reading about the malfunction that silenced the Hubble Space Telescope, I realized how important it is for systems to have the self-healing property that the human body has and, in a broader sense, to be autonomic. This implies that the system knows itself well enough to possess detailed knowledge of its components, their status, and its internal and external connections. If a system has no information about a specific component, it cannot control it; hence the importance of knowing itself.

Furthermore, the ability to reconfigure itself based on its environment is also a desirable property. Most importantly, such a system would need to heal itself without the intervention of human experts. Why do we need such a self-healing characteristic in computer systems? The Hubble Space Telescope mentioned above is one example. The failure in the telescope came from a unit that collects data and transmits it to Earth. Solving this problem means remotely sending commands to the telescope to switch its operations to a backup unit. Why hasn't this happened automatically? Because Hubble was not designed with self-healing in mind. Would it have been that complicated for the telescope to discover the problem on its own and find an alternative resource with which to continue its normal operation? Self-healing really means just making use of redundant or underutilized components to take over the task of the malfunctioning element. This is similar to how the brain works when parts of it are damaged. The problem is that because Hubble is not 'aware' of its backup unit, that specific unit has not been used since 1990, leaving it exposed to "harmful rays of the sun, extreme temperature changes during orbits and 18 years of cosmic debris".
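To make the idea concrete, here is a minimal sketch in Python of what "knowing itself" and switching to a redundant component could look like. It is only an illustration: the class names, the `data_formatting` role and the unit names are all hypothetical, and it says nothing about how Hubble's software actually works.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    # All names here are hypothetical and exist only for illustration.
    name: str
    role: str             # the task this component can perform
    healthy: bool = True  # status as known to the system


@dataclass
class SelfHealingSystem:
    # Self-knowledge: every component the system has, spares included.
    components: list = field(default_factory=list)
    # Maps each role to the name of the component currently performing it.
    active: dict = field(default_factory=dict)

    def register(self, component):
        self.components.append(component)
        # The first component registered for a role becomes the active one.
        self.active.setdefault(component.role, component.name)

    def find(self, name):
        return next(c for c in self.components if c.name == name)

    def heal(self):
        """Swap each failed active component for a healthy spare, if one exists."""
        switched = []
        for role, name in list(self.active.items()):
            if self.find(name).healthy:
                continue
            spare = next(
                (c for c in self.components if c.role == role and c.healthy),
                None,
            )
            if spare is not None:
                self.active[role] = spare.name
                switched.append((role, name, spare.name))
        return switched


system = SelfHealingSystem()
system.register(Component("data_unit_A", "data_formatting"))
system.register(Component("data_unit_B", "data_formatting"))  # the backup

# Simulate the failure of the primary unit, then let the system heal itself.
system.find("data_unit_A").healthy = False
print(system.heal())
```

The point of the sketch is that the backup only becomes useful because it is registered: the system can fail over to `data_unit_B` without outside commands precisely because it knows the unit exists and which role it can fill. A real system would also need to detect the failure itself and to check that the spare still works, both of which are assumed away here.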

If you are interested in Autonomic Computing, IBM has a whole research project devoted to this subject. You can find out more about this here.
