Redundant PSU unimportant compared to MTBF
Authored by: SOX on Jan 08, '04 11:54:20AM
A redundant PSU is probably not needed for nodes (servers yes, nodes maybe not). With nodes the cost of a failure is not great, since there are presumably lots of them. With disk servers or a master node, the cost of downtime is intolerable.

The cost of downtime on a node comes from two sources. First, when you have a lot of nodes, you want the probability that any one of them is down in a given week to be small, so that it's a manageable job for a service tech. Second, you need the probability of a long-running job failing because a node died to be very small. How small is becoming less important with time because of better checkpointing and adaptive cluster-healing scheduling. But you still want most jobs to run without a node dying.
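To put rough numbers on those two costs, here is a minimal Python sketch. It assumes node failures are independent and follow a constant failure rate (an exponential model), which is my simplification and not something measured. The function names and the MTBF figures are made up for illustration.

```python
import math

HOURS_PER_WEEK = 168.0

def expected_failures_per_week(nodes, mtbf_hours):
    """Average number of node failures per week across the whole cluster."""
    return nodes * HOURS_PER_WEEK / mtbf_hours

def job_survival_probability(nodes_used, job_hours, mtbf_hours):
    """Chance that none of the job's nodes dies before the job finishes.

    Assumes independent nodes with a constant failure rate of 1/MTBF.
    """
    return math.exp(-nodes_used * job_hours / mtbf_hours)

# Illustrative numbers only: a 256-node cluster running a 72-hour job on 64 nodes.
for mtbf in (20_000.0, 200_000.0):
    weekly = expected_failures_per_week(256, mtbf)
    survive = job_survival_probability(64, 72.0, mtbf)
    print(f"MTBF {mtbf:>9.0f} h: {weekly:.2f} failures/week, "
          f"job survives with p={survive:.3f}")
```

Under this model both costs scale directly with the failure rate, so a better MTBF cuts the service workload and the job-failure risk at the same time.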

Since the costs of a node dying are simply proportional to uptime and not catastrophic, using more reliable parts in the power supply to reduce the probability of failure is a better solution than redundant power supplies. That is, if a PSU fails you still have to schedule a service tech to replace it, redundant or not. Thus two crappy but redundant power supplies will be more expensive to maintain for most people than one high-quality PSU with, say, a 2x to 10x better mean time between failures.
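The maintenance argument can be sketched the same way. This assumes every failed PSU triggers one service visit whether or not the node stayed up, and that failures happen at a constant rate of 1/MTBF. The function name, the cluster size, and the MTBF values are hypothetical.

```python
HOURS_PER_YEAR = 8760.0

def service_calls_per_year(nodes, psus_per_node, psu_mtbf_hours):
    """Expected PSU replacements per year: every failed unit needs a visit."""
    return nodes * psus_per_node * HOURS_PER_YEAR / psu_mtbf_hours

# Illustrative numbers only: 100 nodes, cheap PSU vs one with 5x the MTBF.
cheap_pair = service_calls_per_year(100, 2, 50_000.0)
good_single = service_calls_per_year(100, 1, 250_000.0)

print(f"two cheap redundant PSUs: {cheap_pair:.1f} calls/year")
print(f"one high-quality PSU:     {good_single:.1f} calls/year")
print(f"ratio: {cheap_pair / good_single:.1f}x")
```

Doubling the number of PSUs doubles the number of units that can fail, so under these assumptions a single PSU with k times the MTBF needs about 2k times fewer visits. The sketch counts only service calls. It ignores the downtime a redundant pair avoids, which is why redundancy still makes sense for disk servers and master nodes.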

I bought Apple Xserves for exactly this reason. So far they have a 10,000x better mean time between failures than my previous Athlon 2U computers did. This translates to about a 100x lower TCO for me.

