[BLASTTALK] Spud news - spud1 is stable again, spud8 is out of commission.

From: Scott Garman (sgarman@einstein.unh.edu)
Date: Mon Dec 30 2002 - 15:04:59 EST

I'd like to notify users of the SPUD cluster that some changes were made
to the spud systems last Friday.

As many of you know, spud1 has been crashing and spontaneously rebooting
incessantly. We have done a number of things to try to determine what
was causing this. We swapped out the memory and CPUs, and closely
monitored the temperature to see what was going on. We are now quite
certain that the motherboard is the culprit.

Since spud1 and spud8 had identical hardware, we exchanged them and
swapped their network identities, and moved the data from the original
spud1 to the "new" spud1. Spud1 is now acting very stable and spud8 is
crashing up a storm. Until we are able to order a new motherboard for
spud8, we are shutting this machine down completely.

In other news, we have disabled network channel bonding on all of the
spuds, as it was a possible source of other network stability issues.
Since no one was maxing out the network bandwidth anyway, no one should
notice a difference, but this change simplifies the network setup and
makes it more reliable at the same time.

In the end, all of this should result in a better computing experience
for the spud users.



PS - Problems or questions about the spuds should still be directed to
Ernie Bission (bission@mit.edu), but we'd also appreciate it if you
could cc: those messages to an open mailing list we have set up at:
blastfarm@einstein.unh.edu so we can keep tabs on things too. Thanks!

Scott A. Garman                        Unix System Administrator
sgarman@einstein.unh.edu               UNH Nuclear Physics Group
