[BLAST_ANAWARE] autocruncher double crunch 100kC worth of data affected

From: Chi Zhang (zhangchi@MIT.EDU)
Date: Sat Oct 16 2004 - 18:45:58 EDT


Hi all,

Sorry to break this bad news, but ever since September 22nd the
auto-cruncher has been crunching the same runs multiple times.

The symptom: in status_list.txt, multiple entries for the same run appear,
and they are ON DIFFERENT CPUs!!!!!!!!!! When a dst is opened and the
command dst->Scan("fNEvent") is issued in ROOT, one can see the same
CODA event number appear multiple times!!!!!!!!! This continues until
all but one lrn has crashed out.
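
The same check can be done programmatically once the fNEvent values of a
dst have been read into a list (for instance via PyROOT, which is not shown
here). This is only a minimal sketch; the helper name repeated_events and
the sample numbers are made up for illustration.

```python
from collections import Counter

def repeated_events(event_numbers):
    """Return {event number: count} for event numbers seen more than once."""
    counts = Counter(event_numbers)
    return {ev: n for ev, n in counts.items() if n > 1}

# Made-up event numbers, only to show the behaviour (not real data).
sample = [101, 102, 102, 103, 101, 104]
print(repeated_events(sample))
```

An empty result would mean no CODA event number is repeated in that dst.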

See this section of status_list.txt:
11922 spud2.bates.daq 30
11923 bud06.bates.daq 1
11923 spud4.bates.daq 30
11924 bud23.bates.daq 1
11924 spud1.bates.daq 30
11925 spud1.bates.daq 30
11925 spud3.bates.daq 1
11926 spud2.bates.daq 30
11927 spud3.bates.daq 1
11927 spud5.bates.daq 30
11928 bud22.bates.daq 1
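
One quick way to list the affected runs is to group the status_list.txt
entries by run number and report every run that shows up on more than one
host. The sketch below assumes each line has three whitespace-separated
columns (run number, host, and a status code whose meaning is not spelled
out here), as in the excerpt above. The function name find_multi_host_runs
is hypothetical.

```python
from collections import defaultdict

def find_multi_host_runs(lines):
    """Return {run: sorted hosts} for runs listed on more than one host."""
    hosts_by_run = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skip blank or malformed lines
        hosts_by_run[int(fields[0])].add(fields[1])
    return {
        run: sorted(hosts)
        for run, hosts in sorted(hosts_by_run.items())
        if len(hosts) > 1
    }

# The excerpt quoted in this message.
excerpt = """\
11922 spud2.bates.daq 30
11923 bud06.bates.daq 1
11923 spud4.bates.daq 30
11924 bud23.bates.daq 1
11924 spud1.bates.daq 30
11925 spud1.bates.daq 30
11925 spud3.bates.daq 1
11926 spud2.bates.daq 30
11927 spud3.bates.daq 1
11927 spud5.bates.daq 30
11928 bud22.bates.daq 1
""".splitlines()

for run, hosts in find_multi_host_runs(excerpt).items():
    print(run, " ".join(hosts))
```

For the excerpt this reports runs 11923, 11924, 11925 and 11927. To scan
the whole file, pass the open file object instead of the excerpt.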

The last run crunched normally is run 11297, which finished at 3:05 on Sep 22nd.
The following runs and all later ones were crunched multiple times and,
unfortunately, at the same time:

143635915 Sep 22 03:50 /net/data/4/Analysis/data//dst-11296.root
249868274 Sep 22 07:22 /net/data/4/Analysis/data//dst-11298.root
253659856 Sep 22 07:36 /net/data/4/Analysis/data//dst-11293.root
251871484 Sep 22 07:56 /net/data/4/Analysis/data//dst-11295.root
253803099 Sep 22 08:19 /net/data/4/Analysis/data//dst-11294.root
254026042 Sep 22 22:12 /net/data/4/Analysis/data//dst-11299.root

I stopped the cruncher daemon on dlbast09. There does not seem to be
another cruncher running at the same time, since the runlist is modified
only by elog, not by the cruncher (run numbers are written in, not taken out).

All these runs up to 11960 will have to be recrunched with
lrn!!!!!!!!!!!!!!!!!!!

For people going to Chicago, we need to figure out what we shall present.
For people who went to Trieste, I hope your "PRELIMINARY" stamps are BIG enough.

Chi

keywords: FAILURE

P.S. I don't have the stomach to debug the cruncher, so I turned it off and
am crunching runs from 11962 onward manually. Cruncher experts, please
investigate.


