[BLAST_ANAWARE] batchnames and runnames

From: Tancredi Botto (tancredi@lns.mit.edu)
Date: Thu Feb 05 2004 - 11:49:02 EST


Hello,
I'd like to make a few comments about blast data organization, with the
aim of making it more flexible than it is right now. I hope to receive
comments. If I don't, then at some point soon I will just go ahead (!!)

Somehow our data has always been written as batchname/batchname-xxx.dat.
However, changing the batchname every year (thus giving it a meaning) is a
problem when you want to *crunch* together several runs. It was
mentioned (for the same reasons) that we should not reset the run number
counter every year. If we don't, the run number has all the info we need.

With the present conventions we still can (and will) just do "root lr.C
1000 5000" where runs 1000 and 5000 were taken and crunched in different
years. In fact the lr, flr, dst data format is backward compatible and so
are the file names, e.g., lr-1000.dat and lr-5000.dat. Of course the files
lr-1000.dat, lr-5000.dat should be in the same path or directory, which is
not the default right now. More on that below.
Note also that we cannot even *crunch* runs across different years without
changing programs or the environment, since the raw file name formats now
differ, such as pro2003-1000.dat and pro2004-5000.dat.

On the other hand there is no reason to give pro2003, pro2004 a meaning,
since that is what the run numbers 1000 and 5000 do already. The filename
format for the raw data should just be run-xxxx.dat. I propose to make
that change from "now on" (note: for the sake of "consistency", let's adopt
the format pro-xxx.dat). Then, the variable

*.DataFile: $DATADIR/pro-#.dat

defined in blastrc is always good just as

lrn.OutFile: $ANALDIR/lr-#.root
flrn.OutFile: $ANALDIR/flr-#.root

are always good across runs/productions/years.
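
To illustrate why one template is enough for every run, here is a rough
sketch in Python. It is not the actual TOpt/blastrc code: the helper name
expand_template is made up, and I am assuming that "#" simply stands for
the run number and that $DATADIR comes from the environment.

```python
import os
from string import Template


def expand_template(template, run, env=None):
    """Expand $VARS from env, then replace '#' with the run number.

    Hypothetical helper for illustration only; the real substitution
    is done inside libBlast and may behave differently.
    """
    env = os.environ if env is None else env
    # safe_substitute leaves unknown $VARS untouched instead of raising
    path = Template(template).safe_substitute(env)
    return path.replace("#", str(run))


env = {"DATADIR": "/net/data/4/Daq/data"}
for run in (1000, 5000):
    print(expand_template("$DATADIR/pro-#.dat", run, env))
```

With a batchname-free template, runs taken in different years resolve to
the same directory and the same naming scheme, so nothing in the
environment has to change between them.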

We should also not have different paths for where we store the data and our
analysis macros. For the data, all raw data should go to
/net/data/4/Daq/data (not /net/data/4/Daq/pro2003), and all crunched data
should go to /net/data/4/Analysis/data.

The path to a different analysis directory can be hidden, but it can also
be convenient since - for instance now - in ~/pro2003/analysis and
~/pro2004/analysis we use different libBlast versions which are not
compatible (in the call to TOpt) and require different macros. They also
make up different CVS modules, which has some advantages (and
disadvantages) of its own. Not that you should plan on analyzing with an
old version of libBlast. There is no need to change cvs either. But again,
there will be a /home/blast/blast/pro link to the "right" directory
(relative to the current libBlast).

So, as you can see the above problems started with the continued use of a
structure such as commis/commis-xxx.dat , pro2003/pro2003-xxx.dat. Last
year we missed an opportunity to make this change (however, we did not
even have TOpt back then...). I take credit for not having thought of
this. This year (and for the future) we should make such a change.

Data of last year (pro2003) will not be touched. However, they can be made
compatible by either renaming them or linking to them (the change, again,
is from pro2003/pro2003-xxx.dat to data/pro-xxx.dat). If we choose to
rename them, then we should probably rewrite the RUN table in mysql. Finally,
I do not think we'll ever need to change the elog html pages.
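
For the linking option, something along these lines would do. This is only
a sketch in Python under my own assumptions: the function name
link_legacy_runs and its arguments are hypothetical, files are assumed to
be named exactly <batchname>-<digits>.dat, and the original files are left
untouched.

```python
import os
import re


def link_legacy_runs(old_dir, new_dir, old_prefix="pro2003", new_prefix="pro"):
    """Symlink old_dir/<old_prefix>-N.dat as new_dir/<new_prefix>-N.dat.

    Hypothetical helper for illustration; returns the links it created.
    """
    pattern = re.compile(r"^" + re.escape(old_prefix) + r"-(\d+)\.dat$")
    os.makedirs(new_dir, exist_ok=True)
    created = []
    for name in sorted(os.listdir(old_dir)):
        match = pattern.match(name)
        if match is None:
            continue  # not a raw data file of this batch, skip it
        target = os.path.abspath(os.path.join(old_dir, name))
        link = os.path.join(new_dir, f"{new_prefix}-{match.group(1)}.dat")
        # never overwrite an existing file or link of the same name
        if not os.path.lexists(link):
            os.symlink(target, link)
            created.append(link)
    return created
```

Called as link_legacy_runs("/net/data/4/Daq/pro2003", "/net/data/4/Daq/data"),
it would make each old run visible under the new name. Since existing
names are skipped, running it twice does no harm.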

regards,
tancredi

P.S.
Another alternative, which would bring similar results with less extensive
changes, is to stick to the "batchname" pro2003 from now on.

P.P.S.
Of course, be prepared for run numbers well above 10000. The run count
is 4551 right now.

-- 
________________________________________________________________________________
Tancredi Botto,  		phone: +1-617-253-9204  mobile: +1-978-490-4124
research scientist		MIT/Bates, 21 Manning Av    Middleton MA, 01949
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


