Re: [BLAST_ANAWARE] batchnames and runnames

From: Chris Crawford (chris2@lns.mit.edu)
Date: Thu Feb 05 2004 - 12:45:39 EST


hi tancredi,
  i agree with adrian and chi.
--chris

Tancredi Botto wrote:

>Hello,
>I'd like to make a few comments about blast data organization with the
>aim of making it more flexible than it is right now. I hope to receive
>comments. If I don't then at some point soon I will just go ahead (!!)
>
>
>Somehow our data has always been written as batchname/batchname-xxx.dat
>However changing a batchname every year (thus giving it a meaning) is a
>problem for when you want to *crunch* together several runs. It was
>mentioned (for the same reasons) that we should not reset the run number
>counter every year. In doing so, the run number has all the info we need.
>
>
>With the present conventions we still can (and will) just do "root lr.C
>1000 5000" where runs 1000 and 5000 were taken and crunched in different
>years. In fact the lr, flr, dst data format is backward compatible and so
>are the file names, e.g., lr-1000.dat and lr-5000.dat. Of course the files
>lr-1000.dat, lr-5000.dat should be in the same path or directory, which is
>not the default right now. More on that below.
>Note also that we cannot even *crunch* runs across different years without
>changing programs or the environment as now we have file name formats that
>differ, such as pro2003-1000.dat and pro2004-5000.dat.
>
this is not a serious problem. each run is crunched separately, and you
can change the prefix with simple command-line options.
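
as a rough sketch of the per-run prefix idea (python, for illustration only): the helper name raw_data_path and the run-to-prefix table are hypothetical, and the path layout is only assumed from the batchname/batchname-xxx.dat convention and the /net/data/4/Daq/pro2003 directory mentioned in the quoted text.

```python
# Hypothetical helper: builds a raw-data path in the current
# batchname/batchname-xxx.dat layout. The daq_root default is an
# assumption taken from the directory named in the quoted message.
def raw_data_path(run, prefix, daq_root="/net/data/4/Daq"):
    return f"{daq_root}/{prefix}/{prefix}-{run}.dat"

# Illustrative run -> prefix table; only these two example runs
# appear in the thread, the mapping itself is an assumption.
runs = {1000: "pro2003", 5000: "pro2004"}

# Each run is handled separately, so each one can carry its own prefix.
paths = [raw_data_path(run, prefix) for run, prefix in runs.items()]
for p in paths:
    print(p)
```

since each run is looked up on its own, runs from different years can sit in one list as long as something supplies the right prefix for each.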

>On the other hand there is no reason to give pro2003, pro2004 a meaning,
>since that is what the run numbers 1000 and 5000 do already. The filename
>format for the raw data should just be run-xxxx.dat. I propose to make
>that change from "now on" (note, for sake of "consistency", let's adopt the
>format pro-xxx.dat). Then, the variable
>
>*.DataFile: $DATADIR/pro-#.dat
>
>defined in blastrc is always good just as
>
>lrn.OutFile: $ANALDIR/lr-#.root
>flrn.OutFile: $ANALDIR/flr-#.root
>
>are always good across runs/productions/years.
>
>
>We should also not have different paths on where we store the data and our
>analysis macros. In the first case, all raw data should go to
>/net/data/4/Daq/data (not /net/data/4/Daq/pro2003 ), and all crunched data
>go to /net/data/4/Analysis/data.
>
>The path to a different analysis directory can be hidden, but it can also
>be convenient since - for instance now - in ~/pro2003/analysis and
>~/pro2004/analysis we use different libBlast which are not compatible (in
>the call to TOpt) and require different macros. They also make
>
there should be no reason to ever use the pro2003-style macros. you
can still analyze 2003 data from the pro2004 directory.

>different CVS modules which has some advantages (and disadvantages) of
>its own. Not that you should plan on analyzing with an old version
>
different directories, but actually just different versions of the same
module

>libBlast. There is no need to change cvs either. But again, there
>will be a /home/blast/blast/pro link to the "right" directory (relative
>to the current libBlast).
>
>
>So, as you can see the above problems started with the continued use of a
>structure such as commis/commis-xxx.dat , pro2003/pro2003-xxx.dat. Last
>year we missed an opportunity to make this change (however, we did not
>even have TOpt back then...). I take credit for not having thought of
>this. This year (and for the future) we should make such a change.
>
>
>Data of last year (pro2003) will not be touched. However they can be made
>compatible by either renaming them or linking to them (the change, again,
>is from pro2003/pro2003-xxx.dat to data/pro-xxx.dat). If we choose to
>
then you would have a big incompatibility issue, which would keep
popping up where you least expect it!
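
for reference, the linking option from the quoted text could look roughly like the sketch below. it is python for illustration only; the function name link_old_runs and its arguments are hypothetical, and it assumes the old files are named pro2003-<run>.dat. the original files are left untouched, and existing link names are skipped rather than overwritten.

```python
import os
import re

# Hypothetical sketch: expose old-style files (pro2003-<run>.dat) under the
# proposed names (pro-<run>.dat) via symbolic links, without renaming anything.
def link_old_runs(old_dir, new_dir, old_prefix="pro2003", new_prefix="pro"):
    pattern = re.compile(rf"^{re.escape(old_prefix)}-(\d+)\.dat$")
    os.makedirs(new_dir, exist_ok=True)
    created = []
    for name in sorted(os.listdir(old_dir)):
        match = pattern.match(name)
        if match is None:
            continue  # not a raw-data file in the old naming scheme
        link = os.path.join(new_dir, f"{new_prefix}-{match.group(1)}.dat")
        if os.path.lexists(link):
            continue  # never clobber an existing file or link
        os.symlink(os.path.abspath(os.path.join(old_dir, name)), link)
        created.append(link)
    return created
```

note that anything which records the old file names (the RUN table, for example) would still see the old names, so this only addresses the path lookup.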

>rename them then we should probably write the RUN table in mysql. Finally,
>I do not think we'll ever need to change the elog html pages.
>
>regards,
>tancredi
>
>
>P.S.
>another alternative, which would bring similar results with less extensive
>changes, is to stick to the "batchname" pro2003 from now on.
>
good idea. all we have to change is the coda filename to pro2004. all
of the coda configuration stuff can stay pro2004, of course.

>
>P.P.S.
>Of course, be prepared for run numbers well above 10000. The run count
>is 4551 right now.
>
no problem
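
a quick sketch of why the number of digits should not matter for file names, assuming the "#" in the blastrc entries quoted above is replaced by the unpadded run number and $DATADIR comes from the environment. that reading of the template is an assumption, not a description of how TOpt actually expands it; the function name expand is hypothetical.

```python
import os
from string import Template

# Hypothetical expansion of a blastrc-style template such as
# "$DATADIR/pro-#.dat": $VARS come from an environment mapping,
# "#" is assumed to stand for the plain (unpadded) run number.
def expand(template, run, env=None):
    env = os.environ if env is None else env
    # safe_substitute leaves unknown $VARS in place instead of raising
    path = Template(template).safe_substitute(env)
    return path.replace("#", str(run))

env = {"DATADIR": "/net/data/4/Daq/data"}
print(expand("$DATADIR/pro-#.dat", 4551, env))
print(expand("$DATADIR/pro-#.dat", 12345, env))
```

with no fixed field width in the name, a five-digit run number just yields a longer file name.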

>
>


