Re: [BLAST_ANAWARE] batchnames and runnames

From: Tancredi Botto (tancredi@lns.mit.edu)
Date: Thu Feb 05 2004 - 14:20:47 EST


Ok,
we choose the minimum path. All blast runs from now on
will be called pro2003-xxxx but that prefix is just incidental and
does not relate to the production year.

-- t
________________________________________________________________________________
Tancredi Botto, phone: +1-617-253-9204 mobile: +1-978-490-4124
research scientist MIT/Bates, 21 Manning Av Middleton MA, 01949
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

On Thu, 5 Feb 2004, Tancredi Botto wrote:

>
> Hello,
> I'd like to make a few comments about blast data organization with the
> aim of making it more flexible than it is right now. I hope to receive
> comments. If I don't then at some point soon I will just go ahead (!!)
>
>
> Somehow our data has always been written as batchname/batchname-xxx.dat.
> However changing a batchname every year (thus giving it a meaning) is a
> problem for when you want to *crunch* together several runs. It was
> mentioned (for the same reasons) that we should not reset the run number
> counter every year. In doing so, the run number has all the info we need.
>
>
> With the present conventions we still can (and will) just do " root lr.C
> 1000 5000" where runs 1000 and 5000 were taken and crunched in different
> years. In fact the lr, flr, dst data format is backward compatible and so
> are the file names, e.g., lr-1000.dat and lr-5000.dat. Of course the files
> lr-1000.dat, lr-5000.dat should be in the same path or directory, which is
> not the default right now. More on that below.
> Note also that we can not even *crunch* runs across different years without
> changing programs or the environment as now we have file name formats that
> differ, such as pro2003-1000.dat and pro2004-5000.dat.
>
>
> On the other hand there is no reason to give pro2003, pro2004 a meaning,
> since that is what the run numbers 1000 and 5000 do already. The filename
> format for the raw data should just be run-xxxx.dat. I propose to make
> that change from "now on" (note, for sake of "consistency", let's adopt the
> format pro-xxx.dat). Then, the variable
>
> *.DataFile: $DATADIR/pro-#.dat
>
> defined in blastrc is always good just as
>
> lrn.OutFile: $ANALDIR/lr-#.root
> flrn.OutFile: $ANALDIR/flr-#.root
>
> are always good across runs/productions/years.
>
>
> We should also not have different paths on where we store the data and our
> analysis macros. In the first case, all raw data should go to
> /net/data/4/Daq/data (not /net/data/4/Daq/pro2003 ), and all crunched data
> go to /net/data/4/Analysis/data.
>
> The path to a different analysis directory can be hidden, but it can also
> be convenient since - for instance now - in ~/pro2003/analysis and
> ~/pro2004/analysis we use different libBlast which are not compatible (in
> the call to TOpt) and require different macros. They also make
> different CVS modules which has some advantages (and disadvantages) of
> its own. Not that you should plan on analyzing with an old version of
> libBlast. There is no need to change cvs either. But again, there
> will be a /home/blast/blast/pro link to the "right" directory (relative
> to the current libBlast).
>
>
> So, as you can see the above problems started with the continued use of a
> structure such as commis/commis-xxx.dat , pro2003/pro2003-xxx.dat. Last
> year we missed an opportunity to make this change (however, we did not
> even have TOpt back then...). I take credit for not having thought of
> this. This year (and for the future) we should make such a change.
>
>
> Data of last year (pro2003) will not be touched. However they can be made
> compatible by either renaming them or linking to them (the change, again,
> is from pro2003/pro2003-xxx.dat to data/pro-xxx.dat). If we choose to
> rename them then we should probably write the RUN table in mysql. Finally,
> I do not think we'll ever need to change the elog html pages.
>
> regards,
> tancredi
>
>
> P.S.
> another alternative, which would bring similar results with less extensive
> changes, is to stick to the "batchname" pro2003 from now on.
>
> P.P.S.
> Of course, be prepared for run numbers well above 10000. The run count
> is 4551 right now.
>

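For reference, the "linking" option from the quoted message (going from
pro2003/pro2003-xxx.dat to data/pro-xxx.dat without touching the old files)
could be sketched roughly as below. This is only an illustration and not an
existing BLAST tool: the helper name link_runs, its arguments and the
filename pattern are assumptions made for the example.

```python
import os
import re
from pathlib import Path

# Matches raw-data names such as pro2003-1000.dat; the batch prefix and the
# run number are captured separately. The pattern itself is an assumption.
OLD_NAME = re.compile(r"^(?P<batch>[A-Za-z0-9]+)-(?P<run>\d+)\.dat$")


def link_runs(old_dir, new_dir, batch="pro2003", new_prefix="pro"):
    """Create new_dir/<new_prefix>-<run>.dat symlinks to old_dir/<batch>-<run>.dat.

    Hypothetical helper: the original files are never renamed or modified.
    Returns the list of run numbers that were linked by this call.
    """
    old_dir = Path(old_dir)
    new_dir = Path(new_dir)
    new_dir.mkdir(parents=True, exist_ok=True)
    linked = []
    for src in sorted(old_dir.iterdir()):
        m = OLD_NAME.match(src.name)
        if m is None or m.group("batch") != batch:
            continue  # not a raw-data file of the requested batch
        dst = new_dir / f"{new_prefix}-{m.group('run')}.dat"
        if dst.exists() or dst.is_symlink():
            continue  # leave anything already present alone
        os.symlink(src.resolve(), dst)
        linked.append(int(m.group("run")))
    return linked
```

With the directories named in the quoted message this would be called as
link_runs("/net/data/4/Daq/pro2003", "/net/data/4/Daq/data"). Since only
symlinks are created, the pro2003 data stay untouched, and running it a
second time links nothing new.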

