Re: [BLAST_ANAWARE] data files

From: Taylan Akdogan (akdogan@MIT.EDU)
Date: Tue Apr 27 2004 - 12:43:14 EDT


On Tue, 27 Apr 2004, Tancredi Botto wrote:

> Two copies of the data will be backed up on tape by ErnieB. (it turns out he
> needs a different tape drive). There is still quite some compression
> to do, so compressed data are ideal for tape/DVD archiving. This is not
> used now but at some point there will be a gunzip wrapping around lrn (we
> can't feed gzipped data directly to it since two passes are needed to fill
> the dst, and the gunzip is anyways a relatively small overhead)

Just for the record:

I did some compressibility tests on the raw data. It is clear
that bzip2 compresses better than gzip. Before I lose
my notes on this, let me report them here:

bzip2 algorithm/program
Compression: 386MB -> 186MB in 134sec
Decompression: 186MB -> 386MB in 72sec

gzip algorithm/program
Compression: 386MB -> 279MB in 41sec
Decompression: 279MB -> 386MB in 9sec

NFS read time for 386MB file: 58sec (56Mbps) on buds

The compression and decompression tests were done entirely in
RAM, without any disk I/O overhead. Reading a compressed file
also reduces the NFS traffic, since the file is smaller, and
that wins back part of the time spent decompressing.
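
For anyone who wants to repeat this kind of in-RAM test, here is a
minimal sketch. It assumes Python's standard bz2 and gzip modules as
stand-ins for the command-line programs (which is not how the numbers
above were taken), and it uses a synthetic payload instead of a real
raw data file. The helper name time_codec is made up for illustration.

```python
import bz2
import gzip
import time


def time_codec(name, compress, decompress, payload):
    """Time one in-memory compress/decompress round trip (no disk I/O)."""
    t0 = time.perf_counter()
    packed = compress(payload)
    t1 = time.perf_counter()
    unpacked = decompress(packed)
    t2 = time.perf_counter()
    if unpacked != payload:
        raise ValueError(f"{name}: round trip did not reproduce the input")
    return {
        "codec": name,
        "ratio": len(packed) / len(payload),  # compressed size / original size
        "compress_s": t1 - t0,
        "decompress_s": t2 - t1,
    }


# Assumption: synthetic 1 MiB payload; a real test would load a raw
# data file fully into memory first, so that only CPU time is measured.
payload = bytes(range(256)) * 4096

results = [
    time_codec("bzip2", bz2.compress, bz2.decompress, payload),
    time_codec("gzip", gzip.compress, gzip.decompress, payload),
]

for r in results:
    print(f"{r['codec']:5s} ratio={r['ratio']:.3f} "
          f"compress={r['compress_s']:.3f}s decompress={r['decompress_s']:.3f}s")
```

The ratios and timings from such a synthetic payload say nothing about
the real data; only the measurement pattern carries over.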

Ex: Reading and unpacking a compressed file takes:
72sec (for decompressing) + 186/386*58sec (for reading) = 100sec (bzip2)
 9sec (for decompressing) + 279/386*58sec (for reading) =  51sec (gzip)

(not 72 vs 9 sec, as it might seem; reading the uncompressed file
takes 58sec)
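
The same estimate can be written as a small calculation. This is only
a sketch under two assumptions that the example already makes
implicitly: the NFS read time scales linearly with file size, and the
read and the decompression do not overlap. The function name
effective_time is made up for illustration.

```python
RAW_MB = 386      # size of the uncompressed file
NFS_READ_S = 58   # measured NFS read time for the uncompressed file


def effective_time(compressed_mb, decompress_s):
    """Decompression time plus NFS read time scaled by the compressed size."""
    return decompress_s + compressed_mb / RAW_MB * NFS_READ_S


bzip2_s = effective_time(compressed_mb=186, decompress_s=72)
gzip_s = effective_time(compressed_mb=279, decompress_s=9)

print(f"bzip2: {bzip2_s:.1f} sec")
print(f"gzip:  {gzip_s:.1f} sec")
print(f"raw:   {NFS_READ_S:.1f} sec")
```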

Conclusion: bzip2 compresses to 48% of the original size, gzip
to 72%. Decompression takes longer with bzip2, but it is still
within the noise of the data-crunching time.

Taylan

-- 
---=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=---
Taylan Akdogan              Massachusetts Institute of Technology
akdogan@mit.edu                             Department of Physics
Phn:+1-617-258-0801                Laboratory for Nuclear Science
Fax:+1-617-258-5440                                  Room 26-402b
---=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=---


