All output data and associated files should be copied to the archive area on pbstore. Because pbstore does not cope well with large numbers of small files, the archivedifx.py script tars up the small files before transferring them:
archivedifx.py /data/corr/corrdat/v190n/ hbignall@cortex.ivec.org/pbstore/groupfs/astrotmp/as03/VLBI/Archive/Curtin/v190/v190n
This script tars only the small files; larger files (> 5 MB) are transferred without tarring.
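The size-based split described above can be sketched as follows. This is only an illustration of the idea, not the code of archivedifx.py: the function names, the interpretation of 5 MB as 5 * 1024 * 1024 bytes, and the choice to keep relative paths inside the tar are all assumptions.

```python
import os
import tarfile

# Threshold quoted in the text; treating "5 MB" as 5 * 1024 * 1024 bytes is an assumption.
SMALL_FILE_LIMIT = 5 * 1024 * 1024


def split_by_size(root, limit=SMALL_FILE_LIMIT):
    """Return (small, large) lists of file paths relative to root."""
    small, large = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            # Files above the limit are left alone; everything else gets bundled.
            if os.path.getsize(path) > limit:
                large.append(rel)
            else:
                small.append(rel)
    return sorted(small), sorted(large)


def tar_small_files(root, small, tar_path):
    """Bundle the small files into a single tar, keeping their relative paths."""
    with tarfile.open(tar_path, "w") as tar:
        for rel in small:
            tar.add(os.path.join(root, rel), arcname=rel)
```

In this sketch the tar file and the list of large files would then be handed to whatever transfer tool is in use; the transfer itself is left out.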
If correlation was done using the espresso scripts, all of the correlator model, input and associated files for each job (including .vex and .v2d files) will have been automatically copied to the output data directory.
You should set up an ssh key pair on your own account in order to transfer between cuppa and pbstore without being prompted for the remote password.
Note: The new cortex system may require the output directory to be accessible via a softlink in the home area, e.g. $HOME/LBA_ARCH/{expt}, where LBA_ARCH is a softlink pointing to the archive area. (Otherwise a bug results in the path being created within the user's home area.)
The simplest approach is to run ln -s /pbstore in your home area.
The pipeline outputs should also go on pbstore, in the ftp directory. In this case we do not want to tar up the small files (they need to be accessible from a wiki page), so transferring with globus-url-copy is recommended. The gloPut7T.sh script may be used, but note that it only checks file sizes to determine whether source and destination files are the same, so it is not recommended for these files (its intended use is for moving large numbers of unchanging files).
globus-url-copy -sync -cd -r file://$PWD/ sshftp://hbignall@cortex.ivec.org/pbstore/groupfs/astrotmp/astronomy/anonymous/v190n/
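To illustrate why a size-only check is risky for files that may be regenerated, here is a minimal sketch comparing it with a checksum comparison. It does not reproduce the logic of gloPut7T.sh or of globus-url-copy; the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import os


def same_by_size(src, dst):
    """Size-only comparison: cheap, but blind to same-length edits."""
    return os.path.getsize(src) == os.path.getsize(dst)


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_by_checksum(src, dst):
    """Content comparison: reads both files fully, but catches any change."""
    return sha256_of(src) == sha256_of(dst)
```

Two versions of a pipeline output that happen to have the same length would pass same_by_size and so would not be re-transferred, whereas same_by_checksum would report them as different.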
The FAQ at http://www.ivec.org/services/data-storage/faq states that to put data on cortex, “the preferred method is to bundle your files into a single tar or zip prior to sending them to cortex. This is to reduce the amount of tiny files within the system, as this cause significant issues.”
The suggested way of doing this is to use the archivedifx.py script described above, which uses globus-url-copy to transfer the data. If you need to use ssh, then something like this would also work (though it lacks the ability of archivedifx.py to tar only the small files and leave large files unmodified):
tar cvf - v190n | ssh hbignall@cortex.ivec.org "cat > /pbstore/groupfs/astrotmp/as03/VLBI/Archive/Curtin/v190/v190n.tar"
The notes also suggest using the none-cipher option with ssh, if possible, to reduce the CPU overhead of encryption.
A similar result can be obtained with globus-url-copy:
tar cvf - v190n | globus-url-copy -vb -cd - sshftp://hbignall@cortex.ivec.org/pbstore/groupfs/astrotmp/as03/VLBI/Archive/Curtin/v190/v190n.tar
Additional files for pulsar modes only:
Jobs may live in subdirectories, with identical filenames for files from different jobs within each subdirectory. (Therefore it's important to keep the directory structure.)
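The point about identical filenames can be checked before archiving. The sketch below lists every filename that occurs in more than one directory under an experiment directory; any such name would be overwritten if the files were copied into a single flat directory. The function name is hypothetical and the check is not part of any existing script mentioned here.

```python
import os
from collections import defaultdict


def find_name_collisions(root):
    """Map each filename found in more than one directory to its relative paths."""
    seen = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            seen[name].append(rel)
    # Keep only the names that appear more than once.
    return {name: sorted(paths) for name, paths in seen.items() if len(paths) > 1}
```

An empty result means the filenames are unique across the tree; a non-empty one confirms that the directory structure has to be preserved in the archive.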
Ideally we want to keep all relevant files for all production jobs.
However, in some cases the SWIN format output data may be impractically large for online storage, and it may not be desirable to keep this intermediate-stage data. An example is output from DiFX 1.5 when a full band is correlated at high spectral resolution, but the user only wants the subset of the band containing the spectral lines at high resolution. In this case the output FITS data will generally be a manageable size, but the SWIN output data in the .difx directories will typically be at least several times larger (e.g. the SWIN data cover 16 MHz, while the region of interest is only 2 MHz wide).
It may be useful to keep all jobs (clock search and test jobs as well, e.g. to have some data at higher spectral resolution for checking). Clock search jobs are usually in a “clocks” subdirectory, but it won't necessarily be obvious which jobs are test jobs and which are final production. Test jobs could be manually moved to a subdirectory (e.g. “test”). Other (dated) subdirectories may exist for production jobs (especially where multiple runs were needed). Note: Running with espresso now creates a comment.txt file containing a description of each job (the operator is prompted to edit the file at completion of correlation).
Not useful to keep: