Difference between revisions of "Mercator Automatic Data Acquisition"
From MohidWiki
Revision as of 14:28, 27 October 2006
This article describes the automatic data acquisition system built to fetch gridded large-scale circulation model data from the Mercator-Océan FTP server. A separate article describes the automated interpolation task for a given model.
Objective
Every week Mercator-Océan releases a set of 63 gzipped NetCDF files on its FTP site, in a timestamped folder. The goal is to fetch these files, store them on the local network and process them.
The catch is that files are sometimes missing from the FTP site, and the connection sometimes breaks down. Hence the data acquisition system must be able to tell when it has fetched all the files and when it must open a new FTP connection to fetch the missing ones.
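The "have all the files arrived?" decision can be sketched as a simple count. The example below is a minimal illustration in Python, not the Perl code used by the system; the function names `count_downloaded` and `acquisition_complete` are hypothetical, and it assumes that the release files can be recognised by the `.nc.gz` suffix alone.

```python
from pathlib import Path

EXPECTED_COUNT = 63   # files per weekly release, as stated in the text
SUFFIX = ".nc.gz"     # gzipped NetCDF files

def count_downloaded(folder, suffix=SUFFIX):
    """Count the regular files in `folder` whose name ends with `suffix`."""
    return sum(
        1
        for path in Path(folder).iterdir()
        if path.is_file() and path.name.endswith(suffix)
    )

def acquisition_complete(folder, expected=EXPECTED_COUNT):
    """Return True when exactly `expected` release files are present locally."""
    return count_downloaded(folder) == expected
```

A caller would open a new FTP connection and try again whenever `acquisition_complete` returns `False`. The exact comparison is an assumption: it treats any count other than 63 as an incomplete acquisition.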
Technological base
Perl and shell scripting, using the FTP and SMTP Perl modules. Mohid tools to convert from NetCDF to HDF5 and to interpolate. Scheduler or crontab to schedule the task weekly.
Control flow diagram
The control flow diagram is shown alongside, and the full batch file describing all the operations is given below:
REM CalcDate.bat
:CALC
perl CalculateCurrentDate.pl > _date.tmp
IF %ERRORLEVEL% NEQ 0 GOTO CALC
IF ERRORLEVEL 0 GOTO CREATE
REM ftp_get.bat
:CREATE
more < _date.tmp | perl Create_ftp_address.pl > _ftp_address.tmp
IF %ERRORLEVEL% NEQ 0 GOTO CREATE
IF ERRORLEVEL 0 GOTO GET
:GET
more < _ftp_address.tmp | perl ftp_get.pl
IF %ERRORLEVEL% NEQ 0 GOTO CREATE
IF ERRORLEVEL 0 GOTO CHECKFILES
:CHECKFILES
more < _ftp_address.tmp > _Extract.tmp
echo (ist_meteog-mercatorPsy2v2r1v_R\d{8})>> _Extract.tmp
more < _Extract.tmp | perl ExtractPattern.pl > _Count.tmp
echo .nc.gz>> _Count.tmp
more < _Count.tmp | perl CountFiles.pl | perl Checkfiles.pl
IF %ERRORLEVEL% NEQ 0 GOTO CREATE
IF ERRORLEVEL 0 GOTO CONVERT
REM convert.bat
:CONVERT
more < _date.tmp | perl Convert.pl
IF %ERRORLEVEL% NEQ 0 GOTO CONVERT
IF ERRORLEVEL 0 GOTO CHECK
REM interpol.bat
REM Still missing ...
REM check.bat
:CHECK
more < _date.tmp | perl check.pl
IF %ERRORLEVEL% NEQ 0 GOTO BAD
IF ERRORLEVEL 0 GOTO GOOD
:BAD
more mail_Bad.txt | perl smtp.pl
GOTO END
:GOOD
more mail_Ok.txt | perl smtp.pl
GOTO END
:END
del *.tmp
The command names are fairly self-explanatory. The Perl files are best described at their repository, where the commented code is published. Care was taken so that most of the small Perl programs could be reused generically.
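The GOTO structure of the batch file can be read as a small state machine, where each label either moves on or jumps back depending on the exit code. The following is a minimal Python sketch of that reading, not the original implementation (which is batch and Perl). The `TRANSITIONS` table is read off the GOTO lines; the `run` function, the `actions` mapping and the `max_steps` limit are hypothetical additions for illustration, since the batch file itself retries without limit.

```python
# (next state on success, next state on failure), read off the GOTO lines
TRANSITIONS = {
    "CALC": ("CREATE", "CALC"),
    "CREATE": ("GET", "CREATE"),
    "GET": ("CHECKFILES", "CREATE"),
    "CHECKFILES": ("CONVERT", "CREATE"),
    "CONVERT": ("CHECK", "CONVERT"),
    "CHECK": ("GOOD", "BAD"),
}

def run(actions, start="CALC", max_steps=100):
    """Walk the state machine and return the list of visited states.

    `actions` maps each non-terminal state name to a callable that
    returns True on success (exit code 0) and False otherwise.
    The walk ends at "GOOD" or "BAD"; RuntimeError is raised when
    `max_steps` is exhausted (a safeguard not present in the batch file).
    """
    state = start
    visited = []
    for _ in range(max_steps):
        visited.append(state)
        if state not in TRANSITIONS:  # GOOD or BAD: terminal state
            return visited
        succeeded = actions[state]()
        on_success, on_failure = TRANSITIONS[state]
        state = on_success if succeeded else on_failure
    raise RuntimeError("step limit reached without finishing")
```

For example, `run({name: (lambda: True) for name in TRANSITIONS})` visits every step once and ends in `"GOOD"`, while a `CHECK` action that returns `False` ends in `"BAD"`, which corresponds to sending the failure e-mail.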
Comments
20061025
This week two new problems arose:
- Three unexpected files were in the FTP download folder, so 66 files were downloaded instead of 63. The automatic data acquisition system therefore loops endlessly, waiting for the count to equal 63.
- When the FTP connection fails, the program reconnects and continues with the next file that does not yet exist locally. However, the interrupted transfer leaves a zero-byte file behind; since that file exists, it is never downloaded again.
The solutions are:
- Add a regexp pattern to the FTP address file for the ftp-get program to match, so that it downloads only the wanted files.
- Check the size of each file as well as its existence. If the file is zero bytes long, download it again.
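Both fixes are small enough to sketch. The example below is a Python illustration under stated assumptions, not the Perl code of the ftp-get program: the function names `wanted` and `needs_download` are hypothetical, and the pattern shown in the usage note is only an example, not the one stored in the FTP address file.

```python
import os
import re

def wanted(remote_names, pattern):
    """Keep only the remote file names that fully match the regexp `pattern`."""
    regex = re.compile(pattern)
    return [name for name in remote_names if regex.fullmatch(name)]

def needs_download(local_path):
    """Return True when the local file is absent or zero bytes long.

    A zero-byte file is treated as the leftover of a broken transfer.
    """
    return not os.path.exists(local_path) or os.path.getsize(local_path) == 0
```

With an example pattern such as `r".*\.nc\.gz"`, `wanted(["a.nc.gz", "readme.txt"], r".*\.nc\.gz")` keeps only `"a.nc.gz"`, so unexpected files no longer inflate the count. Calling `needs_download` before each transfer lets an empty leftover file be fetched again instead of being skipped.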