Changes between Initial Version and Version 1 of HpcInfo


Timestamp:
Jan 23, 2012, 10:49:06 PM (12 years ago)
Author:
dzollars
Comment:

--

  • HpcInfo

     1= HPC Communications =
     2
     3[[Image(sc-comm.png)]]
     4
     5!UltraScan communicates with the High Performance Computer (HPC) or Grid Cluster
     6as shown in the drawing above.  The elements in that drawing perform the tasks
     7described below.
     8
     9== Laboratory Information Management System (LIMS) ==
     10
     11The purpose of this system is to interface with the user, who specifies an analysis type,
     12such as the Genetic Algorithm (GA) or Two Dimensional Spectrum Analysis (2DSA), and
     13the parameters needed to run that analysis on the HPC system.  After the user specifies the needed
     14data, it is packaged into a control.xml file.  The command-line program
     15grid-submit is then invoked.
     16
     17The contents of the control.xml file will include a generated AnalysisGroupGUID and
     18all needed child HPCAnalysisRequest records.
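
As a rough illustration of the packaging step, the following Python sketch builds a control.xml document with one generated group GUID and one child element per request. The element and attribute names (`AnalysisGroup`, `HPCAnalysisRequest`, `guid`, `method`) and the function name `build_control_xml` are assumptions made for illustration only; the real layout of control.xml is not defined on this page.

```python
import uuid
import xml.etree.ElementTree as ET


def build_control_xml(methods):
    """Return control.xml text for one analysis group (hypothetical layout)."""
    # One GUID identifies the whole group of requests.
    group = ET.Element("AnalysisGroup", {"guid": str(uuid.uuid4())})
    for method in methods:
        # Each child request gets its own GUID; "method" names the analysis type.
        ET.SubElement(
            group,
            "HPCAnalysisRequest",
            {"guid": str(uuid.uuid4()), "method": method},
        )
    return ET.tostring(group, encoding="unicode")


if __name__ == "__main__":
    print(build_control_xml(["2DSA", "GA"]))
```

Generating the GUIDs at packaging time keeps the file self-describing, so the same identifiers can later be matched against the database records.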
     19
     20Specific data in the control.xml file will be specified here once we agree on this
     21top-level design.
     22
     23The database tables HPCAnalysisGroup, HPCAnalysisRequest, and appropriate Settings tables
     24are populated by LIMS before calling grid-submit.php.
     25
     26
     27[wiki:Us3HpcDb US3 HPC Database Tables]
     28
     29LIMS is currently a Web interface.  In the future, its functionality may be ported
     30to the !UltraScan client.
     31
     32=== grid-submit ===
     33
     34grid-submit.php is a command-line tool that creates the initial HPCAnalysisResult table
     35entry with a queue status of 'Submitted'.  It copies control.xml and all files that it specifies
     36to the HPC system using the gsiscp utility.
     37
     38It then uses the submission technique needed for the specified supercomputer cluster
     39to queue the job.
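
The copy step might look roughly like the Python sketch below. It only assembles the command line and leaves execution to the caller. It assumes that gsiscp accepts the usual scp-style argument order (sources first, then `host:directory`); the host name, remote directory, file names, and the function name `gsiscp_command` are hypothetical.

```python
def gsiscp_command(files, host, remote_dir):
    """Build an argv list that copies control.xml and its data files to the cluster."""
    if not files:
        raise ValueError("nothing to copy")
    # Sources first, then a single host:directory destination (scp-style, assumed).
    return ["gsiscp", *files, f"{host}:{remote_dir}"]


if __name__ == "__main__":
    cmd = gsiscp_command(
        ["control.xml", "run1.dat"],  # hypothetical file names
        "cluster.example.org",        # hypothetical host
        "/work/us3/job-0001",         # hypothetical directory
    )
    print(" ".join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Returning a list rather than a shell string avoids quoting problems when file names contain spaces.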
     40
     41== Supercomputer Queue ==
     42
     43This task is controlled by the Supercomputer system.  It is responsible for controlling
     44the jobs running on that system and communicating with clients.
     45
     46Communication tasks include receiving tasks, returning job status, and informing the
     47client when a task has been completed or aborted.
     48
     49== NNLS (!UltraScan HPC Analysis Program) ==
     50
     51The NNLS program reads the control.xml file and uses that as a guide to read other data
     52files as needed to populate internal data structures.  It then performs the analysis,
     53writing any needed output to disk.
     54
     55At the beginning of the program, periodically during execution, and at the end of
     56processing, NNLS writes a UDP status datagram to a listener on the host and port specified
     57in the control.xml file.  Each datagram consists of the analysisRequestGUID and a
     58status (e.g. started, iteration number, finished).  This is not a reliable two-way
     59communication, so it is the responsibility of the listener to follow up and manage any
     60missed messages.
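
A minimal Python sketch of such a status datagram follows. The wire format (`"<analysisRequestGUID> <status>"` as UTF-8 text), the port number, and the function name `send_status` are assumptions for illustration; the page does not define the actual encoding.

```python
import socket


def send_status(host, port, request_guid, status):
    """Send one fire-and-forget status datagram; return the bytes sent."""
    # Hypothetical wire format: "<analysisRequestGUID> <status>" as UTF-8 text.
    payload = f"{request_guid} {status}".encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # UDP gives no delivery guarantee, so no reply is awaited.
        sock.sendto(payload, (host, port))
    return payload


if __name__ == "__main__":
    # Hypothetical listener address and GUID.
    send_status("127.0.0.1", 12233, "hypothetical-guid", "started")
```

Because the sender never waits for an acknowledgement, a lost datagram goes unnoticed on this side, which is why the listener has to follow up on jobs that fall silent.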
     61
     62== grid-timeout ==
     63
     64This program will either be scheduled periodically via cron or run as a daemon.  It will
     65check the status of jobs in the MySQL database and initiate a status query for jobs
     66that have overdue status updates.  If a job has been aborted, it will notify the
     67grid-listen program of that status.
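
The overdue check could be sketched in Python as below. A plain dictionary stands in for the database query, and the ten-minute threshold, the job names, and the function name `overdue_jobs` are hypothetical.

```python
from datetime import datetime, timedelta


def overdue_jobs(last_updates, now, max_silence):
    """Return the GUIDs whose last status update is older than max_silence."""
    return sorted(
        guid
        for guid, last_seen in last_updates.items()
        if now - last_seen > max_silence
    )


if __name__ == "__main__":
    now = datetime(2012, 1, 23, 12, 0, 0)
    jobs = {
        "job-a": now - timedelta(minutes=2),   # recent update
        "job-b": now - timedelta(minutes=45),  # silent for too long
    }
    print(overdue_jobs(jobs, now, timedelta(minutes=10)))
```

Each GUID returned would then be passed to a status query against the supercomputer queue.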
     68
     69== grid-query ==
     70
     71This is a command-line program that submits a status query to the Supercomputer Queue and
     72returns the result.
     73
     74== grid-listen ==
     75
     76This program runs as a daemon, receiving UDP packets from the NNLS program or the grid-timeout
     77program.  It updates the MySQL database table HPCAnalysisResult with the current
     78status and, upon completion or abort of an analysis, fetches the needed files from
     79the supercomputer cluster, sends an email to the user, and does any other necessary cleanup.
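
The receiving side might be sketched in Python as follows. A dictionary stands in for the HPCAnalysisResult table, and the message format (`"<GUID> <status>"` as UTF-8 text) and the function names are assumptions for illustration. File retrieval, email, and cleanup are left out.

```python
import socket


def handle_datagram(payload, status_table):
    """Record one status message; return True if the job has ended."""
    # Hypothetical wire format: "<GUID> <status>" as UTF-8 text.
    guid, _, status = payload.decode("utf-8").partition(" ")
    status_table[guid] = status
    return status in ("finished", "aborted")


def listen_once(sock, status_table):
    """Receive a single datagram from a bound UDP socket and process it."""
    payload, _sender = sock.recvfrom(4096)
    return handle_datagram(payload, status_table)


if __name__ == "__main__":
    table = {}
    ended = handle_datagram(b"hypothetical-guid finished", table)
    print(ended, table)
```

A daemon would call `listen_once` in a loop and start the completion steps whenever it returns True.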
     80