Version 5 (modified by dzollars, 9 years ago)

HPC Communications

UltraScan's communication with the High Performance Computer (HPC) or Grid Cluster is implemented according to the above drawings. The tasks are accomplished as described below. The original OpenOffice document is attached to this page.

Laboratory Information Management System (LIMS)

This system lets the user specify an analysis type, such as the Genetic Algorithm (GA) or Two Dimensional Spectrum Analysis (2DSA), along with the parameters the HPC system needs for that analysis. After the user specifies the needed data, the data is packaged into a common data directory, uslims3.uthscsa.edu:/srv/www/htdocs/uslims3/uslims3_data. For each analysis request, LIMS creates a new GUID, associates it with the analysis data, and creates a new record in the HPCAnalysisRequest database table. This record is the parent record for everything relating to this HPC analysis, and the GUID serves as the common identifier of the analysis. For instance, the GUID is the name of the subdirectory in the common data directory where all the files relating to this particular analysis are stored when the job is submitted to the HPC system. As another example, LIMS prepends the string "US3-" to the GUID to form the gfacID, which the GFAC system, the listen script, and the grid control program all use to identify the job. The grid control program uses the database name together with the gfacID to associate the HPC job with the analysis request record in the LIMS database.
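The relationship between the GUID, the gfacID, and the data subdirectory can be sketched as follows. This is only an illustration, not the LIMS code: the function names are hypothetical, and the use of a random version-4 GUID is an assumption, since the text does not say how LIMS generates it.

```python
import uuid
from pathlib import PurePosixPath

# Common data directory on the LIMS host, as named in the text above.
DATA_ROOT = PurePosixPath("/srv/www/htdocs/uslims3/uslims3_data")

GFAC_PREFIX = "US3-"  # string LIMS prepends to the GUID to form the gfacID


def new_analysis_ids(data_root=DATA_ROOT):
    """Return (guid, gfac_id, job_dir) for a new analysis request.

    Hypothetical helper: assumes a random (version 4) GUID.
    """
    guid = str(uuid.uuid4())
    gfac_id = GFAC_PREFIX + guid
    job_dir = data_root / guid  # subdirectory named after the GUID
    return guid, gfac_id, job_dir


def guid_from_gfac_id(gfac_id):
    """Recover the analysis GUID from a gfacID (hypothetical helper)."""
    if not gfac_id.startswith(GFAC_PREFIX):
        raise ValueError("not a US3 gfacID: %r" % gfac_id)
    return gfac_id[len(GFAC_PREFIX):]


if __name__ == "__main__":
    guid, gfac_id, job_dir = new_analysis_ids()
    print(guid, gfac_id, job_dir)
```

Because the gfacID is just the GUID with a fixed prefix, any component that sees the gfacID can recover the GUID, and with it the data subdirectory, without a lookup.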

Contents of the data directory

US3 HPC Database Tables

The LIMS job submit process

Supercomputer Queue

This task is controlled by the Supercomputer system. It is responsible for controlling the jobs running on that system and for communicating with clients.

Communication tasks include receiving tasks, returning job status, and informing the client when a task has been completed or aborted.

MPI_Analysis (UltraScan HPC Analysis Program)

The MPI_Analysis program reads the control.xml file and uses that as a guide to read other data files as needed to populate internal data structures. It then performs the analysis, writing any needed output to disk.

At the beginning of the program, periodically during execution, and at the end of processing, MPI_Analysis writes a UDP status datagram to a listener on the host and port specified in the control.xml file. Each datagram consists of the analysisRequestGUID and a status (e.g. started, iteration number, finished). This is not a reliable two-way communication, and it is the responsibility of the listener to follow up and manage any missed messages.
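A minimal sketch of such a status sender is shown below. The wire format (GUID and status joined by a colon, UTF-8 encoded) and the function names are assumptions made for illustration; the actual datagram layout used by MPI_Analysis is not specified here.

```python
import socket


def format_status(guid, status):
    """Build a status payload.

    Assumed format: "<guid>:<status>" encoded as UTF-8. The real
    MPI_Analysis layout may differ.
    """
    return ("%s:%s" % (guid, status)).encode("utf-8")


def send_status(host, port, guid, status):
    """Send one fire-and-forget UDP status datagram and return the payload."""
    payload = format_status(guid, status)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # UDP gives no delivery guarantee and no reply is awaited,
        # so the listener must cope with lost messages.
        sock.sendto(payload, (host, port))
    return payload
```

The sender never waits for an acknowledgement, which keeps the analysis from stalling on a slow or absent listener, at the cost of possibly lost updates.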

grid-timeout

This program will either be scheduled periodically via cron or run as a daemon. It will check the status of jobs in the MySQL database and initiate a status query for jobs that have overdue status updates. If a job has been aborted, it will notify the grid-listen program of that status.
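The overdue check could look roughly like the following. This is a sketch under stated assumptions: the in-memory dictionary stands in for the database query, and the ten-minute threshold and the function name are hypothetical.

```python
from datetime import datetime, timedelta


def overdue_jobs(last_update, now, max_age=timedelta(minutes=10)):
    """Return the gfacIDs whose last status update is older than max_age.

    last_update maps gfacID -> datetime of the most recent status update.
    In the real program this would come from the database; a plain dict
    is used here, and the default threshold is an assumed value.
    """
    return sorted(job for job, seen in last_update.items()
                  if now - seen > max_age)


if __name__ == "__main__":
    now = datetime(2012, 1, 1, 12, 0, 0)
    jobs = {
        "US3-job-a": now - timedelta(minutes=2),   # recent update
        "US3-job-b": now - timedelta(minutes=45),  # overdue
    }
    for job in overdue_jobs(jobs, now):
        print("status query needed for", job)
```

Each job returned would then be handed to a status query against the Supercomputer Queue.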

grid-query

This is a command-line program that submits a status query to the Supercomputer Queue and returns the result.

grid-listen

This program runs as a daemon, receiving UDP packets from the MPI_Analysis program or the grid-timeout program. It updates the MySQL database table HPCAnalysisResult with the current status and, upon completion or abort of an analysis, fetches the needed files from the supercomputer cluster, sends an email to the user, and performs any other necessary cleanup.
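The receiving side of this loop might be sketched as below. The datagram format (GUID and status separated by a colon) is an assumption, and the function names and the `handle` callback are hypothetical; a real handler would update HPCAnalysisResult and trigger the completion steps described above.

```python
import socket


def parse_status(datagram):
    """Split a datagram into (guid, status), or return None if malformed.

    Assumed format: "<guid>:<status>" in UTF-8. GUIDs contain hyphens
    but no colons, so splitting on the first colon is unambiguous.
    """
    text = datagram.decode("utf-8", errors="replace").strip()
    guid, sep, status = text.partition(":")
    if not sep or not guid or not status:
        return None
    return guid, status


def listen_once(sock, handle):
    """Receive one datagram and pass a valid (guid, status) to handle()."""
    data, _addr = sock.recvfrom(4096)
    parsed = parse_status(data)
    if parsed is not None:
        handle(*parsed)
    return parsed


def serve(host, port, handle):
    """Run the receive loop forever on a UDP socket bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            listen_once(sock, handle)
```

Malformed datagrams are dropped rather than raising, so a single bad packet cannot stop the daemon.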

Attachments (3)