| 24 | || Job submission parameters || cluster_shortname-requestGUID-jobxmlfile.xml || |
| 25 | || Post-analysis job stats || job_statistics.xml || |
| 26 | || Queue messages, stdout, stderr || database-requestID-messages.txt || |
| 27 | || Analysis results, if any || analysis.tar || |
| 28 | |
| 29 | == The LIMS job submit process == |
| 30 | |
| 31 | In LIMS, job submission happens in two stages. In the first stage, everything related to the data is collected and placed into a tar file: the scan data itself, the edit profile, the analysis parameters, and any noise files that have been selected. In the second stage, the job submission parameters are written to a job submit file and the job is submitted as an HTTP request, where the job submission XML file is the body of the request and the tar file is sent as a base64-encoded, chunk-split attachment. LIMS writes the parent HPCAnalysisRequest record in the LIMS database, as well as the analysis record for the job in the local gfac database, which contains the jobs from all the LIMS databases that are currently being processed. |
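The first-stage packaging and the base64, chunk-split encoding of the attachment can be sketched roughly as follows. This is only an illustration in Python, not the LIMS code itself: the function names `build_payload` and `unpack_payload` and the file names in the usage note are hypothetical, and the 76-character line length is simply what Python's `base64.encodebytes` produces.

```python
import base64
import io
import tarfile
from typing import Dict


def build_payload(files: Dict[str, bytes]) -> str:
    """Pack files into an in-memory tar and return it as chunk-split base64 text."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    # encodebytes inserts a newline after every 76 characters of output,
    # which gives the "chunk-split" layout.
    return base64.encodebytes(buffer.getvalue()).decode("ascii")


def unpack_payload(payload: str) -> Dict[str, bytes]:
    """Reverse build_payload: decode the base64 text and read the tar members."""
    raw = base64.decodebytes(payload.encode("ascii"))
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers()}
```

For example, `build_payload({"scan.dat": b"...", "params.xml": b"<p/>"})` returns text that could be attached to the request, and `unpack_payload` recovers the same files on the receiving side.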
| 43 | At the beginning of the program, periodically during execution, and at the end of processing, MPI_Analysis writes a UDP status datagram to a listener on the host and port specified in the control.xml file. Each datagram consists of the analysisRequestGUID and a status (e.g. started, iteration number, finished). This is not reliable two-way communication, so it is the responsibility of the listener to follow up and manage any missed messages. |
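A status datagram of this kind could be sent along the following lines. This is a Python sketch for illustration only, not the MPI_Analysis implementation. The wire format (GUID, a single space, then the status text) is an assumption, since the page does not specify the layout, and `format_status` and `send_status` are hypothetical names.

```python
import socket


def format_status(request_guid: str, status: str) -> bytes:
    """Build the datagram body. Assumed layout: '<GUID> <status>'."""
    return f"{request_guid} {status}".encode("utf-8")


def send_status(host: str, port: int, request_guid: str, status: str) -> None:
    """Send one status datagram; UDP gives no delivery guarantee or reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_status(request_guid, status), (host, port))
```

Because the sender never waits for an acknowledgement, a lost datagram goes unnoticed on this side, which is why the listener has to follow up on missed messages.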
59 | | == grid-query == |
| 49 | || Current Status || Meaning || |
| 50 | || SUBMITTED || Job is queued, waiting to be run. || |
| 51 | || SUBMIT_TIMEOUT || Job is queued and waiting to be run, but has been waiting for more than 24 hours. || |
| 52 | || RUNNING || Job is running. || |
| 53 | || RUN_TIMEOUT || Job is running, but has been running for more than 24 hours. || |
| 54 | || DATA || Job has completed, but the data has not arrived. || |
| 55 | || DATA_TIMEOUT || Job has completed and we are waiting on the data, but have been waiting for more than an hour. || |
| 56 | || COMPLETE || Job has completed and data has been delivered. || |
| 57 | || FAILED || Job has failed. || |
| 58 | || CANCELED || User canceled the job. || |
| 59 | || ERROR || Grid control or listen has encountered an undocumented error. || |
66 | | This program runs as a daemon, receiving UDP packets from the MPI_Analysis program or the grid-timeout
67 | | program. It is responsible for updating the MySQL database table HPCAnalysisResult with the current
68 | | status and, upon completion or abort of an analysis, for fetching the needed files from
69 | | the supercomputer cluster, sending an email to the user, and doing any other necessary cleanup.
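The core of such a listener might look like the sketch below. It is a simplified Python illustration rather than the actual daemon: an in-memory dictionary stands in for the HPCAnalysisResult table, the `on_done` callback stands in for fetching files, emailing the user, and cleanup, and the datagram layout (GUID, a space, then the status) and the terminal status names are assumptions.

```python
import socket
from typing import Callable, Dict, Optional

# Assumed names for statuses that end an analysis.
TERMINAL_STATUSES = {"finished", "aborted"}


def handle_datagram(data: bytes, statuses: Dict[str, str],
                    on_done: Callable[[str], None]) -> None:
    """Record the status from one datagram and trigger follow-up when the job ends."""
    guid, _, status = data.decode("utf-8", errors="replace").partition(" ")
    if not guid or not status:
        return  # ignore malformed datagrams
    statuses[guid] = status  # stand-in for the database update
    if status in TERMINAL_STATUSES:
        on_done(guid)  # stand-in for fetching files, email, cleanup


def serve(sock: socket.socket, statuses: Dict[str, str],
          on_done: Callable[[str], None],
          max_packets: Optional[int] = None) -> None:
    """Receive datagrams on an already bound UDP socket and dispatch each one."""
    handled = 0
    while max_packets is None or handled < max_packets:
        data, _addr = sock.recvfrom(4096)
        handle_datagram(data, statuses, on_done)
        handled += 1
```

Keeping the parsing in `handle_datagram` separate from the socket loop makes the status handling easy to exercise without any network traffic.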
| 65 | || Current Status || Actions taken || Timeout || On timeout, change status to || |
| 66 | || SUBMITTED || If > 10 mins, request status update || 24 hours || SUBMIT_TIMEOUT || |
| 67 | || SUBMIT_TIMEOUT || Request a status update || 24 hours || FAILED || |
| 68 | || RUNNING || If > 10 mins, request status update || 24 hours || RUN_TIMEOUT || |
| 69 | || RUN_TIMEOUT || Request a status update || 48 hours || FAILED || |
| 70 | || DATA || Request status; Request data every 5 mins || 1 hour || DATA_TIMEOUT || |
| 71 | || DATA_TIMEOUT || Request status; Request data every 15 mins || 24 hours || FAILED || |
| 72 | || COMPLETE || Do cleanup || || || |
| 73 | || FAILED || Do cleanup || || || |
| 74 | || CANCELED || Do cleanup || || || |
| 75 | || ERROR || Do cleanup || || || |
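The timeout column of the table above can be read as a small state machine. The following Python sketch encodes only the timeout transitions, not the periodic status or data requests. It assumes each timeout is measured from the moment the job entered its current status, which the table does not state explicitly, and the names `TIMEOUTS` and `next_status` are hypothetical.

```python
from datetime import timedelta
from typing import Dict, Tuple

# current status -> (timeout, status to change to on timeout), taken from the table
TIMEOUTS: Dict[str, Tuple[timedelta, str]] = {
    "SUBMITTED": (timedelta(hours=24), "SUBMIT_TIMEOUT"),
    "SUBMIT_TIMEOUT": (timedelta(hours=24), "FAILED"),
    "RUNNING": (timedelta(hours=24), "RUN_TIMEOUT"),
    "RUN_TIMEOUT": (timedelta(hours=48), "FAILED"),
    "DATA": (timedelta(hours=1), "DATA_TIMEOUT"),
    "DATA_TIMEOUT": (timedelta(hours=24), "FAILED"),
}


def next_status(current: str, time_in_status: timedelta) -> str:
    """Return the status after applying the timeout rule for the current status."""
    rule = TIMEOUTS.get(current)
    if rule is None:
        # COMPLETE, FAILED, CANCELED and ERROR have no timeout; only cleanup remains.
        return current
    limit, on_timeout = rule
    return on_timeout if time_in_status > limit else current
```

For instance, a job that has been in `DATA` for two hours moves to `DATA_TIMEOUT`, while a `COMPLETE` job is left unchanged.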