
US-SOMO mpi notes

us_saxs_cmds_mpi

  • computes various small angle scattering curves
  • executable $HOME/ultrascan/bin/us_saxs_cmds_mpi
  • set up on lonestar under user ultrasca
  • test submission directory ~ultrasca/ussomo_test
    • to run the example:
       $ cd ~ultrasca/ussomo_test/run
       $ cp ../bjob ../myjob.tgz .
       $ qsub bjob
      

  • sample job script:
    #!/bin/bash
    #$ -A TG-MCB070039N
    #$ -V                     # Inherit the submission environment
    #$ -cwd                   # Start job in  submission directory
    #$ -N uss_myjob               # Job Name
    #$ -o myjob.stdout
    #$ -e myjob.stderr
    #$ -pe 12way 12          # Requests 12 cores/node 12 total
    #$ -q normal            # Queue name
    #$ -l h_rt=00:15:00       # Run time (hh:mm:ss) - 15 minutes
    #$ -M emre@biochem.uthscsa.edu      # Email notification address
    #$ -m be                  # Email at Begin/End of job
    set -x                   #{echo cmds, use "set echo" in csh}
    
    ibrun $HOME/ultrascan/bin/us_saxs_cmds_mpi iq myjob.tgz
    
  • "iq" argument is currently fixed
    • might remove later or define new types
      • not sure yet
  • library dependencies: qt3 and its supporting libraries (libX11 etc).
    • no dependence on libus (the ultrascan library)
  • environment variable ULTRASCAN should be set
  • there is one input file, "jobname".tgz or "jobname".tar (a staging sketch follows at the end of this list)
  • there is one output file, "jobname"_out.tar
    • "jobname" is guaranteed not to contain _out
    • I could change this to a uniform results_out.tar if necessary
  • the input file will be removed during execution
  • various other "droppings" (temporary files) will be left in and under the run dir
    • these can be ignored / removed
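  • staging sketch: a minimal example of the flow above, assuming the job's input
    files are already collected in a local directory (the directory name here is a
    placeholder, and the bundled bjob script expects the archive to be named
    myjob.tgz, so the archive name and the job script must agree):
    # ULTRASCAN must be set in the environment; $HOME/ultrascan matches the
    # executable path above but is an assumption (the job script's -V flag
    # passes the submission environment through to the job)
    export ULTRASCAN=$HOME/ultrascan

    # hypothetical job name; the input archive must be named "jobname".tgz (or .tar)
    JOB=myjob

    # package whatever input files the job needs into the input archive
    # (the ${JOB}_input directory is a placeholder for those files)
    tar czf ${JOB}.tgz -C ${JOB}_input .

    # stage into the run directory and submit, as in the example near the top of this page
    cp ${JOB}.tgz ~ultrasca/ussomo_test/run/
    cd ~ultrasca/ussomo_test/run
    cp ../bjob .
    qsub bjob

    # when the job finishes, the results are in "jobname"_out.tar
    # (the input archive itself is removed during execution)
    tar xf ${JOB}_out.tar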

US-SOMO condor notes

us_saxs_cmds_t

  • example condor script
    Universe                = vanilla
    Executable              = /home/ba01/u108/brookes/ultrascan/bin/us_saxs_cmds_t
    arguments               = iq job-1.tgz
    transfer_executable     = false
    should_transfer_files   = yes
    when_to_transfer_output = ON_EXIT
    transfer_input_files    = job-1.tgz
    transfer_output_files   = job-1_out.tgz
    Log                     = runcondor.log
    Output                  = runcondor.out
    Error                   = runcondor.error
    Queue
    
  • executable is us_saxs_cmds_t (vs us_saxs_cmds_mpi for the mpi version); a submission sketch follows at the end of this list
  • library dependencies: qt3 and its supporting libraries (libX11 etc).
    • no dependence on libus (the ultrascan library)
  • environment variable ULTRASCAN should be set
  • there is one input file, "jobname".tgz
  • there is one output file, "jobname"_out.tgz
    • "jobname" is guaranteed not to contain _out

supplementary programs

  • these are programs available in binary-only format that can be used by us_saxs_cmds_{t,mpi}
  • they generally should be available under $ULTRASCAN/bin (a quick check is sketched at the end of this list)
    • required for some saxs computations:
      • crysol, foxs
    • required for dmd computations:
      • complex.linux, complex_M2P.linux, findSS.linux, reconMissSideChains.linux, rexDMD.linux, xDMD.linux
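  • a quick sanity check, assuming $ULTRASCAN is already set; it only verifies that the
    expected binaries are present and executable under $ULTRASCAN/bin:
    # check that the supplementary binaries are staged under $ULTRASCAN/bin
    for prog in crysol foxs \
                complex.linux complex_M2P.linux findSS.linux \
                reconMissSideChains.linux rexDMD.linux xDMD.linux
    do
        if [ -x "$ULTRASCAN/bin/$prog" ]; then
            echo "ok:      $prog"
        else
            echo "MISSING: $prog"
        fi
    done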

Job submission notes

Additional points / To Do

  • Additional points:
    • we are currently running by staging files directly on the target resource and communicating with GFAC
      • staging needs a better method than "scp"
      • GFAC is working well so far
        • add stdout/stderr to the files copied back to the job submission directory
        • add ranger
      • authentication questions
        • need middleware or directly in GFAC?
      • a GFAC or middleware "GET resources" option would be helpful (a response sketch appears at the end of this section)
        • fields for resource
          • type: mpi or condor
          • staging location: eg $SCRATCH/ussomo/jobs with $SCRATCH expanded
          • maximum run time (e.g. 24 or 48 hours)
          • performance index: some sort of relative speed indicator
          • current status: e.g. "down for maintenance"; could simply be "up" or "down" for now
            • maybe reads from some text file
              • that way we can update things like "scheduled maintenance"
              • is there some XSEDE URL where this can be directly read?
                • do this within GFAC and include or add a resource status URL to the fields sent via "GET resources"
          • probably some more fields
    • "middleware" server
      • initial user priority throttled down
        • priority increased after an approval stage
      • will occasionally poll users, on submission requests, for "publication info"
        • throttle based on # of registered pubs
        • share with condor
          • David requested this info
    • udp server for ussomo progress messages
      • should this status be merged into the "GET" status reply or reported via a separate request?
    • additional GET status info?
      • elapsed run time while the job is active and/or estimated time left
    • when jobs complete but their status is not COMPLETED (Skype transcript):
       rocco1-nmae08-43_all_h05.tar (on ranger) shows ACTIVE but there is nothing in the queue and the results are there, can you change it to COMPLETED?
       Similar situation with rocco1-nmae27-43_all.h05.tar (on lonestar), shows PENDING, but nothing in queue and results are there.
      [07:30:56] Raminder Singh: Emre in such case we have a service to overwrite the GFAC status
      [07:32:04] … http://gw61.quarry.iu.teragrid.org:8080/ogce-rest/job/setstatus/rocco1-nmae27-43_all_h05.tar
      [07:33:08] … use this to set the status of job.. this happens because gram does not report the status sometime
      [07:33:58] … this will work for such cases
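    • "GET resources" response sketch: the endpoint, field names and values below are all
      hypothetical (nothing like this exists in GFAC yet); it only illustrates the fields
      proposed above
      # hypothetical request; neither this URL nor the reply format exists yet
      curl http://gfac.example.org/ogce-rest/resources

      # a reply along these lines would cover the proposed fields
      # (all values below are made up for illustration):
      #   [ { "name":             "lonestar",
      #       "type":             "mpi",
      #       "staging_location": "/scratch/01234/ultrasca/ussomo/jobs",
      #       "max_run_time":     "24:00:00",
      #       "performance":      1.0,
      #       "status":           "up",
      #       "status_url":       "http://some.status.page/lonestar" },
      #     { "name":             "condor-pool",
      #       "type":             "condor",
      #       "staging_location": "/home/ba01/u108/brookes/ussomo/jobs",
      #       "max_run_time":     "48:00:00",
      #       "performance":      0.4,
      #       "status":           "down for maintenance",
      #       "status_url":       "http://some.status.page/condor-pool" } ]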
      

development notes
