Multiphase Support
The following nc run options control multiphase jobs:
Option | Description
-multiphase [1|0] | Enables multiphase jobs.
-mpres "resource string" | Sets the resources that will be used for each phase.
For example:
-mpres "linux64 foo%linux64 bar:linux64 baz"
In addition, the autoRescheduleCount server configuration parameter must be set to the maximum number of job phases or higher. The default is 4, so the parameter needs to be raised only for jobs with 5 or more phases.
Specifying Resources
- -mpres RESLIST
  Specifies the resources required by a multiphase job. RESLIST contains the resource lists for all phases of the job, with a percent sign (%) separating the sublist for each phase.
- -mpres+ <rsrc>
  Appends one resource to the multiphase resource list. This option must follow the "-mpres" or "-mpres<n>" option; otherwise, the resources specified in "-mpres+" will be overwritten.
- -mpres1 RESLIST, -mpres2 RESLIST, -mpres<n> RESLIST
  Specifies the resources for phase <n> of a multiphase job, where <n> is in the range from 1 to 9.
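Because RESLIST is simply the per-phase resource lists joined with %, it can be assembled from separate strings. The helper below, join_phases, is a hypothetical convenience function (not an Accelerator command), sketched in POSIX sh:

```sh
#!/bin/sh
# Hypothetical helper, not part of Accelerator: join one resource list per
# phase into a single RESLIST, using % as the phase delimiter.
join_phases() {
  reslist=""
  sep=""                       # no delimiter before the first phase
  for phase in "$@"; do
    reslist="$reslist$sep$phase"
    sep="%"                    # every later phase is preceded by %
  done
  printf '%s\n' "$reslist"
}
```

For instance, join_phases "linux64 License:blue" "linux64 License:red" prints linux64 License:blue%linux64 License:red, which can then be passed as the -mpres argument.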
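For example, given resource maps such as the following, which associate License:blue with tasker1 and License:red with tasker2:
vtk_resourcemap_set License:blue UNLIMITED License:blue_tasker1
vtk_resourcemap_set License:red UNLIMITED License:red_tasker2
vtk_resourcemap_set License:blue_tasker1 1 tasker1
vtk_resourcemap_set License:red_tasker2 1 tasker2
You could then run a three-phase multiphase job as: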
nc run -multiphase 1 -mpres "linux64 License:blue%linux64 License:red%linux64 License:blue" -- -e BASE -D /home/jjmcwill/testDir/testMultistage.sh
Accelerator sets the following variables for a multiphase job:
Variable | Description
MPRESOURCES | Contains the same resources passed in -mpres, and is used to reset the job resources for each phase.
MPCURRENTPHASE | Contains an integer indicating the current job phase. It starts at 1 and has a maximum value of 9.
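Accelerator itself resets the job resources from MPRESOURCES at each phase, so a job never needs to do this. Purely to illustrate which sublist applies to which phase, the hypothetical helper below (not an Accelerator command) picks the MPCURRENTPHASE-th %-delimited field:

```sh
#!/bin/sh
# Illustrative only: print the resource sublist for a given phase.
#   $1: full resource string, e.g. "$MPRESOURCES"
#   $2: phase number starting at 1, e.g. "$MPCURRENTPHASE"
# Prints an empty line if the phase number exceeds the number of sublists.
phase_resources() {
  printf '%s\n' "$1" | awk -F'%' -v n="$2" '{ print $n }'
}
```

With the example job above, phase_resources "$MPRESOURCES" 2 would print linux64 License:red.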
The job script controls what happens next through its exit code:
- If the script exits with code 216, Accelerator increments the job phase, changes the job resources, and reschedules the job to run again.
- If the script exits with code 0, the job is considered "Done", and MPCURRENTPHASE is reset to 1.
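The contents of testMultistage.sh are not shown here, so the following is only a minimal sketch of how a three-phase job script might branch on the phase. It assumes MPCURRENTPHASE is visible to the script as an environment variable and falls back to phase 1 if it is unset; the echo lines are placeholders for real work.

```sh
#!/bin/sh
# Sketch of a three-phase job body. Returns 216 to request the next phase,
# 0 when the final phase is done, and 1 for an unexpected phase number.
run_phase() {
  phase="${MPCURRENTPHASE:-1}"   # assumption: default to phase 1 if unset
  case "$phase" in
    1) echo "phase 1: work that needs License:blue"; return 216 ;;
    2) echo "phase 2: work that needs License:red";  return 216 ;;
    3) echo "phase 3: final step";                   return 0 ;;
    *) echo "unexpected phase: $phase" >&2;          return 1 ;;
  esac
}
# In an actual job script, finish with:  run_phase; exit $?
```

Returning a code other than 0 or 216 for an unknown phase makes the job fail visibly instead of looping.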
Failed Jobs
If a job exits during a phase with a code other than 0 or 216, it is considered FAILED and MPCURRENTPHASE is not incremented. If the job is invalidated and re-run (for example, with nc rerun -f JOBID), it restarts at the current MPCURRENTPHASE, and further phases run if the job exits with code 216, as described above.
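The bookkeeping described above can be summarized as a small state function. This is a hypothetical model for illustration, not Accelerator code; the RESCHEDULED label is invented here, and the 9-phase limit is not modeled.

```sh
#!/bin/sh
# Hypothetical model: given the current phase and the exit code of the run,
# print "<status> <next MPCURRENTPHASE>".
next_state() {
  phase=$1
  code=$2
  case "$code" in
    0)   echo "DONE 1" ;;                       # finished; phase reset to 1
    216) echo "RESCHEDULED $((phase + 1))" ;;   # advance and run again
    *)   echo "FAILED $phase" ;;                # phase kept; a re-run resumes here
  esac
}
```

In this model, if phase 2 of the three-phase example exits with code 1, next_state 2 1 prints FAILED 2, which matches a re-run resuming at phase 2.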
Logging
After the first phase runs, Accelerator rewrites the command for subsequent phases so that the wrappers are passed -a -A, which tells them to append to the job log. This way, the stdout and stderr of all phases are logged to the same file. Otherwise, each phase would overwrite the log, and you would see only the output of the last phase that ran.
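The difference between overwriting and appending can be seen with ordinary shell redirection. The sketch below simulates three phases writing to one log file, first with > and then with >>; it uses plain sh, not the Accelerator wrappers.

```sh
#!/bin/sh
# Simulate three phases writing to a single log.
#   $1: "overwrite" (each phase uses >) or "append" (each phase uses >>)
log_demo() {
  mode=$1
  log=$(mktemp)
  for phase in 1 2 3; do
    if [ "$mode" = "append" ]; then
      echo "output of phase $phase" >> "$log"
    else
      echo "output of phase $phase" > "$log"
    fi
  done
  cat "$log"
  rm -f "$log"
}
```

log_demo overwrite prints only the phase 3 line, while log_demo append prints all three.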
If Accelerator does not detect one of the standard vov wrappers at the beginning of the command line, it assumes the command is not using a wrapper. In this case, it looks for the standard > redirect symbol in the command and replaces it with >>.
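Both rewrite rules can be sketched as one function. This is a hypothetical illustration, not Accelerator's implementation: the wrapper names it recognizes (vw, vov) and the placement of -a -A directly after the wrapper name are assumptions, and only a space-separated " > " redirect is handled.

```sh
#!/bin/sh
# Hypothetical sketch of the command rewrite applied to later phases.
rewrite_for_append() {
  cmd=$1
  first=${cmd%% *}          # first word of the command line
  rest=${cmd#"$first"}      # remainder, including its leading space
  case "$first" in
    vw|vov)                 # assumption: the set of recognized wrappers
      # Wrapped command: ask the wrapper to append to the job log.
      printf '%s -a -A%s\n' "$first" "$rest" ;;
    *)
      # Unwrapped command: turn the first " > " redirect into " >> ".
      printf '%s\n' "$cmd" | sed 's/ > / >> /' ;;
  esac
}
```

For example, rewrite_for_append './build.sh > build.log' prints ./build.sh >> build.log.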
REST Support
In the payload for submitting a job via REST, two new fields are allowed: multiphase and mpres. Setting multiphase = True enables multiphase job support. The mpres field behaves the same as the -mpres command-line argument described above. Re-running a failed multiphase job via the REST re-run API behaves similarly to re-running a failed multiphase job from the command line, as described above.
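As a rough sketch, assuming the payload is JSON (where True is written as true), the two multiphase fields could be produced as shown below. The endpoint, authentication, and all other payload fields are not described in this section and are left out; the helper name is hypothetical, and it does not escape quotes inside the resource string.

```sh
#!/bin/sh
# Hypothetical helper: emit only the multiphase-related payload fields.
#   $1: RESLIST, in the same %-delimited form as the -mpres option
multiphase_payload_fields() {
  printf '{"multiphase": true, "mpres": "%s"}\n' "$1"
}
```

These two fields would then be merged into whatever job-submission payload the REST API otherwise requires.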