Interfaces to Other Batch Processing Systems
You can find examples of interfaces in the directory $VOVDIR/etc/tasker_scripts. The file $VOVDIR/etc/tasker_scripts/taskerLSF.tcl implements the interface to LSF.
Use the Implemented Interface to SGE
% vovtasker -I $VOVDIR/etc/tasker_scripts/taskerSGE.tcl -T 300 -r "dc_shell_license dracula_license"
You can use other vovtasker options to specify other attributes of your BPS agent. For a BPS agent, the default value of "capacity" is 100, whereas the default for a normal tasker equals the number of CPUs on the tasker machine. You can override this default with the -T option, as in the command above. A large capacity is usually desirable because it gives the BPS the freedom to schedule jobs in the most effective manner. Capacities of 200 or 300 are typical.
Study and Write Your Own Implementation for Another BPS
The BPS agent is implemented in a BPS agent file, which is a Tcl script that implements the procedures described in the table below.
- Procedure
- Description
taskerStartJob
- This procedure submits a job to the BPS. All information about the job is in the global array jobDesc. It should return a reference id, i.e. the id of the job in the context of the BPS.
taskerStopJob
- This procedure is called when the tasker wants to stop a running or queued job.
taskerCheckJob
- This procedure is called when the tasker wants to check the status of a job. It returns one of the following values: LOST, DONE, FAILED, RUNNING, or QUEUED.
taskerResumeJob
- This procedure is called when the tasker wants to resume a suspended job.
taskerSuspendJob
- This procedure is called when the tasker wants to suspend a job.
taskerJobEnded
- This procedure is called when the tasker receives a notification from the server that a job has ended. This allows the tasker to update its state promptly instead of waiting for another timeout.
taskerMapResources
- This procedure maps the resources required by the job in the context of the local project to the resources required by the job in the context of the BPS.
taskerCleanup
- This procedure is called when the indirect tasker exits. It should be used to clean up any garbage that may have been created by the tasker.
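As a starting point, the procedures in the table can be laid out as a skeleton agent file. The sketch below is only illustrative: it replaces the real BPS with an in-memory array (fakeBps, a hypothetical name) so that it runs without any batch system, and it uses args for the procedures whose argument lists are not described in this section.

```tcl
# Skeleton BPS agent file (illustrative sketch, not a shipped script).
# ASSUMPTION: fakeBps and fakeNextRef are hypothetical stand-ins for a real BPS.
array set fakeBps {}
set fakeNextRef 1000

proc taskerStartJob { jobId } {
    global jobDesc fakeBps fakeNextRef
    # A real agent would submit the job here; we only record it as queued.
    set refId [incr fakeNextRef]
    set fakeBps($refId) QUEUED
    return $refId
}

proc taskerStopJob { jobId refId } {
    global fakeBps
    # Forget the job; a real agent would ask the BPS to kill it.
    unset -nocomplain fakeBps($refId)
}

proc taskerCheckJob { jobId refId } {
    global fakeBps
    # Must return one of: LOST DONE FAILED RUNNING QUEUED.
    if { ![info exists fakeBps($refId)] } {
        return LOST
    }
    return $fakeBps($refId)
}

proc taskerSuspendJob { jobId refId } { }
proc taskerResumeJob  { jobId refId } { }

# ASSUMPTION: the argument lists of the next three procedures are not
# given in this section, so "args" accepts whatever the tasker passes.
proc taskerJobEnded     { args } { }
proc taskerMapResources { args } { }
proc taskerCleanup      { args } {
    global fakeBps
    # Drop every remembered job when the tasker exits.
    array unset fakeBps *
}
```

Replacing the bodies one procedure at a time with calls to the real BPS keeps the file loadable at every step.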
taskerStartJob
The procedure taskerStartJob takes a single argument, the jobId. The rest of the job information is available in the array jobDesc, as described in the following table:
- Variable Name
- Meaning
jobDesc(command)
- The complete command line.
jobDesc(env)
- The environment label for the job.
jobDesc(id)
- The job Id.
jobDesc(priority)
- The VOV priority level.
jobDesc(resources)
- The resource list for the job.
jobDesc(user)
- The user that owns the job.
jobDesc(xdur)
- The expected duration.
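When developing an agent it can help to log what the tasker has placed in jobDesc. The following is a minimal sketch, assuming only the field names listed in the table; describeJob is a hypothetical helper name, and the sample values are invented for illustration.

```tcl
# Build a one-line summary of the jobDesc fields listed in the table.
# ASSUMPTION: describeJob is a hypothetical debugging helper.
proc describeJob { } {
    global jobDesc
    set parts {}
    foreach field {id user priority xdur env resources command} {
        if { [info exists jobDesc($field)] } {
            lappend parts "$field=$jobDesc($field)"
        } else {
            # Tolerate fields the tasker did not fill in.
            lappend parts "$field=<unset>"
        }
    }
    return [join $parts "; "]
}

# Invented sample values, only to exercise the helper.
array set jobDesc {
    id       42
    user     joe
    command  "dc_shell -f run.tcl"
}
puts [describeJob]
```

Calling describeJob at the top of taskerStartJob is one way to see what a submission will be based on.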
proc taskerStartJob { jobId } {
    global jobDesc env
    # Generate a label from the command by replacing each run of
    # characters other than letters, digits, and underscores with "_",
    # then keeping the first 8 characters.
    set label $jobDesc(command)
    regsub -all {[^a-zA-Z0-9_]+} $label "_" label
    set label [string range $label 0 7]
    set submitInfo [exec qsub -V -v VOV_ENV=BASE -j y -N $label $env(VOVDIR)/scripts/vovfire $jobId]
    # The third word of the qsub reply is used as the reference id.
    set refId [lindex $submitInfo 2]
    return $refId
}
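The label generation and the extraction of the reference id can be factored into small procedures that are testable without a BPS. This is a sketch, not part of the shipped script: sgeJobLabel and sgeParseRefId are hypothetical names, and the reply format "Your job <id> (...) has been submitted" is an assumption about what qsub prints.

```tcl
# Turn a command line into a short job label, as the example above does.
proc sgeJobLabel { command } {
    regsub -all {[^a-zA-Z0-9_]+} $command "_" label
    return [string range $label 0 7]
}

# Extract the job id from a qsub reply.
# ASSUMPTION: the reply starts with "Your job <number>".
proc sgeParseRefId { submitInfo } {
    if { ![regexp {^Your job ([0-9]+)} $submitInfo -> refId] } {
        # Fail loudly instead of returning an arbitrary word as the id.
        error "cannot find a job id in qsub reply: $submitInfo"
    }
    return $refId
}
```

Compared with taking the third word of the reply, the pattern match raises an error when the reply has an unexpected shape instead of silently returning a wrong reference id.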
taskerStopJob
This procedure is called when the tasker wants to stop a job. The procedure takes two arguments: the VovId of the job and the referenceId returned by taskerStartJob.
proc taskerStopJob { jobId refId } {
    # Stop an SGE job.
    exec qdel $refId
}
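Because exec raises a Tcl error when the command fails (for instance when the job has already left the BPS), a more defensive variant may be preferable. The following is a sketch under that assumption; stopBpsJob and its killCommand parameter are hypothetical, and whether the tasker expects errors to propagate is not stated in this section.

```tcl
# Try to stop a BPS job; return 1 on success and 0 on failure.
# ASSUMPTION: stopBpsJob is a hypothetical helper; killCommand defaults to qdel.
proc stopBpsJob { refId {killCommand qdel} } {
    if { [catch { exec $killCommand $refId } message] } {
        # Report the problem but do not abort the tasker.
        puts stderr "stop of $refId failed: $message"
        return 0
    }
    return 1
}

proc taskerStopJob { jobId refId } {
    stopBpsJob $refId
}
```

Passing the kill command as a parameter also makes it possible to reuse the helper for a BPS other than SGE.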
taskerCheckJob
This procedure takes two arguments: the VovId of the job and the referenceId returned by taskerStartJob. It is called when the tasker wants to find out the status of a job. The procedure is expected to return one of the following values:
- Return Value
- Meaning
LOST
- The job is no longer in the BPS. It is generally assumed that the job is done.
DONE
- The job is done.
FAILED
- The BPS believes that the job has failed.
RUNNING
- The job is currently executing.
QUEUED
- The job is in the BPS queue.
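proc taskerCheckJob { jobId refId } {
    # Check status of an SGE job.
    # ParseOutputOf and GetStartTime are helper procedures not shown here.
    set status [ParseOutputOf [exec qstat] $refId]
    if { $status eq "RUNNING" } {
        # Tell the tasker when the job actually started.
        vtk_tasker_job_started $jobId [GetStartTime $refId]
    }
    return $status
}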
When a job is found to be running, report its start time to the tasker by calling:
vtk_tasker_job_started $jobId $timespec
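The helper ParseOutputOf used in taskerCheckJob is not defined in this section. One possible shape for it is sketched below. It assumes the default qstat table layout, in which the job id is the first column and the state is the fifth, and it assumes a mapping from SGE state codes to status values; both should be checked against the local installation.

```tcl
# Map the qstat line for refId to a tasker status value.
# ASSUMPTIONS: column 0 is the job id, column 4 is the state code;
# a state containing "E" means error, one containing "q" means queued,
# and any other listed job is treated as running.
proc ParseOutputOf { qstatText refId } {
    foreach line [split $qstatText "\n"] {
        # Split on runs of whitespace; header lines never match refId.
        set fields [regexp -all -inline {\S+} $line]
        if { [lindex $fields 0] ne $refId } {
            continue
        }
        set state [lindex $fields 4]
        if { [string match "*E*" $state] } { return FAILED }
        if { [string match "*q*" $state] } { return QUEUED }
        return RUNNING
    }
    # The job is not listed, so the BPS no longer knows about it.
    return LOST
}

# Invented sample output, used only to exercise the parser.
set sampleQstat {job-ID  prior   name     user state submit/start at     queue    slots
-----------------------------------------------------------------------
    101 0.55500 dc_shell joe  r     01/15/2024 10:12:03 all.q@n1 1
    102 0.00000 dracula  joe  qw    01/15/2024 10:12:09          1
    103 0.00000 broken   joe  Eqw   01/15/2024 10:12:11          1}
```

This sketch never returns DONE: a job that has left the qstat listing is reported as LOST, which the table above says is generally taken to mean that the job is done.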
taskerSuspendJob, taskerResumeJob
These procedures suspend and resume a job. Like taskerStopJob, each takes two parameters: the jobId and the referenceId.
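For SGE, the two procedures might be sketched as follows. The use of qmod -sj and qmod -usj to suspend and unsuspend a job is an assumption about the local SGE installation, the two command-building helpers are hypothetical names, and the {*} expansion syntax requires Tcl 8.5 or later.

```tcl
# Build the command lists separately so they can be inspected or logged.
# ASSUMPTION: "qmod -sj" suspends and "qmod -usj" resumes an SGE job.
proc sgeSuspendCommand { refId } {
    return [list qmod -sj $refId]
}

proc sgeResumeCommand { refId } {
    return [list qmod -usj $refId]
}

proc taskerSuspendJob { jobId refId } {
    # {*} expands the list into separate arguments for exec.
    exec {*}[sgeSuspendCommand $refId]
}

proc taskerResumeJob { jobId refId } {
    exec {*}[sgeResumeCommand $refId]
}
```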
taskerJobEnded
This procedure is called when the tasker receives a notification from the server that a job has ended. This allows the tasker to update its state promptly instead of waiting for another timeout.
taskerCleanup
This procedure is called when the indirect tasker exits. It should be used to clean up any garbage that may have been created by the tasker.