Accelerator Plus Modulation
With Accelerator Plus, jobs are bundled into groups that are run by vovtaskers on hosts allocated by the base scheduler. This makes it harder to rely on job retirement to free up slots in the base scheduler, because a bundle of jobs occupies its slot many times longer than any individual job would.
This section describes a way to free up slots more quickly by preempting, based on FairShare statistics, the vovtaskers that have been assigned to Accelerator Plus. A preempted vovtasker stops accepting new jobs (tasker status "DONE") but still finishes any job that is already running.
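The drain behavior described above can be illustrated with a small toy model. This is only a sketch of the general "stop accepting work, finish what is running" pattern on a Unix system. It is not the vovtasker implementation, and the class name DrainingWorker, the "READY" status, and the job names are hypothetical.

```python
import os
import signal


class DrainingWorker:
    """Toy stand-in for a tasker; not the vovtasker implementation."""

    def __init__(self, jobs):
        self.pending = list(jobs)
        self.completed = []
        self.status = "READY"  # hypothetical name for the normal state
        signal.signal(signal.SIGUSR2, self._on_sigusr2)

    def _on_sigusr2(self, signum, frame):
        # Only record the request; the job in progress is left alone.
        self.status = "DONE"

    def run(self, preempt_during=None):
        # Pull jobs until the queue is empty or a drain was requested.
        while self.pending and self.status != "DONE":
            job = self.pending.pop(0)
            if job == preempt_during:
                # Simulate a preemption signal arriving while the job runs.
                os.kill(os.getpid(), signal.SIGUSR2)
            self.completed.append(job)  # the running job still finishes
        return self.completed


worker = DrainingWorker(["job1", "job2", "job3"])
finished = worker.run(preempt_during="job2")
print(finished, worker.pending, worker.status)
```

In this model the signal arrives while job2 is running, so job2 completes, job3 is never picked up, and the worker ends in the "DONE" state.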
The preemption rule drives the system and is the main place to influence the system's behavior. A sample rule is found in $THISGIT/vovpreempt/config.tcl; append it to any existing preemption rules in XXXX.swd/vovpreemptd/config.tcl.
While the rule can be tuned, some key elements must be retained.
Preemptable jobs should include the predicate JOBNAME~${WXQueueName}, and the method should send SIGUSR2, but only to the vovtasker process: 0:*:EXT,SIGUSR2,vovtasker.
The preemptable job sort predicate is "FS_EXCESS_RUNNING DESC,PRIORITY,AGE DESC", which selects vovtaskers in order of greatest excess FairShare first, then lowest priority, then oldest age.
# Taken from $VOVDIR/etc/config/vovpreemptd/config_wx_modulation.tcl
set WXQueueName wx
VovPreemptRule \
    -pool "WXJobModulation" \
    -rulename "fastFairshare_$WXQueueName" \
    -ruletype "FAST_FAIRSHARE" \
    -method "0:*:EXT,SIGUSR2,vovtasker" \
    -killage 0 \
    -numjobs 10 \
    -maxattempts 1 \
    -waitingfor "HW" \
    -preempting "JOBNAME~${WXQueueName} FS_EXCESS_RUNNING<0" \
    -preemptable "JOBNAME~${WXQueueName} FS_EXCESS_RUNNING>0 FSRANK9>>@FSRANK9@" \
    -resumeres "" \
    -enable 1 \
    -sortjobs "FS_EXCESS_RUNNING DESC,PRIORITY,AGE DESC"
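To make the -sortjobs ordering concrete, the following sketch applies the same three keys to a handful of made-up candidates. It only mimics the ordering in plain Python and does not call any Accelerator API. The Candidate type, the tasker names, and all values are hypothetical, and it assumes that PRIORITY and AGE are simple numeric values.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    fs_excess_running: int  # stands in for FS_EXCESS_RUNNING
    priority: int           # stands in for PRIORITY
    age: int                # stands in for AGE, in seconds (assumed unit)


def preemption_order(candidates):
    # DESC fields are negated so a single ascending sort covers all three keys:
    # greatest excess first, then lowest priority, then oldest age.
    return sorted(
        candidates,
        key=lambda c: (-c.fs_excess_running, c.priority, -c.age),
    )


candidates = [
    Candidate("tasker_a", 3, 5, 120),
    Candidate("tasker_b", 7, 5, 60),
    Candidate("tasker_c", 7, 2, 30),
    Candidate("tasker_d", 7, 2, 300),
]
order = [c.name for c in preemption_order(candidates)]
print(order)  # ['tasker_d', 'tasker_c', 'tasker_b', 'tasker_a']
```

Here tasker_d is chosen first: it ties with tasker_b and tasker_c on excess FairShare, ties with tasker_c on priority, and wins on age.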
Monitoring
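This is a dynamic system with quite a few moving parts, which makes monitoring somewhat challenging. One suggestion is to turn on the PreemptRules debug flag on the server while investigating, and to reset it when finished. The first command below sets the flag and the second resets it.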
% vovsh -x 'vtk_server_config set_debug_flag PreemptRules'
% vovsh -x 'vtk_server_config reset_debug_flag PreemptRules'