Stop Jobs
A job can be stopped while it is running or queued. Stopping a job does not remove ("forget") it from the vovserver database. Only the job owner or the Accelerator administrator can stop a job.
nc stop
nc: Usage Message
NC STOP:
Stop jobs.
1. If the jobs are running, they are killed
(unless you use option -dequeueonly).
2. If the jobs are scheduled in the queue,
they are removed from the queue.
In either case, the jobs remain in the system.
To remove them from the system, use the "forget" command.
Jobs in the system can be rerun with the "rerun" command.
When stopping a single job, the procedure checks for
the properties NC_STOP_SIGNALS and NC_STOP_SIG_DELAY
attached to the job to be stopped.
The list of signals is also controlled
by the environment variables VOV_STOP_SIGNALS and NC_STOP_SIGNALS.
If both NC_STOP_SIGNALS and VOV_STOP_SIGNALS are present in the
environment, the value of VOV_STOP_SIGNALS will be used. Their
functionality is otherwise identical.
The default list of signals is TERM,HUP,INT,KILL and
can be customized with the variable defaultStopSignalCascade in policy.tcl.
USAGE:
% nc stop [OPTIONS] <jobId> ...
OPTIONS:
-after <s> -- Start sending signals after specified seconds.
This is an initial delay, between 0 and 20s.
-allusers -- Stop all jobs (only ADMIN can do it).
-d -- Same as -dequeueonly.
-delay <s> -- Minimum delay between signals (in seconds),
between 0 and 20s. Default is 3.
This can also be set with the property
NC_STOP_SIG_DELAY, or with the environment
variables NC_STOP_SIG_DELAY or
VOV_STOP_SIGNAL_DELAY. If both NC_STOP_SIG_DELAY
and VOV_STOP_SIGNAL_DELAY are present in the
environment, the value of NC_STOP_SIGNAL_DELAY
will be used.
Priority: 1. Option -delay
2. job property NC_STOP_SIG_DELAY
3. env variable VOV_STOP_SIGNAL_DELAY,
NC_STOP_SIG_DELAY
4. default
-dequeueonly -- Just remove jobs from the queue.
All currently running jobs are not affected.
Can be abbreviated to -d.
-dir <directory> -- Stop all jobs in the given directory.
-exclude <PROCLIST> -- List of processes to exclude from receiving the
signal.
-h -- This message
-include <PROCLIST> -- List of processes to receive the signal.
-J <jobname> -- Stop all my jobs with given jobname.
-mine -- Stop all my jobs.
-set <setname> -- Stop all my jobs in the given set.
-sig <SIGLIST> -- Same as -signals.
-signals <SIGLIST> -- Comma separated list of signals to send to the jobs
(default is the sequence TERM,HUP,INT,KILL)
This can also be set with the property NC_STOP_SIGNALS
or with the environment variables NC_STOP_SIGNALS
or VOV_STOP_SIGNALS.
Priority: 1. Option -signals
2. job property NC_STOP_SIGNALS
3. env variable VOV_STOP_SIGNALS,
NC_STOP_SIGNALS
4. default (can be configured as
defaultStopSignalCascade
in policy.tcl)
See also: vovshow -env VOV_STOP_SIGNALS
vovshow -env NC_STOP_SIGNALS
-skiptop <0|1> -- Whether to kill the top process.
This is normally the job wrapper (e.g. vw, vwi).
Default is 0.
-why <reason> -- Give a reason for the stop.
This is stored on the WHYSTATUS
field of the stopped jobs.
EXAMPLES:
% nc stop 00123456
% nc stop -d -mine
% nc stop -after 3 -mine
% nc stop -set Class:hsim
% nc stop -mine -why "Jobs no longer needed"
% nc stop -sig "TERM,KILL" -delay 4 0012345
% env VOV_STOP_SIGNALS=TERM,INT,KILL nc stop 0012345
SEE ALSO:
% vovshow -env VOV_STOP_SIGNALS
% vovshow -env NC_STOP_SIGNALS
% vovshow -env VOV_STOP_SIGNAL_DELAY
% vovshow -env NC_STOP_SIG_DELAY
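The precedence rules for the signal list (option, then job property, then environment, then default) can be modeled in a few lines. The sketch below is hypothetical: `resolve_stop_signals` is not part of Accelerator, it does not call `nc` or read real job properties (the property value is passed in as an argument), and it assumes `defaultStopSignalCascade` in policy.tcl has not been changed from TERM,HUP,INT,KILL.

```shell
#!/usr/bin/env bash
# Hypothetical model of the documented precedence for the "nc stop" signal list:
#   1. -signals/-sig option   2. job property NC_STOP_SIGNALS
#   3. env VOV_STOP_SIGNALS, then env NC_STOP_SIGNALS   4. default
# It does not talk to vovserver; the job property is passed as an argument.
resolve_stop_signals() {
    local option_value="$1"     # value given to -signals/-sig (may be empty)
    local property_value="$2"   # NC_STOP_SIGNALS job property (may be empty)
    local default_cascade="TERM,HUP,INT,KILL"  # assumes unmodified policy.tcl

    if [ -n "$option_value" ]; then
        echo "$option_value"
    elif [ -n "$property_value" ]; then
        echo "$property_value"
    elif [ -n "${VOV_STOP_SIGNALS:-}" ]; then
        # VOV_STOP_SIGNALS wins when both environment variables are set.
        echo "$VOV_STOP_SIGNALS"
    elif [ -n "${NC_STOP_SIGNALS:-}" ]; then
        echo "$NC_STOP_SIGNALS"
    else
        echo "$default_cascade"
    fi
}
```

For example, `VOV_STOP_SIGNALS=TERM,INT,KILL NC_STOP_SIGNALS=HUP resolve_stop_signals "" ""` prints TERM,INT,KILL, mirroring the rule that VOV_STOP_SIGNALS takes priority over NC_STOP_SIGNALS.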
Override Signals to Stop a Job
To override the sequence of signals sent when a job is stopped, set the job properties NC_STOP_SIGNALS (the comma-separated list of signals) and NC_STOP_SIG_DELAY (the delay between signals, in seconds).
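To see what a signal list and a delay mean in practice, here is a minimal local sketch of a stop-signal cascade. It is an illustration under our own assumptions, not the Accelerator implementation: the hypothetical function `stop_cascade` sends each signal in turn to a single PID with the bash `kill` builtin, waits the given delay, and stops early once the process is gone.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a stop-signal cascade in the spirit of
# NC_STOP_SIGNALS and NC_STOP_SIG_DELAY. NOT the Accelerator code:
# it signals one local PID with the bash "kill" builtin.
stop_cascade() {
    local pid="$1"          # process to stop
    local signals="$2"      # comma-separated list, e.g. "TERM,HUP,INT,KILL"
    local delay="${3:-3}"   # seconds to wait after each signal (doc default: 3)
    local -a sig_list
    local sig
    IFS=',' read -r -a sig_list <<< "$signals"
    for sig in "${sig_list[@]}"; do
        # Stop early once the process no longer exists.
        kill -0 "$pid" 2>/dev/null || return 0
        kill -s "$sig" "$pid" 2>/dev/null || true
        sleep "$delay"
    done
    return 0
}
```

For example, `sleep 100 & stop_cascade $! "TERM,KILL" 1` terminates the background `sleep` with TERM, so KILL is never needed. Sending the gentler signals first gives a program a chance to clean up before the final KILL.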
Automatic Stopping Based on Elapsed Time
If a job is submitted with the -autokill option, it is stopped after the specified amount of time has elapsed. The tasker itself performs this check at an interval of about one minute, which can be controlled with the -U option of vovtasker.
% nc run -autokill 30m sleep 1000000
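Because the check is periodic, a job can keep running for up to about one check interval past its -autokill limit. The arithmetic is sketched below. The helper names `to_seconds` and `latest_autokill` are hypothetical, the parser assumes simple s/m/h/d suffixes, and the 60-second default interval is an assumption based on the "about one minute" above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: worst-case time at which a periodic checker
# notices that a job exceeded its elapsed-time limit.
to_seconds() {
    local spec="$1"
    local num="${spec%[smhd]}"   # strip one trailing unit letter, if any
    case "$spec" in
        *s) echo "$num" ;;
        *m) echo $(( num * 60 )) ;;
        *h) echo $(( num * 3600 )) ;;
        *d) echo $(( num * 86400 )) ;;
        *)  echo "$spec" ;;      # bare number: treated as seconds (assumption)
    esac
}

latest_autokill() {
    local limit_s
    limit_s=$(to_seconds "$1")
    local interval_s="${2:-60}"  # assumed check interval in seconds
    # The first check after the limit can come up to one interval later.
    echo $(( limit_s + interval_s ))
}
```

With these assumptions, `latest_autokill 30m` prints 1860: the job from the example above is stopped somewhere between 30 and 31 minutes after it starts.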
Automatic Stopping Based on CPU Time
To stop a job that uses more than a given amount of CPU time, set the variable VOV_LIMIT_cputime in the job's environment. A job that exceeds the limit is killed by UNIX and gets the status "Failed".
% nc run -e "BASE+D(VOV_LIMIT_cputime=10)" vovmemtime 10 100 0
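The underlying UNIX mechanism can be tried locally with the bash builtin `ulimit -t`, which sets the per-process CPU-time limit (RLIMIT_CPU). Unlike -autokill, this limit counts only time spent on the CPU, not wall-clock time. The sketch assumes, without confirmation from this page, that VOV_LIMIT_cputime is applied as this kind of limit. The function `run_with_cpu_limit` is hypothetical and runs the command directly, not through nc.

```shell
#!/usr/bin/env bash
# Illustration only: run a command under a UNIX CPU-time limit (RLIMIT_CPU).
# That VOV_LIMIT_cputime maps to this kind of limit is an assumption.
run_with_cpu_limit() {
    local seconds="$1"
    shift
    # Subshell: the limits apply only to the command, not to the caller.
    (
        ulimit -c 0            # avoid writing a core file when the signal arrives
        ulimit -t "$seconds"   # CPU seconds before the kernel signals the process
        exec "$@"
    )
}
```

Running `run_with_cpu_limit 1 bash -c 'while :; do :; done'` ends after about one second of CPU time with an exit status above 128. The kernel kills the process with a signal (SIGXCPU or SIGKILL, depending on the platform and on the soft and hard limits).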