2019.01 Update 5 Release
New Features and Enhancements
The following new features and enhancements were introduced in this software release:
Product | Internal Number | Case Number | Description |
---|---|---|---|
All | VOV-11299 | 25102 | Fixed an issue that caused a "Server is operating on a non-internal object" error to be printed in the server log. This error is linked to querying for the "why" status of a job that has an input dependency. |
All | VOV-11350 | | Fixed statistics for some hierarchical sets. |
Accelerator | VOV-7535 | 20833 | The proc VovGetRevokeDelay {} can now be added or redefined in vovresourced/config.tcl under the SWD directory to customize the revoke delay used in vovreconciled. This allows the revoke delay from job classes to override the default value of RESD(revokeDelay). The proc definition has been added to the Altair Accelerator Administrator Guide. In addition, the verbosity levels of various messages have been modified per customer requests. |
Accelerator | VOV-3765 | 20136 | The nc run command now supports an option to control the number of times a job can be rescheduled. |
Accelerator Plus | VOV-11260 | 24568, 24890, 25001, 25249 | Added the policy parameter fairshare.overshoot.damping (1 = enabled, 0 = disabled), which controls whether FairShare restricts the number of jobs scheduled for groups that are over budget. |
Accelerator Plus | VOV-11337 | | Added an accounts option (-A) to the PBS Pro resource list for Accelerator Plus. |
Accelerator Plus, FlowTracer | VOV-11188 | | A configuration value has been added to the Accelerator Plus configuration file, SWD/vovwxd/config.tcl, that lets the user limit how many consecutive failures of a slave job in the base queue are allowed before no further attempts are made to create slaves for a bucket. The default value is 0 (no limit). This prevents a malformed job from causing churn in the system. |
Allocator | VOV-9411 | 23688 | Added support for hierarchical Altair Allocators. Please see the documentation for details. Resource plots in the child LA are identical to the plots in the top-most parent LA. In other words, they show the data for the entire resource pool, and not the subset of the resource corresponding to the child LA. The slave definition in the child LA (in the <swd>/slaves.tcl file) should use the hostname of the slave (for example, 'jaguar'), and not localhost. This hostname must match the hostname used when adding the child LA to the parent LA (in <swd>/vovlad/config.tcl). |
FlowTracer | VOV-11462 | | This update brings feature parity with vovlsfd. For example, LSFjobname can now be overridden on a per-job basis. Bucket reservations, rather than resource strings, are now used to map jobs to batch-submitted vovslaves. Code to address jobs that have an xdur greater than maxlife has moved into the vovslave itself. |
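
The consecutive-failure limit described for VOV-11188 can be pictured with a small model. This is only an illustrative sketch, not the vovwxd implementation: the class name `BucketLaunchGate`, its methods, and the assumption that a successful slave job resets the failure streak are all hypothetical. Only the default of 0 meaning "no limit" comes from the note above.

```python
class BucketLaunchGate:
    """Hypothetical model of a per-bucket consecutive-failure limit."""

    def __init__(self, max_consecutive_failures=0):
        # 0 mirrors the documented default: no limit is enforced.
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0

    def record_result(self, succeeded):
        # Assumption: a success resets the streak, a failure extends it.
        if succeeded:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1

    def may_launch(self):
        # With no limit configured, launching is always allowed.
        if self.max_consecutive_failures == 0:
            return True
        return self.consecutive_failures < self.max_consecutive_failures
```

With a limit of 2 in this model, two failures in a row stop further launches for the bucket, while the default of 0 never blocks them.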
Resolved Issues
The following issues were resolved in this release.
Product | Internal Number | Case Number | Description |
---|---|---|---|
All | VOV-10110 | | vovwxd cleaner log files will be preserved for the time specified by the delCleanerLog,older config parameter in vovwxd/config.tcl. |
All | VOV-11326 | | Slave slot licenses will be released when a slave exits in Auto Licensing mode. |
All | VOV-11294 | | The /local/registry/system-accelerator folder was not always writable because it was created with the user's umask permissions. It is now created with permissions 777. |
All | VOV-11350 | | Fixed statistics for some hierarchical sets. |
Accelerator | VOV-8012 | 21662 | VOV_LM_VARNAMES functionality will now be available for interactive jobs. |
Accelerator | VOV-11347 | 25138 | Fixed an issue in vovfsgroup loadconfig where the weight and window values of the FairShare group were not getting set to the values in the config file. |
Accelerator | VOV-11242 | 25036 | Broken HTML links in the Altair Accelerator Training Guide have been fixed. |
Accelerator | VOV-10926 | 24819 | Fixed issue with interactive jobs (nc -I) failing when run with a PRECMD that reschedules the job. This also fixes the issue of the PTY overriding the exit code from the PRECMD. |
Accelerator | VOV-11307 | 25103 | Fixed a race condition in the job fostering system, which is used to properly account for jobs running on a host that has had its vovslave restarted, that could cause the foster jobs to fail and the restarted vovslave to refuse any future stop requests. |
Accelerator | VOV-11305 | 25109 | A new autokill sequence is now prevented from starting while an existing sequence is still being processed. Prior to this change, an autokill sequence that took more than 5 minutes to process would result in a new sequence starting before the existing one completed. This would result in the slave entering a loop that might never end. |
Accelerator | VOV-11296 | 25093 | A change to the command nc hosts that displayed all reservations a slave may have was backed out due to adverse performance effects on the server. The command nc hosts will now show only the "dominant" reservation, i.e. the oldest unexpired reservation. |
Accelerator, Accelerator Plus | VOV-11338 | 25079 | Fixed issues with job resource usage reporting by including detached processes with unique gpids and session ids by matching VOV_JOBID and VOV_SLAVE_PID. The VOV_JOBID to be matched will be taken from the transaction object rather than depending on the subslave environment. Also added NC_JOBID and NC_SLAVE_PID env variables so that WX and NC slaves can both correctly track processes. |
Accelerator | VOV-11358 | | Fixed an issue where running Altair Accelerator in interactive mode (nc run -I) with both the input and output redirected would result in lost output. |
Accelerator | VOV-11336 | CS0120656, CS0120663 | Fixed issue where preempted jobs may have been prematurely resumed preventing the preempting job from running. |
Accelerator | VOV-11180 | | Invalid values are now handled for the resources "RAM", "CORES", and "SLOTS". Numeric values in the range [0 - 2147483647] are allowed. |
Accelerator | VOV-11160 | | The output of nc getfield JOB CPUTIME with an uppercase "CPUTIME" will now accurately show time in milliseconds instead of 0. |
Accelerator | VOV-11210 | | Changed the 'cputime' type from integer to integer64 in the vovshow -fields command output. |
Accelerator | VOV-11222 | | Underscores have been removed from the Node Field Names help topic to reflect the updated behavior. |
Accelerator | VOV-11799 | CS0120663 | Fixed issue where preempted jobs may have been prematurely resumed preventing the preempting job from running. |
Accelerator | VOV-11828 | CS0120715 | The output of "nc getfield JOB CPUTIME" with an uppercase "CPUTIME" will now accurately show time in milliseconds instead of 0. |
Accelerator Plus | VOV-11276 | 24834, 25080 | Fixed an issue with array submission in WX that would lead to "Illegal set id" errors. This also fixes an issue that resulted in log file conflicts with the error messages "Error: OnLaunchError for <queue>,time: <timestamp>, err: Launcher job failed:" and "FATAL ERROR: Cannot use FILEX <log_filename>" |
Accelerator Plus | VOV-11234 | | Fixed an issue with core file generation on the signals SIGSEGV and SIGBUS. |
Accelerator Plus | VOV-11115 | | Internal optimization of the WX slave creation process. |
Accelerator Plus | VOV-11191 | | vovwxd will no longer create extraneous slave objects and/or processes when launching slaves using the vovlsf.tcl driver. |
Accelerator Plus, FlowTracer | VOV-11646 | | vovwxd should no longer attempt to provision extra slaves when the number of pending slaves is sufficient to handle the currently queued load. |
Accelerator Plus | VOV-11677 | | Fixed an issue that prevented vovwxd from launching more slaves after the limit was increased in the SWD/vovwxd/config.tcl file. A vovwxd daemon restart is no longer required for the new limit to take effect. |
Accelerator Plus | VOV-11645 | | Fixed an issue that caused the PBS_JOBID environment variable to be modified to contain only the numeric part of the job ID. |
Accelerator Plus | VOV-11630 | | Modified the PBS driver script to use the -V submission option for launcher jobs to ensure that all environment variables required for slave operation are set in the slave's environment. Also added a new configuration item, CONFIG(pbsBin), in the vovwxd configuration file that can be used to specify the location of the PBS binaries (default: /opt/pbs/bin). |
Allocator | VOV-11306 | 25123 | Fixed a crash that was introduced in 2019.01 u4. |
FlowTracer | VOV-10095 | | vovslaves running under WX or FT with vovwxd will have the environment variable VOV_SLAVE_NAME set to the name of the FT slave spawned by vovwxd. The VOVSLAVE environment variable will no longer be set. CONFIG(slave,timeout) will be passed to the vovslaves launched by vovwxd as the -t option, setting the time allowed for the new vovslave to connect to vovserver. |
FlowTracer | VOV-11259 | | Added support for Force Validation of 'PHANTOM' files. |
FlowTracer | VOV-11187 | | A prescripts subdirectory can now be placed inside the vovwxd/launcher directory, where it is immune from periodic cleanup by the vovwxd stale file cleaner. The LSF:pre special resource now uses this directory as the base directory for a specified prescript. |
FlowTracer | VOV-1171 | | Fixed a problem that prevented vovwxd from launching additional slaves as expected when more jobs are added to a bucket that has active jobs. |
Monitor | VOV-11366 | | Fixed an issue with the Detailed Plots report that resulted in a Tcl error when generating a report for a feature with no usage for the specified time range. Also fixed an issue with the Usage Trends report that resulted in a Tcl error when generating a report for a feature with a capacity of 1000 or more tokens. |
Monitor | VOV-11467 | | Fixed an issue with all SFD packages for Windows where, in some Windows configurations, the controls for installing and controlling a Windows Service were disabled because administrative rights were not detected properly. |
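
The range check described for VOV-11180 (values for RAM, CORES, and SLOTS must be numeric and within [0 - 2147483647]) can be sketched as follows. This is a minimal illustration, not the product's validation code: the function name `is_valid_resource_value` is hypothetical, and treating only plain ASCII digit strings as "numeric" is an assumption, since the note does not say how signs, decimals, or unit suffixes are handled.

```python
MAX_RESOURCE_VALUE = 2147483647  # upper bound quoted in the VOV-11180 note


def is_valid_resource_value(text):
    """Return True if text is a whole number in [0, 2147483647].

    Assumption: only plain ASCII digits count as numeric here.
    """
    text = text.strip()
    # Rejects empty strings, signs, decimals, and non-ASCII digits.
    if not (text.isascii() and text.isdigit()):
        return False
    return int(text) <= MAX_RESOURCE_VALUE
```

The upper bound is the largest signed 32-bit integer, so this model rejects 2147483648 as out of range.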