Client Limitation and Tuning

The maximum number of clients (vovtaskers, user interfaces, and proxies combined) that can be concurrently connected to a vovserver is limited by the number of file descriptors available.

This is an operating system parameter, inherited from the shell that starts vovserver. It cannot be changed after vovserver starts.

There are two kinds of limits, a hard limit and a soft limit. Limits are imposed to reduce the likelihood of exhausting system resources. A soft limit may be set by shell commands so long as a value less than or equal to the hard limit is selected. The hard limits for descriptors and other resources may vary by user, group, and other attributes.
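As a rough illustration of the soft/hard relationship, the sketch below reads both descriptor limits of the current process with Python's standard resource module. The helper names descriptor_limits and clamp_soft_limit are made up for this example and are not part of Altair Accelerator.

```python
import resource


def descriptor_limits():
    """Return (soft, hard) file descriptor limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def clamp_soft_limit(requested, hard):
    """Return the soft limit that a request would yield under the given hard limit."""
    if hard == resource.RLIM_INFINITY:  # no hard cap reported
        return requested
    return min(requested, hard)


if __name__ == "__main__":
    soft, hard = descriptor_limits()
    print("soft limit:", soft)
    print("hard limit:", hard)
    # 16000 is only an example request.
    print("a request for 16000 would be capped at:", clamp_soft_limit(16000, hard))
```

The values reported are those of the process running the script, which inherits them from its parent shell in the same way vovserver does.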

On UNIX, this number is operating system dependent. In most UNIX installations, the hard limit for file descriptors is 1024 or more.

On Windows NT, VOV sets the limit at 256 file descriptors.

On Linux, root can change the limits in the file /etc/security/limits.conf. Example:
* hard nofile 8192
* soft nofile 2048

The above example sets the soft limit for all users to 2048, and the hard limit to 8192. The '*' character could be replaced by the name of the Accelerator owner account, e.g. 'cadmgr'.

Background

Each operating system offers a limited number of file descriptors for each process. In modern systems, this limit may be up to 65000. The vovserver can handle as many clients as the "descriptors limit" allows. It is also possible to reduce the number by setting a soft limit using the methods described above.

To allow a large number of clients, vovserver must be started with a high limit. The ncmgr command reports the limit at startup time; read it carefully before replying 'yes'. Example:
% limit descriptors 16000
% ncmgr start 

The file descriptors are used by the vovserver to communicate with the clients: vovtaskers, GUI, browser, interactive jobs, etc.

The utilization of file descriptors is approximately:
  • The server by itself needs about 10
  • The remaining descriptors numbered below 40 are not used
  • Each vovtasker needs 1
  • Each running batch job needs 1
  • Each running interactive job needs 2
  • Each vovconsole needs 2
  • Each nc monitor needs 1
Example: On a loaded farm with 500 vovtaskers, each with four CPUs, and with half of the jobs interactive, the estimated number of descriptors needed is:
10 + 500 + 500 * 2 + 500 * 2 * 2 = 3510 file descriptors

With a descriptor limit of 4096, this leaves about 580 descriptors for monitors and GUIs.
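The arithmetic above can be sketched as a small calculator. This is only an illustration built from the approximate per-client costs listed above. The function name estimate_descriptors, the assumption of one running job per CPU, and the 4096 limit used for the headroom figure are all assumptions of this example.

```python
# Approximate per-client descriptor costs, taken from the list above.
SERVER_BASE = 10
PER_TASKER = 1
PER_BATCH_JOB = 1
PER_INTERACTIVE_JOB = 2
PER_VOVCONSOLE = 2
PER_MONITOR = 1


def estimate_descriptors(taskers, batch_jobs, interactive_jobs,
                         vovconsoles=0, monitors=0):
    """Estimate how many descriptors the vovserver needs for a given load."""
    return (SERVER_BASE
            + taskers * PER_TASKER
            + batch_jobs * PER_BATCH_JOB
            + interactive_jobs * PER_INTERACTIVE_JOB
            + vovconsoles * PER_VOVCONSOLE
            + monitors * PER_MONITOR)


if __name__ == "__main__":
    taskers = 500
    running_jobs = taskers * 4          # assumed: one running job per CPU
    interactive = running_jobs // 2     # half of the jobs are interactive
    batch = running_jobs - interactive
    used = estimate_descriptors(taskers, batch, interactive)
    print("estimated descriptors:", used)   # 3510
    limit = 4096                        # hypothetical limit for illustration
    print("headroom:", limit - used)    # 586
```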

Behavior with Exhausted File Descriptors

The exhaustion of file descriptors rarely occurs. When it does, Altair Accelerator's main concern is preserving the integrity of the vovserver. Commands that attempt a new connection to the vovserver fail to connect and return the error message "too many clients in the system". The vovserver then posts an alert reporting "too many open files". In that condition, ordinary commands such as nc hosts and nc list do not work because the vovsh that runs those commands cannot connect to vovserver.

Reserved Connection on the localhost

When file descriptors are exhausted, you can connect to vovserver using a special method through the software loopback interface (lo0, 127.0.0.1). This is achieved by setting VOV_HOST_NAME to localhost. Example:
% vovproject enable vncNNNN
% setenv VOV_HOST_NAME localhost
% vsi

Solutions to File Descriptor Exhaustion

A short-term solution is to stop some of the clients to lower the demand for file descriptors. Transient GUI clients such as VovConsole, monitors, and Accelerator GUI should be stopped first. Any idle vovtaskers should also be stopped. Guidelines:
  • Check how users are submitting jobs. The limits maxNormalClients and maxNotifyClients in the policy.tcl file help prevent accidental or malicious denial-of-service attacks. Sometimes we have seen jobs submitted with the -wl option and placed in the background, each consuming 2 descriptors.
  • Next, find out whether it is possible to raise the descriptor limit on the current host. If not, a longer-term solution is to move the vovserver to another host that offers more file descriptors. Newer versions of UNIX are good candidates, as they offer 65K descriptors.

It is possible to continue operation, even in the presence of interactive jobs, by moving the vovserver and making the new queue the default queue so that newly-submitted jobs go to the new default queue. The server on the host with limited descriptors will finish all jobs, and it may then be shut down.

Client Service Modes

On Linux-based systems, there are two client servicing modes from which to choose: poll (default) and epoll. The mode chosen specifies which mechanism (the POSIX poll call or the Linux-specific epoll facility) the vovserver uses to determine which client file descriptors are ready for use. The mode can be specified by setting config(useepoll) to 0 (poll, default) or 1 (epoll) in the SWD/policy.tcl file.

Generally, the epoll mode should result in more efficient processing of service requests. As of this version, epoll mode is a new feature and is therefore disabled by default.
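To make the difference between the two mechanisms concrete, here is a minimal Python sketch using the standard select module. It is not vovserver code and says nothing about how vovserver is implemented. Connected socket pairs stand in for clients, and the function names are invented for this example. With poll, the whole descriptor set is handed to the kernel on every call. With epoll, descriptors are registered with a kernel object, and each wait returns only the ready ones.

```python
import select
import socket


def ready_with_poll(socks):
    """Return the sorted descriptors that poll() reports as readable."""
    poller = select.poll()
    for s in socks:
        poller.register(s.fileno(), select.POLLIN)
    # Timeout 0: report the current state without blocking.
    return sorted(fd for fd, events in poller.poll(0) if events & select.POLLIN)


def ready_with_epoll(socks):
    """Return the sorted descriptors that epoll reports as readable (Linux only)."""
    ep = select.epoll()
    try:
        for s in socks:
            ep.register(s.fileno(), select.EPOLLIN)
        return sorted(fd for fd, events in ep.poll(0) if events & select.EPOLLIN)
    finally:
        ep.close()


if __name__ == "__main__":
    # Three connected socket pairs stand in for three clients.
    pairs = [socket.socketpair() for _ in range(3)]
    server_side = [a for a, _ in pairs]
    pairs[1][1].send(b"request")  # only the second client has sent data
    print("poll and epoll agree:",
          ready_with_poll(server_side) == ready_with_epoll(server_side))
    for a, b in pairs:
        a.close()
        b.close()
```

For brevity the sketch registers the descriptors on every call in both functions. A long-running server would normally keep the epoll object alive across waits, which is where the efficiency gain with many mostly idle clients comes from.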