Control FairShare Tree Access

By default, the FairShare tree is extendable: new nodes can be added with nothing more than the default user permissions. This lets users add their own nodes, including nodes that are created implicitly at submission time because the job names them on the command line.

For example, here is a minimal FairShare tree:
mac12 vncaux@mac12.local DEFAULT vnc/vncaux.swd > vovfsgroup show
ID        GROUP                                           OWNER WEIGHT   WINDOW RUNNING   QUEUED
000000016 /                                            (server)      0    1h00m       0        0
000001006 /system                                        taylor    100    1m00s       0        0
000001004 /time                                          taylor    100    1h00m       0        0
000001005 /time/users                                    taylor     10    2h00m       0        0
000108049 /time/users.pistol                             pistol    100    2h00m       0        0

When users submit jobs, the jobs are attached to the /time/users branch by default. In this case, the user pistol recently submitted a job; a node was created for that user and the job was attached to it. This model lets users run jobs without special settings and requires minimal setup work from the administrator. By default, each job will get an equal share of the /time/users node.

However, a user can also run a job outside the /time/users node by using the -g option to the nc run command. In the following example, a job is running on the /garterinn node:
[localhost:~] falstaff% nc run -g /garterinn sleep 500
Fairshare= /garterinn.falstaff
Resources= macosx
Env      = SNAPSHOT(vnc_logs/snapshots/falstaff/macosx/env55747.env)
Command  = vw sleep 500
Logfile  = vnc_logs/20131015/161821.9853
JobURL   = http://mac12:6349/cgi/node.cgi?id=000108058
JobId    = 000108058
The FairShare tree summary now looks like this:
mac12 vncaux@mac12.local DEFAULT vnc/vncaux.swd > vovfsgroup show 
ID        GROUP                                           OWNER WEIGHT   WINDOW RUNNING   QUEUED
000000016 /                                            (server)      0    1h00m       1        0
000108056 /garterinn                                   falstaff    100    1h00m       1        0
000108057 /garterinn.falstaff                          falstaff    100    1h00m       1        0
000001006 /system                                        taylor    100    1m00s       0        0
000001004 /time                                          taylor    100    1h00m       0        0
000001005 /time/users                                    taylor     10    2h00m       0        0
The output above shows that /garterinn has been added, which allows the job to run. The weight of the new node is set to the default value of 100. The total resources at the top level are now divided three ways (garterinn, system, time), each with an equal weight of 100. With this setup, falstaff has a FairShare allocation of one third of the compute resources. In other words, with unrestricted access, arbitrary users can easily tie up an inordinate proportion of the total resources.
Note: For this reason, it is often desirable to restrict access to the top level.
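The one-third figure follows from treating a node's share as its weight divided by the sum of the weights of all nodes at the same level. The short Python sketch below works through that arithmetic. The proportional rule is inferred from the example above, the function name sibling_shares is hypothetical, and the scheduler may weigh other factors (such as usage history over the WINDOW) that are not modeled here.

```python
from fractions import Fraction


def sibling_shares(weights):
    """Return each node's fraction of the parent's allocation.

    Assumes a share is simply weight / sum(sibling weights).
    """
    total = sum(weights.values())
    if total == 0:
        return {name: Fraction(0) for name in weights}
    return {name: Fraction(w, total) for name, w in weights.items()}


# Top-level nodes and weights taken from the vovfsgroup show output above.
top_level = {"/garterinn": 100, "/system": 100, "/time": 100}
shares = sibling_shares(top_level)
print(shares["/garterinn"])  # 1/3
```

Every additional top-level node created with the default weight of 100 shrinks the share of the existing nodes in the same way.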
The example below shows the default permissions for the top level:
mac12 vncaux@mac12.local DEFAULT vnc/vncaux.swd > vovfsgroup show /
Id:       000000016
FullName: /
Owner:    (server)
        ACL  1: OWNER      ""   ATTACH DETACH EDIT VIEW FORGET DELEGATE EXISTS
        ACL  2: ADMIN      ""   ATTACH DETACH EDIT VIEW FORGET
        ACL  3: EVERYBODY  ""   ATTACH DETACH VIEW

The key entry is the third one, ACL 3. ATTACH indicates that EVERYBODY can attach nodes to the root location, and a job needs ATTACH permission in order to run. It would be desirable to reduce the EVERYBODY ACL to VIEW only. However, there is no direct mechanism to remove individual permissions at this level of granularity. Instead, the solution is to reset the node to zero ACLs and then add back the ones that are needed.
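To audit a node for this condition, the ACL lines can be checked programmatically. The Python sketch below parses text in the format printed by vovfsgroup show / and reports whether EVERYBODY holds ATTACH. The function names and the regular expression are hypothetical, and the line layout is assumed to match the sample above exactly.

```python
import re

# Sample text copied from the vovfsgroup show / output above.
SHOW_ROOT = '''\
Id:       000000016
FullName: /
Owner:    (server)
        ACL  1: OWNER      ""   ATTACH DETACH EDIT VIEW FORGET DELEGATE EXISTS
        ACL  2: ADMIN      ""   ATTACH DETACH EDIT VIEW FORGET
        ACL  3: EVERYBODY  ""   ATTACH DETACH VIEW
'''

# Assumed layout: ACL <n>: <AGENT> "<name>" <PERMISSION> ...
ACL_LINE = re.compile(r'^\s*ACL\s+\d+:\s+(\S+)\s+"([^"]*)"\s+(.*)$')


def parse_acls(text):
    """Map (agent, name) to the set of permissions listed for it."""
    acls = {}
    for line in text.splitlines():
        match = ACL_LINE.match(line)
        if match:
            agent, name, perms = match.groups()
            acls[(agent, name)] = set(perms.split())
    return acls


def everybody_can_attach(text):
    """True if the EVERYBODY entry includes ATTACH."""
    return "ATTACH" in parse_acls(text).get(("EVERYBODY", ""), set())


print(everybody_can_attach(SHOW_ROOT))  # True
```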

Zero ACL

Changing to a zero ACL state on the root node '/' is problematic because it would remove your own permission to edit the ACL. The workaround is to acquire the SERVER role. SERVER is a superuser mode that ignores the restrictions implied by the ACLs. Besides resolving the issue of zero ACLs on the root node, the SERVER role also allows correcting other lock-out scenarios that may occur due to administration errors.
Important: In ACL terms, the SERVER role is the highest level of access, and it is valuable as a last-resort back door.

Accessing the SERVER role requires an active login shell on the same host as the Accelerator server process (for example, an ssh session on the Accelerator server host). Additionally, Accelerator must be accessed through the loopback interface 127.0.0.1. To do so, set VOV_HOST_NAME=localhost.

Once in the SERVER role, the permissions of the root node '/' can be fixed. This is done in three steps:
  1. Set the OWNER ACL (SET replaces all existing ACLs)
  2. Append the ADMIN ACL
  3. Append the EVERYBODY ACL
Note: Working on the top-level node '/' has a side effect: the ACL change is applied recursively to all nodes, which then need to be fixed.
Here is a transcript of the top-level change:
[cadmgr@rtda01 ~]$ vovfsgroup acl / SET OWNER ALL
[cadmgr@rtda01 ~]$ vovfsgroup show /
Id:       000000016
FullName: /
Owner:    (server)
        ACL  1: OWNER      ""   ATTACH DETACH EDIT VIEW FORGET DELEGATE EXISTS
[cadmgr@rtda01 ~]$ vovfsgroup acl / APPEND ADMIN ALL
[cadmgr@rtda01 ~]$ vovfsgroup show /
Id:       000000016
FullName: /
Owner:    (server)
        ACL  1: OWNER      ""   ATTACH DETACH EDIT VIEW FORGET DELEGATE EXISTS
        ACL  2: ADMIN      ""   ATTACH DETACH EDIT VIEW FORGET DELEGATE EXISTS
[cadmgr@rtda01 ~]$ vovfsgroup acl / APPEND EVERYBODY VIEW
[cadmgr@rtda01 ~]$ vovfsgroup show /
Id:       000000016
FullName: /
Owner:    (server)
        ACL  1: OWNER      ""   ATTACH DETACH EDIT VIEW FORGET DELEGATE EXISTS
        ACL  2: ADMIN      ""   ATTACH DETACH EDIT VIEW FORGET DELEGATE EXISTS
        ACL  3: EVERYBODY  ""   VIEW
With the above setup, the user falstaff can no longer create a new top-level node such as /boarshead. The following message is what to expect when a user tries to submit a job to a nonexistent FairShare node whose parent node has been locked down:
[localhost:~] falstaff% nc run -g /boarshead sleep 500
vnc 10/17/2013 12:51:38: Error: Problem joining fairshare group /boarshead
	Please check with administrator to see that you have 
	permissions to join the group.
vnc 10/17/2013 12:51:38: FATAL ERROR: Failed to submit batch job.
Since actions on / are currently recursive, it may be necessary to relax the restriction on the default group /time/users:
rtda01 vnc@rtda01 [BASE] 854 > vovfsgroup acl /time/users SET EVERYBODY "ATTACH VIEW"
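For repeatability, the lockdown sequence can be kept in a small script. The Python sketch below collects the four vovfsgroup commands shown in the transcripts above and, by default, only prints them. The function name lock_down_root and the dry_run parameter are hypothetical; the sketch also assumes it runs in a shell on the server host where vovfsgroup is on the PATH and the SERVER role is obtainable.

```python
import os
import shlex
import subprocess

# Commands copied from the transcripts above, in the order they were run.
LOCKDOWN_COMMANDS = [
    ["vovfsgroup", "acl", "/", "SET", "OWNER", "ALL"],
    ["vovfsgroup", "acl", "/", "APPEND", "ADMIN", "ALL"],
    ["vovfsgroup", "acl", "/", "APPEND", "EVERYBODY", "VIEW"],
    # Relax the default group again, because the change on / is recursive.
    ["vovfsgroup", "acl", "/time/users", "SET", "EVERYBODY", "ATTACH VIEW"],
]


def lock_down_root(dry_run=True):
    """Print the lockdown commands, or run them when dry_run is False."""
    # The SERVER role requires access through the loopback interface.
    env = dict(os.environ, VOV_HOST_NAME="localhost")
    for argv in LOCKDOWN_COMMANDS:
        if dry_run:
            print(" ".join(shlex.quote(arg) for arg in argv))
        else:
            # check=True stops at the first command that fails.
            subprocess.run(argv, env=env, check=True)
    return LOCKDOWN_COMMANDS


if __name__ == "__main__":
    lock_down_root(dry_run=True)
```

Reviewing the dry-run output before running the commands for real is advisable, since a mistake on the root node affects the whole tree.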

Clean Up After an Escape

Although the lockdown described above prevents further unpermitted attaches, deleting a node that has already been created, such as /garterinn, is problematic. A node can be deleted only when it is empty, so all of its jobs, including valid jobs, must first be forgotten. When this is not possible, the alternative is to reduce the weight of the offending node, as shown below:
mac12 vncaux@mac12.local DEFAULT vnc/vncaux.swd > vovfsgroup modrec /garterinn weight 1
mac12 vncaux@mac12.local DEFAULT vnc/vncaux.swd > vovfsgroup show
ID        GROUP                                           OWNER WEIGHT   WINDOW RUNNING   QUEUED
000000016 /                                            (server)      0    1h00m       0        0
000108056 /garterinn                                   falstaff      1    1h00m       0        0
000108057 /garterinn.falstaff                          falstaff      1    1h00m       0        0
000001006 /system                                        taylor    100    1m00s       0        0
000001004 /time                                          taylor    100    1h00m       0        0
000001005 /time/users                                    taylor     10    2h00m       0        0
000128088 /time/users.taylor                             taylor    100    2h00m       0        0
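The effect of the weight change can be estimated from the listing itself. The Python sketch below extracts the top-level groups from the vovfsgroup show output above and computes each group's proportion of the total weight. It assumes that a share is proportional to weight among siblings, that every row has the seven whitespace-separated columns shown, and that a name such as /garterinn.falstaff denotes a per-user node rather than a top-level group; the function name top_level_weights is hypothetical.

```python
# Listing copied from the vovfsgroup show output above.
SHOW_OUTPUT = """\
ID        GROUP                                           OWNER WEIGHT   WINDOW RUNNING   QUEUED
000000016 /                                            (server)      0    1h00m       0        0
000108056 /garterinn                                   falstaff      1    1h00m       0        0
000108057 /garterinn.falstaff                          falstaff      1    1h00m       0        0
000001006 /system                                        taylor    100    1m00s       0        0
000001004 /time                                          taylor    100    1h00m       0        0
000001005 /time/users                                    taylor     10    2h00m       0        0
000128088 /time/users.taylor                             taylor    100    2h00m       0        0
"""


def top_level_weights(text):
    """Return {group: weight} for groups directly below the root."""
    weights = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 7:
            continue
        group, weight = fields[1], int(fields[3])
        # Top level: exactly one slash, not the root, no ".user" suffix.
        if group != "/" and group.count("/") == 1 and "." not in group:
            weights[group] = weight
    return weights


weights = top_level_weights(SHOW_OUTPUT)
total = sum(weights.values())
for group, weight in sorted(weights.items()):
    print(f"{group:12s} weight={weight:3d} share={weight / total:.1%}")
```

Under these assumptions, the share of /garterinn drops from one third to 1/201, or roughly 0.5 percent, while /system and /time each return to just under half.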