RedHawk by Apache/Ansys
- If you are running a version of Altair Accelerator before 2014.03, you need to apply the "redhawk_scripts" patch. Request the patch from https://www.pbsworks.com/ContactSupport.aspx
- You need to create an environment called REDHAWK. Use the file
$VOVDIR/eda/Ansys/redhawk/REDHAWK.start.sh as a template.
% cp $VOVDIR/eda/Ansys/redhawk/REDHAWK* $VOVDIR/local/environments/.
% vi $VOVDIR/local/environments/REDHAWK.start.sh
- Test the environment with this sequence:
% ves BASE+REDHAWK
% which redhawk ## Do you have redhawk in the path?
% which nc_redhawk ## Do you have nc_redhawk?
% lmstat -f redhawk ## Is LM_LICENSE_FILE correct?
- You may need to identify the taskers in your farm that
can run RedHawk. If every tasker can run RedHawk, you may skip this
step. To identify the taskers, edit the taskerClass.table file and add the resource
"hasRedhawk" to the selected taskers.
lnx0021: hasRedhawk
lnx0022: hasRedhawk
lnx0023:
lnx0024: hasRedhawk
- You need an nc.cfg configuration file for the
-dmp option of redhawk.
GRID_TYPE RTDA
## This number must match the number in option -dp in nc run.
NUMBER_OF_JOBS 4
## This assumes you have a jobclass called "redhawk"
QUEUE_NAME redhawk
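The two setup rules above (NUMBER_OF_JOBS must equal the -dp value, and only selected taskers carry "hasRedhawk") can be checked before submitting a job. The following is a minimal sketch in POSIX sh, not part of Altair Accelerator: the function names are hypothetical, and it assumes that nc.cfg and taskerClass.table use exactly the simple line layouts shown in this section. Real files may contain other syntax that this sketch does not handle.

```shell
#!/bin/sh
# Hypothetical helpers; they only read text files and call no vendor tool.

# Print the value of NUMBER_OF_JOBS from an nc.cfg-style file.
# Assumes "KEYWORD value" lines; anything after the value is ignored.
cfg_number_of_jobs() {
    awk '$1 == "NUMBER_OF_JOBS" { print $2; exit }' "$1"
}

# Succeed only if the -dp value ($1) equals NUMBER_OF_JOBS in the file ($2).
check_dp_matches_cfg() {
    dp=$1
    cfg=$2
    cfg_jobs=$(cfg_number_of_jobs "$cfg")
    if [ "$cfg_jobs" = "$dp" ]; then
        echo "OK: -dp $dp matches NUMBER_OF_JOBS in $cfg"
    else
        echo "MISMATCH: -dp $dp but NUMBER_OF_JOBS is '${cfg_jobs:-unset}' in $cfg" >&2
        return 1
    fi
}

# List the taskers tagged with hasRedhawk.
# Assumes "name: resource ..." lines as in the excerpt above.
redhawk_taskers() {
    awk -F: '$2 ~ /(^|[ \t])hasRedhawk([ \t]|$)/ { print $1 }' "$1"
}
```

After sourcing these definitions in a Bourne-compatible shell, `check_dp_matches_cfg 4 nc.cfg` returns a non-zero status on a mismatch, and `redhawk_taskers taskerClass.table` prints one tasker name per line.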
Running Without License Management
To get started and understand how distributed processing works, let's run without worrying about licenses.
% ves BASE+REDHAWK
% setenv DISPLAY "good_name_for_DISPLAY:XX"
% nc run -e SNAPSHOT+D,DISPLAY=$DISPLAY -profile -preemptable 0 \
-dp 4 -dpres hasRedhawk -dpwait 3m \
nc_redhawk \
-lmwait -dmp nc.cfg -f run.tcl
Explanation of options
- -e SNAPSHOT+D,DISPLAY=$DISPLAY
- Take the current environment, including the DISPLAY variable. This may not be necessary if running in batch mode (option -b of redhawk).
- -profile
- Track RAM and CPU usage of each component of the job. In the case of redhawk, you need version 2014.03 or later to see the usage, because of the way the redhawk processes detach themselves from their parents.
- -preemptable 0
- In general, you do not want to preempt jobs that are as complex as these.
- -dp 4
- Request 4 processes: one becomes the "master" and the other 3 are the work-horses. The master runs in the first component of the Distributed Parallel job.
- -dpres hasRedhawk
- Run each component on machines that have the "hasRedhawk" resource.
- -dpwait 3m
- Wait up to 3 minutes for all components to be started. If 3 minutes pass after the first component starts and not all components have started, the dispatch is aborted and restarted soon after with a longer wait time.
- nc_redhawk
- This is the script that is started on the master component and that activates all other components. Technical note: In Altair Accelerator, all components are already up and running when this script runs, and the script convinces OpenMPI to use a special ssh script to launch the appropriate command on each of the remote components.
- -lmwait -dmp nc.cfg -f run.tcl
- The actual redhawk command you want to run.
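To make the relationship between the options easier to see, the command can be assembled from its three variable parts: the component count, the nc.cfg file and the Tcl script. This is a minimal sketch in POSIX sh. The function name `redhawk_nc_cmd` is hypothetical, it uses only the flags shown above, and it prints the command line instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) the nc run command line.
#   $1  value for -dp (total components, master included)
#   $2  config file passed to redhawk -dmp
#   $3  script passed to redhawk -f
redhawk_nc_cmd() {
    dp=$1
    cfg=$2
    tcl=$3
    # Reject empty, zero, or non-numeric component counts.
    case "$dp" in
        ''|0|*[!0-9]*)
            echo "redhawk_nc_cmd: -dp value must be a positive integer" >&2
            return 1
            ;;
    esac
    # $DISPLAY is printed literally so it is expanded by the submitting shell.
    printf '%s %s %s\n' \
        'nc run -e SNAPSHOT+D,DISPLAY=$DISPLAY -profile -preemptable 0' \
        "-dp $dp -dpres hasRedhawk -dpwait 3m" \
        "nc_redhawk -lmwait -dmp $cfg -f $tcl"
}
```

Calling `redhawk_nc_cmd 4 nc.cfg run.tcl` prints the same command as above on a single line. Because the -dp value is a parameter, it is easier to keep it in step with NUMBER_OF_JOBS in nc.cfg.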