Ramses Cluster Job Submission Guide (Outdated)

Preliminaries

  1. Obtain a local cluster account on Ramses
    Note: You can log into the master node with your current user account, but that account alone does not give you access to the entire cluster.
  2. Set up SSH for private/public key authentication:
    cd ~
    mkdir .ssh
    cd .ssh
    ssh-keygen -t dsa
    cp id_dsa identity
    cp id_dsa.pub authorized_keys2

    Once done, ssh to each machine from ramses2 through ramses8 and accept the host identification so that each host's key is stored in your known_hosts file.
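
The host-key step above can be scripted. The following is a minimal sketch, assuming the nodes are named ramses2 through ramses8 (as in the node table later in this guide) and that running the no-op command true on each node is acceptable. The function names and the DRY_RUN variable are illustrative and not part of the cluster setup.

```shell
# Print the node names ramses2 through ramses8, one per line.
ramses_nodes() {
    i=2
    while [ "$i" -le 8 ]; do
        echo "ramses$i"
        i=$((i + 1))
    done
}

# Connect to each node once so ssh can prompt for the host key.
# With DRY_RUN=1 the ssh commands are printed instead of executed.
visit_nodes() {
    for node in $(ramses_nodes); do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ssh $node true"
        else
            ssh "$node" true
        fi
    done
}
```

Calling visit_nodes and answering yes at each prompt should fill in known_hosts; DRY_RUN=1 visit_nodes only lists the commands it would run.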

Instructions for Single CPUs

  1. The simplest way to run a process on a free node is to ssh into the node and start your job there:
    ssh ramses6
    <run your job> &
    exit
    That's all there is to it. Before you pick a node, check the Ganglia Monitor and select the machine that shows the lowest load.
  2. Alternatively, there is a Perl script named bp.pl in /scratch/template/bin. Copy it into your own directory. It uses bpsh to scan the system for the optimal host. Once it identifies the host, it submits the job, capturing the output in a file named OUTPUT and errors in a file named ERROR.
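
The idea of picking the least-loaded node can be sketched in a few lines of shell. This is not how bp.pl works internally: it is an illustration that assumes Linux nodes, working SSH keys, and plain ssh in place of bpsh. The function names least_loaded and host_loads, and the program my_job in the usage comment, are hypothetical.

```shell
# Read "host load" lines on stdin and print the host with the lowest load.
least_loaded() {
    LC_ALL=C sort -t ' ' -k2,2n | head -n 1 | cut -d ' ' -f 1
}

# Print "host load" for each reachable host, using the 1-minute load
# average from /proc/loadavg on that host. Unreachable hosts are skipped.
host_loads() {
    for host in "$@"; do
        load=$(ssh "$host" cat /proc/loadavg 2>/dev/null | cut -d ' ' -f 1)
        if [ -n "$load" ]; then
            echo "$host $load"
        fi
    done
}

# Example (assumes a home directory shared across nodes):
#   host=$(host_loads ramses2 ramses3 ramses4 | least_loaded)
#   ssh "$host" "cd '$PWD' && nohup ./my_job > OUTPUT 2> ERROR < /dev/null &"
```

The nohup and the redirections in the usage comment keep the job running after the ssh session ends and mirror the OUTPUT and ERROR files described above.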

Instructions for Multiple CPUs with MPI

  1. Copy the file cluster from the template directory.
    cp /scratch/template/cluster ~/.
  2. You can then boot your own MPI cluster with the following command:
    lamboot ~/cluster
  3. To run multi-processor jobs, submit them directly with:
    mpirun c<processor number> <program> <program arguments>
    where c<processor number> can be a list, e.g. c0-4,10 specifies processors 0, 1, 2, 3, 4, and 10. NOTE: This applies only to actual MPI software. More information can be found by running mpirun without any options.

  4. An older procedure is also documented here because it may still be useful. If you don't need it, ignore the following steps.

    Copy the old scripts /scratch/template/bin/sub*; they cycle through all processors on the cluster, distributing your tasks among them. You may use them as follows:

    submit <job script>
    These are very simple scripts that allow for 6 command-line parameters. They do not check the status of the processors; they merely rotate through them. We therefore recommend checking the status of the cluster on the Ganglia Monitor first. That page shows the current user load on every machine, so you can submit to a machine accordingly. To submit to a specific machine, use one of the specific sub scripts as follows:
    sub<processor number>

    where <processor number> is:

    Node       Processor Number
    ramses     0-1
    ramses2    2-3
    ramses3    4-5
    ramses4    6-7
    ramses5    8-9
    ramses6    10-11
    ramses7    12-13
    ramses8    14-15

    A final note: it appears that the submit protocols leave some file handles open. This currently seems to be a problem only with MPI. To fix it, if you are not running any processes, you can issue:

    lamhalt
    lamreboot (This is a script in /scratch/template/bin - copy it to your own directory)
    lamboot ~/cluster
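
The processor list notation from step 3 and the node table from step 4 can be combined into a small worked example. This is only an illustration of the notation, not how mpirun parses its arguments. It assumes two processors per node, as the table shows, and that the seq utility is available. The function names are hypothetical.

```shell
# Expand a processor list such as "c0-4,10" into one number per line.
expand_cpus() {
    # Drop the leading "c", then handle each comma-separated part.
    echo "${1#c}" | tr ',' '\n' | while read -r part; do
        case $part in
            *-*) seq "${part%-*}" "${part#*-}" ;;   # a range such as 0-4
            *)   echo "$part" ;;                    # a single processor
        esac
    done
}

# Map a processor number (0-15) to its node, following the table:
# 0-1 -> ramses, 2-3 -> ramses2, ..., 14-15 -> ramses8.
cpu_to_node() {
    index=$(( $1 / 2 + 1 ))
    if [ "$index" -eq 1 ]; then
        echo "ramses"
    else
        echo "ramses$index"
    fi
}

# Print the distinct nodes covered by a processor list.
nodes_for() {
    for cpu in $(expand_cpus "$1"); do
        cpu_to_node "$cpu"
    done | uniq
}
```

For example, nodes_for c0-4,10 prints ramses, ramses2, ramses3, and ramses6, one per line, which are the machines worth checking on the Ganglia Monitor before submitting to those processors.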
