NOTE: please do NOT e-mail support@millennium to get your class account. We will get you an account if you don't have one already.
Ever notice how your computer slows down more and more when you open more and more applications? This is especially true for computationally intensive applications. How can you tell if all your optimization efforts are worth anything, if the computer is running all kinds of random programs at the same time as your super-optimized library? This is why people sometimes set up computers to use a batch scheduler, also called a batch queue. This is a program running on a "master" or "front-end" server, that controls computational jobs on a cluster of "worker" or "compute" nodes. The master ensures that each worker is running at most one job at a time. You can log into the master and submit jobs to the queue. However, you can't log into the compute nodes and run jobs directly; they are protected from logins that would disturb running computations. Furthermore, the scheduler can impose limits on the computational resources demanded by your jobs, like memory or disk or runtime. This protects the cluster from broken or malicious programs, and also ensures a fair distribution of resources among the jobs.
We'll be using clusters with batch schedulers as much as possible in this class, as they ensure consistent and fair timings. Don't worry, they aren't hard to use, although different clusters may use different scheduler commands. For machines that use the PBS system, like the PSI cluster, we provide a "skeleton" script with blanks that you can fill in to run your homeworks. An online tutorial explains how to submit jobs to the PBS queue.
For NERSC machines, each cluster may use a different queue system. You should refer to their (excellent) online tutorials, such as this one for Bassi.
Some of the NERSC machines support interactive jobs, but actually these are sent to the batch scheduler too! The scheduler has a special "interactive" class of jobs, and interactive jobs are scheduled in this class. You run them like interactive jobs, but they are scheduled like batch jobs, and also charged like batch jobs. On NERSC machines, you might also want to use the "debug" job class for debugging, as some NERSC machines have compute nodes specifically dedicated to debug jobs. This may give you a much shorter turnaround if you're in the thick of debugging a tricky code. Debug jobs are submitted like batch jobs, but with a different job class. The batch script you write will specify the job class. Debug jobs are also charged, so be stingy with debugging on NERSC clusters.
The UC Berkeley Millennium project offers a number of parallel machines for research use. They have generously offered their machines and administration expertise for your educational use, as we don't yet have an educational cluster set up. Please be nice and don't flood the cluster with unnecessary jobs, as it will interfere with other users' research work!
One of Millennium's newer machines is the PSI cluster. It has modern multicore or SMP 64-bit x86 processors, and some nodes have special low-latency network hardware. Part of the machine is configured for interactive use, and part is for batch jobs. We'll be using the batch part of the cluster, as it provides more accurate timings. To use PSI, SSH into the front-end node, zen.millennium.berkeley.edu. You can find an example batch scheduling script for PSI here.
Note that the PSI cluster has two kinds of nodes, which you can see on the cluster's home page. One kind has 8 CPUs per node, but slow internode connections; the other kind has 2 CPUs per node, but fast internode connections. You should prefer the ones with 8 CPUs per node for assignments involving threads, as threading alone can't take advantage of the network. The batch scheduling script can be used to select one or the other kind of node; just read its comments and you'll learn how.
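To give a feel for what a PBS job script looks like, here is a minimal sketch. It assumes the usual PBS/Torque directives; the job name, the resource requests, and the program name "./my_benchmark" are hypothetical placeholders. The skeleton script we provide remains the authoritative version, and its comments explain how to select one kind of node or the other, which this sketch does not attempt.

```shell
#!/bin/bash
# Lines starting with "#PBS" are read by the scheduler; the shell treats
# them as ordinary comments.

# Job name shown in the queue (hypothetical).
#PBS -N hw_timing
# Ask for one node and 8 processors on it (example values).
#PBS -l nodes=1:ppn=8
# Upper bound on the run time, as hours:minutes:seconds (example value).
#PBS -l walltime=00:10:00
# Merge standard error into the standard output file.
#PBS -j oe

# PBS sets PBS_O_WORKDIR to the directory from which the job was submitted.
# Fall back to the current directory when the script is run by hand.
JOB_WORKDIR="${PBS_O_WORKDIR:-$PWD}"
cd "$JOB_WORKDIR" || exit 1

echo "Job running on node: $(uname -n)"

# Hypothetical program name; replace it with your own executable.
PROGRAM=./my_benchmark
if [ -x "$PROGRAM" ]; then
    "$PROGRAM"
else
    echo "Cannot find executable $PROGRAM in $JOB_WORKDIR" >&2
fi
```

Assuming the script is saved as "timing_job.sh" (a hypothetical name), you would typically submit it from the front-end node with "qsub timing_job.sh", check on it with "qstat -u $USER", and cancel it with "qdel" followed by the job ID that qsub printed.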
Millennium also offers another batch cluster, which is an Itanium-2 based machine called Citris. Each node has two processors. You can access it by logging into the front-end node, grapefruit.millennium.berkeley.edu. It uses the same (PBS) batch scheduling system as PSI does. The Millennium sysadmins tell us that the load on Citris is quite light, so you should consider using it, especially for debugging, sequential, and MPI jobs.
In order to access useful programs that are installed on Millennium, you should add some directories to your PATH, MANPATH, and LD_LIBRARY_PATH environment variables. If you are a bash user, here's how you might do this in your .bashrc startup script:
if [ -d /usr/mill/bin ]; then
export PATH=/usr/mill/bin:$PATH
fi
if [ -d /usr/mill/man ]; then
export MANPATH=/usr/mill/man:$MANPATH
fi
if [ -d /usr/mill/lib ]; then
export LD_LIBRARY_PATH=/usr/mill/lib:/usr/mill/pkg/intel_cc/lib:$LD_LIBRARY_PATH
fi
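If your startup script gets sourced more than once, the lines above will add the same directories again each time. Here is a sketch of a small helper that avoids this; the function name prepend_path is my own invention, not something installed on Millennium. It adds a directory only if the directory exists and is not already listed, and it is written to work in bash as well as plain sh.

```shell
# prepend_path VARNAME DIR
# Prepend DIR to the colon-separated list stored in the variable VARNAME,
# but only if DIR exists and is not already in the list.
prepend_path() {
    [ -d "$2" ] || return 0
    # Read the current value of the variable whose name is in $1.
    eval "_pp_current=\${$1:-}"
    case ":${_pp_current}:" in
        *":$2:"*)
            ;;  # already present, nothing to do
        *)
            # Add a separating colon only when the old value is non-empty.
            eval "export $1=\"\$2\${_pp_current:+:\$_pp_current}\""
            ;;
    esac
    unset _pp_current
}

prepend_path PATH /usr/mill/bin
# Prepended in reverse order so that /usr/mill/lib ends up first.
prepend_path LD_LIBRARY_PATH /usr/mill/pkg/intel_cc/lib
prepend_path LD_LIBRARY_PATH /usr/mill/lib
```

I left MANPATH out on purpose: on many systems an empty entry in MANPATH (such as a trailing colon) tells man to search the default manual directories as well, so the simpler form shown above is the safer choice for that variable.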
WARNING: I just found out as of 30 August 2007 that your default shell
might be csh or tcsh instead of bash. If you don't want to use
(t)csh, you should e-mail inst@eecs or help@eecs to get your shell
changed. If you are a (t)csh user, you might try putting the
following in your .cshrc file (caveat lector: I'm NOT a csh user, I
don't know if this works):
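# csh aborts on a reference to an unset variable, so define these first.
if (! $?MANPATH) setenv MANPATH ""
if (! $?LD_LIBRARY_PATH) setenv LD_LIBRARY_PATH ""
if (-d /usr/mill/bin) set path = ( /usr/mill/bin $path )
if (-d /usr/mill/man) setenv MANPATH "/usr/mill/man:${MANPATH}"
if (-d /usr/mill/lib) setenv LD_LIBRARY_PATH "/usr/mill/lib:/usr/mill/pkg/intel_cc/lib:${LD_LIBRARY_PATH}"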
Later on, when we do Java assignments, we may have you add other
directories and aliases to your startup script. I describe how to do
that below.
The Java JDK contains both the Java runtime (for running Java programs) and Java development tools (e.g., the bytecode compiler "javac"). By default, the "java" and "javac" commands may be mapped to GNU's Java JDK. It supports an earlier version of Java (1.4) than that which we need for our homeworks. Thus, on the Millennium machines, you should use one of the various versions of the Java JDK in the "/usr/mill/pkg/" directory. I usually access these by setting aliases in my .bashrc script (.cshrc for (t)csh users). Make sure to use only those Java JDK versions that support Java 1.5 or newer (this is only a problem on Citris). You can run "java -version" to get the name and version number. If you want to use the same JDK for both Citris and PSI, you can add the following to your .bashrc, if you are a bash user (mfh 30 Aug 2007: corrected from parentheses to curly brackets by a helpful student, many thanks!):
if [ -d /usr/mill/pkg/jrockit ]; then
export JAVA_ROOTDIR=/usr/mill/pkg/jrockit;
alias javac="${JAVA_ROOTDIR}/bin/javac";
alias java="${JAVA_ROOTDIR}/bin/java";
fi
If you are a (t)csh user, try the following:
if (-d /usr/mill/pkg/jrockit) then
setenv JAVA_ROOTDIR "/usr/mill/pkg/jrockit"
alias javac "${JAVA_ROOTDIR}/bin/javac"
alias java "${JAVA_ROOTDIR}/bin/java"
endif
If you want to experiment with different JDKs, take a look in the "/usr/mill/pkg/"
directory. Note that things marked "x86" or the like won't work on Citris, because
Citris has Itanium 2 processors, which can't run x86 executables. Likewise, things
marked "ia64" (Itanium) won't work on PSI. For example, on PSI, I use Sun's
Java 1.6 JDK by adding the following to my .bashrc (I'm a bash user):
export JAVA_ROOTDIR=/usr/mill/pkg/jdk1.6.0_02
alias javac="${JAVA_ROOTDIR}/bin/javac"
alias java="${JAVA_ROOTDIR}/bin/java"
For (t)csh users, the following would go in your .cshrc:
setenv JAVA_ROOTDIR "/usr/mill/pkg/jdk1.6.0_02"
alias javac "${JAVA_ROOTDIR}/bin/javac"
alias java "${JAVA_ROOTDIR}/bin/java"
The Millennium sysadmins usually make sure that if you can see it in "/usr/mill/pkg/"
and it's not specifically marked to be for an incompatible processor, then you can
run it. Any exceptions to that rule will be made clear and obvious right away when
you try to run them on the wrong processor.
When you report performance results in Java, it's important that you tell us the name and version of the Java compiler and runtime. Is it Sun's? IBM's? GNU's? Use "javac -version" to get the compiler version, and "java -version" to get the runtime version.
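Here is a sketch of a helper that collects this information in one go, so you can paste it into your report. The function name report_java_versions is hypothetical. It calls the tools through JAVA_ROOTDIR (as set in the startup scripts above) rather than through the aliases, since aliases are generally not expanded inside non-interactive scripts. Note that "java -version" writes to standard error, hence the "2>&1".

```shell
# report_java_versions [JDK_ROOT]
# Print the version banners of the java and javac found under JDK_ROOT/bin.
# JDK_ROOT defaults to $JAVA_ROOTDIR.
report_java_versions() {
    _rjv_root=${1:-${JAVA_ROOTDIR:-}}
    for _rjv_tool in java javac; do
        if [ -n "$_rjv_root" ] && [ -x "$_rjv_root/bin/$_rjv_tool" ]; then
            echo "== $_rjv_root/bin/$_rjv_tool"
            # Version banners may go to stderr, so capture both streams.
            "$_rjv_root/bin/$_rjv_tool" -version 2>&1
        else
            echo "== $_rjv_tool not found under '$_rjv_root/bin'"
        fi
    done
    unset _rjv_root _rjv_tool
}
```

For example, "report_java_versions > java_versions.txt" saves the output to a file (the file name is just an example).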
On machines that run Linux, you can generally figure out the type and number of processors by reading the (human-readable) file "/proc/cpuinfo". This may be useful for checking that your batch scheduler script gets you the kind of nodes that you want.
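As a sketch of how you might do that check from inside a job script: the two helper functions below are hypothetical names of my own, and they assume the common layout in which each logical processor gets a line starting with "processor" and a "model name" line. The field names vary between architectures (Itanium machines may label things differently), so look at the raw file first.

```shell
# count_cpus [FILE]: number of logical processors listed in FILE.
count_cpus() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# cpu_model [FILE]: text of the first "model name" entry in FILE, if any.
cpu_model() {
    awk -F': *' '/^model name/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Only query the real file where it exists (that is, on Linux).
if [ -r /proc/cpuinfo ]; then
    echo "Processors on $(uname -n): $(count_cpus)"
    echo "Model: $(cpu_model)"
fi
```

Putting the last few lines near the top of your batch script records, in the job's output file, which kind of node the scheduler actually gave you.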
NERSC has generously donated some computer time for our class. We'll let you know if necessary which machines you can use and how to use them. Stay tuned for more information here.
NERSC computer time is a limited resource. It's allocated for the entire CS 194-2 course, not for each user. If you burn through all of it, then nobody else can use NERSC machines for the whole semester! We have to ask for a grant for each block of time that we allocate from them. This is very annoying for us and for them, so please don't make us do it!
There are many ways you can avoid using up all of the NERSC time allocation. First, you can do all your debugging on non-NERSC machines, and only use the NERSC machines once you know that your code works. Second, you can submit your jobs as low priority; this charges only half the hours of regular jobs. This website explains how NERSC charges for jobs. Finally, you can try using "cheaper" NERSC machines, especially for debugging. Bassi's jobs are charged at six times the rate of Seaborg's jobs, for example. This is because Seaborg is much slower, but slow or not, it has 16 processors per node, twice as many as Bassi has per node. Seaborg also uses the same OS and compiler as Bassi, so it's easy to port codes between the two.
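To see how these choices add up, here is a rough sketch that uses only the two factors mentioned above: Bassi at six times Seaborg's rate, and low priority at half the regular charge. The function name relative_charge is hypothetical, and the assumption that the charge scales linearly with the hours used is mine. NERSC's own page on charging is the authority on the real formula.

```shell
# relative_charge HOURS MACHINE [PRIORITY]
# Print the charge for HOURS of use, measured in units of one
# regular-priority hour on Seaborg.
# MACHINE is "seaborg" or "bassi"; PRIORITY is "regular" (default) or "low".
relative_charge() {
    case $2 in
        seaborg) _rc_machine=1 ;;
        bassi)   _rc_machine=6 ;;   # six times Seaborg's rate
        *) echo "unknown machine: $2" >&2; return 1 ;;
    esac
    case ${3:-regular} in
        regular) _rc_priority=1 ;;
        low)     _rc_priority=0.5 ;;  # low priority is charged at half
        *) echo "unknown priority: $3" >&2; return 1 ;;
    esac
    awk -v h="$1" -v m="$_rc_machine" -v p="$_rc_priority" \
        'BEGIN { print h * m * p }'
}
```

For example, "relative_charge 10 bassi" prints 60, while "relative_charge 10 seaborg low" prints 5. Keep in mind that the same job will usually need more hours on Seaborg than on Bassi, so compare the totals and not just the rates.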
In case you need to know, the NERSC allocation number is "mp309". I don't think you'll need to know this, because it should be the only allocation to which you have access. If you ever need to talk to NERSC people, e.g., to change your password, then they might refer to the allocation as belonging to CS267; it actually does, but we're also using it for CS194-2 this semester. You can use NERSC's "NIM" web interface to check on the status of the allocation; NERSC's website offers help on that.
NERSC's web pages for each cluster explain quite well how to use the various software packages. NERSC uses the "modules" system for fixing up the path and other environment variables, so generally you won't have to mess with changing environment variables in your shell's startup scripts.
This tutorial explains how to do coding and debugging on your home Windows machine, by using Cygwin.