Matlab configuration advice

From Spinach Documentation Wiki
Revision as of 15:05, 9 January 2017 by Admin (talk | contribs) (Batch runs on PBS clusters)

Large-scale deployments

Protein simulations with up to 100 amino acids fit into 64 GB of RAM unless the relaxation superoperator is computed; in that case the requirement is 256 GB plus 10 GB per node for every 100 amino acids. The asymptotic run time scaling is linear on shared-memory systems.


If you plan to run lengthy parallel calculations, disable the parallel pool timeout (Matlab Ribbon / Parallel / Parallel Preferences) by unticking the "shut down and delete" box in the settings.
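The same effect can be approximated from code by lengthening the idle timeout on the pool object. The following is a minimal sketch, assuming the Parallel Computing Toolbox is installed and the default 'local' profile is in use; the 72-hour value is an arbitrary placeholder rather than a recommendation.

```matlab
% Get a handle to the current pool without creating one
pool = gcp('nocreate');

% Start a pool on the local profile if none is running
if isempty(pool)
    pool = parpool('local');
end

% Idle time (in minutes) after which Matlab shuts the pool down;
% 72 hours is a placeholder, pick a value longer than your calculation
pool.IdleTimeout = 72*60;
```

The setting applies to the running pool only; the preference described above is what changes the default for future pools.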

Java heap size

Matrix exponential caching and Clebsch-Gordan coefficients with L ranks in excess of 2,000 require a bigger Java heap size than Matlab's default. The heap size can be increased in Matlab Preferences (General / Java Heap Memory).
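To check whether a new heap setting has taken effect, the limit can be queried from the Java runtime inside Matlab. This is a small sketch that assumes Matlab was started with the JVM enabled (that is, without -nojvm); the variable names are illustrative.

```matlab
% Ask the Java virtual machine for its maximum heap size
runtime = java.lang.Runtime.getRuntime();

% maxMemory() returns bytes; convert to gigabytes
max_heap_gb = double(runtime.maxMemory())/2^30;

fprintf('Java heap limit: %.2f GB\n', max_heap_gb);
```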

MDCS on Amazon EC2

To set up a Matlab Distributed Computing Server cluster on the Amazon EC2 cloud, follow these instructions:

  1. Start a single Windows instance without the Amazon firewall.
  2. Change the Administrator password to the one you prefer.
  3. Install Matlab 2016a or later with the Distributed Computing Server option.
  4. Run the addMatlabToWindowsFirewall.bat file as Administrator.
  5. Switch off Windows firewall.
  6. Run 'mdce install', then 'mdce start' as Administrator.
  7. Stop the mdce service and set it to "Manual" in the service settings.
  8. Set up Spinach and set the paths in Matlab.
  9. Shut down the instance and make an AMI.
  10. Spawn several instances from the AMI, pick a head node and log into it.
  11. Run admincenter.bat, feed it the node addresses and start the mdce services.
  12. Create a JobManager in AdminCenter, start the workers (one at a time per node).
  13. From a client instance, connect to the JobManager.
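The last step can also be done from the Matlab command line instead of the cluster profile manager. The sketch below assumes the client can reach the head node over the network; the host address and JobManager name are hypothetical placeholders that must be replaced with the values entered in AdminCenter.

```matlab
% Hypothetical placeholders: substitute your own head node and JobManager name
head_node = 'head-node-address';
jm_name   = 'my_jobmanager';

% Build a cluster object for the JobManager (MJS) running on the head node
cluster = parallel.cluster.MJS('Host', head_node, 'Name', jm_name);

% Open a pool on all workers that are currently idle
pool = parpool(cluster, cluster.NumIdleWorkers);
```

If the connection fails, the usual suspects are the firewall settings from steps 1, 4 and 5.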

Batch runs on PBS clusters

We have yet to see a university-owned Linux cluster with a conveniently configured deployment of MDCS. The task of running a batch of single-node Matlab jobs is significantly easier, however; an example PBS script appears below:

    #PBS -l nodes=1:ppn=16
    #PBS -l walltime=59:00:00
    #PBS -V
    #PBS -m n
    #PBS -t 1-22
    #PBS -q name_of_your_queue

    # PBS starts jobs in the home directory; move to the submission directory
    cd "${PBS_O_WORKDIR}"

    # Run one array element; stdout and stderr go to per-job files
    matlab -nodesktop -r "your_function(${PBS_ARRAYID}); exit"   \
           1> your_function_${PBS_ARRAYID}.out                   \
           2> your_function_${PBS_ARRAYID}.err                   \
            < /dev/null

This script assumes that you have a Matlab function called your_function.m in your working directory, that this function takes a scalar integer from 1 to 22 as a job identifier, and that each of those jobs needs a 16-core node for 59 hours. The header of your_function.m should contain a manual call to start the local parallel pool and a manual specification of the path, because Matlab instances on the worker nodes might not inherit your path settings.

You can get your current path by running the path command in Matlab. Do not forget to load the Matlab module before you submit the script to the PBS queue. The module should be installed by your technical support team.
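A header of the kind described above might look as follows. This is a sketch, not the original wiki example: the Spinach directory is a hypothetical placeholder, addpath is used here as one possible way of specifying the path, and the pool size of 16 simply mirrors the ppn=16 request in the PBS script.

```matlab
function your_function(job_id)

% Set the path manually; the directory below is a hypothetical placeholder
addpath(genpath('/home/username/spinach'));

% Start a local pool matching the 16 cores requested from PBS,
% unless a pool is already running
if isempty(gcp('nocreate'))
    parpool('local', 16);
end

% Job-specific work goes here, selected by the PBS array index
fprintf('Running job %d\n', job_id);

end
```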

Revision 3284, authors: Ilya Kuprov