StarCluster

from Hao Hu:

The tool uses Amazon EC2 spot instances (Amazon lets people bid for unused capacity), so the price is amazingly cheap: running 1 head node plus 640 cores (20x cc2.8xlarge, 32 cores each) cost me less than $6/hour. The tool automates all of the complicated setup that multiple spot instances require (file sharing, scheduler, software environment, etc.). The whole cluster took 10 minutes to set up.
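
To put the price in perspective, here is a rough back-of-the-envelope sketch in Python using only the figures quoted above (20 instances, 32 cores each, $6/hour taken as an upper bound). The function name is made up for illustration, and the calculation ignores the head node and storage charges.

```python
def cost_per_core_hour(cluster_cost_per_hour, instances, cores_per_instance):
    """Return the hourly cost of one core, given the whole cluster's hourly cost."""
    total_cores = instances * cores_per_instance
    return cluster_cost_per_hour / total_cores


# Figures from the note: 20 x cc2.8xlarge, 32 cores each, under $6/hour in total.
# $6.00 is used as an upper bound, so the result is an upper bound as well.
upper_bound = cost_per_core_hour(6.0, instances=20, cores_per_instance=32)
print(f"{20 * 32} cores at no more than ${upper_bound:.3f} per core-hour")
```

In other words, each core costs under one cent per hour at these prices.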

There are three downsides I have discovered so far: (1) The cores might disappear at any time, so it is important to check the spot instance price history carefully and pick an option whose price fluctuates less. (2) MPI simulations are slow because of the poor communication bandwidth between cores, so lots of single-core sims are the best fit for this environment. (3) A stable head node is needed (I use t1.micro to keep the cost down), and I need to pay extra for storage, so it might not work if the output is in the terabyte range. Amazon also caps the maximum number of spot instances, but it's easy to raise the limit if you write to them explaining your needs.
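
For downside (1), one possible way to compare price histories is sketched below in Python. The zone names, prices, and bid are hypothetical sample data, not real Amazon figures, and the function name is invented for illustration. It also assumes the prices are evenly spaced samples (say, one per hour), which a raw spot price history may not be.

```python
from statistics import mean, pstdev


def summarize_spot_prices(prices, bid):
    """Summarize how much a series of spot prices fluctuates relative to a bid."""
    avg = mean(prices)
    return {
        "mean": avg,
        "max": max(prices),
        # Coefficient of variation: standard deviation relative to the mean.
        "cv": pstdev(prices) / avg,
        # Share of samples where the price exceeded the bid (instances at risk).
        "fraction_above_bid": sum(p > bid for p in prices) / len(prices),
    }


# Hypothetical price samples in $/hour for two candidate zones.
history = {
    "zone-a": [0.27, 0.27, 0.28, 0.27, 0.27, 0.28],
    "zone-b": [0.25, 0.25, 2.40, 0.25, 0.90, 0.25],
}
bid = 0.30  # hypothetical bid in $/hour

for zone, prices in history.items():
    s = summarize_spot_prices(prices, bid)
    print(f"{zone}: mean={s['mean']:.2f} max={s['max']:.2f} "
          f"cv={s['cv']:.2f} above_bid={s['fraction_above_bid']:.0%}")

# Pick the candidate with the smallest relative fluctuation.
steadiest = min(history, key=lambda z: summarize_spot_prices(history[z], bid)["cv"])
print("steadiest:", steadiest)
```

Here zone-b has the lower typical price, but its spikes above the bid would cost the cluster its cores, so the steadier zone-a is likely the safer choice.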