Monday, June 16, 2008

Condor for number crunching


We need HPC within our group (quantum chemistry calculations, machine learning, molecular dynamics simulations and analysis, etc.). To fulfil this need we have several SGE-based clusters within our department and the university. Our local clusters were due a refresh (multiple dated operating systems, including Red Hat 7!) and ideally needed to be unified somehow. Having multiple clusters to pick from had become tiresome: which one has the most free slots, or the shortest queue? If the one you were using was full, you had to move all your data and get set up to run on a different cluster. We needed something that would maximise our use of the compute nodes while simplifying the submission process, so that we stopped wasting time.

We opted for a more grid-based solution: Condor. The reasons for this were:
  • All our local clusters are now combined into one Condor pool.
  • It removes the need for multiple head nodes, as users can submit directly from their desktops.
  • It is cross-platform, so it can be used on Windows, Linux and Mac.
  • The grid approach means we can take advantage of our desktop computers as well.
We still use the university's central SGE cluster, as it is an invaluable resource. However, Condor allows us to make the most of our local resources, which are exclusively for our use.
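
To give a rough idea of what submitting from a desktop involves, here is a minimal sketch in Python that builds the text of a Condor submit description file. This is an illustration only, not our actual setup: the helper name make_submit_description, the script run_md.sh, and the input file input.dat are all hypothetical, and a real job would likely need further settings (file transfer, requirements, and so on).

```python
def make_submit_description(executable, arguments="", n_jobs=1, universe="vanilla"):
    """Return the text of a minimal Condor submit description file.

    Hypothetical helper for illustration; all parameter values are assumptions.
    """
    lines = [
        f"universe   = {universe}",
        f"executable = {executable}",
        f"arguments  = {arguments}",
        # $(Process) is expanded by Condor to 0, 1, 2, ... for each queued job,
        # so every job writes to its own output and error files.
        "output     = job.$(Process).out",
        "error      = job.$(Process).err",
        "log        = job.log",
        # Queue n_jobs copies of the job.
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # run_md.sh and input.dat are placeholder names.
    print(make_submit_description("run_md.sh", "input.dat", n_jobs=10))
```

The resulting text would be saved to a file (say job.sub) and handed to condor_submit on the desktop machine; Condor then finds free slots in the pool, so there is no cluster to choose by hand.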

Find out more about Condor here: http://www.cs.wisc.edu/condor/. The annual Condor Week now has videos of some of the tutorials (as well as slides), so have a look to see what it is all about.

Image courtesy of Wikipedia.