Our group needs HPC for quantum chemistry calculations, machine learning, molecular dynamics simulations and analysis, and more. To meet this need we have access to several SGE-based clusters, both within our department and across the university. Our local clusters were due a refresh (they ran several dated OSes, including Red Hat 7!) and ideally needed to be unified somehow. Having multiple clusters to pick from had become tiresome. Which one has the most free slots, or the shortest queue? If your chosen cluster was full, you had to move all your data and set everything up again on a different one. We needed something that would maximise use of the compute nodes while simplifying job submission, so that less time was wasted.
We opted for a more grid-based solution: Condor. Our reasons were:
- All our local clusters are now combined into a single Condor pool.
- It removes the need for multiple head nodes, as users can submit directly from their desktops.
- It is cross-platform, so it can be used on Windows, Linux and Mac.
- The grid approach means we can also make use of our desktop computers when they are idle.
You can find out more about Condor at http://www.cs.wisc.edu/condor/. The annual Condor Week now publishes videos of some of its tutorials (as well as slides), so check them out to see what it is all about.
Image courtesy of Wikipedia.