Monday, July 28, 2008

Weka Online

Weka is an excellent machine learning/data mining workbench from the University of Waikato. It is written in Java and available under the GNU GPL.

Being written in Java means Weka can run on virtually any platform. On the flip side, a Java program can only use as much memory as the Java Virtual Machine is allowed, and that is ultimately bounded by the RAM available. This matters for Weka because it takes a memory-driven rather than disk-driven approach: data sets are held in RAM, so larger data sets need more of it. Add the fact that Weka was not specifically designed for large data sets, and it isn't hard to need more than 2GB of RAM.
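To see how much memory Weka will actually be allowed to use, you can ask the JVM directly. The sketch below is a hypothetical helper (`HeapCheck` is not part of Weka). It uses the standard `Runtime.getRuntime().maxMemory()` call to report the heap ceiling. That ceiling is set at launch with the `-Xmx` option, for example something like `java -Xmx1500m -jar weka.jar`. Without the option, the JVM picks a default that depends on its version and the machine.

```java
// HeapCheck.java: hypothetical helper that reports the maximum heap the JVM
// will attempt to use. Change the limit at launch with -Xmx (e.g. -Xmx1500m).
public class HeapCheck {

    // Maximum heap the JVM will attempt to use, converted to megabytes.
    static long maxHeapMegabytes() {
        long maxBytes = Runtime.getRuntime().maxMemory();
        return maxBytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Maximum heap available to Java: " + maxHeapMegabytes() + " MB");
    }
}
```

Running `java -Xmx512m HeapCheck` and then `java -Xmx1500m HeapCheck` should show the reported ceiling change. Expect the printed figure to come out somewhat below the `-Xmx` value on some JVMs.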

Now for the technical part. 32-bit hardware and operating systems (x86) typically limit a single process to about 2GB of RAM, regardless of how much the machine actually has (the exact figure depends on the operating system). To use more than 2GB per process you need both 64-bit hardware and a 64-bit operating system (x86_64), plus a 64-bit Java VM to run Weka on. Thankfully, 64-bit hardware is increasingly standard on new purchases.
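The ~2GB figure comes from the width of a pointer. A 32-bit address can name 2^32 bytes (4 GiB) in total, and the operating system reserves part of that range for itself. On 32-bit Windows it reserves half by default, which leaves roughly 2 GiB for the process, and the Java heap gets less still. The sketch below is a hypothetical helper (`ArchCheck` is not part of Weka or Java). It does that arithmetic and prints what the running JVM reports about itself. `os.arch` is a standard system property. `sun.arch.data.model` is specific to HotSpot-based JVMs and may be absent elsewhere.

```java
// ArchCheck.java: hypothetical helper showing why 32-bit processes top out
// at a few gigabytes, and reporting the running JVM's architecture.
public class ArchCheck {

    // Total bytes addressable by a pointer of the given width, in GiB:
    // 2^pointerBits bytes divided by 2^30 bytes per GiB.
    static long addressSpaceGiB(int pointerBits) {
        return 1L << (pointerBits - 30);
    }

    public static void main(String[] args) {
        // 2^32 bytes = 4 GiB in total; the OS keeps part of this range,
        // so a single 32-bit process gets noticeably less than 4 GiB.
        System.out.println("32-bit address space: " + addressSpaceGiB(32) + " GiB");

        // Architecture the JVM was built for (a 32-bit JVM on a 64-bit OS
        // still reports a 32-bit architecture such as "x86").
        System.out.println("os.arch: " + System.getProperty("os.arch"));

        // HotSpot-specific; fall back to "unknown" on other JVMs.
        String model = System.getProperty("sun.arch.data.model", "unknown");
        System.out.println("JVM data model: " + model);
    }
}
```

If this reports a 32-bit JVM on a 64-bit machine, Weka is still held to the 32-bit limit. You would need to install a 64-bit Java to get past it.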

However, if you don't have new hardware, another solution has recently become available: Weka Online. It lets you submit Weka tasks through a web interface to their 64-bit computer cluster (with 2.5-3.5GB of RAM available). Alas, as I write this, submission has been disabled while they bolster security following a malicious attack.

Once the service is back, it actually offers more than standard Weka via their CEO framework (see more here). I haven't tried the service myself, but the idea is certainly appealing.