## Thursday, December 18, 2008

### MySQL & Perl on Leopard

Mac OS X Leopard comes with Apache 2 and PHP pre-installed, but no MySQL. This is the missing link in the very useful MAMP setup. The kind people at mysql.com offer it ready-packaged for Mac users. When picking your version, MacIntel users should pick the 32-bit version, unless you want problems installing Perl's DBD::mysql. What is DBD::mysql? The very useful module that allows Perl scripts to access your MySQL databases. If you install the 64-bit version of MySQL you will hit errors when you install DBD::mysql, as a 32-bit installation of Perl is not interested in compiling against 64-bit MySQL libraries.

If you need 64-bit MySQL you could install both versions and compile DBD::mysql against the 32-bit libraries.
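If you do go down the two-installations route, DBD::mysql can be pointed at the 32-bit libraries when you build it. A sketch only: the `--mysql_config` option is part of DBD::mysql's Makefile.PL, but the 32-bit install path shown is an assumption, so adjust it to wherever your second copy lives:

```shell
# From inside an unpacked DBD::mysql source directory:
# point the build at the 32-bit MySQL's mysql_config (path is an assumption)
perl Makefile.PL --mysql_config=/usr/local/mysql-32bit/bin/mysql_config
make
make test
sudo make install
```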

### MacPorts on Leopard with a proxy

MacPorts is an excellent tool for Mac users to grab extra software without having to worry about working out compile options, reading endless instructions, etc. As of today there are 5253 ports available, for everything from apache2 to iPhone apps. The beauty of installing anything from MacPorts is that it will install any dependencies as well. I've just upgraded one of our servers to Leopard, but MacPorts didn't want to play anymore. It turns out there is a bug in the handling of proxy settings. We have to use a proxy, so this is a show stopper. It looks like they will fix it in the 1.8 release, but I can't wait!

Thankfully a simple workaround exists that requires you to edit /etc/sudoers:

    sudo visudo

(Don't run vim directly on /etc/sudoers.) In the Defaults specification section add:

    Defaults        env_keep += "http_proxy HTTP_PROXY HTTPS_PROXY FTP_PROXY"
    Defaults        env_keep += "ALL_PROXY NO_PROXY"
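With those lines in place you can check that sudo now passes the proxy settings through. This assumes you have a proxy variable such as http_proxy set in your own environment:

```shell
# Should print your proxy variables; if the output is empty,
# sudo is still stripping them from the environment.
sudo env | grep -i proxy
```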
to

    LIBS="-lmkl_intel -lmkl_sequential -lmkl_core $LIBS"

Then run:

    ./configure CC="icc" CPPFLAGS="-I/opt/intel/mkl/10.0.3.020/include" LDFLAGS="-L/opt/intel/mkl/10.0.3.020/lib/32" --with-fft=mkl
    make
    make install

You may need to alter the version number or location of your MKL files. I recommend you also download the test set to confirm the compilation. I had 4 failures, but closer inspection revealed they were not show stoppers.

## Thursday, July 31, 2008

### Under and over sampling with Weka

Weka uses the ARFF format for storing data. In the development series (3.5.x) an XML version of the ARFF format was introduced: XRFF. On the surface there is little reason to use it; the format is far more verbose, so file size quickly swells. There are 3 additional features over the ARFF format:

1. Class attribute specification
2. Attribute weights
3. Instance weights

Typically the class attribute is the last in the file, else you need to tell the classifier which attribute to use. Now you can set the class attribute to any attribute:

    <attribute class="yes" name="class" type="nominal">

Associate a weight with an attribute (within the header section) using metadata:

    <attribute name="petalwidth" type="numeric">
      <metadata>
        <property name="weight">0.9</property>
      </metadata>
    </attribute>

Associate a weight with an individual instance:

    <instance weight="0.75"><value>5.1</value><value>3.5</value><value>1.4</value><value>0.2</value><value>Iris-setosa</value></instance>

You can use the weight associated with an individual instance to simulate under and over sampling. For example, if you have 100 actives in a dataset and 1000 inactives, oversample the actives. This means training on each active 10 times, so the model is composed from 1000 actives and 1000 inactives. Granted, the same actives are used, but this technique has positive effects on skewed datasets. The weight to add for this dataset would be 10 on each active instance.
    <instance weight="10"><value>5.1</value><value>3.5</value><value>1.4</value><value>0.2</value><value>active</value></instance>

## Monday, July 28, 2008

### Weka Online

Weka is an excellent machine learning/data mining workbench from the University of Waikato. It is Java-based and available under the GNU GPL. An advantage of being in Java is that it can easily run on virtually any platform. On the flip side, Java can be limited by the amount of RAM available; this is the case with Weka, as it has been programmed with a memory-driven approach, not a disk-driven one. As data sets get larger and larger, more RAM is required to process them. Couple this with Weka not being specifically designed for large data sets and it isn't hard to exceed a 2GB RAM requirement. Now for the technical part: 32-bit hardware and operating systems (x86) can only use around 2GB RAM per single process, regardless of how much the machine actually has. To use more than that per process you need both 64-bit hardware and a 64-bit operating system (x86_64). Thankfully it is increasingly common to have 64-bit hardware as standard on new purchases. However, if you don't have new hardware another solution has recently become available: Weka Online. They allow you to submit Weka tasks from a web interface on to their 64-bit computer cluster (with 2.5-3.5GB RAM available). Alas, as I write this they have disabled submission while they bolster security following a malicious attack. Once this service is back it actually offers more than standard Weka, via their CEO framework; see more here. I've not actually tried the service myself, but the idea is certainly appealing.

## Tuesday, July 22, 2008

### Condor 7.1.1 supported ports

The development series of Condor is dropping support for several platforms from 7.1.1 onwards:

1. Red Hat 9 (suitable for openSUSE 10.x)
2. Solaris 5.8
3. Mac OS X PowerPC

RHEL 3 binaries should be fine for any Red Hat system (and presumably CentOS). Solaris 5.8 users can use the 5.9 binaries.
It should be noted they are continuing these ports for the current stable series (7.0.x). Unfortunately the RHEL 3 binaries do not work on openSUSE 10.x (well, they run, but give shadow exceptions if you try to do anything useful - like run a job!). Looks like a case for compiling from source...

UPDATE: Condor 7.1.1 has been pulled due to numerous problems; look out for 7.1.2.

## Saturday, July 19, 2008

### Perl on Eclipse 3.4 (Ganymede)

I use Eclipse every day, and the ability to use it for multiple languages is crucial. Perl is one of the languages I use and there is an excellent plugin for it: EPIC. However, after installing the recently released Ganymede (3.4) release I couldn't install it. There are multiple versions of Eclipse available to download; typically I pick Classic. However, EPIC will not install on to this version. I found using Eclipse IDE for Java Developers worked fine. Hopefully any other plugins you use won't mind you using this version!

## Friday, July 18, 2008

### Display source code in MediaWiki

You have three options by default:

1. Display code inline, like script.sh, by using <code>script.sh</code>.
2. Blocks of code can be wrapped with <pre>insert your code here</pre>. This works multi-line but doesn't allow formatting.
3. Indent your text to enable a <pre>-like block, then apply standard '''bold''' and ''italic'' formatting.

The above options generally work quite well, but if you end up with lots of code from different languages and more than a few lines, it would be handy to have syntax highlighting. Thankfully an extension to MediaWiki can do this. Use SyntaxHighlight_GeSHi to colour away; it is also used on Wikipedia. You will need root access to your server and subversion installed, then follow the simple instructions on the extension website. Download the extension and GeSHi, then add it to your LocalSettings.php.
Now you have a fourth option: wrap your code with <source lang="X">code here</source>, where X is php, java, bash, ruby or one of the other nearly 50 supported languages!

## Thursday, July 17, 2008

### Thumbnails and TeX support for MediaWiki on Mac OS X

After you have set up a new MediaWiki installation you will likely want to enable some extra functionality, which requires additional software. First off, to create thumbnails of images you need to install ImageMagick, which gives you the incredibly handy convert program. Second, you can add maths support using TeX; for this you need ocaml, latex and dvipng. Grab the required programs from MacPorts (they are available from Fink as well):

• sudo port install ImageMagick
• sudo port install ocaml
• sudo port install tetex
• sudo port install ghostscript

Texvc will convert your TeX into whatever MediaWiki wants to display (HTML, MathML or PNG), but it needs to know where several programs are. Given you are probably running your webserver as the www user, who has no $PATH settings, how do you tell Texvc where to find the files? In an unusual move, hardcode them! Edit <mediawiki>/math/render.ml, prefixing /opt/local/bin to the four commands:
    let cmd_dvips tmpprefix = "/opt/local/bin/dvips -R -E " ^ tmpprefix ^ ".dvi -f >" ^ tmpprefix ^ ".ps"
    let cmd_latex tmpprefix = "/opt/local/bin/latex " ^ tmpprefix ^ ".tex >/dev/null"
    let cmd_convert tmpprefix finalpath = "/opt/local/bin/convert -quality 100 -density 120 " ^ tmpprefix ^ ".ps " ^ finalpath ^ " >/dev/null 2>/dev/null"
    let cmd_dvipng tmpprefix finalpath = "/opt/local/bin/dvipng -gamma 1.5 -D 120 -T tight --strict " ^ tmpprefix ^ ".dvi -o " ^ finalpath ^ " >/dev/null 2>/dev/null"
Then recompile texvc with make; ocaml will take over here.

Tell ImageMagick where gs is by editing /opt/local/lib/ImageMagick-X.X.X/config/delegates.xml, where X.X.X is the version number. Replace every "gs" with "/opt/local/bin/gs"; only edit the "gs" entries, of which there are about half a dozen.
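If you'd rather not do the replacements by hand, a sed one-liner along these lines should work. The in-place `-i ''` syntax shown is for the BSD sed that ships with Mac OS X, and the directory is an assumption (fill in your X.X.X version):

```shell
# Back up first, then replace the bare command name "gs" with its full path.
cd /opt/local/lib/ImageMagick-X.X.X/config
sudo cp delegates.xml delegates.xml.bak
sudo sed -i '' 's|"gs"|"/opt/local/bin/gs"|g' delegates.xml
```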

Finally, tell MediaWiki to use TeX by editing your LocalSettings.php:

    $wgUseTeX = true;

Now math support should be good to go. Thanks to this compilation of advice for assistance. So now this wikitext will produce the following:

    == Magical latex in action ==
    <math>\left \{ \frac{a}{b} \right \} \quad \left \lbrace \frac{a}{b} \right \rbrace</math>
    <math>x \implies y</math> (an AMS command)
    <math>f(n) = \begin{cases} n/2, & \mbox{if }n\mbox{ is even} \\ 3n+1, & \mbox{if }n\mbox{ is odd} \end{cases}</math>

    == Image thumbnail ==
    [[Image:OpenSUSE.png|frame|Full size|center]]
    [[Image:OpenSUSE.png|thumb|A thumbnail|center]]

## Wednesday, July 16, 2008

### Solubility Challenge

UCC has launched a competition in conjunction with JCIM. Essentially their article (DOI: 10.1021/ci800058v), which appeared on ASAP yesterday, details 132 druglike molecules. They report the solubility for 100 molecules and challenge you to predict the other 32 using whatever method you choose. Submit your predictions by 15th September 2008, upon which the best submissions will be invited to detail their models as JCIM articles. Full details are on the Goodman group website, including machine-readable files.

## Tuesday, July 8, 2008

### Subversion with CruiseControl

As we use subversion for our version control we need to do an extra step, as CruiseControl only has limited subversion support (e.g. it can't check out a project; I'm sure it should, but it has never worked for me). To give it the power to do so you need to download SvnAnt. Copy the three jars from the lib folder into the lib folder in your installation: /cruisecontrol/apache-ant-1.7.0/lib. This way everything that keeps CruiseControl happy is in one place. Now you need to define a property file giving the location of SvnAnt, something like this file, svn-build.props:

    svnant.version=1.0.0
    lib.dir=../apache-ant-1.7.0/lib
    svnant.jar=${lib.dir}/svnant.jar
    svnClientAdapter.jar=${lib.dir}/svnClientAdapter.jar
    svnjavahl.jar=${lib.dir}/svnjavahl.jar
You need to ensure the lib.dir value is valid depending on where you call this file from (in this example /cruisecontrol/project). As you will see, we make a wrapper script to grab the code from the repo before launching the project ant script. The wrapper script may be in /cruisecontrol/project, but defines its basedir as /cruisecontrol/checkout.

A sample script can be found here (Blogger doesn't want to display it). To use it, save it to /cruisecontrol/project and edit the sample_project and subversion paths. Your project will be checked out to /cruisecontrol/checkout, where it is built, tested, compiled etc. The first time, I had to check out manually, otherwise CruiseControl would kick up a fuss.
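The linked sample isn't reproduced here, but a wrapper along these lines gives the idea. This is a hypothetical sketch, not the author's actual script: the repository URL and target names are invented, and it assumes SvnAnt 1.0's standard typedef resource name:

```xml
<project name="sample_project_wrapper" default="build" basedir="../checkout">
  <!-- svn-build.props defines the SvnAnt jar locations (see above) -->
  <property file="../project/svn-build.props"/>

  <path id="svnant.classpath">
    <pathelement location="${svnant.jar}"/>
    <pathelement location="${svnClientAdapter.jar}"/>
    <pathelement location="${svnjavahl.jar}"/>
  </path>
  <typedef resource="org/tigris/subversion/svnant/svnantlib.xml"
           classpathref="svnant.classpath"/>

  <target name="build">
    <!-- grab a fresh copy of the code, then hand over to the project's own build -->
    <svn>
      <checkout url="http://svn.example.org/repos/sample_project/trunk"
                destPath="sample_project"/>
    </svn>
    <ant antfile="sample_project/build.xml"/>
  </target>
</project>
```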

In your main config.xml call the new wrapper script (/cruisecontrol/project/sample_project.xml) in the schedule section. This way a fresh copy of the code is checked out before CruiseControl commences the build.
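For reference, the project entry in config.xml might look something like this. The project name and the 300-second polling interval are assumptions:

```xml
<project name="sample_project">
  <schedule interval="300">
    <!-- run the wrapper script, which checks out and then builds -->
    <ant anthome="apache-ant-1.7.0"
         buildfile="project/sample_project.xml"/>
  </schedule>
</project>
```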

## Thursday, July 3, 2008

### Start condor on boot with Mac OS X

Once you have condor running on your clients you will want it to load by default when booting. The condor distribution includes linux-based startup scripts; however, there are none for the Mac. Looking through the mailing list there is a suggestion of scripts to use, but they use Panther (10.3) based technologies, which are not recommended in Tiger (10.4) and not available in Leopard (10.5).

Delving a bit further I found another way to start condor by using cron.

Create a script to start condor: sudo vim /usr/sbin/start_condor

Enter these contents, and customise to your installation:
    #!/bin/bash
    # Ensure network is all setup
    sleep 100
    # Ensure condor environment is loaded
    source /opt/condor/condor.sh
    # Start condor
    /opt/condor/sbin/condor_master
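The script also needs to be executable before cron can run it (path as above):

```shell
sudo chmod +x /usr/sbin/start_condor
```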

Our condor installation is actually stored on an NFS drive, so the 100 second sleep is to ensure the NFS drives have mounted before the rest of the script runs. I handle the path settings for $CONDOR_CONFIG, $PATH & $MANPATH in a separate script (condor.sh); alternatively you could specify $CONDOR_CONFIG in this script.
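As an illustration, a minimal condor.sh might look like the following. The paths are assumptions based on the /opt/condor installation used above, so adjust them to your site:

```shell
# Hypothetical /opt/condor/condor.sh -- sourced by the startup script.
# Tells the condor tools where the config lives and puts them on the PATH.
export CONDOR_CONFIG=/opt/condor/etc/condor_config
export PATH=/opt/condor/bin:/opt/condor/sbin:$PATH
export MANPATH=/opt/condor/man:$MANPATH
```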

Tell cron about your script, and that it should be run on boot:

    echo "@reboot root /usr/sbin/start_condor" | sudo tee -a /etc/crontab

(A plain sudo echo ... >> /etc/crontab won't work, because the redirection happens in your own, unprivileged shell.)
I have the condor daemons run as root, hence the root user mentioned in this crontab entry.

Test the script by running it directly from the command line first; if it runs, then you shouldn't have trouble when rebooting.

## Monday, June 30, 2008

### Pretty URL for MediaWiki

MediaWiki is a very popular open source platform for wikis, the most notable user (and actual developer) being the famous Wikipedia.

As I have previously mentioned, we run a MediaWiki-based wiki within my research group. As a purely aesthetic touch I chose to use a pretty URL. This essentially means the URL I use to access the wiki is shorter:

    www.domain.com/wiki/Chemistry

rather than:

    www.domain.com/wiki/index.php/Chemistry
It turns out it is quite fiddly to get the setup sorted, but the end result is definitely preferable.

N.B. These instructions only work when using domain.com/wiki/Chemistry. If you use wiki.domain.com/Chemistry that is more tricky, and domain.com/Chemistry is not recommended; see here for info on those scenarios. In addition, I assume you have root access to your webserver, that you are using MediaWiki 1.9.x or later, and that you are running an Apache webserver.

In this example I assume your DocumentRoot for html files is /srv/htdocs. Move your unpacked mediawiki files to /srv/htdocs/w. We will use apache to make wiki valid.

Edit your httpd.conf to enable /srv/htdocs to read .htaccess files: alter AllowOverride None to AllowOverride FileInfo.
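In context, the relevant httpd.conf stanza might look like this (the Directory path follows this example's DocumentRoot):

```
<Directory /srv/htdocs>
    AllowOverride FileInfo
</Directory>
```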

Now create the file /srv/htdocs/.htaccess, with the contents:

    RewriteEngine On
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    # Make the wiki appear to come from wiki/
    RewriteRule ^wiki/?(.*)$ /w/index.php?title=$1 [L,QSA]
In your LocalSettings.php set:

    $wgScriptPath  = "/w";
    $wgArticlePath = "/wiki/$1";

Now when you access domain.com/wiki/Chemistry, Apache will actually request domain.com/w/index.php?title=Chemistry, although you will not see this in the address bar unless you edit a page. If you edit a page but it wants to edit a page called Index.php, something has gone wrong! Do check here for other information; perhaps the details there are sufficiently up to date to get this working - they were not when I originally tried. There is one restriction with this method: article names must not include a question mark (ampersands, &, are fine).

## Friday, June 27, 2008

### Deploy your software with IzPack

In terms of free software packagers there are two contenders for me: NSIS and IzPack. NSIS is restricted to Windows, so on its own isn't suitable for my needs. IzPack is Java-based, so instantly suitable for every major platform. Granted, it is ideal for Java software, but it is not at all restricted to it. Obviously you need a Java runtime environment (JRE) installed to run it, but that is common enough nowadays. A run down of the features:

• Open source
• Cross-platform
• Fully customisable
• Native integration (shortcuts for Windows and Linux)
• Ant integration
• Uninstaller
• Unattended mode
• User input
• Translations

Just like ant, all your settings are stored in an XML file and parsed to create your customised package. One of the advantages of IzPack being Java-based is that you can add it directly to our continuous integration environment, using the Ant integration. That of course runs using CruiseControl; as the final step after compiling and testing our code (courtesy of JUnit), IzPack can step in to package my software. So from each build I can take away a ready-to-deploy package.

## Thursday, June 26, 2008

### Subversion integration with Eclipse

The latest version of Eclipse, Ganymede, has just been released.
I'm a keen Eclipse user; I like having all my programming needs met by one application (mainly thanks to the many language plugins available: Perl, Shell, LaTeX etc.). I'm not going to detail why you might want to upgrade to Ganymede, as there is plenty on that already. All I will say is that, as with previous releases, only some (in this case 23) of the ~90 projects that make up Eclipse are actually releasing new milestones in this release, so your favourite subproject may not be updated at the current time.

For me the most important plugin is subversion (although, personally, I feel this should be built in like CVS support). Previously I have used Subclipse, but I thought I'd try Subversive this time around. Installation is fairly straightforward:

1. Open Software Update
2. Select the Ganymede update site
3. From Collaborative Tools pick SVN Team Provider

This doesn't include an SVN connector, which is a show stopper!

1. Add http://www.polarion.org/projects/subversive/download/eclipse/2.0/update-site/ as a new remote site in Software Update
2. For future reference, get the latest update site from http://www.eclipse.org/subversive/downloads.php
3. Install SVNKit 1.x Implementation, or 2.x if you want to try the beta.

Now you should be good to go. Select SVN from the New Project Wizard, and explore repositories from the SVN Repository Exploring perspective (Window > Open Perspective > Other...).

Subversion users should also note that version 1.5 has also recently been released, and accordingly you will want your clients to run this version. If Subversive isn't for you, Subclipse works fine in Ganymede too.

## Friday, June 20, 2008

### Condor on openSUSE 11.0

Having just installed my first openSUSE 11.0 (32-bit) machine, I've thrown condor (7.1.0 binary: RH9, x86, dynamic) on to see if it runs fine. As with openSUSE 10.x you need to install compat-libstdc++ first. As root, run:

    zypper in compat-libstdc++
Update: Although it installed, it doesn't behave correctly when running jobs, producing lots of shadow exceptions.

## Thursday, June 19, 2008

### Setup network install for openSUSE 11.0

We run openSUSE on all our linux machines. Therefore, the quickest and easiest way to install on lots of machines and ensure quick access to updates is to maintain a local copy of the core repositories onsite. As we have the entire installation repository we only need the network install discs, which contain a setup program; you then select whatever packages you want from the repository.

First, download the repositories. There are 3 core repositories you need:

1. Installation
2. 3rd party add-on software
3. Updates

Find a local mirror and use wget to grab the repositories. The mirror I use in the UK has rsync support, which is very handy for keeping the repositories up to date by only downloading content that has altered:

    rsync -Pvptrl --delete rsync://rsync.mirrorservice.org/sites/ftp.opensuse.org/pub/opensuse/distribution/11.0/repo/oss/ /www/suse/SUSE11.0-INSTALL/
    rsync -Pvptrl --delete rsync://rsync.mirrorservice.org/sites/ftp.opensuse.org/pub/opensuse/distribution/11.0/repo/non-oss/ /www/suse/SUSE11.0-ADDON/
    rsync -Pvptrl --delete rsync://rsync.mirrorservice.org/sites/ftp.opensuse.org/pub/opensuse/update/11.0/ /www/suse/SUSE11.0-UPDATE/

The installation and addon repositories are static, but updates will need updating. Let cron take care of that for you by running the command above once a day. Next grab the network install CDs, http://download.opensuse.org/distribution/11.0/iso/cd (don't forget to use a mirror). We make the repositories available via a local apache web server. So when running the setup program just point to the IP and folder on the web server. During the installation add the addon and update repos, and hey presto: fast install with up to date repositories!
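The daily update run can be handled with a crontab entry along these lines; the 3am schedule is an assumption, and the rsync command is the update one from above:

```
# /etc/crontab -- refresh the update repository every night at 3am
0 3 * * * root rsync -Pvptrl --delete rsync://rsync.mirrorservice.org/sites/ftp.opensuse.org/pub/opensuse/update/11.0/ /www/suse/SUSE11.0-UPDATE/
```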
## Wednesday, June 18, 2008

### Compile condor 7.1.0 on openSUSE 10.3

Condor has traditionally only been available as binaries; most of the time this is fine. We successfully run the Mac OS X PowerPC & Intel binaries, and for openSUSE 10.x we use the Red Hat 9/dynamic binaries. However, for 64-bit openSUSE the binary (RH5) doesn't really work; it struggles to start up. Since the release of condor 7.x they have included the source, so that seemed a sensible avenue to explore. It does compile, but it is a bit more involved than just configure; make; make install! These instructions should hold true for 7.0.0, 7.0.1 & 7.0.2 as well.

Grab the source for the latest stable or development release (I personally go with the development release). First off, glance through the README and check you have all the prerequisites. I also found I needed to add termcap, terminfo, ncurses-devel and flex (grab them from yast). Now let's start configuring:

    cd src
    ./build_init
    ./configure --disable-gcc-version-check --disable-full-port --without-classads

The configure flags mean:

1. Our gcc version is newer than the built-in checks.
2. No standard universe/checkpointing. It is standard for a new OS port not to have these.
3. ClassAds not yet supported (no condor_q -better-analyze).

You will get an error:

    configure: error: Condor does NOT know what glibc external to use with glibc-2.6.1

To get around this, edit configure.ac with your favourite editor. Around line 2500 add this option to the case statement:

    "2.6.1" )
        # openSUSE 10.3
        including_glibc_ext=NO
        ;;

Now run build_init and configure again as above. If that runs you can then execute make. Sit back, as this may take a while! If there is a problem, typically in the externals part, view the log that will be indicated. You may need to install something (such as the packages I mentioned earlier). Compilation is finished when something like this pops up (and no obvious errors):

    make[1]: Nothing to be done for 'all'.
    make[1]: Leaving directory '/home/build/condor-7.1.0/src/condor_examples'

Everything is compiled, so now prepare the release:

    make release   (output to release_dir, dynamically linked with debugging, ready for testing)
    make public    (output to ../public, adds stripped dynamic/static linked binaries and no debugging)

Find your final installation bundle in ../public as condor-7.1.0---dynamic.tar.gz. Unpack and use it as you normally would. condor_version will reveal your custom compile:

    $CondorVersion: 7.1.0 May 21 2008 CondorPlatform: X86_64-LINUX_SuSE_UNKNOWN $

Need help/advice? There is an excellent presentation which covers this, as well as the users mailing list, which is both active and helpful.

Hopefully in the future condor will have better support for openSUSE so that a full port (standard universe/checkpointing) and ClassAds will be available.

Good luck!

## Tuesday, June 17, 2008

### Keep up with your literature

A colleague of mine recently blogged on useful websites and applications for keeping up with new journal articles and reference managers: How I keep track of scientific literature...

The popular social bookmarking service del.icio.us gets a mention. It has been under discussion on the CHMINF-L mailing list recently where Egon Willighagen recommends using Faviki (currently in beta). Similar idea, but using tags from wikipedia as well. I wonder if this will be the next big social bookmarker?

Image courtesy of Papers.

### openSUSE 11.0 this week

The new version of openSUSE lands this week. It looks like it will be an excellent release building on the successful 10.2 and 10.3 releases.

Find out more: sneak peeks, screenshots and the wiki.

## Monday, June 16, 2008

### Condor for number crunching

We have a need for HPC within our group (quantum chemistry calculations, machine learning, molecular dynamics simulations & analysis, etc.). To fulfil this need we have several SGE-based clusters within our department and the university. Our local clusters were in need of a refresh (multiple dated OS's - Red Hat 7!) and ideally needed to be unified somehow. It became tiresome having multiple clusters to pick from. Which one has the most free slots, or the shortest queue? If one is full you would have to move all your data and get set up to run on a different cluster. We needed something to maximise our use of the compute nodes but simplify the submission process to avoid wasting time.

We opted for a more grid-based solution: condor. The reasons for this were:
• All our local clusters are now combined into one condor pool.
• It removes the need for multiple head nodes, as users can submit direct from their desktops.
• Cross-platform so you can use with Windows, Linux & Mac.
• Grid approach means we take advantage of our desktop computers as well.
We still use the university's central SGE cluster, it is an invaluable resource. However, condor allows us to make the most of our local resources which are exclusively for our use.

Find out more about condor here: http://www.cs.wisc.edu/condor/. The annual Condor Week now has videos of some of the tutorials (as well as slides), so check out what it is all about.

Image courtesy of Wikipedia.

## Friday, June 13, 2008

### Post one

It was recently recommended that I start my own blog, so here it is! I imagine it will be fairly technical, detailing the adventures I've had as sys admin within my research group at Nottingham. Alas, everything currently resides on our intranet-based wiki. Rest assured I'm not going to cut and paste the entire wiki here. I plan to share some of the solutions to problems I've had and other titbits of interest. Who knows where it might lead, certainly a novel venture for me to try.

The over-riding theme will be in silico chemistry, but that covers quite a lot!