Monday, November 10, 2014

Tag your AWS resources in StarCluster

StarCluster is a convenient way to manage HPC clusters on Amazon Web Services (AWS). If you use the same AWS account for multiple tasks, such as hosting your web servers and their associated databases as well as running StarCluster, how can you tell how much of your bill is from StarCluster? AWS lets you tag your detailed billing information, which can be used to get a feel* for how much you spent on a specific tag. You need to enable detailed billing and pick your allocation tags; start by reading the Billing docs.

There is another reason to tag your resources in StarCluster: IAM resource-level permissions can then be used to restrict access, e.g. an IAM StarCluster user can only start/stop StarCluster, not terminate every EC2 instance in your account. This is a topic for another blog post, but it requires tagging to help restrict permissions.

There are three steps to enable tagging within StarCluster:
  1. Save a custom plugin
  2. Configure the plugin in your config
  3. Add the plugin to your cluster stanza
First, save this custom plugin, which we call tagger, to $HOME/.starcluster/plugins/tagger.py
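
As a rough illustration of the core logic a tagging plugin of this kind might contain (this is a sketch, not the tagger.py referred to above), the following parses a comma-separated tag string and applies each tag to a list of resources. The names parse_tags, apply_tags and RecordingResource are hypothetical, and the key=value string format is an assumption. The sketch also assumes each resource exposes an add_tag(key, value) method; in a real StarCluster plugin this logic would typically be called from the run method of a ClusterSetup subclass.

```python
def parse_tags(spec):
    """Parse 'key1=val1, key2=val2' into a dict of tag keys and values."""
    tags = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas
        key, sep, value = item.partition("=")
        key = key.strip()
        if not sep or not key:
            raise ValueError("expected key=value, got %r" % item)
        tags[key] = value.strip()
    return tags


class RecordingResource(object):
    """Hypothetical stand-in for an EC2 instance or volume object."""

    def __init__(self, name):
        self.name = name
        self.tags = {}

    def add_tag(self, key, value):
        # A real resource would make an AWS API call here (assumption).
        self.tags[key] = value


def apply_tags(resources, tags):
    """Apply every tag to every resource; return the number of tags written."""
    written = 0
    for resource in resources:
        for key, value in sorted(tags.items()):
            resource.add_tag(key, value)
            written += 1
    return written


if __name__ == "__main__":
    tags = parse_tags("project=starcluster, owner=hpc-team")
    nodes = [RecordingResource("master"), RecordingResource("node001")]
    print(apply_tags(nodes, tags))
    print(nodes[0].tags)
```

Keeping the parsing separate from the tagging makes it easy to check the tag string before any AWS call is made.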


Second, at the bottom of your StarCluster configuration (typically $HOME/.starcluster/config) define the tags you want to attach. Take care in picking your tags; you want everyone using the same tag keys and values.


Finally, get your cluster to call the new plugin, by editing your cluster stanza:

[cluster smallcluster]
# Various other settings
...
# Enable the following plugins
PLUGINS = tagger

Now when you launch your next cluster you will see a few extra lines in the output confirming tagger has run. You can then check in the AWS Management Console that the EC2 instances and EBS volumes are tagged.

*It is not possible to tag every resource, and some resources are shared, so how do you split the cost? The actual cost will likely be higher than the total for the resources you tag.

Sunday, September 14, 2014

DjangoCon 2014: Top tips for developing and deploying on AWS

I was at my third DjangoCon earlier this month. As with the previous DjangoCons it was a lot of fun. This time I was speaking, a first for me at a tech conference; traditionally I've spoken at science conferences. My talk was entitled "Top tips for developing and deploying on AWS", which admittedly has more to do with AWS than Django. However, deploying your shiny new site is just as important as writing it; it's a rather poor website if no one can view it! With the rise of DevOps there has been a push for developers to deploy as well. You can read more on Full Stack Python, an excellent resource maintained by Matt Makai.

Unlike my previous talks, this one was recorded. I'll be honest, I was dreading seeing the recording (as I don't like seeing/hearing myself on video). However, I was pleasantly surprised and the talk actually sounded much better than I thought it went! Most importantly for me, it will be useful for improving my presentation technique, and with video evidence I can track my progress (hopefully anyway).

You can review the slides and video below - I hope you find them useful.




Sunday, December 15, 2013

Install NumPy and SciPy without Fortran

NumPy and SciPy are two great Python packages for scientists, as is the popular Matplotlib. However, installing NumPy and SciPy is not for the faint-hearted if you install your Python packages via pip. Even assuming you have Fortran, BLAS, LAPACK and ATLAS already installed, it is quite a slow installation, especially for SciPy. NumPy took 46 seconds to install, whereas SciPy took 6 minutes and 50 seconds on my MacBook Pro. So what if you install once and forget? There are two problems with that. First, I use mktmpenv when debugging issues. Second, I also use tox to test against multiple versions of Python and/or Django. All of a sudden 6 build configurations is 41 minutes of SciPy compilation!

Let's not forget Windows users. A Fortran compiler? I don't think so, and they should be able to enjoy pip and virtualenv as much as any Python developer.

The obvious solution is for SciPy to be packaged as a wheel, the new Python binary distribution format. I appreciate that would be very hard for the authors, but hopefully one day.

In the meantime Anaconda might be of interest. Its package manager, conda, is like apt-get/yum for scientific Python, and a new feature has just been announced: you can pip install conda itself, then take advantage of the binary distributions it provides.
So try this (assuming you have pip, virtualenv and virtualenvwrapper installed):

$ mktmpenv
$ pip install conda
$ conda init
$ conda install scipy

SciPy plus NumPy and numerous dependencies are installed in under a minute! Obviously, you cannot convert this to a requirements.txt per se, but using Fabric you can make a task to install conda and then the conda packages, all with a one-liner.
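
A minimal sketch of such a task follows. It mirrors the three commands above and uses only the standard library so that it stays self-contained. The function names conda_bootstrap_commands and bootstrap are hypothetical, and the --yes flag is added on the assumption that the task should run without a confirmation prompt. In a fabfile the same commands could be passed to Fabric's local() or run() instead of subprocess.

```python
import subprocess


def conda_bootstrap_commands(packages):
    """Return the commands (as argv lists) to install conda, then the packages."""
    if not packages:
        raise ValueError("at least one conda package is required")
    return [
        ["pip", "install", "conda"],
        ["conda", "init"],
        # --yes skips the confirmation prompt so the task can run unattended
        ["conda", "install", "--yes"] + list(packages),
    ]


def bootstrap(packages, runner=subprocess.check_call):
    """Run each command in order; the default runner raises on the first failure."""
    for command in conda_bootstrap_commands(packages):
        runner(command)
```

Calling bootstrap(["scipy"]) inside a fresh virtualenv would run the whole sequence. Passing a different runner (for example a list's append method) lets you inspect the commands without executing them.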



Tuesday, November 19, 2013

New comments, but not like YouTube

I have switched the comments on this blog to Disqus (pronounced "discuss" as their engineers have corrected me on numerous occasions). I have nothing against the former Blogger comments or more recent Google+ comments, but Disqus is leagues ahead as a commenting platform. Plus, it is built on some of my favorite technologies: Python & Django.


Image credit: http://flic.kr/p/7oqZs2

Monday, November 18, 2013

GDG DevFest now includes Albuquerque

Courtesy of http://www.gdgabq.com/
Albuquerque just had its first GDG DevFest (translation - Google Developer Group meetup), where a selection of Google employees and enthusiasts met to share their experiences of and insight into a few Google products (let's be honest, there are quite a few now). Google Glass was out in force: six pairs, which is presumed to be the greatest concentration of them in New Mexico! I attended the following talks:

Google Drive Realtime API

The challenge for Drive is collaboration. Everything is stored as structured data - JSON. Because the data is structured, mutations can be created, each of which records a specific change rather than the data itself. Mutations are kept forever until the file is deleted. In fact the mutations make up the file; a snapshot summarizes the changes to give the current file, which saves having to process all the mutations, which could be numerous. Mutations are saved on both the server and the client (your browser), otherwise collisions can easily occur when collaborating. Having the transformation manager on the client allows incoming mutations to be reconciled with your local mutations, which are then pushed back to the server and on to the other collaborators. This technique is also how you can work offline, as the mutations are stored and reconciled later.
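
The relationship between a mutation log and a snapshot can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Realtime API: the ("set"/"delete", key, value) mutation format and the names apply_mutation and replay are hypothetical, and the reconciliation of concurrent mutations is not covered.

```python
def apply_mutation(state, mutation):
    """Return a new dict with a single mutation applied."""
    op, key, value = mutation
    new_state = dict(state)
    if op == "set":
        new_state[key] = value
    elif op == "delete":
        new_state.pop(key, None)
    else:
        raise ValueError("unknown op: %r" % (op,))
    return new_state


def replay(mutations, snapshot=None):
    """Fold mutations over a starting snapshot (empty if none is given)."""
    state = dict(snapshot or {})
    for mutation in mutations:
        state = apply_mutation(state, mutation)
    return state


# The file is the full list of mutations, in order.
mutation_log = [
    ("set", "title", "Draft"),
    ("set", "body", "Hello"),
    ("set", "title", "Final"),
    ("delete", "body", None),
    ("set", "author", "me"),
]

# A snapshot summarizes the first three mutations...
snapshot = replay(mutation_log[:3])
# ...so only the later ones need replaying to reach the current file.
current = replay(mutation_log[3:], snapshot)
```

Replaying the tail of the log on top of the snapshot gives the same result as replaying every mutation from the start, which is the point of keeping snapshots.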

Welcome to Android

I have never done any Android programming; suffice to say it looks like a regular Java project with a touch of HTML (e.g. storing multiple resolutions of your images etc). Interestingly, the IDE of choice is shifting from Eclipse with a plug-in to Android Studio. This is based on JetBrains IntelliJ - perhaps the gold standard of Java IDEs (in cost as well), but free for Android developers.

ChromeCast - In Love

The instantly successful ChromeCast (nothing to do with that free Netflix subscription I'm sure) now has an API available. The ChromeCast is a receiver which runs web pages, and you build the sender in your app (desktop, mobile, browser etc) to establish a connection with the ChromeCast. It looks surprisingly simple (famous last words).

AngularJS - Life changing tech

AngularJS is Google's latest JavaScript library, and it is receiving much love both within and outside the company. While many Google products are built on Closure, it seems new sites are being built with AngularJS. AngularJS lets you extend the HTML vocabulary and is a powerful MVC aid.

HTML5 in the Movies

Since ABQ is now the center of the film universe, directors look to local talent to fill various roles, which now include computer props. We saw numerous easy-to-use demos (single click for the actors) that make it appear as though the actors are typing emails, sending them and receiving replies, all with a few random clicks. No surprise that making the demos foolproof and loop-able was highly desirable, as was avoiding Windows for stability.

Startup Weekend Panel

For the final session we got a taste of what the startup weekend events are like. A surprising number of these events are cropping up around the state, not just in ABQ. We were given two disconnected words and had 10 minutes to come up with a business and a 1 minute pitch, certainly a good ice breaker!

Summary

Overall it was an enjoyable day. My thanks to the GDG ABQ for taking the time to organize it and to the sponsors (who doesn't want to see a 3D printer in action?) for making it possible. I think almost everyone left with a prize as well; I won an O'Reilly book in the raffle (worth more than the cost of the ticket). I was really impressed with the ChromeCast API and think I have found my next Hack Day project for work.




Wednesday, September 25, 2013

My first WordCamp

ABQ WordCamp is not the sort of conference I normally attend (PyCon, DjangoCon and AWS re:Invent are my regular haunts). Apart from WordPress being written in PHP, its community is more diverse, as you can use and develop it out of the box; with Django you need to build your site first before it gets a more CMS feel.

We have adopted WordPress for our company blog, so it seemed sensible to meet some WordPress folk. I followed the user/publisher track even though I'm more of a developer, so it was refreshing to get more of an SEO and social media slant. Some top tips:

  • WordPress handles most of your SEO for you
  • Building your blog is an active role: find like-minded individuals/communities and get involved (comment on their blogs)
  • Comments are hugely important for an active and engaging blog
  • Disqus is the best commenting platform (plus it is powered by Django)
  • If traffic spikes for a particular post then repeat/stick with that topic
  • Introduce series for these popular topics
  • Be smart on social media, there are numerous WordPress plug-ins to make sharing easy
  • Google+ is the single best thing for SEO, even if you don't use it, post to it for instant indexing
  • bit.ly is perhaps the only URL shortener that is indexed by Google
  • Getting to page one on Google does require time and money (AdWords, potentially) via a trial and error approach, pick keywords, evaluate, update then repeat repeat repeat
  • You must have frequent new content (easier said than done) to get high search rankings
During the Developer Diversity Panel a common theme in tech popped up, that of women. Having heard similar talks at PyCon, I can say WordPress is indeed well ahead of the curve in terms of the number of women present, which is great to see.

Yesterday I found out that Google are running DevFest ABQ, so maybe I'll be attending more conferences in ABQ in the future.


Sunday, May 5, 2013

Book chapter on the use of open source software in the pharmaceutical industry


During my final year at AstraZeneca I was asked to contribute a chapter to "Open source software in life science research: Practical solutions to common challenges in the pharmaceutical industry and beyond". Given that this work would be very hard to publish in JCIM, JMC etc, and that I had never written a book chapter before, it was the perfect opportunity. The chapter was entitled: Design Tracker: an easy to use and flexible hypothesis tracking system to aid project team working. It was coauthored by Martin Harrison, who wrote Design Tracker. The abstract sums up best what it covers:
Design Tracker is a hypothesis tracking system used across all sites and research areas in AstraZeneca by the global chemistry community. It is built on the LAMP (Linux, Apache, MySQL, PHP/ Python) software stack, which started as a single server and has now progressed to a six-server cluster running cutting-edge high availability software and hardware. This chapter describes how a local tool was developed into a global production system.
Design Tracker has been mentioned in a few external presentations before, but I believe these are the first firm details about it. We talk about its use and how it grew from a prototype at one site into a global chemistry tool. As the book topic suggests, we also cover the open source technologies we used to power it. While LAMP is not new, it is not exactly mainstream in the corporate environment for many pharmaceutical companies. We had to harden our setup to make it suitable for 24/7 use, so in addition to the regulars on LAMP we added Red Hat Cluster Suite, Continuent Tungsten and NGINX. We also took the opportunity to move away from apache/mod_python to apache/mod_wsgi. The end result was a service which is available 24/7 and future-proofed compared to our previous solution.

The world's most dynamic and frequently visited websites are powered by similar technologies, so they have clearly proven themselves suited to the relatively modest needs of a single pharmaceutical company.

The book is available on Amazon UK & US and probably your favourite book reseller as well (ISBN: 978-1907568978). I hope you enjoy our chapter and the many other interesting topics covered.