Introductory Discussion

This document provides detailed instructions for deploying a Galaxy server and developing new tools for it. There are several ways to perform these tasks, and the options are discussed here.

Because Galaxy hosts Web services on TCP/IP ports that may be firewalled, one criterion is that the user running a Galaxy server must be able to expose those ports to the outside world without the involvement of a third party. For regular users, at least, this precludes installing Galaxy at most HPC centers and even on the resources provided by most campuses. Cloud computing provides a way around this limitation, and this document focuses on setting up a Galaxy server in a cloud environment.
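Whether a given port is actually reachable can be checked with a few lines of Python. The following is only a minimal sketch: the function name `port_is_open`, the host name `galaxy.example.org`, and the port number 8080 are illustrative assumptions, not values taken from this document. Substitute the address and port of your own server.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds.

    A refused connection, a timeout, or a name-resolution failure all
    count as "not reachable" and yield False.
    """
    try:
        # create_connection performs the TCP handshake; the "with" block
        # closes the socket again as soon as the check is done.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical host and port, for illustration only.
    host, port = "galaxy.example.org", 8080
    state = "reachable" if port_is_open(host, port) else "not reachable"
    print(f"{host}:{port} is {state}")
```

To be meaningful, the check should be run from a machine outside the network that hosts the server. A successful connection from the server itself says nothing about an intervening firewall.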

Many different cloud providers exist. As Galaxy is a Unix-centric tool, cloud providers which cater to Windows, such as Microsoft Azure, can effectively be ruled out. Within the Linux world, Amazon EC2 (Elastic Compute Cloud) is currently the most popular cloud provider. Thus, we will examine the options available for using Amazon as the cloud provider.

The Galaxy development team provides AMIs (Amazon Machine Images) which come with a cloud-enabled Galaxy already installed. A tool called CloudMan is integrated into these AMIs and allows a Galaxy administrator to allocate clusters of Galaxy workers, monitor these clusters, and dynamically grow and shrink them. While this functionality is fairly powerful, especially the dynamic resizing, the approach does have some drawbacks:

  • AMI configuration and initialization occur via Amazon’s EC2 web console. There are no command-line tools available for these tasks.
  • Cluster management occurs via the CloudMan web interface. Command-line tools may not be available for this.
  • The administrator must supply his or her EC2 credentials via metadata associated with the initial virtual machine (VM) instance of the AMI. While these credentials are transmitted securely, storing them as part of the machine metadata is still a dubious security practice. For example, it may give a dishonest Amazon employee access to information which he or she would not otherwise be able to view. (The metadata fields are clear text; none of your credentials are stored in hashed or encrypted form.)

The STAR (Software Tools for Academics and Researchers) program at MIT provides a wonderful command-line tool called StarCluster. This tool has a number of subcommands, which can be used to create, manage, log in to, stop, and destroy clusters of one or more VM instances on EC2. Although StarCluster does not natively support Galaxy (yet), its value as an extremely convenient, general-purpose EC2 management tool cannot be denied. A more detailed examination of this tool will follow.

If you are interested in maintaining a cluster of Galaxy workers in the long term, then CloudMan may be the better option for you. Of course, in such a case, you may wish to provide some dedicated hardware for this purpose and avoid the pay-as-you-go scheme associated with cloud computing. This document, however, focuses on running a Galaxy server for the purpose of developing tools. For this purpose, MIT StarCluster is an eminently useful basis, and the more detailed discussion will proceed along those lines.

As a final note, it should be mentioned that the Galaxy developers do provide a public-facing development system, where you can upload and test your own tools. While this has the convenience of being ready-made, the rather open permissions model leaves a lot to be desired: rogue tools can be uploaded to this public-facing system. Developers are also at the mercy of the maintenance schedule for this system, which may prove disruptive to development. Finally, development and testing are not as efficient as they would be with behind-the-scenes access to one's own Galaxy server, as we will see below.
