Grabbing The Source Code
The Apache source repository for Whirr holds the latest and greatest source code.
We will use the code from trunk, which will give you the very latest version of the software.
Even though version 0.2.0 is currently the most recent stable release you can download from the official Whirr site, 0.3.0 is already available in the repository under the tagged releases, and trunk is at least as new. So let’s download the source for that. Version 0.3.0 will soon be made the official release, but by following this step-by-step guide you will be ahead of the pack.
cd ~/src
svn co http://svn.apache.org/repos/asf/incubator/whirr/trunk whirr-trunk
cd whirr-trunk
Make Sure You Have Those Dependencies
cat BUILD.txt
Looking at the dependencies in BUILD.txt, you can see there is a few things we need.
Apache Whirr Build Instructions
REQUIREMENTS
- Java 1.6
- Apache Maven 2.2.1 or greater
- Ruby 1.8.7 or greater (to run build-tools/update-versions)

BUILDING
To run unit tests and install artifacts locally:
mvn clean install
To build a source package:
mvn package -Ppackage
If you have followed through one of my previous posts, then you will likely have these dependencies installed already, but I will review them just in case.
Being on Mac OS X, I generally use Homebrew to install my dependencies. The following will install Maven on Mac OS X if you have Homebrew installed.
sudo brew install maven
If you are on a Debian-based Linux (e.g. Ubuntu), use apt-get to install Maven.
sudo apt-get install maven2
There are also instructions available for installing Maven on Windows and other platforms.
If you are using the latest Mac OS X then you should have Ruby 1.8.7 and Java 1.6 installed. If you have to upgrade your Ruby, then I recommend checking out Ruby Version Manager (RVM).
Here’s how to check your versions and what I’m currently running…
# Ruby
ruby -v
ruby 1.9.2p94 (2010-12-08 revision 30140) [x86_64-darwin10.5.0]

# Java
java -version
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)

# Maven
mvn -v
Apache Maven 2.2.1 (r801777; 2009-08-06 12:16:01-0700)
Java version: 1.6.0_22
Java home: /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
Default locale: en_US, platform encoding: MacRoman
OS name: "mac os x" version: "10.6.6" arch: "x86_64" Family: "mac"
Building Whirr
Let’s build Whirr 0.3.0!
Run the following command from inside the source directory.
cd ~/src/whirr-trunk
mvn clean install
The first time I ran this, it took 17 minutes, most of which was spent downloading dependencies. It then failed to download one of those dependencies, so the build failed.
[WARNING] Unable to get resource 'org.apache.hadoop:hadoop-core:jar:0.20.2' from repository central (http://repo1.maven.org/maven2): GET request of: org/apache/hadoop/hadoop-core/0.20.2/hadoop-core-0.20.2.jar from central failed
[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] Failed to resolve artifact.

Missing:
----------
1) org.apache.hadoop:hadoop-core:jar:0.20.2
The path to this dependency was fine, so it must have been one of those unfortunate glitches in the workings of the Internet. The chances of this happening to you are slim, but please let me know if you have a similar experience.
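If a plain re-run does not do the trick, you can also ask Maven to re-check the remote repositories rather than trust its cached result; the -U (--update-snapshots) flag forces that check. This is just a fallback suggestion, not something I needed here:

# force Maven to re-check remote repositories for the missing artifact
mvn -U clean install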
I ran “mvn clean install” once more.
mvn clean install
This time it only took 2 minutes and built successfully.
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] ------------------------------------------------------------------------
[INFO] Whirr ................................................. SUCCESS [6.779s]
[INFO] Apache Whirr Build Tools .............................. SUCCESS [2.017s]
[INFO] Apache Whirr Core ..................................... SUCCESS [11.104s]
[INFO] Apache Whirr Cassandra ................................ SUCCESS [3.332s]
[INFO] Apache Whirr Hadoop ................................... SUCCESS [9.983s]
[INFO] Apache Whirr ZooKeeper ................................ SUCCESS [7.584s]
[INFO] Apache Whirr HBase .................................... SUCCESS [57.760s]
[INFO] Apache Whirr CLI ...................................... SUCCESS [38.708s]
[INFO] Apache Whirr Hadoop ................................... SUCCESS [1.884s]
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2 minutes 20 seconds
[INFO] Finished at: Mon Jan 24 12:49:58 PST 2011
[INFO] Final Memory: 90M/123M
[INFO] ------------------------------------------------------------------------
Now that our code is compiled, we can run the final build command.
mvn package -Ppackage

(reduced output)

[INFO] Scanning for projects...
[INFO] Reactor build order:
[INFO] Whirr
[INFO] Apache Whirr Build Tools
[INFO] Apache Whirr Core
[INFO] Apache Whirr Cassandra
[INFO] Apache Whirr Hadoop
[INFO] Apache Whirr ZooKeeper
[INFO] Apache Whirr HBase
[INFO] Apache Whirr CLI
[INFO] Apache Whirr Hadoop
[INFO] ------------------------------------------------------------------------
[INFO] Building Whirr
[INFO] task-segment: [package]
[INFO] ------------------------------------------------------------------------
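Once the package profile finishes, the generated source archive ends up under one of the module target/ directories. The exact file name and location can vary between versions, so an easy (non-authoritative) way to locate it is:

# find the packaged archive(s) produced by the package profile
find . -name "whirr-*.tar.gz"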
Launch An HBase Cluster
Currently in trunk there is a recipe under the "recipes" directory called "hbase-ec2". We can copy that, although in this example we are not going to modify it.
cp recipes/hbase-ec2.properties .
There are many comments in there, so here is a summary.
whirr.cluster-name=hbase
whirr.instance-templates=1 zk+nn+jt+hbase-master,5 dn+tt+hbase-regionserver
whirr.provider=ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.hardware-id=c1.xlarge
whirr.image-id=us-east-1/ami-da0cf8b3
whirr.location-id=us-east-1
In the above, the line you will want to play with is "whirr.instance-templates", as this defines the shape and size of your cluster. Increasing the value "5" will give you a bigger cluster (see the example after the breakdown below).
We have a total of 6 machines here. 1 machine runs all the master services and 5 more machines run all the worker services. Here is a breakdown of the services running on those machines, as defined by whirr.instance-templates.
1 zk+nn+jt+hbase-master    | 1 = 1 instance of the following
                           | zk = ZooKeeper
                           | nn = Hadoop Name-Node
                           | jt = Hadoop Job-Tracker
                           | hbase-master = HBase Master
5 dn+tt+hbase-regionserver | 5 = 5 instances of the following
                           | dn = Hadoop Data-Node
                           | tt = Hadoop Task-Tracker
                           | hbase-regionserver = HBase Region-Server
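For example, if you wanted ten worker machines instead of five, the only change needed in your copy of the properties file would be the template line (the master entry stays the same). This is just an illustration, so size it to whatever you actually need:

whirr.instance-templates=1 zk+nn+jt+hbase-master,10 dn+tt+hbase-regionserver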
If you are interested in how all these Hadoop and HBase components work together, see Lars George’s excellent posts "HBase Architecture 101 – Storage" and "HBase Architecture 101 – Write-ahead-Log", or check out the HBase wiki.
In previous Whirr examples I have defined the Amazon EC2 credentials in this properties file, but the above will pick them up from the environment, which is a better way to go. Export your credentials into your environment (here I use dummy credentials as an example).
export AWS_ACCESS_KEY_ID=123456789ABCDEFGHIJKLM
export AWS_SECRET_ACCESS_KEY=ABCDabcd1234/xyzXZY54321acbd
We can now launch our cluster.
bin/whirr launch-cluster --config hbase-ec2.properties Bootstrapping cluster Configuring template Starting 1 node(s) with roles [jt, nn, zk, hbase-master] Configuring template Starting 5 node(s) with roles [tt, hbase-regionserver, dn] Nodes started: [[id=us-east-1/i-e134808d, providerId=i-e134808d, tag=hbase, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], userMetadata={}, state=RUNNING, privateAddresses=[10.112.205.48], publicAddresses=[184.72.159.249], hardware=[id=c1.xlarge, providerId=c1.xlarge, name=c1.xlarge, processors=[[cores=8.0, speed=2.5]], ram=7168, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sde, durable=false, isBootDevice=false]], supportsImage=is64Bit()]]] Nodes started: [[id=us-east-1/i-f734809b, providerId=i-f734809b, tag=hbase, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], userMetadata={}, state=RUNNING, privateAddresses=[10.194.127.79], publicAddresses=[50.16.154.13], hardware=[id=c1.xlarge, providerId=c1.xlarge, name=c1.xlarge, processors=[[cores=8.0, speed=2.5]], ram=7168, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sde, durable=false, isBootDevice=false]], supportsImage=is64Bit()]], [id=us-east-1/i-fb348097, providerId=i-fb348097, tag=hbase, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], userMetadata={}, state=RUNNING, privateAddresses=[10.98.33.250], publicAddresses=[50.16.71.166], hardware=[id=c1.xlarge, providerId=c1.xlarge, name=c1.xlarge, processors=[[cores=8.0, speed=2.5]], ram=7168, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sde, durable=false, isBootDevice=false]], supportsImage=is64Bit()]], [id=us-east-1/i-f5348099, providerId=i-f5348099, tag=hbase, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, 
imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], userMetadata={}, state=RUNNING, privateAddresses=[10.195.6.143], publicAddresses=[204.236.242.78], hardware=[id=c1.xlarge, providerId=c1.xlarge, name=c1.xlarge, processors=[[cores=8.0, speed=2.5]], ram=7168, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sde, durable=false, isBootDevice=false]], supportsImage=is64Bit()]], [id=us-east-1/i-f9348095, providerId=i-f9348095, tag=hbase, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], userMetadata={}, state=RUNNING, privateAddresses=[10.119.22.224], publicAddresses=[174.129.72.44], hardware=[id=c1.xlarge, providerId=c1.xlarge, name=c1.xlarge, processors=[[cores=8.0, speed=2.5]], ram=7168, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sde, durable=false, isBootDevice=false]], supportsImage=is64Bit()]], [id=us-east-1/i-f134809d, providerId=i-f134809d, tag=hbase, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], userMetadata={}, state=RUNNING, privateAddresses=[10.98.146.48], publicAddresses=[174.129.142.130], hardware=[id=c1.xlarge, providerId=c1.xlarge, name=c1.xlarge, processors=[[cores=8.0, speed=2.5]], ram=7168, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sde, durable=false, isBootDevice=false]], supportsImage=is64Bit()]]] Authorizing firewall Authorizing firewall Authorizing firewall Running configuration script Configuration script run completed Running configuration script Configuration script run completed Completed configuration of hbase Web UI available at http://ec2-184-72-159-249.compute-1.amazonaws.com Wrote Hadoop site file /Users/phil/.whirr/hbase/hadoop-site.xml Wrote Hadoop proxy script /Users/phil/.whirr/hbase/hadoop-proxy.sh Completed configuration of hbase Hosts: ec2-184-72-159-249.compute-1.amazonaws.com:2181 Completed configuration of hbase Web UI available at 
http://ec2-184-72-159-249.compute-1.amazonaws.com Wrote HBase site file /Users/phil/.whirr/hbase/hbase-site.xml Wrote HBase proxy script /Users/phil/.whirr/hbase/hbase-proxy.sh Wrote instances file /Users/phil/.whirr/hbase/instances Started cluster of 6 instances Cluster{instances=[Instance{roles=[tt, hbase-regionserver, dn], publicAddress=/204.236.242.78, privateAddress=/10.195.6.143, id=us-east-1/i-f5348099}, Instance{roles=[tt, hbase-regionserver, dn], publicAddress=/174.129.72.44, privateAddress=/10.119.22.224, id=us-east-1/i-f9348095}, Instance{roles=[tt, hbase-regionserver, dn], publicAddress=/50.16.71.166, privateAddress=/10.98.33.250, id=us-east-1/i-fb348097}, Instance{roles=[jt, nn, zk, hbase-master], publicAddress=ec2-184-72-159-249.compute-1.amazonaws.com/184.72.159.249, privateAddress=/10.112.205.48, id=us-east-1/i-e134808d}, Instance{roles=[tt, hbase-regionserver, dn], publicAddress=/174.129.142.130, privateAddress=/10.98.146.48, id=us-east-1/i-f134809d}, Instance{roles=[tt, hbase-regionserver, dn], publicAddress=/50.16.154.13, privateAddress=/10.194.127.79, id=us-east-1/i-f734809b}], configuration={hbase.zookeeper.quorum=ec2-184-72-159-249.compute-1.amazonaws.com:2181}}
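The launch output above also points at two handy files Whirr wrote locally: hadoop-site.xml and hadoop-proxy.sh under ~/.whirr/hbase. As a rough sketch (assuming you have a local Hadoop 0.20.x client installed), you can start the SOCKS proxy and talk to the cluster’s HDFS from your own machine:

# start the SOCKS proxy Whirr generated (leave it running in the background)
sh ~/.whirr/hbase/hadoop-proxy.sh &

# point a local Hadoop client at the generated configuration
export HADOOP_CONF_DIR=~/.whirr/hbase
hadoop fs -ls /

To reach the web UI listed above, your browser also needs to use that SOCKS proxy; check hadoop-proxy.sh for the port it opens.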
Destroy!
At some point you will want to tear down that cluster. Here is how you can do that.
bin/whirr destroy-cluster --config hbase-ec2.properties
Destroying hbase cluster
Cluster hbase destroyed
Conclusion
Congratulations! If you have followed through this example, then you now have your own HBase cluster running in the cloud. Now… what to do with that HBase cluster? Stay tuned for future posts.
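In the meantime, a quick way to check that the cluster is alive is to ssh to the master (Whirr uploads your local public SSH key to the instances) and poke at the HBase shell. This is only a sketch; the hostname comes from your own launch output, and the HBase install path on the instance may differ:

# from your local machine
ssh ec2-184-72-159-249.compute-1.amazonaws.com

# on the master: start the HBase shell (adjust the path to wherever HBase was installed)
/usr/local/hbase-*/bin/hbase shell

# inside the shell: create a table, write a row and read it back
create 'testtable', 'cf'
put 'testtable', 'row1', 'cf:a', 'value1'
scan 'testtable'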
Hi Phil,
Great post! We are still working on a few kinks, but once Whirr 0.3.0 is out and released with CDH, it will be even easier to get going as no build is required (obviously).
A few notes: you are missing the "tt" (i.e. TaskTracker) in the explanation table for the cluster template. Also, just to reiterate, all of the above services share the same instance. So in your example you will start 1+5 EC2 servers with the various services running on them.
Finally, what is also cool is that Whirr creates a local "$HOME/.whirr/<cluster-name>/" directory, so in your example $HOME/.whirr/hbase/, which has various files to help you work with the cluster. One of those is the "hadoop-proxy.sh" (start it like "source $HOME/.whirr/hbase/hadoop-proxy.sh &" to launch it in the background), which sets up the SOCKS proxy for you so that you can talk to the servers and access the web UI for the master and region servers.
There is also an "instances" file, which nicely lists the servers in the cluster, their local and remote IPs, and their roles. A script or program could parse it to get the details needed to communicate with the servers.
Thanks again for writing this post!
Regards,
Lars
Thanks for the comments Lars. Well spotted on that missing “tt”! I’ve also clarified the number of EC2 machines in use, as this was not clear.
I’m looking forward to 0.3.0 of Whirr being released and the general progress of it. It is a fantastic tool for getting up and running in the cloud. Thanks to yourself and all the committers!
Very helpful. The only thing I tripped on was understanding how the local keypair file relates to Whirr. In my case, I followed your tutorial from a clean machine, itself running on EC2. Since that machine had no ~/.ssh/id_rsa file, Whirr failed.
It’s still not completely clear to me how Whirr bootstraps the machines, but I did learn that eventually Whirr uploads your local public key to the bootstrapped instances. So a simple invocation of ssh-keygen -t rsa got everything running.
Thanks again for the time and effort spent on this tutorial.
Thanks a lot for this post; I found that your instructions work great, but if you prefer to use git, you can do:
git clone git://git.apache.org/whirr.git
instead of:
svn co http://svn.apache.org/repos/asf/incubator/whirr/trunk whirr-trunk
Great instructions!
I believe the /incubator is not needed in the svn URL
When mine starts, it says:
Completed configuration of hbase
Web UI available at http://107.20.125.231
Wrote HBase site file /Users/tim/.whirr/hbase/hbase-site.xml
Wrote HBase proxy script /Users/tim/.whirr/hbase/hbase-proxy.sh
Completed configuration of hbase role hadoop-datanode
Completed configuration of hbase role hadoop-tasktracker
Starting to run scripts on cluster for phase start on instances: us-east-1/i-116dab74
Running start phase script on: us-east-1/i-116dab74
start phase script run completed on: us-east-1/i-116dab74
Successfully executed start script: [output=, error=, exitCode=0]
Finished running start phase scripts on all cluster instances
Started cluster of 4 instances
but it seems only the zookeeper actually started up and there is no JT or HBase master:
tim@ip-10-79-37-92:~$ ps -ef | grep java
root 3214 1 0 13:29 ? 00:00:00 java -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp /usr/local/zookeeper-3.3.3/bin/../build/classes:/usr/local/zookeeper-3.3.3/bin/../build/lib/*.jar:/usr/local/zookeeper-3.3.3/bin/../zookeeper-3.3.3.jar:/usr/local/zookeeper-3.3.3/bin/../lib/log4j-1.2.15.jar:/usr/local/zookeeper-3.3.3/bin/../lib/jline-0.9.94.jar:/usr/local/zookeeper-3.3.3/bin/../src/java/lib/*.jar:/etc/zookeeper/conf: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /etc/zookeeper/conf/zoo.cfg
All the slave node services started it seems.
This is an extremely useful post, thank you so much for writing it!