
Run The Latest Whirr And Deploy HBase In Minutes

Fast, faster, fastest!

In a few of my recent posts I have covered how easy it is to deploy Hadoop and Cassandra clusters using Whirr. With Whirr, you write a configuration file specifying which cloud provider you are using, your credentials, and the definition of the cluster you want, and Whirr builds the cluster for you. In this post, I am going to look at the latest Whirr, version 0.3.0, which is currently in release candidate status. I will show you how to find and build the latest Whirr and live on the edge of what is possible.

Grabbing The Source Code

The Whirr source repository holds the latest and greatest Whirr source code.


We will use the code from trunk, which will give you the very latest version of the software.

Version 0.2.0 is currently the most recent stable release available for download on the official Whirr site, but 0.3.0 is already tagged in the repository, so trunk contains it too. So let’s check out the source. Version 0.3.0 will soon be made the official release, but by following this step-by-step guide you will be ahead of the pack.

cd ~/src
svn co https://svn.apache.org/repos/asf/incubator/whirr/trunk whirr-trunk
cd whirr-trunk

Make Sure You Have Those Dependencies

cat BUILD.txt

Looking at BUILD.txt, you can see there are a few things we need.

Apache Whirr Build Instructions

REQUIREMENTS
- Java 1.6
- Apache Maven 2.2.1 or greater
- Ruby 1.8.7 or greater (to run build-tools/update-versions)
BUILDING
To run unit tests and install artifacts locally:
mvn clean install
To build a source package:
mvn package -Ppackage

If you have followed one of my previous posts, you will likely have these dependencies installed already, but I will review them just in case.

Being on Mac OS X, I generally use Homebrew to install my dependencies. This will install Maven on Mac OS X if you have Homebrew installed.

brew install maven

If you are on a Debian-based Linux distribution (e.g. Ubuntu), use apt-get to install Maven.

sudo apt-get install maven2

There are also instructions available for installing Maven on Windows and other platforms.

If you are using the latest Mac OS X, you should already have Ruby 1.8.7 and Java 1.6 installed. If you need to upgrade Ruby, I recommend checking out Ruby Version Manager (RVM).

Here’s how to check your versions and what I’m currently running…

# Ruby
ruby -v
ruby 1.9.2p94 (2010-12-08 revision 30140) [x86_64-darwin10.5.0]

# Java
java -version
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)

# Maven
mvn -v
Apache Maven 2.2.1 (r801777; 2009-08-06 12:16:01-0700)
Java version: 1.6.0_22
Java home: /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
Default locale: en_US, platform encoding: MacRoman
OS name: "mac os x" version: "10.6.6" arch: "x86_64" Family: "mac"
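If you would rather not eyeball those version strings, here is a small sketch that compares them against the minimums in BUILD.txt. The function names are my own, not part of Whirr, and the parsing assumes output formats like the ones above; other JDK or Ruby builds may print their versions differently. I also treat BUILD.txt's "Java 1.6" as a minimum, which is an assumption on my part.

```shell
#!/usr/bin/env bash
# Hypothetical pre-build check (not part of Whirr): compare installed tool
# versions with the minimums listed in BUILD.txt. Assumes version output
# formats like those shown above; other builds may print them differently.

# version_ge A B: exit status 0 if dotted version A >= B, compared numerically.
version_ge() {
  local IFS=.
  local -a a=($1) b=($2)
  local i x y
  for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
    x=${a[i]:-0}   # missing components count as 0, so 1.6 == 1.6.0
    y=${b[i]:-0}
    if ((10#$x > 10#$y)); then return 0; fi
    if ((10#$x < 10#$y)); then return 1; fi
  done
  return 0
}

# numeric_prefix "1.9.2p94" -> "1.9.2" (drops suffixes such as p94 or _22)
numeric_prefix() {
  printf '%s\n' "${1%%[!0-9.]*}"
}

# check_version NAME ACTUAL MINIMUM: print a status line, non-zero if too old or missing
check_version() {
  local name=$1 actual minimum=$3
  actual=$(numeric_prefix "$2")
  if [ -n "$actual" ] && version_ge "$actual" "$minimum"; then
    printf 'OK       %-6s %s (need >= %s)\n' "$name" "$actual" "$minimum"
  else
    printf 'PROBLEM  %-6s %s (need >= %s)\n' "$name" "${actual:-not found}" "$minimum"
    return 1
  fi
}

# check_build_deps: run all three checks, non-zero if any of them failed
check_build_deps() {
  local status=0
  check_version ruby "$(ruby -v 2>/dev/null | awk '{print $2}')" 1.8.7 || status=1
  check_version java "$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')" 1.6 || status=1
  check_version maven "$(mvn -v 2>/dev/null | awk '/Apache Maven/ {print $3; exit}')" 2.2.1 || status=1
  return "$status"
}
```

Source the file and run check_build_deps; it prints one line per tool and returns non-zero if anything is missing or too old. Comparing component by component avoids the trap of string comparison, where "1.10" would sort before "1.9".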


Building Whirr

Let’s build Whirr 0.3.0!

Run the following command from inside the source directory.

cd ~/src/whirr-trunk
mvn clean install

The first time I ran this, the build took 17 minutes, most of which was spent downloading dependencies. One of those downloads failed, so the build failed too.

[WARNING] Unable to get resource 'org.apache.hadoop:hadoop-core:jar:0.20.2'
from repository central (https://repo1.maven.org/maven2):
GET request of:
org/apache/hadoop/hadoop-core/0.20.2/hadoop-core-0.20.2.jar
from central failed

[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] Failed to resolve artifact.

Missing:
----------
1) org.apache.hadoop:hadoop-core:jar:0.20.2

The path to this dependency was correct, so it must have been one of those unfortunate glitches in the magic workings of the Internet. Please let me know if you have a similar experience; the chances of this happening to you are very slim.

I ran “mvn clean install” once more.

mvn clean install

This time it only took 2 minutes and built successfully.

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] ------------------------------------------------------------------------
[INFO] Whirr ................................................. SUCCESS [6.779s]
[INFO] Apache Whirr Build Tools .............................. SUCCESS [2.017s]
[INFO] Apache Whirr Core ..................................... SUCCESS [11.104s]
[INFO] Apache Whirr Cassandra ................................ SUCCESS [3.332s]
[INFO] Apache Whirr Hadoop ................................... SUCCESS [9.983s]
[INFO] Apache Whirr ZooKeeper ................................ SUCCESS [7.584s]
[INFO] Apache Whirr HBase .................................... SUCCESS [57.760s]
[INFO] Apache Whirr CLI ...................................... SUCCESS [38.708s]
[INFO] Apache Whirr Hadoop ................................... SUCCESS [1.884s]
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2 minutes 20 seconds
[INFO] Finished at: Mon Jan 24 12:49:58 PST 2011
[INFO] Final Memory: 90M/123M
[INFO] ------------------------------------------------------------------------
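If you hit the same download glitch, a small retry wrapper saves you from babysitting the build. This is a hypothetical helper of my own, not part of Whirr or Maven, and the attempt count and delay in the usage line are arbitrary. It only helps with transient failures; a genuinely missing artifact will fail on every attempt.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Whirr or Maven): re-run a command that may
# fail for transient reasons, such as a dependency download glitch.

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 after MAX_ATTEMPTS failures.
retry() {
  local max=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if ((attempt >= max)); then
      echo "retry: '$*' failed after $attempt attempts" >&2
      return 1
    fi
    echo "retry: attempt $attempt of $max failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Usage, from ~/src/whirr-trunk:
#   retry 3 10 mvn clean install
```

Because Maven caches whatever it managed to download in your local repository, each retry should be faster than the last, which matches what I saw going from 17 minutes to 2.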

Now that our code is compiled and installed locally, we can run the packaging command from BUILD.txt.

mvn package -Ppackage

(reduced output)

[INFO] Scanning for projects...
[INFO] Reactor build order:
[INFO]   Whirr
[INFO]   Apache Whirr Build Tools
[INFO]   Apache Whirr Core
[INFO]   Apache Whirr Cassandra
[INFO]   Apache Whirr Hadoop
[INFO]   Apache Whirr ZooKeeper
[INFO]   Apache Whirr HBase
[INFO]   Apache Whirr CLI
[INFO]   Apache Whirr Hadoop
[INFO] ------------------------------------------------------------------------
[INFO] Building Whirr
[INFO]    task-segment: [package]
[INFO] ------------------------------------------------------------------------

Launch An HBase Cluster

Trunk currently includes a recipe called “hbase-ec2” in the “recipes” directory. We will copy it into the current directory, although in this example we are not going to modify it.

cp recipes/hbase-ec2.properties .

There are many comments in the file, so here is a summary of its settings.

whirr.cluster-name=hbase
whirr.instance-templates=1 zk+nn+jt+hbase-master,5 dn+tt+hbase-regionserver
whirr.provider=ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.hardware-id=c1.xlarge
whirr.image-id=us-east-1/ami-da0cf8b3
whirr.location-id=us-east-1

In the above, the line you will want to play with is “whirr.instance-templates”, as it defines the shape and size of your cluster. Increasing the value “5” will give you a bigger cluster.

We have a total of 6 machines here: one machine runs all the master services and 5 more machines run all the worker services. Here is a breakdown of the services running on those machines, as defined by whirr.instance-templates.

1 zk+nn+jt+hbase-master: 1 instance running all of the following

zk = ZooKeeper
nn = Hadoop Name-Node
jt = Hadoop Job Tracker
hbase-master = HBase Master

5 dn+tt+hbase-regionserver: 5 instances, each running all of the following

dn = Hadoop Data-Node
tt = Hadoop Task-Tracker
hbase-regionserver = HBase Region Server
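To double-check how many machines a template will start before you pay for them, here is a sketch that reads the value out of the recipe and summarises it. Both functions are hypothetical helpers of my own, not part of Whirr; they assume the simple "count roles+roles" entries, separated by commas, used in this recipe, and a plain key=value properties file without continuation lines.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Whirr) for inspecting a recipe before launch.

# prop KEY FILE: print the value of KEY from a simple key=value properties file
# (no continuation lines or escapes; the last occurrence wins).
prop() {
  awk -v k="$1" 'index($0, k "=") == 1 { v = substr($0, length(k) + 2) } END { print v }' "$2"
}

# describe_templates TEMPLATES: one line per template, then the total instance count.
# Example input: "1 zk+nn+jt+hbase-master,5 dn+tt+hbase-regionserver"
describe_templates() {
  local -a templates
  local template count roles total=0
  IFS=, read -r -a templates <<< "$1"
  for template in "${templates[@]}"; do
    # "<count> <role>+<role>+..." -> count and the role list
    IFS=' ' read -r count roles <<< "$template"
    printf '%s x %s\n' "$count" "${roles//+/, }"
    total=$((total + count))
  done
  printf 'total instances: %d\n' "$total"
}

# Usage:
#   describe_templates "$(prop whirr.instance-templates hbase-ec2.properties)"
```

For the recipe above this prints "1 x zk, nn, jt, hbase-master", "5 x dn, tt, hbase-regionserver" and "total instances: 6", matching the 1+5 machines described above.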

If you are interested in how all these Hadoop and HBase components work together, see Lars George’s excellent posts “HBase Architecture 101 – Storage” and “HBase Architecture 101 – Write-ahead-Log”, or check out the HBase wiki.

In previous Whirr examples I defined the Amazon EC2 credentials in the properties file, but the configuration above picks them up from the environment, which is a better way to go since your keys stay out of the file. Export your credentials into your environment (here I use dummy credentials as an example).

export AWS_ACCESS_KEY_ID=123456789ABCDEFGHIJKLM
export AWS_SECRET_ACCESS_KEY=ABCDabcd1234/xyzXZY54321acbd
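Before launching, it is worth checking that those variables are actually set, and that you have a local SSH key pair, since Whirr failed for Lynn Monson (see the comments) on a machine with no ~/.ssh/id_rsa. This is a hypothetical pre-flight sketch of my own, not part of Whirr; it assumes Whirr is using the default key pair location.

```shell
#!/usr/bin/env bash
# Hypothetical pre-launch checks (not part of Whirr).

# require_env NAME...: non-zero if any named variable is unset or empty.
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then   # indirect expansion: the value of the variable called $name
      echo "not set: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# require_keypair [PRIVATE_KEY]: non-zero unless the key and its .pub both exist.
# Defaults to ~/.ssh/id_rsa, assumed here to be the key pair Whirr uses.
require_keypair() {
  local key=${1:-$HOME/.ssh/id_rsa}
  if [ ! -f "$key" ] || [ ! -f "$key.pub" ]; then
    echo "no key pair at $key; create one with: ssh-keygen -t rsa" >&2
    return 1
  fi
}

# Usage:
#   require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY && require_keypair &&
#     bin/whirr launch-cluster --config hbase-ec2.properties
```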

We can now launch our cluster.

bin/whirr launch-cluster --config hbase-ec2.properties

Bootstrapping cluster
Configuring template
Starting 1 node(s) with roles [jt, nn, zk, hbase-master]
Configuring template
Starting 5 node(s) with roles [tt, hbase-regionserver, dn]
Nodes started: [[id=us-east-1/i-e134808d, providerId=i-e134808d, tag=hbase, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], userMetadata={}, state=RUNNING, privateAddresses=[10.112.205.48], publicAddresses=[184.72.159.249], hardware=[id=c1.xlarge, providerId=c1.xlarge, name=c1.xlarge, processors=[[cores=8.0, speed=2.5]], ram=7168, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sde, durable=false, isBootDevice=false]], supportsImage=is64Bit()]]]
Nodes started: [[id=us-east-1/i-f734809b, providerId=i-f734809b, tag=hbase, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], userMetadata={}, state=RUNNING, privateAddresses=[10.194.127.79], publicAddresses=[50.16.154.13], hardware=[id=c1.xlarge, providerId=c1.xlarge, name=c1.xlarge, processors=[[cores=8.0, speed=2.5]], ram=7168, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sde, durable=false, isBootDevice=false]], supportsImage=is64Bit()]], [id=us-east-1/i-fb348097, providerId=i-fb348097, tag=hbase, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], userMetadata={}, state=RUNNING, privateAddresses=[10.98.33.250], publicAddresses=[50.16.71.166], hardware=[id=c1.xlarge, providerId=c1.xlarge, name=c1.xlarge, processors=[[cores=8.0, speed=2.5]], ram=7168, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, 
device=/dev/sde, durable=false, isBootDevice=false]], supportsImage=is64Bit()]], [id=us-east-1/i-f5348099, providerId=i-f5348099, tag=hbase, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], userMetadata={}, state=RUNNING, privateAddresses=[10.195.6.143], publicAddresses=[204.236.242.78], hardware=[id=c1.xlarge, providerId=c1.xlarge, name=c1.xlarge, processors=[[cores=8.0, speed=2.5]], ram=7168, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sde, durable=false, isBootDevice=false]], supportsImage=is64Bit()]], [id=us-east-1/i-f9348095, providerId=i-f9348095, tag=hbase, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], userMetadata={}, state=RUNNING, privateAddresses=[10.119.22.224], publicAddresses=[174.129.72.44], hardware=[id=c1.xlarge, providerId=c1.xlarge, name=c1.xlarge, processors=[[cores=8.0, speed=2.5]], ram=7168, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd, 
durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sde, durable=false, isBootDevice=false]], supportsImage=is64Bit()]], [id=us-east-1/i-f134809d, providerId=i-f134809d, tag=hbase, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], userMetadata={}, state=RUNNING, privateAddresses=[10.98.146.48], publicAddresses=[174.129.142.130], hardware=[id=c1.xlarge, providerId=c1.xlarge, name=c1.xlarge, processors=[[cores=8.0, speed=2.5]], ram=7168, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sde, durable=false, isBootDevice=false]], supportsImage=is64Bit()]]]
Authorizing firewall
Authorizing firewall
Authorizing firewall
Running configuration script
Configuration script run completed
Running configuration script
Configuration script run completed
Completed configuration of hbase
Web UI available at https://ec2-184-72-159-249.compute-1.amazonaws.com
Wrote Hadoop site file /Users/phil/.whirr/hbase/hadoop-site.xml
Wrote Hadoop proxy script /Users/phil/.whirr/hbase/hadoop-proxy.sh
Completed configuration of hbase
Hosts: ec2-184-72-159-249.compute-1.amazonaws.com:2181
Completed configuration of hbase
Web UI available at https://ec2-184-72-159-249.compute-1.amazonaws.com
Wrote HBase site file /Users/phil/.whirr/hbase/hbase-site.xml
Wrote HBase proxy script /Users/phil/.whirr/hbase/hbase-proxy.sh
Wrote instances file /Users/phil/.whirr/hbase/instances
Started cluster of 6 instances
Cluster{instances=[Instance{roles=[tt, hbase-regionserver, dn], publicAddress=/204.236.242.78, privateAddress=/10.195.6.143, id=us-east-1/i-f5348099}, Instance{roles=[tt, hbase-regionserver, dn], publicAddress=/174.129.72.44, privateAddress=/10.119.22.224, id=us-east-1/i-f9348095}, Instance{roles=[tt, hbase-regionserver, dn], publicAddress=/50.16.71.166, privateAddress=/10.98.33.250, id=us-east-1/i-fb348097}, Instance{roles=[jt, nn, zk, hbase-master], publicAddress=ec2-184-72-159-249.compute-1.amazonaws.com/184.72.159.249, privateAddress=/10.112.205.48, id=us-east-1/i-e134808d}, Instance{roles=[tt, hbase-regionserver, dn], publicAddress=/174.129.142.130, privateAddress=/10.98.146.48, id=us-east-1/i-f134809d}, Instance{roles=[tt, hbase-regionserver, dn], publicAddress=/50.16.154.13, privateAddress=/10.194.127.79, id=us-east-1/i-f734809b}], configuration={hbase.zookeeper.quorum=ec2-184-72-159-249.compute-1.amazonaws.com:2181}}
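That final Cluster{...} line packs in which machine runs which roles and where ZooKeeper lives. Here is a sketch that pulls those out, assuming you saved the launch output (for example by piping the launch command through tee launch.log). The function names are hypothetical and the patterns assume the output format printed by this trunk build, which may change between releases; Whirr's own instances file (mentioned in the comments) is likely a sturdier thing to script against.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Whirr) for the "Cluster{...}" summary line.
# Assumes the output format shown above.

# list_instances: read launch output on stdin, print "public-ip  roles" per instance.
list_instances() {
  grep -oE 'roles=\[[^]]*], publicAddress=[^,]*' |
    sed -E 's|roles=\[([^]]*)], publicAddress=[^/]*/(.*)|\2  \1|'
}

# zk_quorum: read launch output on stdin, print the hbase.zookeeper.quorum value.
zk_quorum() {
  grep -oE 'hbase\.zookeeper\.quorum=[^}]*' | head -n 1 | cut -d= -f2-
}

# Usage:
#   list_instances < launch.log
#   zk_quorum < launch.log
```

For the run above, zk_quorum would print ec2-184-72-159-249.compute-1.amazonaws.com:2181, the same host reported on the "Hosts:" line.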

Destroy!

At some point you will want to tear down the cluster. Here is how you can do that.

bin/whirr destroy-cluster --config hbase-ec2.properties

Destroying hbase cluster
Cluster hbase destroyed
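If you use a cluster for a single job, it is easy to forget this step and leave six instances running. Here is a sketch of a wrapper that launches, runs one command, and always destroys afterwards. It is a hypothetical helper of my own built only on the launch-cluster and destroy-cluster commands shown above; it does not handle Ctrl-C, so an interrupted run still needs a manual destroy-cluster.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of Whirr): launch a cluster, run one command,
# then destroy the cluster whatever the command's exit status.
# WHIRR defaults to the launcher used above; run from ~/src/whirr-trunk.
WHIRR=${WHIRR:-bin/whirr}

# with_cluster CONFIG COMMAND [ARGS...]: returns the command's exit status,
# or 1 if the launch itself failed.
with_cluster() {
  local config=$1 status
  shift
  if "$WHIRR" launch-cluster --config "$config"; then
    "$@"
    status=$?
  else
    status=1
  fi
  # Destroy even after a failed launch, in case some instances did start.
  "$WHIRR" destroy-cluster --config "$config"
  return "$status"
}

# Usage (my_hbase_job.sh is a placeholder for your own work):
#   with_cluster hbase-ec2.properties ./my_hbase_job.sh
```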

Conclusion

Congratulations! If you have followed this example, you now have your own HBase cluster running in the cloud. Now… what to do with that HBase cluster?

Comments

  1. Lars George

    Hi Phil,

    Great post! We are still working on a few kinks, but once Whirr 0.3.0 is out and released with CDH, it will be even easier to get going, as no build is required (obviously).

    A few notes: you are missing the “tt” (i.e. TaskTracker) in the explanation table for the cluster template. Also, just to reiterate, all of the above services share the same instance. So in your example you will start 1+5 EC2 servers with the various services running on them.

    Finally, what is also cool is that Whirr creates a local “$HOME/.whirr//” directory, so in your example $HOME/.whirr/hbase/, which has various files to help you work with the cluster. One of those is “hadoop-proxy.sh” (start it like “source $HOME/.whirr/hbase/hadoop-proxy.sh &” to launch it in the background), which sets up the SOCKS proxy for you so that you can talk to the servers and access the web UI for the master and region servers.

    There is also an “instances” file, which nicely lists the servers in the cluster, their local and remote IPs, and their roles. It could be used by a script or program to parse the details and communicate with the servers.

    Thanks again for writing this post!

    Regards,
    Lars

  2. Lynn Monson

    Very helpful. The only thing I tripped on was understanding how the local keypair file relates to Whirr. In my case, I followed your tutorial from a clean machine, itself running on EC2. Since that machine had no ~/.ssh/id_rsa file, Whirr failed.

    It’s still not completely clear to me how Whirr bootstraps the machines, but I did learn that eventually Whirr uploads your local public key to the bootstrapped instances. So a simple invocation of ssh-keygen -t rsa got everything running.

    Thanks again for the time and effort spent on this tutorial.

  3. Eugene Koontz

    Thanks a lot for this post; I found that your instructions work great, but if you prefer to use git, you can do:

    git clone git://git.apache.org/whirr.git

    instead of:

    svn co https://svn.apache.org/repos/asf/incubator/whirr/trunk whirr-trunk

  4. Tim

    Great instructions!
    I believe the /incubator is not needed in the svn URL

    When mine starts, it says:

    Completed configuration of hbase
    Web UI available at https://107.20.125.231
    Wrote HBase site file /Users/tim/.whirr/hbase/hbase-site.xml
    Wrote HBase proxy script /Users/tim/.whirr/hbase/hbase-proxy.sh
    Completed configuration of hbase role hadoop-datanode
    Completed configuration of hbase role hadoop-tasktracker
    Starting to run scripts on cluster for phase start on instances: us-east-1/i-116dab74
    Running start phase script on: us-east-1/i-116dab74
    start phase script run completed on: us-east-1/i-116dab74
    Successfully executed start script: [output=, error=, exitCode=0]
    Finished running start phase scripts on all cluster instances
    Started cluster of 4 instances

    but it seems only the zookeeper actually started up and there is no JT or HBase master:

    tim@ip-10-79-37-92:~$ ps -ef | grep java
    root 3214 1 0 13:29 ? 00:00:00 java -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp /usr/local/zookeeper-3.3.3/bin/../build/classes:/usr/local/zookeeper-3.3.3/bin/../build/lib/*.jar:/usr/local/zookeeper-3.3.3/bin/../zookeeper-3.3.3.jar:/usr/local/zookeeper-3.3.3/bin/../lib/log4j-1.2.15.jar:/usr/local/zookeeper-3.3.3/bin/../lib/jline-0.9.94.jar:/usr/local/zookeeper-3.3.3/bin/../src/java/lib/*.jar:/etc/zookeeper/conf: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /etc/zookeeper/conf/zoo.cfg

    All the slave node services started it seems.

  5. Kiko Aumond

    This is an extremely useful post, thank you so much for writing it!