hadoop streaming

Map-Reduce With Ruby Using Hadoop

By Phil Whelan on December 31, 2010

Here I demonstrate, with repeatable steps, how to fire up a Hadoop cluster on Amazon EC2, load data into HDFS (the Hadoop Distributed File System), write map-reduce scripts in Ruby, and use them to run a map-reduce job on your Hadoop cluster. You will not need to ssh into the cluster, as all tasks are run from your local machine. Below I use my MacBook Pro as my local machine, but the steps should be reproducible on other platforms running bash and Java.
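
To give a feel for what "map-reduce scripts in Ruby" look like under Hadoop Streaming, here is a minimal word-count sketch. It is an illustration only and not necessarily the job built in the full post: the file name `wordcount.rb` and the method names `map_words` and `reduce_counts` are my own assumptions. What it relies on is the Streaming contract itself: each script reads lines from STDIN and writes tab-separated `key<TAB>value` lines to STDOUT, and the reducer receives the mapper output sorted by key.

```ruby
# wordcount.rb (hypothetical file name): a word-count sketch for Hadoop Streaming.
# Streaming passes input lines on STDIN and expects "key<TAB>value" lines on STDOUT.

# Map step: emit "word<TAB>1" for every word on every input line.
def map_words(input, output)
  input.each_line do |line|
    line.downcase.scan(/[a-z0-9']+/).each do |word|
      output.puts "#{word}\t1"
    end
  end
end

# Reduce step: input arrives sorted by key, so all lines for one word are
# consecutive and can be summed with a running total.
def reduce_counts(input, output)
  current = nil
  total = 0
  input.each_line do |line|
    word, count = line.chomp.split("\t", 2)
    next if word.nil? || count.nil? # skip malformed lines

    if word != current
      # A new key begins: flush the total for the previous key.
      output.puts "#{current}\t#{total}" unless current.nil?
      current = word
      total = 0
    end
    total += count.to_i
  end
  # Flush the final key, if any input was seen.
  output.puts "#{current}\t#{total}" unless current.nil?
end

# Dispatch on the first argument: "map" or "reduce". With no argument the
# file only defines the methods, which keeps it easy to load and test.
case ARGV.first
when "map"    then map_words($stdin, $stdout)
when "reduce" then reduce_counts($stdin, $stdout)
end
```

Before sending anything to a cluster, a script like this can be checked locally by imitating the sort that Hadoop performs between the two phases, for example `cat input.txt | ruby wordcount.rb map | sort | ruby wordcount.rb reduce`, where `input.txt` stands in for any small sample file.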

Posted in Data processing, Hadoop, Ruby | Tagged amazon ec2, bash, cloudera, data processing, hadoop, hadoop cluster, hadoop streaming, hdfs, jclouds, map-reduce, ruby, whirr
