wikipedia

How To Get Experience Working With Large Datasets

By Phil Whelan on December 8, 2010

Public data sources exist, but which one you choose depends on the technology you want to gain experience with. What matters is experience with the technology itself, not the content of the data, and certain datasets pair better with certain technologies. A second approach is to simulate the data; you just need a clever way of generating and randomizing fake records. A third is a hybrid approach: take real data and replay it on a loop, randomizing it as it goes through. Simulating the Twitter firehose should not be too hard, should it?
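The hybrid approach can be sketched in a few lines of Python. This is only a rough illustration, not code from the original post: the function name `replay_with_jitter` and the record fields `user`, `text`, and `id` are made up for the example, and shuffling the words of each message is just one arbitrary way of randomizing a replayed record.

```python
import itertools
import random


def replay_with_jitter(records, seed=None):
    """Replay a small real sample forever, randomizing each record.

    `records` is a list of dicts with hypothetical `user` and `text`
    fields. Each replayed copy gets a fresh sequential `id` and has
    the words of its `text` shuffled, so repeated passes over the
    sample do not produce identical records.
    """
    rng = random.Random(seed)
    for n, record in enumerate(itertools.cycle(records)):
        mutated = dict(record)       # copy, so the sample is untouched
        mutated["id"] = n            # unique id per emitted record
        words = mutated["text"].split()
        rng.shuffle(words)           # illustrative randomization only
        mutated["text"] = " ".join(words)
        yield mutated


# A tiny stand-in for captured real data.
sample = [
    {"user": "alice", "text": "hadoop cluster is up"},
    {"user": "bob", "text": "loading wikipedia dump into solr"},
]

stream = replay_with_jitter(sample, seed=42)
first_five = list(itertools.islice(stream, 5))
```

Because the generator never terminates, the consumer decides how much data to pull, for example with `itertools.islice` as above or by feeding the stream into whatever datastore you are trying out.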

Posted in Data processing | Tagged amazon ec2, bbc, cassandra, data processing, dbpedia, ebs, freebase, hadoop, hdfs, large datasets, livedoor, lucene, mongodb, nosql, rackspace, solr, wikipedia, working with data

To Show Ads Or Not To Show Ads

By Phil Whelan on December 2, 2010

I was just reading “An appeal from Wikipedia founder Jimmy Wales”, in which he asks for donations to keep Wikipedia running ad-free. I do not need to tell you what an amazing website Wikipedia is. It’s a vast pool of information with a very large volume of visitors. It would be a perfect [...]

Posted in Money | Tagged adsense, wikipedia

Copyright © 2016 Big Fast Blog | Vancouver, BC, Canada