Open Source Hadoop with HBase to Provide a Scalable Platform


When dealing with vast amounts of data, you need a scalable distributed storage system. All of my database-driven web sites use MySQL, and for smaller databases you can run search directly off MySQL. But you soon find out that delivering fast search on a site, or, as in my case, offering a search service over hundreds of millions of records of crawled niche data, calls for exactly that kind of system.

Google recently hosted a Conference on Scalability in Seattle, where speakers talked about MapReduce, Bigtable, and other distributed systems for large datasets. Listed here are the talks, which are now available on Google Video:

(Kudos to Greg Linden for compiling the list of videos.)

The videos provide some technical detail, while Marissa Mayer's talk offers some insight into Google's big-picture plans.

Google's technology, however, is closed, so if you want a solution you can actually use, open source projects are the way to go. This is where Hadoop and HBase come in.


Hadoop is a framework for running applications on large clusters of commodity hardware. A lot of development is going into Hadoop right now, led mostly by Doug Cutting and Owen O'Malley of Yahoo. In my experience, if you implement Hadoop you really need to stay on top of it and tweak it to suit your needs. As a sign of how young Hadoop is, the current stable release is 0.13.0.
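At its core, Hadoop implements the MapReduce programming model that Google described. A map function emits intermediate key/value pairs, the framework groups those pairs by key, and a reduce function combines each group.

Below is a minimal single-process Python sketch of that flow, counting words across crawled pages. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical illustrations. This is not Hadoop's actual Java API, and it leaves out everything that makes Hadoop useful at scale: distribution across machines, fault tolerance, and HDFS storage.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in one input record.

    doc_id is the record's input key; word count does not need it.
    """
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key (the framework's job between phases)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values emitted for one key into a single result."""
    return key, sum(values)


def run_job(records: Dict[str, str]) -> Dict[str, int]:
    """Hypothetical driver: map every record, shuffle, then reduce each key."""
    intermediate = (pair for doc_id, text in records.items()
                    for pair in map_phase(doc_id, text))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    pages = {
        "page1": "hadoop runs mapreduce jobs",
        "page2": "hbase runs on hadoop",
    }
    print(run_job(pages))
```

Running it prints `{'hadoop': 2, 'hbase': 1, 'jobs': 1, 'mapreduce': 1, 'on': 1, 'runs': 2}`. Because each map call and each reduce call depends only on its own input, a framework like Hadoop can spread them across many machines.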

HBase is a distributed storage system for structured data, built to hold very large amounts of data across a cluster. It is intended to be similar in function to Google's Bigtable, which runs on top of the Google File System. HBase will provide Bigtable-like capabilities on top of Hadoop's distributed file system (HDFS).
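To make "Bigtable-like" more concrete: Bigtable's data model is a sparse, sorted map indexed by row key, column (written `family:qualifier`), and timestamp.

The sketch below models that in memory. `TinyTable` and its `put`/`get`/`scan` methods are hypothetical illustrations, not HBase's client API. The sketch also ignores distribution entirely; in Bigtable, the sorted row range is split into tablets served by different machines. The reversed-domain row keys in the test follow the style of the Bigtable paper's web-table example.

```python
import bisect
import time
from typing import Dict, Iterator, List, Optional, Tuple


class TinyTable:
    """Hypothetical in-memory model of a Bigtable/HBase-style table.

    Cells are addressed by (row key, "family:qualifier" column, timestamp).
    Row keys are kept sorted so range scans are cheap, and each cell keeps
    up to max_versions timestamped values, newest first.
    This is NOT the HBase API.
    """

    def __init__(self, families: List[str], max_versions: int = 3) -> None:
        self.families = set(families)            # column families fixed up front
        self.max_versions = max_versions
        self.row_keys: List[str] = []            # sorted row keys
        self.rows: Dict[str, Dict[str, List[Tuple[int, bytes]]]] = {}

    def put(self, row: str, column: str, value: bytes,
            ts: Optional[int] = None) -> None:
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row not in self.rows:
            bisect.insort(self.row_keys, row)    # keep rows in sorted order
            self.rows[row] = {}
        if ts is None:
            ts = time.time_ns()                  # default: current wall-clock time
        versions = self.rows[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # newest first
        del versions[self.max_versions:]         # drop versions beyond the limit

    def get(self, row: str, column: str) -> Optional[bytes]:
        """Return the newest value of a cell, or None if it does not exist."""
        versions = self.rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start: str, stop: str) -> Iterator[Tuple[str, Dict[str, bytes]]]:
        """Yield (row, {column: newest value}) for start <= row < stop."""
        lo = bisect.bisect_left(self.row_keys, start)
        hi = bisect.bisect_left(self.row_keys, stop)
        for row in self.row_keys[lo:hi]:
            yield row, {col: vs[0][1] for col, vs in self.rows[row].items()}
```

Keeping rows sorted by key is what lets a range of related rows, such as all pages from one reversed domain, be read with a single scan. It is also what lets the table be split into contiguous key ranges and spread across servers.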

While these projects are still in their infancy, the open source model is driving rapid development of both.
