Getting Started with HBase and Thrift for Node

Node and HBase

Why Node and HBase?

I think the title of this article is self explanatory, but for a bit of bit background for people who may not know: NodeJS is a JavaScript server framework based on a very thin and fast I/O multiplexer. That means NodeJS is great for things like proxying lots of short lived connections. Node’s abilities make it ideally...

Node and HBase

Why Node and HBase?

Traditionally the way to communicate with HBase using Node is to use the REST interface. This is both slow and not ideally suited for a scaling production environment. If we could leverage the write-scale of HBase with the proxy-scale of Node we could do some pretty cool things, like script map-reduce jobs for HBase in JavaScript, or create a service to do low latency writes. How can we do this? The answer is Thrift!

What is Thrift?

Thrift is a “software framework, for scalable cross-language services development…with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js…” Thrift allows us to create a thin API in NodeJS to communicate with HBase using a thin socket protocol. The advantages over REST are that the connection stays alive and the protocol is thinner than XML.

Enough analysis, let’s get started.

Getting Started

First you will need to install HBase and Thrift. I’m not going to go over that here, as there are other great tutorials to do that. Here’s one for HBase and one for Thrift (you’ll need some other things to get Thrift going). Once you have both HBase and Thrift installed go ahead and open up your shell of choice and create a working directory and make it your current directory.

mkdir node_hbase
cd node_hbase

Then generate the Thrift files for Node like so:

thrift --gen js:node [hbase-dir]/src/main/resources/org/apache/hadoop/hbase/thrift/HBase.thrift

The tutorial for generating these files can be found here.

After generating these files it’s time to create our Node file. Here is an example that we can use. Make sure this file is inside the folder we created:

varthrift=require('thrift'),HBase=require('./gen-nodejs/HBase.js'),HBaseTypes=require('./gen-nodejs/HBase_types.js'),connection=thrift.createConnection('localhost',9090,{transport:thrift.TBufferedTransport,protocol:thrift.TBinaryProtocol});connection.on('connect',function(){varclient=thrift.createClient(HBase,connection);client.getTableNames(function(err,data){if(err){console.log('gettablenames error:',err);}else{console.log('hbase tables:',data);}connection.end();});});connection.on('error',function(err){console.log('error',err);});

We don’t want to run this yet. We need to start HBase and HBase’s Thrift server/API. Starting HBase is easy, it’s just one command:

[hbase-dir]/bin/hbase-daemon.sh start thrift -f

Now we can run our Node file. You may want to go into the HBase shell and create a table, otherwise you will get an empty array back from the example above.

The HBase Thrift API is documented here. The Node implementation of these methods will be different than how they are written and will almost never return anything, but instead will have a callback as an argument which will give the result of the method.

I think Node, HBase, and Thrift as a stack is a winning combination and would be ideally suited for something like a custom analytics platform or a mapping platform.

I will do my best to keep up with any comments or questions people have below (until they close).