
Using thrift python client with HBase

Original article: http://whynosql.com/using-thrift-python-client-with-hbase/

There are two useful tutorials on this topic on the web (the HBase wiki and Yaan’s blog), but I think both of them miss a few steps. Even while following them, I found myself struggling to compile thrift and fighting Python’s "No module named ..." import errors. Hence this attempt.
You can use a python client in two ways:
1) Use Jython (the HBase Java client library can be accessed directly from Python)
2) Use thrift

In this tutorial, I am going to explain how to use python and thrift to access HBase. Here is a summary of the steps you will need to follow:
1) Download thrift
2) Install thrift dependencies
3) Compile and install thrift
4) Generate HBase thrift python module
5) Add HBase thrift python module to pythonpath
6) Start HBase thrift server
7) Use the client!

The steps are explained in detail below. I am assuming that you are using Ubuntu as your development environment; that’s what I use. I am also assuming that HBase is installed and that HBASE_HOME is defined in your environment.

1) Download thrift
Download the thrift 0.3.0 source release (thrift-0.3.0.tar.gz), the version used throughout this tutorial.

Extract the tar.gz file using tar -xvzf thrift-0.3.0.tar.gz. Let’s say you extracted it into /home/horcrux/Software/thrift-0.3.0/

2) Install thrift dependencies
Thrift requires several packages for compilation: the Boost C++ libraries, flex, Ruby’s mkmf and the usual build essentials. You can install all the dependencies by executing the following commands (ruby1.8-dev is there to provide mkmf):

sudo apt-get install build-essential
sudo apt-get install libboost1.40-dev
sudo apt-get install flex
sudo apt-get install ruby1.8-dev
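If the build fails later, a missing tool is a common cause. Here is a minimal Python 3 sketch that checks whether the command-line tools the build relies on are on your PATH. The tool list (g++, make, flex) and the function name missing_tools are assumptions for illustration; the Boost headers and Ruby’s mkmf are libraries, not executables, so this check cannot detect them.

```python
import shutil

# Executables used by the thrift build (assumed list). Library dependencies
# such as the Boost headers or Ruby's mkmf cannot be detected this way.
REQUIRED_TOOLS = ["g++", "make", "flex"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing build tools: " + ", ".join(missing))
    else:
        print("All required build tools found.")
```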

3) Compile and install thrift
Execute the following commands to compile and install thrift:

cd /home/horcrux/Software/thrift-0.3.0/
./configure
make
sudo make install

Now install the thrift Python library. The following commands install the thrift module into your Python site-packages, so it will be on your pythonpath:
cd /home/horcrux/Software/thrift-0.3.0/lib/py
sudo python setup.py install
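Because "No module named thrift" errors are easy to run into at this point, it is worth checking which interpreter actually received the module. A small sketch, assuming Python 3.4+ (for importlib.util.find_spec); module_location is a hypothetical helper name. Keep in mind that sudo python setup.py install installs for whatever interpreter python points to, so run the check with that same interpreter.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    location = module_location("thrift")
    if location is None:
        print("thrift is not importable; re-run setup.py install in lib/py")
    else:
        print("thrift found at " + location)
```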

4) Generate HBase thrift python module
Once this is done, thrift should be on your PATH, so you can run the thrift command from any directory. Now generate the HBase thrift module from the Hbase.thrift interface definition file:

thrift --gen py $HBASE_HOME/src/java/org/apache/hadoop/hbase/thrift/Hbase.thrift

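This command creates a gen-py folder in the current working directory. The previous step left you in lib/py, so cd back to /home/horcrux/Software/thrift-0.3.0 before running it if you want gen-py to end up at /home/horcrux/Software/thrift-0.3.0/gen-py, the path used in the rest of this tutorial.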
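The generation step can also be scripted so that the output location does not depend on where your shell happens to be. This is a sketch assuming Python 3.5+ and the src/java/... location of Hbase.thrift shown above (other HBase releases may keep the file elsewhere); hbase_idl_path and generate_py_bindings are hypothetical helper names. Passing cwd to subprocess.run sets the directory in which thrift creates gen-py.

```python
import os
import subprocess

# Location of the IDL file inside the HBase tree, as used in step 4.
IDL_RELATIVE_PATH = os.path.join(
    "src", "java", "org", "apache", "hadoop", "hbase", "thrift", "Hbase.thrift")


def hbase_idl_path(hbase_home=None):
    """Return the path to Hbase.thrift, falling back to $HBASE_HOME."""
    hbase_home = hbase_home or os.environ.get("HBASE_HOME")
    if not hbase_home:
        raise RuntimeError("HBASE_HOME is not set")
    path = os.path.join(hbase_home, IDL_RELATIVE_PATH)
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    return path


def generate_py_bindings(out_dir, hbase_home=None, thrift_bin="thrift"):
    """Run 'thrift --gen py' with out_dir as the working directory.

    thrift writes gen-py/ into its current working directory, so setting
    cwd controls where the generated package ends up. Returns that path.
    """
    command = [thrift_bin, "--gen", "py", hbase_idl_path(hbase_home)]
    subprocess.run(command, cwd=out_dir, check=True)
    return os.path.join(out_dir, "gen-py")
```

For example, generate_py_bindings("/home/horcrux/Software/thrift-0.3.0") would produce the gen-py folder used in the next step.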

5) Add HBase thrift python module to pythonpath
You need to add the gen-py folder to the Python path. There are several ways to do this:
a) Add it directly at the top of your Python file:
import sys
sys.path.append('/home/horcrux/Software/thrift-0.3.0/gen-py')
or
b) If you are using an IDE like PyDev, add it as a PYTHONPATH source folder.
or
c) Add it to the PYTHONPATH environment variable in your .bashrc:
export PYTHONPATH=$PYTHONPATH:/home/horcrux/Software/thrift-0.3.0/gen-py
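If you go with option a), a slightly more defensive version fails early with a clear message when gen-py is missing or incomplete, instead of a later "No module named hbase". This is a sketch: add_gen_py_to_path is a hypothetical name, and it assumes the generated package is called hbase, matching the from hbase import Hbase line in the client code. It prepends rather than appends, so the generated package takes precedence over any other module named hbase.

```python
import os
import sys


def add_gen_py_to_path(gen_py_dir):
    """Put gen_py_dir at the front of sys.path once, after checking its contents."""
    gen_py_dir = os.path.abspath(gen_py_dir)
    if not os.path.isdir(os.path.join(gen_py_dir, "hbase")):
        raise ImportError(
            "no 'hbase' package in %s; re-run thrift --gen py" % gen_py_dir)
    # Avoid adding duplicate entries if this is called more than once.
    if gen_py_dir not in sys.path:
        sys.path.insert(0, gen_py_dir)
    return gen_py_dir
```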

6) Start HBase thrift server
You can simply start the thrift server by executing the following command:
$HBASE_HOME/bin/hbase thrift start
This starts the HBase thrift server on its default port, 9090.
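Before running the client, you can confirm that something is listening on the thrift port. A minimal Python 3 sketch; thrift_server_reachable is a hypothetical helper name, and it only tests that a TCP connection succeeds, not that the listener is actually the HBase thrift server.

```python
import socket


def thrift_server_reachable(host="localhost", port=9090, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # The connection is closed immediately; we only care that it opened.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```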

7) Use the client!
Here is some sample code that prints the names of all the tables on your HBase server:

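from thrift.transport.TSocket import TSocket
from thrift.transport.TTransport import TBufferedTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase  # generated in step 4; gen-py must be on the pythonpath

# Connect to the HBase thrift server started in step 6 (default port 9090)
transport = TBufferedTransport(TSocket('localhost', 9090))
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)

transport.open()
try:
    print(client.getTableNames())
finally:
    # Always release the socket, even if the call fails
    transport.close()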

That’s it.

HBase provides a thrift interface, but how do you use it to write a C++ program that stores data in HBase? I haven’t found any examples of that.

