Friday, March 8, 2013

Hoop: Hadoop HDFS over HTTP/S

Hoop provides access to all Hadoop Distributed File System (HDFS) operations (read and write) over HTTP/S.

Hoop is a server that provides a REST HTTP gateway supporting all HDFS file system operations (read and write). It can be used to transfer data between clusters running different versions of Hadoop. Because the Hoop server acts as a gateway and is the only system allowed to cross the firewall into the cluster, it can also be used to access HDFS data in a cluster that sits behind a firewall.
Hoop can be used to access data in HDFS using HTTP utilities (such as curl) and HTTP libraries from languages other than Java (such as Perl). It also provides a Hadoop FileSystem implementation that enables access to HDFS over HTTP using the hadoop command-line tool as well as the Hadoop FileSystem Java API.
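
As a rough illustration of moving data between clusters with nothing but curl, the sketch below pipes a read from one Hoop server into a create request on another. The function name hoop_copy and the host names in the usage note are hypothetical, and the curl options simply mirror the ones used in the commands in this post. Treat it as a sketch under those assumptions, not a tested transfer tool.

```shell
# hoop_copy SRC_URL DST_URL   (hypothetical helper, not part of Hoop)
# Streams the body of a Hoop read request into a Hoop create request.
# -s silences the progress meter, -f makes curl fail on HTTP error codes.
hoop_copy() {
    src_url="$1"
    dst_url="$2"
    curl -s -f "$src_url" |
        curl -s -f -X POST "$dst_url" \
            --data-binary @- \
            --header "content-type: application/octet-stream"
}
```

It could be called as hoop_copy "http://cluster_a:14333/user/ubuntu/test.txt?user.name=ubuntu" "http://cluster_b:14333/user/ubuntu/test.txt?op=create", where cluster_a and cluster_b are placeholder host names. Note that curl buffers the whole input in memory when given --data-binary @-, so this approach only suits files of modest size.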

So we can easily integrate HDFS, through Hoop, with an ordinary Java application running on Apache Tomcat or any other application server. This is a practical way to store a large amount and variety of data (in short, big data) on the cluster file system (HDFS).

Hoop has server and client components:
  • The Hoop server component is a REST HTTP gateway to HDFS supporting all file system operations. It can be accessed using standard HTTP tools, HTTP libraries from different programming languages (e.g. Perl, JavaScript), or the Hoop client.
  • The Hoop client component is an implementation of the Hadoop FileSystem client that lets you use the familiar Hadoop FileSystem API to access HDFS data through a Hoop server.

Accessing HDFS via Hoop using Linux commands

To get the home directory:

$ curl -i "http://my_server:14333?op=homedir&user.name=ubuntu"

To read a file:

$ curl -i "http://my_server:14333/user/ubuntu/test.txt?user.name=ubuntu"

To write a file:

$ curl -i -X POST "http://my_server:14333/user/ubuntu/test.txt?op=create" \
     --data-binary @data.txt --header "content-type: application/octet-stream"
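
Since all of these commands share one URL shape, a small shell helper can keep them consistent. The sketch below is only an illustration: hoop_url and HOOP_BASE are made-up names, my_server:14333 is the placeholder host used in this post, and the layout (file path first, then op and user.name in the query string) is inferred from the commands in this post rather than taken from the Hoop documentation.

```shell
# HOOP_BASE is an assumed variable name; it defaults to the placeholder host from this post.
HOOP_BASE="${HOOP_BASE:-http://my_server:14333}"

# hoop_url PATH USER [OP]   (hypothetical helper, not part of Hoop)
# Prints a Hoop URL: the HDFS path goes in the URL path, and the optional
# operation plus the user name go in the query string.
hoop_url() {
    path="$1"
    user="$2"
    op="${3:-}"
    if [ -n "$op" ]; then
        printf '%s%s?op=%s&user.name=%s\n' "$HOOP_BASE" "$path" "$op" "$user"
    else
        printf '%s%s?user.name=%s\n' "$HOOP_BASE" "$path" "$user"
    fi
}
```

For example, curl -i "$(hoop_url "" ubuntu homedir)" would request the home directory, and curl -i "$(hoop_url /user/ubuntu/test.txt ubuntu)" would read the file. The helper does no URL encoding, so paths with spaces or special characters would need extra handling.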
 

You can find the Hoop source code, documentation, and setup instructions here:

Hoop is distributed under the Apache License 2.0.
The source code is available at http://github.com/cloudera/hoop.
Instructions on how to build, install, and configure the Hoop server, along with the rest of the documentation, are available at http://cloudera.github.com/hoop.
