Monday, October 15, 2012

Setting up HBase on Windows x64 VM



Yes, there is an “official” guide to HBase installation on Windows, but it seems to be written for older versions of HBase. Some of its steps are no longer necessary, while some crucial steps (like the ZooKeeper stuff) are not mentioned at all.
This tutorial will guide you through an HBase installation based on Cygwin, in a way that is similar to the official guide. I have tested this on Windows 7, 64-bit.

Downloading Cygwin

  1. download the Cygwin setup.exe and run it
  2. choose an appropriate mirror
     I will assume that Cygwin will be installed into C:\Programs\Cygwin. Do not install Cygwin into a folder whose path contains a space character (such as C:\Program Files). If you do, you will run into many seemingly random and unexpected problems.
  3. from packages, choose the following:
    • OpenSSH,
    • tcp_wrappers,
    • diffutils [this should be pre-selected],
    • zlib
  4. proceed with installation until it is finished.
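
The "no spaces in the path" rule is easy to check before installing anything. The following is only a small sketch: the function name path_is_space_free is made up for illustration, and it does nothing more than look for a space character in the string it is given.

```shell
# Hypothetical helper: succeed (exit status 0) when the given install path
# contains no space character, fail (exit status 1) otherwise.
path_is_space_free() {
    case "$1" in
        *" "*) return 1 ;;   # a space anywhere in the path: reject it
        *)     return 0 ;;   # no space found: the path is acceptable
    esac
}
```

For example, from a Bash prompt:

    path_is_space_free 'C:\Programs\Cygwin' && echo "path is fine"
    path_is_space_free 'C:\Program Files' || echo "path contains a space"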

Configuring Cygwin

  1. run the Cygwin Bash Shell with Administrator privileges (C:\Programs\Cygwin\Cygwin.bat)
  2. from this Bash shell run ssh-host-config
    • say “yes” to privilege separation
    • say “yes” to create the sshd account
    • say “yes” to install sshd as a service
    • press Enter to leave the value of CYGWIN for the daemon empty
    • Cygwin now needs to create a new account that will be used as a “proxy”/setuid origin account. When asked whether you want to use a different name, say “no” to keep the default name (cyg_server).
    • say “yes” to create a new privileged account cyg_server.
    • create a password for this new privileged account and confirm it
  3. synchronize Windows user accounts with Cygwin user accounts:
    mkpasswd -cl > /etc/passwd
    mkgroup --local > /etc/group
    
  4. start SSH server with net start sshd
  5. test connection with ssh localhost from Cygwin Bash Shell.
    • say “yes” to check and store server fingerprint
    • enter your Windows account password to authenticate
    • issue a few test commands in the remote session
    • close session with exit.
  6. alternatively, test your SSH server with PuTTY.

Configuring HBase

  1. I assume that you have a Java JDK installed (if not, now is the time to install it). I also assume that Java is installed into a folder without spaces in its path (again, not C:\Program Files\Java). If your existing Java installation lives in a path that contains spaces, reinstall it now.
  2. Download HBase from the Apache site and unpack it into an appropriate folder. I assume this is C:\java\hbase.
  3. Open ./conf/hbase-env.sh in HBase directory
    • uncomment and modify this line so it reads:
      export JAVA_HOME=/cygdrive/c/java/jdk7
      
    • uncomment and modify this line so it reads:
      export HBASE_CLASSPATH=/cygdrive/c/java/hbase/lib/zookeeper-3.4.3.jar 
      
  4. Copy ./src/main/resources/hbase-default.xml to ./conf
  5. Open ./conf/hbase-default.xml in HBase directory
    • Change hbase.rootdir to /tmp
       This will resolve into C:\tmp on Windows. We will create it later.
    • Change hbase.tmp.dir to C:/programs/cygwin/root/tmp/hbase/tmp
       This also assumes that Cygwin is installed into C:\programs\cygwin.
    • If you have a computer that has no domain name, then determine your hostname: either by running hostname from shell or from System Properties | Computer Name tab. For example, my PC has hostname rn-PC.
    • Change hbase.zookeeper.quorum to rn-PC instead of localhost
       Windows 64-bit seems to have trouble resolving localhost to 127.0.0.1.
    • Change hbase.defaults.for.version.skip to true instead of false
       This will disable weird version warnings. We are actually running HBase from an “uncompiled” source tree, so some config files are left unprocessed. Although HBase is built with Maven, the build depends heavily on Linux tools and requires lots of hacking on Windows. Fortunately, building is not necessary.
  6. Create the appropriate directories. Execute this from Cygwin Bash Shell:
    mkdir -pv /root/tmp/hbase/data
    mkdir -pv /cygdrive/c/tmp
    
  7. Grant the appropriate rights
    chmod 777 /root/tmp/hbase/data
    chmod 777 /cygdrive/c/tmp
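
The property changes from step 5 can also be collected in one small override file instead of being edited into the full defaults file. The following is only a sketch under one assumption: that the overrides go into conf/hbase-site.xml, which is the usual place for site-specific HBase settings (this tutorial itself edits a copy of hbase-default.xml). The function name write_hbase_overrides and its two arguments are made up for illustration; the property names and values are the ones listed above.

```shell
# Hypothetical helper: write the four overrides from step 5 into an XML config file.
#   $1 = target file (assumption: conf/hbase-site.xml)
#   $2 = host name to use for the ZooKeeper quorum (e.g. the output of `hostname`)
write_hbase_overrides() {
    target=$1
    quorum_host=$2
    # Unquoted here-document so that $quorum_host is expanded into the XML.
    cat > "$target" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>/tmp</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>C:/programs/cygwin/root/tmp/hbase/tmp</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>$quorum_host</value>
  </property>
  <property>
    <name>hbase.defaults.for.version.skip</name>
    <value>true</value>
  </property>
</configuration>
EOF
}
```

Run from the HBase folder, it could be used like this:

    write_hbase_overrides conf/hbase-site.xml "$(hostname)"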
    

Running HBase

  1. Within Bash, change dir to
    cd /cygdrive/c/java/hbase
    
  2. Run
    ./bin/start-hbase.sh
    
  3. Enter your password twice (once for ZooKeeper and once for the region server) and HBase should start. On the first run, you may be asked to confirm the SSH host fingerprint; in that case, just confirm with “yes”. Ideally, the console should show:
    $ ./bin/start-hbase.sh
    rn@127.0.0.1's password:
    127.0.0.1: starting zookeeper, logging to /cygdrive/c/java/hbase/bin/../logs/hbase-rn-zookeeper-rn-PC.out
    starting master, logging to /cygdrive/c/java/hbase/bin/../logs/hbase-rn-master-rn-PC.out
    rn@localhost's password:
    localhost: starting regionserver, logging to /cygdrive/c/java/hbase/bin/../logs/hbase-rn-regionserver-rn-PC.out
    
  4. In case of failure, check the log files (see C:\java\hbase\logs).
  5. HBase can be stopped with
    ./bin/stop-hbase.sh
    
    Note that you should wait until the server has fully stopped (it may take a long time); otherwise you risk data corruption.
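
When a start fails, scanning the log folder by hand gets tedious. The following is a minimal sketch that prints only the suspicious lines. It assumes the daemons write log4j-style *.log files next to the *.out files shown above, and that each line carries its level as a separate word (ERROR or FATAL); the function name show_hbase_errors is made up for illustration.

```shell
# Hypothetical helper: print every ERROR or FATAL line from the *.log files
# in a log directory, prefixed with the name of the file it came from.
#   $1 = log directory (defaults to ./logs, relative to the HBase folder)
show_hbase_errors() {
    log_dir=${1:-./logs}
    # -H prints the file name before each hit, -E enables the (A|B) alternation.
    # grep exits with status 1 when nothing matches, so a clean log gives no output.
    grep -HE ' (ERROR|FATAL) ' "$log_dir"/*.log
}
```

Run from the HBase folder:

    show_hbase_errors ./logs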

Using HBase

  1. Start the HBase server:
    ./bin/start-hbase.sh
  2. Start Bash and start the HBase Shell:
    ./bin/hbase shell
    
  3. Create a simple table:
    create 'test', 'data'
    
  4. Verify that the table has been created
    list
    
  5. Insert some data:
    put 'test', 'row1', 'data:1', 'value1'
    
  6. List all rows in the table
    scan 'test'
    
  7. Optionally, drop table
    disable 'test'
    drop 'test'
    
  8. You can leave the HBase shell with exit.
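
The same session can be replayed without typing each command. The following is only a sketch, assuming the HBase shell accepts the path of a command file as its argument (./bin/hbase shell <file>); the function name write_hbase_smoke_script and the file name smoke_test.rb are made up for illustration. The commands themselves are exactly the ones from the steps above.

```shell
# Hypothetical helper: write the HBase shell commands from this section
# into a script file, one command per line, ending with exit.
#   $1 = output file (e.g. smoke_test.rb)
write_hbase_smoke_script() {
    # Quoted 'EOF' keeps the single quotes and colons exactly as written.
    cat > "$1" <<'EOF'
create 'test', 'data'
list
put 'test', 'row1', 'data:1', 'value1'
scan 'test'
disable 'test'
drop 'test'
exit
EOF
}
```

With the HBase server running, it could be used from the HBase folder like this:

    write_hbase_smoke_script smoke_test.rb
    ./bin/hbase shell smoke_test.rb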

Tuesday, October 2, 2012

Big Data Had Ooop!!!

If you're a beginner and planning to learn more about big data and Hadoop, below are a few tutorials that I recommend and found interesting:

Books :
Hadoop: The Definitive Guide

Links:
++ Beginner Hadoop ++ 
http://www.cloudera.com/protected/?resource=introduction-to-apache-mapreduce-and-hdfs

http://nosqltapes.com/video/understanding-mapreduce-with-mike-miller



++ MapReduce Framework ++ 
Great 1 hour video introduction: http://nosqltapes.com/video/understanding-mapreduce-with-mike-miller

Read the famous 2004 paper from Google that kicked off the MapReduce revolution. This is a very readable paper that can be digested in about 2 - 3 hours: http://research.google.com/archive/mapreduce.html

Here's a 33 minute video on what kinds of simple things you can do with MapReduce: 
http://www.cloudera.com/videos/mapreduce_algorithms

Google's MapReduce course: 
http://code.google.com/edu/parallel/mapreduce-tutorial.html


These are a few I have read. I will keep adding links here as and when I find a good article that catches my eye.

Thursday, April 12, 2012

MACRUBY--- It's just ruby mac.. :)

Recently, when I saw "MacRuby" on Hacker Shelf, I couldn't resist picking up the book. Have a look at the link below:
http://hackershelf.com/book/69/macruby/
  
When I started going through this book, I felt I was reading the Nu language details one more time... but it's different, trust me. Don't lose me here. I love to code in Ruby, and I have used RubyCocoa in the past. MacRuby is better: you get the best of Ruby 1.9 plus Objective-C, which makes it the best fit for native OS problems. Learning MacRuby does not require you to remember Objective-C. (I last saw Objective-C code in 2007, so my reaction was "Objective-C... I don't think I remember anything.")

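One of the best things about MacRuby is that it uses YARV (Yet Another Ruby VM), which is a bytecode interpreter. Performance is great on this VM compared with Matz's Ruby interpreter; the time difference is something like 16 sec versus 5 sec. (Strictly speaking, this applies to early releases; from version 0.5 on, MacRuby replaced YARV with its own LLVM-based VM.)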

MacRuby's thread implementation is way beyond Ruby's: its threads are native threads, and while it was impossible in Ruby to call back from pthreads, this can be done pretty easily in MacRuby.

So where does Objective-C pop in? :) When we look at the performance of Ruby's garbage collector, it is very slow, and MacRuby uses Objective-C to the rescue. The new Objective-C garbage collector engine, due to its generational nature, performs fast collections. It also doesn't stop the world while collecting memory, because collections are done in a separate thread.

In MacRuby, all Ruby classes and objects are actually Objective-C classes and objects. There is no need to create costly proxies, convert objects, and cache instances. A Ruby object can be cast (toll-free) at the C level as an Objective-C object. The Ruby VM can also handle incoming Objective-C objects without conversion.

In MacRuby, the primitive Ruby classes (e.g., String, Array, and Hash) have been re-implemented on top of their Cocoa equivalents (respectively, NSString, NSArray, and NSDictionary). As an example, all strings in MacRuby are Cocoa strings, so they can be passed directly to underlying C or Objective-C APIs. It is also possible to call any method of the String interface on any Cocoa string, subclass Objective-C methods, etc.

I still need to test it with audio codecs and image processing implementations.
I will shortly post my video aggregation for Twitter, written with MacRuby 0.10, to my Git account, so keep following it.
MacRuby is nothing but Ruby 1.9 with the Mac OS X frameworks on a Mac...

and it ROCKS!!!!