Friday, 25 January 2013

This blog is intended to give new users some guidelines for installing Hadoop on their local machines. It provides detailed installation steps. All the steps have been tested, but anyone can still reach out to me if they need assistance.

CDH3 Pseudo-Distributed Installation on Ubuntu (Single Node)

Apache Hadoop is an implementation of the MapReduce platform and a distributed file system (HDFS), written in Java. It can be considered a software framework that supports data-intensive distributed applications under a free license. In this blog I have tried to put together all the steps that will help you install Hadoop on your Windows machine by installing a virtual machine and then running Ubuntu inside it. Since Hadoop is written in Java, we will need the JDK (version 1.6 or above) installed. Let's get started.

0) Install VMware
a) Download VMware Workstation 8 from https://my.vmware.com/web/vmware/info/slug/desktop_end_user_computing/vmware_workstation/8_0
b) Install VMware-workstation-full-8.0.0-471780.
c) Provide the serial number when prompted.
d) During installation it will ask for the 32/64-bit image location; provide the path to the Ubuntu ISO (ubuntu-10.04.3-desktop-amd64, or the 32-bit edition).
Fig 1: Screen you see after the VM is installed.

1) Create a user other than hadoop
In the Ubuntu setup wizard, enter a user name such as Master (or your own name) and a password (e.g. 123456), then confirm both on the next page.
Fig 2: Ubuntu screen for user 'Master'. Similarly, create the Slave VM.

2) Install Java
a) Download jdk-6u30-linux-x64.bin and save it on your Ubuntu desktop.
b) Open a terminal (Ctrl+Alt+T).
c) Copy the file from the Desktop to /usr/local.
d) Extract the Java file (go to /usr/local, where you can see the .bin file):
   $ cd /usr/local
   $ ./jdk-6u30-linux-x64.bin
A new directory, jdk1.6.0_30/, will be generated.
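Before going further, it is worth confirming that the JDK really extracted where the later steps expect it. The sketch below is my own addition (not from the original post); the path assumes the jdk-6u30 archive from step 2, so adjust it if your version differs.

```shell
# Report whether a JDK directory contains a runnable java binary.
check_jdk() {
  jdk_dir="$1"
  if [ -x "$jdk_dir/bin/java" ]; then
    echo "JDK found at $jdk_dir"
  else
    echo "JDK not found at $jdk_dir"
  fi
}

# Path assumed from step 2 of this guide:
check_jdk /usr/local/jdk1.6.0_30
```

If the check fails, re-run the extraction from /usr/local before setting JAVA_HOME in step 5.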
Fig 3: Java installed.

3) Install the CDH3 package
Go to https://ccp.cloudera.com/display/CDHDOC/CDH3+Installation, click "Installing CDH3 on Ubuntu and Debian Systems", then click "this link for a Maverick system" on the CDH3 installation page. Install using the GDebi package installer, or issue the commands below. You will see cdh3-repository_1.0_all.deb get downloaded (keep it in the Downloads folder). Execute the commands below (as documented on the Cloudera site):
$ sudo dpkg -i Downloads/cdh3-repository_1.0_all.deb
$ sudo apt-get update

4) Install Hadoop
$ apt-cache search hadoop
$ sudo apt-get install hadoop-0.20 hadoop-0.20-native
To install all the daemons, run sudo apt-get install hadoop-0.20-<daemon type> for each daemon type:
$ sudo apt-get install hadoop-0.20-namenode
$ sudo apt-get install hadoop-0.20-datanode
$ sudo apt-get install hadoop-0.20-secondarynamenode
$ sudo apt-get install hadoop-0.20-jobtracker
$ sudo apt-get install hadoop-0.20-tasktracker

5) Set Java and Hadoop home
Edit your profile with: gedit ~/.bashrc

# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/lib/hadoop
export PATH=$PATH:/usr/lib/hadoop/bin
# Set JAVA_HOME
export JAVA_HOME=/usr/local/jdk1.6.0_30
export PATH=$PATH:/usr/local/jdk1.6.0_30/bin

Close all terminals, open a new one, and test JAVA_HOME and HADOOP_HOME.

6) Configuration
Set JAVA_HOME in ./conf/hadoop-env.sh:
$ sudo gedit hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.6.0_30

7) Test the Hadoop and Java versions
$ hadoop version
$ java -version
Fig 4: Verify Java and Hadoop versions.

8) Add the dedicated users to the hadoop group
$ sudo gpasswd -a hdfs hadoop
$ sudo gpasswd -a mapred hadoop

In steps 9, 10 and 11 we will configure Hadoop using three files under ./conf: core-site.xml, hdfs-site.xml and mapred-site.xml.

9) core-site.xml
Add the properties below to core-site.xml. core-site.xml contains configuration information that overrides the default values for core Hadoop properties.

<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/lib/hadoop/tmp</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:8020</value>
</property>

$ sudo mkdir /usr/lib/hadoop/tmp
$ sudo chmod 750 /usr/lib/hadoop/tmp
$ sudo chown hdfs:hadoop /usr/lib/hadoop/tmp

10) hdfs-site.xml
Add the properties below to hdfs-site.xml. Here we specify the permissions, storage directories and replication factor.

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/storage/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/storage/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

$ sudo mkdir /storage
$ sudo chmod 775 /storage
$ sudo chown hdfs:hadoop /storage

11) mapred-site.xml
Add the properties below to mapred-site.xml. It specifies MapReduce directories and parameters.
<property>
  <name>mapred.job.tracker</name>
  <value>hdfs://localhost:8021</value>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>/home/your user name here/mapred/system</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/home/your user name here/mapred/local</value>
</property>
<property>
  <name>mapred.temp.dir</name>
  <value>/home/your user name here/mapred/temp</value>
</property>

$ sudo mkdir /home/your user name here/mapred
$ sudo chmod 775 /home/your user name here/mapred
$ sudo chown mapred:hadoop /home/your user name here/mapred

12) User assignment
export HADOOP_NAMENODE_USER=hdfs
export HADOOP_SECONDARYNAMENODE_USER=hdfs
export HADOOP_DATANODE_USER=hdfs
export HADOOP_JOBTRACKER_USER=mapred
export HADOOP_TASKTRACKER_USER=mapred

13) Format the namenode
Go to the directory below and format:
$ cd /usr/lib/hadoop/bin/
$ sudo -u hdfs hadoop namenode -format

14) Start the daemons
$ sudo /etc/init.d/hadoop-0.20-namenode start
$ sudo /etc/init.d/hadoop-0.20-secondarynamenode start
$ sudo /etc/init.d/hadoop-0.20-jobtracker start
$ sudo /etc/init.d/hadoop-0.20-datanode start
$ sudo /etc/init.d/hadoop-0.20-tasktracker start
Check /var/log/hadoop-0.20 for errors from each daemon, and check that all ports are open using:
$ netstat -ptlen

15) Check the web UIs
localhost:50070 -> Hadoop Admin
localhost:50030 -> MapReduce
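One detail worth spelling out for the XML snippets in steps 9-11: in every Hadoop site file, the <property> elements must sit inside a single <configuration> root element, or Hadoop will not parse the file. The sketch below is my own addition; it writes an illustrative core-site.xml to /tmp (on the VM the real file lives under /usr/lib/hadoop/conf).

```shell
# Write a complete core-site.xml, showing the <configuration> wrapper
# around the <property> snippets from step 9.
cat > /tmp/core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/lib/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
EOF

grep -c '<property>' /tmp/core-site.xml   # prints 2
```

hdfs-site.xml and mapred-site.xml follow the same shape, with their own <property> blocks inside the <configuration> element.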
Fig 5: Hadoop Admin


Fig 6: MapReduce

WELCOME TO THE WORLD OF BIG DATA!
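As a closing check, the port listing from step 14 can be screened for the ports this guide uses: 8020/50070 for the NameNode and 8021/50030 for the JobTracker. The helper below is my own sketch (check_ports is not a real tool); it just scans netstat-style text, so here a canned listing stands in for real output.

```shell
# Scan netstat-style output for the ports used in this guide.
check_ports() {
  listing="$1"
  for port in 8020 50070 8021 50030; do
    if echo "$listing" | grep -q ":$port "; then
      echo "port $port: listening"
    else
      echo "port $port: NOT listening"
    fi
  done
}

# On the VM you would run: check_ports "$(netstat -ptlen 2>/dev/null)"
# Canned sample output, standing in for a machine with only the
# NameNode ports open:
check_ports "tcp  0  0 0.0.0.0:8020  0.0.0.0:* LISTEN
tcp  0  0 0.0.0.0:50070 0.0.0.0:* LISTEN"
```

Any port reported as NOT listening points at a daemon that failed to start; its log under /var/log/hadoop-0.20 is the place to look.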

Note: The contents of this blog are simply for learning purposes. This blog was created with beginners in mind. For more information, please visit the official Cloudera site: http://www.cloudera.com
