CDH3 Pseudo-Distributed Installation on Ubuntu (Single Node)
Apache Hadoop is an implementation of the MapReduce programming model together with a distributed file system (HDFS), written in Java. It can be considered a software framework that supports data-intensive distributed applications under a free license. In this blog I have put together all the steps that will help you install Hadoop on your Windows machine, by first installing a virtual machine and then running Ubuntu inside it. Since Hadoop is written in Java, we will need a JDK (version 1.6 or above) installed.
Let's get started...
0) Install VMware
a> Download VMware Workstation 8 --> https://my.vmware.com/web/vmware/info/slug/desktop_end_user_computing/vmware_workstation/8_0
b> Install VMware-workstation-full-8.0.0-471780 (click Enter through the installer).
c> Provide the serial number.
d> During installation it will ask for the location of the 32/64-bit OS image; provide the path to ubuntu-10.04.3-desktop-amd64 (or the 32-bit image).
Fig 1: Screen you see after the VM is installed.

1) Create a user other than hadoop
During Ubuntu setup, create a user, e.g. Master (or your name), with password 123456 (entered twice to confirm); on the next page, use Master (or your name) as the machine name as well.
Fig 2: Ubuntu screen 'Master'. Similarly, create the Slave VM.

2) Install Java
a> Download jdk-6u30-linux-x64.bin and save it to your Ubuntu Desktop.
b> Open a terminal (Ctrl+Alt+T).
c> Go to the Desktop and copy the file to /usr/local.
d> Extract the Java file (go to /usr/local, where you can see the .bin file):
$ ./jdk-6u30-linux-x64.bin
A new directory will be generated: jdk1.6.0_30/
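If you prefer to do all of step 2 from the terminal, here is a minimal sketch of the same sequence. It assumes the .bin file was saved to ~/Desktop; the chmod is an assumption, in case the downloaded file is not already executable.

$ cd ~/Desktop
$ sudo cp jdk-6u30-linux-x64.bin /usr/local    # copy the self-extracting installer
$ cd /usr/local
$ sudo chmod +x jdk-6u30-linux-x64.bin         # assumption: ensure it is executable
$ sudo ./jdk-6u30-linux-x64.bin                # extracts to /usr/local/jdk1.6.0_30/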
Fig 3: Java installed.

3) Install the CDH3 package
Go to: https://ccp.cloudera.com/display/CDHDOC/CDH3+Installation
Click on "Installing CDH3 on Ubuntu and Debian Systems", then click on "this link for a Maverick system" on the CDH3 installation page.
Install using the GDebi package installer, or issue the commands below. You will see cdh3-repository_1.0_all.deb get downloaded (keep it in the Downloads folder).
Execute the commands below (as documented on the Cloudera site):
$ sudo dpkg -i Downloads/cdh3-repository_1.0_all.deb
$ sudo apt-get update

4) Install Hadoop
$ apt-cache search hadoop
$ sudo apt-get install hadoop-0.20 hadoop-0.20-native
Each daemon is packaged as hadoop-0.20-<daemon type>; install all of the daemons:
$ sudo apt-get install hadoop-0.20-namenode
$ sudo apt-get install hadoop-0.20-datanode
$ sudo apt-get install hadoop-0.20-secondarynamenode
$ sudo apt-get install hadoop-0.20-jobtracker
$ sudo apt-get install hadoop-0.20-tasktracker

5) Set Java and Hadoop home
Edit your profile with: gedit ~/.bashrc
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/lib/hadoop
export PATH=$PATH:/usr/lib/hadoop/bin
# Set JAVA_HOME
export JAVA_HOME=/usr/local/jdk1.6.0_30
export PATH=$PATH:/usr/local/jdk1.6.0_30/bin
Close all terminals, open a new one, and test JAVA_HOME and HADOOP_HOME.

6) Configuration
Set JAVA_HOME in ./conf/hadoop-env.sh (under /usr/lib/hadoop):
$ sudo gedit hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.6.0_30

7) Test the Hadoop and Java versions
$ hadoop version
$ java -version
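To confirm that the step 5 environment variables took effect in the new terminal, a quick check like the following should work; the expected values simply restate the paths configured above.

$ echo $JAVA_HOME      # expect /usr/local/jdk1.6.0_30
$ echo $HADOOP_HOME    # expect /usr/lib/hadoop
$ which hadoop         # expect /usr/lib/hadoop/bin/hadoop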
Fig 4: Verify the Java and Hadoop versions.

8) Add the dedicated users to the hadoop group
$ sudo gpasswd -a hdfs hadoop
$ sudo gpasswd -a mapred hadoop
In steps 9, 10 and 11 we will configure Hadoop using three files -- core-site.xml, hdfs-site.xml and mapred-site.xml -- which are under ./conf.

9) core-site.xml
Add the snippet below to core-site.xml. core-site.xml contains configuration information that overrides the default values for core Hadoop properties.
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/lib/hadoop/tmp</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:8020</value>
</property>
Then create the temp directory and set its permissions (run the chmod/chown from /usr/lib/hadoop):
$ sudo mkdir /usr/lib/hadoop/tmp
$ sudo chmod 750 tmp/
$ sudo chown hdfs:hadoop tmp/

10) hdfs-site.xml
Add the snippet below to hdfs-site.xml. Here we specify the permissions, the storage directories and the replication factor.
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/storage/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/storage/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
$ sudo mkdir /storage
$ sudo chmod 775 /storage/
$ sudo chown hdfs:hadoop /storage/

11) mapred-site.xml
Add the snippet below to mapred-site.xml. It specifies the MapReduce directories and parameters (replace <your user name> with your actual user name).
<property>
  <name>mapred.job.tracker</name>
  <value>hdfs://localhost:8021</value>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>/home/<your user name>/mapred/system</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/home/<your user name>/mapred/local</value>
</property>
<property>
  <name>mapred.temp.dir</name>
  <value>/home/<your user name>/mapred/temp</value>
</property>
$ sudo mkdir /home/<your user name>/mapred
$ sudo chmod 775 /home/<your user name>/mapred
$ sudo chown mapred:hadoop /home/<your user name>/mapred

12) User assignment
Export the following (e.g. in ~/.bashrc):
export HADOOP_NAMENODE_USER=hdfs
export HADOOP_SECONDARYNAMENODE_USER=hdfs
export HADOOP_DATANODE_USER=hdfs
export HADOOP_JOBTRACKER_USER=mapred
export HADOOP_TASKTRACKER_USER=mapred

13) Format the namenode
Go to the directory below and format:
$ cd /usr/lib/hadoop/bin/
$ sudo -u hdfs hadoop namenode -format

14) Start the daemons
$ sudo /etc/init.d/hadoop-0.20-namenode start
$ sudo /etc/init.d/hadoop-0.20-secondarynamenode start
$ sudo /etc/init.d/hadoop-0.20-jobtracker start
$ sudo /etc/init.d/hadoop-0.20-datanode start
$ sudo /etc/init.d/hadoop-0.20-tasktracker start
Check for any errors in /var/log/hadoop-0.20 for each daemon, and check that all ports are open using:
$ netstat -ptlen

15) Check the UIs
localhost:50070 -> Hadoop Admin (NameNode)
localhost:50030 -> MapReduce (JobTracker)
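Once all five daemons are up, a short smoke test confirms HDFS and MapReduce work end to end. This is a minimal sketch: the examples jar path (/usr/lib/hadoop/hadoop-examples.jar) is an assumption and may differ slightly on your CDH3 install.

$ sudo -u hdfs hadoop fs -mkdir /smoketest/input               # create an input directory in HDFS
$ sudo -u hdfs hadoop fs -put /etc/hosts /smoketest/input      # upload a small sample file
$ sudo -u hdfs hadoop jar /usr/lib/hadoop/hadoop-examples.jar wordcount /smoketest/input /smoketest/output
$ sudo -u hdfs hadoop fs -cat '/smoketest/output/part-*'       # word counts for the sample file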
Fig 5: Hadoop Admin
Fig 6: MapReduce

WELCOME TO THE WORLD OF BIG DATA...
Note: The contents of this blog are purely for learning purposes. This blog was created with beginners in mind. For more information, please visit the official Cloudera site: http://www.cloudera.com