Deploying a distributed Hadoop 2.7.3 cluster on a single machine with Docker 17.03.1
[TOC]
Disclaimer: these articles are my own technical notes. Please credit the source when reposting:
[1] https://segmentfault.com/u/yzwall
[2] blog.csdn.net/j_dark/
PC: Ubuntu 16.04.1 LTS
Docker version: 17.03.1-ce, OS/Arch: linux/amd64
Hadoop version: hadoop-2.7.3
1 Configuring and building the Hadoop image in Docker
1.1 Creating the Docker container
Create a container named container from the ubuntu image. By default Docker pulls the latest minimal official ubuntu image.
    sudo docker run -ti --name container ubuntu
Edit the default sources file /etc/apt/sources.list and replace the official archive with a mirror located in China.
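The replacement can be scripted rather than done by hand. A minimal sketch, assuming the stock file points at archive.ubuntu.com and security.ubuntu.com; the helper name switch_apt_mirror is hypothetical, and mirrors.aliyun.com is only an example, since the original does not say which mirror was used:

```shell
# Rewrite the Ubuntu archive hosts in an apt sources file to a mirror.
# Assumptions: the file references archive.ubuntu.com / security.ubuntu.com,
# and the mirror serves the same /ubuntu/ path layout.
switch_apt_mirror() {
    file=$1      # e.g. /etc/apt/sources.list
    mirror=$2    # e.g. mirrors.aliyun.com (example only)
    cp "$file" "$file.bak"    # keep a backup of the original
    sed -e "s|//archive\.ubuntu\.com|//$mirror|g" \
        -e "s|//security\.ubuntu\.com|//$mirror|g" \
        "$file.bak" > "$file"
}

# Inside the container:
#   switch_apt_mirror /etc/apt/sources.list mirrors.aliyun.com
#   apt-get update
```

The backup copy at /etc/apt/sources.list.bak makes it easy to revert if the chosen mirror turns out to be unreachable.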
1.3 Installing Java 8
    # To stay small, the Docker image drops many stock Ubuntu components;
    # refresh the package index with `apt-get update` so they can be installed
    apt-get update
    apt-get install software-properties-common python-software-properties  # provides add-apt-repository
    add-apt-repository ppa:webupd8team/java
    apt-get update
    apt-get install oracle-java8-installer
    java -version
1.4 Installing hadoop-2.7.3 in Docker
1.4.1 Downloading hadoop-2.7.3
    # Create the nested directories
    mkdir -p /software/apache/hadoop
    cd /software/apache/hadoop
    # Download and extract Hadoop
    wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
    tar xvzf hadoop-2.7.3.tar.gz
1.4.2 Configuring environment variables
Edit ~/.bashrc and append the following settings to the end of the file:
    export JAVA_HOME=/usr/lib/jvm/java-8-oracle
    export HADOOP_HOME=/software/apache/hadoop/hadoop-2.7.3
    export HADOOP_CONFIG_HOME=$HADOOP_HOME/etc/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin
    export PATH=$PATH:$HADOOP_HOME/sbin
Run source ~/.bashrc to make the environment variables take effect.
Note: once ~/.bashrc has been configured, hadoop-env.sh needs no further configuration.
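A quick way to confirm that the variables took effect is to check that each one names an existing directory. The helper below is a sketch; check_dir_var is a hypothetical name, not part of Hadoop:

```shell
# Report whether a variable names an existing directory.
# Usage: check_dir_var NAME VALUE
check_dir_var() {
    if [ -n "$2" ] && [ -d "$2" ]; then
        echo "$1 ok: $2"
    else
        echo "$1 missing or not a directory: '$2'"
        return 1
    fi
}

# After `source ~/.bashrc`:
#   check_dir_var JAVA_HOME "$JAVA_HOME"
#   check_dir_var HADOOP_HOME "$HADOOP_HOME"
#   check_dir_var HADOOP_CONFIG_HOME "$HADOOP_CONFIG_HOME"
#   hadoop version    # resolves only if $HADOOP_HOME/bin is on PATH
```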
Configuring Hadoop mainly means editing four files: core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml.
Create the namenode, datanode and tmp directories under $HADOOP_HOME:
    cd $HADOOP_HOME
    mkdir tmp
    mkdir namenode
    mkdir datanode
1.5.1 Configuring core-site.xml
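The hadoop.tmp.dir property points to the tmp directory.
The fs.default.name property points to the master node and is set to hdfs://master:9000.
    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/software/apache/hadoop/hadoop-2.7.3/tmp</value>
        <description>A base for other temporary directories.</description>
      </property>
      <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
      </property>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
        <final>true</final>
        <description>The name of the default file system.</description>
      </property>
    </configuration>
1.5.2 Configuring hdfs-site.xml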
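dfs.replication is the number of replicas kept for each block (not the number of nodes). The cluster has 1 namenode and 3 datanodes, so the replication factor is set to 3.
dfs.namenode.name.dir and dfs.datanode.data.dir are set to the namenode and datanode directories created earlier.
    <configuration>
      <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:9001</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
        <final>true</final>
        <description>Default block replication.</description>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>/software/apache/hadoop/hadoop-2.7.3/namenode</value>
        <final>true</final>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>/software/apache/hadoop/hadoop-2.7.3/datanode</value>
        <final>true</final>
      </property>
      <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
      </property>
    </configuration>
1.5.3 Configuring mapred-site.xml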
In $HADOOP_CONFIG_HOME (that is, $HADOOP_HOME/etc/hadoop, where the template lives), create mapred-site.xml with cp:
    cd $HADOOP_CONFIG_HOME
    cp mapred-site.xml.template mapred-site.xml
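Configure mapred-site.xml. Guides written for Hadoop 1.x point the mapred.job.tracker property at the master node, but that no longer applies:
In Hadoop 2.x.x, users do not need to configure mapred.job.tracker, because the JobTracker no longer exists and its role is filled by the MRAppMaster component. Instead, mapreduce.framework.name must name the runtime framework, here yarn.
Source: 《Hadoop技术内幕:深入解析YARN架构设计与实现原理》 (Hadoop Internals: In-depth Analysis of YARN Architecture Design and Implementation)
    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
      </property>
    </configuration>
1.5.4 Configuring yarn-site.xml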
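    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
      </property>
      <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
      </property>
    </configuration>
1.5.5 Installing vim, ifconfig and ping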
Install vim and the packages that provide the ifconfig and ping commands:
    apt-get update
    apt-get install vim
    apt-get install net-tools       # for ifconfig
    apt-get install inetutils-ping  # for ping
1.5.6 Building the Hadoop base image
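Assuming the current container is named container, save it as the base image ubuntu:hadoop. All later Hadoop cluster containers are created and started from this image, so the configuration does not have to be repeated.
    sudo docker commit -m "hadoop installed" container ubuntu:hadoop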
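Because every cluster container inherits this image, it can be worth confirming inside the container, before committing, that Hadoop actually picks up the values from core-site.xml and hdfs-site.xml. A minimal sketch using `hdfs getconf -confKey`; the helper name check_conf is hypothetical:

```shell
# Compare one effective Hadoop setting with the value we expect.
# `hdfs getconf -confKey` reads core-site.xml and hdfs-site.xml,
# so this sketch does not cover the yarn-site.xml or mapred-site.xml keys.
check_conf() {
    key=$1
    expected=$2
    actual=$(hdfs getconf -confKey "$key")
    if [ "$actual" = "$expected" ]; then
        echo "ok   $key=$actual"
    else
        echo "FAIL $key=$actual (expected $expected)"
        return 1
    fi
}

# Inside the container, before `docker commit`:
#   check_conf fs.defaultFS hdfs://master:9000   # fs.default.name is the deprecated alias of this key
#   check_conf dfs.replication 3
```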
From the base image ubuntu:hadoop, create the master container and the slave1 to slave3 containers. Each container's hostname matches its container name.
Create master: docker run -ti -h master --name master ubuntu:hadoop /bin/bash
Create slave1: docker run -ti -h slave1 --name slave1 ubuntu:hadoop /bin/bash
Create slave2: docker run -ti -h slave2 --name slave2 ubuntu:hadoop /bin/bash
Create slave3: docker run -ti -h slave3 --name slave3 ubuntu:hadoop /bin/bash
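The four commands differ only in the name, so they can be wrapped in a loop. The sketch below is an assumption-laden variant, not the author's procedure: it adds -d so each container starts detached instead of occupying a terminal, and create_cluster is a hypothetical helper name.

```shell
# Start master and slave1..slaveN from the base image, detached (-d)
# so the loop does not block on an interactive shell.
# Assumption: the image is tagged ubuntu:hadoop, as in the commit step.
create_cluster() {
    slaves=$1    # number of slave containers
    docker run -tid -h master --name master ubuntu:hadoop /bin/bash
    i=1
    while [ "$i" -le "$slaves" ]; do
        docker run -tid -h "slave$i" --name "slave$i" ubuntu:hadoop /bin/bash
        i=$((i + 1))
    done
}

# On the host (prefix with sudo if your user is not in the docker group):
#   create_cluster 3
#   docker exec -ti master /bin/bash    # open a shell in a running container
```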
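Add the following entries to /etc/hosts in every container (address first, then hostname). Each container's IP address can be looked up with ifconfig:
    172.17.0.2 master
    172.17.0.3 slave1
    172.17.0.4 slave2
    172.17.0.5 slave3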
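Instead of running ifconfig in each container, the addresses can also be read from the host with `docker inspect`. Note that /etc/hosts expects the address first and the hostname second. A sketch, assuming the containers sit on Docker's default bridge network; hosts_lines is a hypothetical helper name:

```shell
# Print one "IP hostname" line per container, in /etc/hosts format.
# Assumption: containers use the default bridge network, so
# .NetworkSettings.IPAddress holds their address.
hosts_lines() {
    for name in "$@"; do
        ip=$(docker inspect -f '{{.NetworkSettings.IPAddress}}' "$name")
        echo "$ip $name"
    done
}

# On the host:
#   hosts_lines master slave1 slave2 slave3
# The output can be appended to /etc/hosts in a container, for example:
#   hosts_lines master slave1 slave2 slave3 | docker exec -i slave1 sh -c 'cat >> /etc/hosts'
```

Generating the lines this way also makes it cheap to redo them if the addresses change.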
Note: after a Docker container restarts, the hosts entries may be lost. I have not found a better approach yet, so for now I simply avoid restarting the containers often; otherwise the hosts file has to be configured again by hand.
Reference: http://dockone.io/question/400, which explains:
1. /etc/hosts, /etc/resolv.conf and /etc/hostname: inside a container these three files do not come from the image. They live under /var/lib/docker/containers/ and are mounted into the container when it starts. As a result, changes made to them inside the container do not end up in the container's top layer; they are written directly to those three physical files.
2. Why are the changes gone after a restart? Because Docker builds a fresh /etc/hosts every time it starts a container. And why does it do that? Because the container's IP address changes on restart, the old address in the hosts file is no longer valid, so the file has to be rewritten; otherwise it would contain stale data.
2.3 SSH configuration on the cluster nodes
2.3.1 All nodes: install ssh
    apt-get update
    apt-get install ssh
    apt-get install openssh-server
2.3.2 All nodes: generate a key pair
    # Generate a key with an empty passphrase; the key files are placed under ~/.ssh
    ssh-keygen -t rsa -P ""
2.3.3 Master node: create the authorized_keys file
Append the generated public key to authorized_keys:
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
2.3.4 All nodes: edit sshd_config
Edit sshd_config so that ssh can log in remotely as the root user on the other nodes:
    vim /etc/ssh/sshd_config
    # Change "PermitRootLogin prohibit-password" to "PermitRootLogin yes"
    # Restart the ssh service
    service ssh restart
2.3.5 Master node: copy authorized_keys to the slave nodes with scp
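Copy authorized_keys from the master node into ~/.ssh on each slave node, overwriting any file of the same name. Every node then holds the same authorized_keys, so master can log in to every node over ssh without a password (a login that starts from a slave would additionally need that slave's public key in authorized_keys).
    cd ~/.ssh
    scp authorized_keys root@slave1:~/.ssh/
    scp authorized_keys root@slave2:~/.ssh/
    scp authorized_keys root@slave3:~/.ssh/
2.3.6 Slave nodes: fix the file permissions so the keys take effect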
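    chmod 600 ~/.ssh/authorized_keys
Notes
Check whether the ssh service is running: ps -e | grep ssh
Start the ssh service: service ssh start
Restart the ssh service: service ssh restart
Once the steps from 2.3.1 onward are complete, the containers can reach one another over ssh.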
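Since the Hadoop start scripts log in from master to each slave over ssh, it helps to confirm passwordless login before starting the cluster. A sketch; check_ssh is a hypothetical helper, and it assumes the slaves are already in master's known_hosts (the scp step above normally takes care of that):

```shell
# Try a non-interactive login to each host. BatchMode makes ssh fail
# instead of prompting when key-based login is not set up.
check_ssh() {
    failed=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true; then
            echo "ok   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}

# On master:
#   check_ssh slave1 slave2 slave3
```

A FAIL line usually points back to one of the notes above: the ssh service not running on that node, or wrong permissions on authorized_keys.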
2.4 Configuring the master node
On the master node, edit the slaves file to list the slave nodes:
    cd $HADOOP_CONFIG_HOME/
    vim slaves
Replace its contents with:
    slave1
    slave2
    slave3
2.5 Starting the Hadoop cluster
On the master node, run hdfs namenode -format. A message like the following indicates that the namenode was formatted successfully:
    common.Storage: Storage directory /software/apache/hadoop/hadoop-2.7.3/namenode has been successfully formatted.
Run start-all.sh to start the cluster:
    root@master:/# start-all.sh
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Starting namenodes on [master]
    The authenticity of host 'master (172.17.0.2)' can't be established.
    ECDSA key fingerprint is SHA256:OewrSOYpvfDE6ixf6Gw9U7I9URT2zDCCtDJ6tjuZz/4.
    Are you sure you want to continue connecting (yes/no)? yes
    master: Warning: Permanently added 'master,172.17.0.2' (ECDSA) to the list of known hosts.
    master: starting namenode, logging to /software/apache/hadoop/hadoop-2.7.3/logs/hadoop-root-namenode-master.out
    slave3: starting datanode, logging to /software/apache/hadoop/hadoop-2.7.3/logs/hadoop-root-datanode-slave3.out
    slave2: starting datanode, logging to /software/apache/hadoop/hadoop-2.7.3/logs/hadoop-root-datanode-slave2.out
    slave1: starting datanode, logging to /software/apache/hadoop/hadoop-2.7.3/logs/hadoop-root-datanode-slave1.out
    Starting secondary namenodes [master]
    master: starting secondarynamenode, logging to /software/apache/hadoop/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-master.out
    starting yarn daemons
    starting resourcemanager, logging to /software/apache/hadoop/hadoop-2.7.3/logs/yarn-root-resourcemanager-master.out
    slave3: starting nodemanager, logging to /software/apache/hadoop/hadoop-2.7.3/logs/yarn-root-nodemanager-slave3.out
    slave1: starting nodemanager, logging to /software/apache/hadoop/hadoop-2.7.3/logs/yarn-root-nodemanager-slave1.out
    slave2: starting nodemanager, logging to /software/apache/hadoop/hadoop-2.7.3/logs/yarn-root-nodemanager-slave2.out
Run jps on the master node and on each slave node.
master:
    root@master:/# jps
    2065 Jps
    1446 NameNode
    1801 ResourceManager
    1641 SecondaryNameNode
slave1:
    1107 NodeManager
    1220 Jps
    1000 DataNode
slave2:
    241 DataNode
    475 Jps
    348 NodeManager
slave3:
    500 Jps
    388 NodeManager
    281 DataNode
3. Running wordcount
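Before submitting a job, it may be worth confirming that every node runs the daemons shown in the jps listings above: NameNode, SecondaryNameNode and ResourceManager on master, DataNode and NodeManager on each slave. A sketch that reads jps output from standard input; expect_daemons is a hypothetical helper name:

```shell
# Read `jps` output on stdin and verify that every expected daemon is listed.
# Usage: jps | expect_daemons NameNode ResourceManager SecondaryNameNode
expect_daemons() {
    running=$(awk '{print $2}')    # second column of jps output is the class name
    missing=0
    for daemon in "$@"; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode
        if ! printf '%s\n' "$running" | grep -qx "$daemon"; then
            echo "missing: $daemon"
            missing=1
        fi
    done
    return "$missing"
}

# On master:  jps | expect_daemons NameNode SecondaryNameNode ResourceManager
# On a slave: jps | expect_daemons DataNode NodeManager
```

If a daemon is reported missing, the matching .out and .log files under $HADOOP_HOME/logs on that node are the first place to look.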
Create the input directory /hadoopinput in HDFS and store the input file LICENSE.txt in it:
    root@master:/# hdfs dfs -mkdir -p /hadoopinput
    root@master:/# hdfs dfs -put LICENSE.txt /hadoopinput
Change to $HADOOP_HOME/share/hadoop/mapreduce and submit the wordcount job to the cluster, saving the result in the HDFS directory /hadoopoutput:
    root@master:/# cd $HADOOP_HOME/share/hadoop/mapreduce
    root@master:/software/apache/hadoop/hadoop-2.7.3/share/hadoop/mapreduce# hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /hadoopinput /hadoopoutput
    17/05/26 01:21:34 INFO client.RMProxy: Connecting to ResourceManager at master/172.17.0.2:8032
    17/05/26 01:21:35 INFO input.FileInputFormat: Total input paths to process : 1
    17/05/26 01:21:35 INFO mapreduce.JobSubmitter: number of splits:1
    17/05/26 01:21:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1495722519742_0001
    17/05/26 01:21:36 INFO impl.YarnClientImpl: Submitted application application_1495722519742_0001
    17/05/26 01:21:36 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1495722519742_0001/
    17/05/26 01:21:36 INFO mapreduce.Job: Running job: job_1495722519742_0001
    17/05/26 01:21:43 INFO mapreduce.Job: Job job_1495722519742_0001 running in uber mode : false
    17/05/26 01:21:43 INFO mapreduce.Job:  map 0% reduce 0%
    17/05/26 01:21:48 INFO mapreduce.Job:  map 100% reduce 0%
    17/05/26 01:21:54 INFO mapreduce.Job:  map 100% reduce 100%
    17/05/26 01:21:55 INFO mapreduce.Job: Job job_1495722519742_0001 completed successfully
    17/05/26 01:21:55 INFO mapreduce.Job: Counters: 49
        File System Counters
            FILE: Number of bytes read=29366
            FILE: Number of bytes written=295977
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=84961
            HDFS: Number of bytes written=22002
            HDFS: Number of read operations=6
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=2
        Job Counters
            Launched map tasks=1
            Launched reduce tasks=1
            Data-local map tasks=1
            Total time spent by all maps in occupied slots (ms)=2922
            Total time spent by all reduces in occupied slots (ms)=3148
            Total time spent by all map tasks (ms)=2922
            Total time spent by all reduce tasks (ms)=3148
            Total vcore-milliseconds taken by all map tasks=2922
            Total vcore-milliseconds taken by all reduce tasks=3148
            Total megabyte-milliseconds taken by all map tasks=2992128
            Total megabyte-milliseconds taken by all reduce tasks=3223552
        Map-Reduce Framework
            Map input records=1562
            Map output records=12371
            Map output bytes=132735
            Map output materialized bytes=29366
            Input split bytes=107
            Combine input records=12371
            Combine output records=1906
            Reduce input groups=1906
            Reduce shuffle bytes=29366
            Reduce input records=1906
            Reduce output records=1906
            Spilled Records=3812
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=78
            CPU time spent (ms)=1620
            Physical memory (bytes) snapshot=451264512
            Virtual memory (bytes) snapshot=3915927552
            Total committed heap usage (bytes)=348127232
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters
            Bytes Read=84854
        File Output Format Counters
            Bytes Written=22002
The result is saved in /hadoopoutput/part-r-00000. To view it:
    root@master:/# hdfs dfs -ls /hadoopoutput
    Found 2 items
    -rw-r--r-- 3 root supergroup 0 2017-05-26 01:21 /hadoopoutput/_SUCCESS
    -rw-r--r-- 3 root supergroup 22002 2017-05-26 01:21 /hadoopoutput/part-r-00000
    root@master:/# hdfs dfs -cat /hadoopoutput/part-r-00000
    ""AS 2
    "AS 16
    "COPYRIGHTS 1
    "Contribution" 2
    "Contributor" 2
    "Derivative 1
    "Legal 1
    "License" 1
    "License"); 1
    "Licensed 1
    "Licensor" 1
    ...
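The raw listing is sorted by word, which makes it hard to see which words dominate. Since wordcount writes one word, a tab, and its count per line, the output can be piped through sort to rank the words. A sketch; top_words is a hypothetical helper name:

```shell
# Sort wordcount output (word<TAB>count per line, read from stdin)
# by count, highest first, and keep the first N lines.
top_words() {
    sort -t "$(printf '\t')" -k2,2nr | head -n "$1"
}

# On master:
#   hdfs dfs -cat /hadoopoutput/part-r-00000 | top_words 10
# To rerun the job, remove the old output directory first, because
# MapReduce refuses to write into an existing one:
#   hdfs dfs -rm -r /hadoopoutput
```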
With that, the Hadoop 2.7.3 cluster on a single machine with Docker 17.03 has been deployed successfully!
References
[1] http://tashan10.com/yong-dockerda-jian-hadoopwei-fen-bu-shi-ji-qun/
[2] http://blog.csdn.net/xiaoxiangzi222/article/details/52757168