Hadoop 2.4.1 Deployment + Complete Configuration Files (2)
Entered by: 李隆权    Editor: quan    Author: Anonymous    Source: linuxidc

6. Create fairscheduler.xml

<?xml version="1.0"?>
<allocations>
<!--
  <queue name="hadooptest">
      <minResources>1024 mb, 1 vcores</minResources>
      <maxResources>2048 mb, 2 vcores</maxResources>
      <maxRunningApps>10</maxRunningApps>
      <weight>2.0</weight>
      <schedulingMode>fair</schedulingMode>
      <aclAdministerApps> hadooptest</aclAdministerApps>
      <aclSubmitApps> hadooptest</aclSubmitApps>
  </queue>
  <queue name="hadoopdev">
      <minResources>1024 mb, 2 vcores</minResources>
      <maxResources>2048 mb, 4 vcores</maxResources>
      <maxRunningApps>20</maxRunningApps>
      <weight>2.0</weight>
      <schedulingMode>fair</schedulingMode>
      <aclAdministerApps> hadoopdev</aclAdministerApps>
      <aclSubmitApps> hadoopdev</aclSubmitApps>
  </queue>
-->
  <user name="yarn">
    <maxRunningApps>30</maxRunningApps>
  </user>
</allocations>
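
For the FairScheduler to actually read this allocation file, yarn-site.xml has to point at it. That was presumably configured in part (1) of this series; the snippet below is only a hedged reminder, and the file path is an assumption based on the install directory used elsewhere in this article:

<property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>/home/yarn/Hadoop/hadoop-2.4.1/etc/hadoop/fairscheduler.xml</value>
</property>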

 


--------------------------------------------------------------------------------
7. mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0 (the "License"); you 
    may not use this file except in compliance with the License. You may obtain 
    a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless 
    required by applicable law or agreed to in writing, software distributed 
    under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES 
    OR CONDITIONS OF ANY KIND, either express or implied. See the License for 
    the specific language governing permissions and limitations under the License. 
    See accompanying LICENSE file. -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- MapReduce Applications related configuration ***BEGIN*** -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <description>Execution framework set to Hadoop YARN.</description>
    </property>
    
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>1024</value>
        <description>Larger resource limit for maps.</description>
    </property>
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx1024M</value>
        <description>Larger heap-size for child jvms of maps.</description>
    </property>
    
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>1024</value>
        <description>Larger resource limit for reduces.</description>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx1024M</value>
        <description>Larger heap-size for child jvms of reduces.</description>
    </property>
    
    <property>
        <name>mapreduce.task.io.sort.mb</name>
        <value>1024</value>
        <description>Higher memory-limit while sorting data for efficiency.</description>
    </property>
    <property>
        <name>mapreduce.task.io.sort.factor</name>
        <value>10</value>
        <description>More streams merged at once while sorting files.</description>
    </property>
    
    <property>
        <name>mapreduce.reduce.shuffle.parallelcopies</name>
        <value>20</value>
        <description>Higher number of parallel copies run by reduces to fetch outputs from very large number of maps.</description>
    </property>
    <!-- MapReduce Applications related configuration ***END*** -->


    <!-- MapReduce JobHistory Server related configuration ***BEGIN*** -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>slave1:10020</value>
        <description>MapReduce JobHistory Server host:port.    Default port is 10020.</description>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>slave1:19888</value>
        <description>MapReduce JobHistory Server Web UI host:port. Default port is 19888.</description>
    </property>
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/home/yarn/Hadoop/hadoop-2.4.1/mr_history/tmp</value>
        <description>Directory where history files are written by MapReduce jobs.</description>
    </property>
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/home/yarn/Hadoop/hadoop-2.4.1/mr_history/done</value>
        <description>Directory where history files are managed by the MR JobHistory Server.</description>
    </property>
    <!-- MapReduce JobHistory Server related configuration ***END*** -->
</configuration>
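
Once the cluster is running (see section 9 below), these MapReduce settings can be exercised with the bundled example job. This is only a hedged sketch: the jar location assumes the standard 2.4.1 binary layout under the install directory used in this post, and the -D override is shown purely to illustrate overriding the defaults above for a single job.

cd /home/yarn/Hadoop/hadoop-2.4.1
# Run the pi example: 10 map tasks, 100 samples each
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar pi 10 100
# The same job, requesting a larger map container just for this run
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar pi -Dmapreduce.map.memory.mb=2048 10 100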

 

--------------------------------------------------------------------------------

8. slaves

slave1
slave2
slave3

--------------------------------------------------------------------------------
That covers all of the configuration files. Finish editing them on one node, then:
scp the relevant directories to every machine

Update the environment variables on every machine: add the new HADOOP_HOME and comment out (#) the old HADOOP_HOME
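
A hedged sketch of those two steps, using the host names from the slaves file above; the install path and the use of ~/.bashrc are assumptions about this particular cluster:

# Copy the configured installation from this node to the other machines
for host in slave1 slave2 slave3; do
  scp -r /home/yarn/Hadoop/hadoop-2.4.1 yarn@${host}:/home/yarn/Hadoop/
done

# On every machine, point HADOOP_HOME at the new release in ~/.bashrc (or your shell profile)
# export HADOOP_HOME=/path/to/old/hadoop        # old entry, commented out
export HADOOP_HOME=/home/yarn/Hadoop/hadoop-2.4.1
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH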


--------------------------------------------------------------------------------

9. Start the cluster

(1) Start ZK
Run the following command on every ZooKeeper (ZK) node:
zkServer.sh start


Check each ZK node's role (leader or follower):
yarn@master:~$ zkServer.sh status
JMX enabled by default
Using config: /home/yarn/Zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower


yarn@slave1:~$ zkServer.sh status
JMX enabled by default
Using config: /home/yarn/Zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower


yarn@slave2:~$ zkServer.sh status
JMX enabled by default
Using config: /home/yarn/Zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader


Note:
Which ZK node becomes the leader is not deterministic: in our first run slave2 became the leader, and in the second run slave1 did!


At this point the ZK process can be seen on every node:
yarn@master:~$ jps
3084 QuorumPeerMain
3212 Jps


(2) Format ZK (only needed the first time)
Run on any ZK node:
hdfs zkfc -formatZK
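
To confirm the format worked, the HA znode can be inspected from any ZK node. This is a hedged check: /hadoop-ha is the default parent znode, and the expected child entry is the HDFS nameservice ID configured in hdfs-site.xml in part (1).

zkCli.sh -server master:2181
# inside the ZooKeeper shell:
ls /hadoop-ha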


(3) Start the ZKFC
The ZooKeeperFailoverController monitors NameNode (NN) state and coordinates active/standby NN failover, so it only needs to be started on the active and standby NN nodes:
hadoop-daemon.sh start zkfc


After starting it, the ZKFC process is visible:
yarn@master:~$ jps
3084 QuorumPeerMain
3292 Jps
3247 DFSZKFailoverController


(4) Start the JournalNodes, the shared storage that keeps metadata in sync between the active and standby NNs
Refer to the role assignment table and start one on each JN node:
hadoop-daemon.sh start journalnode


After starting, the JournalNode process is visible on every JN node:
yarn@master:~$ jps
3084 QuorumPeerMain
3358 Jps
3325 JournalNode
3247 DFSZKFailoverController


(5) Format and start the active NN
Format:
hdfs namenode -format

Note: formatting is only needed the very first time the system is started. Do not format again!


Start the NN on the active NN node:
hadoop-daemon.sh start namenode


After starting, the NN process is visible:
yarn@master:~$ jps
3084 QuorumPeerMain
3480 Jps
3325 JournalNode
3411 NameNode
3247 DFSZKFailoverController 


(6) Synchronize the active NN's metadata onto the standby NN
hdfs namenode -bootstrapStandby


The tail of the log from a successful run:
Re-format filesystem in Storage Directory /home/yarn/Hadoop/hdfs2.0/name ? (Y or N) Y
14/06/15 10:09:08 INFO common.Storage: Storage directory /home/yarn/Hadoop/hdfs2.0/name has been successfully formatted.
14/06/15 10:09:09 INFO namenode.TransferFsImage: Opening connection to http://master:50070/getimage?getimage=1&txid=935&storageInfo=-47:564636372:0:CID-d899b10e-10c9-4851-b60d-3e158e322a62
14/06/15 10:09:09 INFO namenode.TransferFsImage: Transfer took 0.11s at 63.64 KB/s
14/06/15 10:09:09 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000935 size 7545 bytes.
14/06/15 10:09:09 INFO util.ExitUtil: Exiting with status 0
14/06/15 10:09:09 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at slave1/192.168.66.92
************************************************************/


(7) Start the standby NN
Run on the standby NN:
hadoop-daemon.sh start namenode


(8) Designate the active NN (this step can be skipped; it is the manual-failover procedure, and ZK has already elected one node as the active NN)
Up to this point HDFS does not actually know which NN is the active one; on the monitoring pages, both NNs show up as Standby.
To activate the primary, run the following command on the active NN node:
hdfs haadmin -transitionToActive nn1
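
Whether step (8) is run or skipped, the resulting HA state can be confirmed with haadmin; nn1 and nn2 are assumed to be the NameNode IDs defined in hdfs-site.xml in part (1):

hdfs haadmin -getServiceState nn1    # expected: active
hdfs haadmin -getServiceState nn2    # expected: standby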


(9) Start the DataNodes from the active NN
On [nn1], start all DataNodes:
hadoop-daemons.sh start datanode
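
A quick, hedged way to confirm that all three DataNodes registered with the active NN:

hdfs dfsadmin -report    # the summary should show three live DataNodes (slave1, slave2, slave3)
# on each slave, jps should now also list a DataNode process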


(10) Start YARN

Run on the node that hosts the ResourceManager (starting YARN really is convenient!):
start-yarn.sh
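
A hedged check that the ResourceManager and the NodeManagers on the slaves all came up:

yarn node -list    # should list one NodeManager per entry in the slaves file
jps                # on this node, should now include a ResourceManager process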


(11) On slave1, which runs the MR JobHistory Server (MRJS), start it with:
mr-jobhistory-daemon.sh start historyserver
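
The history server can be verified with jps and by probing the web UI port configured in mapred-site.xml above (curl is assumed to be available on slave1):

jps    # on slave1, should now include a JobHistoryServer process
curl -s http://slave1:19888/ >/dev/null && echo "JobHistory web UI is reachable"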


At this point startup is complete, and the 2.4.1 web UI can be viewed:

(Screenshot: Hadoop 2.4.1 trial deployment + complete configuration files)

--------------------------------------------------------------------------------
10. Stop the cluster
On master, the node hosting the RM and NN, run:
Stop YARN:
stop-yarn.sh


Stop HDFS:
stop-dfs.sh


Stop ZooKeeper:
zkServer.sh stop 

On slave1, which runs the JobHistoryServer:
Stop the JobHistoryServer:
mr-jobhistory-daemon.sh stop historyserver
