Setting Up a Hadoop Cluster

2019-01-23


Create the virtual machines with VMware. I am using Red Hat Linux 7.4, and the firewall is disabled on all of the virtual machines.

(screenshot: hosts file)

To check the IP address on Linux, run: ifconfig -a

(screenshot: ifconfig -a output)

(screenshot: VMware NAT network settings)

2. Configure the virtual machine network settings, as shown below:

(screenshot: virtual machine network settings)

3. Modify the virtual machine's network configuration file: vim /etc/sysconfig/network-scripts/ifcfg-enp0s3. After editing, it looks like this:

(screenshot: ifcfg-enp0s3)
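The screenshot is not reproduced here. For reference, a static-IP configuration for this setup would look roughly like the following sketch; the IP address matches the one used later in this post, while the gateway, netmask, and DNS values are assumptions that depend on your VMware NAT settings:

TYPE=Ethernet
BOOTPROTO=static
NAME=enp0s3
DEVICE=enp0s3
ONBOOT=yes
IPADDR=192.168.121.111
NETMASK=255.255.255.0
GATEWAY=192.168.121.2
DNS1=192.168.121.2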

Installing and Configuring Hadoop

Creating the directories

For easier management, create the following directories on redHat1 for the HDFS NameNode, DataNode, and temporary files:

/data/hdfs/name

/data/hdfs/data

/data/hdfs/tmp

Then copy these directories to the same locations on redHat2 and redHat3 with scp, for example:
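This is a sketch; it assumes the redHat2 and redHat3 hostnames resolve and that the copy is done as root over SSH:

mkdir -p /data/hdfs/name /data/hdfs/data /data/hdfs/tmp   # run on redHat1 first
scp -r /data/hdfs root@redHat2:/data/
scp -r /data/hdfs root@redHat3:/data/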

Download

First download Hadoop from the Apache website, choosing one of the recommended download mirrors. I use hadoop-2.7.1 and download it to the /data directory on redHat1 with the following command:

wget http://archive.apache.org/dist/hadoop/core/hadoop-2.7.1/hadoop-2.7.1.tar.gz

Then extract hadoop-2.7.1.tar.gz into the /data directory:

tar -zxvf hadoop-2.7.1.tar.gz

Configure the environment variables

Back in the /data directory, configure the Hadoop environment variables:

vim /etc/profile

Add the following to /etc/profile:

(screenshot: HADOOP_HOME lines in /etc/profile)
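The screenshot is not reproduced here; the added lines look roughly like this, assuming Hadoop was extracted to /data/hadoop-2.7.1 as above:

export HADOOP_HOME=/data/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin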

To make the new environment variables take effect immediately, run:

source /etc/profile

Run the hadoop command again; if it now prints usage hints, the configuration has taken effect.

hadoop

It displays output like the following:

(screenshot: hadoop command usage output)

Configuring Hadoop

Enter the hadoop-2.7.1 configuration directory:

cd /data/hadoop-2.7.1/etc/hadoop

Modify core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and the slaves file in turn.

(screenshot: configuration files under etc/hadoop)

Modify core-site.xml

<configuration>
 <property>
   <name>hadoop.tmp.dir</name>
   <value>file:/data/hdfs/tmp</value>
   <description>A base for other temporary directories.</description>
 </property>
 <property>
   <name>io.file.buffer.size</name>
   <value>131072</value>
 </property>
 <property>
   <name>fs.default.name</name>
   <value>hdfs://bigdata111:9000</value>
 </property>
 <property>
   <name>hadoop.proxyuser.root.hosts</name>
   <value>*</value>
 </property>
 <property>
   <name>hadoop.proxyuser.root.groups</name>
   <value>*</value>
 </property>
</configuration>

Note: the value of hadoop.tmp.dir must point to the corresponding directory created earlier.

Modify hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <!--
   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at
 
     http://www.apache.org/licenses/LICENSE-2.0
 
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License. See accompanying LICENSE file.
 -->
 
 <!-- Put site-specific property overrides in this file. -->
 
<configuration>
 <property>
   <name>dfs.replication</name>
   <value>2</value>
 </property>
 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/data/hdfs/name</value>
   <final>true</final>
 </property>
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/data/hdfs/data</value>
   <final>true</final>
 </property>
 <property>
   <name>dfs.namenode.secondary.http-address</name>
   <value>bigdata111:9001</value>
 </property>
 <property>
   <name>dfs.webhdfs.enabled</name>
   <value>true</value>
 </property>
 <property>
   <name>dfs.permissions</name>
   <value>false</value>
 </property>
</configuration>

Note: the values of dfs.namenode.name.dir and dfs.datanode.data.dir must point to the corresponding directories created earlier.

Modify yarn-site.xml

 <?xml version="1.0"?>
  <!--
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at
  
      http://www.apache.org/licenses/LICENSE-2.0
  
    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License. See accompanying LICENSE file.
  -->
 <configuration>
 
 <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>bigdata111:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>bigdata111:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>bigdata111:18088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>bigdata111:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>bigdata111:18141</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
 </configuration>
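mapred-site.xml, listed earlier among the files to modify, is not shown above. A minimal sketch for Hadoop 2.7.1 that simply tells MapReduce to run on YARN would be the following (Hadoop 2.7.x ships only mapred-site.xml.template, so the file has to be created first):

cat > /data/hadoop-2.7.1/etc/hadoop/mapred-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

The slaves file in the same directory simply lists the worker hostnames, one per line.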

Running Hadoop

First format the NameNode:

hadoop namenode -format

Then start all of the daemons from the sbin directory:

sh ./start-all.sh

(screenshot: start-all.sh output)

Check the cluster status in the ResourceManager web UI:

http://192.168.121.111:8088/cluster/scheduler

(Note that with yarn.resourcemanager.webapp.address set to bigdata111:18088 as above, the web UI is served on port 18088 rather than the default 8088.)

jps
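If everything started correctly, jps on the NameNode host should list daemons such as NameNode, SecondaryNameNode, and ResourceManager, and the worker nodes should show DataNode and NodeManager. If one of them is missing, check the corresponding log file under /data/hadoop-2.7.1/logs.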

Test YARN

(screenshot: YARN test)

Test HDFS

http://192.168.121.111:50070/dfshealth.html#tab-overview

(screenshot: HDFS NameNode web UI)

You can also check the cluster from the command line:

/data/hadoop-2.7.1/bin/hdfs dfsadmin -report

Problems encountered while configuring and running Hadoop

When starting up, the following error was reported:

(screenshot: startup error about JAVA_HOME)

In that case, add the JAVA_HOME path to /data/hadoop-2.7.1/etc/hadoop/hadoop-env.sh.

(screenshot: JAVA_HOME line in hadoop-env.sh)
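The line to add looks roughly like this; the JDK path is only an example, so use the path of your own installation:

export JAVA_HOME=/usr/java/jdk1.8.0_144   # in /data/hadoop-2.7.1/etc/hadoop/hadoop-env.sh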

Another problem: the DataNode is not accepted by the NameNode. Cause: the data files under the local dfs.data.dir directory are inconsistent with what the NameNode knows (this typically happens after the NameNode has been reformatted).

Solution (a command sketch follows this list):

1. Delete all files under the dfs.namenode.name.dir and dfs.datanode.data.dir directories.

2. Fix /etc/hosts:

   cat /etc/hosts
   127.0.0.1        localhost localhost.localdomain localhost4 localhost4.localdomain4
   ::1              localhost localhost.localdomain localhost6 localhost6.localdomain6
   192.168.121.111  bigdata111
   192.168.121.111  localhost

3. Reformat the NameNode: bin/hadoop namenode -format

4. Start the cluster again.
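A command sketch of these four steps, using the directories and paths from earlier in this post (the cleanup has to be done on every node, the format only on the NameNode host):

rm -rf /data/hdfs/name/* /data/hdfs/data/*          # step 1: clear old NameNode metadata and DataNode blocks
vim /etc/hosts                                      # step 2: fix the host entries as shown above
/data/hadoop-2.7.1/bin/hadoop namenode -format      # step 3: reformat the NameNode
/data/hadoop-2.7.1/sbin/start-all.sh                # step 4: start the cluster again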

Restarting

Start the cluster:

sbin/start-all.sh

Start the job history server:

sbin/mr-jobhistory-daemon.sh start historyserver

Start the HiveServer2 service

nohup hive --service metastore >/dev/null 2>&1 &   ## the Hive metastore service must be started first

Then start the HiveServer2 service itself:

nohup hive --service hiveserver2 >/dev/null 2>&1 &   ## then start HiveServer2

Start the Spark cluster

${SPARK_HOME}/sbin/start-all.sh
${SPARK_HOME}/sbin/start-thriftserver.sh --master yarn   ## start as the spark user, i.e. in the designated spark queue

Stop

${SPARK_HOME}/sbin/stop-thriftserver.sh
${SPARK_HOME}/sbin/stop-all.sh
${HADOOP_HOME}/sbin/mr-jobhistory-daemon.sh stop historyserver
${HADOOP_HOME}/sbin/stop-all.sh

Linux accounts: hadoop/hadoop, root/password

hive/beeline

Example: start beeline, then connect with !connect jdbc:hive2://bigdata111:10000 and log in with hadoop/hadoop.

mysql

Database accounts: root/root1234 and hive/hive1234. Login format: mysql -uroot -proot1234

Finally, a few notes on commands and issues that came up while running the cluster:

Commonly used Linux / HDFS commands

hdfs dfs -ls /   # list the root directory of HDFS

rm -rf *   # delete everything in the current directory

rm -rf current/   # delete the current/ directory and everything under it

hdfs namenode -format   # format the NameNode

ls -ltr   # list files sorted by modification time, newest last

mv <old_name> <new_name>   # rename or move a file

Loading data into a Hive table

load data [local] inpath 'filepath' [overwrite] into table tablename [partition(partcol1=val1, partcol2=val2)]

Note that when loading data this way, the table must be stored as TEXTFILE; otherwise an error occurs.
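For example, a hypothetical load through beeline using the connection details above (the table and file names are made up for illustration):

beeline -u jdbc:hive2://bigdata111:10000 -n hadoop -p hadoop \
  -e "LOAD DATA LOCAL INPATH '/data/logs/2019-01-23.txt' OVERWRITE INTO TABLE access_log PARTITION (dt='2019-01-23');"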
