2019-01-23
Every virtual machine gets the default hostname linux, so to tell the VMs apart we rename each one after creating it; the VM used as the master node is named bigdata111 (pick names as you like).
Command to change the hostname: sudo gedit /etc/hostname
So that the VMs can ping each other by name, map each hostname to its IP address; command: sudo gedit /etc/hosts
To check a machine's IP address on Linux, run: ifconfig -a
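For example, each node's /etc/hosts should carry one line per cluster machine; the master's entry (using the address that appears later in this post) looks like:
192.168.121.111 bigdata111
with one similar line for each worker.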
2. Configure the VM's network settings (screenshot omitted).
3. Edit the VM's network configuration file: vim /etc/sysconfig/network-scripts/ifcfg-enp0s3 (the finished file was shown as a screenshot; a sketch follows).
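A minimal static-IP sketch for ifcfg-enp0s3 on CentOS/RHEL 7; the IPADDR matches the address used later in this post, while the gateway and DNS values are assumptions to adapt to your own network:
TYPE=Ethernet
BOOTPROTO=static
NAME=enp0s3
DEVICE=enp0s3
ONBOOT=yes
IPADDR=192.168.121.111
NETMASK=255.255.255.0
GATEWAY=192.168.121.1
DNS1=192.168.121.1
After editing, restart the network service: systemctl restart network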
To keep things organized, create directories on redHat1 (the master) for the HDFS NameNode, DataNode, and temporary files:
/data/hdfs/name
/data/hdfs/data
/data/hdfs/tmp
Then copy these directories to the same paths on redHat2 and redHat3 with scp.
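For example, creating the directories on redHat1 and pushing them to the other two nodes (this assumes ssh access and that redHat2/redHat3 resolve to the right machines):
mkdir -p /data/hdfs/name /data/hdfs/data /data/hdfs/tmp
scp -r /data/hdfs redHat2:/data/
scp -r /data/hdfs redHat3:/data/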
First, download Hadoop from the Apache site, picking one of the suggested mirrors. I chose hadoop-2.7.1 and downloaded it to the /data directory on redHat1 with:
wget http://archive.apache.org/dist/hadoop/core/hadoop-2.7.1/hadoop-2.7.1.tar.gz
Then extract hadoop-2.7.1.tar.gz into the /data directory:
tar -zxvf hadoop-2.7.1.tar.gz
### Configure environment variables
Back in the /data directory, configure the Hadoop environment variables:
vim /etc/profile
Add the Hadoop variables to /etc/profile.
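The exact lines aren't shown above; a typical sketch, assuming Hadoop was extracted to /data/hadoop-2.7.1:
export HADOOP_HOME=/data/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin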
Make the Hadoop environment variables take effect immediately:
source /etc/profile
Running the hadoop command now prints its usage help, which confirms the configuration took effect:
hadoop
Go to the hadoop-2.7.1 configuration directory:
cd /data/hadoop-2.7.1/etc/hadoop
Edit core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and the slaves file in turn (a sketch of the last two follows the yarn-site.xml listing). First, core-site.xml:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/data/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://bigdata111:9000</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>
Note: the value of hadoop.tmp.dir must point to the directory created earlier (fs.default.name is the old name of fs.defaultFS; Hadoop 2.7.x still accepts it). Next, hdfs-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/data/hdfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/data/hdfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>bigdata111:9001</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
Note: the values of dfs.namenode.name.dir and dfs.datanode.data.dir must point to the directories created earlier. Then yarn-site.xml:
<?xml version="1.0"?>
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>bigdata111:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>bigdata111:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>bigdata111:18088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>bigdata111:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>bigdata111:18141</value>
  </property>
  <property>
    <!-- must be mapreduce_shuffle (underscore) on Hadoop 2.x; a dotted name is rejected by the NodeManager -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
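The post lists mapred-site.xml and slaves among the files to edit but doesn't show their contents. A minimal sketch: mapred-site.xml only needs to tell MapReduce to run on YARN (create it from the bundled template first), and slaves lists the worker hostnames one per line (the two names below stand in for whatever redHat2 and redHat3 are actually called):

cp mapred-site.xml.template mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

slaves:

redHat2
redHat3

Whatever you put in these files, the whole etc/hadoop directory must end up identical on all three machines (scp works here, just as for the HDFS directories above).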
Before the first start, format the NameNode:
hadoop namenode -format
Then start all the daemons (the script is in /data/hadoop-2.7.1/sbin):
sh ./start-all.sh
Check the cluster status:
YARN ResourceManager UI (port 18088 as configured in yarn-site.xml above): http://192.168.121.111:18088/cluster/scheduler
HDFS NameNode UI: http://192.168.121.111:50070/dfshealth.html#tab-overview
Or from the command line:
/data/hadoop-2.7.1/bin/hdfs dfsadmin -report
If startup fails with an error like "Error: JAVA_HOME is not set and could not be found.", add the JAVA_HOME path to /data/hadoop-2.7.1/etc/hadoop/hadoop-env.sh.
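For example, setting it explicitly in hadoop-env.sh (the JDK path below is an assumption; substitute your actual installation, e.g. the value of echo $JAVA_HOME):
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk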
Another startup problem: DataNodes are not accepted by the NameNode. Cause: the data files under the local dfs.data.dir are inconsistent with what the NameNode knows (typically mismatched clusterIDs left over from an earlier format), so the NameNode rejects the DataNode.
Solution:
1. Delete all files under the dfs.namenode.name.dir and dfs.datanode.data.dir directories (see the sketch after this list).
2. Fix /etc/hosts:
cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.121.111 bigdata111
192.168.121.111 localhost
3. Re-format: bin/hadoop namenode -format
4. Restart:
sbin/start-all.sh
sbin/mr-jobhistory-daemon.sh start historyserver
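A sketch of step 1, assuming the directories created earlier in this post:
rm -rf /data/hdfs/name/* /data/hdfs/data/*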
To bring up Hive on top of the cluster, start the metastore first:
nohup hive --service metastore >/dev/null 2>&1 &   ## the Hive metastore service must be started first
Then start the hiveserver2 service:
nohup hive --service hiveserver2 >/dev/null 2>&1 &
Then Spark (run as the spark user, so jobs land in the designated spark queue):
${SPARK_HOME}/sbin/start-all.sh
${SPARK_HOME}/sbin/start-thriftserver.sh --master yarn
To shut everything down, reverse the order:
${SPARK_HOME}/sbin/stop-thriftserver.sh
${SPARK_HOME}/sbin/stop-all.sh
${HADOOP_HOME}/sbin/mr-jobhistory-daemon.sh stop historyserver
${HADOOP_HOME}/sbin/stop-all.sh
Linux accounts: hadoop/hadoop, root/password
Example: run beeline, then connect with !connect jdbc:hive2://bigdata111:10000 and log in as hadoop/hadoop
Database accounts: root/root1234, hive/hive1234; login format: mysql -uroot -proot1234
While starting Hadoop, jps showed that the NameNode had not started. The log (tail -f hadoop-hadoop-namenode-bigdata111.log) pointed to a permission problem with the in_use.lock file, so I did the following:
Fix the log directory ownership: chown -R hadoop:hadoop /usr/local/bigdata/hadoop/logs
Fix the ownership of the directory holding the lock file: chown -R hadoop:hadoop /usr/local/bigdata/hadoop/data/dfs/name
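To confirm the ownership change took effect:
ls -ld /usr/local/bigdata/hadoop/data/dfs/name   ## owner and group should now be hadoop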
A few handy commands:
hdfs dfs -ls /           ## list the HDFS root directory
rm -rf *                 ## delete everything in the current directory (use with care)
rm -rf current/          ## delete the current/ directory and all its contents
hdfs namenode -format    ## format the NameNode
ls -ltr                  ## list files by modification time, oldest first
mv old_name new_name     ## rename a file
load data [local] inpath 'filepath' [overwrite] into table tablename [partition(partcol1=val1, partcol2=val2)]
When loading a table this way, the table must be STORED AS TEXTFILE, otherwise the load fails.
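For example (the table name and file path are made up for illustration):
CREATE TABLE demo (id INT, name STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/tmp/demo.csv' OVERWRITE INTO TABLE demo;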