Yuque blog post: Hadoop Pseudo-Distributed Installation Tutorial (《Hadoop伪分布式安装教程》)
https://pan.baidu.com/s/1k63c-srXl6CQACVyGjhlkg?pwd=5vqr  Extraction code: 5vqr
sudo apt-get install openssh-server   # install the SSH server
When logging in over SSH, you can simply log in as root, the highest-privilege user.
For how to set this up, see the first part of the Linux study notes article, which covers configuring root privileges:
Linux study notes article
sudo apt-get install vim
sudo apt-get install net-tools
First, create a Downloads directory under the root directory to hold files transferred to the VM, and create a module directory under /opt to hold the big-data software extracted later. pwd shows the current working directory.
# Go back to the root directory
cd /
# Create Downloads
mkdir Downloads
# Go to the /opt directory and create module
cd /opt
mkdir module
Upload jdk-8u411-linux-x64.tar.gz to the virtual machine.
# Extract the archive
tar -zxvf jdk-8u411-linux-x64.tar.gz -C /opt/module/
# Go to the Java directory and rename it
cd /opt/module/
mv jdk1.8.0_411 jdk1.8
vim /etc/profile
# Add the following:
# JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin
# Make the configuration take effect
source /etc/profile
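To confirm the JDK is picked up from the new PATH, a quick check (the version string should report 1.8.0_411):
java -version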
Use a file-transfer tool to upload hadoop-3.1.3.tar.gz into the Downloads directory. Note that the upload may fail if you perform it as a non-root user.
# Extract the installation archive into /opt/module
tar -zxvf hadoop-3.1.3.tar.gz -C /opt/module/
# Check that the extraction succeeded
ls /opt/module/
hadoop-3.1.3
# Go to where Hadoop was extracted
cd /opt/module
ll
# Rename hadoop-3.1.3
mv hadoop-3.1.3 hadoop
# Enter the hadoop directory
cd hadoop
# (1) Open /etc/profile
vim /etc/profile
# (2) Add the following at the end of the file:
# HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
# (3) Make the modified file take effect
source /etc/profile
hadoop version
Hadoop 3.1.3
# Go to the hadoop directory
cd /opt/module/hadoop
# Go to the directory that contains core-site.xml
cd ./etc/hadoop
# Edit core-site.xml:
vim core-site.xml
# Add the following properties between the <configuration> tags:
<property>
    <name>hadoop.tmp.dir</name>
    <value>file:/opt/module/hadoop/tmp</value>
    <description>Abase for other temporary directories.</description>
</property>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
</property>
# Edit hdfs-site.xml:
vim hdfs-site.xml
# Add the following properties between the <configuration> tags:
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/module/hadoop/tmp/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/module/hadoop/tmp/dfs/data</value>
</property>
Initialization is straightforward; just run the following commands:
cd /opt/module/hadoop          # go to the hadoop directory
./bin/hdfs namenode -format    # format (initialize) the NameNode
If it succeeds, you will see a "successfully formatted" message in the output. Once initialization is done, we can start Hadoop with the following commands:
cd /opt/module/hadoop
./sbin/start-dfs.sh   # start-dfs.sh is a single executable name; there are no spaces in it
Web access from the host machine: hadoop
Web access from the virtual machine: hadoop
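With the default Hadoop 3.x settings assumed here, the NameNode web UI listens on port 9870, so it should be reachable at http://localhost:9870 inside the VM, or http://<VM-IP>:9870 from the host.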
After startup finishes, run the jps command to verify whether the pseudo-distributed Hadoop setup succeeded:
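For a pseudo-distributed HDFS started with start-dfs.sh, the processes to expect in the jps listing are (process IDs omitted; the exact numbers will differ):
NameNode
DataNode
SecondaryNameNode
Jps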
Possible error 1, when start-dfs.sh is run as root without the HDFS user variables defined:
Starting namenodes on [localhost]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [cai4-VMware-Virtual-Platform]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
Possible error 2, when passwordless SSH to localhost has not been set up:
Starting namenodes on [localhost]
localhost: Warning: Permanently added 'localhost' (ED25519) to the list of known hosts.
localhost: root@localhost: Permission denied (publickey,password).
Starting datanodes
localhost: root@localhost: Permission denied (publickey,password).
Starting secondary namenodes [cai4-VMware-Virtual-Platform]
cai4-VMware-Virtual-Platform: Warning: Permanently added 'cai4-vmware-virtual-platform' (ED25519) to the list of known hosts.
cai4-VMware-Virtual-Platform: root@cai4-vmware-virtual-platform: Permission denied (publickey,password).
Possible error 3, when JAVA_HOME is not visible to the Hadoop scripts:
localhost: ERROR: JAVA_HOME is not set and could not be found.
Starting datanodes
localhost: ERROR: JAVA_HOME is not set and could not be found.
Starting secondary namenodes [cai4-VMware-Virtual-Platform]
cai4-VMware-Virtual-Platform: ERROR: JAVA_HOME is not set and could not be found.
Solutions (one for each error above):
# Fix for error 1: add the following to the environment variables
vi /etc/profile
# Append these lines
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
# Make the change take effect
source /etc/profile
# Fix for error 2: Linux commands to set up passwordless SSH login
exit                                    # exit the previous login
cd ~/.ssh/                              # if this directory does not exist, run ssh localhost once first
ssh-keygen -t rsa                       # press Enter at every prompt until the key's randomart image is printed
cat ./id_rsa.pub >> ./authorized_keys   # authorize the key
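To confirm the key works, ssh localhost should now log in without asking for a password; type exit afterwards to return to the original shell:
ssh localhost
exit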
# Fix for error 3: edit hadoop-env.sh (Hadoop is installed under /opt/module/hadoop here)
vim /opt/module/hadoop/etc/hadoop/hadoop-env.sh
# Replace the original JAVA_HOME line with the absolute path
# export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/opt/module/jdk1.8
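After applying the fixes above, one way to re-test (a quick sketch using the scripts already introduced) is to restart HDFS and check the processes again:
cd /opt/module/hadoop
./sbin/stop-dfs.sh
./sbin/start-dfs.sh
jps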
Upload apache-hive-3.1.3-bin.tar.gz to the /Downloads directory on the Linux machine.
Extract apache-hive-3.1.3-bin.tar.gz into the /opt/module/ directory:
tar -zxvf apache-hive-3.1.3-bin.tar.gz -C /opt/module/
Rename apache-hive-3.1.3-bin to hive:
cd /opt/module
mv apache-hive-3.1.3-bin hive
vim /etc/profile
# (1) Add the following:
# HIVE_HOME
export HIVE_HOME=/opt/module/hive
export PATH=$PATH:$HIVE_HOME/bin
# (2) Make it take effect
source /etc/profile
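A quick sanity check that the variables are in place (these commands only inspect the environment; they do not start Hive):
echo $HIVE_HOME
which hive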
cd /opt/module/hive
bin/schematool -dbType derby -initSchema
This throws an error:
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
    at org.apache.hadoop.mapred.JobConf.setJar(JobConf.java:518)
    at org.apache.hadoop.mapred.JobConf.setJarByClass(JobConf.java:536)
    at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:430)
    at org.apache.hadoop.hive.conf.HiveConf.initialize(HiveConf.java:5144)
    at org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf.java:5107)
    at org.apache.hive.beeline.HiveSchemaTool.<init>(HiveSchemaTool.java:96)
    at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:1473)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:318)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:232)
The cause is that Hadoop and Hive ship different versions of guava.jar; the two jars are located in the following two directories:
/opt/module/hive/lib/guava-19.0.jar
/opt/module/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar
# The fix is to delete the lower-version jar and copy the higher-version one into its place.
cd /opt/module/hive/lib
rm -f guava-19.0.jar
cp /opt/module/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar .
# Run schematool -dbType derby -initSchema again and the metastore will initialize successfully.
apt-get install mysql-server
During installation, you will be prompted to create a root password. Choose a secure one and make sure you remember it, because it will be needed later.
apt-get install mysql-client
Run the MySQL secure-installation script:
mysql_secure_installation
However you installed it, MySQL should already be running automatically. To test this, check its status:
systemctl status mysql.service
You should see output similar to the following, with the service reported as active (running).
# Relax the MySQL password policy
set global validate_password_policy=0;
set global validate_password_length=1;
# Allow root to connect from any host
use mysql;
update user set host="%" where user="root";
flush privileges;
ALTER USER 'root'@'%' IDENTIFIED WITH mysql_native_password BY '123456';
flush privileges;
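The two set global lines above use the MySQL 5.7 variable names. If the Ubuntu release installed MySQL 8.x instead (an assumption worth checking with select version();), the equivalent settings come from the validate_password component:
-- MySQL 8.x equivalents (only available if the validate_password component is installed)
set global validate_password.policy = 0;
set global validate_password.length = 4;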
# Enable MySQL to start at boot
systemctl enable mysql.service
# Disable MySQL from starting at boot
systemctl disable mysql.service
# Restart the MySQL service
service mysql restart    # or: systemctl restart mysql.service
# MySQL configuration file
vim /etc/mysql/mysql.conf.d/mysqld.cnf
Possible error: Failed to restart mysqld.service: Unit mysqld.service not found. (On Ubuntu the unit is named mysql.service, not mysqld.service.)
Possible error: "The MySQL server is running with the --skip-grant-tables option so it cannot execute" (run flush privileges; first, as in the statements below, before executing account-management statements).
Navicat error 10061 and ERROR 1819 (HY000): Your password does not satisfy the current policy requirements. Solution:
sudo vim /etc/mysql/mysql.conf.d/mysqld.cnf
# comment out the line: bind-address = 127.0.0.1
mysql -uroot -p
use mysql;
select host,user from user;
update user set host='%' where user='root';
flush privileges;
grant all privileges on *.* to 'root'@'%';
# Grant root remote login; root_pwd below stands for the login password
ALTER USER 'root'@'%' IDENTIFIED WITH mysql_native_password BY 'root_pwd';
flush privileges;
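The bind-address change in mysqld.cnf only takes effect after the service is restarted, for example with the systemctl unit used earlier:
systemctl restart mysql.service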
/etc/init.d/mysql start   # alternatively, start MySQL via the SysV init script
flush privileges;
ALTER USER 'root'@'localhost' IDENTIFIED BY '123456';
# Log in to MySQL
mysql -uroot -p123456
# Create the Hive metastore database
create database metastore;
quit;
vim $HIVE_HOME/conf/hive-site.xml
# Add the following content:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- JDBC connection URL -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/metastore?useSSL=false</value>
    </property>
    <!-- JDBC driver -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <!-- JDBC username -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <!-- JDBC password -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>
    <!-- Hive's default working directory on HDFS -->
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/opt/module/hive/warehouse</value>
    </property>
</configuration>
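The com.mysql.jdbc.Driver setting above requires the MySQL JDBC connector jar to be on Hive's classpath. A minimal sketch, assuming the jar was uploaded to /Downloads (the file name and version here are assumptions; use whichever connector you downloaded):
# copy the MySQL JDBC driver into Hive's lib directory
cp /Downloads/mysql-connector-java-5.1.49.jar /opt/module/hive/lib/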
cd /opt/module/hive
bin/schematool -dbType mysql -initSchema -verbose
hive
show databases;
show tables;
create table stu(id int, name string);
insert into stu values(1,"ss");
select * from stu;
View the table information stored in the metastore database (the TBLS table).
View the column information for those tables stored in the metastore database (the COLUMNS_V2 table).
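A minimal sketch of how to inspect these two metastore tables from the MySQL client, assuming the metastore database created earlier:
use metastore;
select * from TBLS;
select * from COLUMNS_V2;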
HiveServer2's user-impersonation feature relies on Hadoop's proxy user mechanism: only users configured as Hadoop proxy users may impersonate other users when accessing the Hadoop cluster. The user that starts hiveserver2 therefore has to be configured as a Hadoop proxy user. To do this, modify the core-site.xml configuration file, and remember to distribute it to all three machines (this only applies to a multi-node cluster; a single-node pseudo-distributed setup has nothing to distribute):
cd $HADOOP_HOME/etc/hadoop
vim core-site.xml
# Add the following configuration:
<!-- Configure proxy-user permissions so that Hive can access Hadoop -->
<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.users</name>
    <value>*</value>
</property>
Add the following configuration to the hive-site.xml file:
# Check the hostname
hostname
cai4-VMware-Virtual-Platform
# Change the hostname
hostnamectl set-hostname hadoop100
# Update /etc/hosts to match
<!-- Host that hiveserver2 binds to -->
<property>
    <name>hive.server2.thrift.bind.host</name>
    <value>hadoop</value>
</property>
<!-- Port that hiveserver2 listens on -->
<property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
</property>
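A sketch of the matching /etc/hosts entry, assuming the VM's IP is 192.168.191.28 (the address used by the beeline command below) and reusing the hostnames that appear in this tutorial:
192.168.191.28   hadoop100   hadoop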
# Start hiveserver2
hive --service hiveserver2
# If you get the error: Error starting HiveServer2 on attempt 1, will retry in 60000ms
# add the following to hive-site.xml:
<property>
    <name>hive.server2.active.passive.ha.enable</name>
    <value>true</value>
    <description>Whether HiveServer2 Active/Passive High Availability be enabled when Hive Interactive sessions are enabled. This will also require hive.server2.support.dynamic.service.discovery to be enabled.</description>
</property>
# then restart hiveserver2:
hive --service hiveserver2
# Use the beeline command-line client for remote access; start the beeline client:
beeline -u jdbc:hive2://192.168.191.28:10000 -n root
Here, the console output shown after running hive --service hiveserver2 is normal, and everything remains normal until a remote connection is made.
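Once beeline connects, a quick smoke test (a sketch that assumes the stu table created earlier still exists):
show databases;
select * from stu;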
# Restart hadoop
sbin/stop-all.sh
sbin/start-all.sh
# Restart hive: find the process first
ps -aux | grep hive
# then kill it (2323 is the PID reported by grep in this example)
kill -9 2323
# Start the metastore service
hive --service metastore &
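One way to keep the metastore and hiveserver2 running after the terminal is closed is to launch them with nohup (a sketch; the log file locations are arbitrary choices):
nohup hive --service metastore > /tmp/hive-metastore.log 2>&1 &
nohup hive --service hiveserver2 > /tmp/hive-hiveserver2.log 2>&1 &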