Big Data (8) - Installing and Using Hive - 白红宇

Published: 2019-06-28

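What is Hive

Hive was open-sourced by Facebook to compute statistics over massive volumes of structured log data.

Hive is a data warehouse tool built on Hadoop. It maps structured data files to tables and provides SQL-like querying.

In essence, Hive translates HQL into MapReduce programs:

1) The data Hive processes is stored in HDFS

2) The analysis is implemented underneath as MapReduce jobs

3) Those jobs run on YARN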
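
To make the "HQL becomes MapReduce" idea concrete, here is a minimal local sketch, not what Hive actually generates. It imitates the map, shuffle, and reduce phases of a GROUP BY count with ordinary shell tools. The sample rows and the function name count_by_age are made up for illustration.

```shell
# Made-up sample rows in a tab-delimited layout: id, name, age.
demo_file=$(mktemp)
printf '1\talice\t30\n2\tbob\t25\n3\tcarol\t30\n' > "$demo_file"

# Rough local analogue of: SELECT age, COUNT(*) FROM person GROUP BY age;
count_by_age() {
  cut -f3 "$1" |               # "map": emit the grouping key (age) per row
    sort |                     # "shuffle": bring equal keys together
    uniq -c |                  # "reduce": count the rows for each key
    awk '{print $2 "\t" $1}'   # reorder to age<TAB>count
}

age_counts=$(count_by_age "$demo_file")
printf '%s\n' "$age_counts"
rm -f "$demo_file"
```

On a real cluster the same three phases are spread across many machines, which is why even a simple query in Hive has noticeable startup latency.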

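Why use Hive

1) Problems with using Hadoop directly

(1) The learning curve for the team is steep

(2) Project schedules are too tight

(3) Implementing complex query logic in MapReduce is hard

2) Benefits of Hive

(1) The interface uses SQL-like syntax, which enables rapid development.

(2) Developers do not have to write MapReduce by hand, which lowers the learning cost.

(3) It is easy to extend.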

Installation

Configuration

1. Allow the MySQL root account to connect from any host

mysql> use mysql;

mysql> select User, Host, Password from user;

mysql> update user set host='%' where user='root' and host='localhost';

mysql> delete from user where User='root' and Host='linux01';

mysql> delete from user where User='root' and Host='127.0.0.1';

mysql> delete from user where User='root' and Host='::1';

mysql> flush privileges;

$ sudo service mysql restart

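2. Download the Hive package and extract it to the target directory

tar -zxf ~/softwares/installtions/apache-hive-1.2.2-bin.tar.gz -C ~/modules/

3. Copy the MySQL driver into Hive's lib directory

After extracting the driver package, copy the jar into the lib folder under the Hive directory.

4. Prepare the configuration files

Remove the .template suffix from the files in the conf directory. For hive-default.xml.template, also make a copy in the same directory and name it hive-site.xml.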
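
The renaming in step 4 can be scripted. The following is a minimal sketch, assuming the conf directory contains hive-default.xml.template. The function name prepare_hive_conf is hypothetical, and an existing hive-site.xml would be overwritten.

```shell
# Copy hive-default.xml.template to hive-site.xml, then drop the ".template"
# suffix from every template file in the given conf directory.
prepare_hive_conf() {
  conf_dir=$1
  # Copy first, while the template still exists under its original name.
  cp "$conf_dir/hive-default.xml.template" "$conf_dir/hive-site.xml" || return 1
  for f in "$conf_dir"/*.template; do
    [ -e "$f" ] || continue      # skip the literal pattern if nothing matched
    mv "$f" "${f%.template}"     # e.g. hive-env.sh.template -> hive-env.sh
  done
}

# Example call (path taken from the steps in this article):
#   prepare_hive_conf ~/modules/apache-hive-1.2.2-bin/conf
```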

5. Edit hive-env.sh

# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/home/admin/modules/hadoop-2.7.2

# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/home/admin/modules/apache-hive-1.2.2-bin/conf

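6. Edit hive-site.xml and set the following properties

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://linux01:3306/metastore?createDatabaseIfNotExist=true</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>

<!-- the password of the MySQL account above -->
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
</property>

<property>
  <name>hive.querylog.location</name>
  <value>/home/admin/modules/apache-hive-1.2.2-bin/iotmp</value>
</property>

<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/home/admin/modules/apache-hive-1.2.2-bin/iotmp</value>
</property>

<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/home/admin/modules/apache-hive-1.2.2-bin/iotmp</value>
</property>

<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
</property>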

7. Create the iotmp folder in the Hive root directory

$ mkdir iotmp

8. Start Hive (HDFS, YARN, and MySQL must already be running)

$ bin/hive

Test:

hive> show databases;

hive> show tables;

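Common Hive operations:

Basic concepts:

1. Default storage path for Hive "internal" (managed) tables: /user/hive/warehouse/<database name>.db

2. By default, Hive tables are also stored under the directory above

3. Why Hive is described as a "data warehouse" rather than a "database"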
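
As a small illustration of the path rule in item 1, the sketch below prints the default HDFS directory for a table. The function name hive_table_dir is hypothetical, and the sketch assumes the warehouse root has not been changed from /user/hive/warehouse.

```shell
# Print the default HDFS directory for a table.
# Usage: hive_table_dir DATABASE TABLE
hive_table_dir() {
  warehouse=/user/hive/warehouse
  if [ "$1" = "default" ]; then
    # Tables in the built-in "default" database sit directly in the warehouse root.
    printf '%s/%s\n' "$warehouse" "$2"
  else
    printf '%s/%s.db/%s\n' "$warehouse" "$1" "$2"
  fi
}

hive_table_dir db_online person   # /user/hive/warehouse/db_online.db/person
hive_table_dir default person     # /user/hive/warehouse/person
```

With HDFS running, the result can be passed to a listing command such as hdfs dfs -ls "$(hive_table_dir db_online person)" to inspect the table's files.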

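Additional configuration

Edit hive-site.xml

1. Show the name of the current database in the console prompt

Set hive.cli.print.current.db to true

2. Show column names in query results in the console

Set hive.cli.print.header to true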
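
For reference, the two settings above take the usual property form in hive-site.xml. The helper below is a hypothetical convenience that prints such a block so it can be compared with, or pasted into, the file. It does not edit hive-site.xml itself.

```shell
# Print one hive-site.xml <property> block for the given name and value.
hive_property() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

hive_property hive.cli.print.current.db true
hive_property hive.cli.print.header true
```

Both properties already exist in a hive-site.xml copied from the template, so in practice you would change the existing value rather than append a second block. They can also be switched on for a single CLI session with set hive.cli.print.header=true; at the hive> prompt.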

DDL

Create a database

hive> create database db_online;

List databases

hive> show databases;

Switch to a database

hive> use db_online;

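Create a table

hive> create table person(id int, name string, age int) row format delimited fields terminated by '\t';

Create a table and populate it directly from another table (because of "if not exists", the statement is skipped if a table named person already exists)

hive> create table if not exists person as select id, name from student;

Create a table that copies another table's structure but not its data

hive> create table if not exists person2 like student;

List tables

hive> show tables;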

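Load data into a table

Method 1: load a local file into the table's Hive warehouse directory on HDFS

hive> load data local inpath 'test_data/person.txt' into table person;

Method 2: load a file that is already on HDFS into the table's Hive warehouse directory (the source file is moved, not copied, into the table's directory)

hive> load data inpath '/person2.txt' into table person2;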
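
The load statements above only work as expected when the file's layout matches the table definition. As a sketch, the following creates a small test_data/person.txt with three tab-separated fields per row (id, name, age), matching the person table. The rows themselves are made up.

```shell
mkdir -p test_data

# printf reuses the format for each group of three arguments: one row per group.
printf '%s\t%s\t%s\n' \
  1 alice 30 \
  2 bob   25 \
  3 carol 30 > test_data/person.txt

# Sanity check: count rows that do not have exactly three tab-separated fields.
bad_rows=$(awk -F'\t' 'NF != 3' test_data/person.txt | wc -l | tr -d ' ')
echo "rows with wrong field count: $bad_rows"
```

The relative path in load data local inpath is resolved from the directory where bin/hive was started, so the file should be created there.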

Drop an empty database

hive> drop database db_online;

Force-drop a non-empty database

hive> drop database db_online cascade;

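The two table types

Internal (managed) tables: this is what Hive creates by default

Characteristic: dropping a table of this type deletes both its metadata in MySQL and the data itself in HDFS.

External tables

Characteristic: dropping the table deletes only the metadata in MySQL, not the data itself in HDFS.

You can also specify the data location explicitly (the data does not have to live under /user/hive/warehouse).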
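
The article does not show the statement for an external table, so here is a minimal sketch. The table name person_ext and the HDFS directory /data/person_ext are hypothetical. The script only writes the HiveQL to a file, and running it is left as a commented command.

```shell
hql_file="${TMPDIR:-/tmp}/create_person_ext.hql"

cat > "$hql_file" <<'EOF'
-- EXTERNAL: dropping this table removes only the metastore entry, not the files.
-- LOCATION points at a hypothetical HDFS directory outside the warehouse.
CREATE EXTERNAL TABLE IF NOT EXISTS person_ext (
  id   INT,
  name STRING,
  age  INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/person_ext';
EOF

echo "wrote $hql_file"
# From the Hive root directory, with HDFS, YARN and MySQL running:
#   bin/hive -f "$hql_file"
```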

Check a table's type (the output includes a "Table Type" field)

hive> desc formatted person;

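Two concepts for organizing tables in Hive

Partitions and buckets

A database corresponds to a folder

A table also corresponds to a folder

A table can be split into multiple "partitions"

A partition corresponds to a folder

A partition can contain multiple "buckets"

Buckets correspond to the data files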
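
The hierarchy above can be seen in a table that is both partitioned and bucketed. The sketch below writes such a definition to a file. The table name emp_bucketed, its columns, and the bucket count of 4 are made up for illustration.

```shell
hql_file="${TMPDIR:-/tmp}/create_emp_bucketed.hql"

cat > "$hql_file" <<'EOF'
-- One folder per partition value, e.g. .../emp_bucketed/month=201804/
-- Inside each partition folder, rows are hashed on id into 4 bucket files.
CREATE TABLE IF NOT EXISTS emp_bucketed (
  id   INT,
  name STRING
)
PARTITIONED BY (month STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
EOF

echo "wrote $hql_file"
# From the Hive root directory:
#   bin/hive -f "$hql_file"
```

Note that load data only moves files into place and does not split them into buckets. In Hive 1.x, bucketed tables are normally filled with insert ... select after running set hive.enforce.bucketing=true;.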

Partition operations

Create a partitioned table

hive> create table dept_partition(deptno int, dname string, loc string) partitioned by (month string) row format delimited fields terminated by '\t';

Load data into a specific partition

hive> load data local inpath 'test_data/dept.txt' into table default.dept_partition partition(month='201804');

hive> load data local inpath 'test_data/dept.txt' into table default.dept_partition partition(month='201805');

Query the data in one partition

hive> select * from dept_partition where month='201804';
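
To check which partitions exist after the loads above, the following sketch runs two statements non-interactively. It assumes HIVE_HOME points at the Hive install directory (falling back to the current directory) and only prints the statements when no hive launcher is found there.

```shell
# Statements to run: list the partitions, then count rows per partition.
hql='SHOW PARTITIONS dept_partition;
SELECT month, COUNT(*) FROM dept_partition GROUP BY month;'

hive_bin="${HIVE_HOME:-.}/bin/hive"
if [ -x "$hive_bin" ]; then
  "$hive_bin" -e "$hql"          # -e executes the quoted statements and exits
else
  echo "no Hive launcher at $hive_bin; statements that would run:"
  printf '%s\n' "$hql"
fi
```

Because month is a partition column, a filter such as where month='201804' lets Hive read only that partition's folder instead of scanning the whole table.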

Reposted from: https://www.cnblogs.com/shifu204/p/9633570.html
