300字范文,内容丰富有趣,生活中的好帮手!
300字范文 > 大数据Hive学习案例(2)——基于汽车销售的日志数据分析

大数据Hive学习案例(2)——基于汽车销售的日志数据分析

时间:2019-11-13 07:02:26

相关推荐

大数据Hive学习案例(2)——基于汽车销售的日志数据分析

下方有数据可免费下载

目录

原始数据项目实战数据仓库的构建1.构建数据仓库2.创建原始数据表3.加载数据到数据仓库4.验证数据结果数据分析1.乘用车辆和商用车辆的销售数量和比例2.山西省每个月的汽车销售数量比例3.买车的男女比例和男女对车的品牌的选择4.车的所有权,车辆型号,车辆类型5.不同车型在一个月的销售量6.不同品牌车销售情况,统计发动机和燃料种类7.统计五菱某一年每月的销售量

数据下载请点击我,提取码:cxr7,觉得有用希望您能点一个赞哦。

原始数据

原始数据为csv格式,列数比较多

项目实战

数据仓库的构建

1.构建数据仓库

create database if not exists car;

2.创建原始数据表

use car;create external table cars(province string, --省份month int, --月city string, --市 district string, --区县year int, --年model string,--车辆型号manufacturer string,--制造商brand string, --品牌vehicletype string,--车辆类型ownership string,--所有权nature string, --使用性质quantity int,--数量enginemodel string,--发动机型号displacement int,--排量power double, --功率fuel string,--燃料种类length1 int,--车长width1 int,--车宽height1 int,--车高length2 int,--厢长width2 int,--厢宽height2 int,--厢高numberofaxles int,--轴数wheelbase int,--轴距frontwheelbase int,--前轮距tirespecification string,--轮胎规格tirenumber int,--轮胎数totalquality int,--总质量completequality int,--整备质量approvedquality int,--核定载质量approvedpassenger string,--核定载客tractionquality int,--准牵引质量chassisenterprise string,--底盘企业chassisbrand string,--底盘品牌chassismodel string,--底盘型号engineenterprise string,--发动机企业vehiclename string,--车辆名称age int,--年龄gender string --性别)row format delimitedfields terminated by ','location '/cars'tblproperties("skip.header.line.count"="1"); --跳过文件行首1行;desc cars;

3.加载数据到数据仓库

[hadoop@hadoop000 hive_data]$ hadoop fs -put ./cars.csv /cars

4.验证数据结果

select * from cars limit 10;

数据分析

1.乘用车辆和商用车辆的销售数量和比例

select '非营运',sum(if(a.nature='非营运',t,0)),'营运',sum(if(a.nature!='非营运',t,0)) from (select nature,count(*) as cnt from cars group by nature having nature is not null and nature!='') a;

Total MapReduce CPU Time Spent: 6 seconds 810 msecOK非营运66478营运3884Time taken: 32.365 seconds, Fetched: 1 row(s)

2.山西省每个月的汽车销售数量比例

select month,c1.ss/c2.sum from (select month,sum(quantity) as ss from cars where province = '山西省' and year = '' group by month ) c1,(select sum(quantity) as sum from cars where province = '山西省' and year = '') c2;

Total MapReduce CPU Time Spent: 9 seconds 150 msecOK10.1479918137631107720.0583127256189420430.0930615957477047340.0658736249680225150.073207128847957760.0554702822546260870.0632301526392086780.0637844290952502890.06948352804070379100.1044882180722549110.10053722179585571120.1045592791563628Time taken: 53.486 seconds, Fetched: 12 row(s)

3.买车的男女比例和男女对车的品牌的选择

select '男性',B.man/(B.man+B.woman),'女性', B.woman/(B.man+B.woman) from(select '男性',sum(if(A.gender='男性',t,0)) as man,'女性',sum(if(A.gender='女性',t,0)) as woman from(select gender,count(*) as cntfrom cars where gender is not null and gender != '' group by gender) A) B;

Total MapReduce CPU Time Spent: 5 seconds 880 msecOK男性0.7010659323952227女性0.29893406760477725Time taken: 34.488 seconds, Fetched: 1 row(s)

select gender,brand,count(*) as cntfrom carswhere gender is not null and gender != '' and age is not nullgroup by gender,brandhaving brand is not null and brand !=' 'order by cnt desclimit 5;

Total MapReduce CPU Time Spent: 5 seconds 330 msecOK男性五菱28208女性五菱1男性长安3679男性东风3214男性五菱宏光2331Time taken: 33.615 seconds, Fetched: 5 row(s)

4.车的所有权,车辆型号,车辆类型

select t,count(*) from (select concat(model,ownership,vehicletype) as cnt from cars) a group by t;

ZK6726DX3单位大型专用校车1ZK6726DXA9单位大型专用校车4ZK6729D2单位大型普通客车2ZK6729DB单位大型普通客车16ZK6731DG1单位大型普通客车6ZK6731NG1单位大型普通客车24ZK6750D2单位大型普通客车17......

5.不同车型在一个月的销售量

select month,vehicletype,count(*) from cars group by vehicletype,month having month is not null and vehicletype is null and vehicletype != '';

6.不同品牌车销售情况,统计发动机和燃料种类

select brand,enginemodel,fuel,count(*) from cars group by brand,enginemodel,fuel;

7.统计五菱某一年每月的销售量

select brand,month,count(*) from cars group by brand,month having brand='五菱';

Total MapReduce CPU Time Spent: 3 seconds 940 msecOK五菱15589五菱22226五菱33557五菱42389五菱53351五菱62302五菱72893五菱82980五菱93422五菱105278五菱114809五菱124963Time taken: 16.416 seconds, Fetched: 12 row(s)

本内容不代表本网观点和政治立场,如有侵犯你的权益请联系我们处理。
网友评论
网友评论仅供其表达个人看法,并不表明网站立场。