300字范文,内容丰富有趣,生活中的好帮手!
300字范文 > 【学习笔记】阿里云天池龙珠计划SQL训练营-Task06:综合练习题-10道经典题目

【学习笔记】阿里云天池龙珠计划SQL训练营-Task06:综合练习题-10道经典题目

时间:2021-12-04 16:16:38

相关推荐

【学习笔记】阿里云天池龙珠计划SQL训练营-Task06:综合练习题-10道经典题目

本笔记为阿里云天池龙珠计划SQL训练营的学习内容

链接为:/specials/promotion/aicampsql

练习题1:

请使用A股上市公司季度营收预测数据集《Income Statement.xls》和《Company Operating.xlsx》和《Market Data.xlsx》,以Market Data为主表,将三张表中的TICKER_SYMBOL为600383和600048的信息合并在一起。只需要显示以下字段。

解题思路:先从三张表中找出TICKER_SYMBOL为600383和600048的信息,在以Market Data为主表,采用left join对三表进行合并

SELECT * FROM (SELECT TICKER_SYMBOL,END_DATE,CLOSE_PRICE from `market data` where TICKER_SYMBOL in('600383','600048')) as m left JOIN(select TICKER_SYMBOL,END_DATE,T_REVENUE,T_COGS,N_INCOME FROM `income statement` where TICKER_SYMBOL in ('600383','600048')) as ion m.TICKER_SYMBOL = i.TICKER_SYMBOL AND m.END_DATE = i.END_DATEleft JOIN(SELECT TICKER_SYMBOL,INDIC_NAME_EN,END_DATE,`VALUE` from `company operating` where TICKER_SYMBOL in('600383','600048')) as con m.TICKER_SYMBOL = c.TICKER_SYMBOL and m.END_DATE = c.END_DATE

练习题2:

请使用 Wine Quality Data 数据集《winequality-red.csv》,找出 pH=3.03的所有红葡萄酒,然后,对其 citric acid 进行中式排名(相同排名的下一个名次应该是下一个连续的整数值。换句话说,名次之间不应该有“间隔”)

解题思路:1、采用where将ph=3.03的找出;2、中式排名:dense_rank()窗口函数

SELECT*,DENSE_RANK() over (ORDER BY `citric acid`) as NUM FROM`winequality-red` WHERE pH = 3.03

练习题3:

使用Coupon Usage Data for O2O中的数据集《ccf_offline_stage1_test_revised.csv》,试分别找出在7月期间,发放优惠券总金额最多和发放优惠券张数最多的商家。

这里只考虑满减的金额,不考虑打几折的优惠券。

解题思路:

1、时间范围:7月;

2、优惠券金额格式处理:SUBSTRING_INDEX( Discount_rate, ':',- 1 ),同时需要剔除折扣优惠券

一、发放优惠券总金额最多的商家

SELECTMerchant_id,Date_received,sum(sUBSTRING_INDEX( Discount_rate, ':',- 1 )) AS sum_amount FROMccf_offline_stage1_test_revised WHEREDATE_FORMAT( Date_received, '%Y%m' ) = '07' AND SUBSTRING_INDEX( Discount_rate, ':',- 1 ) >= 1GROUP BY Merchant_idORDER BY sum_amount descLIMIT 1

二、发放优惠券张数最多的商家

SELECT Merchant_id,Date_received,COUNT(Coupon_id) as count_coupon_idfrom ccf_offline_stage1_test_revisedWHEREDATE_FORMAT( Date_received, '%Y%m' ) = '07' AND SUBSTRING_INDEX( Discount_rate, ':',- 1 ) >= 1GROUP BY Merchant_idORDER BY count_coupon_id descLIMIT 1

练习题4:

请使用A股上市公司季度营收预测中的数据集《Macro&Industry.xlsx》中的sheet-INDIC_DATA,请计算全社会用电量:第一产业:当月值在用电最高峰是发生在哪月?并且相比去年同期增长/减少了多少个百分比?

解题思路:

1、全社会用电量:第一产业——Total Electricity Consumption: Primary Industry,获取ID为indic_id =22

2、筛选最高峰的月份

SELECT PERIOD_DATE,max(DATA_VALUE) MAX_VALUE FROM `macro industry` WHERE indic_id = '22' AND YEAR(PERIOD_DATE) = GROUP BY PERIOD_DATE ORDER BY MAX_VALUE DESC LIMIT 1

3、根据最高峰月份,提取去年同期值,进行同比

SELECTa.PERIOD_DATE,a.max_value,CONCAT(round((a.max_value-b.max_value)/b.max_value*100,2),'%') as g_rateFROM(SELECTPERIOD_DATE,max( DATA_VALUE ) MAX_VALUE FROM`macro industry` WHEREindic_id = '22' AND YEAR ( PERIOD_DATE ) = GROUP BYPERIOD_DATE ORDER BYMAX_VALUE DESC LIMIT 1) as a left join(SELECTPERIOD_DATE,max( DATA_VALUE ) MAX_VALUE FROM`macro industry` WHEREindic_id = '22' AND YEAR ( PERIOD_DATE ) = GROUP BYPERIOD_DATE ORDER BYMAX_VALUE DESC LIMIT 1) as bON YEAR ( a.PERIOD_DATE ) = YEAR ( b.PERIOD_DATE ) + 1 AND MONTH ( a.PERIOD_DATE ) = MONTH (b.PERIOD_DATE)

练习题5:

使用Coupon Usage Data for O2O中的数据集《ccf_online_stage1_train.csv》,试统计在6月期间,线上总体优惠券弃用率为多少?并找出优惠券弃用率最高的商家。

弃用率 = 被领券但未使用的优惠券张数 / 总的被领取优惠券张数

解题思路:

1、时间端6月

2、被领券但未使用的优惠券:date is null and Coupon_id is not null

3、总的被领取优惠券:Coupon_id is not null

一、线上总体优惠券弃用率:

SELECTCONCAT(round(sum(case when date is null and Coupon_id is not null then 1 else 0 end)/sum(case when Coupon_id is not null then 1 else 0 end)*100,2),'%') as discard_rateFROMccf_online_stage1_train WHEREDATE_FORMAT( Date_received, '%Y%m' ) = '06'

二、优惠券弃用率最高的商家

SELECTMerchant_id,CONCAT(round(sum(case when date is null and Coupon_id is not null then 1 else 0 end)/sum(case when Coupon_id is not null then 1 else 0 end)*100,2),'%') as discard_rateFROMccf_online_stage1_train WHEREDATE_FORMAT( Date_received, '%Y%m' ) = '06'GROUP BY Merchant_idORDER BY discard_rate descLIMIT 1

练习题6:

请使用 Wine Quality Data 数据集《winequality-white.csv》,找出 pH=3.63的所有白葡萄酒,然后,对其 residual sugar 量进行英式排名(非连续的排名)

解题思路:1、ph=3.63,2、英式排名:rank()窗口函数

SELECT*,RANK() over(ORDER BY `residual sugar`) num_rankFROM`winequality-white` WHEREph = 3.63

练习题7:

请使用A股上市公司季度营收预测中的数据集《Market Data.xlsx》中的sheet-DATA,

计算截止到底,市值最大的三个行业是哪些?以及这三个行业里市值最大的三个公司是哪些?(每个行业找出前三大的公司,即一共要找出9个)

解题思路:1、年底,实际数据只到5月底

2、三大行业用sum进行求和排序,三个行业中最大市值的三个公司,是以行业为分组进行窗口函数

一、市值最大的三个行业

SELECTTYPE_NAME_EN,TYPE_NAME_CN,sum( MARKET_VALUE ) as sum_valueFROM`market data` WHEREDATE_FORMAT( END_DATE, '%Y%m' ) = '05'GROUP BY TYPE_NAME_ENORDER BY sum_value descLIMIT 3

二、市值最大的三个行业中的最大三家公司,共计9家

SELECT * from(SELECTTYPE_NAME_EN,TYPE_NAME_CN,TICKER_SYMBOL,MARKET_VALUE,DENSE_RANK() over(PARTITION by TYPE_NAME_EN ORDER BY MARKET_VALUE desc) num_valueFROM`market data` WHEREDATE_FORMAT( END_DATE, '%Y%m' ) = '05' and TYPE_NAME_EN in ('Bank','Medicine and Biology','Non-bank Finance')GROUP BY TYPE_NAME_EN,TICKER_SYMBOL) as awhere num_value in (1,2,3)

练习题8:

使用Coupon Usage Data for O2O中的数据集《ccf_online_stage1_train.csv》和《ccf_offline_stage1_train.csv》,试找出在6月期间,线上线下累计优惠券使用次数最多的顾客。

解题思路:1、时间6月;2、分别统计线上跟线下使用次数,再进行合并求和得出使用次数最多的顾客

SELECT user_id,sum(action) as sum_action from(SELECT user_id,count(user_id) as action from ccf_online_stage1_train where DATE_FORMAT(date,'%Y%m') = 06 AND Coupon_id is not nullGROUP BY User_idunion all SELECT user_id,count(user_id) as action from ccf_offline_stage1_train where DATE_FORMAT(date,'%Y%m') = 06 AND Coupon_id is not nullGROUP BY user_id) bGROUP BY user_idORDER BY sum_action descLIMIT 1

练习题9:

请使用A股上市公司季度营收预测数据集《Income Statement.xls》中的sheet-General Business和《Company Operating.xlsx》中的sheet-EN。

找出在数据集所有年份中,按季度统计,白云机场旅客吞吐量最高的那一季度对应的净利润是多少?(注意,是单季度对应的净利润,非累计净利润。)

解题思路:1、先找吞吐量最高的季度;2、再匹配对应净利润最多的月份

3、用到的函数:QUARTER(),是对季度周期的时间函数

SELECT a.TICKER_SYMBOL,a.tuntu,b.amount FROM(SELECTTICKER_SYMBOL,YEAR ( END_DATE ) YEAR,QUARTER ( END_DATE ) Q,sum( `VALUE` ) AS tuntu FROM`company operating` WHEREINDIC_NAME_EN = 'Baiyun Airport:Aircraft take-off and landing times'GROUP BY TICKER_SYMBOL,YEAR,QORDER BY tuntu descLIMIT 1) aINNER JOIN( select TICKER_SYMBOL,YEAR(END_DATE) year,QUARTER(END_DATE) Q,SUM(N_INCOME) as amountfrom `income statement`group by TICKER_SYMBOL,year,Q) bon a.TICKER_SYMBOL = b.TICKER_SYMBOL AND a.year = b.year and a.Q = b.Q

练习题10:

使用Coupon Usage Data for O2O中的数据集《ccf_online_stage1_train.csv》和《ccf_offline_stage1_train.csv》,试找出在6月期间,线上线下累计被使用优惠券满减最多的前3名商家。

比如商家A,消费者A在其中使用了一张200减50的,消费者B使用了一张30减1的,那么商家A累计被使用优惠券满减51元。

解题思路:1、时间:6月;2、分别合计线上、线下被使用的优惠券数量;3、合并计算优惠券数额

SELECT Merchant_id,sum(discount) as discount_amount from(SELECTMerchant_id,SUBSTRING_INDEX( Discount_rate, ':',- 1 ) as discount FROMccf_online_stage1_train WHEREDATE_FORMAT( date, '%Y%m' ) = 06 AND date IS NOT NULL AND Coupon_id IS NOT NULLunion allSELECT Merchant_id,SUBSTRING_INDEX(Discount_rate,':',-1) as discountfrom ccf_offline_stage1_trainwhere DATE_FORMAT(date,'%Y%m') = 06and date is not nulland Coupon_id is not null) a GROUP BY Merchant_idORDER BY discount_amount descLIMIT 3

本内容不代表本网观点和政治立场,如有侵犯你的权益请联系我们处理。
网友评论
网友评论仅供其表达个人看法,并不表明网站立场。